Lecture 12
Agreement in Distributed Systems
CS 188: Distributed Systems, Winter 2015
February 19, 2015
Introduction
• We frequently want to get a set of
nodes in a distributed system to agree
• Commitment protocols and mutual
exclusion are particular cases
• The approaches we discussed for those
work in limited situations
• In general, when can we reach
agreement in a distributed system?
Basics of Agreement Protocols
• What is agreement?
• What are the necessary conditions for
agreement?
What Do We Mean By
Agreement?
• In simplest case, can n processors
agree that a variable takes on value 0
or 1?
– Only non-faulty processors need
agree
• More complex agreements can be built
from this simple agreement
Conditions for Agreement
Protocols
• Consistency
– All participants agree on same value
and decisions are final
• Validity
– Participants agree on a value at least
one of them wanted
• Termination/Progress
– All participants choose a value in a
finite number of steps
Challenges to Agreement
• Delays
– In message delivery
– In nodes responding to messages
• Failures
– And recovery from failures
• Lies by participants
– Or innocent errors that have similar
effects
Failures and Agreement
• Failures make agreement difficult
– Failed nodes don’t participate
– Failed nodes sometimes recover at
inconvenient times
– At worst, failed nodes participate in
harmful ways
• Real failures are worse than fail-stop
Types of Failures
• Fail-stop
– A nice, clean failure
– Processor stops executing anything
• Realistic failures
– Partitionings
– Arbitrary delays
• Adversarial failures
– Arbitrary bad things happen
Election Algorithms
• If you get everyone to agree that a
particular node is in charge,
• Future consensus is easy, since that node
makes the decisions
• How do you determine who’s in charge?
– Statically
– Dynamically
Static Leader Selection Methods
• Predefine one process/node as the
leader
• Simple
– Everyone always knows who’s the
leader
• Not very resilient
– If the leader fails, then what?
Dynamic Leader Selection
Methods
• Choose a new leader dynamically
whenever necessary
• More complicated
• But failure of a leader is easy to handle
– Just elect a new one
• Election doesn’t imply voting
– Not necessarily majority-based
Election Algorithms vs.
Mutual Exclusion Algorithms
• Most mutual exclusion algorithms don’t
care much about failures
• Election algorithms are designed to handle
failures
• Also, mutual exclusion algorithms only
need a winner
• Election algorithms need everyone to know
who won
A Typical Use of
Election Algorithms
• A group of processes wants to
periodically take a distributed snapshot
• They don’t want multiple simultaneous
snapshots
• So they want one leader to order them
to take the snapshot
Problems in Election Algorithms
• Some of the nodes may have failed
before the algorithm starts
• Some of the nodes may fail during the
algorithm
• Some nodes may recover from failure
– Possible at inconvenient times
• What about partitions?
Election Algorithms and
the Real Work
• The election algorithm is usually overhead
• There’s a real computation you want to
perform
• The election algorithm chooses someone to
lead it
• Having two leaders while real computation
is going on is bad
The Bully Algorithm
• The biggest kid on the block gets to be
the leader
• But what if the biggest kid on the block
is taking his piano lesson?
• The next biggest kid gets to be leader
– Until the piano lesson is over . . .
Electing a Bully

[Cartoon: the kids (Butch, Peewee, Cuthbert) come out to play and
call out for the biggest kid: "Hey, Spike!" But Spike's mom hasn't
let him out yet, since his piano lesson hasn't ended. Getting no
answer, the next biggest kid announces "I'm here, I'm the leader, and
we're playing tag!" Then Spike's piano lesson ends . . .]
Assumptions of the Bully
Algorithm
• A static set of possible participants
– With an agreed-upon order
• All messages are delivered within Tm seconds
• All responses are sent within Tp seconds of
delivery
• These last two imply synchronous behavior
The Basic Idea Behind
the Bully Algorithm
• Possible leaders try to take over
• If they detect a better leader, they agree
to its leadership
• Keep track of state information about
whether you are electing a leader
• Only do real work when you agree on a
leader
The Bully Algorithm and
Timeouts
• Call out the biggest kid’s name
– If he doesn’t answer soon enough,
call out the next biggest kid’s name
– Until you hear an answer
– Or the caller is the biggest kid
– Then take over, by telling everyone
else you’re the leader
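The timeout-driven calling-out above can be sketched as a small simulation. A minimal sketch, assuming a static, agreed-upon participant order (bigger ID = bigger kid) and modeling "didn't answer before the timeout" as absence from an `alive` set; all names here are illustrative, not from any real library.

```python
# A minimal, synchronous simulation of a bully election.
# Membership in "alive" stands in for "answered before the timeout".

def bully_election(starter, nodes, alive):
    """Return the ID every live node ends up accepting as leader.

    nodes: the agreed-upon static set of participant IDs.
    alive: the subset currently up (hypothetical stand-in for timeouts).
    """
    # Call out every bigger kid's name; anyone alive will answer.
    bigger = [n for n in nodes if n > starter and n in alive]
    if not bigger:
        # Nobody bigger answered in time: take over, and tell
        # everyone else you're the leader.
        return starter
    # Otherwise the biggest live node bullies its way to the top;
    # the same logic repeats from each bigger node's point of view.
    return max(bigger)

# Node 2 starts an election while the biggest node, 5, is down.
print(bully_election(2, {1, 2, 3, 4, 5}, alive={1, 2, 3, 4}))  # → 4
```

In a real implementation each step is a message with a timeout; the sketch collapses all of that into the `alive` set.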
The Bully Algorithm At Work
• One node is currently the coordinator
• It expects a certain set of nodes to be up and
participating
• The coordinator polls all the other nodes
• If an expected node doesn’t answer, start an
election
– Also if it answers in the negative
• If an unexpected node answers, start an
election
The Practicality of the
Bully Algorithm
• The bully algorithm works reasonably
well if the timeouts are effective
– A timeout occurring really means the
site in question is down
• And there are no partitions at all
– If there are, what happens?
The Invitation Algorithm
• More practical than bully algorithm
– Doesn’t depend on timeouts
• But its results are not as definitive
• An asynchronous algorithm
The Basic Idea Behind the
Invitation Algorithm
• A current coordinator tries to get all
other nodes to agree to his leadership
• If more than one coordinator around,
get together and merge groups
• Use timeouts only to allow progress,
not to make definitive decisions
• No set priorities for who will be
coordinator
The Invitation Algorithm and
Group Numbers
• The invitation algorithm recruits a
group of nodes to work together
– More than one group can exist
simultaneously
• Group numbers identify the group
• Why not identify with coordinator ID?
– Because one node can serially
coordinate many groups
The Basic Operation of the
Invitation Algorithm
• Coordinators in a normal state
periodically check all other nodes
• If any other node is a coordinator, try
to merge the groups
• If timeouts occur, don’t worry about it
– Also don’t worry if a response to a
check comes from this or an earlier
request
Merging in the Invitation
Algorithm
• Merging always requires forming new
group
– May have same coordinator, but
different group number
• Coordinator who initiates merge asks
all other known coordinators to merge
– They ask their group members
– Original group members also asked
A Simplified Example

UP = {1, 2, 3, 4}

[Diagram: node 1, coordinating one group, checks for other
coordinators with an AreYouCoordinator? message. Node 3, coordinating
another group, answers Yes, so node 1 has found another coordinator.
Node 1 sends Invite to node 3 and to his old group members; they
Accept, then signal Ready, and node 1 forms a new group (with a new
group number) containing the nodes from both. Invitations also go out
on behalf of node 1 to nodes 2 and 4.]

If all members of UP respond, we’re fine
The Reorganization State
• Nodes enter the reorganization state
after getting their answer
• What’s the point of this state?
– Why not just start up the group?
– After all, we all know who’s going
to be a member
• Or do we?
Why We Need Another Round of Messages

[Diagram: node 1 sends Invitations to nodes 2, 3, and 4.]

Who does 1 think will join the group, assuming no crashes?
Presumably 2, 3, and 4. And what if someone crashes, or timeouts
occur, at this point? And what about 3 not accepting the invitation?
2 also needs to know that.
Timeouts in the Merge
• Don’t worry too much about them
• Some nodes respond before the
timeout
– Some don’t
• If you don’t catch them this time, you
might the next
Straggler Messages
• This algorithm is asynchronous
– So messages may come in late
• What do we do when messages arrive
late?
• Mostly, reject them
• How do we tell?
– Messages contain group number
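With group numbers in every message, straggler rejection is a one-line check. A minimal sketch, assuming each message is tagged with the group number it was sent under; the class and method names are illustrative, not from any particular implementation.

```python
# Rejecting stragglers by group number: messages from this node's
# current group are accepted, anything else is dropped as stale.

class Member:
    def __init__(self, group_number):
        self.group_number = group_number   # the group I currently belong to

    def deliver(self, msg_group_number, payload):
        if msg_group_number != self.group_number:
            return None          # straggler from an old group: reject
        return payload           # current group: accept

m = Member(group_number=7)
print(m.deliver(6, "invite"))    # → None  (late message, old group)
print(m.deliver(7, "ready"))     # → ready
```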
Multiple Simultaneous Groups
• The invitation algorithm allows
multiple simultaneous groups to exist
– Each with a proper coordinator
• Is this a good thing?
– No, but what are the alternatives?
• No node ever belongs to more than one
group, at least
Paxos
• A family of algorithms that allow a
distributed system to reach agreement
• In the face of delays and failures
• Can’t perfectly guarantee progress
– But makes progress in realistic conditions
• Does guarantee consistency
• Usually defined to reach consensus on some
value v
Paxos Assumptions
• Processors are of variable speed and may
fail
– Might recover after failure
– But they don’t lie
• Any processor can send a message to any
other processor
• Messages can be lost, arbitrarily delayed,
reordered, or duplicated
– But never corrupted
Paxos Processor Roles
• Client
– Issues a request, waits for a response
• Acceptor/voter
– Remembers things for the protocol
• Proposer (simpler if there’s only one)
– Assists client in getting a response
• Learner
– Actually executes a request
• Leader
– One of the proposers that leads the process
• One processor can play several roles
– Usually, all processes are acceptors, proposers, and
learners
Paxos Quorums
• Collections of acceptors that make decisions
– Several different quorums in system
• Messages are sent to quorums, not single
acceptors
– Messages are only effective if all quorum
members receive them
– Similarly, all acceptors in a quorum must send
a message for it to be effective
• If any member of the quorum survives, its
decisions survive
Quorum Membership
• All quorums must contain a majority of
all acceptors in the system
• Any two quorums must share at least
one acceptor
• E.g., if there are four acceptors
{1,2,3,4}, quorums might be:
– {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4}
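The majority rule is what guarantees that any two quorums share an acceptor, and that can be checked mechanically for a small acceptor set:

```python
# Any two majority quorums must intersect: enumerate every majority
# subset of a four-acceptor system and check all pairs.

from itertools import combinations

acceptors = {1, 2, 3, 4}
majority = len(acceptors) // 2 + 1            # 3 of 4

quorums = [set(q) for q in combinations(sorted(acceptors), majority)]
print(len(quorums))                           # → 4

# Every pair of quorums shares at least one acceptor, so no two
# quorums can decide inconsistently without a common witness.
assert all(q1 & q2 for q1 in quorums for q2 in quorums)
```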
Paxos Rounds
• Paxos proceeds in rounds
• In response to a client request
• If the round reaches agreement, the
client gets a response
• If not, you start another round
• Continue till a round reaches
agreement
A Simple Paxos Round

[Diagram: client C, proposer P, acceptors A1–A3, learners L1 and L2.]

1. request: C asks P to get a value chosen
2. prepare(N): P to all acceptors; N is a bigger number than P has
ever used or seen before
3. promise(N, null): each acceptor to P; if an acceptor ever promised
on this item before, it returns the generation and value from that
run of Paxos, not null
4. accept(N, Vres): P to all acceptors; Vres is a result chosen by P,
if no promise had a value
5. accepted(N, Vmax): acceptors to the learners
6. response: learners to C
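The prepare/promise/accept flow of a round can be sketched for the failure-free case with one proposer and three acceptors. A minimal sketch under those assumptions: no crashes, no competing proposers, and illustrative names, not a production Paxos implementation.

```python
# One failure-free Paxos round with a single proposer.

class Acceptor:
    def __init__(self):
        self.promised_n = -1      # highest prepare() number seen
        self.accepted = None      # (n, value) from an earlier round, if any

    def prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted)   # None if never accepted
        return ("nack", None)

    def accept(self, n, value):
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return "accepted"
        return "nack"

def run_round(n, client_value, acceptors):
    # Steps 2/3: prepare, then collect promises from a majority.
    promises = [a.prepare(n) for a in acceptors]
    grants = [p for p in promises if p[0] == "promise"]
    if len(grants) <= len(acceptors) // 2:
        return None                              # no quorum: retry with bigger n
    # Step 4: if any promise carried an earlier value, re-propose the
    # one with the highest generation; otherwise use the client's value.
    prior = [p[1] for p in grants if p[1] is not None]
    value = max(prior)[1] if prior else client_value
    # Step 5: the value is chosen once a majority accepts.
    acks = [a.accept(n, value) for a in acceptors]
    if acks.count("accepted") > len(acceptors) // 2:
        return value                             # step 6: response to client
    return None

acceptors = [Acceptor() for _ in range(3)]
print(run_round(1, "v", acceptors))   # → v
```

Note what happens if a later round proposes a different value against the same acceptors: the promises carry the earlier (generation, value) pair, so the proposer must re-propose "v". That is what makes decisions final.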
The Point of Different Paxos Roles

[Diagram: client C, proposer P, acceptors A1–A3, learners L1 and L2.]

The client wants to get something done. The proposer coordinates
protocol activities. The acceptors ensure proper concurrent behavior
and handle proposer failures. The learners ensure redundant memory of
the result of a decision. One machine can play multiple roles.
Paxos Error Handling
• Some cases simple, some complex
• A simple case:
– One of the acceptors fails
– If there’s still a quorum, no problem
– Go ahead without him
• Another simple case:
– One of the learners failed
– If any learners are left, they’ll provide the
right response to the client
More Complex Error Cases
• Things like failure of proposer in
middle of a round
• Paxos chooses a new leader and uses
him from this point
• What if old leader comes back?
• Even more complex, but it works out
Paxos and Overheads
• Generally quite expensive
– In messages and thus delays
• Many optimizations possible
– Some don’t alter the protocol
characteristics
– Some trade off handling some error
conditions for better performance
Byzantine Agreement
• Life can be a lot worse than merely
being unable to rely on timeouts
• What if one of the nodes we’re
working with is lying?
• How can we reach agreement if we
can’t trust all the participants?
The Purpose of Byzantine
Agreement
• Well, why would one of our distributed
system components lie?
• It probably wouldn’t
• But it might contain a bug
• If it contains the worst possible bug,
what can it do?
– Essentially, inadvertently lie
The Realism of Byzantine
Agreement
• It isn’t realistic
• It doesn’t really happen
• No one really uses it
• But it demonstrates a limit on how
badly things can go while still allowing
agreement
Why Is It Called Byzantine?
• After the fall of Rome itself, the
empire lived on in the east
– Called Byzantium
• Byzantium survived for around 1000
years
• The Byzantines were famous for their
treachery and double-dealing
The Byzantine General Problem
• Several Byzantine generals each command
their own army
• They are far apart and communicate with
messengers
• The emperor wants to attack the Turks
• If all generals attack, they’ll win
– Even if a majority attack, they’ll win
– Retreating is OK, if everyone does it
• But the Turks may have bribed some
generals
The Complete Problem
Statement
• Messages are point-to-point
• Messages are reliably delivered, with a
predictable timeout
– Failure to receive message in time
means sender is a traitor
• Traitors can send any messages they
please
– But cannot forge their identities
How Many Traitors Is Too
Many?
• Can all the loyal generals reach
agreement on whether to attack or
retreat?
• Or can the traitors prevent them from
reaching any agreement?
• How many generals must the Turks
bribe before no agreement is possible?
The Answer
• If the Turks bribe 1/3 of the generals,
the remaining 2/3 cannot reach
agreement
• How can that be?
• Why not just a majority?
• Easiest to consider in the case of a
commander
The 3-General Byzantine Problem

[Diagram: a commander sends orders to two generals.]

What if they’re all loyal? The commander sends Attack to both;
everyone attacks, and the Turk is vanquished.

But what if the commander is a traitor? He sends Attack to one
general and Retreat to the other. One general attacks, one retreats,
the traitor pockets the bribe, and the Turks win.
Can’t the Loyal Generals Check Their Orders?

[Diagram: the traitorous commander (1) sends Attack to general 2 and
Retreat to general 3; 2 and 3 then tell each other what they were
told.]

Generals 2 and 3 check their orders. They figure out 1 is a traitor
and come to their own agreement.
But What if the Commander Wasn’t the Traitor?

[Diagram: commander 1 sends Attack to both generals; this time 3 is
the traitor, and when 2 and 3 check their orders, 3 claims that 1
said Retreat.]

Generals 2 and 3 check their orders, but 1 isn’t the traitor this
time; 3 is. 3 convinces 2 to retreat, 1 is slaughtered attacking, and
3 pockets the bribe.
Can General 2 Tell Which Scenario Is Occurring?

[Diagram: in both scenarios, 2 hears Attack from 1 while 3 claims
that 1 said Retreat. When 1 was the traitor and when 3 was the
traitor, 2 saw exactly the same messages.]

2 can’t tell the difference, so he can’t decide whether to attack or
retreat.
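The indistinguishability argument can be stated concretely. Below, general 2's view is modeled as a pair (the order 1 sent me, the order 3 claims 1 sent him); the representation is an illustration, not part of any standard formulation.

```python
# General 2's view is literally identical in the two scenarios, so no
# deterministic rule can pick the right action in both.

# (order 2 received from 1, order 3 claims to have received from 1)
view_when_1_is_traitor = ("attack", "retreat")  # 1 lied to 3
view_when_3_is_traitor = ("attack", "retreat")  # 3 lies about 1

print(view_when_1_is_traitor == view_when_3_is_traitor)  # → True
# Whatever rule 2 applies to this view, it gives the wrong answer in
# one of the two scenarios.
```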
What If There Were 4 Generals?

[Diagram: commander 1 sends Attack to 2, Attack to 3, and Retreat
to 4.]

What if the commander (1) is the traitor? If he doesn’t send some
messages, he’ll be seen as the traitor. But what can he send?
Can the Three Loyal Generals Reach Agreement?

[Diagram: as before, commander 1 sends Attack to 2, Attack to 3, and
Retreat to 4; the three lieutenants then relay their orders to each
other.]

They can exchange all the messages and let the majority rule. Since
there are only two possible messages, the commander must have sent
the same message to at least two nodes. If the commander is loyal and
someone else is lying, the majority represents the loyal commander’s
will.
But What if There Were Five Generals?

[Diagram: commander 1 sends Attack to 2 and 3, and Retreat to 4
and 5, leaving the loyal lieutenants split two against two.]

Pre-arrange a tie-breaker, e.g., always retreat on ties. All the
loyal generals then retreat, and the traitor must explain his failure
to the Turks.
What If You Don’t Want a
Commander?
• What if you want everyone to vote?
• And accept the majority?
– With the guarantee that all loyal
nodes abide by the majority?
• Serially treat each node as the
commander
– Reach agreement on his vote
– Then move on to the next node
The Trick Behind Byzantine
Agreement
• Everyone must know what everyone
else thinks about everything else
• Not just what I think the commander
said, but what everyone else claims the
commander said
• Resulting algorithms are tricky and
expensive
– But it could be (and will be) worse
Authenticated Byzantine
Agreement
• What if the messages are signed in an
unforgeable way?
• Then dishonest generals can’t lie about
what honest generals told them
• In this case, honest generals reach
agreement regardless of how many are
dishonest
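A sketch of the signed-message idea, using an HMAC with a pre-shared key as a stand-in for an unforgeable signature. This is an illustration under assumed keys and names; a real system would use public-key signatures so that anyone can verify without holding the signing key.

```python
# A relayed order carries the commander's signature; a traitor can
# forward it or drop it, but cannot alter it without the key.

import hashlib
import hmac

KEYS = {"commander": b"secret-key-1"}   # assumed pre-shared

def sign(sender, message):
    return hmac.new(KEYS[sender], message.encode(), hashlib.sha256).hexdigest()

def verify(sender, message, tag):
    return hmac.compare_digest(sign(sender, message), tag)

order = "attack"
tag = sign("commander", order)

print(verify("commander", order, tag))      # → True  (genuine order)
print(verify("commander", "retreat", tag))  # → False (tampered relay)
```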