Chapter 8 Fault Tolerance

DISTRIBUTED SYSTEMS
Principles and Paradigms
Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN
Ali Asghar Pourhaji Kazem, Spring 2015
Fault Tolerance Basic Concepts
• Being fault tolerant is strongly related to what are called dependable systems
• Dependability implies the following:
1. Availability: Readiness for usage
2. Reliability: Continuity of service delivery
3. Safety: Very low probability of catastrophes
4. Maintainability: How easily a failed system can be repaired
Type of Errors
• Transient: Transient faults occur once and then disappear
  • If the operation is repeated, the fault goes away. A bird flying through the beam of a microwave transmitter may cause lost bits on some network
• Intermittent: An intermittent fault occurs, then vanishes of its own accord, then reappears, and so on
  • A loose contact on a connector will often cause an intermittent fault
• Permanent: A permanent fault is one that continues to exist until the faulty component is replaced
Failure Models
Figure 8-1. Different types of failures.
Failure Masking by Redundancy
• If a system is to be fault tolerant, the best it can do is to try to hide the occurrence of failures from other processes
• The key technique for masking faults is to use redundancy
• Three kinds are possible:
  • Information redundancy
  • Time redundancy
  • Physical redundancy
Failure Masking by Redundancy
• Information redundancy: With information redundancy, extra bits are added to allow recovery from garbled bits
• Time redundancy: With time redundancy, an action is performed, and then if need be, it is performed again
• Physical redundancy: With physical redundancy, extra equipment or processes are added to make it possible for the system as a whole to tolerate the loss or malfunctioning of some components
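The first two kinds of redundancy can be illustrated with a minimal Python sketch. It assumes a simple repetition code (each bit sent three times) as the form of information redundancy and a bounded retry loop as the form of time redundancy; the names `encode`, `decode`, and `retry` are made up for this illustration, and real systems typically use more efficient codes such as Hamming codes.

```python
from collections import Counter


def encode(bits, copies=3):
    """Information redundancy: repeat every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]


def decode(coded, copies=3):
    """Recover each original bit by majority vote over its copies."""
    decoded = []
    for i in range(0, len(coded), copies):
        chunk = coded[i:i + copies]
        decoded.append(Counter(chunk).most_common(1)[0][0])
    return decoded


def retry(action, attempts=3):
    """Time redundancy: perform the action again if it fails."""
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as exc:  # a transient fault may vanish on repetition
            last_error = exc
    raise last_error


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    sent = encode(word)
    sent[4] ^= 1                 # one bit is garbled in transit
    print(decode(sent) == word)  # the garbled bit is outvoted
```

With three copies per bit, one garbled copy per group is masked; two garbled copies in the same group would not be.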
Failure Masking by Redundancy
Figure 8-2. Triple modular redundancy.
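The voting step of triple modular redundancy can be sketched as follows. This is a simplified illustration, not the circuit in the figure: it assumes one voter per stage (a full design would typically replicate the voters as well), and `vote` and `tmr_stage` are hypothetical names.

```python
from collections import Counter


def vote(a, b, c):
    """Return the value that at least two of the three inputs agree on."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no two inputs agree")
    return value


def tmr_stage(replicas, x):
    """Run three replicas of one stage on input x and vote on the outputs."""
    a, b, c = (replica(x) for replica in replicas)
    return vote(a, b, c)


if __name__ == "__main__":
    good = lambda x: x + 1
    faulty = lambda x: -1        # a malfunctioning replica
    print(tmr_stage([good, faulty, good], 1))
```

A single faulty replica is outvoted by the two correct ones; if all three outputs differ, the voter cannot mask the failure and reports it instead.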
PROCESS RESILIENCE (1)
• The first topic we discuss is protection against process failures, which is achieved by replicating processes into groups
• The key approach to tolerating a faulty process is to organize several identical processes into a group
• The key property that all groups have is that when a message is sent to the group itself, all members of the group receive it
• In this way, if one process in a group fails, hopefully some other process can take over for it
PROCESS RESILIENCE (2)
• Basic issue
  • Protect yourself against faulty processes by replicating and distributing computations in a group
• Flat groups
  • Good for fault tolerance as information exchange immediately occurs with all group members; however, may impose more overhead as control is completely distributed (hard to implement).
• Hierarchical groups
  • All communication goes through a single coordinator ⇒ neither really fault tolerant nor scalable, but relatively easy to implement.
Flat Groups versus Hierarchical Groups
Figure 8-3. (a) Communication in a flat group.
(b) Communication in a simple hierarchical group.
Groups and failure masking (1)
• k-fault tolerant group
  • A group that can mask any k concurrent member failures (k is called the degree of fault tolerance).
• How large does a k-fault tolerant group need to be?
  • Assume crash/performance failure semantics ⇒ a total of k + 1 members are needed to survive k member failures.
  • Assume arbitrary failure semantics, and group output defined by voting ⇒ a total of 2k + 1 members are needed to survive k member failures.
• Assumption
  • All members are identical, and process all input in the same order ⇒ only then are we sure that they do exactly the same thing.
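The two group sizes can be checked with a small sketch. It assumes that a crashed member simply returns no reply (modeled as `None`) and that a member with an arbitrary failure returns a wrong value; `min_group_size` and `group_output` are hypothetical helper names.

```python
from collections import Counter


def min_group_size(k, semantics):
    """Members needed to mask k failures under the given failure semantics."""
    if semantics == "crash":
        return k + 1          # one survivor is enough
    if semantics == "arbitrary":
        return 2 * k + 1      # correct members must outvote k faulty ones
    raise ValueError(f"unknown semantics: {semantics}")


def group_output(replies, semantics):
    """Combine member replies; None stands for a crashed member."""
    alive = [r for r in replies if r is not None]
    if not alive:
        raise RuntimeError("all members failed")
    if semantics == "crash":
        return alive[0]       # any surviving member's reply is correct
    value, _votes = Counter(alive).most_common(1)[0]
    return value              # majority vote masks up to k wrong replies


if __name__ == "__main__":
    k = 2
    print(min_group_size(k, "crash"), min_group_size(k, "arbitrary"))
    print(group_output([None, None, 42], "crash"))
    print(group_output([42, 7, 42, 13, 42], "arbitrary"))
```

In the second call, two of the five members reply with wrong values, and the three correct replies still form a majority.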
Groups and failure masking (2)
• Scenario
  • Assuming arbitrary failure semantics, we need 3k + 1 group members to survive the attacks of k faulty members when the members must reach agreement among themselves. Such arbitrary failures are also known as Byzantine failures
• Essence
  • We are trying to reach a majority vote among the group of loyalists, in the presence of k traitors ⇒ need 2k + 1 loyalists.
Agreement in Faulty Systems (1)
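Figure 8-5. The Byzantine agreement problem for three nonfaulty processes and one faulty process.
(a) What they send to each other.
(b) What each one got from the other.
(c) What each one got in the second step.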
Agreement in Faulty Systems (2)
Figure 8-6. The same as Fig. 8-5, except now with two correct
processes and one faulty process.
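The exchange behind these two figures can be simulated with a short sketch. It makes several assumptions for illustration: each process's private value is its own number, a faulty process tells every receiver something different (modeled with distinct strings), and every process applies a strict majority vote per vector element. The function name `run_agreement` is hypothetical.

```python
from collections import Counter

UNKNOWN = "UNKNOWN"


def run_agreement(n, faulty):
    """Return the result vector computed by each nonfaulty process."""
    procs = list(range(1, n + 1))

    # Round 1: everyone sends its value; each process assembles a vector.
    vectors = {}
    for p in procs:
        vec = {}
        for q in procs:
            if q != p and q in faulty:
                vec[q] = f"lie-{q}->{p}"   # faulty sender, different lie per receiver
            else:
                vec[q] = q                 # honest value (or own value)
        vectors[p] = vec

    # Round 2: everyone forwards its vector; faulty ones forward garbage.
    results = {}
    for p in procs:
        if p in faulty:
            continue
        received = []
        for q in procs:
            if q == p:
                continue
            if q in faulty:
                received.append({i: f"junk-{q}->{p}-{i}" for i in procs})
            else:
                received.append(vectors[q])
        # Per element, keep a value only if a strict majority reported it.
        result = []
        for i in procs:
            value, votes = Counter(v[i] for v in received).most_common(1)[0]
            result.append(value if votes > len(received) / 2 else UNKNOWN)
        results[p] = result
    return results


if __name__ == "__main__":
    print(run_agreement(4, faulty={3}))   # three correct, one faulty
    print(run_agreement(3, faulty={3}))   # two correct, one faulty
```

With four processes and one faulty, the correct processes agree on every correct value and mark only the faulty process's entry as unknown. With three processes and one faulty, no element reaches a majority, so no agreement is reached, which matches the 3k + 1 requirement for k = 1.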
Failure Detection
• We detect failures through timeout mechanisms
• Setting timeouts properly is very difficult and application dependent
• You cannot distinguish process failures from network failures
• We need to consider failure notification throughout the system:
  • Gossiping (i.e., proactively disseminate a failure detection)
  • On failure detection, pretend you failed as well
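A timeout-based detector can be sketched as below. It assumes that monitored processes periodically send heartbeat messages, which is one common way to apply timeouts; the class name `HeartbeatDetector` and the injectable `clock` parameter are illustrative choices. The detector can only suspect a process: a silent process and an unreachable one look the same to it.

```python
import time


class HeartbeatDetector:
    """Suspect a node when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the sketch is testable
        self.last_seen = {}         # node -> time of latest heartbeat

    def heartbeat(self, node):
        """Record that a heartbeat from `node` has just arrived."""
        self.last_seen[node] = self.clock()

    def suspected(self):
        """Nodes whose latest heartbeat is older than the timeout."""
        now = self.clock()
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


if __name__ == "__main__":
    now = [0.0]
    detector = HeartbeatDetector(timeout=2.0, clock=lambda: now[0])
    detector.heartbeat("a")
    detector.heartbeat("b")
    now[0] = 1.5
    detector.heartbeat("a")         # only "a" keeps reporting
    now[0] = 3.0
    print(detector.suspected())
```

The choice of `timeout` is the application-dependent part: too short and slow nodes are falsely suspected, too long and real failures go unnoticed for a while.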
Reliable Communication
• So far we have concentrated on process resilience (by means of process groups). What about reliable communication channels?
• Error detection
  • Framing of packets to allow for bit error detection
  • Use of frame numbering to detect packet loss
• Error correction
  • Add so much redundancy that corrupted packets can be automatically corrected
  • Request retransmission of lost packets, or of the last N packets
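The two error-detection techniques can be sketched with Python's standard `struct` and `zlib` modules. The frame layout used here (a 32-bit frame number, the payload, then a CRC-32 checksum) is an assumption made for this example, not a specific protocol's format.

```python
import struct
import zlib

HEADER = struct.Struct("!I")    # 32-bit frame number, network byte order
TRAILER = struct.Struct("!I")   # CRC-32 over header and payload


def make_frame(seq, payload):
    """Wrap a payload in a numbered frame with a checksum."""
    body = HEADER.pack(seq) + payload
    return body + TRAILER.pack(zlib.crc32(body))


def parse_frame(frame):
    """Return (seq, payload), or None if the frame is damaged."""
    if len(frame) < HEADER.size + TRAILER.size:
        return None
    body, trailer = frame[:-TRAILER.size], frame[-TRAILER.size:]
    (crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        return None                      # bit error detected
    (seq,) = HEADER.unpack(body[:HEADER.size])
    return seq, body[HEADER.size:]


def missing_frames(received_seqs):
    """Frame numbers skipped between the lowest and highest one seen."""
    seen = set(received_seqs)
    if not seen:
        return []
    return sorted(set(range(min(seen), max(seen) + 1)) - seen)


if __name__ == "__main__":
    frame = make_frame(7, b"hello")
    damaged = bytearray(frame)
    damaged[5] ^= 0x01                   # flip one payload bit
    print(parse_frame(frame), parse_frame(bytes(damaged)))
    print(missing_frames([1, 2, 4, 7]))
```

A receiver that finds gaps in the frame numbers can then request retransmission of exactly those frames.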
RPC Semantics in the
Presence of Failures
Five different classes of failures that can occur in
RPC systems:
1. The client is unable to locate the server.
2. The request message from the client to the
server is lost.
3. The server crashes after receiving a request.
4. The reply message from the server to the client
is lost.
5. The client crashes after sending a request.
RPC Semantics in the
Presence of Failures
• Client cannot locate server: Raise an exception or send a signal to the client, leading to a loss in transparency.
• Lost request messages: Start a timer when sending a request. If the timer expires before a reply is received, send the request again. The server would need to detect duplicate requests.
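The timer-and-retransmit scheme with duplicate detection might look like the sketch below. It assumes that every request carries a client identifier and a sequence number, and that the server caches replies under that key; `DedupServer`, `LossyChannel`, and `rpc` are hypothetical names, and the doubling operation is only a stand-in for real work.

```python
class DedupServer:
    """Executes each (client_id, seq) request at most once."""

    def __init__(self):
        self.replies = {}       # (client_id, seq) -> cached reply
        self.executed = 0

    def handle(self, client_id, seq, x):
        key = (client_id, seq)
        if key not in self.replies:        # first time this request is seen
            self.executed += 1
            self.replies[key] = x * 2      # stand-in for the real operation
        return self.replies[key]           # duplicates get the cached reply


class LossyChannel:
    """Loses the first `drops` requests, then delivers normally."""

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def call(self, client_id, seq, x):
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("timer expired before a reply arrived")
        return self.server.handle(client_id, seq, x)


def rpc(channel, client_id, seq, x, max_tries=5):
    """Send the request, resending the same request on each timeout."""
    for _ in range(max_tries):
        try:
            return channel.call(client_id, seq, x)
        except TimeoutError:
            continue
    raise TimeoutError("no reply after retransmissions")


if __name__ == "__main__":
    server = DedupServer()
    print(rpc(LossyChannel(server, drops=2), "c1", 1, 21))
    server.handle("c1", 1, 21)             # a late duplicate arrives
    print(server.executed)
```

Because the retransmission reuses the same sequence number, the server can tell a duplicate from a new request.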
Server Crashes (1)
• Server crashes: From the client side, a server crash before executing the request is indistinguishable from a crash after executing it
• We need to decide on what we expect from the server
  • At-least-once semantics
    • The server guarantees it will carry out an operation at least once, no matter what.
  • At-most-once semantics
    • The server guarantees it will carry out an operation at most once
  • Guarantee-nothing semantics
  • Exactly-once semantics
Server Crashes (2)
Figure 8-7. A server in client-server
communication.
(a) The normal case.
(b) Crash after execution.
(c) Crash before execution.
Server Crashes (3)
Three events that can happen at the server:
• Send the completion message (M),
• Print the text (P),
• Crash (C).
Server Crashes (4)
These events can occur in six different orderings:
1. M → P → C: A crash occurs after sending the completion message and printing the text.
2. M → C (→ P): A crash happens after sending the completion message, but before the text could be printed.
3. P → M → C: A crash occurs after printing the text and sending the completion message.
4. P → C (→ M): The text is printed, after which a crash occurs before the completion message could be sent.
5. C (→ P → M): A crash happens before the server could do anything.
6. C (→ M → P): A crash happens before the server could do anything.
Server Crashes (5)
Figure 8-8. Different combinations of client and server
strategies in the presence of server crashes.
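The combinations can be enumerated with a small sketch. It models the print-server scenario under two assumptions: the client sees an acknowledgement exactly when M was sent before the crash, and a reissued request is carried out exactly once after the server recovers. The four client strategies (always reissue, never reissue, reissue only when acknowledged, reissue only when not acknowledged) and all names in the code are choices made for this illustration.

```python
# Events that completed before the crash, per server strategy.
# "M" = completion message sent, "P" = text printed.
SCENARIOS = {
    "M -> P": ["MP", "M", ""],   # MPC, MC(P), C(MP)
    "P -> M": ["PM", "P", ""],   # PMC, PC(M), C(PM)
}

# Should the client reissue the request, given whether it saw message M?
CLIENT_STRATEGIES = {
    "always": lambda acked: True,
    "never": lambda acked: False,
    "only when ACKed": lambda acked: acked,
    "only when not ACKed": lambda acked: not acked,
}


def outcome(done, reissue):
    """Classify how often the text ends up printed: ZERO, OK or DUP."""
    acked = "M" in done
    prints = int("P" in done) + int(reissue(acked))
    return ("ZERO", "OK", "DUP")[prints]


def table():
    """Outcome per (client strategy, server strategy) for each crash scenario."""
    return {
        (client, server): [outcome(done, reissue) for done in scenarios]
        for client, reissue in CLIENT_STRATEGIES.items()
        for server, scenarios in SCENARIOS.items()
    }


if __name__ == "__main__":
    for (client, server), row in table().items():
        print(f"{client:20} {server}: {row}")
```

Under these assumptions, no pairing of client and server strategy yields OK in all three crash scenarios, which is why exactly-once behavior cannot be guaranteed in general.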
RPC Semantics in the
Presence of Failures
• Lost reply messages
  • Detecting lost replies can be hard, because it can also be that the server has crashed. You don't know whether the server has carried out the operation
• Solution
  • Set a timer on the client. If it expires without a reply, then send the request again. If requests are idempotent, then they can be repeated without ill effects.
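The difference between idempotent and non-idempotent requests shows up as soon as a request is executed twice. The sketch below assumes a toy bank account and models a lost reply by letting the server execute the same request once per received copy; `Account` and `send_with_retransmit` are hypothetical names.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def set_balance(self, amount):
        """Idempotent: executing it twice leaves the same state."""
        self.balance = amount

    def deposit(self, amount):
        """Not idempotent: every execution changes the state again."""
        self.balance += amount


def send_with_retransmit(operation, arg, copies=2):
    """Model a lost reply: the server executes each received copy."""
    for _ in range(copies):
        operation(arg)


if __name__ == "__main__":
    a = Account(100)
    send_with_retransmit(a.set_balance, 150)
    b = Account(100)
    send_with_retransmit(b.deposit, 50)
    print(a.balance, b.balance)   # the repeated deposit is applied twice
```

For requests like `deposit`, the server needs some other safeguard, for example sequence numbers that let it recognize a retransmission.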
RPC Semantics in the
Presence of Failures
• Client crashes:
  • Creates orphans. An orphan is an active computation on the server for which there is no client waiting.
• Dealing with orphans:
  • Extermination.
    • The client logs each request in a file before sending it. After a reboot the file is checked and the orphan is explicitly killed off. Expensive, cannot locate grand-orphans etc.
  • Reincarnation.
    • Divide time into sequentially numbered epochs. When a client reboots, it broadcasts a message declaring a new epoch. This allows servers to terminate orphan computations.
  • Gentle reincarnation.
    • A server tries to locate the owner of orphans before killing the computation, and kills it only if the owner cannot be found.
  • Expiration.
    • Each RPC is given a quantum of time to finish its job. If it cannot finish, then it asks for another quantum. After a crash, a client need only wait for a quantum to make sure all orphans are gone.
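Reincarnation can be sketched from the server's point of view. The sketch assumes that every request is tagged with the client's current epoch number and that epochs are tracked per client; both are modeling choices for this example, and `OrphanTrackingServer` is a hypothetical name.

```python
class OrphanTrackingServer:
    """Tracks running computations and kills orphans on a new epoch."""

    def __init__(self):
        self.epoch = {}       # client -> latest epoch announced or seen
        self.running = []     # (client, epoch, name) of active computations

    def start(self, client, epoch, name):
        """Accept a request unless it comes from an earlier incarnation."""
        if epoch < self.epoch.get(client, 0):
            return False
        self.epoch[client] = epoch
        self.running.append((client, epoch, name))
        return True

    def new_epoch(self, client, epoch):
        """Handle a reboot broadcast: kill the client's older computations."""
        self.epoch[client] = max(epoch, self.epoch.get(client, 0))
        killed = [c for c in self.running if c[0] == client and c[1] < epoch]
        self.running = [c for c in self.running if c not in killed]
        return killed


if __name__ == "__main__":
    server = OrphanTrackingServer()
    server.start("c", 1, "job-a")
    server.start("d", 1, "job-b")
    print(server.new_epoch("c", 2))          # client c rebooted
    print(server.start("c", 1, "late"))      # stale request is refused
```

Tagging requests with the epoch also lets the server refuse late messages that were sent before the crash but arrive after the reboot.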
Reliable Multicasting (1)
• Basic model
  • We have a multicast channel c with two (possibly overlapping) groups:
    • The sender group SND(c) of processes that submit messages to channel c
    • The receiver group RCV(c) of processes that can receive messages from channel c
• Simple reliability: If process P ∈ RCV(c) at the time message m was submitted to c, and P does not leave RCV(c), m should be delivered to P
• Atomic multicast: How can we ensure that a message m submitted to channel c is delivered to process P ∈ RCV(c) only if m is delivered to all members of RCV(c)?
Reliable Multicasting (2)
Observation
• If we can stick to a local-area network, reliable multicasting is “easy”
Principle
Let the sender log messages submitted to channel c:
• If P sends message m, m is stored in a history buffer
• Each receiver acknowledges the receipt of m, or requests retransmission from P when noticing that a message was lost
• Sender P removes m from the history buffer when everyone has acknowledged receipt
Question
• Why doesn’t this scale?
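The sender side of this principle can be sketched as follows. The sketch assumes a fixed, known set of receivers and per-message sequence numbers; `HistorySender` and its method names are hypothetical.

```python
class HistorySender:
    """Keeps each message until every receiver has acknowledged it."""

    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.history = {}     # seq -> message, kept for retransmission
        self.pending = {}     # seq -> receivers that have not acknowledged
        self.next_seq = 0
        self.acks_handled = 0

    def multicast(self, message):
        """Log the message in the history buffer and return its number."""
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = message
        self.pending[seq] = set(self.receivers)
        return seq

    def ack(self, receiver, seq):
        """Process one acknowledgement; drop the message once all arrived."""
        self.acks_handled += 1
        waiting = self.pending.get(seq)
        if waiting is None:
            return
        waiting.discard(receiver)
        if not waiting:
            del self.pending[seq]
            del self.history[seq]

    def retransmit(self, seq):
        """Return the logged message for a receiver that missed it."""
        return self.history[seq]


if __name__ == "__main__":
    sender = HistorySender(["a", "b", "c"])
    n = sender.multicast("m")
    sender.ack("a", n)
    print(sender.retransmit(n))       # still buffered: b and c are pending
    sender.ack("b", n)
    sender.ack("c", n)
    print(n in sender.history, sender.acks_handled)
```

In this sketch the sender handles one acknowledgement per receiver for every message, so its load grows with the size of the receiver group.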
Basic Reliable-Multicasting Schemes
Figure 8-9. A simple solution to reliable multicasting when all
receivers are known and are assumed not to fail.
(a) Message transmission. (b) Reporting feedback.
Scalability in Reliable Multicasting: Feedback suppression
Basic idea
• Let a process P suppress its own feedback when it notices another process Q is already asking for a retransmission
Assumptions
• All receivers listen to a common feedback channel to which feedback messages are submitted
• Process P schedules its own feedback message randomly, and suppresses it when observing another feedback message
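The effect of random scheduling can be explored with a small simulation. It assumes that every receiver missed the same message, draws its feedback delay uniformly from [0, `max_delay`], and learns of the first feedback message `latency` time units after it was sent; these parameters and the function name `nacks_sent` are assumptions of the sketch.

```python
import random


def nacks_sent(n_receivers, max_delay, latency, seed=0):
    """Return the receivers that still send a retransmission request."""
    rng = random.Random(seed)
    timers = {r: rng.uniform(0, max_delay) for r in range(n_receivers)}
    first = min(timers.values())
    # A receiver is suppressed only if it hears the first request in time,
    # i.e. if its own timer fires at least `latency` after the first one.
    return sorted(
        r for r, t in timers.items()
        if t == first or t - first < latency
    )


if __name__ == "__main__":
    for latency in (0.0, 0.05, 0.5):
        sent = nacks_sent(50, max_delay=1.0, latency=latency)
        print(f"latency={latency}: {len(sent)} of 50 requests sent")
```

The longer a request takes to reach the other receivers, the more of them fire their own timers before they can be suppressed, so suppression works best when the random delays are spread out relative to the network latency.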
Nonhierarchical Feedback Control
Figure 8-10. Several receivers have scheduled a request for
retransmission, but the first retransmission request
leads to the suppression of others.
Hierarchical Feedback Control (1)
Basic solution
• Construct a hierarchical feedback channel in which all submitted messages are sent only to the root. Intermediate nodes aggregate feedback messages before passing them on.
Observation
• Intermediate nodes can easily be used for retransmission purposes
Hierarchical Feedback Control (2)
Figure 8-11. The essence of hierarchical reliable multicasting.
Each local coordinator forwards the message to its children and
later handles retransmission requests.
Virtual Synchrony (1)
Essence
We consider views V ⊆ RCV(c) ∪ SND(c)
Principle
Processes are added to or deleted from a view V through view
changes to V∗; a view change is to be executed locally by
each P ∈ V ∩ V∗
(1) For each consistent state, there is a unique view on which
all its members agree. Note: this implies that all nonfaulty
processes see all view changes in the same order
Virtual Synchrony (2)
(2) If message m is sent to V before a view change vc to V∗,
then either all P ∈ V that execute vc receive m, or no
processes P ∈ V that execute vc receive m.
Note: all nonfaulty members in the same view get to see the
same set of multicast messages.
(3) A message sent to view V can be delivered only to
processes in V, and is discarded by successive views
Definition
A reliable multicast algorithm satisfying (1)–(3) is virtually
synchronous
Virtual Synchrony (3)
Figure 8-12. The logical organization of a distributed system to
distinguish between message receipt and message delivery.
Virtual Synchrony (4)
Figure 8-13. The principle of virtual synchronous multicast.