DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Ali Asghar Pourhaji Kazem, Spring 2015 Fault Tolerance Basic Concepts • Being fault tolerant is strongly related to what are called dependable systems • Dependability implies the following: 1. 2. 3. 4. Availability: Readiness for usage Reliability: Continuity of service delivery Safety: Very low probability of catastrophes Maintainability: How easy can a failed system be repaired Ali Asghar Pourhaji Kazem, Spring 2015 Type of Errors • Transient: Transient faults occur once and then disappear • • Intermittent: An intermittent fault occurs, then vanishes of its own accord, then reappears, and so on • • If the operation is repeated, the fault goes away. A bird flying through the beam of a microwave transmitter may cause lost bits on some network A loose contact on a connector will often cause an intermittent fault Permanent: A permanent fault is one that continues to exist until the faulty component is replaced Ali Asghar Pourhaji Kazem, Spring 2015 Failure Models Figure 8-1. Different types of failures. Ali Asghar Pourhaji Kazem, Spring 2015 Failure Masking by Redundancy • If a system is to be fault tolerant, the best it can do is to try to hide the occurrence of failures from other processes • The key technique for masking faults is to use redundancy • Three kinds are possible: • • • information redundancy Time redundancy physical redundancy Ali Asghar Pourhaji Kazem, Spring 2015 Failure Masking by Redundancy • Information redundancy: With information redundancy, extra bits are added to allow recovery from garbled bits • Time redundancy: With time redundancy, an action is performed, and then if need be, it is performed again • Physical redundancy: With physical redundancy, extra equipment or processes are added to make it possible for the system as a whole to tolerate the loss or malfunctioning of some components Ali Asghar Pourhaji Kazem, Spring 2015 Failure Masking by Redundancy Figure 8-2. Triple modular redundancy. Ali Asghar Pourhaji Kazem, Spring 2015 PROCESS RESILIENCE (1) • The first topic we discuss is protection against process failures, which is achieved by replicating processes into groups • The key approach to tolerating a faulty process is to organize several identical processes into a group • The key property that all groups have is that when a message is sent to the group itself, all members of the group receive it • In this way, if one process in a group fails, hopefully some other process can take over for it Ali Asghar Pourhaji Kazem, Spring 2015 PROCESS RESILIENCE (2) • Basic issue • • Protect yourself against faulty processes by replicating and distributing computations in a group Flat groups • • Good for fault tolerance as information exchange immediately occurs with all group members; however, may impose more overhead as control is completely distributed (hard to implement). Hierarchical groups • All communication through a single coordinator ⇒ not really fault tolerant and scalable, but relatively easy to implement. Ali Asghar Pourhaji Kazem, Spring 2015 Flat Groups versus Hierarchical Groups Figure 8-3. (a) Communication in a flat group. (b) Communication in a simple hierarchical group. Ali Asghar Pourhaji Kazem, Spring 2015 Groups and failure masking (1) • K-fault tolerant group • • When a group can mask any k concurrent member failures (k is called degree of fault tolerance). How large does a k-fault tolerant group need to be? • • • Assume crash/performance failure semantics ⇒ a total of k +1 members are needed to survive k member failures. Assume arbitrary failure semantics, and group output defined by voting ⇒ a total of 2k +1 members are needed to survive k member failures. Assumption • All members are identical, and process all input in the same order ⇒ only then we sure that they do exactly the same thing. Ali Asghar Pourhaji Kazem, Spring 2015 Groups and failure masking (2) • Scenario • • Assuming arbitrary failure semantics, we need 3k +1 group members to survive the attacks of k faulty members. This is also known as Byzantine failures Essence • We are trying to reach a majority vote among the group of loyalists, in the presence of k traitors ⇒ need 2k +1 loyalists. Ali Asghar Pourhaji Kazem, Spring 2015 Agreement in Faulty Systems (1) (a) what they send to each other (b) what each one got from the other (c) what each one got in second step Ali Asghar Pourhaji Kazem, Spring 2015 Agreement in Faulty Systems (2) Figure 8-6. The same as Fig. 8-5, except now with two correct process and one faulty process. Ali Asghar Pourhaji Kazem, Spring 2015 Failure Detection • We detect failures through timeout mechanisms • Setting timeouts properly application dependent • You cannot distinguish process failures from network failures • We need to consider failure notification throughout the system: • • is very difficult and Gossiping (i.e., proactively disseminate a failure detection) On failure detection, pretend you failed as well Ali Asghar Pourhaji Kazem, Spring 2015 Reliable Communication • So far Concentrated on process resilience (by means of process groups). What about reliable communication channels? • Error detection • • • Framing of packets to allow for bit error detection Use of frame numbering to detect packet loss Error correction • • Add so much redundancy that corrupted packets can be automatically corrected Request retransmission of lost, or last N packets Ali Asghar Pourhaji Kazem, Spring 2015 RPC Semantics in the Presence of Failures Five different classes of failures that can occur in RPC systems: 1. The client is unable to locate the server. 2. The request message from the client to the server is lost. 3. The server crashes after receiving a request. 4. The reply message from the server to the client is lost. 5. The client crashes after sending a request. Ali Asghar Pourhaji Kazem, Spring 2015 RPC Semantics in the Presence of Failures • Client cannot locate server: Raise an exception or send a signal to client leading to loss in transparency. • Lost request messages: Start a timer when sending a request. If timer expires before a reply is received, send the request again. Server would need to detect duplicate requests. Ali Asghar Pourhaji Kazem, Spring 2015 Server Crashes (1) • Server crashes: Server crashes before or after executing the request is indistinguishable from the client side... • We need to decide on what we expect from the server At least once semantics • The server guarantees it will carry out an operation at least once, no matter what. At most once semantics • The server guarantees it will carry out an operation at most once Guarantee nothing semantics! Exactly once semantics. Ali Asghar Pourhaji Kazem, Spring 2015 Server Crashes (2) Figure 8-7. A server in client-server communication. (a) The normal case. (b) Crash after execution. (c) Crash before execution. Ali Asghar Pourhaji Kazem, Spring 2015 Server Crashes (3) Three events that can happen at the server: • Send the completion message (M), • Print the text (P), • Crash (C). Ali Asghar Pourhaji Kazem, Spring 2015 Server Crashes (4) These events can occur in six different orderings: 1. M →P →C: A crash occurs after sending the completion message and printing the text. 2. M →C (→P): A crash happens after sending the completion message, but before the text could be printed. 3. P →M →C: A crash occurs after sending the completion message and printing the text. 4. P→C(→M): The text printed, after which a crash occurs before the completion message could be sent. 5. C (→P →M): A crash happens before the server could do anything. 6. C (→M →P): A crash happens before the server could do anything. Ali Asghar Pourhaji Kazem, Spring 2015 Server Crashes (5) Figure 8-8. Different combinations of client and server strategies in the presence of server crashes. Ali Asghar Pourhaji Kazem, Spring 2015 RPC Semantics in the Presence of Failures • Lost Reply Messages • • Detecting lost replies can be hard, because it can also be that the server had crashed. You don’t know whether the server has carried out the operation Solution • Set a timer on client. If it expires without a reply, then send the request again. If requests are idempotent, then they can be repeated again without ill-effects. Ali Asghar Pourhaji Kazem, Spring 2015 RPC Semantics in the Presence of Failures • Client Crashes: • • Creates orphans. An orphan is an active computation on the server for which there is no client waiting. Dealing with orphans. Extermination. • • Client logs each request in a file before sending it. After a reboot the file is checked and the orphan is explicitly killed off. Expensive, cannot locate grand-orphans etc. Reincarnation. • • Divide time into sequentially numbered epochs. When a client reboots, it broadcasts a message declaring a new epoch. This allows servers to terminate orphan computations. Gentle reincarnation. • • A server tries to locate the owner of orphans before killing the computation. Expiration. • Each RPC is given a quantum of time to finish its job. If it cannot finish, then it asks for another quantum. After a crash, a client need only wait for a quantum to make sure all orphans are gone. Ali Asghar Pourhaji Kazem, Spring 2015 Reliable Multicasting (1) • • Basic model We have a multicast channel c with two (possibly overlapping) groups: • • The sender group SND(c) of processes that submit messages to channel c The receiver group RCV(c) of processes that can receive messages from channel c • Simple reliability: If process P ∈ RCV(c) at the time message m was submitted to c, and P does not leave RCV(c), m should be delivered to P • Atomic multicast: How can we ensure that a message m submitted to channel c is delivered to process P ∈ RCV(c) only if m is delivered to all members of RCV(c) Ali Asghar Pourhaji Kazem, Spring 2015 Reliable Multicasting (2) Observation •If we can stick to a local-area network, reliable multicasting is “easy” Principle Let the sender log messages submitted to channel c: •If P sends message m, m is stored in a history buffer •Each receiver acknowledges the receipt of m, or requests retransmission at P when noticing message lost •Sender P removes m from history buffer when everyone has acknowledged receipt Question •Why doesn’t this scale? Ali Asghar Pourhaji Kazem, Spring 2015 Basic Reliable-Multicasting Schemes Figure 8-9. A simple solution to reliable multicasting when all receivers are known and are assumed not to fail. (a) Message transmission. (b) Reporting feedback. Ali Asghar Pourhaji Kazem, Spring 2015 Scalability in Reliable Multicasting Feedback suppression Basic idea •Let a process P suppress its own feedback when it notices another process Q is already asking for a retransmission Assumptions •All receivers listen to a common feedback channel to which feedback messages are submitted •Process P schedules its own feedback message randomly, and suppresses it when observing another feedback message Ali Asghar Pourhaji Kazem, Spring 2015 Nonhierarchical Feedback Control Figure 8-10. Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others. Ali Asghar Pourhaji Kazem, Spring 2015 Hierarchical Feedback Control (1) Basic solution •Construct a hierarchical feedback channel in which all submitted messages are sent only to the root. Intermediate nodes aggregate feedback messages before passing them on. Observation •Intermediate nodes can easily be used for retransmission purposes Ali Asghar Pourhaji Kazem, Spring 2015 Hierarchical Feedback Control (2) Figure 8-11. The essence of hierarchical reliable multicasting. Each local coordinator forwards the message to its children and later handles retransmission requests. Ali Asghar Pourhaji Kazem, Spring 2015 Virtual Synchrony (1) Essence We consider views V ⊆ RCV(c)∪SND(c) Principle Processes are added or deleted from a view V through view changes to V∗; a view change is to be executed locally by each P ∈ V ∩V∗ (1) For each consistent state, there is a unique view on which all its members agree. Note: implies that all nonfaulty processes see all view changes in the same order Ali Asghar Pourhaji Kazem, Spring 2015 Virtual Synchrony (2) (2) If message m is sent to V before a view change vc to V∗, then either all P ∈ V that excute vc receive m, or no processes P ∈ V that execute vc receive m. Note: all nonfaulty members in the same view get to see the same set of multicast messages. (3) A message sent to view V can be delivered only to processes in V, and is discarded by successive views Definition A reliable multicast algorithm satisfying (1)–(3) is virtually synchronous Ali Asghar Pourhaji Kazem, Spring 2015 Virtual Synchrony (1) Figure 8-12. The logical organization of a distributed system to distinguish between message receipt and message delivery. Ali Asghar Pourhaji Kazem, Spring 2015 Virtual Synchrony (2) Figure 8-13. The principle of virtual synchronous multicast. Ali Asghar Pourhaji Kazem, Spring 2015
© Copyright 2024