
E-book: Building Dependable Distributed Systems [Wiley Online]

  • Wiley Online
  • Price: 197,72 €*
  • * price that provides access for an unlimited number of concurrent users for an unlimited time

This book covers the most essential techniques for designing and building dependable distributed systems. Instead of covering a broad range of research works for each dependability strategy, the book focuses on a select few (usually the most seminal works, the most practical approaches, or the first publication of each approach), which are explained in depth, usually with a comprehensive set of examples. The goal is to dissect each technique thoroughly so that readers who are not familiar with dependable distributed computing can actually grasp the technique after studying the book.

The book contains eight chapters. The first chapter introduces the basic concepts and terminology of dependable distributed computing, and also provides an overview of the primary means for achieving dependability. The second chapter describes in detail the checkpointing and logging mechanisms, which are the most commonly used means to achieve a limited degree of fault tolerance. Such mechanisms also serve as the foundation for more sophisticated dependability solutions. Chapter three covers work on recovery-oriented computing, which focuses on practical techniques that reduce fault detection and recovery times for Internet-based applications. Chapter four outlines the replication techniques for data and service fault tolerance. This chapter also pays particular attention to optimistic replication and the CAP theorem. Chapter five explains a few seminal works on group communication systems. Chapter six introduces the distributed consensus problem and covers a number of Paxos family algorithms in depth. Chapter seven introduces the Byzantine generals problem and its latest solutions, including the seminal Practical Byzantine Fault Tolerance (PBFT) algorithm and a number of its derivatives. The final chapter covers the latest research results on application-aware Byzantine fault tolerance, which is an important step toward the practical use of Byzantine fault tolerance techniques.

List of Figures
List of Tables
Acknowledgments
Preface
References
1 Introduction to Dependable Distributed Computing
1.1 Basic Concepts and Terminologies
1.1.1 System Models
1.1.2 Threat Models
1.1.3 Dependability Attributes and Evaluation Metrics
1.2 Means to Achieve Dependability
1.2.1 Fault Avoidance
1.2.2 Fault Detection and Diagnosis
1.2.3 Fault Removal
1.2.4 Fault Tolerance
References
2 Logging and Checkpointing
2.1 System Model
2.1.1 Fault Model
2.1.2 Process State and Global State
2.1.3 Piecewise Deterministic Assumption
2.1.4 Output Commit
2.1.5 Stable Storage
2.2 Checkpoint-Based Protocols
2.2.1 Uncoordinated Checkpointing
2.2.2 Tamir and Sequin Global Checkpointing Protocol
2.2.3 Chandy and Lamport Distributed Snapshot Protocol
2.2.4 Discussion
2.3 Log Based Protocols
2.3.1 Pessimistic Logging
2.3.2 Sender-Based Message Logging
References
3 Recovery-Oriented Computing
3.1 System Model
3.2 Fault Detection and Localization
3.2.1 Component Interactions Modeling and Anomaly Detection
3.2.2 Path Shapes Modeling and Root Cause Analysis
3.2.3 Inference-Based Fault Diagnosis
3.3 Microreboot
3.3.1 Microrebootable System Design Guideline
3.3.2 Automatic Recovery with Microreboot
3.3.3 Implications of the Microrebooting Technique
3.4 Overcoming Operator Errors
3.4.1 The Operator Undo Model
3.4.2 The Operator Undo Framework
References
4 Data and Service Replication
4.1 Service Replication
4.1.1 Replication Styles
4.1.2 Implementation of Service Replication
4.2 Data Replication
4.3 Optimistic Replication
4.3.1 System Models
4.3.2 Establish Ordering among Operations
4.3.3 State Transfer Systems
4.3.4 Operation Transfer System
4.3.5 Update Commitment
4.4 CAP Theorem
4.4.1 2 out 3
4.4.2 Implications of Enabling Partition Tolerance
References
5 Group Communication Systems
5.1 System Model
5.2 Sequencer Based Group Communication System
5.2.1 Normal Operation
5.2.2 Membership Change
5.2.3 Proof of Correctness
5.3 Sender Based Group Communication System
5.3.1 Total Ordering Protocol
5.3.2 Membership Change Protocol
5.3.3 Recovery Protocol
5.3.4 The Flow Control Mechanism
5.4 Vector Clock Based Group Communication System
References
6 Consensus and the Paxos Algorithms
6.1 The Consensus Problem
6.2 The Paxos Algorithm
6.2.1 Algorithm for Choosing a Value
6.2.2 Algorithm for Learning a Value
6.2.3 Proof of Correctness
6.2.4 Reasoning of the Paxos Algorithm
6.3 Multi-Paxos
6.3.1 Checkpointing and Garbage Collection
6.3.2 Leader Election and View Change
6.4 Dynamic Paxos
6.4.1 Dynamic Paxos
6.4.2 Cheap Paxos
6.5 Fast Paxos
6.5.1 The Basic Steps
6.5.2 Collision Recovery, Quorum Requirement, and Value Selection Rule
6.6 Implementations of the Paxos Family Algorithms
6.6.1 Hard Drive Failures
6.6.2 Multiple Coordinators
6.6.3 Membership Changes
6.6.4 Limited Disk Space for Logging
References
7 Byzantine Fault Tolerance
7.1 The Byzantine Generals Problem
7.1.1 System Model
7.1.2 The Oral Message Algorithms
7.1.3 Proof of Correctness for the Oral Message Algorithms
7.2 Practical Byzantine Fault Tolerance
7.2.1 System Model
7.2.2 Overview of the PBFT Algorithm
7.2.3 Normal Operation of PBFT
7.2.4 Garbage Collection
7.2.5 View Change
7.2.6 Proof of Correctness
7.2.7 Optimizations
7.3 Fast Byzantine Agreement
7.4 Speculative Byzantine Fault Tolerance
7.4.1 The Agreement Protocol
7.4.2 The View Change Protocol
7.4.3 The Checkpointing Protocol
7.4.4 Proof of Correctness
References
8 Application-Aware Byzantine Fault Tolerance
8.1 High Throughput BFT Systems: Networked File Systems
8.2 Exploiting Deep Application Semantics: Web Services Coordination
8.2.1 The Web Services Atomic Transactions Standard
8.2.2 The Web Services Business Activity Standard
8.2.3 Customized BFT Solutions for WS-AT and WS-BA Coordination
8.3 Application Nondeterminism Control
8.3.1 Classification of Application Nondeterminism
8.3.2 Controlling VPRE Type of Nondeterminism
8.3.3 Controlling NPRE Type of Nondeterminism
8.3.4 Controlling VPOST Type of Nondeterminism
8.3.5 Controlling NPOST Type of Nondeterminism
References
Index

Wenbing Zhao received his PhD in electrical and computer engineering from the University of California, Santa Barbara, in 2002. Currently, he is an Associate Professor in the Department of Electrical and Computer Engineering at Cleveland State University. Dr. Zhao has more than 80 academic publications to his credit, and three of his recent research papers in the area of dependable distributed computing have won best paper awards. Dr. Zhao also has a U.S. patent on consistent time service for fault-tolerant distributed systems.