E-book: Building Dependable Distributed Systems [Wiley Online]

Wenbing Zhao

Format: 368 pages
Series: Performability Engineering Series
Publication date: 15-Apr-2014
Publisher: Wiley-Scrivener
ISBN-10: 1118912748
ISBN-13: 9781118912744

Wiley Online
Price: 197,72 €*
* price that provides access for an unlimited number of concurrent users for an unlimited period

Book homepage: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118912744

This book covers the most essential techniques for designing and building dependable distributed systems. Instead of covering a broad range of research works for each dependability strategy, the book focuses on a select few (usually the most seminal works, the most practical approaches, or the first publication of each approach), which are explained in depth, usually with a comprehensive set of examples. The goal is to dissect each technique thoroughly so that readers who are not familiar with dependable distributed computing can actually grasp the technique after studying the book.

The book contains eight chapters. The first chapter introduces the basic concepts and terminology of dependable distributed computing, and also provides an overview of the primary means for achieving dependability. The second chapter describes in detail the checkpointing and logging mechanisms, which are the most commonly used means to achieve a limited degree of fault tolerance. Such mechanisms also serve as the foundation for more sophisticated dependability solutions. Chapter three covers the work on recovery-oriented computing, which focuses on practical techniques that reduce fault detection and recovery times for Internet-based applications. Chapter four outlines the replication techniques for data and service fault tolerance. This chapter also pays particular attention to optimistic replication and the CAP theorem. Chapter five explains a few seminal works on group communication systems. Chapter six introduces the distributed consensus problem and covers a number of Paxos family algorithms in depth. Chapter seven introduces the Byzantine generals problem and its latest solutions, including the seminal Practical Byzantine Fault Tolerance (PBFT) algorithm and a number of its derivatives. The final chapter covers the latest research results on application-aware Byzantine fault tolerance, which is an important step towards practical use of Byzantine fault tolerance techniques.

List of Figures
List of Tables
Acknowledgments
Preface
References

1 Introduction to Dependable Distributed Computing
1.1 Basic Concepts and Terminologies
1.1.1 System Models
1.1.2 Threat Models
1.1.3 Dependability Attributes and Evaluation Metrics
1.2 Means to Achieve Dependability
1.2.1 Fault Avoidance
1.2.2 Fault Detection and Diagnosis
1.2.3 Fault Removal
1.2.4 Fault Tolerance
References

2 Logging and Checkpointing
2.1 System Model
2.1.1 Fault Model
2.1.2 Process State and Global State
2.1.3 Piecewise Deterministic Assumption
2.1.4 Output Commit
2.1.5 Stable Storage
2.2 Checkpoint-Based Protocols
2.2.1 Uncoordinated Checkpointing
2.2.2 Tamir and Sequin Global Checkpointing Protocol
2.2.3 Chandy and Lamport Distributed Snapshot Protocol
2.2.4 Discussion
2.3 Log Based Protocols
2.3.1 Pessimistic Logging
2.3.2 Sender-Based Message Logging
References

3 Recovery-Oriented Computing
3.1 System Model
3.2 Fault Detection and Localization
3.2.1 Component Interactions Modeling and Anomaly Detection
3.2.2 Path Shapes Modeling and Root Cause Analysis
3.2.3 Inference-Based Fault Diagnosis
3.3 Microreboot
3.3.1 Microrebootable System Design Guideline
3.3.2 Automatic Recovery with Microreboot
3.3.3 Implications of the Microrebooting Technique
3.4 Overcoming Operator Errors
3.4.1 The Operator Undo Model
3.4.2 The Operator Undo Framework
References

4 Data and Service Replication
4.1 Service Replication
4.1.1 Replication Styles
4.1.2 Implementation of Service Replication
4.2 Data Replication
4.3 Optimistic Replication
4.3.1 System Models
4.3.2 Establish Ordering among Operations
4.3.3 State Transfer Systems
4.3.4 Operation Transfer System
4.3.5 Update Commitment
4.4 CAP Theorem
4.4.1 2 out 3
4.4.2 Implications of Enabling Partition Tolerance
References

5 Group Communication Systems
5.1 System Model
5.2 Sequencer Based Group Communication System
5.2.1 Normal Operation
5.2.2 Membership Change
5.2.3 Proof of Correctness
5.3 Sender Based Group Communication System
5.3.1 Total Ordering Protocol
5.3.2 Membership Change Protocol
5.3.3 Recovery Protocol
5.3.4 The Flow Control Mechanism
5.4 Vector Clock Based Group Communication System
References

6 Consensus and the Paxos Algorithms
6.1 The Consensus Problem
6.2 The Paxos Algorithm
6.2.1 Algorithm for Choosing a Value
6.2.2 Algorithm for Learning a Value
6.2.3 Proof of Correctness
6.2.4 Reasoning of the Paxos Algorithm
6.3 Multi-Paxos
6.3.1 Checkpointing and Garbage Collection
6.3.2 Leader Election and View Change
6.4 Dynamic Paxos
6.4.1 Dynamic Paxos
6.4.2 Cheap Paxos
6.5 Fast Paxos
6.5.1 The Basic Steps
6.5.2 Collision Recovery, Quorum Requirement, and Value Selection Rule
6.6 Implementations of the Paxos Family Algorithms
6.6.1 Hard Drive Failures
6.6.2 Multiple Coordinators
6.6.3 Membership Changes
6.6.4 Limited Disk Space for Logging
References

7 Byzantine Fault Tolerance
7.1 The Byzantine Generals Problem
7.1.1 System Model
7.1.2 The Oral Message Algorithms
7.1.3 Proof of Correctness for the Oral Message Algorithms
7.2 Practical Byzantine Fault Tolerance
7.2.1 System Model
7.2.2 Overview of the PBFT Algorithm
7.2.3 Normal Operation of PBFT
7.2.4 Garbage Collection
7.2.5 View Change
7.2.6 Proof of Correctness
7.2.7 Optimizations
7.3 Fast Byzantine Agreement
7.4 Speculative Byzantine Fault Tolerance
7.4.1 The Agreement Protocol
7.4.2 The View Change Protocol
7.4.3 The Checkpointing Protocol
7.4.4 Proof of Correctness
References

8 Application-Aware Byzantine Fault Tolerance
8.1 High Throughput BFT Systems: Networked File Systems
8.2 Exploiting Deep Application Semantics: Web Services Coordination
8.2.1 The Web Services Atomic Transactions Standard
8.2.2 The Web Services Business Activity Standard
8.2.3 Customized BFT Solutions for WS-AT and WS-BA Coordination
8.3 Application Nondeterminism Control
8.3.1 Classification of Application Nondeterminism
8.3.2 Controlling VPRE Type of Nondeterminism
8.3.3 Controlling NPRE Type of Nondeterminism
8.3.4 Controlling VPOST Type of Nondeterminism
8.3.5 Controlling NPOST Type of Nondeterminism
References

Index

Wenbing Zhao received his PhD in electrical and computer engineering from the University of California, Santa Barbara, in 2002. Currently, he is an Associate Professor in the Department of Electrical and Computer Engineering at Cleveland State University. Dr. Zhao has more than 80 academic publications to his credit, and three of his recent research papers in the area of dependable distributed computing have won best paper awards. Dr. Zhao also has a U.S. patent on consistent time service for fault-tolerant distributed systems.

Permalink: https://www.kriso.ee/db/9781118912744_pe.html
