
E-book: Fault-Tolerant Systems

Israel Koren (Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA), C. Mani Krishna (Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA)
  • Format: PDF+DRM
  • Publication date: 01-Sep-2020
  • Publisher: Morgan Kaufmann Publishers In
  • Language: eng
  • ISBN-13: 9780128181065
  • Price: 99,58 €*
  • * The price is final, i.e., no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it is not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Fault-Tolerant Systems, Second Edition, is the first book on fault tolerance design utilizing a systems approach to both hardware and software. No other text takes this approach or offers the comprehensive and up-to-date treatment that Koren and Krishna provide. The book comprehensively covers the design of fault-tolerant hardware and software, use of fault-tolerance techniques to improve manufacturing yields, and design and analysis of networks. Incorporating case studies that highlight six different computer systems with fault-tolerance techniques implemented in their design, the book includes critical material on methods to protect against threats to encryption subsystems used for security purposes.

The text’s updated content will help students and practitioners in electrical and computer engineering and computer science learn how to design reliable computing systems, and how to analyze fault-tolerant computing systems.

  • Delivers the first book on fault tolerance design with a systems approach
  • Offers comprehensive coverage of both hardware and software fault tolerance, as well as information and time redundancy
  • Features fully updated content plus new chapters on failure mechanisms and fault-tolerance in cyber-physical systems
  • Provides a complete ancillary package, including an on-line solutions manual for instructors and PowerPoint slides

Preface to the Second Edition
Acknowledgments
Chapter 1 Preliminaries
1.1 Fault Classification
1.2 Types of Redundancy
1.3 Basic Measures of Fault Tolerance
1.3.1 Traditional Measures
1.3.2 Network Measures
1.4 Outline of This Book
1.5 Further Reading
References
Chapter 2 Hardware Fault Tolerance
2.1 The Rate of Hardware Failure
2.2 Failure Rate, Reliability, and Mean Time to Failure
2.3 Hardware Failure Mechanisms
2.3.1 Electromigration
2.3.2 Stress Migration
2.3.3 Negative Bias Temperature Instability
2.3.4 Hot Carrier Injection
2.3.5 Time-Dependent Dielectric Breakdown
2.3.6 Putting It All Together
2.4 Common-Mode Failures
2.5 Canonical and Resilient Structures
2.5.1 Series and Parallel Systems
2.5.2 Nonseries/Parallel Systems
2.5.3 M-of-N Systems
2.5.4 Voters
2.5.5 Variations on N-Modular Redundancy
2.5.6 Duplex Systems
2.6 Other Reliability Evaluation Techniques
2.6.1 Poisson Processes
2.6.2 Markov Models
2.7 Fault-Tolerance Processor-Level Techniques
2.7.1 Watchdog Processor
2.7.2 Simultaneous Multithreading for Fault Tolerance
2.8 Timing Fault Tolerance
2.9 Tolerance of Byzantine Failures
2.9.1 Byzantine Agreement With Message Authentication
2.10 Further Reading
2.11 Exercises
References
Chapter 3 Information Redundancy
3.1 Coding
3.1.1 Parity Codes
3.1.2 Checksum
3.1.3 M-of-N Codes
3.1.4 Berger Code
3.1.5 Cyclic Codes
3.1.6 Arithmetic Codes
3.1.7 Local Hard and Soft Decisions
3.2 Resilient Disk Systems
3.2.1 RAID Level 1
3.2.2 RAID Level 2
3.2.3 RAID Level 3
3.2.4 RAID Level 4
3.2.5 RAID Level 5
3.2.6 Hierarchical RAID
3.2.7 Modeling Correlated Failures
3.2.8 RAID With Solid-State Disks
3.3 Data Replication
3.3.1 Voting: Nonhierarchical Organization
3.3.2 Voting: Hierarchical Organization
3.3.3 Primary-Backup Approach
3.4 Algorithm-Based Fault Tolerance
3.5 Further Reading
3.6 Exercises
References
Chapter 4 Fault-Tolerant Networks
4.1 Measures of Resilience
4.1.1 Graph Theoretical Measures
4.1.2 Computer Networks Measures
4.2 Common Network Topologies and Their Resilience
4.2.1 Multistage and Extra-Stage Networks
4.2.2 Crossbar Networks
4.2.3 Rectangular Mesh and Interstitial Mesh
4.2.4 Hypercube Network
4.2.5 Cube-Connected Cycles Networks
4.2.6 Loop Networks
4.2.7 Tree Networks
4.2.8 Ad Hoc Point-to-Point Networks
4.3 Fault-Tolerant Routing
4.3.1 Hypercube Fault-Tolerant Routing
4.3.2 Origin-Based Routing in the Mesh
4.4 Networks on a Chip
4.4.1 Router Fault Tolerance
4.4.2 Links
4.4.3 Routing in the Presence of Failure
4.5 Wireless Sensor Networks
4.5.1 Basics
4.5.2 Sensor Network Failures
4.5.3 Sensor Network Fault Tolerance
4.6 Further Reading
4.7 Exercises
References
Chapter 5 Software Fault Tolerance
5.1 Acceptance Tests
5.2 Single-Version Fault Tolerance
5.2.1 Wrappers
5.2.2 Software Rejuvenation
5.2.3 Data Diversity
5.2.4 Software-Implemented Hardware Fault Tolerance (SIHFT)
5.3 N-Version Programming
5.3.1 Consistent Comparison Problem
5.3.2 Version Independence
5.3.3 Other Issues in N-Version Programming
5.4 Recovery Block Approach
5.4.1 Basic Principles
5.4.2 Success Probability Calculation
5.4.3 Distributed Recovery Blocks
5.5 Preconditions, Postconditions, and Assertions
5.6 Exception Handling
5.6.1 Requirements From Exception Handlers
5.6.2 Basics of Exceptions and Exception Handling
5.6.3 Language Support
5.7 Software Reliability Models
5.7.1 Jelinski-Moranda Model
5.7.2 Littlewood-Verrall Model
5.7.3 Musa-Okumoto Model
5.7.4 Ostrand-Weyuker-Bell (OWB) Fault Model
5.7.5 Model Selection and Parameter Estimation
5.8 Fault-Tolerant Remote Procedure Calls
5.8.1 Primary-Backup Approach
5.8.2 The Circus Approach
5.9 Further Reading
5.10 Exercises
References
Chapter 6 Checkpointing
6.1 What Is Checkpointing?
6.1.1 Why Is Checkpointing Nontrivial?
6.2 Checkpoint Level
6.3 Optimal Checkpointing: an Analytical Model
6.3.1 Time Between Checkpoints: a First-Order Approximation
6.3.2 Optimal Checkpoint Placement
6.3.3 Time Between Checkpoints: a More Accurate Model
6.3.4 Reducing Overhead
6.3.5 Reducing Latency
6.4 Cache-Aided Rollback Error Recovery (CARER)
6.5 Checkpointing in Distributed Systems
6.5.1 The Domino Effect and Livelock
6.5.2 A Coordinated Checkpointing Algorithm
6.5.3 Time-Based Synchronization
6.5.4 Diskless Checkpointing
6.5.5 Message Logging
6.6 Checkpointing in Shared-Memory Systems
6.6.1 Bus-Based Coherence Protocol
6.6.2 Directory-Based Protocol
6.7 Checkpointing in Real-Time Systems
6.8 Checkpointing While Using Cloud Computing Utilities
6.9 Emerging Challenges: Petascale and Exascale Computing
6.10 Other Uses of Checkpointing
6.11 Further Reading
6.12 Exercises
References
Chapter 7 Cyber-Physical Systems
7.1 Structure of a Cyber-Physical System
7.2 The Controlled Plant State Space
7.3 Sensors
7.3.1 Calibration
7.3.2 Detecting Faulty Sensors
7.3.3 Confidence Measures for Intervals
7.4 The Cyber Platform
7.4.1 Isolation
7.4.2 Load Shedding
7.4.3 Overrun Absorption
7.5 Actuators
7.6 Further Reading
7.7 Exercises
References
Chapter 8 Case Studies
8.1 Aerospace Systems
8.1.1 Protecting Against Radiation
8.1.2 Flight Control System: Boeing 777
8.2 NonStop Systems
8.2.1 Architecture
8.2.2 Maintenance and Repair Aids
8.2.3 Software
8.2.4 Modifications to the NonStop Architecture
8.3 Stratus Systems
8.4 Cassini Command and Data Subsystem
8.5 IBM POWER8
8.6 IBM G5
8.7 IBM Sysplex
8.8 Intel Servers
8.8.1 Itanium
8.8.2 Xeon
8.9 Oracle SPARC M8 Server
8.10 Cloud Computing
8.10.1 Checkpointing in Response to Spot Pricing
8.10.2 Proactive Virtual Machine Migration
8.10.3 Fault Tolerance as a Service
8.11 Further Reading
References
Chapter 9 Simulation Techniques
9.1 Writing a Simulation Program
9.2 Parameter Estimation
9.2.1 Point Versus Interval Estimation
9.2.2 Method of Moments
9.2.3 Method of Maximum Likelihood
9.2.4 The Bayesian Approach to Parameter Estimation
9.2.5 Confidence Intervals
9.3 Variance Reduction Methods
9.3.1 Antithetic Variables
9.3.2 Using Control Variables
9.3.3 Stratified Sampling
9.3.4 Importance Sampling
9.4 Splitting
9.5 Random Number Generation
9.5.1 Uniformly Distributed Random Number Generators
9.5.2 Testing Uniform Random Number Generators
9.5.3 Generating Other Distributions
9.6 Fault Injection
9.6.1 Types of Fault Injection Techniques
9.6.2 Fault Injection Application and Tools
9.7 Further Reading
9.8 Exercises
References
Chapter 10 Defect Tolerance in VLSI Circuits
10.1 Manufacturing Defects and Circuit Faults
10.2 Probability of Failure and Critical Area
10.3 Basic Yield Models
10.3.1 The Poisson and Compound Poisson Yield Models
10.3.2 Variations on the Simple Yield Models
10.4 Yield Enhancement Through Redundancy
10.4.1 Yield Projection for Chips With Redundancy
10.4.2 Memory Arrays With Redundancy
10.4.3 Logic Integrated Circuits With Redundancy
10.4.4 Modifying the Floorplan
10.5 Further Reading
10.6 Exercises
References
Chapter 11 Fault Detection in Cryptographic Systems
11.1 Overview of Ciphers
11.1.1 Symmetric Key Ciphers
11.1.2 Public Key Ciphers
11.2 Security Attacks Through Fault Injection
11.2.1 Fault Attacks on Symmetric Key Ciphers
11.2.2 Fault Attacks on Public (Asymmetric) Key Ciphers
11.3 Countermeasures
11.3.1 Spatial and Temporal Duplication
11.3.2 Error-Detecting Codes
11.3.3 Are These Countermeasures Sufficient?
11.3.4 Final Comment
11.4 Further Reading
11.5 Exercises
References
Index

Israel Koren is Professor Emeritus of Electrical and Computer Engineering at the University of Massachusetts, Amherst. Previously, he held positions with the Technion - Israel Institute of Technology, Haifa; the University of California at Berkeley; the University of Southern California, Los Angeles; and the University of California, Santa Barbara. He has been a consultant to several companies, including Analog Devices, AMD, Digital Equipment Corp., IBM, Intel, and National Semiconductor. His research interests include fault-tolerant computing, cyber-physical systems, computer architecture, computer arithmetic, and secure cryptographic systems. He has over 300 publications in refereed journals and conferences and has served as general chair, program committee chair, and program committee member for numerous conferences.

C. Mani Krishna is Professor of Electrical and Computer Engineering at the University of Massachusetts, Amherst. He received a BTech in Electrical Engineering from the Indian Institute of Technology, Delhi, in 1979, an MS from the Rensselaer Polytechnic Institute in Troy, NY, in 1980, and a PhD in Electrical Engineering from the University of Michigan in 1984. Dr. Krishna's research interests are in the areas of cyber-physical systems, real-time and fault-tolerant computing, and distributed and networked systems. He has also been an editor on volumes of readings in performance evaluation and real-time systems, and for special issues on real-time systems of IEEE Computer and the Proceedings of the IEEE.