
E-book: The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2

  • Length: 560 pages
  • Publication date: 28-Aug-2014
  • Publisher: Addison-Wesley Educational Publishers Inc
  • Language: English
  • ISBN-13: 9780133478525
  • Format: PDF+DRM
  • Price: 31,68 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application made specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

“There’s an incredible amount of depth and thinking in the practices described here, and it’s impressive to see it all in one place.”
–Win Treese, coauthor of Designing Systems for Internet Commerce

The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach.

Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:

Designing and building modern web and distributed systems

  • Fundamentals of large system design
  • Understand the new software engineering implications of cloud administration
  • Make systems that are resilient to failure and grow and scale dynamically
  • Implement DevOps principles and cultural changes
  • IaaS/PaaS/SaaS and virtual platform selection

Operating and running systems using the latest DevOps/SRE strategies

  • Upgrade production systems with zero down-time
  • What and how to automate; how to decide what not to automate
  • On-call best practices that improve uptime
  • Why distributed systems require fundamentally different system administration techniques
  • Identify and resolve resiliency problems before they surprise you

Assessing and evaluating your team’s operational effectiveness

  • Manage the scientific process of continuous improvement
  • A forty-page, pain-free assessment system you can start using today

Preface xxiii
About the Authors xxix
Introduction 1(6)
Part I Design: Building It
7(138)
1 Designing in a Distributed World
9(22)
1.1 Visibility at Scale
10(1)
1.2 The Importance of Simplicity
11(1)
1.3 Composition
12(5)
1.3.1 Load Balancer with Multiple Backend Replicas
12(2)
1.3.2 Server with Multiple Backends
14(2)
1.3.3 Server Tree
16(1)
1.4 Distributed State
17(4)
1.5 The CAP Principle
21(3)
1.5.1 Consistency
21(1)
1.5.2 Availability
21(1)
1.5.3 Partition Tolerance
22(2)
1.6 Loosely Coupled Systems
24(2)
1.7 Speed
26(3)
1.8 Summary
29(2)
Exercises
30(1)
2 Designing for Operations
31(20)
2.1 Operational Requirements
31(14)
2.1.1 Configuration
33(1)
2.1.2 Startup and Shutdown
34(1)
2.1.3 Queue Draining
35(1)
2.1.4 Software Upgrades
36(1)
2.1.5 Backups and Restores
36(1)
2.1.6 Redundancy
37(1)
2.1.7 Replicated Databases
37(1)
2.1.8 Hot Swaps
38(1)
2.1.9 Toggles for Individual Features
39(1)
2.1.10 Graceful Degradation
39(1)
2.1.11 Access Controls and Rate Limits
40(1)
2.1.12 Data Import Controls
41(1)
2.1.13 Monitoring
42(1)
2.1.14 Auditing
42(1)
2.1.15 Debug Instrumentation
43(1)
2.1.16 Exception Collection
43(1)
2.1.17 Documentation for Operations
44(1)
2.2 Implementing Design for Operations
45(3)
2.2.1 Build Features in from the Beginning
45(1)
2.2.2 Request Features as They Are Identified
46(1)
2.2.3 Write the Features Yourself
47(1)
2.2.4 Work with a Third-Party Vendor
48(1)
2.3 Improving the Model
48(1)
2.4 Summary
49(2)
Exercises
50(1)
3 Selecting a Service Platform
51(18)
3.1 Level of Service Abstraction
52(4)
3.1.1 Infrastructure as a Service
52(2)
3.1.2 Platform as a Service
54(1)
3.1.3 Software as a Service
55(1)
3.2 Type of Machine
56(6)
3.2.1 Physical Machines
57(1)
3.2.2 Virtual Machines
57(3)
3.2.3 Containers
60(2)
3.3 Level of Resource Sharing
62(3)
3.3.1 Compliance
63(1)
3.3.2 Privacy
63(1)
3.3.3 Cost
63(1)
3.3.4 Control
64(1)
3.4 Colocation
65(1)
3.5 Selection Strategies
66(2)
3.6 Summary
68(1)
Exercises
68(1)
4 Application Architectures
69(26)
4.1 Single-Machine Web Server
70(1)
4.2 Three-Tier Web Service
71(6)
4.2.1 Load Balancer Types
72(2)
4.2.2 Load Balancing Methods
74(1)
4.2.3 Load Balancing with Shared State
75(1)
4.2.4 User Identity
76(1)
4.2.5 Scaling
76(1)
4.3 Four-Tier Web Service
77(3)
4.3.1 Frontends
78(1)
4.3.2 Application Servers
79(1)
4.3.3 Configuration Options
80(1)
4.4 Reverse Proxy Service
80(1)
4.5 Cloud-Scale Service
80(5)
4.5.1 Global Load Balancer
81(1)
4.5.2 Global Load Balancing Methods
82(1)
4.5.3 Global Load Balancing with User-Specific Data
82(1)
4.5.4 Internal Backbone
83(2)
4.6 Message Bus Architectures
85(5)
4.6.1 Message Bus Designs
86(1)
4.6.2 Message Bus Reliability
87(1)
4.6.3 Example 1: Link-Shortening Site
87(2)
4.6.4 Example 2: Employee Human Resources Data Updates
89(1)
4.7 Service-Oriented Architecture
90(2)
4.7.1 Flexibility
91(1)
4.7.2 Support
91(1)
4.7.3 Best Practices
91(1)
4.8 Summary
92(3)
Exercises
93(2)
5 Design Patterns for Scaling
95(24)
5.1 General Strategy
96(2)
5.1.1 Identify Bottlenecks
96(1)
5.1.2 Reengineer Components
97(1)
5.1.3 Measure Results
97(1)
5.1.4 Be Proactive
97(1)
5.2 Scaling Up
98(1)
5.3 The AKF Scaling Cube
99(5)
5.3.1 x: Horizontal Duplication
99(2)
5.3.2 y: Functional or Service Splits
101(1)
5.3.3 z: Lookup-Oriented Split
102(2)
5.3.4 Combinations
104(1)
5.4 Caching
104(6)
5.4.1 Cache Effectiveness
105(1)
5.4.2 Cache Placement
106(1)
5.4.3 Cache Persistence
106(1)
5.4.4 Cache Replacement Algorithms
107(1)
5.4.5 Cache Entry Invalidation
108(1)
5.4.6 Cache Size
109(1)
5.5 Data Sharding
110(2)
5.6 Threading
112(1)
5.7 Queueing
113(1)
5.7.1 Benefits
113(1)
5.7.2 Variations
113(1)
5.8 Content Delivery Networks
114(2)
5.9 Summary
116(3)
Exercises
116(3)
6 Design Patterns for Resiliency
119(26)
6.1 Software Resiliency Beats Hardware Reliability
120(1)
6.2 Everything Malfunctions Eventually
121(3)
6.2.1 MTBF in Distributed Systems
121(1)
6.2.2 The Traditional Approach
122(1)
6.2.3 The Distributed Computing Approach
123(1)
6.3 Resiliency through Spare Capacity
124(2)
6.3.1 How Much Spare Capacity
125(1)
6.3.2 Load Sharing versus Hot Spares
126(1)
6.4 Failure Domains
126(2)
6.5 Software Failures
128(3)
6.5.1 Software Crashes
128(1)
6.5.2 Software Hangs
129(1)
6.5.3 Query of Death
130(1)
6.6 Physical Failures
131(7)
6.6.1 Parts and Components
131(3)
6.6.2 Machines
134(1)
6.6.3 Load Balancers
134(2)
6.6.4 Racks
136(1)
6.6.5 Datacenters
137(1)
6.7 Overload Failures
138(3)
6.7.1 Traffic Surges
138(2)
6.7.2 DoS and DDoS Attacks
140(1)
6.7.3 Scraping Attacks
140(1)
6.8 Human Error
141(1)
6.9 Summary
142(3)
Exercises
143(2)
Part II Operations: Running It
145(274)
7 Operations in a Distributed World
147(24)
7.1 Distributed Systems Operations
148(7)
7.1.1 SRE versus Traditional Enterprise IT
148(1)
7.1.2 Change versus Stability
149(2)
7.1.3 Defining SRE
151(1)
7.1.4 Operations at Scale
152(3)
7.2 Service Life Cycle
155(5)
7.2.1 Service Launches
156(4)
7.2.2 Service Decommissioning
160(1)
7.3 Organizing Strategy for Operational Teams
160(6)
7.3.1 Team Member Day Types
162(3)
7.3.2 Other Strategies
165(1)
7.4 Virtual Office
166(1)
7.4.1 Communication Mechanisms
166(1)
7.4.2 Communication Policies
167(1)
7.5 Summary
167(4)
Exercises
168(3)
8 DevOps Culture
171(24)
8.1 What Is DevOps?
172(4)
8.1.1 The Traditional Approach
173(2)
8.1.2 The DevOps Approach
175(1)
8.2 The Three Ways of DevOps
176(4)
8.2.1 The First Way: Workflow
176(1)
8.2.2 The Second Way: Improve Feedback
177(1)
8.2.3 The Third Way: Continual Experimentation and Learning
178(1)
8.2.4 Small Batches Are Better
178(1)
8.2.5 Adopting the Strategies
179(1)
8.3 History of DevOps
180(1)
8.3.1 Evolution
180(1)
8.3.2 Site Reliability Engineering
181(1)
8.4 DevOps Values and Principles
181(5)
8.4.1 Relationships
182(1)
8.4.2 Integration
182(1)
8.4.3 Automation
182(1)
8.4.4 Continuous Improvement
183(1)
8.4.5 Common Nontechnical DevOps Practices
183(1)
8.4.6 Common Technical DevOps Practices
184(2)
8.4.7 Release Engineering DevOps Practices
186(1)
8.5 Converting to DevOps
186(2)
8.5.1 Getting Started
187(1)
8.5.2 DevOps at the Business Level
187(1)
8.6 Agile and Continuous Delivery
188(4)
8.6.1 What Is Agile?
188(1)
8.6.2 What Is Continuous Delivery?
189(3)
8.7 Summary
192(3)
Exercises
193(2)
9 Service Delivery: The Build Phase
195(16)
9.1 Service Delivery Strategies
197(3)
9.1.1 Pattern: Modern DevOps Methodology
197(2)
9.1.2 Anti-pattern: Waterfall Methodology
199(1)
9.2 The Virtuous Cycle of Quality
200(2)
9.3 Build-Phase Steps
202(3)
9.3.1 Develop
202(1)
9.3.2 Commit
202(1)
9.3.3 Build
203(1)
9.3.4 Package
204(1)
9.3.5 Register
204(1)
9.4 Build Console
205(1)
9.5 Continuous Integration
205(2)
9.6 Packages as Handoff Interface
207(1)
9.7 Summary
208(3)
Exercises
209(2)
10 Service Delivery: The Deployment Phase
211(14)
10.1 Deployment-Phase Steps
211(3)
10.1.1 Promotion
212(1)
10.1.2 Installation
212(1)
10.1.3 Configuration
213(1)
10.2 Testing and Approval
214(3)
10.2.1 Testing
215(1)
10.2.2 Approval
216(1)
10.3 Operations Console
217(1)
10.4 Infrastructure Automation Strategies
217(4)
10.4.1 Preparing Physical Machines
217(1)
10.4.2 Preparing Virtual Machines
218(1)
10.4.3 Installing OS and Services
219(2)
10.5 Continuous Delivery
221(1)
10.6 Infrastructure as Code
221(1)
10.7 Other Platform Services
222(1)
10.8 Summary
222(3)
Exercises
223(2)
11 Upgrading Live Services
225(18)
11.1 Taking the Service Down for Upgrading
225(1)
11.2 Rolling Upgrades
226(1)
11.3 Canary
227(2)
11.4 Phased Roll-outs
229(1)
11.5 Proportional Shedding
230(1)
11.6 Blue-Green Deployment
230(1)
11.7 Toggling Features
230(4)
11.8 Live Schema Changes
234(2)
11.9 Live Code Changes
236(1)
11.10 Continuous Deployment
236(3)
11.11 Dealing with Failed Code Pushes
239(1)
11.12 Release Atomicity
240(1)
11.13 Summary
241(2)
Exercises
241(2)
12 Automation
243(32)
12.1 Approaches to Automation
244(6)
12.1.1 The Left-Over Principle
245(1)
12.1.2 The Compensatory Principle
246(1)
12.1.3 The Complementarity Principle
247(1)
12.1.4 Automation for System Administration
248(1)
12.1.5 Lessons Learned
249(1)
12.2 Tool Building versus Automation
250(2)
12.2.1 Example: Auto Manufacturing
251(1)
12.2.2 Example: Machine Configuration
251(1)
12.2.3 Example: Account Creation
251(1)
12.2.4 Tools Are Good, But Automation Is Better
252(1)
12.3 Goals of Automation
252(3)
12.4 Creating Automation
255(3)
12.4.1 Making Time to Automate
256(1)
12.4.2 Reducing Toil
257(1)
12.4.3 Determining What to Automate First
257(1)
12.5 How to Automate
258(1)
12.6 Language Tools
258(4)
12.6.1 Shell Scripting Languages
259(1)
12.6.2 Scripting Languages
259(1)
12.6.3 Compiled Languages
260(1)
12.6.4 Configuration Management Languages
260(2)
12.7 Software Engineering Tools and Techniques
262(8)
12.7.1 Issue Tracking Systems
263(2)
12.7.2 Version Control Systems
265(1)
12.7.3 Software Packaging
266(1)
12.7.4 Style Guides
266(1)
12.7.5 Test-Driven Development
267(1)
12.7.6 Code Reviews
268(1)
12.7.7 Writing Just Enough Code
269(1)
12.8 Multitenant Systems
270(1)
12.9 Summary
271(4)
Exercises
272(3)
13 Design Documents
275(10)
13.1 Design Documents Overview
275(2)
13.1.1 Documenting Changes and Rationale
276(1)
13.1.2 Documentation as a Repository of Past Decisions
276(1)
13.2 Design Document Anatomy
277(2)
13.3 Template
279(1)
13.4 Document Archive
279(1)
13.5 Review Workflows
280(2)
13.5.1 Reviewers and Approvers
281(1)
13.5.2 Achieving Sign-off
281(1)
13.6 Adopting Design Documents
282(1)
13.7 Summary
283(2)
Exercises
284(1)
14 Oncall
285(22)
14.1 Designing Oncall
285(9)
14.1.1 Start with the SLA
286(1)
14.1.2 Oncall Roster
287(1)
14.1.3 Onduty
288(1)
14.1.4 Oncall Schedule Design
288(2)
14.1.5 The Oncall Calendar
290(1)
14.1.6 Oncall Frequency
291(1)
14.1.7 Types of Notifications
292(2)
14.1.8 After-Hours Maintenance Coordination
294(1)
14.2 Being Oncall
294(5)
14.2.1 Pre-shift Responsibilities
294(1)
14.2.2 Regular Oncall Responsibilities
294(1)
14.2.3 Alert Responsibilities
295(1)
14.2.4 Observe, Orient, Decide, Act (OODA)
296(1)
14.2.5 Oncall Playbook
297(1)
14.2.6 Third-Party Escalation
298(1)
14.2.7 End-of-Shift Responsibilities
299(1)
14.3 Between Oncall Shifts
299(3)
14.3.1 Long-Term Fixes
299(1)
14.3.2 Postmortems
300(2)
14.4 Periodic Review of Alerts
302(2)
14.5 Being Paged Too Much
304(1)
14.6 Summary
305(2)
Exercises
306(1)
15 Disaster Preparedness
307(24)
15.1 Mindset
308(3)
15.1.1 Antifragile Systems
308(1)
15.1.2 Reducing Risk
309(2)
15.2 Individual Training: Wheel of Misfortune
311(1)
15.3 Team Training: Fire Drills
312(3)
15.3.1 Service Testing
313(1)
15.3.2 Random Testing
314(1)
15.4 Training for Organizations: Game Day/DiRT
315(8)
15.4.1 Getting Started
316(1)
15.4.2 Increasing Scope
317(1)
15.4.3 Implementation and Logistics
318(2)
15.4.4 Experiencing a DiRT Test
320(3)
15.5 Incident Command System
323(6)
15.5.1 How It Works: Public Safety Arena
325(1)
15.5.2 How It Works: IT Operations Arena
326(1)
15.5.3 Incident Action Plan
326(1)
15.5.4 Best Practices
327(1)
15.5.5 ICS Example
328(1)
15.6 Summary
329(2)
Exercises
330(1)
16 Monitoring Fundamentals
331(14)
16.1 Overview
332(2)
16.1.1 Uses of Monitoring
333(1)
16.1.2 Service Management
334(1)
16.2 Consumers of Monitoring Information
334(2)
16.3 What to Monitor
336(2)
16.4 Retention
338(1)
16.5 Meta-monitoring
339(1)
16.6 Logs
340(2)
16.6.1 Approach
341(1)
16.6.2 Timestamps
341(1)
16.7 Summary
342(3)
Exercises
342(3)
17 Monitoring Architecture and Practice
345(20)
17.1 Sensing and Measurement
346(4)
17.1.1 Blackbox versus Whitebox Monitoring
346(1)
17.1.2 Direct versus Synthesized Measurements
347(1)
17.1.3 Rate versus Capability Monitoring
348(1)
17.1.4 Gauges versus Counters
348(2)
17.2 Collection
350(3)
17.2.1 Push versus Pull
350(1)
17.2.2 Protocol Selection
351(1)
17.2.3 Server Component versus Agent versus Poller
352(1)
17.2.4 Central versus Regional Collectors
352(1)
17.3 Analysis and Computation
353(1)
17.4 Alerting and Escalation Manager
354(4)
17.4.1 Alerting, Escalation, and Acknowledgments
355(1)
17.4.2 Silence versus Inhibit
356(2)
17.5 Visualization
358(4)
17.5.1 Percentiles
359(1)
17.5.2 Stack Ranking
360(1)
17.5.3 Histograms
361(1)
17.6 Storage
362(1)
17.7 Configuration
362(1)
17.8 Summary
363(2)
Exercises
364(1)
18 Capacity Planning
365(22)
18.1 Standard Capacity Planning
366(5)
18.1.1 Current Usage
368(1)
18.1.2 Normal Growth
369(1)
18.1.3 Planned Growth
369(1)
18.1.4 Headroom
370(1)
18.1.5 Resiliency
370(1)
18.1.6 Timetable
371(1)
18.2 Advanced Capacity Planning
371(10)
18.2.1 Identifying Your Primary Resources
372(1)
18.2.2 Knowing Your Capacity Limits
372(1)
18.2.3 Identifying Your Core Drivers
373(1)
18.2.4 Measuring Engagement
374(1)
18.2.5 Analyzing the Data
375(5)
18.2.6 Monitoring the Key Indicators
380(1)
18.2.7 Delegating Capacity Planning
381(1)
18.3 Resource Regression
381(1)
18.4 Launching New Services
382(2)
18.5 Reduce Provisioning Time
384(1)
18.6 Summary
385(2)
Exercises
386(1)
19 Creating KPIs
387(14)
19.1 What Is a KPI?
388(1)
19.2 Creating KPIs
389(4)
19.2.1 Step 1: Envision the Ideal
390(1)
19.2.2 Step 2: Quantify Distance to the Ideal
390(1)
19.2.3 Step 3: Imagine How Behavior Will Change
390(1)
19.2.4 Step 4: Revise and Select
391(1)
19.2.5 Step 5: Deploy the KPI
392(1)
19.3 Example KPI: Machine Allocation
393(3)
19.3.1 The First Pass
393(1)
19.3.2 The Second Pass
394(2)
19.3.3 Evaluating the KPI
396(1)
19.4 Case Study: Error Budget
396(3)
19.4.1 Conflicting Goals
396(1)
19.4.2 A Unified Goal
397(1)
19.4.3 Everyone Benefits
398(1)
19.5 Summary
399(2)
Exercises
399(2)
20 Operational Excellence
401(18)
20.1 What Does Operational Excellence Look Like?
401(1)
20.2 How to Measure Greatness
402(1)
20.3 Assessment Methodology
403(4)
20.3.1 Operational Responsibilities
403(2)
20.3.2 Assessment Levels
405(2)
20.3.3 Assessment Questions and Look-For's
407(1)
20.4 Service Assessments
407(4)
20.4.1 Identifying What to Assess
408(1)
20.4.2 Assessing Each Service
408(1)
20.4.3 Comparing Results across Services
409(1)
20.4.4 Acting on the Results
410(1)
20.4.5 Assessment and Project Planning Frequencies
410(1)
20.5 Organizational Assessments
411(1)
20.6 Levels of Improvement
412(1)
20.7 Getting Started
413(1)
20.8 Summary
414(5)
Exercises
415(1)
Epilogue
416(3)
Part III Appendices
419(2)
A Assessments
421(30)
A.1 Regular Tasks (RT)
423(3)
A.2 Emergency Response (ER)
426(2)
A.3 Monitoring and Metrics (MM)
428(3)
A.4 Capacity Planning (CP)
431(2)
A.5 Change Management (CM)
433(2)
A.6 New Product Introduction and Removal (NPI/NPR)
435(2)
A.7 Service Deployment and Decommissioning (SDD)
437(2)
A.8 Performance and Efficiency (PE)
439(3)
A.9 Service Delivery: The Build Phase
442(2)
A.10 Service Delivery: The Deployment Phase
444(2)
A.11 Toil Reduction
446(2)
A.12 Disaster Preparedness
448(3)
B The Origins and Future of Distributed Computing and Clouds
451(24)
B.1 The Pre-Web Era (1985–1994)
452(3)
Availability Requirements
452(1)
Technology
453(1)
Scaling
454(1)
High Availability
454(1)
Costs
454(1)
B.2 The First Web Era: The Bubble (1995–2000)
455(4)
Availability Requirements
455(1)
Technology
455(1)
Scaling
456(1)
High Availability
457(2)
Costs
459(1)
B.3 The Dot-Bomb Era (2000–2003)
459(6)
Availability Requirements
460(1)
Technology
460(1)
High Availability
461(1)
Scaling
462(2)
Costs
464(1)
B.4 The Second Web Era (2003–2010)
465(4)
Availability Requirements
465(1)
Technology
465(1)
High Availability
466(1)
Scaling
467(1)
Costs
468(1)
B.5 The Cloud Computing Era (2010–present)
469(3)
Availability Requirements
469(1)
Costs
469(2)
Scaling and High Availability
471(1)
Technology
472(1)
B.6 Conclusion
472(3)
Exercises
473(2)
C Scaling Terminology and Concepts
475(6)
C.1 Constant, Linear, and Exponential Scaling
475(1)
C.2 Big O Notation
476(2)
C.3 Limitations of Big O Notation
478(3)
D Templates and Examples
481(6)
D.1 Design Document Template
481(1)
D.2 Design Document Example
482(2)
D.3 Sample Postmortem Template
484(3)
E Recommended Reading
487(4)
Bibliography 491(8)
Index 499
Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator with more than twenty years of experience at companies like Google, Bell Labs, and StackExchange.com.

Strata R. Chalup has more than twenty-five years of experience in Silicon Valley, focusing on IT strategy, best practices, and scalable infrastructures at firms that include Apple, Sun, Cisco, McAfee, and Palm.

Christina J. Hogan has more than twenty years of experience in system administration and network engineering, from Silicon Valley to Italy and Switzerland. She has a master's degree in computer science and a doctorate in aeronautical engineering, and has been part of a Formula 1 racing team.