Foreword |
|
xxv | |
Preface |
|
xxvii | |
Acknowledgments |
|
xxxiii | |
Author |
|
xxxv | |
|
SECTION I: AN AVAILABILITY PRIMER |
|
|
|
1 Preamble: A View from 30,000 Feet |
|
|
3 | (10) |
|
|
3 | (1) |
|
Availability in Perspective |
|
|
4 | (6) |
|
Murphy's Law of Availability |
|
|
4 | (1) |
|
Availability Drivers in Flux: What Percentage of Business Is Critical? |
|
|
4 | (2) |
|
Historical View of Availability: The First 7 × 24 Requirements? |
|
|
6 | (2) |
|
Historical Availability Scenarios |
|
|
8 | (1) |
|
|
8 | (1) |
|
|
9 | (1) |
|
|
9 | (1) |
|
|
9 | (1) |
|
|
10 | (1) |
|
|
10 | (3) |
|
2 Reliability and Availability |
|
|
13 | (26) |
|
Introduction to Reliability, Availability, and Serviceability |
|
|
13 | (5) |
|
RAS Moves Beyond Hardware |
|
|
14 | (1) |
|
Availability: An Overview |
|
|
15 | (1) |
|
|
15 | (1) |
|
Quantitative Availability |
|
|
16 | (1) |
|
Availability: 7 R's (SNIA) |
|
|
16 | (2) |
|
|
18 | (4) |
|
|
19 | (1) |
|
Software: Effect of Change |
|
|
20 | (1) |
|
Operations: Effect of Change |
|
|
20 | (1) |
|
|
20 | (2) |
|
Automation: The Solution? |
|
|
22 | (2) |
|
|
22 | (1) |
|
Network Change/Configuration Automation |
|
|
23 | (1) |
|
|
23 | (1) |
|
|
24 | (4) |
|
|
24 | (1) |
|
Duke of York Availability |
|
|
25 | (1) |
|
|
26 | (1) |
|
|
26 | (1) |
|
|
27 | (1) |
|
Types of Nonavailability (Outages) |
|
|
28 | (3) |
|
|
29 | (2) |
|
|
31 | (1) |
|
Planning for Availability and Recovery |
|
|
31 | (2) |
|
|
31 | (1) |
|
What Is a Business Continuity Plan? |
|
|
31 | (1) |
|
|
32 | (1) |
|
|
33 | (1) |
|
Relationships: BC, BIA, and DR |
|
|
33 | (1) |
|
|
33 | (1) |
|
|
34 | (1) |
|
Downtime: Who or What Is to Blame? |
|
|
34 | (1) |
|
Elements of Failure: Interaction of the Wares |
|
|
35 | (2) |
|
|
37 | (1) |
|
|
37 | (2) |
|
3 Reliability: Background and Basics |
|
|
39 | (14) |
|
|
39 | (1) |
|
|
40 | (1) |
|
IT Structure: Hardware Overview |
|
|
40 | (2) |
|
|
42 | (1) |
|
Service Level Agreements: The Dawn of Realism |
|
|
42 | (1) |
|
|
43 | (1) |
|
|
43 | (1) |
|
|
43 | (2) |
|
|
45 | (1) |
|
Elements of Service Management |
|
|
45 | (4) |
|
|
45 | (1) |
|
Scope of Service Management |
|
|
46 | (1) |
|
|
46 | (1) |
|
|
46 | (1) |
|
|
47 | (1) |
|
Service Management Hierarchy |
|
|
47 | (1) |
|
|
48 | (1) |
|
|
49 | (1) |
|
|
49 | (3) |
|
|
49 | (1) |
|
|
50 | (2) |
|
|
52 | (1) |
|
4 What Is High Availability? |
|
|
53 | (52) |
|
|
53 | (1) |
|
Availability Classification |
|
|
54 | (11) |
|
Availability: Outage Analogy |
|
|
56 | (1) |
|
|
56 | (1) |
|
|
57 | (1) |
|
Availability: Fault Tolerance |
|
|
57 | (1) |
|
Sample List of Availability Requirements |
|
|
57 | (1) |
|
|
57 | (1) |
|
Availability: Single Node |
|
|
58 | (1) |
|
Dynamic Reconfiguration/Hot Repair of System Components |
|
|
58 | (1) |
|
Disaster Backup and Recovery |
|
|
58 | (1) |
|
System Administration Facilities |
|
|
59 | (1) |
|
HA Costs Money, So Why Bother? |
|
|
59 | (1) |
|
|
59 | (1) |
|
|
60 | (1) |
|
Penalty for Nonavailability |
|
|
60 | (1) |
|
Organizations: Attitude toward HA |
|
|
60 | (1) |
|
Aberdeen Group Study: February 2012 |
|
|
61 | (1) |
|
Outage Loss Factors (Percentage of Loss) |
|
|
62 | (1) |
|
|
62 | (2) |
|
|
64 | (1) |
|
Performance and Availability |
|
|
64 | (1) |
|
HA Design: Top 10 Mistakes |
|
|
65 | (1) |
|
|
65 | (4) |
|
|
65 | (2) |
|
Systems and Subsystems Development |
|
|
67 | (1) |
|
|
67 | (2) |
|
Availability Architectures |
|
|
69 | (3) |
|
|
69 | (1) |
|
|
69 | (1) |
|
|
69 | (1) |
|
|
70 | (1) |
|
|
71 | (1) |
|
|
71 | (1) |
|
|
71 | (1) |
|
|
72 | (1) |
|
Outline of Server Domain Architecture |
|
|
72 | (2) |
|
|
72 | (1) |
|
|
73 | (1) |
|
Outline of Cluster Architecture |
|
|
74 | (1) |
|
Cluster Configurations: Commercial Cluster |
|
|
74 | (1) |
|
|
74 | (3) |
|
|
74 | (1) |
|
|
75 | (1) |
|
|
76 | (1) |
|
|
77 | (1) |
|
|
77 | (1) |
|
|
77 | (5) |
|
|
77 | (1) |
|
HPC Cluster: OSCAR Configuration |
|
|
78 | (1) |
|
HPC Cluster: Availability |
|
|
79 | (1) |
|
HPC Cluster: Applications |
|
|
79 | (1) |
|
HA in Scientific Computing |
|
|
80 | (1) |
|
Topics in HPC Reliability: Summary |
|
|
80 | (1) |
|
Errors in Cluster HA Design |
|
|
81 | (1) |
|
Outline of Grid Computing |
|
|
82 | (1) |
|
|
82 | (1) |
|
Commercial Grid Computing |
|
|
83 | (1) |
|
Outline of RAID Architecture |
|
|
83 | (11) |
|
|
83 | (1) |
|
RAID Architecture and Levels |
|
|
84 | (1) |
|
|
84 | (1) |
|
|
85 | (1) |
|
Hardware versus Software RAID |
|
|
85 | (1) |
|
RAID Striping: Fundamental to RAID |
|
|
85 | (1) |
|
|
86 | (1) |
|
|
86 | (1) |
|
|
86 | (1) |
|
|
87 | (1) |
|
|
87 | (1) |
|
|
87 | (1) |
|
|
87 | (1) |
|
|
88 | (1) |
|
|
88 | (1) |
|
|
88 | (1) |
|
|
89 | (1) |
|
|
89 | (1) |
|
|
89 | (1) |
|
|
89 | (1) |
|
|
89 | (1) |
|
|
90 | (1) |
|
|
90 | (1) |
|
|
90 | (1) |
|
|
90 | (1) |
|
|
90 | (1) |
|
|
90 | (1) |
|
Standard RAID Storage Efficiency |
|
|
91 | (1) |
|
|
92 | (1) |
|
|
93 | (1) |
|
|
93 | (1) |
|
|
93 | (1) |
|
|
94 | (3) |
|
|
94 | (1) |
|
|
95 | (1) |
|
|
95 | (1) |
|
|
96 | (1) |
|
|
97 | (4) |
|
RAID Successor Qualifications |
|
|
97 | (1) |
|
|
98 | (1) |
|
|
99 | (1) |
|
|
100 | (1) |
|
|
101 | (4) |
|
SECTION II: AVAILABILITY THEORY AND PRACTICE |
|
|
|
5 High Availability: Theory |
|
|
105 | (36) |
|
|
105 | (4) |
|
Guide to Reliability Graphs |
|
|
105 | (1) |
|
Probability Density Function |
|
|
105 | (2) |
|
Cumulative Distribution Function |
|
|
107 | (1) |
|
Availability Probabilities |
|
|
107 | (1) |
|
|
108 | (1) |
|
|
109 | (2) |
|
Hardware Reliability: The Bathtub Curve |
|
|
109 | (1) |
|
Software Reliability: The Bathtub Curve |
|
|
110 | (1) |
|
Simple Math of Availability |
|
|
111 | (13) |
|
|
111 | (1) |
|
|
112 | (1) |
|
Mean Time between Failures |
|
|
112 | (1) |
|
|
112 | (1) |
|
|
113 | (1) |
|
Availability Equation I: Time Factors in an Outage |
|
|
114 | (2) |
|
|
116 | (1) |
|
Effect of Redundant Blocks on Availability |
|
|
117 | (1) |
|
Parallel (Redundant) Components |
|
|
118 | (1) |
|
Two Parallel Blocks: Example |
|
|
118 | (1) |
|
Combinations of Series and Parallel Blocks |
|
|
119 | (1) |
|
|
120 | (1) |
|
System Failure Combinations |
|
|
120 | (1) |
|
Complex Systems Solution Methods |
|
|
121 | (1) |
|
Real-Life Example: Cisco Network Configuration |
|
|
121 | (1) |
|
|
121 | (1) |
|
|
122 | (1) |
|
Summary of Block Considerations |
|
|
123 | (1) |
|
Sample Availability Calculations versus Costs |
|
|
124 | (1) |
|
Calculation 1: Server Is 99% Available |
|
|
124 | (1) |
|
Calculation 2: Server Is 99.99% Available |
|
|
124 | (1) |
|
Availability: MTBFs and Failure Rate |
|
|
124 | (15) |
|
|
125 | (1) |
|
Planned versus Unplanned Outages |
|
|
125 | (1) |
|
Planned Downtime: Planned Downtime Breakdown |
|
|
126 | (2) |
|
|
128 | (1) |
|
Security: The New Downtime |
|
|
128 | (1) |
|
Disasters: Breakdown of Causes |
|
|
128 | (1) |
|
|
129 | (1) |
|
|
129 | (1) |
|
|
130 | (1) |
|
External Electromagnetic Radiation Addendum |
|
|
131 | (1) |
|
Power: Recovery Timescales for Uninterruptible Power Supply |
|
|
131 | (1) |
|
|
132 | (1) |
|
Pandemics? Disaster Waiting to Happen? |
|
|
133 | (1) |
|
Disasters: Learning the Hard Way |
|
|
133 | (1) |
|
|
133 | (2) |
|
Downtime Gotchas: Survey Paper |
|
|
135 | (1) |
|
Downtime Reduction Initiatives |
|
|
135 | (1) |
|
|
135 | (1) |
|
Availability: A Lesson in Design |
|
|
136 | (1) |
|
Availability: Humor in an Outage, Part I |
|
|
137 | (1) |
|
Availability: Humor in an Outage, Part II |
|
|
137 | (1) |
|
|
137 | (1) |
|
Application Nonavailability |
|
|
137 | (1) |
|
Traditional Outage Reasons |
|
|
138 | (1) |
|
|
138 | (1) |
|
|
139 | (2) |
|
6 High Availability: Practice |
|
|
141 | (56) |
|
|
141 | (1) |
|
|
141 | (3) |
|
Sample Domain Architecture |
|
|
143 | (1) |
|
Planning for Availability: Starting Point |
|
|
144 | (1) |
|
|
145 | (18) |
|
Availability by Systems Design/Modification |
|
|
145 | (1) |
|
Availability by Engineering Design |
|
|
145 | (1) |
|
Self-Healing Hardware and Software |
|
|
145 | (1) |
|
Self-Healing and Other Items |
|
|
146 | (1) |
|
Availability by Application Design: Poor Application Design |
|
|
147 | (1) |
|
|
147 | (1) |
|
|
147 | (2) |
|
Availability by Configuration |
|
|
149 | (1) |
|
|
149 | (1) |
|
|
150 | (1) |
|
|
150 | (1) |
|
|
150 | (1) |
|
|
150 | (1) |
|
Availability by Outside Consultancy |
|
|
151 | (1) |
|
Availability by Vendor Support |
|
|
151 | (1) |
|
Availability by Proactive Monitoring |
|
|
151 | (1) |
|
Availability by Technical Support Excellence |
|
|
152 | (1) |
|
Availability by Operations Excellence |
|
|
152 | (1) |
|
|
153 | (1) |
|
|
153 | (1) |
|
|
154 | (1) |
|
|
154 | (1) |
|
|
154 | (1) |
|
Availability by Retrospective Analysis |
|
|
154 | (1) |
|
Availability by Application Monitoring |
|
|
155 | (1) |
|
Availability by Automation |
|
|
155 | (1) |
|
Availability by Reactive Recovery |
|
|
156 | (1) |
|
Availability by Partnerships |
|
|
157 | (1) |
|
Availability by Change Management |
|
|
158 | (1) |
|
Availability by Performance/Capacity Management |
|
|
158 | (1) |
|
Availability by Monitoring |
|
|
159 | (1) |
|
Availability by Cleanliness |
|
|
159 | (1) |
|
Availability by Anticipation |
|
|
159 | (1) |
|
|
159 | (1) |
|
|
160 | (1) |
|
Availability by Organization |
|
|
160 | (1) |
|
Availability by Eternal Vigilance |
|
|
161 | (1) |
|
|
162 | (1) |
|
|
162 | (1) |
|
Network Reliability/Availability |
|
|
163 | (6) |
|
|
163 | (1) |
|
|
164 | (1) |
|
|
164 | (1) |
|
Network Design for Availability |
|
|
165 | (1) |
|
|
166 | (1) |
|
File Transfer Reliability |
|
|
167 | (2) |
|
|
169 | (1) |
|
|
169 | (6) |
|
|
169 | (1) |
|
Software: Output Verification |
|
|
170 | (1) |
|
|
171 | (1) |
|
|
171 | (1) |
|
|
171 | (1) |
|
Software Reliability: Problem Flow |
|
|
171 | (1) |
|
|
172 | (1) |
|
|
173 | (1) |
|
|
173 | (2) |
|
Software Reliability: Models |
|
|
175 | (10) |
|
|
175 | (1) |
|
|
175 | (1) |
|
|
176 | (1) |
|
SRE Models: Shape Characterization |
|
|
177 | (1) |
|
SRE Models: Time-Based versus Defect-Based |
|
|
178 | (1) |
|
Software Reliability Growth Model |
|
|
178 | (2) |
|
Software Reliability Model: Defect Count |
|
|
180 | (1) |
|
Software Reliability: IEEE Standard 1633-2008 |
|
|
181 | (1) |
|
Software Reliability: Hardening |
|
|
182 | (1) |
|
Software Reliability: Installation |
|
|
182 | (1) |
|
Software Reliability: Version Control |
|
|
183 | (1) |
|
Software: Penetration Testing |
|
|
183 | (1) |
|
Software: Fault Tolerance |
|
|
184 | (1) |
|
Software Error Classification |
|
|
185 | (1) |
|
|
185 | (1) |
|
|
185 | (1) |
|
Reliability Properties of Software |
|
|
186 | (1) |
|
|
186 | (1) |
|
|
186 | (1) |
|
Software Reliability: Current Status |
|
|
187 | (1) |
|
Software Reliability: Assessment Questions |
|
|
188 | (1) |
|
Software Universe and Summary |
|
|
188 | (1) |
|
|
189 | (5) |
|
Hardware Outside the Server |
|
|
189 | (1) |
|
Disk Subsystem Reliability |
|
|
190 | (1) |
|
|
190 | (1) |
|
|
191 | (1) |
|
Availability: Other Peripherals |
|
|
192 | (1) |
|
|
193 | (1) |
|
|
193 | (1) |
|
|
194 | (3) |
|
Be Prepared for Big Brother! |
|
|
195 | (2) |
|
7 High Availability: SLAs, Management, and Methods |
|
|
197 | (52) |
|
|
197 | (1) |
|
|
198 | (2) |
|
Pre-Production Activities |
|
|
198 | (1) |
|
|
199 | (1) |
|
|
199 | (1) |
|
|
200 | (1) |
|
|
201 | (8) |
|
|
201 | (1) |
|
SLA: Availability and QoS |
|
|
201 | (1) |
|
|
201 | (2) |
|
|
203 | (1) |
|
Potential Business Benefits of SLAs |
|
|
203 | (1) |
|
Potential IT Benefits of SLAs |
|
|
204 | (1) |
|
|
204 | (1) |
|
SLA: Structure and Samples |
|
|
205 | (1) |
|
SLA: How Do We Quantify Availability? |
|
|
206 | (1) |
|
SLA: Reporting of Availability |
|
|
206 | (1) |
|
|
207 | (2) |
|
HA Management: The Project |
|
|
209 | (14) |
|
Start-Up and Design Phase |
|
|
209 | (1) |
|
|
210 | (1) |
|
|
210 | (1) |
|
Project Definition Workshop |
|
|
210 | (2) |
|
|
212 | (1) |
|
|
212 | (1) |
|
Project Initiation Document |
|
|
213 | (1) |
|
PID Structure and Purpose |
|
|
213 | (2) |
|
|
215 | (1) |
|
Delphi Techniques and Intensive Planning |
|
|
215 | (1) |
|
|
215 | (1) |
|
|
216 | (1) |
|
|
217 | (1) |
|
|
217 | (1) |
|
|
218 | (1) |
|
|
218 | (1) |
|
FMECA = FMEA + Criticality |
|
|
219 | (1) |
|
Risk Evaluation and Priority: Risk Evaluation Methods |
|
|
219 | (1) |
|
Component Failure Impact Analysis |
|
|
220 | (1) |
|
CFIA Development: A Walkthrough and Risk Analysis |
|
|
220 | (1) |
|
|
221 | (1) |
|
|
222 | (1) |
|
|
222 | (1) |
|
Management of Operations Phase |
|
|
223 | (2) |
|
Failure Reporting and Corrective Action System |
|
|
223 | (1) |
|
|
223 | (1) |
|
FRACAS: Steps for Handling Failures |
|
|
223 | (2) |
|
HA Operations: Supporting Disciplines |
|
|
225 | (14) |
|
|
225 | (1) |
|
|
225 | (1) |
|
|
225 | (1) |
|
Change/Configuration Management |
|
|
226 | (1) |
|
Change Management and Control: Best Practice |
|
|
226 | (1) |
|
|
227 | (1) |
|
|
228 | (1) |
|
|
229 | (1) |
|
|
229 | (1) |
|
|
229 | (1) |
|
|
230 | (1) |
|
Security: Threats or Posturing? |
|
|
230 | (1) |
|
|
231 | (1) |
|
|
231 | (1) |
|
|
232 | (1) |
|
Problems: After the Event |
|
|
232 | (1) |
|
|
233 | (1) |
|
|
233 | (1) |
|
Faults and What to Do about Them |
|
|
233 | (1) |
|
System Failure: The Response Stages |
|
|
234 | (1) |
|
|
235 | (1) |
|
|
235 | (1) |
|
|
235 | (1) |
|
What? IT Problem Recovery without IT? |
|
|
235 | (1) |
|
Faults and What Not to Do |
|
|
236 | (1) |
|
Outages: Areas for Inaction |
|
|
236 | (1) |
|
|
237 | (1) |
|
|
237 | (1) |
|
|
237 | (1) |
|
Help Desk Architecture and Implementation |
|
|
238 | (1) |
|
|
238 | (1) |
|
|
238 | (1) |
|
|
239 | (6) |
|
|
239 | (1) |
|
|
240 | (1) |
|
|
241 | (1) |
|
Synthetic Workload: Generic Requirements |
|
|
241 | (1) |
|
|
242 | (1) |
|
|
243 | (1) |
|
|
243 | (1) |
|
Availability: Related Monitors |
|
|
244 | (1) |
|
|
244 | (1) |
|
The Viewpoint Approach to Documentation |
|
|
245 | (1) |
|
|
245 | (4) |
|
SECTION III: VENDORS AND HIGH AVAILABILITY |
|
|
|
8 High Availability: Vendor Products |
|
|
249 | (18) |
|
IBM Availability and Reliability |
|
|
250 | (4) |
|
|
250 | (1) |
|
|
251 | (1) |
|
|
251 | (1) |
|
|
251 | (1) |
|
|
251 | (1) |
|
Z Series Parallel Sysplex |
|
|
251 | (1) |
|
Sysplex Structure and Purpose |
|
|
252 | (1) |
|
Parallel Sysplex Schematic |
|
|
252 | (1) |
|
IBM: High Availability Services |
|
|
253 | (1) |
|
|
253 | (1) |
|
|
254 | (2) |
|
|
254 | (1) |
|
|
254 | (1) |
|
|
255 | (1) |
|
|
255 | (1) |
|
|
255 | (1) |
|
|
255 | (1) |
|
|
256 | (4) |
|
|
256 | (1) |
|
|
256 | (1) |
|
|
256 | (1) |
|
|
256 | (1) |
|
Servers: Integrity Servers |
|
|
257 | (1) |
|
HP NonStop Integrity Servers |
|
|
258 | (1) |
|
NonStop Architecture and Stack |
|
|
258 | (1) |
|
|
259 | (1) |
|
|
260 | (1) |
|
|
260 | (1) |
|
ActiveService Architecture |
|
|
261 | (1) |
|
|
261 | (3) |
|
Veritas Clusters (Symantec) |
|
|
261 | (1) |
|
|
261 | (1) |
|
Databases, Applications, and Replicators |
|
|
262 | (1) |
|
|
262 | (1) |
|
|
262 | (1) |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
263 | (1) |
|
|
264 | (1) |
|
|
264 | (1) |
|
Service Availability Software |
|
|
264 | (1) |
|
|
265 | (1) |
|
Continuity Software: Services |
|
|
265 | (1) |
|
|
265 | (2) |
|
9 High Availability: Transaction Processing and Databases |
|
|
267 | (26) |
|
Transaction Processing Systems |
|
|
267 | (1) |
|
Some TP Systems: OLTP Availability Requirements |
|
|
268 | (1) |
|
TP Systems with Databases |
|
|
268 | (3) |
|
The X/Open Distributed Transaction Processing Model: XA and XA+ Concepts |
|
|
269 | (1) |
|
|
270 | (1) |
|
Relational Database Systems |
|
|
271 | (1) |
|
|
271 | (1) |
|
|
271 | (1) |
|
|
272 | (3) |
|
Microsoft SQL Server 2014 Community Technology Preview 1 |
|
|
273 | (1) |
|
|
273 | (1) |
|
SQL Server AlwaysOn Solutions |
|
|
273 | (1) |
|
Failover Cluster Instances |
|
|
273 | (1) |
|
|
274 | (1) |
|
|
274 | (1) |
|
|
274 | (1) |
|
|
274 | (1) |
|
|
275 | (2) |
|
|
275 | (1) |
|
|
275 | (1) |
|
|
275 | (1) |
|
|
276 | (1) |
|
|
276 | (1) |
|
Oracle High Availability Playing Field |
|
|
276 | (1) |
|
|
277 | (1) |
|
|
278 | (1) |
|
MySQL: HA Services and Support |
|
|
278 | (1) |
|
|
278 | (2) |
|
DB2 for Windows, UNIX, and Linux |
|
|
279 | (1) |
|
|
279 | (1) |
|
|
279 | (1) |
|
DB2 Replication: SQL and Q Replication |
|
|
280 | (1) |
|
|
280 | (1) |
|
|
280 | (1) |
|
|
280 | (1) |
|
InfoSphere Replication Server for z/OS |
|
|
281 | (1) |
|
DB2 Cross Platform Development |
|
|
281 | (1) |
|
IBM Informix Database and HA |
|
|
281 | (3) |
|
Introduction (Informix 11.70) |
|
|
281 | (1) |
|
|
282 | (1) |
|
|
282 | (1) |
|
Informix MACH 11 Clusters |
|
|
282 | (1) |
|
|
283 | (1) |
|
|
283 | (1) |
|
|
284 | (1) |
|
|
284 | (1) |
|
Ingres High Availability Option |
|
|
284 | (1) |
|
|
285 | (3) |
|
Sybase High Availability Option |
|
|
285 | (1) |
|
|
285 | (1) |
|
|
286 | (1) |
|
|
286 | (1) |
|
|
286 | (1) |
|
Business Continuity with SAP Sybase |
|
|
287 | (1) |
|
|
287 | (1) |
|
|
288 | (1) |
|
|
289 | (4) |
|
SECTION IV: CLOUDS AND VIRTUALIZATION |
|
|
|
10 High Availability: The Cloud and Virtualization |
|
|
293 | (14) |
|
|
293 | (5) |
|
|
294 | (1) |
|
|
294 | (1) |
|
|
294 | (1) |
|
|
295 | (1) |
|
|
296 | (1) |
|
Resource Management in the Cloud |
|
|
297 | (1) |
|
|
297 | (1) |
|
Cloud Availability and Security |
|
|
298 | (2) |
|
|
298 | (1) |
|
|
298 | (1) |
|
Aberdeen: Cloud Storage Outages |
|
|
299 | (1) |
|
|
299 | (1) |
|
|
300 | (3) |
|
|
300 | (1) |
|
|
301 | (1) |
|
|
302 | (1) |
|
Security Risks in Virtual Environments |
|
|
303 | (1) |
|
Vendors and Virtualization |
|
|
303 | (3) |
|
|
303 | (1) |
|
|
304 | (1) |
|
VMware vSphere, ESX, and ESXi |
|
|
304 | (1) |
|
|
304 | (1) |
|
HP Integrity Virtual Machines |
|
|
304 | (1) |
|
|
304 | (1) |
|
|
304 | (1) |
|
|
305 | (1) |
|
|
305 | (1) |
|
Virtualization Information Sources |
|
|
306 | (1) |
|
|
306 | (1) |
|
11 Disaster Recovery Overview |
|
|
307 | (28) |
|
|
307 | (4) |
|
|
307 | (1) |
|
Disasters Are Rare, Aren't They? |
|
|
308 | (1) |
|
|
308 | (1) |
|
DR Invocation Reasons: Forrester Survey |
|
|
309 | (1) |
|
DR Testing: Kaseya Survey |
|
|
310 | (1) |
|
|
310 | (1) |
|
|
311 | (7) |
|
|
311 | (1) |
|
|
311 | (1) |
|
|
311 | (1) |
|
|
311 | (1) |
|
|
311 | (1) |
|
|
312 | (1) |
|
|
312 | (1) |
|
|
312 | (1) |
|
Multilevel Incremental Backup |
|
|
312 | (1) |
|
|
312 | (1) |
|
|
312 | (1) |
|
|
312 | (1) |
|
|
313 | (1) |
|
|
314 | (1) |
|
|
315 | (1) |
|
|
315 | (1) |
|
|
316 | (1) |
|
Heterogeneous Replication |
|
|
316 | (1) |
|
|
316 | (1) |
|
DR Recovery Time Objective: WAN Optimization |
|
|
317 | (1) |
|
Backup Product Assessments |
|
|
318 | (3) |
|
|
318 | (1) |
|
Gartner Quadrant Analysis |
|
|
318 | (1) |
|
Backup/Archive: Tape or Disk? |
|
|
319 | (1) |
|
|
319 | (1) |
|
|
320 | (1) |
|
DR Concepts and Considerations |
|
|
321 | (3) |
|
|
321 | (1) |
|
|
321 | (1) |
|
|
322 | (1) |
|
|
322 | (1) |
|
|
323 | (1) |
|
|
323 | (1) |
|
|
324 | (1) |
|
|
324 | (6) |
|
|
324 | (1) |
|
|
324 | (3) |
|
DR Requirements in Operations |
|
|
327 | (1) |
|
|
327 | (1) |
|
|
327 | (1) |
|
|
327 | (1) |
|
|
327 | (1) |
|
|
328 | (1) |
|
|
328 | (1) |
|
Third-Party DR (Outsourcing) |
|
|
329 | (1) |
|
|
329 | (1) |
|
|
329 | (1) |
|
Disaster Recovery Templates |
|
|
330 | (1) |
|
|
330 | (5) |
|
SECTION V: APPENDICES AND HARD SUMS |
|
|
|
|
335 | (38) |
|
Reliability and Availability: Terminology |
|
|
335 | (36) |
|
|
371 | (2) |
|
|
373 | (14) |
|
Availability: MTBF/MTTF/MTTR Discussion |
|
|
373 | (8) |
|
|
373 | (2) |
|
|
375 | (1) |
|
|
375 | (1) |
|
MTTF and MTBF: The Difference |
|
|
375 | (2) |
|
|
377 | (1) |
|
Serial Blocks and Availability: NB |
|
|
378 | (1) |
|
|
379 | (1) |
|
Gathering MTTF/MTBF Figures |
|
|
380 | (1) |
|
Outage Records and MTTx Figures |
|
|
380 | (1) |
|
MTTF and MTTR Interpretation |
|
|
381 | (6) |
|
|
381 | (1) |
|
|
381 | (1) |
|
|
382 | (1) |
|
|
382 | (1) |
|
Forrester/Zenoss MTxx Definitions |
|
|
383 | (1) |
|
|
384 | (3) |
|
|
387 | (18) |
|
Your HA/DR Route Map and Kitbag |
|
|
387 | (16) |
|
|
387 | (1) |
|
|
387 | (4) |
|
|
391 | (1) |
|
HA and DR: Total Cost of Ownership |
|
|
392 | (1) |
|
|
392 | (1) |
|
|
393 | (1) |
|
|
394 | (1) |
|
Risk Assessment and Management |
|
|
394 | (1) |
|
Who Are the Risk Stakeholders? |
|
|
395 | (1) |
|
|
395 | (1) |
|
|
395 | (1) |
|
Availability: Project Risk Management |
|
|
396 | (4) |
|
Availability: Deliverables Risk Management |
|
|
400 | (2) |
|
Deliverables Risk Management Plan: Specific Risk Areas |
|
|
402 | (1) |
|
|
403 | (1) |
|
|
403 | (2) |
|
|
405 | (56) |
|
Availability: Math and Other Topics |
|
|
405 | (53) |
|
Lesson 1: Multiplication, Summation, and Integration Symbols |
|
|
405 | (1) |
|
Mathematical Distributions |
|
|
405 | (1) |
|
Lesson 2: General Theory of Reliability and Availability |
|
|
406 | (1) |
|
Reliability Distributions |
|
|
406 | (4) |
|
Lesson 3: Parallel Components (Blocks) |
|
|
410 | (1) |
|
Availability: m-from-n Components |
|
|
410 | (1) |
|
|
410 | (1) |
|
|
410 | (1) |
|
m-from-n Redundant Blocks |
|
|
411 | (1) |
|
Active and Standby Redundancy |
|
|
412 | (1) |
|
|
412 | (1) |
|
Summary of Redundancy Systems |
|
|
412 | (1) |
|
|
413 | (1) |
|
|
414 | (1) |
|
Math of m-from-n Configurations |
|
|
415 | (1) |
|
|
415 | (1) |
|
An Example of These Equations |
|
|
415 | (1) |
|
Online Tool for Parallel Components: Typical Calculation |
|
|
416 | (1) |
|
NB: Realistic IT Redundancy |
|
|
417 | (1) |
|
Overall Availability Graphs |
|
|
418 | (1) |
|
Try This Availability Test |
|
|
419 | (1) |
|
Lesson 4: Cluster Speedup Formulae |
|
|
419 | (1) |
|
|
420 | (1) |
|
|
421 | (2) |
|
|
423 | (1) |
|
|
424 | (1) |
|
|
425 | (1) |
|
Lesson 5: Some RAID and EC Math |
|
|
426 | (1) |
|
|
426 | (3) |
|
|
429 | (3) |
|
Lesson 6: Math of Monitoring |
|
|
432 | (1) |
|
|
432 | (3) |
|
|
435 | (1) |
|
Lesson 7: Software Reliability/Availability |
|
|
435 | (1) |
|
|
435 | (1) |
|
Software Reliability Theory |
|
|
436 | (1) |
|
The Failure/Defect Density Models |
|
|
437 | (7) |
|
Lesson 8: Additional RAS Features |
|
|
444 | (1) |
|
|
444 | (1) |
|
|
444 | (1) |
|
|
445 | (1) |
|
|
445 | (1) |
|
Fault Detection and Isolation |
|
|
445 | (1) |
|
Clocks and Service Processor |
|
|
446 | (1) |
|
|
446 | (1) |
|
Predictive Failure Analysis |
|
|
447 | (1) |
|
Lesson 9: Triple Modular Redundancy |
|
|
447 | (1) |
|
Lesson 10: Cyber Crime, Security, and Availability |
|
|
448 | (1) |
|
|
448 | (1) |
|
|
449 | (1) |
|
|
449 | (1) |
|
Zero Trust Security Model |
|
|
449 | (1) |
|
Security Information and Event Management |
|
|
450 | (1) |
|
|
450 | (1) |
|
|
451 | (1) |
|
Security: Denial of Service |
|
|
452 | (1) |
|
Security: Insider Threats |
|
|
452 | (1) |
|
Security: Mobile Devices (BYOD) |
|
|
453 | (1) |
|
|
454 | (1) |
|
Security: WiFi in the Enterprise |
|
|
455 | (1) |
|
|
455 | (1) |
|
|
456 | (1) |
|
|
456 | (1) |
|
|
457 | (1) |
|
Cost of Cyber Crime Prevention versus Risk |
|
|
457 | (1) |
|
|
458 | (1) |
|
|
458 | (3) |
|
|
461 | (18) |
|
Availability: Organizations and References |
|
|
461 | (18) |
|
Reliability/Availability Organizations |
|
|
461 | (1) |
|
Reliability Information Analysis Center |
|
|
462 | (1) |
|
|
462 | (1) |
|
|
462 | (1) |
|
Storage Networking Industry Association |
|
|
463 | (1) |
|
|
463 | (1) |
|
Service Availability Forum |
|
|
463 | (1) |
|
Carnegie Mellon Software Engineering Institute |
|
|
464 | (1) |
|
ROC Project: Software Resilience |
|
|
465 | (1) |
|
Business Continuity Today |
|
|
465 | (1) |
|
Disaster Recovery Institute |
|
|
465 | (1) |
|
Business Continuity Institute |
|
|
466 | (1) |
|
Information Availability Institute |
|
|
466 | (1) |
|
International Working Group on Cloud Computing Resiliency |
|
|
466 | (1) |
|
|
466 | (1) |
|
Center for Software Reliability |
|
|
467 | (1) |
|
|
467 | (1) |
|
|
467 | (1) |
|
Security? I Can't Be Bothered |
|
|
467 | (1) |
|
|
468 | (1) |
|
|
468 | (1) |
|
|
469 | (1) |
|
Cyber Security and Information Systems IAC |
|
|
469 | (1) |
|
Center for International Security and Cooperation |
|
|
469 | (1) |
|
Other Reliability/Security Resources |
|
|
469 | (1) |
|
Books, Articles, and Websites |
|
|
469 | (1) |
|
Major Reliability/Availability Information Sources |
|
|
469 | (1) |
|
Other Information Sources |
|
|
470 | (9) |
|
|
479 | (10) |
|
Service Management: Where Next? |
|
|
479 | (10) |
|
Information Technology Infrastructure Library |
|
|
479 | (1) |
|
ITIL Availability Management |
|
|
480 | (1) |
|
|
480 | (3) |
|
|
483 | (1) |
|
Availability Architectures: HA Documentation |
|
|
483 | (1) |
|
|
484 | (5) |
|
|
489 | (2) |
Index |
|
491 | |