Foreword | xi
Preface | xiii
Introducing Modern System Administration | xix
|
Part I Reasoning About Systems

1 Patterns and Interconnections | 1
Choosing the Location of Your Workloads | 13
Guidelines for Choosing Compute | 19
Considerations for Your Storage Strategy | 31
Anticipate Your Capacity and Latency Requirements | 33
Retain Your Data as Long as Is Reasonably Necessary | 33
Respect the Privacy Concerns of Your Users | 34
Be Prepared to Handle Disaster Recovery Situations | 36
Key Characteristics of Networks | 40
Software-Defined Networks | 43
Content Distribution Networks | 44
Guidelines to Your Network Strategy | 46
What Is Your Digital Toolkit? | 51
The Components of Your Toolkit | 53
Choosing Programming Languages | 55
Benefits of Version Control | 63
Organizing Infra Projects | 64
Explicit Testing Strategy | 72
Improving Your Tests; Learning from Failure | 75
8 Infrastructure Security | 79
What Is Infrastructure Security? | 79
Share Security Responsibilities | 80
Design for Security Operability | 84
Categorize Discovered Issues | 86
Dimensions of Documentation | 91
Recommendations for Quality Documentation | 93
Case #1 Charts Are Worth a Thousand Words | 103
Case #2 Telling the Same Story with a Different Audience | 104
Recommended Visualization Practices | 113

Part III Assembling the System

11 Scripting Infrastructure | 119
Why Script Your Infrastructure? | 119
Three Lenses to Model Your Infrastructure | 121
Code to Build Machine Images | 123
Code to Provision Infrastructure | 124
Code to Configure Infrastructure | 126
12 Managing Your Infrastructure | 129
Treating Your Infrastructure as Data | 134
Getting Started with Infrastructure Management | 135
Writing Integration Tests | 139
13 Securing Your Infrastructure | 143
Manage Identity and Access | 145
How Should You Control Access to Your System? | 146
Who Should Have Access to Your System? | 147
Password Managers and Secret Management Software | 149
Defending Secrets and Monitoring Usage | 150
Securing Your Computing Environment | 151
Security Recommendations for Your Infrastructure Management | 155

Part IV Monitoring the System

How Do Monitoring and Observability Differ? | 163
Monitoring Building Blocks | 164
Data: Metrics, Logs, and Tracing | 165
15 Compute and Software Monitoring in Practice | 171
Identify Your Desired Outputs | 171
Plan for a Monitoring Project | 175
What Alerts Should You Set? | 178
Examine Monitoring Platforms | 180
Choose a Monitoring Tool or Platform | 181
16 Managing Monitoring Data | 185
Why Should You Monitor Your Work? | 193
Manage Your Work with Kanban | 195
Find the Interesting Information | 200

Part V Scaling the System

The Capacity Management Model | 206
The Framework for Capacity Planning | 214
Do You Need Capacity Planning with Cloud Computing? | 216
19 Developing On-Call Resilience | 219
Check Your On-Call Policies | 221
Monitor the On-Call Experience | 230
What Is Incident Management? | 237
Planning and Preparing for Incidents | 238
Set Up and Document Communication Channels | 238
Train for Effective Communication | 239
Clearly Define Roles and Responsibilities | 241
Understand Severity Levels and Escalation Protocols | 242
Learning from the Incident | 244
Documenting Incidents Effectively | 246
Distributing the Information | 247
21 Leading Sustainable Teams | 251
Adopt a Whole-Team Approach | 253
Build Resilient On-Call Teams | 253
Measure Impact on the Team | 261
Support Team Infrastructure with Documentation | 263
Budget a Learning Culture | 264

Conclusion | 267
A Protocols in Practice | 269
B Resolving Test Failures | 277
Index | 283