Preface | xv |
Acknowledgments | xvii |
About This Book | xix |
About The Authors | xxiii |
Part 1 Getting Started With Streaming | 1 | (152) |
1 Welcome to Grokking Streaming Systems | 3 | (18) |
What is stream processing? | 4 | (1) |
Streaming system examples | 5 | (1) |
Streaming systems and real time | 6 | (1) |
How a streaming system works | 7 | (1) |
| 8 | (1) |
| 9 | (1) |
| 10 | (1) |
| 11 | (1) |
Inside a batch processing system | 12 | (1) |
Stream processing systems | 13 | (1) |
Inside a stream processing system | 14 | (1) |
The advantages of multi-stage architecture | 15 | (1) |
The multi-stage architecture in batch and stream processing systems | 16 | (1) |
| 17 | (1) |
A model stream processing system | 18 | (3) |
2 Hello, streaming systems! | 21 | (32) |
The chief needs a fancy tollbooth | 22 | (1) |
It started as HTTP requests, and it failed | 23 | (1) |
AJ and Miranda take time to reflect | 24 | (1) |
AJ ponders about streaming systems | 25 | (1) |
Comparing backend service and streaming | 26 | (1) |
How a streaming system could fit | 27 | (1) |
Queues: A foundational concept | 28 | (1) |
| 29 | (1) |
Our streaming framework (the start of it) | 30 | (1) |
The Streamwork framework overview | 31 | (1) |
Zooming in on the Streamwork engine | 32 | (1) |
| 33 | (1) |
More details of the concepts | 34 | (1) |
The streaming job execution flow | 35 | (1) |
| 36 | (6) |
| 42 | (1) |
Inspecting the job execution | 43 | (1) |
| 44 | (4) |
| 48 | (1) |
The life of a data element | 49 | (1) |
Reviewing streaming concepts | 50 | (3) |
3 Parallelization and data grouping | 53 | (28) |
The sensor is emitting more events | 54 | (1) |
Even in streaming, real time is hard | 55 | (1) |
New concepts: Parallelism is important | 56 | (1) |
New concepts: Data parallelism | 57 | (1) |
New concepts: Data execution independence | 58 | (1) |
New concepts: Task parallelism | 59 | (1) |
Data parallelism vs. task parallelism | 60 | (1) |
Parallelism and concurrency | 61 | (1) |
| 62 | (1) |
| 63 | (1) |
| 64 | (1) |
| 65 | (1) |
| 66 | (1) |
| 67 | (1) |
| 68 | (1) |
| 69 | (1) |
| 70 | (1) |
| 71 | (1) |
Shuffle grouping: Under the hood | 72 | (1) |
| 73 | (1) |
Fields grouping: Under the hood | 74 | (1) |
| 75 | (1) |
Look inside the engine: Event dispatcher | 76 | (1) |
Applying fields grouping in your job | 77 | (1) |
| 78 | (1) |
Comparing grouping behaviors | 79 | (2) |
| 81 | (28) |
A credit card fraud detection system | 82 | (1) |
More about the credit card fraud detection system | 83 | (1) |
The fraud detection business | 84 | (1) |
Streaming isn't always a straight line | 85 | (1) |
| 86 | (1) |
The fraud detection job in detail | 87 | (1) |
| 88 | (1) |
Upstream and downstream components | 89 | (1) |
Stream fan-out and fan-in | 90 | (1) |
Graph, directed graph, and DAG | 91 | (1) |
DAG in stream processing systems | 92 | (1) |
All new concepts in one page | 93 | (1) |
Stream fan-out to the analyzers | 94 | (1) |
| 95 | (1) |
There is a problem: Efficiency | 96 | (1) |
Stream fan-out with different streams | 97 | (1) |
Look inside the engine again | 98 | (1) |
Communication between the components via channels | 99 | (1) |
| 100 | (1) |
Stream fan-in to the score aggregator | 101 | (1) |
Stream fan-in in the engine | 102 | (1) |
A brief introduction to another stream fan-in: Join | 103 | (1) |
| 104 | (1) |
| 105 | (1) |
| 106 | (3) |
| 109 | (32) |
The latency requirement of the fraud detection system | 110 | (1) |
Revisit the fraud detection job | 111 | (1) |
| 112 | (1) |
| 113 | (1) |
A new streaming job to monitor system usage | 114 | (1) |
| 115 | (1) |
The requirements of the new system usage job | 116 | (1) |
New concepts: (The number of) times delivered and times processed | 117 | (1) |
New concept: Delivery semantics | 118 | (1) |
Choosing the right semantics | 119 | (1) |
| 120 | (1) |
| 121 | (1) |
| 122 | (1) |
At-least-once with acknowledging | 123 | (1) |
| 124 | (1) |
Handle event processing failures | 125 | (1) |
| 126 | (1) |
Acknowledging code in components | 127 | (1) |
New concept: Checkpointing | 128 | (1) |
| 129 | (1) |
Checkpointing in the system usage job for the at-least-once semantic | 130 | (1) |
Checkpointing and state manipulation functions | 131 | (1) |
State handling code in the transaction source component | 132 | (1) |
Exactly-once or effectively-once? | 133 | (1) |
Bonus concept: Idempotent operation | 134 | (1) |
| 135 | (1) |
State handling code in the system usage analyzer component | 136 | (1) |
Comparing the delivery semantics again | 137 | (2) |
| 139 | (2) |
6 Streaming systems review and a glimpse ahead | 141 | (12) |
| 142 | (1) |
Parallelization and event grouping | 143 | (1) |
| 144 | (1) |
Delivery semantics (guarantees) | 145 | (1) |
Delivery semantics used in the credit card fraud detection system | 146 | (1) |
Which way to go from here | 147 | (1) |
| 148 | (1) |
Joining data in real time | 149 | (1) |
| 150 | (1) |
Stateless and stateful computations | 151 | (2) |
Part 2 Stepping Up | 153 | (126) |
| 155 | (30) |
Slicing up real-time data | 156 | (1) |
Breaking down the problem in detail | 157 | (1) |
Breaking down the problem in detail (continued) | 158 | (1) |
| 159 | (1) |
Windowing in the fraud detection job | 160 | (1) |
What exactly are windows? | 161 | (1) |
Looking closer into the window | 162 | (1) |
New concept: Windowing strategy | 163 | (1) |
| 164 | (1) |
Fixed windows in the windowed proximity analyzer | 165 | (1) |
Detecting fraud with a fixed time window | 166 | (1) |
Fixed windows: Time vs. count | 167 | (1) |
| 168 | (1) |
Sliding windows: Windowed proximity analyzer | 169 | (1) |
Detecting fraud with a sliding window | 170 | (1) |
| 171 | (1) |
Session windows (continued) | 172 | (1) |
Detecting fraud with session windows | 173 | (1) |
Summary of windowing strategies | 174 | (1) |
Slicing an event stream into data sets | 175 | (1) |
Windowing: Concept or implementation | 176 | (1) |
| 177 | (1) |
| 178 | (1) |
Implement the windowed proximity analyzer | 179 | (1) |
Event time and other times for events | 180 | (1) |
| 181 | (1) |
| 182 | (3) |
| 185 | (26) |
Joining emission data on the fly | 186 | (1) |
The emissions job version 1 | 187 | (1) |
| 188 | (1) |
Accuracy becomes an issue | 189 | (1) |
The enhanced emissions job | 190 | (1) |
| 191 | (1) |
| 192 | (1) |
How the stream join works | 193 | (1) |
Stream join is a different kind of fan-in | 194 | (1) |
Vehicle events vs. temperature events | 195 | (1) |
Table: A materialized view of streaming | 196 | (1) |
Vehicle events are less efficient to be materialized | 197 | (1) |
Data integrity quickly became an issue | 198 | (1) |
What's the problem with this join operator? | 199 | (1) |
| 200 | (1) |
| 201 | (1) |
The inner join vs. outer join | 202 | (1) |
| 203 | (1) |
Outer joins in streaming systems | 204 | (1) |
A new issue: Weak connection | 205 | (1) |
| 206 | (1) |
Joining two tables instead of joining a stream and table | 207 | (1) |
Revisiting the materialized view | 208 | (3) |
| 211 | (24) |
| 212 | (1) |
| 213 | (1) |
Streamlining streaming jobs | 214 | (1) |
New concepts: Capacity, utilization, and headroom | 215 | (1) |
More about utilization and headroom | 216 | (1) |
New concept: Backpressure | 217 | (1) |
Measure capacity utilization | 218 | (1) |
Backpressure in the Streamwork engine | 219 | (1) |
Backpressure in the Streamwork engine: Propagation | 220 | (1) |
Our streaming job during a backpressure | 221 | (1) |
Backpressure in distributed systems | 222 | (5) |
New concept: Backpressure watermarks | 227 | (1) |
Another approach to handle lagging instances: Dropping events | 228 | (1) |
Why do we want to drop events? | 229 | (1) |
Backpressure could be a symptom when the underlying issue is permanent | 230 | (1) |
Stopping and resuming may lead to thrashing if the issue is permanent | 231 | (1) |
| 232 | (3) |
| 235 | (24) |
The migration of the streaming jobs | 236 | (1) |
Stateful components in the system usage job | 237 | (1) |
| 238 | (1) |
The states in different components | 239 | (1) |
State data vs. temporary data | 240 | (1) |
Stateful vs. stateless components: The code | 241 | (1) |
The stateful source and operator in the system usage job | 242 | (1) |
| 243 | (1) |
Checkpoint creation: Timing is hard | 244 | (1) |
| 245 | (1) |
Creating checkpoints with checkpoint events | 246 | (1) |
A checkpoint event is handled by instance executors | 247 | (1) |
A checkpoint event flowing through a job | 248 | (1) |
Creating checkpoints with checkpoint events at the instance level | 249 | (1) |
Checkpoint event synchronization | 250 | (1) |
Checkpoint loading and backward compatibility | 251 | (1) |
| 252 | (1) |
Stateful vs. stateless components | 253 | (1) |
Manually managed instance states | 254 | (1) |
| 255 | (4) |
11 Wrap-up: Advanced concepts in streaming systems | 259 | (20) |
| 260 | (1) |
| 261 | (1) |
| 262 | (1) |
Joining data in real time | 263 | (1) |
| 264 | (1) |
Inner joins vs. outer joins | 265 | (1) |
Unexpected things can happen in streaming systems | 266 | (1) |
Backpressure: Slow down sources or upstream components | 267 | (1) |
Another approach to handle lagging instances: Dropping events | 268 | (1) |
Backpressure can be a symptom when the underlying issue is permanent | 269 | (1) |
Stateful components with checkpoints | 270 | (1) |
| 271 | (1) |
Stateful vs. stateless components | 272 | (1) |
| 273 | (2) |
Key concepts covered in this book | 275 | (4) |
Index | 279 |