Muutke küpsiste eelistusi

E-raamat: Grokking Streaming Systems: Real-time event processing

  • Formaat: 312 pages
  • Ilmumisaeg: 19-Apr-2022
  • Kirjastus: Manning Publications
  • Keel: eng
  • ISBN-13: 9781638356493
Teised raamatud teemal:
  • Formaat - EPUB+DRM
  • Hind: 51,64 €*
  • * hind on lõplik, st. muud allahindlused enam ei rakendu
  • Lisa ostukorvi
  • Lisa soovinimekirja
  • See e-raamat on mõeldud ainult isiklikuks kasutamiseks. E-raamatuid ei saa tagastada.
  • Formaat: 312 pages
  • Ilmumisaeg: 19-Apr-2022
  • Kirjastus: Manning Publications
  • Keel: eng
  • ISBN-13: 9781638356493
Teised raamatud teemal:

DRM piirangud

  • Kopeerimine (copy/paste):

    ei ole lubatud

  • Printimine:

    ei ole lubatud

  • Kasutamine:

    Digitaalõiguste kaitse (DRM)
    Kirjastus on väljastanud selle e-raamatu krüpteeritud kujul, mis tähendab, et selle lugemiseks peate installeerima spetsiaalse tarkvara. Samuti peate looma endale  Adobe ID Rohkem infot siin. E-raamatut saab lugeda 1 kasutaja ning alla laadida kuni 6'de seadmesse (kõik autoriseeritud sama Adobe ID-ga).

    Vajalik tarkvara
    Mobiilsetes seadmetes (telefon või tahvelarvuti) lugemiseks peate installeerima selle tasuta rakenduse: PocketBook Reader (iOS / Android)

    PC või Mac seadmes lugemiseks peate installima Adobe Digital Editionsi (Seeon tasuta rakendus spetsiaalselt e-raamatute lugemiseks. Seda ei tohi segamini ajada Adober Reader'iga, mis tõenäoliselt on juba teie arvutisse installeeritud )

    Seda e-raamatut ei saa lugeda Amazon Kindle's. 

A friendly, framework-agnostic tutorial that will help you grok how streaming systems work—and how to build your own!

In Grokking Streaming Systems you will learn how to:

    Implement and troubleshoot streaming systems
    Design streaming systems for complex functionalities
    Assess parallelization requirements
    Spot networking bottlenecks and resolve back pressure
    Group data for high-performance systems
    Handle delayed events in real-time systems

Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that’s a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities!

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Streaming systems minimize the time between receiving and processing event data, so they can deliver responses in real time. For applications in finance, security, and IoT where milliseconds matter, streaming systems are a requirement. And streaming is hot! Skills on platforms like Spark, Heron, and Kafka are in high demand.

About the book
Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you’ll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services.

What's inside

    Implement and troubleshoot streaming systems
    Design streaming systems for complex functionalities
    Spot networking bottlenecks and resolve backpressure
    Group data for high-performance systems

About the reader
No prior experience with streaming systems is assumed. Examples in Java.

About the author
Josh Fischer and Ning Wang are Apache Committers, and part of the committee for the Apache Heron distributed stream processing engine.

Table of Contents
PART 1 GETTING STARTED WITH STREAMING
1 Welcome to Grokking Streaming Systems
2 Hello, streaming systems!
3 Parallelization and data grouping
4 Stream graph
5 Delivery semantics
6 Streaming systems review and a glimpse ahead
PART 2 STEPPING UP
7 Windowed computations
8 Join operations
9 Backpressure
10 Stateful computation
11 Wrap-up: Advanced concepts in streaming systems

Arvustused

A must-read for engineers and solution architects who work with streams on a regular basis. Damian Esteban

This is the best introduction to streaming systems I've read! KentR. Spillner

For any newcomer starting with streaming systems, this is yourbook. Miguel Montalvo

Provides a concise, complete, and engaging introduction todesigning, implementing, and maintaining streaming systems. Chris Lundberg

A highly approachable book for people eager to learn about streaming systems. Beau Bender

Provides a lot of useful information that I'd wish to have had when I first started working with Event Based systems. Sebastián Palma

A refreshing, framework-neutral approach to teaching that is very effective. Anupam Sengupta

Preface xv
Acknowledgments xvii
About This Book xix
About The Authors xxiii
Part 1 Getting Started With Streaming 1(152)
1 Welcome to Grokking Streaming Systems
3(18)
What is stream processing?
4(1)
Streaming system examples
5(1)
Streaming systems and real time
6(1)
How a streaming system works
7(1)
Applications
8(1)
Backend services
9(1)
Inside a backend service
10(1)
Batch processing systems
11(1)
Inside a batch processing system
12(1)
Stream processing systems
13(1)
Inside a stream processing system
14(1)
The advantages of multi-stage architecture
15(1)
The multi-stage architecture in batch and stream processing systems
16(1)
Compare the systems
17(1)
A model stream processing system
18(3)
2 Hello, streaming systems!
21(32)
The chief needs a fancy tollbooth
22(1)
It started as HTTP requests, and it failed
23(1)
AJ and Miranda take time to reflect
24(1)
AJ ponders about streaming systems
25(1)
Comparing backend service and streaming
26(1)
How a streaming system could fit
27(1)
Queues: A foundational concept
28(1)
Data transfer via queues
29(1)
Our streaming framework (the start of it)
30(1)
The Streamwork framework overview
31(1)
Zooming in on the Streamwork engine
32(1)
Core streaming concepts
33(1)
More details of the concepts
34(1)
The streaming job execution flow
35(1)
Your first streaming job
36(6)
Executing the job
42(1)
Inspecting the job execution
43(1)
Look inside the engine
44(4)
Keep events moving
48(1)
The life of a data element
49(1)
Reviewing streaming concepts
50(3)
3 Parallelization and data grouping
53(28)
The sensor is emitting more events
54(1)
Even in streaming, real time is hard
55(1)
New concepts: Parallelism is important
56(1)
New concepts: Data parallelism
57(1)
New concepts: Data execution independence
58(1)
New concepts: Task parallelism
59(1)
Data parallelism vs. task parallelism
60(1)
Parallelism and concurrency
61(1)
Parallelizing the job
62(1)
Parallelizing components
63(1)
Parallelizing sources
64(1)
Viewing job output
65(1)
Parallelizing operators
66(1)
Viewing job output
67(1)
Events and instances
68(1)
Event ordering
69(1)
Event grouping
70(1)
Shuffle grouping
71(1)
Shuffle grouping: Under the hood
72(1)
Fields grouping
73(1)
Fields grouping: Under the hood
74(1)
Event grouping execution
75(1)
Look inside the engine: Event dispatcher
76(1)
Applying fields grouping in your job
77(1)
Event ordering
78(1)
Comparing grouping behaviors
79(2)
4 Stream graph
81(28)
A credit card fraud detection system
82(1)
More about the credit card fraud detection system
83(1)
The fraud detection business
84(1)
Streaming isn't always a straight line
85(1)
Zoom into the system
86(1)
The fraud detection job in detail
87(1)
New concepts
88(1)
Upstream and downstream components
89(1)
Stream fan-out and fan-in
90(1)
Graph, directed graph, and DAG
91(1)
DAG in stream processing systems
92(1)
All new concepts in one page
93(1)
Stream fan-out to the analyzers
94(1)
Look inside the engine
95(1)
There is a problem: Efficiency
96(1)
Stream fan-out with different streams
97(1)
Look inside the engine again
98(1)
Communication between the components via channels
99(1)
Multiple channels
100(1)
Stream fan-in to the score aggregator
101(1)
Stream fan-in in the engine
102(1)
A brief introduction to another stream fan-in: Join
103(1)
Look at the whole system
104(1)
Graph and streaming jobs
105(1)
The example systems
106(3)
5 Delivery semantics
109(32)
The latency requirement of the fraud detection system
110(1)
Revisit the fraud detection job
111(1)
About accuracy
112(1)
Partial result
113(1)
A new streaming job to monitor system usage
114(1)
The new system usage job
115(1)
The requirements of the new system usage job
116(1)
New concepts: (The number of) times delivered and times processed
117(1)
New concept: Delivery semantics
118(1)
Choosing the right semantics
119(1)
At-most-once
120(1)
The fraud detection job
121(1)
At-least-once
122(1)
At-least-once with acknowledging
123(1)
Track events
124(1)
Handle event processing failures
125(1)
Track early out events
126(1)
Acknowledging code in components
127(1)
New concept: Checkpointing
128(1)
New concept: State
129(1)
Checkpointing in the system usage job for the at-least-once semantic
130(1)
Checkpointing and state manipulation functions
131(1)
State handling code in the transaction source component
132(1)
Exactly-once or effectively-once?
133(1)
Bonus concept: Idempotent operation
134(1)
Exactly-once, finally
135(1)
State handling code in the system usage analyzer component
136(1)
Comparing the delivery semantics again
137(2)
Up next ...
139(2)
6 Streaming systems review and a glimpse ahead
141(12)
Streaming system pieces
142(1)
Parallelization and event grouping
143(1)
DAGs and streaming jobs
144(1)
Delivery semantics (guarantees)
145(1)
Delivery semantics used in the credit card fraud detection system
146(1)
Which way to go from here
147(1)
Windowed computations
148(1)
Joining data in real time
149(1)
Backpressure
150(1)
Stateless and stateful computations
151(2)
Part 2 Stepping Up 153(126)
7 Windowed computations
155(30)
Slicing up real-time data
156(1)
Breaking down the problem in detail
157(1)
Breaking down the problem in detail (continued)
158(1)
Two different contexts
159(1)
Windowing in the fraud detection job
160(1)
What exactly are windows?
161(1)
Looking closer into the window
162(1)
New concept: Windowing strategy
163(1)
Fixed windows
164(1)
Fixed windows in the windowed proximity analyzer
165(1)
Detecting fraud with a fixed time window
166(1)
Fixed windows: Time vs. count
167(1)
Sliding windows
168(1)
Sliding windows: Windowed proximity analyzer
169(1)
Detecting fraud with a sliding window
170(1)
Session windows
171(1)
Session windows (continued)
172(1)
Detecting fraud with session windows
173(1)
Summary of windowing strategies
174(1)
Slicing an event stream into data sets
175(1)
Windowing: Concept or implementation
176(1)
Another look
177(1)
Key-value store 101
178(1)
Implement the windowed proximity analyzer
179(1)
Event time and other times for events
180(1)
Windowing watermark
181(1)
Late events
182(3)
8 Join operations
185(26)
Joining emission data on the fly
186(1)
The emissions job version 1
187(1)
The emission resolver
188(1)
Accuracy becomes an issue
189(1)
The enhanced emissions job
190(1)
Focusing on the join
191(1)
What is a join again?
192(1)
How the stream join works
193(1)
Stream join is a different kind of fan-in
194(1)
Vehicle events vs. temperature events
195(1)
Table: A materialized view of streaming
196(1)
Vehicle events are less efficient to be materialized
197(1)
Data integrity quickly became an issue
198(1)
What's the problem with this join operator?
199(1)
Inner join
200(1)
Outer join
201(1)
The inner join vs. outer join
202(1)
Different types of joins
203(1)
Outer joins in streaming systems
204(1)
A new issue: Weak connection
205(1)
Windowed joins
206(1)
Joining two tables instead of joining a stream and table
207(1)
Revisiting the materialized view
208(3)
9 Backpressure
211(24)
Reliability is critical
212(1)
Review the system
213(1)
Streamlining streaming jobs
214(1)
New concepts: Capacity, utilization, and headroom
215(1)
More about utilization and headroom
216(1)
New concept: Backpressure
217(1)
Measure capacity utilization
218(1)
Backpressure in the Streamwork engine
219(1)
Backpressure in the Streamwork engine: Propagation
220(1)
Our streaming job during a backpressure
221(1)
Backpressure in distributed systems
222(5)
New concept: Backpressure watermarks
227(1)
Another approach to handle lagging instances: Dropping events
228(1)
Why do we want to drop events?
229(1)
Backpressure could be a symptom when the underlying issue is permanent
230(1)
Stopping and resuming may lead to thrashing if the issue is permanent
231(1)
Handle thrashing
232(3)
10 Stateful computation
235(24)
The migration of the streaming jobs
236(1)
Stateful components in the system usage job
237(1)
Revisit: State
238(1)
The states in different components
239(1)
State data vs. temporary data
240(1)
Stateful vs. stateless components: The code
241(1)
The stateful source and operator in the system usage job
242(1)
States and checkpoints
243(1)
Checkpoint creation: Timing is hard
244(1)
Event-based timing
245(1)
Creating checkpoints with checkpoint events
246(1)
A checkpoint event is handled by instance executors
247(1)
A checkpoint event flowing through a job
248(1)
Creating checkpoints with checkpoint events at the instance level
249(1)
Checkpoint event synchronization
250(1)
Checkpoint loading and backward compatibility
251(1)
Checkpoint storage
252(1)
Stateful vs. stateless components
253(1)
Manually managed instance states
254(1)
Lambda architecture
255(4)
11 Wrap-up: Advanced concepts in streaming systems
259(20)
Is this really the end?
260(1)
Windowed computations
261(1)
The major window types
262(1)
Joining data in real time
263(1)
SQL vs. stream joins
264(1)
Inner joins vs. outer joins
265(1)
Unexpected things can happen in streaming systems
266(1)
Backpressure: Slow down sources or upstream components
267(1)
Another approach to handle lagging instances: Dropping events
268(1)
Backpressure can be a symptom when the underlying issue is permanent
269(1)
Stateful components with checkpoints
270(1)
Event-based timing
271(1)
Stateful vs. stateless components
272(1)
You did it!
273(2)
Key concepts covered in this book
275(4)
Index 279
Josh Fischer and Ning Wang are both Apache Committers, and part of the project management committee for the Apache Heron distributed stream processing engine. Josh is a software engineer at Scotcro and has worked with moving large datasets in real time for organizations such as 1904 labsand Bayer. Ning is a software engineer at Amplitude building real-time data pipelines. He was a key contributor of Apache Heron in Twitter's Real-time Compute team.