Fundamentals of Stream Processing: Application Design, Systems, and Analytics [Hardcover]

, (Bilkent University, Ankara),
  • Format: Hardback, 558 pages, height x width x thickness: 244x170x30 mm, weight: 1130 g, Worked examples or Exercises; 17 Tables, black and white; 191 Line drawings, unspecified
  • Publication date: 13-Feb-2014
  • Publisher: Cambridge University Press
  • ISBN-10: 1107015545
  • ISBN-13: 9781107015548
Stream processing is a novel distributed computing paradigm that supports the gathering, processing and analysis of high-volume, heterogeneous, continuous data streams, to extract insights and actionable results in real time. This comprehensive, hands-on guide combining the fundamental building blocks and emerging research in stream processing is ideal for application designers, system builders, analytic developers, as well as students and researchers in the field. This book introduces the key components of the stream computing paradigm, including the distributed system infrastructure, the programming model, design patterns and streaming analytics. The explanation of the underlying theoretical principles, illustrative examples and implementations using the IBM InfoSphere Streams SPL language and real-world case studies provide students and practitioners with a comprehensive understanding of such applications and the middleware that supports them.

Other information

This book teaches the fundamentals of stream processing, covering application design, distributed systems infrastructure, and continuous analytic algorithms.
Preface
Foreword
Acknowledgements
List of acronyms
Part I Fundamentals
1 What brought us here?
1.1 Overview
1.2 Towards continuous data processing: the requirements
1.3 Stream processing foundations
1.3.1 Data management technologies
1.3.2 Parallel and distributed systems
1.3.3 Signal processing, statistics, and data mining
1.3.4 Optimization theory
1.4 Stream processing - tying it all together
References
2 Introduction to stream processing
2.1 Overview
2.2 Stream Processing Applications
2.2.1 Network monitoring for cybersecurity
2.2.2 Transportation grid monitoring and optimization
2.2.3 Healthcare and patient monitoring
2.2.4 Discussion
2.3 Information flow processing technologies
2.3.1 Active databases
2.3.2 Continuous queries
2.3.3 Publish-subscribe systems
2.3.4 Complex event processing systems
2.3.5 ETL and SCADA systems
2.4 Stream Processing Systems
2.4.1 Data
2.4.2 Processing
2.4.3 System architecture
2.4.4 Implementations
2.4.5 Discussion
2.5 Concluding remarks
2.6 Exercises
References
Part II Application development
3 Application development - the basics
3.1 Overview
3.2 Characteristics of SPAS
3.3 Stream processing languages
3.3.1 Features of stream processing languages
3.3.2 Approaches to stream processing language design
3.4 Introduction to SPL
3.4.1 Language origins
3.4.2 A "Hello World" application in SPL
3.5 Common stream processing operators
3.5.1 Stream relational operators
3.5.2 Utility operators
3.5.3 Edge adapter operators
3.6 Concluding remarks
3.7 Programming exercises
References
4 Application development - data flow programming
4.1 Overview
4.2 Flow composition
4.2.1 Static composition
4.2.2 Dynamic composition
4.2.3 Nested composition
4.3 Flow manipulation
4.3.1 Operator state
4.3.2 Selectivity and arity
4.3.3 Using parameters
4.3.4 Output assignments and output functions
4.3.5 Punctuations
4.3.6 Windowing
4.4 Concluding remarks
4.5 Programming exercises
References
5 Large-scale development - modularity, extensibility, and distribution
5.1 Overview
5.2 Modularity and extensibility
5.2.1 Types
5.2.2 Functions
5.2.3 Primitive operators
5.2.4 Composite and custom operators
5.3 Distributed programming
5.3.1 Logical versus physical flow graphs
5.3.2 Placement
5.3.3 Transport
5.4 Concluding remarks
5.5 Programming exercises
References
6 Visualization and debugging
6.1 Overview
6.2 Visualization
6.2.1 Topology visualization
6.2.2 Metrics visualization
6.2.3 Status visualization
6.2.4 Data visualization
6.3 Debugging
6.3.1 Semantic debugging
6.3.2 User-defined operator debugging
6.3.3 Deployment debugging
6.3.4 Performance debugging
6.4 Concluding remarks
References
Part III System architecture
7 Architecture of a stream processing system
7.1 Overview
7.2 Architectural building blocks
7.2.1 Computational environment
7.2.2 Entities
7.2.3 Services
7.3 Architecture overview
7.3.1 Job management
7.3.2 Resource management
7.3.3 Scheduling
7.3.4 Monitoring
7.3.5 Data transport
7.3.6 Fault tolerance
7.3.7 Logging and error reporting
7.3.8 Security and access control
7.3.9 Debugging
7.3.10 Visualization
7.4 Interaction with the system architecture
7.5 Concluding remarks
References
8 InfoSphere Streams architecture
8.1 Overview
8.2 Background and history
8.3 A user's perspective
8.4 Components
8.4.1 Runtime instance
8.4.2 Instance components
8.4.3 Instance backbone
8.4.4 Tooling
8.5 Services
8.5.1 Job management
8.5.2 Resource management and monitoring
8.5.3 Scheduling
8.5.4 Data transport
8.5.5 Fault tolerance
8.5.6 Logging, tracing, and error reporting
8.5.7 Security and access control
8.5.8 Application development support
8.5.9 Processing element
8.5.10 Debugging
8.5.11 Visualization
8.6 Concluding remarks
References
Part IV Application design and analytics
9 Design principles and patterns for stream processing applications
9.1 Overview
9.2 Functional design patterns and principles
9.2.1 Edge adaptation
9.2.2 Flow manipulation
9.2.3 Dynamic adaptation
9.3 Non-functional principles and design patterns
9.3.1 Application design and composition
9.3.2 Parallelization
9.3.3 Performance optimization
9.3.4 Fault tolerance
9.4 Concluding remarks
References
10 Stream analytics: data pre-processing and transformation
10.1 Overview
10.2 The mining process
10.3 Notation
10.4 Descriptive statistics
10.4.1 Illustrative technique: BasicCounting
10.4.2 Advanced reading
10.5 Sampling
10.5.1 Illustrative technique: reservoir sampling
10.5.2 Advanced reading
10.6 Sketches
10.6.1 Illustrative technique: Count-Min sketch
10.6.2 Advanced reading
10.7 Quantization
10.7.1 Illustrative techniques: binary clipping and moment preserving quantization
10.7.2 Advanced reading
10.8 Dimensionality reduction
10.8.1 Illustrative technique: SPIRIT
10.8.2 Advanced reading
10.9 Transforms
10.9.1 Illustrative technique: the Haar transform
10.9.2 Advanced reading
10.10 Concluding remarks
References
11 Stream analytics: modeling and evaluation
11.1 Overview
11.2 Offline modeling and online evaluation
11.3 Data stream classification
11.3.1 Illustrative technique: VFDT
11.3.2 Advanced reading
11.4 Data stream clustering
11.4.1 Illustrative technique: CluStream microclustering
11.4.2 Advanced reading
11.5 Data stream regression
11.5.1 Illustrative technique: linear regression with SGD
11.5.2 Advanced reading
11.6 Data stream frequent pattern mining
11.6.1 Illustrative technique: lossy counting
11.6.2 Advanced reading
11.7 Anomaly detection
11.7.1 Illustrative technique: micro-clustering-based anomaly detection
11.7.2 Advanced reading
11.8 Concluding remarks
References
Part V Case studies
12 Applications
12.1 Overview
12.2 The Operations Monitoring application
12.2.1 Motivation
12.2.2 Requirements
12.2.3 Design
12.2.4 Analytics
12.2.5 Fault tolerance
12.3 The Patient Monitoring application
12.3.1 Motivation
12.3.2 Requirements
12.3.3 Design
12.3.4 Evaluation
12.4 The Semiconductor Process Control application
12.4.1 Motivation
12.4.2 Requirements
12.4.3 Design
12.4.4 Evaluation
12.4.5 User interface
12.5 Concluding remarks
References
Part VI Closing notes
13 Conclusion
13.1 Book summary
13.2 Challenges and open problems
13.2.1 Software engineering
13.2.2 Integration
13.2.3 Scaling up and distributed computing
13.2.4 Analytics
13.3 Where do we go from here?
References
Keywords and identifiers index
Index