About the editors | xiii
About the contributors | xv
Foreword | xxi
Foreword | xxiii
Preface | xxv
Acknowledgements | xxvii
Introduction | xxix
1 The impact of Big Data on databases | 1 | (36)
1.1 The Big Data phenomenon | 2 | (4)
1.1.1 Big Data Operational and Big Data Analytical | 3 | (2)
1.1.2 The impact of Big Data on databases | 5 | (1)
1.2 Scalability in relational databases | 6 | (3)
1.2.1 Relational databases | 6 | (1)
1.2.2 The limitations of relational databases | 7 | (2)
| 9 | (6)
1.3.1 Disadvantages of NoSQL databases | 11 | (1)
1.3.2 Aggregate-oriented NoSQL databases | 12 | (1)
1.3.3 MongoDB: an example of documentary database | 13 | (1)
1.3.4 Cassandra: an example of columnar-oriented database | 14 | (1)
1.4 Data distribution models | 15 | (4)
| 15 | (2)
| 17 | (1)
1.4.3 Combining sharding and replication | 18 | (1)
1.5 Design examples using NoSQL databases | 19 | (1)
1.6 Design examples using NoSQL databases | 19 | (11)
| 19 | (2)
| 21 | (5)
| 26 | (3)
| 29 | (1)
| 30 | (1)
| 31 | (6)
2 Big data processing frameworks and architectures: a survey | 37 | (68)
Raghavendra Kumar Chunduri
| 37 | (2)
2.2 Apache Hadoop framework and Hadoop Ecosystem | 39 | (14)
2.2.1 Architecture of Hadoop framework | 39 | (1)
2.2.2 Architecture of MapReduce | 39 | (7)
2.2.3 Application implemented using MapReduce: concept generation in formal concept analysis (FCA) | 46 | (7)
| 53 | (3)
2.3.1 Programming model of HaLoop | 54 | (1)
2.3.2 Task scheduling in HaLoop | 54 | (1)
| 55 | (1)
| 55 | (1)
2.3.5 Concept generation in FCA using HaLoop | 56 | (1)
| 56 | (3)
2.4.1 Architecture of Twister framework | 57 | (1)
2.4.2 Fault tolerance in Twister | 58 | (1)
| 59 | (2)
2.5.1 Characteristics of Apache Pig | 59 | (1)
2.5.2 Components of Apache Pig | 59 | (1)
| 60 | (1)
2.5.4 Word count application using Apache Pig | 60 | (1)
| 61 | (1)
2.6.1 Apache Mahout features | 61 | (1)
2.6.2 Applications of Mahout | 61 | (1)
| 62 | (1)
| 62 | (1)
| 63 | (1)
| 63 | (3)
2.8.1 Advantages of Flume | 63 | (1)
| 64 | (1)
2.8.3 Components of Flume | 64 | (2)
| 66 | (1)
| 66 | (1)
| 67 | (18)
| 69 | (1)
| 69 | (1)
| 69 | (1)
2.11.4 Spark cluster manager | 70 | (1)
| 70 | (1)
2.11.6 Spark resilient distributed datasets (RDDs) | 70 | (1)
| 71 | (1)
2.11.8 Broadcast variables in Spark | 71 | (1)
| 72 | (1)
2.11.10 Spark System optimization | 73 | (1)
2.11.11 Memory optimization | 73 | (1)
| 74 | (1)
2.11.13 Fault tolerance optimization | 74 | (1)
2.11.14 Data processing in Spark | 75 | (3)
2.11.15 Spark machine learning support | 78 | (1)
2.11.16 Spark deep learning support | 79 | (1)
2.11.17 Programming layer in Spark | 80 | (2)
2.11.18 Concept generation in formal concept analysis using Spark | 82 | (3)
2.12 Big data storage systems | 85 | (5)
2.12.1 Hadoop distributed file system | 86 | (2)
| 88 | (1)
2.12.3 Amazon Simple Storage Services-S3 | 88 | (1)
2.12.4 Microsoft Azure Blob Storage-WASB | 88 | (1)
| 88 | (1)
| 88 | (1)
| 89 | (1)
| 89 | (1)
2.13 Distributed stream processing engines | 90 | (4)
| 90 | (2)
| 92 | (2)
| 94 | (3)
2.14.1 The Zookeeper data model | 95 | (1)
2.14.2 ZDM-access control list | 96 | (1)
2.15 Open issues and challenges | 97 | (1)
| 97 | (1)
| 97 | (1)
| 98 | (1)
| 98 | (7)
3 The role of data lake in big data analytics: recent developments and challenges | 105 | (20)
| 105 | (3)
3.1.1 Differences between data warehouses and data lakes | 107 | (1)
3.1.2 Data lakes pitfalls | 108 | (1)
3.2 Taxonomy of data lakes | 108 | (2)
| 108 | (1)
| 109 | (1)
| 109 | (1)
| 109 | (1)
3.3 Architecture of a data lake | 110 | (4)
| 110 | (1)
3.3.2 Data ingestion layer | 111 | (1)
| 111 | (1)
| 112 | (1)
3.3.5 Responsibilities of data scientists in data lakes | 112 | (1)
3.3.6 Metadata management | 113 | (1)
3.3.7 Data lake governance | 113 | (1)
| 114 | (1)
3.4 Commercial-based data lakes | 114 | (2)
3.4.1 Azure data lake environment | 114 | (1)
3.4.2 Developing a data lake with IBM (IBM DL) | 115 | (1)
3.4.3 Amazon Web Services (AWS) Galaxy data lake (GDL) | 115 | (1)
3.5 Open source-based data lakes | 116 | (2)
| 116 | (1)
3.5.2 BIGCONNECT data lake | 116 | (1)
3.5.3 Best practices for data lakes | 117 | (1)
| 118 | (2)
3.6.1 Machine learning in data lakes | 120 | (1)
3.6.2 Data lake challenges | 120 | (1)
| 120 | (1)
| 121 | (4)
4 Query optimization strategies for big data | 125 | (32)
| 126 | (1)
4.1.1 MapReduce preliminaries | 127 | (1)
4.1.2 Organization of the chapter | 127 | (1)
4.2 Multi-way joins using MapReduce | 127 | (11)
| 129 | (1)
| 130 | (2)
| 132 | (3)
| 135 | (3)
4.3 Graph queries using MapReduce | 138 | (9)
| 138 | (2)
4.3.2 Subgraph enumeration | 140 | (7)
4.4 Multi-way spatial join | 147 | (6)
4.5 Conclusion and future work | 153 | (1)
| 153 | (4)
5 Toward real-time data processing: an advanced approach in big data analytics | 157 | (18)
| 157 | (2)
5.2 Real-time data processing topology | 159 | (1)
5.2.1 Choosing the platform | 159 | (1)
| 159 | (1)
5.2.3 Data processing infrastructure | 159 | (1)
| 160 | (1)
| 161 | (1)
| 161 | (1)
| 161 | (1)
| 162 | (1)
5.4.4 Outlier and anomaly detection | 162 | (1)
| 162 | (1)
5.6 Stream processing approach for big data | 163 | (9)
| 163 | (1)
| 164 | (3)
| 167 | (1)
| 167 | (1)
| 168 | (1)
| 169 | (3)
5.7 Evaluation of data streaming processing approaches | 172 | (1)
| 172 | (1)
| 172 | (1)
| 173 | (2)
6 A survey on data stream analytics | 175 | (34)
| 175 | (2)
| 177 | (1)
6.3 Prediction and forecasting | 178 | (2)
6.3.1 Future direction for prediction and forecasting | 179 | (1)
| 180 | (3)
6.4.1 Future direction for outlier detection | 182 | (1)
6.5 Concept drift detection | 183 | (4)
6.5.1 Future direction for concept drift detection | 187 | (1)
6.6 Mining frequent item sets in data stream | 187 | (4)
6.6.1 Future direction for frequent item-set mining | 190 | (1)
6.7 Computational paradigm | 191 | (6)
6.7.1 Future direction for computational paradigm | 196 | (1)
| 197 | (1)
| 198 | (11)
7 Architectures of big data analytics: scaling out data mining algorithms using Hadoop-MapReduce and Spark | 209 | (88)
| 209 | (2)
7.2 Previous related reviews | 211 | (3)
| 214 | (3)
7.4 Review of articles in the present work | 217 | (35)
7.4.1 Association rule mining/pattern mining | 217 | (7)
7.4.2 Regression/prediction/forecasting | 224 | (3)
| 227 | (10)
| 237 | (9)
7.4.5 Outlier detection/intrusion detection system | 246 | (2)
| 248 | (1)
| 249 | (3)
| 252 | (8)
7.6 Conclusion and future directions | 260 | (10)
| 270 | (27)
8 A review of fog and edge computing with big data analytics | 297 | (20)
| 298 | (1)
| 298 | (1)
8.1.2 Importance of big data in cloud computing | 299 | (1)
8.1.3 Merits and demerits | 299 | (1)
8.2 Introduction to cloud computing with IoT applications | 299 | (6)
8.2.1 Cloud computing importance | 302 | (1)
8.2.2 Cloud offloading strategies | 303 | (1)
8.2.3 Applications of IoT | 303 | (2)
8.2.4 Merits and demerits of IoT application with cloud | 305 | (1)
8.3 Importance of fog computing | 305 | (5)
| 306 | (1)
| 307 | (1)
8.3.3 Description of fog architecture | 307 | (3)
8.3.4 Research direction in fog | 310 | (1)
8.4 Significance of edge computing | 310 | (2)
8.4.1 What is edge computing | 310 | (1)
8.4.2 Benefits of edge computing | 310 | (1)
8.4.3 How edge computing used in IoT applications | 311 | (1)
8.4.4 Future of edge computing | 312 | (1)
8.5 Architecture review with cloud and fog and edge computing with IoT applications | 312 | (2)
8.5.1 How IoT applications meeting the challenges at edge | 312 | (1)
8.5.2 Review on cyber threats, latency time and power consumption challenges | 313 | (1)
8.5.3 Applications and future scope of research | 314 | (1)
| 314 | (1)
| 314 | (3)
9 Fog computing framework for Big Data processing using cluster management in a resource-constraint environment | 317 | (18)
Nagender Kumar Suryadevara
| 317 | (2)
| 319 | (4)
| 320 | (1)
| 321 | (1)
9.2.3 Peer-to-peer computing | 321 | (1)
9.2.4 Distributed computing frameworks | 321 | (2)
9.2.5 Gaps identified in the existing research work | 323 | (1)
9.2.6 Objectives of the chapter | 323 | (1)
| 323 | (1)
9.4 Implementation details | 324 | (5)
9.4.1 Using resource constraint device (Raspberry Pi) | 324 | (2)
9.4.2 Spark fog cluster evaluation | 326 | (3)
9.5 Results and discussion | 329 | (3)
9.6 Conclusion and future work | 332 | (1)
| 332 | (3)
10 Role of artificial intelligence and big data in accelerating accessibility for persons with disabilities | 335 | (10)
Kundumani Srinivasan Kuppusamy
| 335 | (1)
10.2 Rationale for accessibility | 336 | (1)
10.3 Artificial intelligence for accessibility | 337 | (3)
10.3.1 Perception porting | 337 | (1)
10.3.2 Assisting deaf and hard of hearing | 338 | (1)
10.3.3 AI-based exoskeletons | 339 | (1)
10.3.4 Accessible data visualization | 339 | (1)
10.3.5 Enabling smart environment through IoT for persons with disabilities | 339 | (1)
| 340 | (1)
| 340 | (5)
Overall conclusions | 345 | (2)
Index | 347