E-book: Handbook of Big Data Analytics: Methodologies, Volume 1

Edited by Aswani Kumar Cherukuri (Vellore Institute of Technology, School of Information Technology and Engineering, Vellore, India) and Vadlamani Ravi (Institute for Development and Research in Banking Technology, Hyderabad, India)

Format: EPUB+DRM
Series: Computing and Networks
Publication date: 20-Sep-2021
Publisher: Institution of Engineering and Technology
Language: English
ISBN-13: 9781839530586

This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not permitted
Printing: not permitted
Usage: protected by digital rights management (DRM)

The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Big Data analytics is the complex process of examining big data to uncover information such as correlations, hidden patterns, trends, and user and customer preferences, allowing organizations and businesses to make more informed decisions. These methods and technologies have become ubiquitous in all fields of science, engineering, business and management due to the rise of data-driven models, as well as data engineering developments using parallel and distributed computational analytics frameworks, data and algorithm parallelization, and GPGPU programming. However, potential issues remain that must be addressed to enable big data processing and analytics in real time.

In the first volume of this comprehensive two-volume handbook, the authors present several methodologies to support Big Data analytics, including database management, processing frameworks and architectures, data lakes, query optimization strategies, approaches toward real-time data processing, data stream analytics, Fog and Edge computing, and Artificial Intelligence and Big Data.

The second volume is dedicated to a wide range of applications in secure data storage, privacy preservation, Software Defined Networks (SDN), the Internet of Things (IoT), behaviour analytics, traffic prediction, gender-based classification on e-commerce data, recommender systems, Big Data regression with Apache Spark, visual sentiment analysis, wavelet neural networks via GPU, stock market movement prediction, and financial reporting.

The two-volume work is aimed at providing a unique platform for researchers, engineers, developers, educators and advanced students in the field of Big Data analytics.

Table of contents

About the editors
About the contributors
Foreword
Foreword
Preface
Acknowledgements
Introduction

1 The impact of Big Data on databases
Antonio Sarasa Cabezuelo
1.1 The Big Data phenomenon
1.1.1 Big Data Operational and Big Data Analytical
1.1.2 The impact of Big Data on databases
1.2 Scalability in relational databases
1.2.1 Relational databases
1.2.2 The limitations of relational databases
1.3 NoSQL databases
1.3.1 Disadvantages of NoSQL databases
1.3.2 Aggregate-oriented NoSQL databases
1.3.3 MongoDB: an example of documentary database
1.3.4 Cassandra: an example of columnar-oriented database
1.4 Data distribution models
1.4.1 Sharding
1.4.2 Replication
1.4.3 Combining sharding and replication
1.5 Design examples using NoSQL databases
1.6 Design examples using NoSQL databases
1.6.1 Example 1
1.6.2 Example 2
1.6.3 Example 3
1.6.4 Example 4
1.7 Conclusions
References

2 Big data processing frameworks and architectures: a survey
Raghavendra Kumar Chunduri, Aswani Kumar Cherukuri
2.1 Introduction
2.2 Apache Hadoop framework and Hadoop Ecosystem
2.2.1 Architecture of Hadoop framework
2.2.2 Architecture of MapReduce
2.2.3 Application implemented using MapReduce: concept generation in formal concept analysis (FCA)
2.3 HaLoop framework
2.3.1 Programming model of HaLoop
2.3.2 Task scheduling in HaLoop
2.3.3 Caching in HaLoop
2.3.4 Fault tolerance
2.3.5 Concept generation in FCA using HaLoop
2.4 Twister framework
2.4.1 Architecture of Twister framework
2.4.2 Fault tolerance in Twister
2.5 Apache Pig
2.5.1 Characteristics of Apache Pig
2.5.2 Components of Apache Pig
2.5.3 Pig data model
2.5.4 Word count application using Apache Pig
2.6 Apache Mahout
2.6.1 Apache Mahout features
2.6.2 Applications of Mahout
2.7 Apache Sqoop
2.7.1 Sqoop import
2.7.2 Export from Sqoop
2.8 Apache Flume
2.8.1 Advantages of Flume
2.8.2 Features of Flume
2.8.3 Components of Flume
2.9 Apache Oozie
2.10 Hadoop 2
2.11 Apache Spark
2.11.1 Spark Core
2.11.2 Driver program
2.11.3 Spark Context
2.11.4 Spark cluster manager
2.11.5 Spark worker node
2.11.6 Spark resilient distributed datasets (RDDs)
2.11.7 Caching RDDs
2.11.8 Broadcast variables in Spark
2.11.9 Spark Datasets
2.11.10 Spark System optimization
2.11.11 Memory optimization
2.11.12 I/O optimization
2.11.13 Fault tolerance optimization
2.11.14 Data processing in Spark
2.11.15 Spark machine learning support
2.11.16 Spark deep learning support
2.11.17 Programming layer in Spark
2.11.18 Concept generation in formal concept analysis using Spark
2.12 Big data storage systems
2.12.1 Hadoop distributed file system
2.12.2 Alluxio
2.12.3 Amazon Simple Storage Services-S3
2.12.4 Microsoft Azure Blob Storage-WASB
2.12.5 HBase
2.12.6 Amazon Dynamo
2.12.7 Cassandra
2.12.8 Hive
2.13 Distributed stream processing engines
2.13.1 Apache Storm
2.13.2 Apache Flink
2.14 Apache Zookeeper
2.14.1 The Zookeeper data model
2.14.2 ZDM-access control list
2.15 Open issues and challenges
2.15.1 Memory management
2.15.2 Failure recovery
2.16 Conclusion
References

3 The role of data lake in big data analytics: recent developments and challenges
T. Ramalingeswara Rao, Pabitra Mitra, Adrijit Goswami
3.1 Introduction
3.1.1 Differences between data warehouses and data lakes
3.1.2 Data lakes pitfalls
3.2 Taxonomy of data lakes
3.2.1 Data silos
3.2.2 Data swamps
3.2.3 Data reservoirs
3.2.4 Big data fabric
3.3 Architecture of a data lake
3.3.1 Raw data layer
3.3.2 Data ingestion layer
3.3.3 Process layer
3.3.4 Ingress layer
3.3.5 Responsibilities of data scientists in data lakes
3.3.6 Metadata management
3.3.7 Data lake governance
3.3.8 Data cataloging
3.4 Commercial-based data lakes
3.4.1 Azure data lake environment
3.4.2 Developing a data lake with IBM (IBM DL)
3.4.3 Amazon Web Services (AWS) Galaxy data lake (GDL)
3.5 Open source-based data lakes
3.5.1 Delta lake
3.5.2 BIGCONNECT data lake
3.5.3 Best practices for data lakes
3.6 Case studies
3.6.1 Machine learning in data lakes
3.6.2 Data lake challenges
3.7 Conclusion
References

4 Query optimization strategies for big data
Nagesh Bhattu Sristy, Prashanth Kadari, Harini Yadamreddy
4.1 Introduction
4.1.1 MapReduce preliminaries
4.1.2 Organization of the chapter
4.2 Multi-way joins using MapReduce
4.2.1 Sequential join
4.2.2 Shares approach
4.2.3 SharesSkew
4.2.4 Θ-Join
4.3 Graph queries using MapReduce
4.3.1 Counting triangles
4.3.2 Subgraph enumeration
4.4 Multi-way spatial join
4.5 Conclusion and future work
References

5 Toward real-time data processing: an advanced approach in big data analytics
Shafqat Ul Ahsaan, Harleen Kaur, Sameena Naaz
5.1 Introduction
5.2 Real-time data processing topology
5.2.1 Choosing the platform
5.2.2 Entry points
5.2.3 Data processing infrastructure
5.3 Streaming processing
5.4 Stream mining
5.4.1 Clustering
5.4.2 Classification
5.4.3 Frequent
5.4.4 Outlier and anomaly detection
5.5 Lambda architecture
5.6 Stream processing approach for big data
5.6.1 Apache Spark
5.6.2 Apache Flink
5.6.3 Apache Samza
5.6.4 Apache Storm
5.6.5 Apache Flume
5.6.6 Apache Kafka
5.7 Evaluation of data streaming processing approaches
5.8 Conclusion
Acknowledgment
References

6 A survey on data stream analytics
Sumit Misra, Sanjoy Kumar Saha, Chandan Mazumdar
6.1 Introduction
6.2 Scope and approach
6.3 Prediction and forecasting
6.3.1 Future direction for prediction and forecasting
6.4 Outlier detection
6.4.1 Future direction for outlier detection
6.5 Concept drift detection
6.5.1 Future direction for concept drift detection
6.6 Mining frequent item sets in data stream
6.6.1 Future direction for frequent item-set mining
6.7 Computational paradigm
6.7.1 Future direction for computational paradigm
6.8 Conclusion
References

7 Architectures of big data analytics: scaling out data mining algorithms using Hadoop-MapReduce and Spark
Sheikh Kamaruddin, Vadlamani Ravi
7.1 Introduction
7.2 Previous related reviews
7.3 Review methodology
7.4 Review of articles in the present work
7.4.1 Association rule mining/pattern mining
7.4.2 Regression/prediction/forecasting
7.4.3 Classification
7.4.4 Clustering
7.4.5 Outlier detection/intrusion detection system
7.4.6 Recommendation
7.4.7 Others
7.5 Discussion
7.6 Conclusion and future directions
References

8 A review of fog and edge computing with big data analytics
Ch. Rajyalakshmi, K. Ram Mohan Rao, Rajeswara Rao Ramisetty
8.1 Introduction
8.1.1 What is big data?
8.1.2 Importance of big data in cloud computing
8.1.3 Merits and demerits
8.2 Introduction to cloud computing with IoT applications
8.2.1 Cloud computing importance
8.2.2 Cloud offloading strategies
8.2.3 Applications of IoT
8.2.4 Merits and demerits of IoT application with cloud
8.3 Importance of fog computing
8.3.1 Overview of fog
8.3.2 Definition for fog
8.3.3 Description of fog architecture
8.3.4 Research direction in fog
8.4 Significance of edge computing
8.4.1 What is edge computing
8.4.2 Benefits of edge computing
8.4.3 How edge computing used in IoT applications
8.4.4 Future of edge computing
8.5 Architecture review with cloud and fog and edge computing with IoT applications
8.5.1 How IoT applications meeting the challenges at edge
8.5.2 Review on cyber threats, latency time and power consumption challenges
8.5.3 Applications and future scope of research
8.6 Conclusion
References

9 Fog computing framework for Big Data processing using cluster management in a resource-constraint environment
Srinivasa Raju Rudraraju, Nagender Kumar Suryadevara, Atul Negi
9.1 Introduction
9.2 Literature survey
9.2.1 Cluster computing
9.2.2 Utility computing
9.2.3 Peer-to-peer computing
9.2.4 Distributed computing frameworks
9.2.5 Gaps identified in the existing research work
9.2.6 Objectives of the chapter
9.3 System description
9.4 Implementation details
9.4.1 Using resource constraint device (Raspberry Pi)
9.4.2 Spark fog cluster evaluation
9.5 Results and discussion
9.6 Conclusion and future work
References

10 Role of artificial intelligence and big data in accelerating accessibility for persons with disabilities
Kundumani Srinivasan Kuppusamy
10.1 Introduction
10.2 Rationale for accessibility
10.3 Artificial intelligence for accessibility
10.3.1 Perception porting
10.3.2 Assisting deaf and hard of hearing
10.3.3 AI-based exoskeletons
10.3.4 Accessible data visualization
10.3.5 Enabling smart environment through IoT for persons with disabilities
10.4 Conclusions
References

Overall conclusions
Vadlamani Ravi, Aswani Kumar Cherukuri
Index

Vadlamani Ravi is a professor at the Institute for Development and Research in Banking Technology, Hyderabad, where he spearheads the Center of Excellence in Analytics, the first of its kind in India. He has over 32 years of experience in research and teaching. He serves on the editorial boards of several international journals. He has published more than 230 papers in international journals and conferences, as well as book chapters.

Aswani Kumar Cherukuri is a professor in the School of Information Technology and Engineering at Vellore Institute of Technology, India. He has almost 20 years of academic and research experience. His research interests include machine learning and information security. He has published more than 150 research papers in various journals and conferences and has carried out major research projects funded by the Government of India. He is a senior member of the ACM and a life member of CSI and ISTE.
