E-book: Handbook of Big Data Analytics: Methodologies, Volume 1

Edited by (Vellore Institute of Technology, School of Information Technology and Engineering, Vellore, India), Edited by (Institute for Development and Research in Banking Technology, Hyderabad, India)
  • Format: EPUB+DRM
  • Series: Computing and Networks
  • Publication date: 20-Sep-2021
  • Publisher: Institution of Engineering and Technology
  • Language: English
  • ISBN-13: 9781839530586
  • Price: 214,50 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Big Data analytics is the complex process of examining big data to uncover information such as correlations, hidden patterns, trends and user and customer preferences, to allow organizations and businesses to make more informed decisions. These methods and technologies have become ubiquitous in all fields of science, engineering, business and management due to the rise of data-driven models as well as data engineering developments using parallel and distributed computational analytics frameworks, data and algorithm parallelization, and GPGPU programming. However, there remain potential issues that need to be addressed to enable big data processing and analytics in real time.

In the first volume of this comprehensive two-volume handbook, the authors present several methodologies to support Big Data analytics, including database management, processing frameworks and architectures, data lakes, query optimization strategies, real-time data processing, data stream analytics, Fog and Edge computing, and Artificial Intelligence and Big Data.

The second volume is dedicated to a wide range of applications in secure data storage, privacy preservation, Software Defined Networks (SDN), the Internet of Things (IoT), behaviour analytics, traffic predictions, gender-based classification on e-commerce data, recommender systems, Big Data regression with Apache Spark, visual sentiment analysis, wavelet Neural Network via GPU, stock market movement predictions, and financial reporting.

The two-volume work is aimed at providing a unique platform for researchers, engineers, developers, educators and advanced students in the field of Big Data analytics.



This comprehensive edited 2-volume handbook provides a unique platform for researchers, engineers, developers, educators and advanced students in the field of Big Data analytics. The first volume presents methodologies that support Big Data analytics, while the second volume offers a wide range of Big Data analytics applications.

About the editors
About the contributors
Foreword
Foreword
Preface
Acknowledgements
Introduction
1 The impact of Big Data on databases
Antonio Sarasa Cabezuelo
1.1 The Big Data phenomenon
1.1.1 Big Data Operational and Big Data Analytical
1.1.2 The impact of Big Data on databases
1.2 Scalability in relational databases
1.2.1 Relational databases
1.2.2 The limitations of relational databases
1.3 NoSQL databases
1.3.1 Disadvantages of NoSQL databases
1.3.2 Aggregate-oriented NoSQL databases
1.3.3 MongoDB: an example of documentary database
1.3.4 Cassandra: an example of columnar-oriented database
1.4 Data distribution models
1.4.1 Sharding
1.4.2 Replication
1.4.3 Combining sharding and replication
1.5 Design examples using NoSQL databases
1.6 Design examples using NoSQL databases
1.6.1 Example 1
1.6.2 Example 2
1.6.3 Example 3
1.6.4 Example 4
1.7 Conclusions
References
2 Big data processing frameworks and architectures: a survey
Raghavendra Kumar Chunduri
Aswani Kumar Cherukuri
2.1 Introduction
2.2 Apache Hadoop framework and Hadoop Ecosystem
2.2.1 Architecture of Hadoop framework
2.2.2 Architecture of MapReduce
2.2.3 Application implemented using MapReduce: concept generation in formal concept analysis (FCA)
2.3 HaLoop framework
2.3.1 Programming model of HaLoop
2.3.2 Task scheduling in HaLoop
2.3.3 Caching in HaLoop
2.3.4 Fault tolerance
2.3.5 Concept generation in FCA using HaLoop
2.4 Twister framework
2.4.1 Architecture of Twister framework
2.4.2 Fault tolerance in Twister
2.5 Apache Pig
2.5.1 Characteristics of Apache Pig
2.5.2 Components of Apache Pig
2.5.3 Pig data model
2.5.4 Word count application using Apache Pig
2.6 Apache Mahout
2.6.1 Apache Mahout features
2.6.2 Applications of Mahout
2.7 Apache Sqoop
2.7.1 Sqoop import
2.7.2 Export from Sqoop
2.8 Apache Flume
2.8.1 Advantages of Flume
2.8.2 Features of Flume
2.8.3 Components of Flume
2.9 Apache Oozie
2.10 Hadoop 2
2.11 Apache Spark
2.11.1 Spark Core
2.11.2 Driver program
2.11.3 Spark Context
2.11.4 Spark cluster manager
2.11.5 Spark worker node
2.11.6 Spark resilient distributed datasets (RDDs)
2.11.7 Caching RDDs
2.11.8 Broadcast variables in spark
2.11.9 Spark Datasets
2.11.10 Spark System optimization
2.11.11 Memory optimization
2.11.12 I/O optimization
2.11.13 Fault tolerance optimization
2.11.14 Data processing in Spark
2.11.15 Spark machine learning support
2.11.16 Spark deep learning support
2.11.17 Programming layer in Spark
2.11.18 Concept generation in formal concept analysis using Spark
2.12 Big data storage systems
2.12.1 Hadoop distributed file system
2.12.2 Alluxio
2.12.3 Amazon Simple Storage Services-S3
2.12.4 Microsoft Azure Blob Storage-WASB
2.12.5 HBase
2.12.6 Amazon Dynamo
2.12.7 Cassandra
2.12.8 Hive
2.13 Distributed stream processing engines
2.13.1 Apache Storm
2.13.2 Apache Flink
2.14 Apache Zookeeper
2.14.1 The Zookeeper data model
2.14.2 ZDM-access control list
2.15 Open issues and challenges
2.15.1 Memory management
2.15.2 Failure recovery
2.16 Conclusion
References
3 The role of data lake in big data analytics: recent developments and challenges
T. Ramalingeswara Rao
Pabitra Mitra
Adrijit Goswami
3.1 Introduction
3.1.1 Differences between data warehouses and data lakes
3.1.2 Data lakes pitfalls
3.2 Taxonomy of data lakes
3.2.1 Data silos
3.2.2 Data swamps
3.2.3 Data reservoirs
3.2.4 Big data fabric
3.3 Architecture of a data lake
3.3.1 Raw data layer
3.3.2 Data ingestion layer
3.3.3 Process layer
3.3.4 Ingress layer
3.3.5 Responsibilities of data scientists in data lakes
3.3.6 Metadata management
3.3.7 Data lake governance
3.3.8 Data cataloging
3.4 Commercial-based data lakes
3.4.1 Azure data lake environment
3.4.2 Developing a data lake with IBM (IBM DL)
3.4.3 Amazon Web Services (AWS) Galaxy data lake (GDL)
3.5 Open source-based data lakes
3.5.1 Delta lake
3.5.2 BIGCONNECT data lake
3.5.3 Best practices for data lakes
3.6 Case studies
3.6.1 Machine learning in data lakes
3.6.2 Data lake challenges
3.7 Conclusion
References
4 Query optimization strategies for big data
Nagesh Bhattu Sristy
Prashanth Kadari
Harini Yadamreddy
4.1 Introduction
4.1.1 MapReduce preliminaries
4.1.2 Organization of the chapter
4.2 Multi-way joins using MapReduce
4.2.1 Sequential join
4.2.2 Shares approach
4.2.3 SharesSkew
4.2.4 Θ-Join
4.3 Graph queries using MapReduce
4.3.1 Counting triangles
4.3.2 Subgraph enumeration
4.4 Multi-way spatial join
4.5 Conclusion and future work
References
5 Toward real-time data processing: an advanced approach in big data analytics
Shafqat Ul Ahsaan
Harleen Kaur
Sameena Naaz
5.1 Introduction
5.2 Real-time data processing topology
5.2.1 Choosing the platform
5.2.2 Entry points
5.2.3 Data processing infrastructure
5.3 Streaming processing
5.4 Stream mining
5.4.1 Clustering
5.4.2 Classification
5.4.3 Frequent
5.4.4 Outlier and anomaly detection
5.5 Lambda architecture
5.6 Stream processing approach for big data
5.6.1 Apache Spark
5.6.2 Apache Flink
5.6.3 Apache Samza
5.6.4 Apache Storm
5.6.5 Apache Flume
5.6.6 Apache Kafka
5.7 Evaluation of data streaming processing approaches
5.8 Conclusion
Acknowledgment
References
6 A survey on data stream analytics
Sumit Misra
Sanjoy Kumar Saha
Chandan Mazumdar
6.1 Introduction
6.2 Scope and approach
6.3 Prediction and forecasting
6.3.1 Future direction for prediction and forecasting
6.4 Outlier detection
6.4.1 Future direction for outlier detection
6.5 Concept drift detection
6.5.1 Future direction for concept drift detection
6.6 Mining frequent item sets in data stream
6.6.1 Future direction for frequent item-set mining
6.7 Computational paradigm
6.7.1 Future direction for computational paradigm
6.8 Conclusion
References
7 Architectures of big data analytics: scaling out data mining algorithms using Hadoop-MapReduce and Spark
Sheikh Kamaruddin
Vadlamani Ravi
7.1 Introduction
7.2 Previous related reviews
7.3 Review methodology
7.4 Review of articles in the present work
7.4.1 Association rule mining/pattern mining
7.4.2 Regression/prediction/forecasting
7.4.3 Classification
7.4.4 Clustering
7.4.5 Outlier detection/intrusion detection system
7.4.6 Recommendation
7.4.7 Others
7.5 Discussion
7.6 Conclusion and future directions
References
8 A review of fog and edge computing with big data analytics
Ch. Rajyalakshmi
K. Ram Mohan Rao
Rajeswara Rao Ramisetty
8.1 Introduction
8.1.1 What is big data?
8.1.2 Importance of big data in cloud computing
8.1.3 Merits and demerits
8.2 Introduction to cloud computing with IoT applications
8.2.1 Cloud computing importance
8.2.2 Cloud offloading strategies
8.2.3 Applications of IoT
8.2.4 Merits and demerits of IoT application with cloud
8.3 Importance of fog computing
8.3.1 Overview of fog
8.3.2 Definition for fog
8.3.3 Description of fog architecture
8.3.4 Research direction in fog
8.4 Significance of edge computing
8.4.1 What is edge computing
8.4.2 Benefits of edge computing
8.4.3 How edge computing used in IoT applications
8.4.4 Future of edge computing
8.5 Architecture review with cloud and fog and edge computing with IoT applications
8.5.1 How IoT applications meeting the challenges at edge
8.5.2 Review on cyber threats, latency time and power consumption challenges
8.5.3 Applications and future scope of research
8.6 Conclusion
References
9 Fog computing framework for Big Data processing using cluster management in a resource-constraint environment
Srinivasa Raju Rudraraju
Nagender Kumar Suryadevara
Atul Negi
9.1 Introduction
9.2 Literature survey
9.2.1 Cluster computing
9.2.2 Utility computing
9.2.3 Peer-to-peer computing
9.2.4 Distributed computing frameworks
9.2.5 Gaps identified in the existing research work
9.2.6 Objectives of the chapter
9.3 System description
9.4 Implementation details
9.4.1 Using resource constraint device (Raspberry Pi)
9.4.2 Spark fog cluster evaluation
9.5 Results and discussion
9.6 Conclusion and future work
References
10 Role of artificial intelligence and big data in accelerating accessibility for persons with disabilities
Kundumani Srinivasan Kuppusamy
10.1 Introduction
10.2 Rationale for accessibility
10.3 Artificial intelligence for accessibility
10.3.1 Perception porting
10.3.2 Assisting deaf and hard of hearing
10.3.3 AI-based exoskeletons
10.3.4 Accessible data visualization
10.3.5 Enabling smart environment through IoT for persons with disabilities
10.4 Conclusions
References
Overall conclusions
Vadlamani Ravi
Aswani Kumar Cherukuri
Index
Vadlamani Ravi is a professor at the Institute for Development and Research in Banking Technology, Hyderabad, where he spearheads the Center of Excellence in Analytics, the first of its kind in India. He has over 32 years of experience in research and teaching. He is on the editorial board of several international journals. He has published more than 230 papers in international journals, conferences and book chapters.



Aswani Kumar Cherukuri is a professor in the School of Information Technology and Engineering at Vellore Institute of Technology, India. He has almost 20 years of academic and research experience. His research interests include machine learning and information security. He has published more than 150 research papers in various journals and conferences, and has executed major research projects funded by the Government of India. He is a senior member of ACM and a life member of CSI and ISTE.