E-book: Big Data: Concepts, Warehousing, and Analytics

Carlos Costa, Maribel Yasmina Santos

Format: 314 pages
Publication date: 01-Sep-2022
Publisher: River Publishers
ISBN-13: 9781000794038

Format: PDF+DRM
Price: €100.87*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not permitted
Printing: not permitted
Usage: Digital rights management (DRM)

The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by one user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; not to be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Big Data is a concept of major relevance in today's world, sometimes highlighted as a key asset for productivity, growth, innovation, and customer relationships. Its popularity has increased considerably in recent years. Areas such as smart cities, manufacturing, retail, finance, software development, the environment, and digital media, among others, can benefit from the collection, storage, processing, and analysis of Big Data, enabling unprecedented data-driven workflows and considerably improved decision-making processes.

The concept of a Big Data Warehouse (BDW) is emerging as either an augmentation of or a replacement for the traditional Data Warehouse (DW), a concept with a long history as one of the most valuable enterprise data assets. Nevertheless, research in Big Data Warehousing is still in its infancy and lacks an integrated, validated approach for designing and implementing both the logical layer (data models, data flows, and interoperability between components) and the physical layer (technological infrastructure) of these complex systems.

This book addresses models and methods for designing and implementing Big Data Systems to support mixed and complex decision processes, giving special attention to BDWs as a way of efficiently storing and processing batch or streaming data for structured or semi-structured analytical problems.

Table of Contents

List of Figures
List of Tables
The Authors
Acknowledgments
Foreword
Notation

1 Introduction
1.1 Objectives of this Book
1.2 Intended Audience
1.3 Book Structure

2 Big Data Concepts, Techniques, and Technologies
2.1 Big Data Relevance
2.2 Big Data Characteristics
2.3 Big Data Challenges
2.3.1 Big Data General Dilemmas
2.3.2 Challenges in the Big Data Life Cycle
2.3.3 Big Data in Secure, Private, and Monitored Environments
2.3.4 Organizational Change
2.4 Techniques for Big Data Solutions
2.4.1 Big Data Life Cycle and Requirements
2.4.1.1 General Steps to Process and Analyze Big Data
2.4.1.2 Architectural and Infrastructural Requirements
2.4.2 The Lambda Architecture
2.4.3 Towards Standardization: the NIST Reference Architecture
2.5 Big Data Technologies
2.5.1 Hadoop and Related Projects
2.5.2 Landscape of Distributed SQL Engines
2.5.3 Other Technologies for Big Data Analytics

3 OLTP-oriented Databases for Big Data Environments
3.1 NoSQL and NewSQL: an Overview
3.2 NoSQL Databases
3.2.1 Key-value Databases
3.2.1.1 Overview
3.2.1.2 Redis
3.2.2 Column-oriented Databases
3.2.2.1 Overview
3.2.2.2 HBase
3.2.2.3 From Relational Models to HBase Data Models
3.2.3 Document-oriented Databases
3.2.3.1 Overview
3.2.3.2 MongoDB
3.2.4 Graph Databases
3.2.4.1 Overview
3.2.4.2 Neo4j
3.3 NewSQL Databases and Translytical Databases

4 OLAP-oriented Databases for Big Data Environments
4.1 Hive: the De Facto SQL-on-Hadoop Engine
4.1.1 Data Storage Formats
4.1.1.1 Text File
4.1.1.2 Sequence File
4.1.1.3 RCFile
4.1.1.4 ORCFile
4.1.1.5 AvroFile
4.1.1.6 Parquet
4.1.2 Partitions and Buckets
4.2 From Dimensional Models to Tabular Models
4.2.1 Primary Data Tables
4.2.2 Derived Data Tables
4.3 Optimizing OLAP workloads with Druid

5 Design and Implementation of Big Data Warehouses
5.1 Big Data Warehousing: an Overview
5.2 Model of Logical Components and Data Flows
5.2.1 Data Provider and Data Consumer
5.2.2 Big Data Application Provider
5.2.3 Big Data Framework Provider
5.2.3.1 Messaging/Communications, Resource Management, and Infrastructures
5.2.3.2 Processing
5.2.3.3 Storage: Data Organization and Distribution
5.2.4 System Orchestrator and Security, Privacy, and Management
5.3 Model of Technological Infrastructure
5.4 Method for Data Modeling
5.4.1 Analytical Objects and their Related Concepts
5.4.2 Joining, Uniting, and Materializing Analytical Objects
5.4.3 Dimensional Big Data with Outsourced Descriptive Families
5.4.4 Data Modeling Best Practices
5.4.4.1 Using Null Values
5.4.4.2 Date, Time, and Spatial Objects vs. Separate Temporal and Spatial Attributes
5.4.4.3 Immutable vs. Mutable Records
5.4.5 Data Modeling Advantages and Disadvantages

6 Big Data Warehouses Modeling: From Theory to Practice
6.1 Multinational Bicycle Wholesale and Manufacturing
6.1.1 Fully Flat or Fully Dimensional Data Models
6.1.2 Nested Attributes
6.1.3 Streaming and Random Access on Mutable Analytical Objects
6.2 Brokerage Firm
6.2.1 Unnecessary Complementary Analytical Objects and Update Problems
6.2.1.1 The Traditional Way of Handling SCD-like Scenarios
6.2.1.2 A New Way of Handling SCD-like Scenarios
6.2.2 Joining Complementary Analytical Objects
6.2.3 Data Science Models and Insights as a Core Value
6.2.4 Partition Keys for Streaming and Batch Analytical Objects
6.3 Retail
6.3.1 Simpler Data Models: Dynamic Partitioning Schemas
6.3.2 Considerations for Spatial Objects
6.3.3 Analyzing Non-Existing Events
6.3.4 Wide Descriptive Families
6.3.5 The Need for Joins in Data CPE Workloads
6.4 Code Version Control System
6.5 A Global Database of Society - The GDELT Project
6.6 Air Quality

7 Fueling Analytical Objects in Big Data Warehouses
7.1 From Traditional Data Warehouses
7.2 From OLTP NoSQL Databases
7.3 From Semi-structured Data Sources
7.4 From Streaming Data Sources
7.5 Using Data Science Models
7.5.1 Data Mining/Machine Learning Models for Structured Data
7.5.2 Text Mining, Image Mining, and Video Mining Models

8 Evaluating the Performance of Big Data Warehouses
8.1 The SSB+ Benchmark
8.1.1 Data Model and Queries
8.1.2 System Architecture and Infrastructure
8.2 Batch OLAP
8.2.1 Comparing Flat Analytical Objects with Star Schemas
8.2.2 Improving Performance with Adequate Data Partitioning
8.2.3 The Impact of Dimensions' Size in Star Schemas
8.2.4 The Impact of Nested Structures in Analytical Objects
8.2.5 Drill Across Queries and Window and Analytics Functions
8.3 Streaming OLAP
8.3.1 The Impact of Data Volume in the Streaming Storage Component
8.3.2 Considerations for Effective and Efficient Streaming OLAP
8.4 SQL-on-Hadoop Systems under Multi-User Environments

9 Big Data Warehousing in Smart Cities
9.1 Logical Components, Data Flows, and Technological Infrastructure
9.1.1 SusCity Architecture
9.1.2 SusCity Infrastructure
9.2 SusCity Data Model
9.2.1 Buildings Characteristics as an Outsourced Descriptive Family
9.2.2 Nested Structures in Analytical Objects
9.3 The Inter-storage Pipeline
9.4 The SusCity Data Visualization Platform
9.4.1 City's Energy Consumption
9.4.2 City's Energy Grid Simulations
9.4.3 Buildings' Performance Analysis and Simulation
9.4.4 Mobility Patterns Analysis

10 Conclusion
10.1 Synopsis of the Book
10.2 Contributions to the State of the Art

References
Index

Permalink: https://www.kriso.ee/db/97810007940382e.html
