  • Format: PDF+DRM
  • Price: €100.87*
  • * This is the final price, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.
  • Length: 314 pages
  • Publication date: 01-Sep-2022
  • Publisher: River Publishers
  • ISBN-13: 9781000794038

DRM restrictions

  • Copying (copy/paste):

    not permitted

  • Printing:

    not permitted

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; do not confuse it with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on Amazon Kindle.

Big Data is a concept of major relevance in today's world, sometimes highlighted as a key asset for productivity, growth, innovation, and customer relationships. Its popularity has increased considerably in recent years. Areas such as smart cities, manufacturing, retail, finance, software development, the environment, and digital media, among others, can benefit from the collection, storage, processing, and analysis of Big Data, which enables unprecedented data-driven workflows and considerably improved decision-making processes.

The concept of a Big Data Warehouse (BDW) is emerging as either an augmentation or a replacement of the traditional Data Warehouse (DW), a concept that has a long history as one of the most valuable enterprise data assets. Nevertheless, research in Big Data Warehousing is still in its infancy, lacking an integrated and validated approach for designing and implementing both the logical layer (data models, data flows, and interoperability between components) and the physical layer (technological infrastructure) of these complex systems.

This book addresses models and methods for designing and implementing Big Data Systems to support mixed and complex decision processes, giving special attention to BDWs as a way of efficiently storing and processing batch or streaming data for structured or semi-structured analytical problems.

Table of Contents
List of Figures
List of Tables
The Authors
Acknowledgments
Foreword
Notation
1 Introduction
1.1 Objectives of this Book
1.2 Intended Audience
1.3 Book Structure
2 Big Data Concepts, Techniques, and Technologies
2.1 Big Data Relevance
2.2 Big Data Characteristics
2.3 Big Data Challenges
2.3.1 Big Data General Dilemmas
2.3.2 Challenges in the Big Data Life Cycle
2.3.3 Big Data in Secure, Private, and Monitored Environments
2.3.4 Organizational Change
2.4 Techniques for Big Data Solutions
2.4.1 Big Data Life Cycle and Requirements
2.4.1.1 General Steps to Process and Analyze Big Data
2.4.1.2 Architectural and Infrastructural Requirements
2.4.2 The Lambda Architecture
2.4.3 Towards Standardization: the NIST Reference Architecture
2.5 Big Data Technologies
2.5.1 Hadoop and Related Projects
2.5.2 Landscape of Distributed SQL Engines
2.5.3 Other Technologies for Big Data Analytics
3 OLTP-oriented Databases for Big Data Environments
3.1 NoSQL and NewSQL: an Overview
3.2 NoSQL Databases
3.2.1 Key-value Databases
3.2.1.1 Overview
3.2.1.2 Redis
3.2.2 Column-oriented Databases
3.2.2.1 Overview
3.2.2.2 HBase
3.2.2.3 From Relational Models to HBase Data Models
3.2.3 Document-oriented Databases
3.2.3.1 Overview
3.2.3.2 MongoDB
3.2.4 Graph Databases
3.2.4.1 Overview
3.2.4.2 Neo4j
3.3 NewSQL Databases and Translytical Databases
4 OLAP-oriented Databases for Big Data Environments
4.1 Hive: the De Facto SQL-on-Hadoop Engine
4.1.1 Data Storage Formats
4.1.1.1 Text File
4.1.1.2 Sequence File
4.1.1.3 RCFile
4.1.1.4 ORCFile
4.1.1.5 AvroFile
4.1.1.6 Parquet
4.1.2 Partitions and Buckets
4.2 From Dimensional Models to Tabular Models
4.2.1 Primary Data Tables
4.2.2 Derived Data Tables
4.3 Optimizing OLAP workloads with Druid
5 Design and Implementation of Big Data Warehouses
5.1 Big Data Warehousing: an Overview
5.2 Model of Logical Components and Data Flows
5.2.1 Data Provider and Data Consumer
5.2.2 Big Data Application Provider
5.2.3 Big Data Framework Provider
5.2.3.1 Messaging/Communications, Resource Management, and Infrastructures
5.2.3.2 Processing
5.2.3.3 Storage: Data Organization and Distribution
5.2.4 System Orchestrator and Security, Privacy, and Management
5.3 Model of Technological Infrastructure
5.4 Method for Data Modeling
5.4.1 Analytical Objects and their Related Concepts
5.4.2 Joining, Uniting, and Materializing Analytical Objects
5.4.3 Dimensional Big Data with Outsourced Descriptive Families
5.4.4 Data Modeling Best Practices
5.4.4.1 Using Null Values
5.4.4.2 Date, Time, and Spatial Objects vs. Separate Temporal and Spatial Attributes
5.4.4.3 Immutable vs. Mutable Records
5.4.5 Data Modeling Advantages and Disadvantages
6 Big Data Warehouses Modeling: From Theory to Practice
6.1 Multinational Bicycle Wholesale and Manufacturing
6.1.1 Fully Flat or Fully Dimensional Data Models
6.1.2 Nested Attributes
6.1.3 Streaming and Random Access on Mutable Analytical Objects
6.2 Brokerage Firm
6.2.1 Unnecessary Complementary Analytical Objects and Update Problems
6.2.1.1 The Traditional Way of Handling SCD-like Scenarios
6.2.1.2 A New Way of Handling SCD-like Scenarios
6.2.2 Joining Complementary Analytical Objects
6.2.3 Data Science Models and Insights as a Core Value
6.2.4 Partition Keys for Streaming and Batch Analytical Objects
6.3 Retail
6.3.1 Simpler Data Models: Dynamic Partitioning Schemas
6.3.2 Considerations for Spatial Objects
6.3.3 Analyzing Non-Existing Events
6.3.4 Wide Descriptive Families
6.3.5 The Need for Joins in Data CPE Workloads
6.4 Code Version Control System
6.5 A Global Database of Society - The GDELT Project
6.6 Air Quality
7 Fueling Analytical Objects in Big Data Warehouses
7.1 From Traditional Data Warehouses
7.2 From OLTP NoSQL Databases
7.3 From Semi-structured Data Sources
7.4 From Streaming Data Sources
7.5 Using Data Science Models
7.5.1 Data Mining/Machine Learning Models for Structured Data
7.5.2 Text Mining, Image Mining, and Video Mining Models
8 Evaluating the Performance of Big Data Warehouses
8.1 The SSB+ Benchmark
8.1.1 Data Model and Queries
8.1.2 System Architecture and Infrastructure
8.2 Batch OLAP
8.2.1 Comparing Flat Analytical Objects with Star Schemas
8.2.2 Improving Performance with Adequate Data Partitioning
8.2.3 The Impact of Dimensions' Size in Star Schemas
8.2.4 The Impact of Nested Structures in Analytical Objects
8.2.5 Drill Across Queries and Window and Analytics Functions
8.3 Streaming OLAP
8.3.1 The Impact of Data Volume in the Streaming Storage Component
8.3.2 Considerations for Effective and Efficient Streaming OLAP
8.4 SQL-on-Hadoop Systems under Multi-User Environments
9 Big Data Warehousing in Smart Cities
9.1 Logical Components, Data Flows, and Technological Infrastructure
9.1.1 SusCity Architecture
9.1.2 SusCity Infrastructure
9.2 SusCity Data Model
9.2.1 Buildings Characteristics as an Outsourced Descriptive Family
9.2.2 Nested Structures in Analytical Objects
9.3 The Inter-storage Pipeline
9.4 The SusCity Data Visualization Platform
9.4.1 City's Energy Consumption
9.4.2 City's Energy Grid Simulations
9.4.3 Buildings' Performance Analysis and Simulation
9.4.4 Mobility Patterns Analysis
10 Conclusion
10.1 Synopsis of the Book
10.2 Contributions to the State of the Art
References
Index
Authors: Maribel Yasmina Santos, Carlos Costa