E-book: Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science

  • Length: 200 pages
  • Publication date: 21-Feb-2019
  • Publisher: O'Reilly Media, Inc, USA
  • ISBN-13: 9781491931523
  • Format: PDF+DRM
  • Price: 60,47 EUR*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    To read the e-book, you need to create an Adobe ID and install Adobe Digital Editions on your computer. The e-book can be downloaded and read on up to 6 devices.
    The e-book cannot be read on an Amazon Kindle. The other e-readers sold in our e-shop can read e-books protected with an Adobe ID.

The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You'll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book.

Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you'll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries.

  • Get a succinct introduction to data warehousing, big data, and data science
  • Learn various paths enterprises take to build a data lake
  • Explore how to build a self-service model and best practices for providing analysts access to the data
  • Use different methods for architecting your data lake
  • Discover ways to implement a data lake from experts in different industries

Table of Contents

Preface
1 Introduction to Data Lakes
  Data Lake Maturity
    Data Puddles
    Data Ponds
  Creating a Successful Data Lake
    The Right Platform
    The Right Data
    The Right Interface
    The Data Swamp
  Roadmap to Data Lake Success
    Standing Up a Data Lake
    Organizing the Data Lake
    Setting Up the Data Lake for Self-Service
  Data Lake Architectures
    Data Lakes in the Public Cloud
    Logical Data Lakes
  Conclusion
2 Historical Perspective
  The Drive for Self-Service Data: The Birth of Databases
  The Analytics Imperative: The Birth of Data Warehousing
  The Data Warehouse Ecosystem
    Storing and Querying the Data
    Loading the Data: Data Integration Tools
    Organizing and Managing the Data
    Consuming the Data
  Conclusion
3 Introduction to Big Data and Data Science
  Hadoop Leads the Historic Shift to Big Data
    The Hadoop File System
    How Processing and Storage Interact in a MapReduce Job
    Schema on Read
    Hadoop Projects
  Data Science
  What Should Your Analytics Organization Focus On?
  Machine Learning
    Explainability
    Change Management
  Conclusion
4 Starting a Data Lake
  The What and Why of Hadoop
  Preventing Proliferation of Data Puddles
  Taking Advantage of Big Data
    Leading with Data Science
    Strategy 1: Offload Existing Functionality
    Strategy 2: Data Lakes for New Projects
    Strategy 3: Establish a Central Point of Governance
    Which Way Is Right for You?
  Conclusion
5 From Data Ponds/Big Data Warehouses to Data Lakes
  Essential Functions of a Data Warehouse
    Dimensional Modeling for Analytics
    Integrating Data from Disparate Sources
    Preserving History Using Slowly Changing Dimensions
    Limitations of the Data Warehouse as a Historical Repository
  Moving to a Data Pond
    Keeping History in a Data Pond
    Implementing Slowly Changing Dimensions in a Data Pond
  Growing Data Ponds into a Data Lake: Loading Data That's Not in the Data Warehouse
    Raw Data
    External Data
    Internet of Things (IoT) and Other Streaming Data
  Real-Time Data Lakes
  The Lambda Architecture
  Data Transformations
  Target Systems
    Data Warehouses
    Operational Data Stores
    Real-Time Applications and Data Products
  Conclusion
6 Optimizing for Self-Service
  The Beginnings of Self-Service
  Business Analysts
    Finding and Understanding Data: Documenting the Enterprise
    Establishing Trust
    Provisioning
    Preparing Data for Analysis
  Data Wrangling in the Data Lake
    Situating Data Preparation in Hadoop
    Common Use Cases for Data Preparation
  Analyzing and Visualizing
  The New World of Self-Service Business Intelligence
    The New Analytic Workflow
    Gatekeepers to Shopkeepers
    Governing Self-Service
  Conclusion
7 Architecting the Data Lake
  Organizing the Data Lake
    Landing or Raw Zone
    Gold Zone
    Work Zone
    Sensitive Zone
  Multiple Data Lakes
    Advantages of Keeping Data Lakes Separate
    Advantages of Merging the Data Lakes
  Cloud Data Lakes
  Virtual Data Lakes
    Data Federation
    Big Data Virtualization
    Eliminating Redundancy
  Conclusion
8 Cataloging the Data Lake
  Organizing the Data
    Technical Metadata
    Business Metadata
  Tagging
    Automated Cataloging
  Logical Data Management
    Sensitive Data Management and Access Control
    Data Quality
  Relating Disparate Data
  Establishing Lineage
  Data Provisioning
  Tools for Building a Catalog
    Tool Comparison
  The Data Ocean
  Conclusion
9 Governing Data Access
  Authorization or Access Control
  Tag-Based Data Access Policies
  Deidentifying Sensitive Data
    Data Sovereignty and Regulatory Compliance
  Self-Service Access Management
    Provisioning Data
  Conclusion
10 Industry-Specific Perspectives
  Big Data in Financial Services
    Consumers, Digitization, and Data Are Changing Finance as We Know It
    Saving the Bank
    New Opportunities Offered by New Data
    Key Processes in Making Use of the Data Lake
  Value Added by Data Lakes in Financial Services
  Data Lakes in the Insurance Industry
  Smart Cities
  Big Data in Medicine
Index

Alex Gorelik is CEO and Founder of Waterline Data. As founder of three startups, Alex has spent his career inventing cutting-edge data-oriented technology and bringing it to market. Prior to Waterline Data, Alex served as GM of Informatica's Data Quality Business Unit, driving Marketing, Product Management and R&D for an $80M business. Also at Informatica, Alex managed a team of 400 engineers and product managers as SVP of R&D for Core Technology, developing Informatica's platform and Data Integration technology. Alex joined Informatica from IBM, where he was an IBM Distinguished Engineer for the Information Integration team. IBM acquired Alex's second startup, Exeros, where he was founder, CTO and VP of Engineering. Previously, Alex was co-founder, CTO and VP of Engineering at Acta Technology (acquired by Business Objects and now marketed as SAP Business Objects Data Services). Prior to founding Acta, Alex managed development of Replication Server at Sybase and worked on Sybase's strategy for enterprise application integration (EAI). Earlier, he developed the database kernel for Amdahl's Design Automation group. Alex holds a B.S. in Computer Science from Columbia University School of Engineering and an M.S. in Computer Science from Stanford University.