
E-book: Practical Hive: A Guide to Hadoop's Data Warehouse System

  • Format: PDF+DRM
  • Publication date: 27-Aug-2016
  • Publisher: Apress
  • Language: eng
  • ISBN-13: 9781484202715
  • Price: 67,91 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, Practical Hive gives you a detailed treatment of the software.

In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data. 



What You Will Learn

  • Install and configure Hive for new and existing datasets
  • Perform DDL operations
  • Execute efficient DML operations
  • Use tables, partitions, buckets, and user-defined functions
  • Discover performance tuning tips and Hive best practices

Who This Book Is For

This book is for developers, companies, and professionals who deal with large amounts of data and could benefit from software that can efficiently manage large volumes of input. Readers are assumed to be able to work with SQL.

Table of Contents
About the Authors xv
About the Technical Reviewers xvii
Acknowledgments xix
Introduction xxi
Chapter 1 Setting the Stage for Hive: Hadoop 1(22)
An Elephant Is Born 2(1)
Hadoop Mechanics 3(3)
Data Redundancy 6(5)
Traditional High Availability 6(2)
Hadoop High Availability 8(3)
Processing with MapReduce 11(12)
Beyond MapReduce 16(1)
YARN and the Modern Data Architecture 17(1)
Hadoop and the Open Source Community 18(4)
Where Are We Now 22(1)
Chapter 2 Introducing Hive 23(14)
Hadoop Distributions 24(2)
Cluster Architecture 26(3)
Hive Installation 29(2)
Finding Your Way Around 31(3)
Hive CLI 34(3)
Chapter 3 Hive Architecture 37(12)
Hive Components 37(1)
HCatalog 38(3)
HiveServer2 41(2)
Client Tools 43(3)
Execution Engine: Tez 46(3)
Chapter 4 Hive Tables DDL 49(28)
Schema-on-Read 49(1)
Hive Data Model 50(2)
Schemas/Databases 50(1)
Why Use Multiple Schemas/Databases 50(1)
Creating Databases 50(1)
Altering Databases 51(1)
Dropping Databases 51(1)
List Databases 52(1)
Data Types in Hive 52(2)
Primitive Data Types 52(1)
Choosing Data Types 52(1)
Complex Data Types 53(1)
Tables 54(23)
Creating Tables 55(1)
Listing Tables 55(1)
Internal/External Tables 56(1)
Internal or Managed Tables 56(1)
External/Internal Table Example 57(4)
Table Properties 61(1)
Generating a Create Table Command for Existing Tables 62(1)
Partitioning and Bucketing 62(2)
Partitioning Considerations 64(1)
Efficiently Partitioning on Date Columns 65(1)
Bucketing Considerations 66(2)
Altering Tables 68(1)
ORC File Format 69(1)
Altering Table Partitions 70(4)
Modifying Columns 74(1)
Dropping Tables/Partitions 74(1)
Protecting Tables/Partitions 75(1)
Other Create Table Command Options 75(2)
Chapter 5 Data Manipulation Language (DML) 77(22)
Loading Data into Tables 77(13)
Loading Data Using Files Stored on the Hadoop Distributed File System 78(2)
Loading Data Using Queries 80(3)
Writing Data into the File System from Queries 83(2)
Inserting Values Directly into Tables 85(1)
Updating Data Directly in Tables 86(2)
Deleting Data Directly in Tables 88(1)
Creating a Table with the Same Structure 89(1)
Joins 90(9)
Using Equality Joins to Combine Tables 90(1)
Using Outer Joins 91(3)
Using Left Semi-Joins 94(1)
Using Join with Single MapReduce 95(1)
Using Largest Table Last 96(1)
Transactions 97(1)
What Is ACID and Why Use It? 97(1)
Hive Configuration 97(2)
Chapter 6 Loading Data into Hive 99(146)
Design Considerations Before Loading Data 99(1)
Loading Data into HDFS 100(135)
Ambari Files View 100(2)
Hadoop Command Line 102(133)
Hadoop Security 235(1)
Hive Security 235(4)
Default Authorization Mode 235(1)
Storage-Based Authorization Mode 236(1)
SQL Standards-Based Authorization Mode 237(1)
Managing Access through SQL 238(1)
Hive Authorization Using Apache Ranger 239(6)
Accessing the Ranger UI 240(1)
Creating Ranger Policies 240(3)
Auditing Using Apache Ranger 243(2)
Chapter 11 The Future of Hive 245(4)
LLAP (Live Long and Process) 245(1)
Hive-on-Spark 246(1)
Hive: ACID and MERGE 246(1)
Tunable Isolation Levels 246(1)
ROLAP/Cube-Based Analytics 247(1)
HiveServer2 Development 247(1)
Multiple HiveServer2 Instances for Different Workloads 247(2)
Appendix A: Building a Big Data Team 249(4)
Minimum Team 249(1)
Executive Team 249(1)
Business Team 249(1)
Technical Team 250(1)
Expanded Team 250(2)
Business Team 250(1)
Technical Team 251(1)
Work Lifecycle for the Team 252(1)
Appendix B: Hive Functions 253(10)
Built-In Functions 253(1)
Mathematical Functions 253(2)
Collection Functions 255(1)
Type-Conversion Functions 255(1)
Date Functions 256(1)
Conditional Functions 257(1)
String Functions 257(3)
Miscellaneous Functions 260(1)
Aggregate Functions 260(2)
User-Defined Functions (UDFs) 262(1)
Index 263
Scott Shaw has over fifteen years of data management experience. He has worked as both an Oracle and SQL Server DBA. He has worked as a consultant on Microsoft business intelligence projects utilizing both Tabular and OLAP models and co-authored two T-SQL books by Apress. Scott also enjoys speaking across the country about distributed computing, Big Data concepts, business intelligence, Hive, and the value of Hadoop. Scott works as a Sr. Solutions Engineer for Hortonworks and lives in Saint Louis with his wife and two kids.

Andreas Francois Vermeulen is Consulting Manager of Business Intelligence, Big Data, Data Science, and Computational Analytics at Sopra-Steria and a doctoral researcher at the University of Dundee and St Andrews on future concepts in massive distributed computing, mechatronics, big data, business intelligence, and deep learning. He owns and incubates the "Rapid Information Factory" data processing framework. He is active in developing next-generation processing frameworks and mechatronics engineering, with over thirty-five years of international experience in data processing, software development, and system architecture. Andre is a data scientist, doctoral trainer, corporate consultant, principal systems architect, and speaker/author/columnist on data science, distributed computing, big data, business intelligence, and deep learning. Andre took his bachelor's at the North West University at Potchefstroom, his Master of Business Administration at the University of Manchester, Master of Business Intelligence and Data Science at University of Dundee, and Doctor of Philosophy at the University of Dundee and St Andrews.

Ankur Gupta is a Senior Solutions Engineer at Hortonworks. He has over fourteen years of experience in data management, working as a Data Architect and Oracle DBA. Before joining the world of big data, he was working as an Oracle Consultant for Investment Banks in the UK. He is a regular speaker on big data concepts, Hive, Hadoop, and Oracle at various events and is the author of Oracle Goldengate 11g Complete Cookbook. Ankur has a master's degree in Computer Science & International Business. He is a Hadoop Certified Administrator & Oracle Certified Professional and lives in London with his wife.

David Kjerrumgaard is a systems architect at Hortonworks. He has 20 years of experience in software development and is a Certified Developer for Apache Hadoop (CCDH). Kjerrumgaard is the author of Data Governance with Apache Falcon and Cloudera Developer Training for Apache Hadoop. He took his BS and MS in Computer Science from Kent State University.