E-book: Practical Hive: A Guide to Hadoop's Data Warehouse System

2.50/5 (9 ratings on Goodreads)

Scott Shaw, David Kjerrumgaard, Ankur Gupta, Andreas François Vermeulen

Format: PDF+DRM
Publication date: 27-Aug-2016
Publisher: Apress
Language: English
ISBN-13: 9781484202715

Other books on this subject:

Data warehousing

Price: 67,91 €*
* The price is final; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, Practical Hive gives you a detailed treatment of the software.

In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.

What You Will Learn

Install and configure Hive for new and existing datasets

Perform DDL operations

Execute efficient DML operations

Use tables, partitions, buckets, and user-defined functions

Discover performance tuning tips and Hive best practices

Who This Book Is For

Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.

About the Authors
About the Technical Reviewers
Acknowledgments
Introduction

Chapter 1 Setting the Stage for Hive: Hadoop
An Elephant Is Born
Hadoop Mechanics
Data Redundancy
Traditional High Availability
Hadoop High Availability
Processing with MapReduce
Beyond MapReduce
YARN and the Modern Data Architecture
Hadoop and the Open Source Community
Where Are We Now

Chapter 2 Introducing Hive
Hadoop Distributions
Cluster Architecture
Hive Installation
Finding Your Way Around
Hive CLI

Chapter 3 Hive Architecture
Hive Components
HCatalog
HiveServer2
Client Tools
Execution Engine: Tez

Chapter 4 Hive Tables DDL
Schema-on-Read
Hive Data Model
Schemas/Databases
Why Use Multiple Schemas/Databases
Creating Databases
Altering Databases
Dropping Databases
List Databases
Data Types in Hive
Primitive Data Types
Choosing Data Types
Complex Data Types
Tables
Creating Tables
Listing Tables
Internal/External Tables
Internal or Managed Tables
External/Internal Table Example
Table Properties
Generating a Create Table Command for Existing Tables
Partitioning and Bucketing
Partitioning Considerations
Efficiently Partitioning on Date Columns
Bucketing Considerations
Altering Tables
ORC File Format
Altering Table Partitions
Modifying Columns
Dropping Tables/Partitions
Protecting Tables/Partitions
Other Create Table Command Options

Chapter 5 Data Manipulation Language (DML)
Loading Data into Tables
Loading Data Using Files Stored on the Hadoop Distributed File System
Loading Data Using Queries
Writing Data into the File System from Queries
Inserting Values Directly into Tables
Updating Data Directly in Tables
Deleting Data Directly in Tables
Creating a Table with the Same Structure
Joins
Using Equality Joins to Combine Tables
Using Outer Joins
Using Left Semi-Joins
Using Join with Single MapReduce
Using Largest Table Last
Transactions
What Is ACID and Why Use It?
Hive Configuration

Chapter 6 Loading Data into Hive
Design Considerations Before Loading Data
Loading Data into HDFS
Ambari Files View
Hadoop Command Line

Hadoop Security
Hive Security
Default Authorization Mode
Storage-Based Authorization Mode
SQL Standards-Based Authorization Mode
Managing Access through SQL
Hive Authorization Using Apache Ranger
Accessing the Ranger UI
Creating Ranger Policies
Auditing Using Apache Ranger

Chapter 11 The Future of Hive
LLAP (Live Long and Process)
Hive-on-Spark
Hive: ACID and MERGE
Tunable Isolation Levels
ROLAP/Cube-Based Analytics
HiveServer2 Development
Multiple HiveServer2 Instances for Different Workloads

Appendix A: Building a Big Data Team
Minimum Team
Executive Team
Business Team
Technical Team
Expanded Team
Business Team
Technical Team
Work Lifecycle for the Team

Appendix B: Hive Functions
Built-In Functions
Mathematical Functions
Collection Functions
Type-Conversion Functions
Date Functions
Conditional Functions
String Functions
Miscellaneous Functions
Aggregate Functions
User-Defined Functions (UDFs)

Index

Scott Shaw has over fifteen years of data management experience. He has worked as both an Oracle and SQL Server DBA. He has worked as a consultant on Microsoft business intelligence projects utilizing both Tabular and OLAP models and co-authored two T-SQL books for Apress. Scott also enjoys speaking across the country about distributed computing, Big Data concepts, business intelligence, Hive, and the value of Hadoop. Scott works as a Sr. Solutions Engineer for Hortonworks and lives in Saint Louis with his wife and two kids.

Andreas Francois Vermeulen is Consulting Manager of Business Intelligence, Big Data, Data Science, and Computational Analytics at Sopra-Steria, and a doctoral researcher at the University of Dundee and St Andrews on future concepts in massive distributed computing, mechatronics, big data, business intelligence, and deep learning. He owns and incubates the "Rapid Information Factory" data processing framework. He is active in developing next-generation processing frameworks and mechatronics engineering, with over thirty-five years of international experience in data processing, software development, and system architecture. Andre is a data scientist, doctoral trainer, corporate consultant, principal systems architect, and speaker/author/columnist on data science, distributed computing, big data, business intelligence, and deep learning. Andre took his bachelor's at the North West University at Potchefstroom, his Master of Business Administration at the University of Manchester, Master of Business Intelligence and Data Science at the University of Dundee, and Doctor of Philosophy at the University of Dundee and St Andrews.

Ankur Gupta is a Senior Solutions Engineer at Hortonworks. He has over fourteen years of experience in data management, working as a Data Architect and Oracle DBA. Before joining the world of big data, he worked as an Oracle consultant for investment banks in the UK. He is a regular speaker on big data concepts, Hive, Hadoop, and Oracle at various events and is the author of Oracle GoldenGate 11g Complete Cookbook. Ankur has a master's degree in Computer Science & International Business. He is a Hadoop Certified Administrator and Oracle Certified Professional and lives in London with his wife.

David Kjerrumgaard is a systems architect at Hortonworks. He has 20 years of experience in software development and is a Certified Developer for Apache Hadoop (CCDH). Kjerrumgaard is the author of Data Governance with Apache Falcon and Cloudera Developer Training for Apache Hadoop. He took his BS and MS in Computer Science from Kent State University.

Permalink: https://www.kriso.ee/db/97814842027152e.html
