
E-book: Professional Hadoop [Wiley Online]

  • Format: 216 pages
  • Publication date: 01-Jul-2016
  • Publisher: Wrox Press
  • ISBN-10: 1119281326
  • ISBN-13: 9781119281320
  • Price: 52.87 €*
  • * the price grants access for an unlimited number of simultaneous users for an unlimited period
The professional's one-stop guide to this open-source, Java-based big data framework

Professional Hadoop is the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings. Written by an expert team of certified Hadoop developers, committers, and Summit speakers, this book details every key aspect of Hadoop technology to enable optimal processing of large data sets. Designed expressly for the professional developer, this book skips over the basics of database development to get you acquainted with the framework's processes and capabilities right away. The discussion covers each key Hadoop component individually, culminating in a sample application that brings all of the pieces together to illustrate the cooperation and interplay that make Hadoop a major big data solution. Coverage includes everything from storage and security to computing and user experience, with expert guidance on integrating other software and more.

Hadoop is rapidly gaining market adoption, and more and more developers are being called upon to build big data solutions on the Hadoop framework. This book covers the process from beginning to end, providing a crash course for professionals who need to learn and apply Hadoop quickly.

  • Configure storage, user experience, and in-memory computing
  • Integrate Hadoop with other programs, including Kafka and Storm
  • Master the fundamentals of Apache Bigtop and Ignite
  • Build robust data security with expert tips and advice

Hadoop's popularity is largely due to its accessibility. Open-source and written in Java, the framework offers almost no barrier to entry for experienced database developers already familiar with the skills and requirements real-world programming entails. Professional Hadoop gives you the practical information and framework-specific skills you need quickly.
Introduction
Chapter 1 Hadoop Introduction
    Business Analytics and Big Data
    The Components of Hadoop
        The Distributed File System (HDFS)
        What Is MapReduce?
        What Is YARN?
        What Is ZooKeeper?
        What Is Hive?
    Integration with Other Systems
        The Hadoop Ecosystem
        Data Integration and Hadoop
    Summary
Chapter 2 Storage
    Basics of Hadoop HDFS
        Concept
        Architecture
        Interface
    Setting Up the HDFS Cluster in Distributed Mode
        Install
    Advanced Features of HDFS
        Snapshots
        Offline Viewer
        Tiered Storage
        Erasure Coding
    File Format
    Cloud Storage
    Summary
Chapter 3 Computation
    Basics of Hadoop MapReduce
        Concept
        Architecture
    How to Launch a MapReduce Job
        Writing a Map Task
        Writing a Reduce Task
        Writing a MapReduce Job
        Configurations
    Advanced Features of MapReduce
        Distributed Cache
        Counter
        Job History Server
    The Difference from a Spark Job
    Summary
Chapter 4 User Experience
    Apache Hive
        Hive Installation
        HiveQL
        UDF/SerDe
        Hive Tuning
    Apache Pig
        Pig Installation
        Pig Latin
        UDF
    Hue
        Features
    Apache Oozie
        Oozie Installation
        How Oozie Works
        Workflow/Coordinator
        Oozie CLI
    Summary
Chapter 5 Integration with Other Systems
    Apache Sqoop
        How It Works
    Apache Flume
        How It Works
    Apache Kafka
        How It Works
        Kafka Connect
        Stream Processing
    Apache Storm
        How It Works
        Trident
        Kafka Integration
    Summary
Chapter 6 Hadoop Security
    Securing the Hadoop Cluster
        Perimeter Security
        Authentication Using Kerberos
        Service Level Authorization in Hadoop
        Impersonation
        Securing the HTTP Channel
    Securing Data
        Data Classification
        Bringing Data to the Cluster
        Protecting Data in the Cluster
    Securing Applications
        YARN Architecture
        Application Submission in YARN
    Summary
Chapter 7 Ecosystem at Large: Hadoop with Apache Bigtop
    Basic Concepts
        Software Stacks
        Test Stacks
        Works on My Laptop
    Developing a Custom-Tailored Stack
        Apache Bigtop: The History
        Apache Bigtop: The Concept and Philosophy
        The Structure of the Project
        Meet the Build System
        Toolchain and Development Environment
        BOM Definition
    Deployment
        Bigtop Provisioner
        Master-less Puppet Deployment of a Cluster
        Configuration Management with Puppet
    Integration Validation
        iTests and Validation Applications
        Stack Integration Test Development
        Validating the Stack
        Cluster Failure Tests
        Smoke the Stack
    Putting It All Together
    Summary
Chapter 8 In-Memory Computing in Hadoop Stack
    Introduction to In-Memory Computing
    Apache Ignite: Memory First
        System Architecture of Apache Ignite
        Data Grid
        A Discourse on High Availability
        Compute Grid
        Service Grid
        Memory Management
        Persistence Store
    Legacy Hadoop Acceleration with Ignite
        Benefits of In-Memory Storage
        Memory Filesystem: HDFS Caching
        In-Memory MapReduce
    Advanced Use of Apache Ignite
        Spark and Ignite
        Sharing the State
        In-Memory SQL on Hadoop
        SQL with Ignite
        Streaming with Apache Ignite
    Summary
Glossary
Index
About the Authors

Benoy Antony is an Apache Hadoop Committer and Hadoop Architect at eBay.

Konstantin Boudnik is co-founder and CEO of Memcore.io, and is one of the early developers of Hadoop and a co-author of Apache Bigtop.

Cheryl Adams is a Senior Cloud Data & Infrastructure Architect in the healthcare data realm.

Branky Shao is a software engineer at eBay, and a contributor to the Cascading project.

Cazen Lee is a Software Architect at Samsung SDS.

Kai Sasaki is a Software Engineer at Treasure Data Inc.

Visit us at wrox.com where you have access to free code samples, Programmer to Programmer forums, and discussions on the latest happenings in the industry from around the world.