E-book: Kafka in Action

3.74/5 (73 ratings by Goodreads)
  • Length: 272 pages
  • Pub. date: 22-Mar-2022
  • Publisher: Manning Publications
  • Language: English
  • ISBN-13: 9781638356196
  • Format: EPUB+DRM
  • Price: 38,73 €*
  • * The price is final, i.e. no additional discount will apply.
  • This ebook is for personal use only. E-Books are non-refundable.

DRM restrictions

  • Copying (copy/paste): not allowed
  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you need to install free software to unlock and read it. To read this e-book, you must create an Adobe ID. The ebook can be read and downloaded on up to six devices (single user with the same Adobe ID).

    Required software
    To read this ebook on a mobile device (phone or tablet), you'll need to install the free PocketBook Reader app (iOS/Android).

    To download and read this eBook on a PC or Mac, you need Adobe Digital Editions. (This is a free app developed specifically for eBooks. It is not the same as Adobe Reader, which you probably already have on your computer.)

    This ebook cannot be read with Amazon Kindle.

Kafka in Action is a practical, hands-on guide to building Kafka-based data pipelines. Filled with real-world use cases and scenarios, this book probes Kafka's most common use cases, ranging from simple logging to managing streaming data systems for message routing, analytics, and more.

In systems that handle big data, streaming data, or fast data, it's important to get your data pipelines right. Apache Kafka is a wicked-fast distributed streaming platform that operates as more than just a persistent log or a flexible message queue.

 

Key Features

  • Understanding Kafka's concepts
  • Implementing Kafka as a message queue
  • Setting up and executing basic ETL tasks
  • Recording and consuming streaming data
  • Working with Kafka producers and consumers from Java applications (see the sketch below)
  • Using Kafka as part of a large data project team
  • Performing Kafka developer and admin tasks
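
As a taste of what the producer/consumer material covers, here is a minimal sketch of a Java producer using the official org.apache.kafka:kafka-clients library. It is not code from the book: the broker address (localhost:9092) and topic name (hello-topic) are assumptions for a local test cluster.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class HelloProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumption: a single local broker; adjust for your cluster.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            // try-with-resources closes the producer and flushes pending sends.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // "hello-topic" is a hypothetical topic; create it first
                // or enable automatic topic creation on the broker.
                producer.send(new ProducerRecord<>("hello-topic", "key1", "hello kafka"));
            }
        }
    }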

 

Written for intermediate Java developers or data engineers. No prior knowledge of Kafka is required.

 

About the technology

Apache Kafka is a distributed streaming platform for logging and streaming data between services or applications. With Kafka, it's easy to build applications that can act on or react to data streams as they flow through your system. Operational data monitoring, large-scale message processing, website activity tracking, log aggregation, and more are all possible with Kafka.
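
The simplest form of "reacting to data streams" described above is a consumer poll loop. Below is a minimal sketch using the same kafka-clients library; the broker address, group id (hello-group), and topic name (hello-topic) are illustrative assumptions matching the producer sketch earlier.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class HelloConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("group.id", "hello-group");             // hypothetical consumer group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("hello-topic")); // same hypothetical topic
                // Poll once for demonstration; real applications poll in a loop.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> rec : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                        rec.offset(), rec.key(), rec.value());
                }
            }
        }
    }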

 

Dylan Scott is a software developer with over ten years of experience in Java and Perl. His experience includes implementing Kafka as a messaging system for a large data migration, and he uses Kafka in his work in the insurance industry.
Foreword xv
Preface xvi
Acknowledgments xviii
About This Book xx
About The Authors xxiii
About The Cover Illustration xxiv
Part 1 Getting Started
1 Introduction to Kafka 3
1.1 What is Kafka? 4
1.2 Kafka usage 8
    Kafka for the developer 8
    Explaining Kafka to your manager 9
1.3 Kafka myths 10
    Kafka only works with Hadoop® 10
    Kafka is the same as other message brokers 11
1.4 Kafka in the real world 11
    Early examples 12
    Later examples 13
    When Kafka might not be the right fit 14
1.5 Online resources to get started 15
References 15

2 Getting to know Kafka 17
2.1 Producing and consuming a message 18
2.2 What are brokers? 18
2.3 Tour of Kafka 23
    Producers and consumers 23
    Topics overview 26
    ZooKeeper usage 27
    Kafka's high-level architecture 28
    The commit log 29
2.4 Various source code packages and what they do 30
    Kafka Streams 30
    Kafka Connect 31
    AdminClient package 32
    ksqlDB 32
2.5 Confluent clients 33
2.6 Stream processing and terminology 36
    Stream processing 37
    What exactly-once means 38
References 39

Part 2 Applying Kafka 41

3 Designing a Kafka project 43
3.1 Designing a Kafka project 44
    Taking over an existing data architecture 44
    A first change 44
    Built-in features 44
    Data for our invoices 47
3.2 Sensor event design 49
    Existing issues 49
    Why Kafka is the right fit 51
    Thought starters on our design 52
    User data requirements 53
    High-level plan for applying our questions 54
    Reviewing our blueprint 57
3.3 Format of your data 57
    Plan for data 58
    Dependency setup 59
References 64

4 Producers: Sourcing data 66
4.1 An example 67
    Producer notes 70
4.2 Producer options 70
    Configuring the broker list 71
    How to go fast (or go safer) 72
    Timestamps 74
4.3 Generating code for our requirements 76
    Client and broker versions 84
References 85

5 Consumers: Unlocking data 87
5.1 An example 88
    Consumer options 89
    Understanding our coordinates 92
5.2 How consumers interact 96
5.3 Tracking 96
    Group coordinator 98
    Partition assignment strategy 100
5.4 Marking our place 101
5.5 Reading from a compacted topic 103
5.6 Retrieving code for our factory requirements 103
    Reading options 103
    Requirements 105
References 108

6 Brokers 111
6.1 Introducing the broker 111
6.2 Role of ZooKeeper 112
6.3 Options at the broker level 113
    Kafka's other logs: Application logs 115
    Server log 115
    Managing state 116
6.4 Partition replica leaders and their role 117
    Losing data 119
6.5 Peeking into Kafka 120
    Cluster maintenance 121
    Adding a broker 122
    Upgrading your cluster 122
    Upgrading your clients 122
    Backups 123
6.6 A note on stateful systems 123
6.7 Exercise 125
References 126

7 Topics and partitions 129
7.1 Topics 129
    Topic-creation options 132
    Replication factors 134
7.2 Partitions 134
    Partition location 135
    Viewing our logs 136
7.3 Testing with EmbeddedKafkaCluster 137
    Using Kafka Testcontainers 138
7.4 Topic compaction 139
References 142

8 Kafka storage 144
8.1 How long to store data 145
8.2 Data movement 146
    Keeping the original event 146
    Moving away from a batch mindset 146
8.3 Tools 147
    Apache Flume 147
    Red Hat® Debezium™ 149
    Secor 149
    Example use case for data storage 150
8.4 Bringing data back into Kafka 151
    Tiered storage 152
8.5 Architectures with Kafka 152
    Lambda architecture 153
    Kappa architecture 154
8.6 Multiple cluster setups 155
    Scaling by adding clusters 155
8.7 Cloud- and container-based storage options 155
    Kubernetes clusters 156
References 156

9 Management: Tools and logging 158
9.1 Administration clients 159
    Administration in code with AdminClient 159
    kcat 161
    Confluent REST Proxy API 162
9.2 Running Kafka as a systemd service 163
9.3 Logging 164
    Kafka application logs 164
    ZooKeeper logs 166
9.4 Firewalls 166
    Advertised listeners 166
9.5 Metrics 167
    JMX console 167
9.6 Tracing option 170
    Producer logic 171
    Consumer logic 172
    Overriding clients 173
9.7 General monitoring tools 174
References 176

Part 3 Going Further 179

10 Protecting Kafka 181
10.1 Security basics 183
    Encryption with SSL 183
    SSL between brokers and clients 184
    SSL between brokers 187
10.2 Kerberos and the Simple Authentication and Security Layer (SASL) 187
10.3 Authorization in Kafka 189
    Access control lists (ACLs) 189
    Role-based access control (RBAC) 190
10.4 ZooKeeper 191
    Kerberos setup 191
10.5 Quotas 191
    Network bandwidth quota 192
    Request rate quotas 193
10.6 Data at rest 194
    Managed options 194
References 195

11 Schema registry 197
11.1 A proposed Kafka maturity model 198
    Level 0 198
    Level 1 199
    Level 2 199
    Level 3 200
11.2 The Schema Registry 200
    Installing the Confluent Schema Registry 201
    Registry configuration 201
11.3 Schema features 202
    REST API 202
    Client library 203
11.4 Compatibility rules 205
    Validating schema modifications 205
11.5 Alternative to a schema registry 207
References 208

12 Stream processing with Kafka Streams and ksqlDB 209
12.1 Kafka Streams 210
    KStreams API DSL 211
    KTable API 215
    GlobalKTable API 216
    Processor API 216
    Kafka Streams setup 218
12.2 ksqlDB: An event-streaming database 219
    Queries 220
    Local development 220
    ksqlDB architecture 222
12.3 Going further 223
    Kafka Improvement Proposals (KIPs) 223
    Kafka projects you can explore 223
    Community Slack channel 224
References 224

Appendix A Installation 227
Appendix B Client example 234
Index 239