E-book: Kafka in Action

3.74/5 (73 ratings by Goodreads)
  • Length: 272 pages
  • Pub. date: 22-Mar-2022
  • Publisher: Manning Publications
  • Language: English
  • ISBN-13: 9781638356196
  • Format: EPUB+DRM
  • Price: 38,73 €*
  • * The price is final, i.e. no additional discount will apply.
  • This ebook is for personal use only. E-Books are non-refundable.

DRM restrictions

  • Copying (copy/paste): not allowed
  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you need to install free software to unlock and read it. To read this e-book, you must create an Adobe ID. The ebook can be read and downloaded on up to six devices (single user with the same Adobe ID).

    Required software
    To read this ebook on a mobile device (phone or tablet), you'll need to install the free PocketBook Reader app (iOS/Android).

    To download and read this eBook on a PC or Mac, you need Adobe Digital Editions. (This is a free app developed specifically for eBooks. It is not the same as Adobe Reader, which you probably already have on your computer.)

    This ebook cannot be read with Amazon Kindle.

Kafka in Action is a practical, hands-on guide to building Kafka-based data pipelines. Filled with real-world use cases and scenarios, this book probes Kafka's most common use cases, ranging from simple logging to managing streaming data systems for message routing, analytics, and more.

In systems that handle big data, streaming data, or fast data, it's important to get your data pipelines right. Apache Kafka is a wicked-fast distributed streaming platform that operates as more than just a persistent log or a flexible message queue.

 

Key Features

  • Understanding Kafka's concepts
  • Implementing Kafka as a message queue
  • Setting up and executing basic ETL tasks
  • Recording and consuming streaming data
  • Working with Kafka producers and consumers from Java applications (see the sketch below)
  • Using Kafka as part of a large data project team
  • Performing Kafka developer and admin tasks
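
As a taste of what the producer/consumer material covers, here is a minimal sketch of a Java producer using the official org.apache.kafka:kafka-clients library. It is not code from the book: the broker address (localhost:9092) and topic name (hello-topic) are assumptions for a local test cluster.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class HelloProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumption: a single local broker; adjust for your cluster.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            // try-with-resources closes the producer and flushes pending sends.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // "hello-topic" is a hypothetical topic; create it first
                // or enable automatic topic creation on the broker.
                producer.send(new ProducerRecord<>("hello-topic", "key1", "hello kafka"));
            }
        }
    }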

 

Written for intermediate Java developers or data engineers. No prior knowledge of Kafka is required.

 

About the technology

Apache Kafka is a distributed streaming platform for logging and streaming data between services or applications. With Kafka, it's easy to build applications that can act on or react to data streams as they flow through your system. Operational data monitoring, large-scale message processing, website activity tracking, log aggregation, and more are all possible with Kafka.
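
The simplest form of "reacting to data streams" described above is a consumer poll loop. Below is a minimal sketch using the same kafka-clients library; the broker address, group id (hello-group), and topic name (hello-topic) are illustrative assumptions matching the producer sketch earlier.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class HelloConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("group.id", "hello-group");             // hypothetical consumer group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("hello-topic")); // same hypothetical topic
                // Poll once for demonstration; real applications poll in a loop.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> rec : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                        rec.offset(), rec.key(), rec.value());
                }
            }
        }
    }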

 

Dylan Scott is a software developer with over ten years of experience in Java and Perl. His experience includes implementing Kafka as a messaging system for a large data migration, and he uses Kafka in his work in the insurance industry.
Foreword xv
Preface xvi
Acknowledgments xviii
About This Book xx
About The Authors xxiii
About The Cover Illustration xxiv
Part 1 Getting Started
1 Introduction to Kafka 3
1.1 What is Kafka? 4
1.2 Kafka usage 8
    Kafka for the developer 8
    Explaining Kafka to your manager 9
1.3 Kafka myths 10
    Kafka only works with Hadoop® 10
    Kafka is the same as other message brokers 11
1.4 Kafka in the real world 11
    Early examples 12
    Later examples 13
    When Kafka might not be the right fit 14
1.5 Online resources to get started 15
References 15

2 Getting to know Kafka 17
2.1 Producing and consuming a message 18
2.2 What are brokers? 18
2.3 Tour of Kafka 23
    Producers and consumers 23
    Topics overview 26
    ZooKeeper usage 27
    Kafka's high-level architecture 28
    The commit log 29
2.4 Various source code packages and what they do 30
    Kafka Streams 30
    Kafka Connect 31
    AdminClient package 32
    ksqlDB 32
2.5 Confluent clients 33
2.6 Stream processing and terminology 36
    Stream processing 37
    What exactly-once means 38
References 39

Part 2 Applying Kafka 41

3 Designing a Kafka project 43
3.1 Designing a Kafka project 44
    Taking over an existing data architecture 44
    A first change 44
    Built-in features 44
    Data for our invoices 47
3.2 Sensor event design 49
    Existing issues 49
    Why Kafka is the right fit 51
    Thought starters on our design 52
    User data requirements 53
    High-level plan for applying our questions 54
    Reviewing our blueprint 57
3.3 Format of your data 57
    Plan for data 58
    Dependency setup 59
References 64

4 Producers: Sourcing data 66
4.1 An example 67
    Producer notes 70
4.2 Producer options 70
    Configuring the broker list 71
    How to go fast (or go safer) 72
    Timestamps 74
4.3 Generating code for our requirements 76
    Client and broker versions 84
References 85

5 Consumers: Unlocking data 87
5.1 An example 88
    Consumer options 89
    Understanding our coordinates 92
5.2 How consumers interact 96
5.3 Tracking 96
    Group coordinator 98
    Partition assignment strategy 100
5.4 Marking our place 101
5.5 Reading from a compacted topic 103
5.6 Retrieving code for our factory requirements 103
    Reading options 103
    Requirements 105
References 108

6 Brokers 111
6.1 Introducing the broker 111
6.2 Role of ZooKeeper 112
6.3 Options at the broker level 113
    Kafka's other logs: Application logs 115
    Server log 115
    Managing state 116
6.4 Partition replica leaders and their role 117
    Losing data 119
6.5 Peeking into Kafka 120
    Cluster maintenance 121
    Adding a broker 122
    Upgrading your cluster 122
    Upgrading your clients 122
    Backups 123
6.6 A note on stateful systems 123
6.7 Exercise 125
References 126

7 Topics and partitions 129
7.1 Topics 129
    Topic-creation options 132
    Replication factors 134
7.2 Partitions 134
    Partition location 135
    Viewing our logs 136
7.3 Testing with EmbeddedKafkaCluster 137
    Using Kafka Testcontainers 138
7.4 Topic compaction 139
References 142

8 Kafka storage 144
8.1 How long to store data 145
8.2 Data movement 146
    Keeping the original event 146
    Moving away from a batch mindset 146
8.3 Tools 147
    Apache Flume 147
    Red Hat® Debezium™ 149
    Secor 149
    Example use case for data storage 150
8.4 Bringing data back into Kafka 151
    Tiered storage 152
8.5 Architectures with Kafka 152
    Lambda architecture 153
    Kappa architecture 154
8.6 Multiple cluster setups 155
    Scaling by adding clusters 155
8.7 Cloud- and container-based storage options 155
    Kubernetes clusters 156
References 156

9 Management: Tools and logging 158
9.1 Administration clients 159
    Administration in code with AdminClient 159
    kcat 161
    Confluent REST Proxy API 162
9.2 Running Kafka as a systemd service 163
9.3 Logging 164
    Kafka application logs 164
    ZooKeeper logs 166
9.4 Firewalls 166
    Advertised listeners 166
9.5 Metrics 167
    JMX console 167
9.6 Tracing option 170
    Producer logic 171
    Consumer logic 172
    Overriding clients 173
9.7 General monitoring tools 174
References 176

Part 3 Going Further 179

10 Protecting Kafka 181
10.1 Security basics 183
    Encryption with SSL 183
    SSL between brokers and clients 184
    SSL between brokers 187
10.2 Kerberos and the Simple Authentication and Security Layer (SASL) 187
10.3 Authorization in Kafka 189
    Access control lists (ACLs) 189
    Role-based access control (RBAC) 190
10.4 ZooKeeper 191
    Kerberos setup 191
10.5 Quotas 191
    Network bandwidth quota 192
    Request rate quotas 193
10.6 Data at rest 194
    Managed options 194
References 195

11 Schema registry 197
11.1 A proposed Kafka maturity model 198
    Level 0 198
    Level 1 199
    Level 2 199
    Level 3 200
11.2 The Schema Registry 200
    Installing the Confluent Schema Registry 201
    Registry configuration 201
11.3 Schema features 202
    REST API 202
    Client library 203
11.4 Compatibility rules 205
    Validating schema modifications 205
11.5 Alternative to a schema registry 207
References 208

12 Stream processing with Kafka Streams and ksqlDB 209
12.1 Kafka Streams 210
    KStreams API DSL 211
    KTable API 215
    GlobalKTable API 216
    Processor API 216
    Kafka Streams setup 218
12.2 ksqlDB: An event-streaming database 219
    Queries 220
    Local development 220
    ksqlDB architecture 222
12.3 Going further 223
    Kafka Improvement Proposals (KIPs) 223
    Kafka projects you can explore 223
    Community Slack channel 224
References 224

Appendix A Installation 227
Appendix B Client example 234
Index 239