Apache Kafka Cheat Sheet

Updated 2026-04-27


Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and scalable data pipelines. Originally developed at LinkedIn and later open-sourced as an Apache project, Kafka enables real-time data movement through a publish-subscribe model built on append-only logs. Its ability to handle massive message volumes while maintaining durability and per-partition ordering makes it foundational for microservices, data lakes, and stream processing applications. With Kafka 4.0 (March 2025), ZooKeeper was removed entirely in favor of the built-in KRaft consensus system, and Kafka 4.2 (February 2026) introduced production-ready Share Groups (queue semantics) alongside a broker-driven Streams rebalance protocol. Understanding Kafka's architecture (topics, partitions, brokers, producers, and consumers) unlocks the ability to build systems that react to data in real time rather than batch-processing it hours later.

What This Cheat Sheet Covers

This cheat sheet spans 22 focused tables and 193 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Concepts
Table 2: Topic and Partition Configuration
Table 3: Producer Configuration
Table 4: Consumer Configuration
Table 5: Offset Management
Table 6: Rebalancing Strategies
Table 7: Message Delivery Semantics
Table 8: Serialization and Schema Management
Table 9: Compression Algorithms
Table 10: Kafka Streams Basics
Table 11: Windowing Types
Table 12: Kafka Connect
Table 13: CLI Tools
Table 14: Security Configuration
Table 15: Performance Tuning
Table 16: Monitoring Metrics
Table 17: Cluster Management
Table 18: KRaft Mode (ZooKeeper-Free)
Table 19: Advanced Topics
Table 20: Error Handling and Retry Patterns
Table 21: Testing Strategies
Table 22: Share Groups (Queues for Kafka)

Table 1: Core Concepts

These are the building blocks every Kafka conversation assumes you already know — the vocabulary of topics, partitions, brokers, producers, and consumers that the rest of this cheat sheet builds on. The mental model worth holding onto is that a topic is just an append-only log sliced into partitions, and almost everything else (ordering, parallelism, fault tolerance) follows from how those partitions are written, replicated, and consumed.
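
The mental model above can be made concrete with a small in-memory sketch. This is a toy model, not Kafka client code: the class `ToyTopic` and its methods are hypothetical names invented for illustration, and `String.hashCode()` stands in for a real partitioner (Kafka's Java client hashes the serialized key bytes with murmur2 for keyed records). It only aims to show how a topic splits into partitions, how a key pins records to one partition, and how an offset is simply a position in that partition's log.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a topic (hypothetical, for illustration only).
final class ToyTopic {
    // One append-only list per partition; a record's offset is its index in that list.
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Stand-in partitioner (assumption: String.hashCode instead of murmur2).
    // The same key always maps to the same partition.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends to the end of the chosen partition and returns {partition, offset}.
    int[] append(String key, String value) {
        int p = partitionFor(key);
        List<String> log = partitions.get(p);
        log.add(value);
        return new int[] {p, log.size() - 1};
    }

    // Reads a partition from the given offset onward, in append order.
    List<String> read(int partition, int fromOffset) {
        List<String> log = partitions.get(partition);
        int start = Math.min(fromOffset, log.size());
        return new ArrayList<>(log.subList(start, log.size()));
    }
}
```

Appending two records with the key `user-1` lands both in the same partition at consecutive offsets, which is why ordering holds per key within a partition but not across the whole topic.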

Concept	Example	Description
Topic	`user-events`	• Named append-only log where producers write records • logically divided into partitions for parallelism.
Partition	`topic: orders, partition: 0`	• Ordered, immutable sequence of records within a topic • unit of parallelism • each partition has one leader replica and zero or more follower replicas.
Broker	`kafka-broker-1:9092`	• Kafka server instance that stores and serves data • manages topic partitions and handles client requests.
Producer	`KafkaProducer<K, V>`	• Client application that publishes records to topics • determines partition assignment via key or custom partitioner.
Consumer	`KafkaConsumer<K, V>`	• Client that reads records from topics • tracks its position via offsets • can join a consumer group for parallelism.
Consumer Group	`group.id=analytics-team`	• Set of consumers sharing the workload across partitions • each partition is assigned to exactly one consumer in the group • enables horizontal scaling.
Share Group	`KafkaShareConsumer<K, V>`	• Consumer group type (KIP-932, GA in Kafka 4.2) enabling queue semantics • multiple consumers share records from the same partition • uses acknowledgements instead of committed offsets.
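
The Consumer Group row says each partition is assigned to exactly one consumer in the group. A minimal sketch of that rule follows, assuming a simple round-robin deal. `ToyGroupAssignment` and `assign` are hypothetical names, and this is not the implementation of any of Kafka's built-in assignors; it only illustrates the one-owner-per-partition invariant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy partition assignment for a consumer group (hypothetical, for illustration only).
final class ToyGroupAssignment {
    private ToyGroupAssignment() {}

    // Deals partitions 0..numPartitions-1 out to consumers in turn, so every
    // partition has exactly one owner within the group.
    // Assumes consumer names are unique.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return owned;
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            owned.get(owner).add(p);
        }
        return owned;
    }
}
```

With three partitions and four consumers, the fourth consumer receives no partitions, so in this model the partition count caps the parallelism of a consumer group. Share groups, as described in the table, relax this by letting multiple consumers share records from the same partition.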
