Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and scalable data pipelines. Originally developed at LinkedIn and later open-sourced as an Apache project, Kafka moves data in real time through a publish-subscribe model built on append-only logs. Its ability to handle massive message volumes while maintaining durability and per-partition ordering makes it foundational for microservices, data lakes, and stream processing applications. Kafka 4.0 (March 2025) removed ZooKeeper entirely in favor of the built-in KRaft consensus protocol, and Kafka 4.2 (February 2026) introduced production-ready Share Groups (queue semantics) alongside a broker-driven Streams rebalance protocol. Understanding Kafka's architecture (topics, partitions, brokers, producers, and consumers) unlocks the ability to build systems that react to data in real time rather than batch-process it hours later.
What This Cheat Sheet Covers
This cheat sheet spans 22 focused tables and 193 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Concepts
These are the building blocks every Kafka conversation assumes you already know — the vocabulary of topics, partitions, brokers, producers, and consumers that the rest of this cheat sheet builds on. The mental model worth holding onto is that a topic is just an append-only log sliced into partitions, and almost everything else (ordering, parallelism, fault tolerance) follows from how those partitions are written, replicated, and consumed.
| Concept | Example | Description |
|---|---|---|
| Topic | `user-events` | • Named append-only log where producers write records • logically divided into partitions for parallelism. |
| Partition | `topic: orders, partition: 0` | • Ordered, immutable sequence of records within a topic • unit of parallelism • each partition has one leader and zero or more follower replicas. |
| Broker | `kafka-broker-1:9092` | • Kafka server instance that stores and serves data • manages topic partitions and handles client requests. |
| Producer | `KafkaProducer<K, V>` | • Client application that publishes records to topics • determines partition assignment via the record key or a custom partitioner. |
| Consumer | `KafkaConsumer<K, V>` | • Client that reads records from topics • tracks its position via offsets • can be part of a consumer group for parallelism. |
| Consumer Group | `group.id=analytics-team` | • Set of consumers sharing the workload across partitions • each partition is assigned to exactly one consumer in the group • enables horizontal scaling. |
| Share Group | `KafkaShareConsumer<K, V>` | • Consumer group type (KIP-932, GA in Kafka 4.2) enabling queue semantics • multiple consumers share records from the same partition • uses acknowledgements instead of committed offsets. |
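
To make the mental model above concrete, here is a minimal in-memory sketch of a topic as a set of append-only partition logs, plus a simplified consumer-group assignment. It is a toy illustration, not Kafka client code: the `Topic` class and `assign_partitions` function are hypothetical names, the CRC32 hash stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records, and the round-robin assignment is a simplification of Kafka's real assignors.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a list of append-only partition logs (hypothetical)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 is a stand-in hash; Kafka's default partitioner
        # uses murmur2 on the serialized key. The idea is the same:
        # equal keys always map to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record to its partition and return (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1

    def read(self, partition, offset):
        """Return records from `offset` onward; reading never removes them."""
        return self.partitions[partition][offset:]


def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = defaultdict(list)
    members = sorted(consumers)
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return dict(assignment)


topic = Topic("orders", num_partitions=3)
for i in range(4):
    topic.append("customer-42", f"order-{i}")

# All records with the same key land in one partition, in write order.
p = topic.partition_for("customer-42")
values = [value for _, value in topic.read(p, 0)]

# Two consumers split three partitions; no partition is shared.
assignment = assign_partitions(3, ["consumer-a", "consumer-b"])
print(values)
print(assignment)
```

Because every record for `customer-42` hashes to the same partition, a consumer reading that partition from offset 0 sees the orders in the order they were written. Ordering across different partitions is not guaranteed. In this sketch, a group with more consumers than partitions would leave the extra consumers without an assignment, which mirrors why the partition count caps a consumer group's parallelism.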