Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and scalable data pipelines. Originally developed at LinkedIn and later open-sourced as an Apache project, Kafka enables real-time data movement through a publish-subscribe model built on append-only logs. Its ability to handle massive message volumes while maintaining durability and per-partition ordering makes it foundational for microservices, data lakes, and stream processing applications. With Kafka 4.0 (March 2025), ZooKeeper was completely removed in favor of the built-in KRaft consensus system, and Kafka 4.2 (February 2026) introduced production-ready Share Groups (queue semantics) alongside a broker-driven Streams rebalance protocol. Understanding Kafka's architecture (topics, partitions, brokers, producers, and consumers) unlocks the ability to build systems that react to data in real time rather than batch-process it hours later.
What This Cheat Sheet Covers
This topic spans 22 focused tables and 193 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Concepts
| Concept | Example | Description |
|---|---|---|
| Topic | user-events | • Named append-only log where producers write records • logically divided into partitions for parallelism. |
| Partition | topic: orders, partition: 0 | • Ordered, immutable sequence of records within a topic • unit of parallelism • each partition has one leader and replicas. |
| Broker | kafka-broker-1:9092 | • Kafka server instance that stores and serves data • manages topic partitions and handles client requests. |
| Producer | KafkaProducer<K, V> | • Client application that publishes records to topics • determines partition assignment via key or custom partitioner. |
| Consumer | KafkaConsumer<K, V> | • Client that reads records from topics • tracks position via offset • can be part of consumer group for parallelism. |
| Consumer Group | group.id=analytics-team | • Set of consumers sharing workload across partitions • each partition assigned to one consumer in group • enables horizontal scaling. |
| Share Group | KafkaShareConsumer<K, V> | • Consumer group type (KIP-932, GA in Kafka 4.2) enabling queue semantics • multiple consumers share records from the same partition • uses acknowledgements instead of committed offsets. |
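
The relationships in Table 1 can be sketched with a small in-memory model. This is a toy illustration using only the Java standard library, not the Kafka client API: the class `CoreConceptsSketch` and its methods `append`, `partitionFor`, and `assign` are hypothetical names invented for this example. Two simplifications are assumed: `String.hashCode()` stands in for the hash Kafka's default partitioner applies to the serialized key, and a plain round-robin loop stands in for Kafka's real partition assignors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the concepts in Table 1. This is NOT the Kafka client API.
class CoreConceptsSketch {

    // A topic is a fixed set of partitions; each partition is an append-only list.
    static final class Topic {
        final String name;
        final List<List<String>> partitions = new ArrayList<>();

        Topic(String name, int partitionCount) {
            this.name = name;
            for (int i = 0; i < partitionCount; i++) {
                partitions.add(new ArrayList<>());
            }
        }

        // Producer side: pick a partition from the key, append the value,
        // and return the offset (position in that partition) of the new record.
        long append(String key, String value) {
            List<String> log = partitions.get(partitionFor(key, partitions.size()));
            log.add(value);
            return log.size() - 1;
        }
    }

    // Assumption: String.hashCode() is a stand-in for the hash of the serialized key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Consumer-group side: spread partitions round-robin so that every
    // partition has exactly one owner within the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            owned.get(consumers.get(p % consumers.size())).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        Topic orders = new Topic("orders", 3);
        // Same key, same partition: the two records get consecutive offsets.
        long first = orders.append("customer-42", "order created");
        long second = orders.append("customer-42", "order paid");
        System.out.println("offsets for customer-42: " + first + ", " + second);
        System.out.println("assignment: " + assign(List.of("consumer-a", "consumer-b"), 3));
    }
}
```

Because both records share the key `customer-42`, they land in the same partition and keep their relative order, which is the per-key ordering guarantee the Partition and Producer rows describe. The `assign` method mirrors the Consumer Group row: adding consumers spreads partitions across them, but a partition never has two owners in the same group. Share Groups relax exactly that last constraint.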