Change Data Capture (CDC) is a design pattern that tracks and streams every change (insert, update, delete) made to a dataset in near real time. Originally developed for database replication, CDC has become the backbone of modern event-driven architectures, enabling real-time analytics, microservice synchronization, and data lake ingestion with minimal impact on source system performance. Unlike batch ETL, which periodically polls tables, log-based CDC reads the transaction log (the append-only journal that ACID databases typically maintain for durability and recovery) and converts those low-level log entries into structured change streams. This log-based approach can deliver sub-second latency at a fraction of the resource cost of repeated polling, which is why it has become the de facto standard for keeping analytical and operational systems in sync.
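
To make the idea of a "structured change stream" concrete, here is a minimal sketch of a consumer that replays change events into an in-memory replica. The `ChangeEvent` shape, the `apply_event` and `replay` helpers, and the sample events are all hypothetical and exist only for illustration; real CDC tools define their own event envelopes and delivery guarantees.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change-event shape; real CDC tools define their own envelope."""
    op: str                 # "insert", "update", or "delete"
    key: int                # primary key of the affected row
    before: Optional[Row]   # row image before the change (None for inserts)
    after: Optional[Row]    # row image after the change (None for deletes)


def apply_event(replica: Dict[int, Row], event: ChangeEvent) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    if event.op in ("insert", "update"):
        if event.after is None:
            raise ValueError(f"{event.op} event needs an 'after' image")
        replica[event.key] = dict(event.after)
    elif event.op == "delete":
        replica.pop(event.key, None)  # tolerate a replayed delete
    else:
        raise ValueError(f"unknown op: {event.op!r}")


def replay(events: List[ChangeEvent]) -> Dict[int, Row]:
    """Rebuild replica state by applying events in log (commit) order."""
    replica: Dict[int, Row] = {}
    for event in events:
        apply_event(replica, event)
    return replica


# A tiny, made-up change stream in commit order.
EVENTS = [
    ChangeEvent("insert", 1, None, {"id": 1, "status": "new"}),
    ChangeEvent("insert", 2, None, {"id": 2, "status": "new"}),
    ChangeEvent("update", 1, {"id": 1, "status": "new"}, {"id": 1, "status": "paid"}),
    ChangeEvent("delete", 2, {"id": 2, "status": "new"}, None),
]

if __name__ == "__main__":
    print(replay(EVENTS))  # {1: {'id': 1, 'status': 'paid'}}
```

Because the events carry full row images and are applied in commit order, the replica converges on the source's final state, including the delete of row 2, without ever querying the source table.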
What This Cheat Sheet Covers
This cheat sheet spans 20 focused tables and 105 indexed concepts. Below is a table-by-table outline, from foundational concepts through advanced details.
Table 1: CDC Implementation Approaches
| Approach | Example | Description |
|---|---|---|
| Log-based | PostgreSQL WAL → Debezium<br>MySQL binlog → Kafka Connect | • Reads database transaction logs (WAL, binlog, redo logs) to capture every change without querying tables<br>• Lowest latency (sub-second) and minimal source impact, but requires log access permissions. |
| Trigger-based | `CREATE TRIGGER on_update`<br>`INSERT INTO cdc_table` | • Database triggers write change records to a shadow table on every INSERT/UPDATE/DELETE<br>• Simple to set up, but adds write overhead to every transaction and may miss DDL changes. |
| Query-based (polling) | `SELECT * WHERE updated_at > MAX(last_sync_time)` | • Periodically polls tables, filtering by a timestamp or sequence column<br>• Highest latency (minutes) and cannot detect hard deletes, but works when log access is unavailable. |
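
The trade-off between the trigger and polling approaches in the table above can be sketched with Python's built-in `sqlite3` module. This is an illustrative setup only: the `orders` and `orders_cdc` tables, the trigger names, and the integer `updated_at` logical clock are assumptions made for the example, and SQLite is used purely because it needs no server. Log-based capture is not shown, since it depends on engine-specific log access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    status     TEXT NOT NULL,
    updated_at INTEGER NOT NULL      -- logical clock standing in for a timestamp
);
CREATE TABLE orders_cdc (            -- shadow table written by the triggers
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,
    op     TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    status TEXT
);
CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    INSERT INTO orders_cdc (op, row_id, status) VALUES ('I', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_au AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_cdc (op, row_id, status) VALUES ('U', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
    INSERT INTO orders_cdc (op, row_id, status) VALUES ('D', OLD.id, NULL);
END;
""")


def poll_changes(db, last_sync):
    """Query-based CDC: rows whose updated_at is past the last sync point."""
    return db.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY id",
        (last_sync,),
    ).fetchall()


def trigger_changes(db, last_seq):
    """Trigger-based CDC: shadow-table records past the last consumed seq."""
    return db.execute(
        "SELECT op, row_id, status FROM orders_cdc WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


# Initial load at logical time 1; both consumers are then caught up.
conn.execute("INSERT INTO orders VALUES (1, 'new', 1)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 1)")
last_sync = 1
last_seq = conn.execute("SELECT MAX(seq) FROM orders_cdc").fetchone()[0]

# Changes at logical time 2: one update and one hard delete.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 2 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

polled = poll_changes(conn, last_sync)       # [(1, 'paid', 2)]
captured = trigger_changes(conn, last_seq)   # [('U', 1, 'paid'), ('D', 2, None)]
print(polled)
print(captured)
```

The poller sees only the updated row, because the deleted row no longer exists to be selected, while the shadow table records both the update and the delete at the cost of one extra write per change.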