Change Data Capture (CDC) Cheat Sheet

Updated 2026-04-12

Change Data Capture (CDC) is a design pattern that tracks and streams every change (insert, update, delete) made to a dataset in near real-time. Originally developed for database replication, CDC has evolved into the backbone of modern event-driven architectures, enabling real-time analytics, microservices synchronization, and data lake ingestion with minimal impact on source system performance. Unlike batch ETL, which periodically polls tables, log-based CDC reads the transaction log (the append-only journal that transactional databases maintain for durability and recovery) and converts those low-level log events into structured change streams. This log-based approach can deliver sub-second latency at a fraction of the resource cost of repeated table scans, making CDC the de facto standard for keeping analytical and operational systems in sync.
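To make the idea of turning low-level log records into a structured change stream concrete, here is a minimal sketch. It is a toy model, not any real WAL or binlog format: `RAW_LOG`, `ChangeEvent`, and `read_changes` are hypothetical names invented for illustration, and the log is just an in-memory list of tuples.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Structured change record (hypothetical shape, for illustration only)."""
    position: int            # offset of the record in the log
    op: str                  # "insert", "update", or "delete"
    table: str
    key: int
    before: Optional[dict]   # row image before the change (None for inserts)
    after: Optional[dict]    # row image after the change (None for deletes)


# Hypothetical stand-in for a transaction log: an append-only list of raw tuples.
RAW_LOG = [
    ("I", "users", 1, None, {"name": "Ada"}),
    ("U", "users", 1, {"name": "Ada"}, {"name": "Ada L."}),
    ("D", "users", 1, {"name": "Ada L."}, None),
]

OP_NAMES = {"I": "insert", "U": "update", "D": "delete"}


def read_changes(log: list, start_position: int = 0) -> Iterator[ChangeEvent]:
    """Yield one structured event per raw log record, starting at start_position."""
    for position in range(start_position, len(log)):
        code, table, key, before, after = log[position]
        yield ChangeEvent(position, OP_NAMES[code], table, key, before, after)


# Read the whole log, then resume from a saved position as a restarted consumer would.
events = list(read_changes(RAW_LOG))
resumed = list(read_changes(RAW_LOG, start_position=2))

for event in events:
    print(event.position, event.op, event.before, event.after)
```

Because the reader only follows the log from a saved position, it never queries the source table, and a delete shows up as an ordinary event with a `before` image and no `after` image.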

What This Cheat Sheet Covers

This topic spans 20 focused tables and 105 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.

Table 1: CDC Implementation Approaches
Table 2: Database-Specific CDC Implementations
Table 3: CDC Tools and Platforms
Table 4: Delivery Guarantees and Semantics
Table 5: Event Ordering and Deduplication
Table 6: Schema Evolution and Handling
Table 7: Snapshot and Initial Load Strategies
Table 8: Monitoring and Operational Metrics
Table 9: Error Handling and Recovery
Table 10: Performance Optimization Techniques
Table 11: Transformation and Enrichment
Table 12: SCD Type 2 via CDC
Table 13: Cloud-Specific CDC Services
Table 14: Testing and Validation
Table 15: Security and Compliance
Table 16: Advanced Patterns and Use Cases
Table 17: Tombstone Events and Delete Handling
Table 18: Latency and Performance Tradeoffs
Table 19: PostgreSQL-Specific Considerations
Table 20: Operational Best Practices

Table 1: CDC Implementation Approaches

Approach	Example	Description
Log-Based CDC	`PostgreSQL WAL → Debezium` `MySQL binlog → Kafka Connect`	• Reads database transaction logs (WAL, binlog, redo logs) to capture every change without querying tables • lowest latency (sub-second) and minimal source impact, but requires log access permissions.
Trigger-Based CDC	`CREATE TRIGGER on_update` `INSERT INTO cdc_table`	• Database triggers write change records to a shadow table on every INSERT/UPDATE/DELETE • simple to set up but adds write overhead to every transaction and may miss DDL changes.
Query-Based CDC (Polling)	`SELECT * FROM <table>` `WHERE updated_at > :last_sync_time`	• Periodically polls tables filtering by timestamp or sequence column • highest latency (bounded by the polling interval, typically minutes) and cannot detect hard deletes or intermediate states between polls, but works when log access is unavailable.
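The trigger-based and query-based rows above can be compared side by side with a small sketch using Python's built-in `sqlite3` module. The `orders` table, the trigger names, the `poll` helper, and the integer `updated_at` column (a logical clock standing in for a real timestamp) are all assumptions made for illustration; only `cdc_table` comes from the table's own example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    updated_at INTEGER NOT NULL      -- logical clock standing in for a timestamp
);
-- Shadow table populated by triggers (trigger-based CDC).
CREATE TABLE cdc_table (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    order_id INTEGER NOT NULL,
    status TEXT
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO cdc_table (op, order_id, status) VALUES ('insert', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO cdc_table (op, order_id, status) VALUES ('update', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO cdc_table (op, order_id, status) VALUES ('delete', OLD.id, NULL);
END;
""")


def poll(connection, last_sync_time):
    """Query-based CDC: return rows changed since last_sync_time and the new high-water mark."""
    rows = connection.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync_time,),
    ).fetchall()
    new_mark = max((row[2] for row in rows), default=last_sync_time)
    return rows, new_mark


conn.execute("INSERT INTO orders VALUES (1, 'new', 1)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 2)")
first_poll, mark = poll(conn, 0)

conn.execute("UPDATE orders SET status = 'paid', updated_at = 3 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")   # hard delete
second_poll, mark = poll(conn, mark)

trigger_ops = conn.execute(
    "SELECT op, order_id FROM cdc_table ORDER BY seq"
).fetchall()

print(second_poll)   # [(1, 'paid', 3)]  -- the delete of order 2 is invisible to polling
print(trigger_ops)   # [('insert', 1), ('insert', 2), ('update', 1), ('delete', 2)]
```

The second poll sees only the updated row because the deleted row no longer exists to be selected, while the shadow table records the delete at the cost of an extra write inside every source transaction.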
