Apache Iceberg Cheat Sheet

Updated 2026-04-12

Apache Iceberg is an open table format for managing large analytic datasets on object storage (S3, ADLS, GCS). Originally developed at Netflix to address the limitations of Hive tables, Iceberg brings ACID transactions, schema evolution, and time travel to data lakes, which makes reliable lakehouse architectures possible. Unlike file formats (Parquet, Avro, ORC), Iceberg defines how data files are organized into logical tables with consistent point-in-time snapshots. The key insight: Iceberg replaces expensive directory listings with a metadata tree of three layers above the data files (metadata JSON → manifest lists → manifest files → data files), so query planning reads metadata instead of listing directories. Production deployments manage petabyte-scale tables with tens of millions of files. What distinguishes Iceberg is hidden partitioning (users filter on raw column values and partition transforms are applied transparently), partition evolution (the partitioning scheme can change without rewriting existing data), and vendor-neutral governance under the Apache Software Foundation, with broad multi-engine support across Spark, Flink, Trino, Snowflake, BigQuery, and DuckDB.
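To make the hidden-partitioning idea concrete, here is a minimal pure-Python sketch. It is not the Iceberg or PyIceberg API: the function names (`day_transform`, `add_file`, `prune_files`), the file paths, and the `files_by_day` dictionary are all hypothetical stand-ins. The only part taken from Iceberg itself is the definition of the day transform as the number of days since 1970-01-01. The writer derives the partition value from the raw timestamp, and the reader filters on raw timestamps while the pruning step applies the same transform behind the scenes.

```python
from datetime import datetime, timezone

# Illustrative only: a toy model of hidden partitioning, not Iceberg library code.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts):
    """Map a timestamp to days since 1970-01-01, like Iceberg's day transform."""
    return (ts - EPOCH).days


def add_file(files_by_day, path, event_ts):
    """Writer side: the partition value is derived from the raw column value."""
    files_by_day.setdefault(day_transform(event_ts), []).append(path)


def prune_files(files_by_day, start, end):
    """Reader side: the filter is on raw timestamps; the transform is applied
    here to decide which partitions can contain matching rows."""
    lo, hi = day_transform(start), day_transform(end)
    return [
        path
        for day in sorted(files_by_day)
        if lo <= day <= hi
        for path in files_by_day[day]
    ]


# Hypothetical data files, each holding events from a single day.
files_by_day = {}
add_file(files_by_day, "data/f1.parquet", datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc))
add_file(files_by_day, "data/f2.parquet", datetime(2024, 1, 2, 23, 59, tzinfo=timezone.utc))
add_file(files_by_day, "data/f3.parquet", datetime(2024, 1, 5, 0, 0, tzinfo=timezone.utc))

# The "query" mentions only raw timestamps, never a partition column.
selected = prune_files(
    files_by_day,
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc),
)
print(selected)  # ['data/f2.parquet']
```

In a real engine the same effect comes from writing an ordinary predicate on the timestamp column; the sketch only shows why no separate, user-maintained partition column is needed.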

What This Cheat Sheet Covers

This topic spans 23 focused tables and 179 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Table Format Concepts
Table 2: ACID Transaction Semantics
Table 3: Partition Transforms (Hidden Partitioning)
Table 4: Schema Evolution Operations
Table 5: Time Travel and Snapshots
Table 6: Row-Level Modifications (V2+)
Table 7: Catalog Types
Table 8: Engine Integrations
Table 9: Table Maintenance Procedures
Table 10: Performance Optimization
Table 11: Branching and Tagging
Table 12: Data Types
Table 13: Iceberg Format Versions
Table 14: Metadata Tables (System Queries)
Table 15: Spark Procedures (Iceberg SQL Extensions)
Table 16: File Formats
Table 17: Streaming and CDC Integration
Table 18: Security and Governance
Table 19: Cloud Platform Integrations
Table 20: Python and Language APIs
Table 21: Table Comparison (Iceberg vs Delta vs Hudi)
Table 22: Production Best Practices
Table 23: Common Anti-Patterns to Avoid

Table 1: Core Table Format Concepts

| Concept | Example | Description |
|---|---|---|
| Table Format | Defines schema, partitioning, snapshots for data files | Open specification for organizing raw data files (Parquet/Avro/ORC) into logical tables with ACID semantics; separates metadata from data storage |
| Metadata Layer | metadata/v1.metadata.json | Three metadata layers above the data files: metadata JSON (schema, snapshots) → manifest lists (snapshot metadata) → manifest files (file statistics) → data files |
| Snapshot | snapshot_id=8744736658442914487 | Immutable point-in-time view of the table; created on every commit; enables time travel and rollback |
| Manifest List | snap-8744736658442914487-1-abc123.avro | Avro file containing references to all manifest files for a snapshot; tracks partition ranges and file counts per manifest |
| Manifest File | abc123-m0.avro | Avro file tracking a subset of data files with per-file statistics (path, partition, record count, min/max values, null counts) |
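The relationship between these concepts can be sketched as a small in-memory model. This is a toy illustration under stated assumptions, not Iceberg's on-disk format or any library's API: the class names, the single `lower`/`upper` statistic per file, the `plan_scan` helper, and every path except `abc123-m0.avro` are hypothetical. It shows three things from the table: each commit produces a new immutable snapshot, a new snapshot can reuse existing manifests, and scan planning walks metadata (using min/max statistics) rather than listing directories.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int
    lower: int  # min value of the (single, hypothetical) filter column
    upper: int  # max value of that column


@dataclass(frozen=True)
class Manifest:
    path: str
    files: Tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: Tuple[Manifest, ...]  # stands in for the manifest list file


@dataclass
class TableMetadata:
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def commit(self, snapshot):
        """Register a new snapshot and make it current; old ones stay readable."""
        self.snapshots[snapshot.snapshot_id] = snapshot
        self.current_snapshot_id = snapshot.snapshot_id


def plan_scan(table, lo, hi, snapshot_id=None):
    """Return paths of files whose [lower, upper] range overlaps [lo, hi].

    Passing snapshot_id reads an older snapshot (time travel in this toy model).
    """
    sid = table.current_snapshot_id if snapshot_id is None else snapshot_id
    snapshot = table.snapshots[sid]
    return [
        f.path
        for manifest in snapshot.manifest_list
        for f in manifest.files
        if f.upper >= lo and f.lower <= hi
    ]


table = TableMetadata()

# First commit: one manifest tracking two data files.
m1 = Manifest("abc123-m0.avro", (
    DataFile("data/a.parquet", 100, 1, 50),
    DataFile("data/b.parquet", 100, 51, 100),
))
table.commit(Snapshot(1, (m1,)))

# Second commit: an append adds a manifest; the first manifest is reused as-is.
m2 = Manifest("def456-m0.avro", (DataFile("data/c.parquet", 10, 101, 110),))
table.commit(Snapshot(2, (m1, m2)))

print(plan_scan(table, 40, 60))                   # ['data/a.parquet', 'data/b.parquet']
print(plan_scan(table, 105, 200))                 # ['data/c.parquet']
print(plan_scan(table, 105, 200, snapshot_id=1))  # []
```

The last call returns nothing because snapshot 1 was taken before `data/c.parquet` was appended, which is the point-in-time behavior the Snapshot row describes.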