File formats are standardized ways of encoding information for storage in computer files, each optimized for specific types of data and use cases. From documents and images to databases and scientific datasets, choosing the right format impacts compatibility, file size, quality, and functionality. Understanding format differences—such as lossy versus lossless compression, proprietary versus open standards, and container versus codec—enables effective data management, seamless cross-platform workflows, and long-term digital preservation. In data engineering especially, the choice between row-based formats (CSV, Avro) and columnar formats (Parquet, ORC) can produce order-of-magnitude differences in query performance and storage costs.
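To make the row-based versus columnar distinction concrete, here is a minimal sketch using only Python's standard `struct` module. The sample table (`user_id`, `age`, `balance`), the fixed-width binary encoding, and the helper names are all hypothetical and chosen for illustration. This is not how CSV, Avro, Parquet, or ORC actually lay out bytes; those formats add schemas, compression, encodings, and statistics on top. It only shows why a query that touches one column can read far fewer bytes when values are stored column by column.

```python
import struct

# Hypothetical sample table: (user_id, age, balance) records.
rows = [(i, 20 + i % 50, 100.0 * i) for i in range(1000)]

# Fixed-width record: int64, int32, float64, little-endian, no padding (20 bytes).
ROW = struct.Struct("<qid")

# Row-based layout: all fields of one record are stored together.
row_blob = b"".join(ROW.pack(*r) for r in rows)

# Columnar layout: all values of one field are stored contiguously.
ids, ages, balances = zip(*rows)
col_blobs = {
    "user_id": struct.pack(f"<{len(ids)}q", *ids),
    "age": struct.pack(f"<{len(ages)}i", *ages),
    "balance": struct.pack(f"<{len(balances)}d", *balances),
}


def mean_age_row_layout(blob):
    """Average age from the row layout; every full record has to be read."""
    total = sum(age for _, age, _ in ROW.iter_unpack(blob))
    return total / (len(blob) // ROW.size), len(blob)


def mean_age_columnar(blobs):
    """Average age from the columnar layout; only the age column is read."""
    data = blobs["age"]
    count = len(data) // 4  # each age is a 4-byte int32
    values = struct.unpack(f"<{count}i", data)
    return sum(values) / count, len(data)


row_mean, row_bytes = mean_age_row_layout(row_blob)
col_mean, col_bytes = mean_age_columnar(col_blobs)
print(row_mean == col_mean, row_bytes, col_bytes)  # True 20000 4000
```

In this toy case the columnar scan reads one fifth of the bytes for the same answer. The gap widens as tables gain more columns, and real columnar formats typically widen it further with per-column compression.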
What This Cheat Sheet Covers
This cheat sheet spans 31 focused tables and 163 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Document Formats
| Format | Example | Description |
|---|---|---|
| PDF | document.pdf | Portable Document Format; preserves exact layout across platforms and supports annotations, forms, encryption, and embedded fonts. Industry standard for final distribution and archival. |
| DOCX | report.docx | Microsoft Word's XML-based format supporting rich formatting, track changes, and embedded objects; macros require the macro-enabled .docm variant. Default for Word 2007+ with broad compatibility. |
| ODT | proposal.odt | OpenDocument Text; open standard for word processing used by LibreOffice and OpenOffice. Interoperable, but may lose some formatting in Microsoft Word. |