Information Theory Cheat Sheet

Updated 2026-05-21

Information theory, founded by Claude Shannon in 1948, provides the mathematical framework for quantifying, storing, and transmitting information. It underpins modern data compression, error-correcting codes, cryptography, and machine learning. This cheat sheet covers entropy measures, divergences, source and channel coding theorems, compression algorithms, error-correcting codes, and Python tools for information-theoretic computation.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 169 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Entropy Fundamentals and UnitsTable 2: Joint and Conditional EntropyTable 3: Mutual Information and VariantsTable 4: Divergences and Distance MeasuresTable 5: Source Coding Theorems and BoundsTable 6: Entropy Coding AlgorithmsTable 7: Lossless Compression AlgorithmsTable 8: Channel Capacity and Coding TheoremsTable 9: Error-Correcting CodesTable 10: Rate-Distortion Theory and Lossy CompressionTable 11: Differential Entropy and Continuous VariablesTable 12: Information Theory in Machine LearningTable 13: Entropy in Cryptography and SecurityTable 14: Python Libraries and Tools

Table 1: Entropy Fundamentals and Units

Concept	Formula / Definition	Notes
Shannon entropy	$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$	Measures average uncertainty/information in a random variable $X$
Entropy in bits	Use $\log_2$ ; $H(X) \geq 0$ , measured in bits	• Base-2 logarithm • most common in coding theory
Entropy in nats	Use $\ln$ (natural log); $H(X)$ in nats	• Used in statistics, thermodynamics • 1 nat = $\log_2 e \approx 1.4427$ bits
Entropy in hartleys	Use $\log_{10}$ ; $H(X)$ in hartleys (dits)	• Base-10 log • less common • 1 hartley = $\log_2 10 \approx 3.3219$ bits
Self-information (surprisal)	$I(x) = -\log_2 p(x)$ bits	• Information gained on observing outcome $x$ • rare events have high surprisal
Binary entropy function	$H_b(p) = -p \log_2 p - (1-p) \log_2(1-p)$	• Special case for Bernoulli( $p$ ) • maximum at $p = 0.5$ giving $H_b = 1$ bit

Table 1: Entropy Fundamentals and Units

Concept	Formula / Definition	Notes
Shannon entropy	$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$	Measures average uncertainty/information in a random variable $X$
Entropy in bits	Use $\log_2$ ; $H(X) \geq 0$ , measured in bits	• Base-2 logarithm • most common in coding theory
Entropy in nats	Use $\ln$ (natural log); $H(X)$ in nats	• Used in statistics, thermodynamics • 1 nat = $\log_2 e \approx 1.4427$ bits
Entropy in hartleys	Use $\log_{10}$ ; $H(X)$ in hartleys (dits)	• Base-10 log • less common • 1 hartley = $\log_2 10 \approx 3.3219$ bits
Self-information (surprisal)	$I(x) = -\log_2 p(x)$ bits	• Information gained on observing outcome $x$ • rare events have high surprisal
Binary entropy function	$H_b(p) = -p \log_2 p - (1-p) \log_2(1-p)$	• Special case for Bernoulli( $p$ ) • maximum at $p = 0.5$ giving $H_b = 1$ bit