Information theory, founded by Claude Shannon in 1948, provides the mathematical framework for quantifying, storing, and transmitting information. It underpins modern data compression, error-correcting codes, cryptography, and machine learning. This cheat sheet covers entropy measures, divergences, source and channel coding theorems, compression algorithms, error-correcting codes, and Python tools for information-theoretic computation.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 169 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Entropy Fundamentals and UnitsTable 2: Joint and Conditional EntropyTable 3: Mutual Information and VariantsTable 4: Divergences and Distance MeasuresTable 5: Source Coding Theorems and BoundsTable 6: Entropy Coding AlgorithmsTable 7: Lossless Compression AlgorithmsTable 8: Channel Capacity and Coding TheoremsTable 9: Error-Correcting CodesTable 10: Rate-Distortion Theory and Lossy CompressionTable 11: Differential Entropy and Continuous VariablesTable 12: Information Theory in Machine LearningTable 13: Entropy in Cryptography and SecurityTable 14: Python Libraries and Tools
Table 1: Entropy Fundamentals and Units
| Concept | Formula / Definition | Notes |
|---|---|---|
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) | Measures average uncertainty/information in a random variable X | |
Entropy in bits | Use \log_2; H(X) \geq 0, measured in bits | • Base-2 logarithm • most common in coding theory |
Entropy in nats | Use \ln (natural log); H(X) in nats | • Used in statistics, thermodynamics • 1 nat = \log_2 e \approx 1.4427 bits |
Entropy in hartleys | Use \log_{10}; H(X) in hartleys (dits) | • Base-10 log • less common • 1 hartley = \log_2 10 \approx 3.3219 bits |
I(x) = -\log_2 p(x) bits | • Information gained on observing outcome x• rare events have high surprisal | |
Binary entropy function | H_b(p) = -p \log_2 p - (1-p) \log_2(1-p) | • Special case for Bernoulli(p)• maximum at p = 0.5 giving H_b = 1 bit |