Statistical Distributions Cheat Sheet

Updated 2026-04-21

Statistical distributions are mathematical functions that describe the probability of different outcomes in a random process. They form the foundation of probability theory, statistical inference, and data analysis across quantitative fields. Many foundational distributions belong to the exponential family, a unifying mathematical framework whose members admit fixed-dimension sufficient statistics, which keeps inference tractable. Each distribution is characterized by its parameters (such as shape, location, and scale), and choosing an appropriate distribution for your data determines the validity of subsequent statistical tests, predictions, and decisions. A key distinction to keep in mind: discrete distributions model countable outcomes (coin flips, customer arrivals), while continuous distributions model measurable quantities (temperature, time, income). Confusing the two leads to fundamentally incorrect analysis.

What This Cheat Sheet Covers

This topic spans 17 focused tables and 152 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Discrete Distributions
Table 2: Foundational Continuous Distributions
Table 3: Heavy-Tailed & Extreme Value Distributions
Table 4: Multivariate Distributions
Table 5: Specialized Distributions by Domain
Table 6: Distribution Functions
Table 7: Distribution Properties & Measures
Table 8: Parameter Estimation Methods
Table 9: Goodness-of-Fit Tests
Table 10: Model Selection Criteria
Table 11: Distribution Transformations & Relationships
Table 12: Conjugate Prior Pairs (Bayesian)
Table 13: Distribution Families in GLMs
Table 14: Limit Theorems & Convergence
Table 15: Less Common but Important Distributions
Table 16: Mixture & Copula Models
Table 17: Statistical Distances & Divergences

Table 1: Core Discrete Distributions

Distribution	Example	Description
Bernoulli	`p = 0.3` `P(X=1) = 0.3, P(X=0) = 0.7`	• Models a single trial with two outcomes (success/failure) • parameter `p` is success probability • mean = `p`, variance = `p(1−p)`
Binomial	`n=10, p=0.3` $P(X{=}k) = \binom{10}{k} 0.3^k \cdot 0.7^{10-k}$	• Number of successes in n independent Bernoulli trials • models a fixed number of trials with constant success probability (e.g., sampling with replacement) • mean = `np`, variance = `np(1−p)`
Poisson	`λ=5` $P(X{=}k) = \frac{5^k e^{-5}}{k!}$	• Counts events in a fixed interval (time/space) when events occur independently at constant rate λ • mean = variance = λ
Geometric	`p=0.2` $P(X{=}k) = 0.8^{k-1} \cdot 0.2$	• Number of trials up to and including the first success in independent Bernoulli trials (k = 1, 2, ...) • the only memoryless discrete distribution • mean = `1/p`
Negative Binomial	`r=3, p=0.4` $P(X{=}k) = \binom{k+r-1}{k} 0.4^r \cdot 0.6^k$	• Number of failures before r successes • generalizes geometric (r=1) • models overdispersed count data (variance > mean)
Multinomial	`n=10, p=[0.3,0.5,0.2]` $P(\mathbf{x}) = \frac{10!}{x_1! x_2! x_3!} 0.3^{x_1} 0.5^{x_2} 0.2^{x_3}$	• Generalizes binomial to k>2 mutually exclusive outcomes • models categorical trials • sum of counts = n
Hypergeometric	`N=50, K=20, n=10` $P(X{=}k) = \frac{\binom{20}{k}\binom{30}{10-k}}{\binom{50}{10}}$	• Successes in sampling without replacement from a finite population • as a rule of thumb, prefer it over the binomial when the sample exceeds about 5% of the population • binomial analog for finite populations
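
The formulas in Table 1 can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function names (`binomial_pmf`, `poisson_pmf`, and so on) are illustrative choices rather than part of any library, and the parameter values are the ones from the Example column.

```python
from math import comb, exp, factorial

def bernoulli_pmf(k, p):
    # P(X=1) = p, P(X=0) = 1 - p
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    # Successes in n independent trials with success probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # Event count in a fixed interval at constant rate lam
    return lam**k * exp(-lam) / factorial(k)

def geometric_pmf(k, p):
    # Trial number of the first success, k = 1, 2, ...
    return (1 - p)**(k - 1) * p

def neg_binomial_pmf(k, r, p):
    # Number of failures k before the r-th success
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def multinomial_pmf(xs, ps):
    # Counts xs over categories with probabilities ps; n = sum(xs)
    coef = factorial(sum(xs))
    for x in xs:
        coef //= factorial(x)  # exact integer division at every step
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

def hypergeometric_pmf(k, N, K, n):
    # Successes in n draws without replacement; K successes among N items
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

if __name__ == "__main__":
    # Binomial(n=10, p=0.3): probabilities sum to 1 and the mean is np = 3
    probs = [binomial_pmf(k, 10, 0.3) for k in range(11)]
    print(sum(probs), sum(k * q for k, q in enumerate(probs)))

    # Negative binomial with r=1 matches the geometric, shifted by one
    # (failures before the first success vs. trial number of the success)
    print(neg_binomial_pmf(4, 1, 0.2), geometric_pmf(5, 0.2))

    # Hypergeometric(N=50, K=20, n=10): probabilities sum to 1
    print(sum(hypergeometric_pmf(k, 50, 20, 10) for k in range(11)))
```

Writing the mass functions out by hand keeps the correspondence with the table's formulas visible. For real analyses, a tested statistics library is the safer choice.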
