Statsmodels is Python's inference-focused statistics and econometrics library, sitting on top of NumPy, SciPy, pandas, and Patsy. It matters because it combines model estimation with the outputs practitioners actually need for analysis: standard errors, hypothesis tests, diagnostics, confidence intervals, and interpretable summaries. A useful mental model is that statsmodels is less about black-box prediction and more about specifying the model carefully, checking assumptions, and defending the conclusions after fitting.
What This Cheat Sheet Covers
This topic spans 10 focused tables and 118 indexed concepts, outlined table by table below, from foundational concepts through advanced details.
Table 1: Core APIs and Workflow
| Example | Description |
|---|---|
| `import statsmodels.api as sm` | Main import surface for matrix-based model classes, datasets, graphics, and statistical tools. |
| `import statsmodels.formula.api as smf` | Formula interface using R-style formulas and Patsy design matrices. |
| `X = sm.add_constant(X)` | Adds an explicit intercept column for matrix-based models that do not include one by default. |
| `sm.OLS.from_formula('y ~ x1 + x2', data=df)` | Alternate constructor that builds the design matrix from a formula and DataFrame. |
| `res = sm.OLS(y, X).fit()` | Estimates the model and returns a results object with inference and diagnostics. |
| `res.summary()` | Produces the standard text summary with coefficients, tests, fit metrics, and diagnostics. |