Statsmodels is Python's inference-focused statistics and econometrics library, built on top of NumPy, SciPy, pandas, and Patsy. It matters because it pairs model estimation with the outputs practitioners actually need: standard errors, hypothesis tests, diagnostics, confidence intervals, and interpretable summaries. A useful mental model is that statsmodels is less about black-box prediction and more about specifying the model carefully, checking its assumptions, and defending the conclusions after fitting. Version 0.14 added treatment-effect estimation, hurdle and truncated count models, multiple seasonal-trend decomposition using LOESS (MSTL), and extended copula support.
What This Cheat Sheet Covers
This cheat sheet spans 11 focused tables and 147 indexed concepts, outlined table by table from foundational workflow through advanced details.
Table 1: Core APIs and Workflow
The two import surfaces, matrix-based sm and formula-based smf, determine how you build design matrices; almost every other call in the library flows from the results object returned by .fit().
| Method | Example | Description |
|---|---|---|
| statsmodels.api | import statsmodels.api as sm | Main import surface for matrix-based model classes, datasets, graphics, and statistical tools. |
| statsmodels.formula.api | import statsmodels.formula.api as smf | Formula interface using R-style formulas and Patsy design matrices. |
| sm.add_constant | X = sm.add_constant(X) | Adds an explicit intercept column for matrix-based models, which do not include one by default. |
| sm.OLS.from_formula | sm.OLS.from_formula('y ~ x1 + x2', data=df) | Alternate constructor that builds the design matrix from a formula and DataFrame; smf.ols is the same constructor. |
| .fit() | res = sm.OLS(y, X).fit() | Estimates the model and returns a results object with inference and diagnostics. |
| res.summary | res.summary() | Produces the standard text summary with coefficients, tests, fit metrics, and diagnostics. |
| res.params | res.params | Estimated coefficients in the fitted model. |