Regression Analysis Cheat Sheet

Updated 2026-05-21


Regression analysis is a set of statistical methods for estimating the relationships between a dependent variable and one or more independent variables, forming the backbone of predictive modelling and causal inference across virtually every quantitative field. Its power lies not just in fitting a line, but in providing standard errors, hypothesis tests, and diagnostics that quantify how much trust to place in each estimate. The critical insight most practitioners learn too late is that a model's coefficients are only as meaningful as the assumptions they rest on: every regression analysis should be accompanied by careful diagnostics before results are reported.

What This Cheat Sheet Covers

This topic spans 18 focused tables and 146 indexed concepts. The outline below lists every table in order, from foundational concepts to advanced details.

Table 1: OLS Foundations — The Simple Linear Regression Model
Table 2: OLS Assumptions
Table 3: Multiple Linear Regression — Matrix Form and Key Formulas
Table 4: Regression Diagnostics
Table 5: Multicollinearity — Detection and Remedies
Table 6: Model Selection Criteria
Table 7: Regularised Regression — Ridge, Lasso, and Elastic Net
Table 8: Logistic Regression
Table 9: Multinomial and Ordinal Logistic Regression
Table 10: Poisson, Negative Binomial, and Count Regression
Table 11: Generalised Linear Models (GLM) — Link Functions and Families
Table 12: Robust Regression
Table 13: Time-Series Regression and Autocorrelation
Table 14: Regression in Python
Table 15: Regression in R
Table 16: Quantile Regression
Table 17: Advanced and Specialised Regression Topics
Table 18: Common Pitfalls and Reporting Standards

Table 1: OLS Foundations — The Simple Linear Regression Model

The ordinary least squares estimator underpins almost every form of linear regression. Understanding how OLS works — what it minimises, what conditions make it optimal, and what its closed-form solution looks like — is the entry point to all more advanced regression methods.
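As a worked sketch of where the closed-form estimators come from, differentiate the OLS objective with respect to each coefficient, set both derivatives to zero (dropping the common factor $-2$), and solve. This assumes the $x_i$ are not all equal, so that $\sum(x_i - \bar{x})^2 > 0$:

```latex
\begin{aligned}
S(\beta_0,\beta_1) &= \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 \\
\frac{\partial S}{\partial \beta_0} = 0 &\;\Longrightarrow\; \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\frac{\partial S}{\partial \beta_1} = 0 &\;\Longrightarrow\; \sum_{i=1}^{n} x_i\,(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\
&\;\Longrightarrow\; \sum_{i=1}^{n} x_i \big[(y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})\big] = 0 \\
&\;\Longrightarrow\; \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
&\;\Longrightarrow\; \hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}
\end{aligned}
```

The step to the fifth line uses $\sum_i (y_i - \bar{y}) = \sum_i (x_i - \bar{x}) = 0$, so $x_i$ can be replaced by $x_i - \bar{x}$ inside both sums without changing them.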

Technique	Example	Description
Population regression model	$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$	• The data-generating process relating the scalar response $y_i$ to predictor $x_i$ • $\varepsilon_i$ is the unobserved error term
OLS objective	$\min_{\beta_0,\beta_1} \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$	• Minimises the sum of squared residuals • the resulting line has the smallest total squared vertical distance to the data points (errors are measured along $y$ only, not perpendicular to the line)
OLS slope estimator	$\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}$	• Closed-form solution obtained by setting the first-order conditions to zero • equals the sample covariance of $x$ and $y$ divided by the sample variance of $x$.
OLS intercept estimator	$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$	Ensures the fitted line always passes through the point of means $(\bar{x},\bar{y})$.
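The Table 1 estimators can be checked numerically. The following is a minimal sketch assuming NumPy is available; the helper name `ols_simple` and the five data points are illustrative and not part of the original cheat sheet. It computes the closed-form slope and intercept, then cross-checks them against the covariance/variance ratio, against `numpy.polyfit` with degree 1, and against the point-of-means and first-order-condition properties.

```python
import numpy as np


def ols_simple(x, y):
    """Closed-form OLS for y = b0 + b1*x (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    dx = x - x_bar
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum(dx * (y - y_bar)) / np.sum(dx ** 2)
    # Intercept: forces the fitted line through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data (made up for this example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = ols_simple(x, y)

# Slope equals sample cov(x, y) / sample var(x); both use n - 1, so it cancels
cov_xy = np.cov(x, y)[0, 1]  # np.cov normalises by n - 1 by default
var_x = np.var(x, ddof=1)
assert np.isclose(b1, cov_xy / var_x)

# Agrees with NumPy's degree-1 least-squares fit (returns [slope, intercept])
slope_np, intercept_np = np.polyfit(x, y, 1)
assert np.isclose(b1, slope_np) and np.isclose(b0, intercept_np)

# The fitted line passes through (x_bar, y_bar)
assert np.isclose(b0 + b1 * x.mean(), y.mean())

# First-order conditions: residuals sum to zero and are orthogonal to x
resid = y - (b0 + b1 * x)
assert np.isclose(resid.sum(), 0.0)
assert np.isclose(np.dot(resid, x), 0.0)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # prints: b0 = 0.0500, b1 = 1.9900
```

Working with centred deviations (`dx`) mirrors the slope formula directly; for production work a library such as statsmodels also reports the standard errors and diagnostics discussed above.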
