Panel data (also called longitudinal or cross-sectional time-series data) combines both cross-sectional and temporal dimensions, observing multiple entities (individuals, firms, countries) repeatedly over time. This structure enables researchers to control for unobserved heterogeneity that remains constant over time, substantially reducing omitted variable bias compared to pure cross-sectional or time-series approaches. Panel methods are fundamental in econometrics, empirical research, and causal inference, with applications spanning labor economics, health policy, finance, and social sciences. A critical distinction in panel data analysis is understanding the source of variation: whether identification comes from changes within entities over time (within variation) or differences between entities (between variation), as different estimators exploit different dimensions of the data structure.
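The within/between distinction can be made concrete with a variance decomposition. The notation below is an assumption introduced for illustration, not taken from the text: $x_{it}$ is a regressor for entity $i$ in period $t$, $\bar{x}_i$ is the mean for entity $i$ over its $T_i$ periods, and $\bar{x}$ is the grand mean over all observations.

$$
\underbrace{\sum_{i}\sum_{t}\left(x_{it}-\bar{x}\right)^{2}}_{\text{total variation}}
=
\underbrace{\sum_{i}\sum_{t}\left(x_{it}-\bar{x}_{i}\right)^{2}}_{\text{within variation}}
+
\underbrace{\sum_{i}T_{i}\left(\bar{x}_{i}-\bar{x}\right)^{2}}_{\text{between variation}}
$$

A regressor that never changes within an entity has zero within variation, so an estimator that relies only on the within term cannot identify its effect.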
What This Cheat Sheet Covers
This topic spans 21 focused tables and 128 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Panel Data Models
| Model | Example | Description |
|---|---|---|
| Fixed effects (FE) | `xtreg y x1 x2, fe` (Stata); `plm(y ~ x1 + x2, model="within")` (R) | • Eliminates time-invariant unobserved heterogeneity by demeaning (subtracting entity-specific means) • Identifies effects using only within-entity variation over time. |
| Random effects (RE) | `xtreg y x1 x2, re` (Stata); `plm(y ~ x1 + x2, model="random")` (R) | • Assumes unobserved effects are uncorrelated with the regressors • Uses both within and between variation • GLS estimator weighted by variance components. |
| Pooled OLS | `reg y x1 x2` (Stata); `lm(y ~ x1 + x2)` (R) | • Ignores panel structure entirely • Treats all observations as independent • Consistent only if unobserved heterogeneity is absent or uncorrelated with the regressors. |
| Two-way fixed effects (TWFE) | `xtreg y x1 x2 i.year, fe` (Stata); `plm(y ~ x1 + x2, model="within", effect="twoways")` (R) | • Controls for both entity-specific and time-specific unobserved effects • Standard for difference-in-differences and event studies. |
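
The demeaning step behind the fixed-effects row can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how `xtreg` or `plm` are implemented: the toy panel, the helper names `ols_slope` and `within_slope`, and the true slope of 2.0 are all hypothetical, and there is a single regressor with no noise so the arithmetic stays transparent.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept (pooled OLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(ids, x, y):
    """Within (fixed-effects) slope: demean x and y by entity, then regress."""
    rows_by_entity = defaultdict(list)
    for row, entity in enumerate(ids):
        rows_by_entity[entity].append(row)

    xd = [0.0] * len(x)
    yd = [0.0] * len(y)
    for rows in rows_by_entity.values():
        mx = sum(x[r] for r in rows) / len(rows)
        my = sum(y[r] for r in rows) / len(rows)
        for r in rows:
            xd[r] = x[r] - mx  # subtract the entity-specific mean
            yd[r] = y[r] - my

    # Demeaned variables have mean zero, so no intercept is needed.
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


# Hypothetical toy panel: 3 entities observed over 4 periods.
TRUE_BETA = 2.0
entity_effect = {"A": 0.0, "B": 5.0, "C": 10.0}  # unobserved, time-invariant
shocks = [-1.0, 0.0, 1.0, 2.0]                   # within-entity movement in x

ids, x, y = [], [], []
for entity, alpha in entity_effect.items():
    for s in shocks:
        x_it = alpha + s  # regressor is correlated with the entity effect
        ids.append(entity)
        x.append(x_it)
        y.append(TRUE_BETA * x_it + alpha)

beta_pooled = ols_slope(x, y)
beta_fe = within_slope(ids, x, y)

print(f"pooled OLS slope: {beta_pooled:.3f}")
print(f"within (FE) slope: {beta_fe:.3f}")
```

Because the regressor is built to correlate with the unobserved entity effect, the pooled slope overstates the true coefficient, while the within slope recovers it: demeaning removes the entity effect from both sides of the equation.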