Panel Data Analysis Cheat Sheet

Updated 2026-05-28

Panel data (also called longitudinal or cross-sectional time-series data) combines both cross-sectional and temporal dimensions, observing multiple entities (individuals, firms, countries) repeatedly over time. This structure enables researchers to control for unobserved heterogeneity that remains constant over time, substantially reducing omitted variable bias compared to pure cross-sectional or time-series approaches. Panel methods are fundamental in econometrics, empirical research, and causal inference, with applications spanning labor economics, health policy, finance, and social sciences. A critical distinction in panel data analysis is understanding the source of variation: whether identification comes from changes within entities over time (within variation) or differences between entities (between variation), as different estimators exploit different dimensions of the data structure.
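The within/between distinction can be made concrete by splitting a variable's total sum of squares into the part that comes from movement inside each entity and the part that comes from differences in entity means. The following is a minimal sketch in plain Python. The function name `variation_decomposition` and the toy numbers are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

def variation_decomposition(entity_ids, values):
    """Split the total sum of squares of `values` into within- and
    between-entity components (hypothetical helper for illustration)."""
    groups = defaultdict(list)
    for entity, value in zip(entity_ids, values):
        groups[entity].append(value)

    grand_mean = sum(values) / len(values)

    within = 0.0   # deviations of each observation from its own entity mean
    between = 0.0  # deviations of entity means from the grand mean, weighted by group size
    for obs in groups.values():
        entity_mean = sum(obs) / len(obs)
        within += sum((v - entity_mean) ** 2 for v in obs)
        between += len(obs) * (entity_mean - grand_mean) ** 2

    total = sum((v - grand_mean) ** 2 for v in values)
    return {"total": total, "within": within, "between": between}

# Toy panel (made-up numbers): two entities, three periods each.
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 11, 12, 13]
result = variation_decomposition(ids, x)
print(result)  # {'total': 154.0, 'within': 4.0, 'between': 150.0}
```

In this toy panel almost all variation in `x` is between entities, so an estimator that relies only on within variation would be working with a small share of the total.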

What This Cheat Sheet Covers

This topic spans 21 focused tables and 153 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Panel Data Models
Table 2: Panel Data Structure & Formats
Table 3: Key Concepts & Variation Sources
Table 4: Model Selection & Specification Tests
Table 5: Diagnostic Tests for Violations
Table 6: Standard Errors & Inference
Table 7: Dynamic Panel Models & GMM
Table 8: Causal Inference Methods with Panels
Table 9: Data Transformations & Operations
Table 10: Assumptions & Requirements
Table 11: R Implementation
Table 12: Python Implementation
Table 13: Stata Commands
Table 14: Advanced Topics
Table 15: Model Comparison & Selection
Table 16: Panel Analysis Workflows
Table 17: Common Pitfalls & Fixes
Table 18: Specialized Panel Estimators
Table 19: Math Foundations
Table 20: Data Preparation & Cleaning
Table 21: Inference & Hypothesis Testing

Table 1: Core Panel Data Models

The foundational estimators below each exploit a distinct slice of panel variation. Choosing between fixed and random effects hinges on whether unobserved entity effects are correlated with the regressors, a question the Hausman test addresses empirically.

Model	Example	Description
Fixed Effects (Within)	`xtreg y x1 x2, fe` (Stata) `plm(y ~ x1 + x2, model="within")` (R)	• Eliminates time-invariant unobserved heterogeneity by demeaning (subtracting entity-specific means) • identifies effects using only within-entity variation over time.
Random Effects (RE)	`xtreg y x1 x2, re` (Stata) `plm(y ~ x1 + x2, model="random")` (R)	• Assumes unobserved effects are uncorrelated with regressors • uses both within and between variation • GLS estimator weighted by variance components.
Pooled OLS	`reg y x1 x2` (Stata) `lm(y ~ x1 + x2)` (R)	• Ignores panel structure entirely • treats all observations as independent • consistent only if unobserved entity effects are absent or uncorrelated with the regressors, and even then standard errors should be clustered by entity.
Two-Way Fixed Effects	`xtreg y x1 x2 i.year, fe` (Stata) `plm(y ~ x1 + x2, model="within", effect="twoways")` (R)	• Controls for both entity-specific and time-specific unobserved effects • standard for difference-in-differences and event studies • can be biased under staggered treatment with heterogeneous effects.
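To illustrate why the choice among these estimators matters, here is a minimal sketch of the within (fixed effects) transformation next to pooled OLS for a single regressor, in plain Python. The helper names (`ols_slope`, `demean_by_entity`, `within_slope`) and the data are illustrative assumptions. The toy panel is built so that y = 2·x + entity effect, with the entity effect rising with x, which is exactly the case where pooled OLS is biased.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def demean_by_entity(entity_ids, values):
    """Subtract each entity's own mean from its observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entity, value in zip(entity_ids, values):
        totals[entity] += value
        counts[entity] += 1
    return [v - totals[e] / counts[e] for e, v in zip(entity_ids, values)]

def within_slope(entity_ids, x, y):
    """Fixed effects (within) slope: OLS on entity-demeaned data."""
    x_dm = demean_by_entity(entity_ids, x)
    y_dm = demean_by_entity(entity_ids, y)
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)

# Made-up panel: 3 entities x 3 periods, true slope 2,
# entity effects 0, 10, 20 (correlated with x by construction).
ids = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
effects = {"A": 0, "B": 10, "C": 20}
y = [2 * xi + effects[e] for e, xi in zip(ids, x)]

print(ols_slope(x, y))          # 5.0 (pooled OLS absorbs the entity effects into the slope)
print(within_slope(ids, x, y))  # 2.0 (demeaning removes the entity effects)
```

This sketch covers point estimates only. Real fixed effects software such as `xtreg, fe` or `plm` also adjusts degrees of freedom and standard errors for the estimated entity means, which is not attempted here.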
