XGBoost Cheat Sheet

Updated 2026-05-20

XGBoost (eXtreme Gradient Boosting) is an optimized, scalable implementation of gradient-boosted decision trees that is widely used in structured-data competitions and production ML systems. It handles regression, classification, ranking, and survival problems by adding trees sequentially, each one fit to the gradients of the loss with respect to the current predictions (these reduce to the residuals for squared error). A second-order Taylor expansion of the loss makes split evaluation fast and lets the regularization terms enter the split criterion directly. The key mental model: XGBoost is less a single algorithm than a framework. Nearly every major behavior, from tree structure to sampling to the objective function, is configurable, and most real-world wins come from knowing which lever to pull first.
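
As a reference for the second-order expansion mentioned above, the per-round objective can be sketched as follows. The notation is an assumption borrowed from the standard XGBoost derivation and does not appear elsewhere in this sheet: $f_t$ is the tree added at round $t$, $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ is the number of leaves, $w_j$ is the weight of leaf $j$, and $G_j$, $H_j$ are the sums of $g_i$ and $h_i$ over the samples in leaf $j$. Only the L2 penalty $\lambda$ and the per-leaf penalty $\gamma$ are shown.

```latex
\begin{align}
\mathcal{L}^{(t)} &\approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
&
\Omega(f) &= \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
g_i &= \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \hat{y}_i^{(t-1)}},
&
h_i &= \frac{\partial^2 l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^2} \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda}
\end{align}
```

Because the optimal leaf weight has a closed form in $G_j$ and $H_j$, a candidate split can be scored from running sums of gradients and Hessians, which is where the speed comes from, while $\lambda$ and $\gamma$ act inside the same expression.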

What This Cheat Sheet Covers

This topic spans 19 focused tables and 126 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: DMatrix — Data Loading and Construction
Table 2: Core Training Hyperparameters
Table 3: Regularization Parameters
Table 4: Objective Functions
Table 5: Tree Method and Booster Selection
Table 6: Early Stopping and Evaluation
Table 7: GPU Training
Table 8: Feature Importance and SHAP
Table 9: Handling Imbalanced Data
Table 10: Native Categorical Feature Support
Table 11: Distributed Training — Dask
Table 12: Distributed Training — PySpark
Table 13: Hyperparameter Tuning with Optuna
Table 14: Prediction Methods and Output Types
Table 15: Model Saving and Loading
Table 16: Monotone and Interaction Constraints
Table 17: Multi-Output Learning
Table 18: Learning to Rank
Table 19: Common Pitfalls and Best Practices

Table 1: DMatrix — Data Loading and Construction

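The DMatrix is XGBoost's native data container. In the native API (`xgb.train`), training and evaluation data are passed as DMatrix objects, and `Booster.predict` expects one as well. Building the DMatrix once, rather than handing over raw arrays repeatedly, lets XGBoost keep the data in its compact internal format and avoids redundant conversion work across boosting rounds.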

Method	Example	Description
DMatrix from NumPy	`dtrain = xgb.DMatrix(X, label=y)`	Wraps a NumPy array or Pandas DataFrame with optional label, weight, and base_margin.
DMatrix from Pandas	`dtrain = xgb.DMatrix(df[feats], label=df['y'])`	• Accepts a `pd.DataFrame` • column names are preserved as feature names automatically
DMatrix from SciPy sparse	`dtrain = xgb.DMatrix(csr_matrix)`	• Accepts `scipy.sparse.csr_matrix` • implicit zeros are treated as missing, not as the value 0 — convert to dense if zeros are real values
DMatrix missing value	`dtrain = xgb.DMatrix(X, label=y, missing=np.nan)`	• Explicitly declares which value should be treated as missing • the default is `None`, which XGBoost interprets as `np.nan`
DMatrix with weights	`dtrain = xgb.DMatrix(X, label=y, weight=w)`	• Per-sample training weights • higher weights increase a sample's influence on gradient updates
DMatrix with base_margin	`dtrain = xgb.DMatrix(X, label=y, base_margin=prior_scores)`	• Per-sample initial prediction offset (raw margin, before link function) • overrides `base_score` when provided • used to warm-start from another model's output
