CatBoost Cheat Sheet

Updated 2026-05-21

CatBoost is a gradient boosting library developed by Yandex that handles categorical features natively without manual encoding, making it a strong default choice for tabular data with mixed feature types. Its core innovations address different problems. Ordered boosting, together with ordered target statistics for categoricals, counters the target leakage that biases standard gradient boosting. Symmetric oblivious trees deliver fast, regularized training that often requires minimal hyperparameter tuning. Unlike XGBoost and LightGBM, CatBoost computes each row's target statistics using only the rows that precede it in a random permutation of the training data, so a row's own label never leaks into its encoding and prediction shift is avoided. Its symmetric tree structure, in which every node at a given depth uses the same split, enables efficient vectorized evaluation on both CPU and GPU. The key mental model: CatBoost trades some flexibility (symmetric splits only by default) for strong out-of-the-box generalization and native handling of high-cardinality categorical features.
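The two mechanisms described above can be sketched in plain Python. This is a simplified illustration, not CatBoost's implementation: the function names `ordered_target_statistics` and `oblivious_tree_predict`, the `prior` and `prior_weight` smoothing parameters, and the bit order used to build the leaf index are all assumptions chosen for clarity. CatBoost itself uses several permutations, its own prior settings, and binarized feature borders.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0,
                              order=None, seed=0):
    """Encode a categorical column with ordered target statistics (sketch).

    Each row is encoded using only the rows that precede it in a permutation,
    so a row's own target never leaks into its encoding.
    """
    if order is None:
        order = list(range(len(categories)))
        random.Random(seed).shuffle(order)  # random processing order
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running row count per category
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Smoothed mean of the targets seen so far for this category.
        encoded[i] = (sums[c] + prior_weight * prior) / (counts[c] + prior_weight)
        # Update the history only after row i has been encoded.
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


def oblivious_tree_predict(x, splits, leaf_values):
    """Evaluate one symmetric (oblivious) tree (sketch).

    splits[d] = (feature_index, threshold) is shared by every node at depth d,
    so the leaf index is a len(splits)-bit number built from the comparisons.
    """
    index = 0
    for depth, (feature, threshold) in enumerate(splits):
        if x[feature] > threshold:
            index |= 1 << depth  # set bit `depth` when the condition holds
    return leaf_values[index]    # expects len(leaf_values) == 2 ** len(splits)


if __name__ == "__main__":
    print(ordered_target_statistics(["a", "a", "b", "a"], [1, 0, 1, 1],
                                    order=[0, 1, 2, 3]))
    # → [0.5, 0.75, 0.5, 0.5]
    print(oblivious_tree_predict([1.0, 3.0], [(0, 0.5), (1, 2.0)], [10, 20, 30, 40]))
    # → 40
```

The first row of each category falls back to the prior because it has no history, which is exactly what keeps its own label out of its encoding. In the tree sketch, evaluation needs one comparison per depth level and no pointer chasing, which is why oblivious trees vectorize well.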

What This Cheat Sheet Covers

This topic spans 20 focused tables and 123 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Classes and Installation
Table 2: Pool — the Core Data Structure
Table 3: Categorical Feature Handling
Table 4: Ordered Boosting and Prediction Shift Prevention
Table 5: Symmetric Oblivious Trees
Table 6: Key Hyperparameters for Tuning
Table 7: Overfitting Detection and Early Stopping
Table 8: GPU Training
Table 9: Loss Functions and Objectives
Table 10: Ranking Objectives
Table 11: Text and Embedding Features
Table 12: Feature Importance and Interpretability
Table 13: Cross-Validation and Hyperparameter Search
Table 14: Model Saving, Loading, and Export
Table 15: Missing Value Handling
Table 16: Regularization Techniques
Table 17: Uncertainty Estimation with Virtual Ensembles
Table 18: CLI (Command-Line) Training
Table 19: Custom Loss Functions and Metrics
Table 20: Advanced Features — select_features, Snapshots, and Shrink

Table 1: Core Classes and Installation

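CatBoost exposes four main Python model classes: a task-specific estimator for each of classification, regression, and ranking, plus the general-purpose `CatBoost` base class. All of them share a scikit-learn-style `fit`/`predict` API. Knowing which class to use and how to install the library correctly is the starting point for any CatBoost workflow.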

Class / Command	Example	Description
CatBoostClassifier	`from catboost import CatBoostClassifier` `model = CatBoostClassifier(iterations=500, depth=6)`	• Estimator for binary and multi-class classification • default loss is `Logloss` (binary) or `MultiClass` (>2 classes).
CatBoostRegressor	`from catboost import CatBoostRegressor` `model = CatBoostRegressor(loss_function='RMSE')`	• Estimator for regression tasks • default loss is `RMSE`.
CatBoostRanker	`from catboost import CatBoostRanker` `model = CatBoostRanker(loss_function='YetiRank')`	• Estimator for learning-to-rank tasks • default loss is `YetiRank`.
