CatBoost is a gradient boosting library developed by Yandex that handles categorical features natively without manual encoding, making it a strong default choice for tabular data with mixed feature types. Its core innovations, ordered boosting and symmetric (oblivious) trees, respectively address target leakage and deliver fast, regularized training that often requires minimal hyperparameter tuning. Unlike XGBoost and LightGBM, CatBoost computes the target statistic for each row using only the rows that precede it in a random permutation, which prevents prediction shift, and its symmetric tree structure enables efficient vectorized evaluation on both CPU and GPU. The key mental model: CatBoost trades some flexibility (symmetric splits only by default) for strong out-of-the-box generalization and native handling of high-cardinality categoricals.
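The "preceding rows only" idea is easiest to see on a single categorical column. Below is a minimal pure-Python sketch of an ordered target statistic. It assumes numeric targets and the smoothed form (sum of preceding targets in the category + a·p) / (count of preceding rows in the category + a), with prior p and weight a. The function name and the `prior`, `prior_weight`, and `order` parameters are hypothetical and not part of the CatBoost API. The real library uses several permutations and multiple CTR types, so treat this as an illustration of the principle rather than a reimplementation.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def ordered_target_statistics(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    prior_weight: float = 1.0,
    order: Optional[Sequence[int]] = None,
    seed: int = 0,
) -> List[float]:
    """Encode a categorical column with ordered target statistics (simplified sketch).

    Each row is encoded using only the rows that come before it in `order`
    (a seeded random permutation by default), so a row's own target never
    leaks into its own encoding.
    """
    n = len(categories)
    if order is None:
        order = list(range(n))
        random.Random(seed).shuffle(order)

    target_sum = defaultdict(float)  # running sum of targets per category
    seen_count = defaultdict(int)    # running number of rows per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Smoothed mean over the *preceding* rows with the same category;
        # with no history yet, the value falls back to the prior.
        encoded[idx] = (target_sum[cat] + prior_weight * prior) / (
            seen_count[cat] + prior_weight
        )
        # Only now does this row join the history seen by later rows.
        target_sum[cat] += targets[idx]
        seen_count[cat] += 1

    return encoded


cats = ["a", "a", "b", "a"]
y = [1, 0, 1, 1]
print(ordered_target_statistics(cats, y, prior=0.5, order=[0, 1, 2, 3]))
# → [0.5, 0.75, 0.5, 0.5]
```

Row 3's value depends only on rows 0 and 1. Changing its own label therefore leaves its encoding unchanged, which is exactly the leakage that a naive "mean target per category" encoding would introduce.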
What This Cheat Sheet Covers
This topic spans 20 focused tables and 123 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Classes and Installation
CatBoost exposes four main Python classes (one per task type), each with a scikit-learn-compatible API. Knowing which class to use and how to install the library correctly is the starting point for any CatBoost workflow.
| Class / Command | Example | Description |
|---|---|---|
| `CatBoostClassifier` | `from catboost import CatBoostClassifier`<br>`model = CatBoostClassifier(iterations=500, depth=6)` | Estimator for binary and multi-class classification; default loss is Logloss (binary) or MultiClass (>2 classes). |
| `CatBoostRegressor` | `from catboost import CatBoostRegressor`<br>`model = CatBoostRegressor(loss_function='RMSE')` | Estimator for regression tasks; default loss is RMSE. |
| `CatBoostRanker` | `from catboost import CatBoostRanker`<br>`model = CatBoostRanker(loss_function='YetiRank')` | Estimator for learning-to-rank tasks; default loss is YetiRank. |