RUNLOCALAI · v38
© 2026 runlocalai.co · Independently operated
Classical ML algorithms

CatBoost

CatBoost is an open-source gradient boosting library developed by Yandex that handles categorical features natively, without manual encoding. For operators running local AI, it is relevant for tabular tasks such as classification and regression, where it competes with XGBoost and LightGBM. Its key differentiators are ordered boosting, which reduces prediction shift, and native support for categorical columns, which can simplify preprocessing pipelines.

Deeper dive

CatBoost builds an ensemble of decision trees sequentially, with each tree fitted to correct the errors of the ensemble so far. By default it grows symmetric (oblivious) trees, in which every node at a given depth uses the same split. Categorical features are handled with ordered target statistics: the rows are randomly permuted, and each row's category is encoded using only the target values of rows that precede it in the permutation, which avoids target leakage. The library also implements ordered boosting, which applies the same idea to residuals: the residual for a row is computed by a model that was not trained on that row, which reduces the prediction shift that biases standard gradient boosting. CatBoost supports GPU training and can be faster than XGBoost on some datasets. For local operators, it installs via pip and can be used from Python or through a command-line interface. It saves a model file that can be loaded for inference. It is not typically used in LLM pipelines; it is more common in traditional ML workflows.
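The ordered target statistic described above can be sketched in plain Python. This is a simplified illustration, not CatBoost's internal implementation: the function name `ordered_target_stats`, the smoothing weight `a`, and the toy data are assumptions made for the example, and the real library averages over several permutations and adds further refinements.

```python
import random
from collections import defaultdict


def ordered_target_stats(categories, targets, prior, a=1.0, seed=0):
    """Encode each row's category as a smoothed mean of the targets of
    rows that come BEFORE it in a random permutation, so a row's own
    target never contributes to its own encoding."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # the random permutation

    sums = defaultdict(float)  # running target sum per category
    counts = defaultdict(int)  # running row count per category
    encoded = [0.0] * len(categories)

    for i in order:
        cat = categories[i]
        # Only rows already visited (earlier in the permutation) count.
        encoded[i] = (sums[cat] + a * prior) / (counts[cat] + a)
        sums[cat] += targets[i]
        counts[cat] += 1
    return encoded


# Toy data (made up): neighborhood and sale price in thousands.
neighborhoods = ["north", "south", "north", "east", "south", "north"]
prices = [310.0, 220.0, 295.0, 180.0, 240.0, 305.0]
prior = sum(prices) / len(prices)

encoded = ordered_target_stats(neighborhoods, prices, prior)
for name, value in zip(neighborhoods, encoded):
    print(f"{name:>6}  {value:.1f}")
```

A category seen for the first time in the permutation falls back to the prior, which is why the single "east" row is encoded as the global mean rather than as its own price.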

Practical example

An operator training a model to predict housing prices on a dataset with categorical features such as 'neighborhood' and 'roof type' can pass those columns to CatBoost directly, without one-hot encoding. With a 16 GB GPU, training on 100k rows with 50 features takes roughly 5-10 minutes, though the actual time depends heavily on the iteration count, tree depth, and number of categorical features. The model file is typically a few MB and fits easily in system RAM.

Workflow example

In a local ML workflow, an operator might run catboost fit --learn-set train.csv --test-set test.csv --column-description col_desc.cd --model-file model.cbm to train a model. The column description file specifies which column holds the label and which columns are categorical. The CLI expects tab-separated input by default, so comma-separated files also need --delimiter ','. Without --model-file, the model is written to model.bin. For inference, catboost calc --input-path test.csv --column-description col_desc.cd --model-path model.cbm --output-path predictions.txt writes the predictions to a file. This workflow is common in Kaggle competitions and small-scale production systems.
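The column description file is a small tab-separated text file with one line per described column: a zero-based column index, a column type such as `Label`, `Categ`, or `Num`, and an optional name. A sketch that writes one is shown below; the column layout is hypothetical and would need to match the actual order of columns in train.csv.

```python
import csv

# Hypothetical layout of train.csv (zero-based column indices).
# Columns not listed in the file are treated as numeric by CatBoost.
columns = [
    (0, "Label", "price"),
    (1, "Categ", "neighborhood"),
    (2, "Categ", "roof_type"),
    (3, "Num", "living_area"),
]


def write_column_description(path, columns):
    """Write (index, type, name) tuples as tab-separated lines."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for index, col_type, name in columns:
            writer.writerow([index, col_type, name])


write_column_description("col_desc.cd", columns)

with open("col_desc.cd") as f:
    print(f.read())
```

Generating the file from the same list that drives the rest of a pipeline keeps the categorical column indices from drifting out of sync with the data.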

Reviewed by Fredoline Eruo. See our editorial policy.
