Ollama

The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, good defaults out of the box. Built on llama.cpp.

By Fredoline Eruo·Last verified May 6, 2026·130,000 GitHub stars

Overview

The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, good defaults out of the box. Built on llama.cpp.

Pros

Zero-config setup
OpenAI-compatible API
Curated model library
Cross-platform

Cons

Less control than raw llama.cpp
Conservative default context length

Compatibility

Operating systems	macOS Linux Windows
GPU backends	NVIDIA CUDA AMD ROCm Apple Metal CPU
License	Open source · free

Get Ollama

Official site

https://ollama.com

GitHub

https://github.com/ollama/ollama

Benchmarks using Ollama

Model	Hardware	Quant	tok/s	VRAM
Mistral 7B Instruct v0.3	NVIDIA GeForce RTX 4090	Q4_K_M	112.3 tok/s	5.1 GB
Llama 3.1 8B Instruct	NVIDIA GeForce RTX 4090	Q4_K_M	104.7 tok/s	5.4 GB
Mixtral 8x7B Instruct	NVIDIA GeForce RTX 4090	Q4_K_M	31.4 tok/s	23.1 GB

Frequently asked

Is Ollama free?

Yes — Ollama is free to download and use and open-source under a permissive license.

What operating systems does Ollama support?

Ollama supports macOS, Linux, Windows.

Which GPUs work with Ollama?

Ollama supports NVIDIA CUDA, AMD ROCm, Apple Metal, CPU. CPU-only inference is also possible but slow.

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.