Runner · Open source · free · 4.7/5

Ollama

By Fredoline Eruo · Last verified May 6, 2026 · 130,000 GitHub stars

Overview

The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, good defaults out of the box. Built on llama.cpp.
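Because the HTTP API is OpenAI-compatible, any OpenAI-style client can point at the local server. A minimal stdlib-only sketch, assuming Ollama is running on its default port (11434) and `llama3.1` has already been pulled:

```python
import json
import urllib.request

# Default local Ollama endpoint for the OpenAI-compatible chat API.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model, prompt):
    """POST the payload to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Usage (requires a running server):
#   chat("llama3.1", "Say hello in one word.")
```

The same payload works with the official `openai` Python package by setting its `base_url` to `http://localhost:11434/v1`.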

Pros

  • Zero-config setup
  • OpenAI-compatible API
  • Curated model library
  • Cross-platform

Cons

  • Less control than raw llama.cpp
  • Conservative default context length
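The conservative default context length can be raised per request with the `num_ctx` option on Ollama's native API (or persistently with `PARAMETER num_ctx` in a Modelfile). A hedged sketch of the request body for the native `/api/generate` endpoint:

```python
import json

def build_generate_request(model, prompt, num_ctx=8192):
    """Native /api/generate payload with an enlarged context window.

    The "options" dict overrides model defaults for this request only;
    num_ctx sets the context length in tokens.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }

# This body would be POSTed to http://localhost:11434/api/generate
body = json.dumps(build_generate_request("llama3.1", "Summarize this long document ..."))
```

Raising `num_ctx` increases VRAM use for the KV cache, so benchmark before committing to a large window.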

Compatibility

  • Operating systems: macOS, Linux, Windows
  • GPU backends: NVIDIA CUDA, AMD ROCm, Apple Metal, CPU
  • License: Open source · free

Benchmarks using Ollama

| Model | Hardware | Quant | tok/s | VRAM |
| --- | --- | --- | --- | --- |
| Mistral 7B Instruct v0.3 | NVIDIA GeForce RTX 4090 | Q4_K_M | 112.3 | 5.1 GB |
| Llama 3.1 8B Instruct | NVIDIA GeForce RTX 4090 | Q4_K_M | 104.7 | 5.4 GB |
| Mixtral 8x7B Instruct | NVIDIA GeForce RTX 4090 | Q4_K_M | 31.4 | 23.1 GB |
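The tok/s column translates directly into wall-clock latency for a response of a given length. A quick estimate using the numbers above:

```python
def generation_time_seconds(num_tokens, tokens_per_second):
    """Estimate wall-clock decode time from a measured throughput."""
    return num_tokens / tokens_per_second

# A 500-token answer from Mistral 7B at 112.3 tok/s on the RTX 4090:
print(round(generation_time_seconds(500, 112.3), 1))  # roughly 4.5 seconds

# The same answer from Mixtral 8x7B at 31.4 tok/s takes about 15.9 seconds.
```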

Frequently asked

Is Ollama free?

Yes. Ollama is free to download and use, and it is open source under a permissive license.

What operating systems does Ollama support?

Ollama supports macOS, Linux, and Windows.

Which GPUs work with Ollama?

Ollama supports NVIDIA GPUs via CUDA, AMD GPUs via ROCm, and Apple Silicon via Metal. CPU-only inference also works, but is much slower.

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.