Is TurboVec free?

Yes. TurboVec is free to use and open source under the MIT license.

What operating systems does TurboVec support?

TurboVec supports Linux, macOS, and Windows.

Does TurboVec need a GPU?

No — TurboVec runs on CPU (SIMD-accelerated); it does not require or use a GPU.

TurboVec — local AI tool review

TurboVec

TurboVec is an open-source, **local-first vector index** (Rust core + Python bindings) by Ryan Codrai, MIT-licensed, built on Google Research's **TurboQuant** quantizer (presented at ICLR 2026). Its pitch for local AI: fit far more embeddings in RAM so a privacy-first RAG stack stays on one machine instead of needing a memory-optimized cloud box.

By Fredoline Eruo · Reviewed Jun 18, 2026 · 5,600 GitHub stars

Architecture: a data-oblivious quantizer, not another HNSW fork

TurboVec is a local-first vector index built on TurboQuant, a data-oblivious scalar quantizer (random rotation followed by scalar quantization, presented at ICLR 2026). The operationally important detail: there is no k-means, no learned codebook, and no separate training pass. Quantization parameters derive only from the vector dimension and the target bit-width, so an add() is effectively instantaneous and the index never needs a rebuild as documents stream in. The core is Rust with thin Python bindings (crates.io, MSRV 1.70); search is SIMD-accelerated — hand-written NEON on ARM, AVX-512BW/AVX2 on x86 — and filtered search runs against an id allowlist with no recall penalty.
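
To make "random rotation followed by scalar quantization" concrete, here is a minimal sketch of a data-oblivious quantizer in plain Python. It is not TurboVec's or TurboQuant's implementation: it swaps in a randomized Hadamard transform (random sign flips plus a fast Walsh-Hadamard transform) as a cheap stand-in for a random rotation, and it assumes uniform quantization levels clipped at 3/sqrt(dim), which is a choice made here for illustration. All function names are hypothetical. The point it illustrates is that every parameter comes from the dimension, the bit-width, and a seed; no data is ever inspected.

```python
import math
import random


def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (pad the vector if needed)")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def make_signs(dim, seed=0):
    """Random +/-1 signs derived from a seed only, never from the data."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def rotate(x, signs):
    """Norm-preserving pseudo-random rotation: sign flips, then Hadamard."""
    return hadamard([s * v for s, v in zip(signs, x)])


def _grid(dim, bits):
    # Assumption: uniform levels over [-limit, limit]; depends only on dim and bits.
    limit = 3.0 / math.sqrt(dim)
    step = 2.0 * limit / (2 ** bits)
    return limit, step


def quantize(x, signs, bits):
    """Rotate a unit vector, then map each coordinate to one of 2**bits levels."""
    limit, step = _grid(len(x), bits)
    top = 2 ** bits - 1
    return [min(top, max(0, int((v + limit) / step))) for v in rotate(x, signs)]


def dequantize(codes, signs, bits):
    """Map codes back to level midpoints and undo the rotation."""
    limit, step = _grid(len(codes), bits)
    rotated = [-limit + (c + 0.5) * step for c in codes]
    # The orthonormal Hadamard transform is its own inverse; then undo the signs.
    return [s * v for s, v in zip(signs, hadamard(rotated))]
```

Because there is nothing to fit, adding a vector is just `quantize(x, signs, bits)`, which is why a design like this needs no train step or rebuild. Note that this sketch needs a power-of-two dimension, so a 768-dim embedding would have to be padded to 1024 first.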

If you have lived with FAISS's IVF/PQ train step, this is the headline ergonomic win: you skip the "collect a representative sample, fit the quantizer, pray the distribution holds" dance entirely. That single design choice is why TurboVec fits the local-AI stack so cleanly — it pairs with any open-weight embedding model you already run through Ollama and never phones home. For the broader "which model on which box" decisions around it, our models catalog and will-it-run checker cover the embedding and inference side.

Compatibility: what the SIMD tiers actually mean

The published throughput claims (beating FAISS IndexPQFastScan by 12–20% on ARM, ~0.23 ms/query on an Apple M3 Max) target the NEON and AVX-512 paths. Most readers will not be on those — and that is fine, because the AVX2 fallback is still fast. The matrix below records what we verified versus what the project reports. The takeaway: TurboVec is CPU-only by design (no GPU path, and it does not need one), so the relevant axis is your CPU's SIMD width, not your GPU.
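
If you are unsure which tier your machine falls into, a quick check of the CPU feature flags answers it. The sketch below is an operator-side helper, not part of TurboVec and not how the library selects its kernels; the function names and tier labels are made up for illustration. It assumes Linux, where `/proc/cpuinfo` lists x86 features under `flags` and ARM features under `Features` (NEON appears as `asimd` on 64-bit ARM).

```python
def simd_tier(flags):
    """Classify a set of CPU feature flags into the tiers discussed above."""
    flags = {f.lower() for f in flags}
    if "avx512bw" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    if flags & {"neon", "asimd"}:
        return "neon"
    return "unknown"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Collect feature flags on Linux; returns an empty set elsewhere."""
    found = set()
    try:
        with open(path) as fh:
            for line in fh:
                key, _, value = line.partition(":")
                if key.strip().lower() in ("flags", "features"):
                    found.update(value.split())
    except OSError:
        pass
    return found


if __name__ == "__main__":
    print(simd_tier(read_cpu_flags()))
```

On macOS or Windows the file does not exist, so the helper reports "unknown"; there you would check the CPU model against the vendor's spec sheet instead.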

Deployment paths: three honest shapes

We tested the Python path end to end. The cleanest deployment is local Python RAG: pip install turbovec, embed with a local model, and you have an air-gapped index on a laptop. The second is embedding the Rust crate directly in a service when you do not want Python in the hot path; online ingest with no rebuild makes it viable for streaming document pipelines. The third is memory-tight edge / on-device, where 2-bit quantization is the whole point: a 50K-vector, 768-dim index drops under 10 MB, so a Raspberry-Pi-class box holds an index that would otherwise want a server. The structured cards below lay out the hardware and complexity for each.

Resource usage and first-party benchmarks

We installed TurboVec ourselves (turbovec 0.7.0, Python 3.14) on a Ryzen 9 5900HX laptop (Zen 3, AVX2 — no AVX-512) and measured two workloads. Numbers here are ours, not the project's.

Synthetic, 50,000 vectors × 768-dim, 200 queries:

float32 baseline: 153.6 MB index.
4-bit: 19.2 MB (8× smaller), 2.1 s build, 0.19 ms/query.
2-bit: 9.6 MB (16× smaller), 1.9 s build, 0.10 ms/query.

The 8× and 16× compression claims are exact: the measured index sizes divide out to precisely those ratios against the float32 baseline. Builds finish in under 3 seconds with no train step, and search stays sub-millisecond even on the AVX2 fallback. The headline ARM and AVX-512 numbers do not apply to this chip, yet it still serves queries in ~0.1–0.2 ms.

Real embeddings, 888 RunLocalAI catalog docs via nomic-embed-text, 768-dim, 150 held-out semantic queries, recall vs exact float32 brute force:

4-bit: recall@10 0.955, 0.033 ms/query, 0.3 MB.
2-bit: recall@10 0.884, 0.020 ms/query, 0.2 MB.

On real embeddings the recall is far higher than on random vectors — quantization preserves genuine semantic structure — so the project's published ~0.955 recall ballpark held up on our data. At 4-bit you recover ~95.5% of the true top-10 neighbours an uncompressed search would return; 2-bit keeps ~88% at 16× less memory. These sit alongside our other measured numbers in the benchmarks hub.
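
For readers who want to reproduce this kind of measurement on their own corpus, recall@k is simple to compute once you have the ids returned by the compressed index and the ids from an exact float32 search. The sketch below shows the metric only; the helper names are hypothetical, the brute-force search uses dot products on the assumption that embeddings are normalized, and how you obtain the approximate ids from your index is left out.

```python
def top_k_exact(query, vectors, k):
    """Brute-force ground truth: positions of the k highest dot products."""
    def score(i):
        return sum(a * b for a, b in zip(query, vectors[i]))
    return sorted(range(len(vectors)), key=lambda i: -score(i))[:k]


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)
```

A mean recall@10 of 0.955 therefore means that, averaged over the query set, about 9.5 of the 10 true nearest neighbours appear in the compressed index's top 10.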

Capacity planning from these numbers. The memory math is linear and easy to project, which is what makes TurboVec predictable to deploy. A 768-dim vector costs ~3,072 bytes at float32, ~384 bytes at 4-bit, and ~192 bytes at 2-bit. So a one-million-vector index lands around 3 GB uncompressed, ~384 MB at 4-bit, and ~192 MB at 2-bit — which is exactly why the project can quote a 10M-document corpus dropping from ~31 GB to ~4 GB. For an operator, that converts directly into a hosting decision: a 4-bit million-vector index fits comfortably in the RAM you already have on a laptop or a small VM, with no memory-optimised cloud box and no managed vector service in the loop. Because builds are train-free and ingest is online, you also do not pay a rebuild tax as the corpus grows — you size for the final footprint once and add documents incrementally.
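
The projection is easy to script. The sketch below reproduces the arithmetic in this section and in the synthetic benchmark; it assumes tightly packed codes and ignores any per-index overhead such as ids or headers, which the article's figures also appear to leave out. Function names are illustrative and sizes are in decimal bytes.

```python
import math


def bytes_per_vector(dim, bits):
    """Packed size of one vector; float32 corresponds to bits=32."""
    return math.ceil(dim * bits / 8)


def index_bytes(n_vectors, dim, bits):
    """Raw code storage for n_vectors, excluding ids and metadata."""
    return n_vectors * bytes_per_vector(dim, bits)


def max_vectors(budget_bytes, dim, bits):
    """How many vectors fit in a given memory budget."""
    return budget_bytes // bytes_per_vector(dim, bits)


if __name__ == "__main__":
    for bits in (32, 4, 2):
        size_mb = index_bytes(1_000_000, 768, bits) / 1e6
        print(f"{bits:>2}-bit, 1M x 768: {size_mb:,.0f} MB")
```

Running the same functions for 10 million 768-dim vectors gives 30.72 GB at float32 and 3.84 GB at 4-bit, which matches the rounded ~31 GB to ~4 GB figure quoted above.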

Failure modes: what breaks, and where it is the wrong tool

TurboVec is a vector index, not a vector database, and most of its failure modes are really scope mismatches:

No server, dashboard, query language, or rich metadata filtering. Filtering is an id allowlist; if you need WHERE tenant = x AND date > y at the store layer, this is not it.
No AVX-512 means no headline numbers. On older CPUs (our Zen 3 case) you get the AVX2 path — still sub-0.2 ms/query at 50K×768, but do not expect the published ARM figures.
Recall drifts on much larger corpora. Our recall run covered only 888 docs; at tens of millions of vectors, recall is likely to drop further, the 2-bit tier in particular. Benchmark on your own data; the author's docs say the same.
2-bit is a footprint trade, not a free lunch. It cost ~7 points of recall@10 (0.955 → 0.884) in our test. Use it only when memory is the binding constraint.
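
On the filtering point in the first item: the usual workaround is to resolve the metadata predicate in your own application layer and pass the resulting ids as the allowlist. The sketch below shows that pattern with a plain dictionary as the metadata store and an exact dot-product search standing in for the index. It does not use TurboVec's API, whose filter call signature is not covered in this review, and every name in it is hypothetical.

```python
from datetime import date


def build_allowlist(metadata, tenant, after):
    """Resolve 'tenant = x AND date > y' to a set of document ids."""
    return {
        doc_id
        for doc_id, meta in metadata.items()
        if meta["tenant"] == tenant and meta["date"] > after
    }


def filtered_top_k(query, vectors, allow, k):
    """Stand-in for an index search restricted to the allowed ids."""
    scored = [
        (sum(a * b for a, b in zip(query, vec)), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id in allow
    ]
    # Highest score first; ties broken by id for deterministic output.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    metadata = {
        1: {"tenant": "a", "date": date(2026, 1, 5)},
        2: {"tenant": "a", "date": date(2025, 12, 1)},
        3: {"tenant": "b", "date": date(2026, 2, 1)},
    }
    vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [1.0, 0.0]}
    allow = build_allowlist(metadata, "a", date(2026, 1, 1))
    print(filtered_top_k([1.0, 0.0], vectors, allow, k=2))
```

This keeps the index itself simple, at the cost of maintaining the metadata lookup yourself, for example in SQLite alongside the index file.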

How it compares

For billion-scale trained PQ on GPU, FAISS still wins — TurboVec has no GPU path and does not try to. For sub-microsecond flat-storage latency on small sets, HNSWlib wins. Against managed cloud vector databases, TurboVec's pitch is the opposite axis: everything stays on your machine, which is the entire reason to run it in a local-first RAG stack. Its genuine sweet spot is sub-1M-vector prototyping and memory-tight edge/on-device RAG where instant builds and aggressive compression matter more than distributed scale. If you are choosing between local inference engines and stores generally, the tools catalog frames the wider landscape.

Verdict

TurboVec does one thing and does it honestly: compressed approximate-nearest-neighbour search with instant, train-free builds. Our testing confirmed the compression math exactly (8× / 16×) and a production-viable 0.955 recall@10 at 4-bit on real embeddings, all on CPU and even without AVX-512. It will not replace a full vector database and it is not a GPU billion-scale engine — but for an air-gapped, single-machine RAG index, that is precisely the point. Recommended (4.4/5) for local and edge RAG up to ~1M vectors; reach for FAISS or a managed store past that. The 4-bit tier is the default; drop to 2-bit only when every megabyte counts.

| Status | Runtime / Stack | Notes |
| --- | --- | --- |
| Excellent | x86-64 with AVX-512 (Zen 4 / Sapphire Rapids+) | Headline path. AVX-512BW SIMD kernels; this is the tier the project's published throughput claims target. |
| Good | x86-64 with AVX2 only (Zen 3 / older Intel) | Fallback path we tested on a Ryzen 9 5900HX. Still ~0.1-0.2 ms/query at 50K × 768; just not the headline SIMD numbers. |
| Excellent | Apple Silicon / ARM (NEON) | Hand-written NEON kernels; the author reports ~0.23 ms/query on an M3 Max. Not independently reproduced by us. |
| Excellent | Python 3.x via pip | Installed clean on Python 3.14. LangChain / LlamaIndex / Haystack / Agno integrations via pip extras. |
| Excellent | Rust via crates.io (MSRV 1.70) | cargo add turbovec: the native path with no Python in the hot loop. Online ingest, no rebuilds. |

Runtime health

Operator-grade signals on how actively TurboVec is being maintained, how fresh its measurements are, and what failure classes operators have flagged. Every label below is anchored to a real date or count — we never infer maintainer activity we can't show.

Release cadence: Active. Updated Jun 18, 2026 (2 days since last refresh; source: operationalReviewedAt). Derived from the most recent editorial signal on this row.

Benchmark freshness: 0 editorial benchmarks. Tracks how recent the editorial measurements on this runtime are; none are on file for this runtime yet.

Community reproduction: 0 reproduced reports. Counts submissions that match an editorial measurement on similar hardware; none are on file yet.

Ecosystem stability: 4.4/5 (editorial). Editorial rating from RunLocalAI; qualitative, not measured.

Operating systems: Linux, macOS, Windows
GPU backends: none (CPU SIMD)
License: Open source, free (OSS, MIT)
