RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
First Local Chatbot

10. Multiple Model Selection

Chapter 10 of 15 · 15 min
KEY INSIGHT

Model switching requires no backend changes: Ollama serves every pulled model from one endpoint, so the backend only has to pass the model name through.

Add a model selector to the UI. When the user changes the model, the next request uses the new model. Ollama identifies models by name string, so the backend has nothing to register; the model only needs to have been pulled already.

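// Populate the <select id="modelSelect"> dropdown from the backend's GET /models route.
async function loadModels() {
  const res = await fetch("/models");
  if (!res.ok) {
    throw new Error(`GET /models failed with status ${res.status}`);
  }
  const { models } = await res.json();
  const select = document.getElementById("modelSelect");
  select.innerHTML = ""; // clear any stale options before refilling
  for (const m of models) {
    const opt = document.createElement("option");
    opt.value = m;
    opt.textContent = m;
    select.appendChild(opt);
  }
}
// Log instead of leaving an unhandled promise rejection if the fetch fails.
loadModels().catch((err) => console.error(err));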
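Once the dropdown is populated, the selected value has to travel with each request. A minimal sketch of that step follows. The helper name `buildChatRequest` and the body fields `model` and `messages` are assumptions for illustration, not the course's confirmed schema.

```javascript
// Hypothetical helper: builds the JSON body for the next chat request.
// The field names ("model", "messages") are assumptions about the backend's schema.
function buildChatRequest(selectedModel, messages) {
  if (typeof selectedModel !== "string" || selectedModel.trim() === "") {
    throw new Error("No model selected");
  }
  return {
    model: selectedModel.trim(), // passed through to Ollama unchanged
    messages,
  };
}

// In the browser the first argument would normally be
// document.getElementById("modelSelect").value, read at send time.
const body = buildChatRequest("llama3:8b-instruct-q4_0", [
  { role: "user", content: "Hello" },
]);
console.log(JSON.stringify(body));
```

Reading the select's value at send time, rather than caching it when the page loads, is what makes a model change take effect on the very next request.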

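The FastAPI backend already exposes the model names from the Ollama API at GET /models. To filter or group models (for example, to show only 7B-parameter models or to collapse tag variants), parse the model string. The helper below collapses every tag variant down to its base name: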

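def compatible_models() -> list[str]:
    """Return the unique base names of the available models, sorted."""
    # list_models() is assumed to be the existing helper behind GET /models.
    all_models = list_models()
    # Ollama model names look like "llama3:8b-instruct-q4_0" (name:tag).
    # Dropping the tag collapses size and quantization variants into one entry.
    # Note: a bare name sent to Ollama resolves to its ":latest" tag.
    base_models = set()
    for m in all_models:
        base = m.split(":")[0]
        base_models.add(base)
    return sorted(base_models)  # sorted so the dropdown order is stable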
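The helper above groups by base name. The 7B filter mentioned earlier could be sketched client-side as follows, assuming the tag carries a size token such as `7b` or `8b`. The function names and the sample list are illustrative only, and tags without a plain size token (for example `latest`) are treated as unknown.

```javascript
// Extract the parameter count (in billions) from an Ollama-style name,
// e.g. "llama3:8b-instruct-q4_0" -> 8. Returns null when the tag has no size token.
function parameterSizeB(modelName) {
  const tag = modelName.split(":")[1] ?? "latest";
  const match = tag.match(/(?:^|-)(\d+(?:\.\d+)?)b(?:-|$)/i);
  return match ? Number(match[1]) : null;
}

// Keep only the models whose tag advertises the requested size.
function filterBySize(models, sizeB) {
  return models.filter((m) => parameterSizeB(m) === sizeB);
}

// Illustrative sample names, not output from a real GET /models call.
const sample = [
  "llama3:8b-instruct-q4_0",
  "mistral:7b",
  "qwen2:7b-instruct",
  "phi3:latest",
];
console.log(filterBySize(sample, 7));
```

Because the size is inferred from the name alone, any model whose tag omits it drops out of the filtered list; whether to hide or show such models is a design choice for the UI.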

One failure mode: if the selected model has not been pulled, Ollama rejects the request with an error body such as {"error": "model not found"} (the exact wording varies by version). Handle it in the frontend by showing an alert: "Model not found. Run: ollama pull {selectedModel}".
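The check described above can be kept in one small function so the alert text lives in a single place. This is a sketch under assumptions: `pullHint` is a hypothetical name, and matching on the substring "not found" is a heuristic rather than a documented contract.

```javascript
// Turn an Ollama-style error body into a user-facing hint.
// Matching on "not found" is an assumption: the exact error wording
// is not guaranteed to be stable across Ollama versions.
function pullHint(responseBody, selectedModel) {
  const message =
    responseBody && typeof responseBody.error === "string"
      ? responseBody.error
      : "";
  if (message.toLowerCase().includes("not found")) {
    return `Model not found. Run: ollama pull ${selectedModel}`;
  }
  // Other errors are surfaced as-is; null means there was no error field.
  return message ? `Request failed: ${message}` : null;
}

console.log(pullHint({ error: "model not found" }, "llama3:8b-instruct-q4_0"));
```

In the page, the result would be passed to alert() only when it is not null, using the current value of the model dropdown as the second argument.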

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Add a "Model Info" display that shows estimated size and quantization from the model name, parsed client-side.
