Multiple Model Selection — First Local Chatbot (Chapter 10)

Add a model selector to the UI. When the user changes the model, the next request uses the new model. Ollama models are just strings—no registration required.

async function loadModels() {
  const res = await fetch("/models");
  const { models } = await res.json();
  const select = document.getElementById("modelSelect");
  select.innerHTML = "";
  for (const m of models) {
    const opt = document.createElement("option");
    opt.value = m;
    opt.textContent = m;
    select.appendChild(opt);
  }
}
loadModels();

FastAPI already serves model names from the Ollama API at GET /models. To filter models (e.g., only show 7B parameter models), you can parse the model string:

def compatible_models() -> list[str]:
    all_models = list_models()
    # Ollama model names are like "llama3:8b-instruct-q4_0"
    # Filter to exclude quantization variants if needed
    base_models = set()
    for m in all_models:
        base = m.split(":")[0]
        base_models.add(base)
    return list(base_models)

A failure mode: if a model is selected that is not pulled, Ollama returns {"error": "model not found"}. Handle it in the frontend by showing an alert: "Model not found. Run: ollama pull {selectedModel}".

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.