Personality Configuration — First Local Chatbot (Chapter 9)

A system prompt sets the chatbot's personality. Ollama's chat endpoint accepts a messages array in which the first message can have role: "system". Accept the system prompt with each request and prepend it to the session history before calling the model:

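from fastapi.responses import StreamingResponse

from app.ollama_client import stream_chat

DEFAULT_SYSTEM = "You are a helpful, concise assistant. Answer in plain text, no markdown unless requested."

# `app` (the FastAPI instance) and `sessions` (a dict mapping session IDs to message lists)
# are assumed to be defined earlier in the chapter's code.
@app.post("/chat")
async def chat(session_id: str, model: str, messages: list[dict], system_prompt: str = DEFAULT_SYSTEM):
    if session_id not in sessions:
        sessions[session_id] = []

    # Record the incoming messages so they reach the model and persist in the session.
    # This assumes the client sends only new messages, not the full history.
    sessions[session_id].extend(messages)

    # Inject the system prompt at the start of the conversation
    full_messages = [{"role": "system", "content": system_prompt}] + sessions[session_id]

    def stream():
        # Relay each chunk from Ollama to the client as it arrives
        for chunk in stream_chat(model, full_messages):
            yield chunk

    return StreamingResponse(stream(), media_type="text/event-stream")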
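
The prepend step can be isolated into a small helper so it is easy to test on its own. The following is a minimal sketch, not part of the chapter's code: the name with_system_prompt is hypothetical, and dropping any system messages already in the history is an added assumption that guards against duplicated prompts.

```python
def with_system_prompt(history: list[dict], system_prompt: str) -> list[dict]:
    """Return a new message list with exactly one system message at index 0."""
    # Assumption: any system message already in the history is stale and can be dropped.
    rest = [m for m in history if m.get("role") != "system"]
    # Build a new list so the stored session history is never mutated.
    return [{"role": "system", "content": system_prompt}] + rest


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
messages = with_system_prompt(history, "You are a terse assistant.")
print(messages[0]["role"], len(messages))
```

Because the helper returns a new list, the session keeps only user and assistant turns, and the personality can change between requests without rewriting stored history.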

On the frontend, add a system prompt textarea to the settings panel:

<label>System Prompt:</label>
<textarea id="systemPrompt" rows="3">You are a helpful, concise assistant.</textarea>

Send it with each request:

const systemPrompt = document.getElementById("systemPrompt").value;
const response = await fetch(`/chat?session_id=${sessionId}&model=${model}&system_prompt=${encodeURIComponent(systemPrompt)}`, { ... });
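
To exercise the endpoint from a script rather than the browser, the same query string can be built in Python. This is a sketch under stated assumptions: the helper name build_chat_url, the base URL http://localhost:8000, and the model name llama3 are all placeholders, not values from the chapter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_chat_url(base_url: str, session_id: str, model: str, system_prompt: str) -> str:
    """Return the /chat URL with every query parameter percent-encoded."""
    query = urlencode({
        "session_id": session_id,
        "model": model,
        "system_prompt": system_prompt,
    })
    return f"{base_url}/chat?{query}"


# Placeholder values; substitute the host, session ID, and model you actually use.
url = build_chat_url("http://localhost:8000", "demo", "llama3", "Be brief & polite.")
print(url)
```

Encoding every parameter, not only the system prompt, keeps characters such as & or spaces in any value from splitting the query string.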

Temperature controls randomness: lower values make replies more deterministic, and higher values make them more varied. Add a slider:

<label>Temperature: <span id="tempVal">0.7</span></label>
<input type="range" id="tempSlider" min="0" max="2" step="0.1" value="0.7" />

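Read the slider value on the frontend, pass it in the request payload, and update stream_chat to include it in the body sent to Ollama. Ollama reads temperature from the options object of the request, not from the top level of the body.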
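
One way to shape the request body is sketched below. The function name build_chat_payload is hypothetical, and how the chapter's stream_chat actually sends the request is not shown here; the sketch covers only the JSON body, with temperature nested under options as Ollama's chat endpoint expects. Clamping to the slider's 0 to 2 range is an added safeguard, not something the chapter requires.

```python
def build_chat_payload(model: str, messages: list[dict], temperature: float = 0.7) -> dict:
    """Build a JSON-serializable body for Ollama's chat endpoint."""
    # Assumption: clamp to the slider's range so a bad client value cannot slip through.
    temperature = max(0.0, min(2.0, float(temperature)))
    return {
        "model": model,
        "messages": messages,
        "stream": True,
        # Sampling parameters go under "options", not at the top level.
        "options": {"temperature": temperature},
    }


payload = build_chat_payload("llama3", [{"role": "user", "content": "Hi"}], temperature=0.2)
print(payload["options"])
```

Inside stream_chat, this dictionary would replace whatever body is currently sent, with the rest of the streaming logic left unchanged.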

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.