03. Model Configuration

Chapter 3 of 18 · 20 min

KEY INSIGHT

Configuring models in Continue involves matching the provider interface to your backend's API format and ensuring the model identifier matches what your server advertises.

Continue supports multiple backend providers: OpenAI-compatible API, Anthropic, Google, local providers, and custom endpoints. Most local model servers (LM Studio, Ollama, llama.cpp) expose OpenAI-compatible APIs, making the `openai` provider the most common choice.

The provider configuration requires three parameters: the provider name, the model identifier, and the API base URL. The model identifier must exactly match what your server reports when you query `/v1/models`.

```bash
# Query LM Studio for available models
curl http://localhost:1234/v1/models | python3 -m json.tool
```

The response looks like:

```json
{
  "object": "list",
  "data": [
    {
      "id": "codestral-22b-instruct",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local"
    }
  ]
}
```

Use `codestral-22b-instruct` as your model name in the Continue config.

Ollama exposes models differently. Run:

```bash
ollama list
```

You'll see model names like `codestral:latest` or `deepseek-coder:33b`. Configure these as:

```json
{
  "models": [{
    "title": "Deepseek Coder",
    "provider": "openai",
    "model": "deepseek-coder:33b",
    "api_base": "http://localhost:11434/v1"
  }]
}
```

Environment variables handle API keys for cloud providers. For local models, use `"api_key": "local"` or omit the field entirely.
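The exact-match rule for model identifiers can be checked with a short script before you edit the Continue config. The following is a minimal sketch, not part of Continue itself: the helper names `served_model_ids`, `fetch_models`, and `is_served` are hypothetical, and the sample body is copied from the LM Studio response shown above. It assumes the server returns the standard OpenAI-style list with a `data` array.

```python
import json
import urllib.request


def served_model_ids(models_response: dict) -> list[str]:
    """Return the `id` of every entry in a /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(api_base: str) -> dict:
    """GET {api_base}/models and decode the JSON body (needs a running server)."""
    url = f"{api_base.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def is_served(models_response: dict, configured_model: str) -> bool:
    """True only on an exact, case-sensitive match with a served identifier."""
    return configured_model in served_model_ids(models_response)


# Sample body copied from the LM Studio response in this chapter.
sample = {
    "object": "list",
    "data": [
        {
            "id": "codestral-22b-instruct",
            "object": "model",
            "created": 1234567890,
            "owned_by": "local",
        }
    ],
}

print(is_served(sample, "codestral-22b-instruct"))  # exact match
print(is_served(sample, "Codestral-22B-Instruct"))  # differs only in case
```

Against a live server, `is_served(fetch_models("http://localhost:1234/v1"), "codestral-22b-instruct")` runs the same check on the real response. The comparison is deliberately strict, so a difference in case or a missing tag such as `:33b` counts as a mismatch.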
Some backends require specific authentication headers. In those cases, use the `request_options` configuration:

```json
{
  "models": [{
    "title": "Codestral",
    "provider": "openai",
    "model": "codestral",
    "api_base": "http://localhost:1234/v1",
    "request_options": {
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }]
}
```

Model-specific settings like temperature, top_p, and max tokens apply per-model:

```json
{
  "models": [{
    "title": "Code Assistant",
    "provider": "openai",
    "model": "codestral-22b-instruct",
    "api_base": "http://localhost:1234/v1",
    "temperature": 0.3,
    "max_tokens": 4096
  }]
}
```

Lower temperature (0.2-0.4) produces more deterministic, repeatable code completions. Higher temperature (0.6-0.8) works better for creative tasks like suggesting alternative implementations.

For dual-model setups (separate autocomplete and chat models), configure both in the same file:

```json
{
  "models": [
    {
      "title": "Deepseek Coder Chat",
      "provider": "openai",
      "model": "deepseek-coder:33b",
      "api_base": "http://localhost:11434/v1"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder FIM",
    "provider": "openai",
    "model": "bigcode/starcoder",
    "api_base": "http://localhost:1234/v1",
    "useLegacyFill": true
  }
}
```

The `useLegacyFill` option enables the older fill-in-middle format that some models were trained with.
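With two models in one file, it is easy to leave one entry without one of the three required parameters (provider, model identifier, API base URL). The following is a rough sketch of a pre-flight check. The function name `config_problems` is hypothetical, and the key names (`provider`, `model`, `api_base`, `tabAutocompleteModel`) simply follow this chapter's examples. It is not a replacement for Continue's own validation.

```python
REQUIRED_FIELDS = ("provider", "model", "api_base")


def config_problems(config: dict) -> list[str]:
    """List every model entry that lacks one of the three required fields."""
    problems = []

    # Collect (label, entry) pairs for the chat models and the autocomplete model.
    entries = [(f"models[{i}]", m) for i, m in enumerate(config.get("models", []))]
    if "tabAutocompleteModel" in config:
        entries.append(("tabAutocompleteModel", config["tabAutocompleteModel"]))
    else:
        problems.append("tabAutocompleteModel: not configured")
    if not config.get("models"):
        problems.append("models: no chat model configured")

    # A field counts as missing when it is absent or empty.
    for label, entry in entries:
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"{label}: missing {field}")
    return problems


# The dual-model config from this chapter, as a Python dict.
dual_config = {
    "models": [
        {
            "title": "Deepseek Coder Chat",
            "provider": "openai",
            "model": "deepseek-coder:33b",
            "api_base": "http://localhost:11434/v1",
        }
    ],
    "tabAutocompleteModel": {
        "title": "Starcoder FIM",
        "provider": "openai",
        "model": "bigcode/starcoder",
        "api_base": "http://localhost:1234/v1",
        "useLegacyFill": True,
    },
}

for problem in config_problems(dual_config) or ["no problems found"]:
    print(problem)
```

The check covers structure only: it confirms each entry names a provider, a model, and a base URL, and does not contact either server.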

EXERCISE

Query your model server's `/v1/models` endpoint, note the exact model identifiers, then configure Continue with both a chat model and an autocomplete model. Verify both work by triggering each explicitly.