HOW-TO · DEV
How to use Continue.dev to connect a local codebase to a custom LLM backend
Target environment
Ubuntu 24.04 · Continue.dev 0.9.x
PREREQUISITES
Continue.dev installed, LLM backend running
What this does
Continue.dev is an open-source VS Code and JetBrains extension that provides an AI coding assistant powered by any LLM. By connecting Continue.dev to a local LLM backend (such as Ollama or LM Studio), developers keep code entirely on-premises while retaining full AI-assisted navigation, editing, and search capabilities.
Steps
- Open the Continue.dev configuration file at ~/.continue/config.json using a text editor.
- In the models array, add a new entry with a "title" (the name shown in the model selector), "provider": "openai" (Continue talks to the backend through the OpenAI-compatible API shape), and "model": "local-model", replacing local-model with the model name your backend actually serves.
- Set "apiBase": "http://localhost:11434/v1" to point to Ollama's local OpenAI-compatible endpoint.
- If the backend requires an API key, set "apiKey": "ollama" (Ollama does not validate the key field).
- Save the configuration and reload the IDE window.
- Open the Continue.dev side panel and verify that the model selector dropdown shows the newly added local model.
- Select the local model from the dropdown and send a simple prompt such as "List the files in this project" to confirm it responds.
- Confirm that responses reference the actual project files to verify context injection is functioning.
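The configuration steps above can be sketched as a small Python helper that builds the model entry and merges it into a config dictionary. This is an illustration under stated assumptions, not Continue's own tooling: the function name add_local_model and the "Local model" title are hypothetical, and the camelCase keys apiBase and apiKey are assumed to match the JSON config format of your installed Continue version, so check them against its documentation before relying on them.

```python
import copy
import json


def add_local_model(config, title="Local model", model="local-model",
                    api_base="http://localhost:11434/v1", api_key="ollama"):
    """Return a copy of `config` with an OpenAI-compatible local model entry.

    Hypothetical helper: the key names below are assumptions about the
    Continue JSON config schema, not values read from Continue itself.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    entry = {
        "title": title,        # label shown in the model selector
        "provider": "openai",  # OpenAI-compatible API shape
        "model": model,        # must match a model the backend serves
        "apiBase": api_base,   # local backend endpoint
        "apiKey": api_key,     # placeholder; Ollama does not validate it
    }
    models = updated.setdefault("models", [])
    # Drop any earlier entry with the same title so reruns do not duplicate it.
    models[:] = [m for m in models if m.get("title") != title]
    models.append(entry)
    return updated


if __name__ == "__main__":
    # Print the resulting fragment instead of writing to ~/.continue/config.json.
    print(json.dumps(add_local_model({"models": []}), indent=2))
```

The helper only prints the merged result; copying the printed entry into config.json by hand keeps the real file under your control.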
Verification
curl -s http://localhost:11434/api/tags | python3 -c "import sys,json; models=json.load(sys.stdin)['models']; print('Models available:', len(models), '| First:', models[0]['name'] if models else 'none')"
Expected output: a line showing the count of available models and the name of at least one model, confirming the backend is reachable.
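For readers who prefer a script to the one-liner, the same check might look like the following sketch. It assumes the backend is Ollama on its default port and that /api/tags returns a JSON object with a "models" list, as the command above implies; the function names summarize_models and fetch_tags are hypothetical.

```python
import json
import urllib.request


def summarize_models(payload):
    """Format the /api/tags payload the same way as the curl one-liner."""
    models = payload.get("models", [])
    first = models[0]["name"] if models else "none"
    return f"Models available: {len(models)} | First: {first}"


def fetch_tags(base_url="http://localhost:11434", timeout=5.0):
    """Fetch the model list from the backend (assumes Ollama's /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, print(summarize_models(fetch_tags())) should produce the same kind of line as the curl command; if the backend is down, fetch_tags raises an OSError subclass instead.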
Common failures
- Connection refused: Ensure the LLM backend process is running and the port matches the API base URL set in the config.
- Model not loaded: Run ollama pull <model-name> before attempting to use the model in Continue.dev; the backend must have the model files available locally.
- Context window errors: Reduce the context size in the Ollama configuration or lower maxTokens in config.json to prevent overflow on large codebases.
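The first two failures can be told apart programmatically. The sketch below is one possible approach, assuming an Ollama backend whose /api/tags endpoint lists pulled models with names such as llama3:latest; the helpers check_model and diagnose are hypothetical names introduced for illustration.

```python
import json
import urllib.request


def check_model(names, model):
    """Report whether `model` appears among the names the backend lists."""
    # Ollama appends a tag to names, so accept the bare name plus ":latest".
    if model in names or f"{model}:latest" in names:
        return "ok"
    return f"model not loaded: run `ollama pull {model}`"


def diagnose(base_url, model, timeout=3.0):
    """Distinguish an unreachable backend from a missing model."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError as exc:  # covers URLError and connection refused
        return f"backend unreachable: {exc}"
    names = [m.get("name", "") for m in payload.get("models", [])]
    return check_model(names, model)
```

For example, diagnose("http://localhost:11434", "llama3") returns "ok" only when the backend answers and lists that model. Context window errors are not covered here, since they surface at request time rather than in the model list.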