05. Autocomplete Setup

Chapter 5 of 18 · 20 min

KEY INSIGHT

Autocomplete tuning balances suggestion frequency, latency, and relevance through configuration parameters and model selection. In practice, that means tuning three dimensions: trigger behavior, suggestion quality, and response speed. The goal is suggestions that feel anticipatory without being intrusive.

Trigger behavior controls when suggestions appear. Continue offers several modes:

```json
{
  "autocomplete": {
    "disabled": false,
    "triggerMode": "automatic",
    "maxPrefixLines": 50,
    "maxSuffixLines": 10
  }
}
```

`triggerMode` accepts `automatic` (trigger on keystroke), `manual` (trigger on Ctrl+K), or `always` (continuous streaming). In streaming mode, partial completions appear as they are generated. This helps when you want to watch a long completion take shape, but it can be distracting.

The `maxPrefixLines` and `maxSuffixLines` settings control how much context is sent to the model: the lines before and after the cursor, respectively. Higher values improve suggestion relevance but increase latency and token usage. For typical Python files, 50 prefix lines capture most of the surrounding function and class context. The suffix should include enough lines to reach the end of the current block, but not so many that unrelated code confuses the model.

Debounce settings prevent autocomplete from triggering on every keystroke:

```json
{
  "autocomplete": {
    "debounceDelay": 150,
    "quickShortcuts": ["Tab"],
    "multiline": true,
    "maxTokens": 300
  }
}
```

A `debounceDelay` of 150 ms means a request is sent only after typing pauses for 150 ms, which prevents rapid-fire requests during fast typing. The `quickShortcuts` array lets you accept a suggestion immediately with Tab. `multiline` enables multi-line completions, which many developers prefer for generating complete function bodies. `maxTokens` caps the length of each completion.

Model selection significantly affects autocomplete quality. Smaller models (3-7B parameters) respond faster but more often produce irrelevant suggestions. Larger models (15B+) produce better suggestions but add latency. For most hardware, the sweet spot is a 7-15B model with fill-in-the-middle (FIM) training, meaning it was trained to generate the code between a given prefix and suffix.
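To make `debounceDelay` concrete, here is a minimal sketch of trailing-edge debouncing. It is a standalone model, not Continue's implementation. It assumes each keystroke restarts a timer and that a request fires only if the timer expires before the next keystroke. The function name `debounced_requests` and the keystroke timestamps are hypothetical.

```python
# Hypothetical model of trailing-edge debouncing; not Continue's implementation.
def debounced_requests(keystroke_times_ms: list[int], debounce_delay_ms: int = 150) -> list[int]:
    """Return the times (ms) at which autocomplete requests would fire.

    Each keystroke (re)starts a timer of debounce_delay_ms. A request fires only
    if that timer expires before the next keystroke arrives.
    """
    fire_times = []
    for i, t in enumerate(keystroke_times_ms):
        is_last = i + 1 == len(keystroke_times_ms)
        # The timer survives only if the next keystroke comes at or after its expiry.
        if is_last or keystroke_times_ms[i + 1] - t >= debounce_delay_ms:
            fire_times.append(t + debounce_delay_ms)
    return fire_times


# Four fast keystrokes, a pause, two more, a pause, one more.
keystrokes = [0, 80, 160, 240, 600, 680, 1200]
print(debounced_requests(keystrokes, 150))  # [390, 830, 1350]
print(debounced_requests(keystrokes, 300))  # [540, 980, 1500]
```

Seven keystrokes produce three requests instead of seven. In this trace, raising the delay to 300 ms (as in the exercise) does not change the number of requests. It does push each request 150 ms later, and that delay is the latency cost of a longer debounce.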
Prompt engineering for autocomplete uses a compressed context format:

```
Current file context:
[recent imports and definitions]
[function/class being edited]
[...]
[the partial line being typed]
Suggested continuation:
```

The model receives only the most relevant context, which keeps the token count low and inference fast.

Common autocomplete failures:

1. **Empty suggestions**: The model isn't receiving proper context or isn't responding. Check API connectivity and model availability.
2. **Irrelevant completions**: The prefix/suffix context includes too much noise. Reduce the context line counts.
3. **Truncated completions**: The `maxTokens` limit is too low. Increase it to 300-500 for longer completions.
4. **Slow suggestions**: The model is too large for your hardware. Consider quantizing it or switching to a smaller model.

Quality evaluation involves tracking your suggestion acceptance rate (accepted suggestions divided by suggestions shown). If you accept fewer than 20% of suggestions, either the model isn't matching your coding style or autocomplete is triggering too aggressively. If you accept 60% or more but productivity feels unchanged, the model may be too verbose and generating too much boilerplate.
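The prefix/suffix budgets and the compressed prompt format fit together as follows. This is a minimal sketch, assuming the prefix is the window of lines ending at the cursor and that the plain-text template above is filled from that prefix. `build_context` and `build_prompt` are hypothetical helpers, not part of Continue. FIM-trained models typically receive the prefix and suffix wrapped in model-specific sentinel tokens instead, so the suffix is returned separately here rather than placed in the template.

```python
# Hypothetical helpers illustrating maxPrefixLines / maxSuffixLines; not Continue's API.
def build_context(lines: list[str], cursor_line: int, cursor_col: int,
                  max_prefix_lines: int = 50, max_suffix_lines: int = 10) -> tuple[str, str]:
    """Split a file at the cursor into (prefix, suffix), clipped to the line budgets.

    prefix: up to max_prefix_lines full lines above the cursor, plus the partial current line.
    suffix: the rest of the current line, plus up to max_suffix_lines lines below it.
    """
    current = lines[cursor_line]
    start = max(0, cursor_line - max_prefix_lines)
    prefix_lines = lines[start:cursor_line] + [current[:cursor_col]]
    suffix_lines = [current[cursor_col:]] + lines[cursor_line + 1:cursor_line + 1 + max_suffix_lines]
    return "\n".join(prefix_lines), "\n".join(suffix_lines)


def build_prompt(prefix: str) -> str:
    """Fill the compressed plain-text template from the prefix window."""
    return f"Current file context:\n{prefix}\nSuggested continuation:\n"


source = [
    "import os",
    "",
    "def size(path):",
    "    return os.path.",
    "",
    "print(size('.'))",
]
# Cursor sits at the end of line 3 ("    return os.path." is 19 characters long).
prefix, suffix = build_context(source, cursor_line=3, cursor_col=19,
                               max_prefix_lines=2, max_suffix_lines=2)
print(build_prompt(prefix))
```

With a 2-line prefix budget, the `import os` line falls outside the window. The model no longer sees where `os` comes from, which is the relevance loss that a larger `maxPrefixLines` value avoids at the cost of more tokens.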

EXERCISE

Configure autocomplete with a 300 ms debounce (`debounceDelay`), 40 prefix lines (`maxPrefixLines`), and `multiline` enabled. Use the editor for one hour, recording each suggestion you accept and each one you dismiss as irrelevant. Then adjust the parameters based on your acceptance rate.
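If you record the hour as a simple list of outcomes, a few lines of code can tally it against the 20% and 60% rules of thumb from the quality-evaluation paragraph. This sketch assumes a hypothetical log format: one `"accepted"` or `"dismissed"` entry per suggestion. The helper names and the sample log are illustrative, not real measurements.

```python
from collections import Counter
from typing import Optional


def acceptance_rate(events: list[str]) -> Optional[float]:
    """Fraction of logged suggestions that were accepted; None if nothing was logged."""
    counts = Counter(events)
    shown = counts["accepted"] + counts["dismissed"]
    return counts["accepted"] / shown if shown else None


def diagnose(rate: float) -> str:
    """Map an acceptance rate onto the 20% / 60% rules of thumb from this chapter."""
    if rate < 0.20:
        return "low: check model/style fit, or make triggering less aggressive (e.g. raise debounceDelay)"
    if rate >= 0.60:
        return "high: if productivity feels unchanged, look for boilerplate-heavy suggestions"
    return "mid-range: neither rule of thumb applies"


# Hypothetical one-hour log (made-up counts for illustration).
log = ["accepted"] * 9 + ["dismissed"] * 51
rate = acceptance_rate(log)
print(f"{rate:.0%} accepted -> {diagnose(rate)}")
```

Keeping the counting separate from the diagnosis lets you re-run `diagnose` after each parameter change without changing how you log.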