04. Fill-in-Middle Models

Chapter 4 of 18 · 20 min

KEY INSIGHT

Fill-in-middle (FIM) training enables models to predict code at arbitrary cursor positions by conditioning on both the prefix and the suffix, which matches how developers actually write code.

Traditional language models predict sequences from left to right: given the code before the cursor, they predict what comes next. This works for end-of-line completion but falls short when you want to insert code in the middle of an existing block, complete a method body, or fill a gap between existing statements, because the model never sees the code after the cursor.

FIM-trained models receive three components: a prefix (code before the cursor), a suffix (code after the cursor), and a middle placeholder that marks where generation begins. The model learns to predict what belongs in the middle position given both surrounding contexts.

FIM training does not require a separate objective or training phase. A portion of the training documents are split into prefix, middle, and suffix spans, rearranged so that the middle comes last, and marked with sentinel tokens. The model is then trained with standard next-token prediction on this mix of rearranged and ordinary left-to-right documents, and in the process learns to reconstruct the missing span while respecting the boundaries of the surrounding context.

Inference requires a model-specific prompt format. The `codegen` library and some model servers support FIM via special tokens:

```
<｜prefix｜>function calculateMetrics(data) {
  const results = [];
<｜suffix｜>
  return results;
}
<｜middle｜>
```

The model generates the `middle` section: the code that fits between the prefix and the suffix.

Continue's `useLegacyFill` option enables a simpler format used by older FIM models:

```json
{
  "tabAutocompleteModel": {
    "model": "bigcode/starcoder",
    "useLegacyFill": true
  }
}
```

The legacy format wraps the prefix and suffix in its own pair of sentinel tokens, followed by the middle generation.

Not all models support FIM.
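The prefix/suffix/middle prompt layout described above can be sketched in a few lines of Python. This is a minimal illustration, not any library's API: `FimTokens` and `build_fim_prompt` are hypothetical names, and the default sentinel strings are the StarCoder-style `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` tokens. Other models use different strings, so check the model card before reusing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    """Sentinel strings (assumption: StarCoder-style; they vary by model)."""
    prefix: str = "<fim_prefix>"
    suffix: str = "<fim_suffix>"
    middle: str = "<fim_middle>"


def build_fim_prompt(source: str, cursor: int, tokens: FimTokens = FimTokens()) -> str:
    """Split `source` at `cursor` and lay it out as prefix, suffix, middle."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor must lie within the source text")
    before, after = source[:cursor], source[cursor:]
    # The prompt ends with the middle sentinel so generation starts at the gap.
    return f"{tokens.prefix}{before}{tokens.suffix}{after}{tokens.middle}"


source = (
    "function calculateMetrics(data) {\n"
    "  const results = [];\n"
    "\n"
    "  return results;\n"
    "}\n"
)
# Place the cursor on the empty line between the declaration and the return.
cursor = source.index("\n  return")
prompt = build_fim_prompt(source, cursor)
print(prompt)
```

Whatever the model generates after the final sentinel is the text to insert at the cursor; generation is normally stopped at the model's end-of-text token.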
Models trained specifically for code completion include:

| Model | Parameters | Context | FIM Support |
|-------|-----------|---------|-------------|
| StarCoder2 | 3B, 7B, 15B | 16K | Native |
| Codestral | 22B | 32K | Native |
| DeepSeek-Coder | 6.7B, 33B | 16K | Native |
| Qwen2.5-Coder | 7B, 32B | 128K | Native |

Codestral performs particularly well on FIM tasks because its training emphasized code completion data. It handles both single-line and multi-line completions with natural code flow.

Context preparation for FIM matters significantly. The prefix typically includes 20-50 lines before the cursor, while the suffix captures the closing scope (function end, class brace, import block). Too little suffix context causes the model to generate incomplete code. Too much prefix reduces the tokens available for the actual completion.

FIM performance degrades when the model confuses the boundaries of the surrounding context. This commonly occurs with large comment blocks, string literals containing code-like syntax, and deeply nested indentation.

Strategies to improve FIM quality:

1. Keep suffix context minimal, just enough to capture scope closure
2. Exclude large comment blocks from the prefix
3. Use language-specific truncation to preserve structural context
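The line budgets and the "just enough suffix" rule can be sketched as follows. This is an illustrative heuristic under stated assumptions, not how any particular editor plugin works: `prepare_context` and `indent_width` are hypothetical helpers, scope closure is approximated by the first non-blank line that dedents below the cursor's indentation, and comment filtering is left out.

```python
def indent_width(line: str) -> int:
    """Count leading whitespace characters (a tab counts as one)."""
    return len(line) - len(line.lstrip())


def prepare_context(lines, cursor_line, cursor_indent,
                    max_prefix_lines=50, max_suffix_lines=10):
    """Return (prefix_lines, suffix_lines) around an insertion point.

    New code would be inserted before `lines[cursor_line]`, at an
    indentation of `cursor_indent` characters.
    """
    # Prefix: at most `max_prefix_lines` lines directly before the cursor.
    start = max(0, cursor_line - max_prefix_lines)
    prefix = lines[start:cursor_line]

    # Suffix: keep lines until the enclosing scope closes, within the cap.
    suffix = []
    for line in lines[cursor_line:cursor_line + max_suffix_lines]:
        suffix.append(line)
        # A non-blank line that dedents below the cursor closes the scope.
        if line.strip() and indent_width(line) < cursor_indent:
            break
    return prefix, suffix


code = [
    "class Report:",
    "    def total(self, rows):",
    "        results = []",
    "        return results",
    "",
    "    def render(self):",
    "        return str(self)",
]
# Cursor sits inside total(), just before its return statement.
prefix, suffix = prepare_context(code, cursor_line=3, cursor_indent=8)
print("\n".join(prefix))
print("-- cursor --")
print("\n".join(suffix))
```

Here the suffix stops at the `def render` line, the first line that shows the method body has ended. A production implementation would more likely use a language-aware parser than indentation alone, which is what the third strategy points toward.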

EXERCISE

Write a Python function with a missing body, position your cursor inside it, and observe how the FIM model uses both the prefix (the function signature) and the suffix (the indented `return` statement). Then temporarily switch your autocomplete model to a left-to-right model and compare the output.