RUNLOCALAI · v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI for Code Generation

04. Fill-in-Middle Models

Chapter 4 of 18 · 20 min
KEY INSIGHT

Fill-in-middle (FIM) training enables models to predict code at arbitrary cursor positions by conditioning on both the prefix and the suffix, which matches how developers actually write code.

Traditional language models predict sequences from left to right: given the code before the cursor, they predict what comes next. This works for line completion but fails when you want to insert code in the middle of an existing block, complete a method body, or fill a gap between existing pieces of code.

FIM-trained models receive three components: a prefix (the code before the cursor), a suffix (the code after the cursor), and a middle placeholder that marks where the missing code goes. The model learns to predict what belongs in the middle position given both surrounding contexts.

FIM training mixes two kinds of examples: infill span prediction and standard next-token prediction. For a fraction of training documents, a span is cut out and moved to the end, so the model learns to reconstruct the masked region while respecting the boundaries of the surrounding context; the remaining documents keep their ordinary left-to-right order. Both kinds are trained with the same next-token loss.

Inference requires a specialized prompt format. The `codegen` library and some model servers support FIM via special tokens. Token names differ between model families (StarCoder, for example, uses `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>`); the generic form looks like this:

```
<|prefix|>function calculateMetrics(data) {
  const results = [];
<|suffix|>
  return results;
}
<|middle|>
```

The model generates the `middle` section: the code that fits between the prefix and the suffix.

Continue's `useLegacyFill` option enables a simpler format used by older FIM models:

```json
{
  "tabAutocompleteModel": {
    "model": "bigcode/starcoder",
    "useLegacyFill": true
  }
}
```

The legacy format wraps the prefix and suffix in model-specific sentinel tokens, followed by the middle generation. Not all models support FIM.
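The prefix, suffix, and middle layout can be sketched from both the training side and the inference side. In this minimal Python sketch, `build_fim_prompt` assembles an inference prompt that stops at the middle marker, and `to_fim_example` turns a training document into the same layout by cutting out a random span and appending it after the marker; with probability `1 - fim_rate` the document stays in plain left-to-right order. The sentinel strings are the generic placeholders from the example above, not any real model's vocabulary, and the function names and the 0.5 default rate are illustrative assumptions.

```python
import random

# Generic placeholder sentinels matching the example above (assumption).
# Real models define their own, e.g. StarCoder: <fim_prefix>, <fim_suffix>, <fim_middle>.
PREFIX_TOK = "<|prefix|>"
SUFFIX_TOK = "<|suffix|>"
MIDDLE_TOK = "<|middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Inference prompt: the model continues this string with the missing middle."""
    return f"{PREFIX_TOK}{prefix}{SUFFIX_TOK}{suffix}{MIDDLE_TOK}"


def to_fim_example(document: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Training example (hypothetical helper).

    With probability `fim_rate`, cut a random non-empty span out of the document
    and move it after the middle sentinel; otherwise keep plain left-to-right text.
    Either way the result is trained with ordinary next-token prediction.
    """
    if len(document) < 2 or rng.random() >= fim_rate:
        return document
    # Two distinct cut points; the text between them becomes the middle span.
    start, end = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:start], document[start:end], document[end:]
    return build_fim_prompt(prefix, suffix) + middle


if __name__ == "__main__":
    print(build_fim_prompt(
        "function calculateMetrics(data) {\n  const results = [];\n",
        "  return results;\n}\n",
    ))
```

Because the middle span is moved to the end, an ordinary next-token loss on the rearranged string teaches infilling without any change to the model architecture, and the inference prompt is simply a training example truncated at the middle marker.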
Models trained specifically for code completion include:

| Model | Parameters | Context | FIM support |
|-------|-----------|---------|-------------|
| StarCoder2 | 3B, 7B, 15B | 16K | Native |
| Codestral | 22B | 32K | Native |
| DeepSeek-Coder | 6.7B, 33B | 16K | Native |
| Qwen2.5-Coder | 7B, 32B | 128K | Native (base models) |

Codestral is a strong fit for FIM tasks: Mistral built it with fill-in-the-middle support and trained it on a large body of code. It handles both single-line and multi-line completions with natural code flow.

Context preparation for FIM matters significantly. The prefix typically includes 20 to 50 lines before the cursor, while the suffix captures the closing scope (function end, class brace, import block). Too little suffix context causes the model to generate incomplete code. Too much prefix uses up context-window tokens that would otherwise be available for the completion itself.

FIM performance degrades when the model confuses the boundaries of the surrounding context. This commonly occurs with large comment blocks, string literals containing code-like syntax, and deeply nested indentation.

Strategies to improve FIM quality:

1. Keep the suffix context minimal: just enough to capture the scope closure.
2. Exclude large comment blocks from the prefix.
3. Use language-specific truncation to preserve structural context.
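The three strategies can be combined into a small context-preparation step. The following is a minimal sketch for Python source only, assuming the cursor sits at the start of a 0-based line `cursor_line`. It keeps at most `max_prefix_lines` lines of prefix, drops every comment-only line from the prefix (a blunter rule than dropping only large comment blocks), and cuts the suffix where the enclosing block closes. `prepare_fim_context` and its heuristics are hypothetical, not taken from any editor plugin; finding the enclosing block by a trailing `:` is a rough stand-in for real parsing.

```python
from typing import Tuple


def indent_of(line: str) -> int:
    """Count leading whitespace characters (a tab counts as one)."""
    return len(line) - len(line.lstrip())


def prepare_fim_context(source: str, cursor_line: int,
                        max_prefix_lines: int = 50) -> Tuple[str, str]:
    """Split Python source into a FIM (prefix, suffix) pair (hypothetical heuristics)."""
    lines = source.splitlines(keepends=True)
    before, after = lines[:cursor_line], lines[cursor_line:]

    # Strategy 2: drop comment-only lines from the prefix, then keep the last N lines.
    prefix_lines = [ln for ln in before if not ln.lstrip().startswith("#")]
    prefix_lines = prefix_lines[max(0, len(prefix_lines) - max_prefix_lines):]

    # Strategy 3: approximate the enclosing block header as the nearest earlier
    # non-comment line ending in ':' (e.g. `def ...:` or `if ...:`).
    scope_indent = 0
    for ln in reversed(before):
        stripped = ln.strip()
        if stripped.endswith(":") and not stripped.startswith("#"):
            scope_indent = indent_of(ln)
            break

    # Strategy 1: keep the suffix only until the block closes, i.e. until a
    # non-blank line is indented no deeper than the header.
    suffix_lines = []
    for ln in after:
        if ln.strip() and indent_of(ln) <= scope_indent:
            break
        suffix_lines.append(ln)
    # Trailing blank lines carry no structure; trim them.
    while suffix_lines and not suffix_lines[-1].strip():
        suffix_lines.pop()

    return "".join(prefix_lines), "".join(suffix_lines)
```

For a file where a function `mean` is followed by another top-level `def`, a cursor on a blank line inside `mean` gets the rest of `mean` (including its `return`) as the suffix and nothing from the next function.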

EXERCISE

Write a Python function with a missing body, place your cursor inside it, and observe how the FIM model uses both the prefix (the function signature) and the suffix (the indented `return` statement). Then temporarily switch your autocomplete model to a left-to-right model and compare the output.
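If you prefer to script the comparison instead of switching editor settings, you can send the same prefix to a local model server twice: once with a suffix and once without. This sketch assumes an Ollama server on its default port (`localhost:11434`), whose `/api/generate` endpoint accepts an optional `suffix` field for models whose prompt template supports FIM, and a `raw` flag that skips templating. `build_request` and `complete` are hypothetical helper names, and the model tag in the trailing comment is an assumption; pass a different tag to the second call if you want a genuinely different left-to-right model.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prefix: str, suffix: Optional[str] = None) -> dict:
    """Request body for Ollama's /api/generate (hypothetical helper).

    With `suffix`, the server formats a FIM prompt (if the model's template
    supports it). Without it, `raw` skips the template so the model sees only
    the bare prefix and continues it left to right.
    """
    body = {
        "model": model,
        "prompt": prefix,
        "stream": False,
        # Deterministic, short completions make the two outputs easier to compare.
        "options": {"temperature": 0, "num_predict": 128},
    }
    if suffix is not None:
        body["suffix"] = suffix
    else:
        body["raw"] = True
    return body


def complete(body: dict, url: str = OLLAMA_URL) -> str:
    """POST the request and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running server and a FIM-capable model; the tag is an assumption):
#   prefix = "def normalize(values):\n"
#   suffix = "    return scaled\n"
#   print(complete(build_request("qwen2.5-coder:7b", prefix, suffix)))  # FIM
#   print(complete(build_request("qwen2.5-coder:7b", prefix)))          # prefix only
```

Keeping temperature at 0 removes sampling noise, so differences between the two outputs come from the presence or absence of the suffix rather than from randomness.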
