COURSE · BLD · I007

Local AI for Code Generation

Learn local AI for code generation through RunLocalAI's practical lens: code generation, Continue.dev, FIM and autocomplete, hardware fit, runtime settings, verification habits and local-vs-cloud tradeoffs.

18 chapters · 10h · Builder track · By Fredoline Eruo
PREREQUISITES
  • B002
  • B003

Why this course matters

Local AI for Code Generation is for builders turning local models into working tools, agents and retrieval systems. It connects code generation, Continue.dev, FIM, autocomplete and review to the questions RunLocalAI wants every reader to answer before they install, upgrade or scale a model: will it run, what will it cost in memory, what setting changes the result, and how do you verify the answer instead of trusting a demo?

What you will be able to do

By the end, you should be able to explain the main tradeoffs in plain language, choose a safe next experiment, and use the chapter exercises as a repeatable operator checklist. The course favors local evidence, hardware fit, context limits, latency and failure modes over generic AI vocabulary.

How to use this course

Start at chapter one if the topic is new. If you already have a working stack, scan for chapters such as Code Generation Landscape, Continue.dev Installation, Model Configuration and Fill-in-Middle Models and use those lessons as a quality-control pass before changing a workstation, team workflow or production-like local deployment.

CHAPTERS
  1. 01 Code Generation Landscape

Local code generation replaces cloud API dependencies with models running on your own hardware, giving you full control over data, latency, and costs.

The code generation ecosystem splits into two distinct approaches: streaming autocomplete and conversational chat. Autocomplete systems predict the next tokens as you type, requiring predictions within 50-100ms to feel responsive. Chat systems generate responses on demand, tolerating higher latency since users actively wait for results.

Autocomplete implementations typically use fill-in-the-middle (FIM) architectures trained to predict code between prefix and suffix. These models process the visible cursor context and suggest what belongs at that position. Starcoder, Codestral, and Deepseek-Coder families support FIM inference. Autocomplete models are generally smaller (7-15B parameters) because speed matters more than breadth.

Chat-based code assistants handle more complex tasks: explaining code, generating new functions, refactoring, and debugging. These systems work with full file context, project-wide knowledge, and multi-turn conversations. Models like CodeQwen, WizardCoder, and Deepseek-Coder-instruct excel at conversational coding tasks but require more compute for acceptable generation speeds.

Context management differs significantly between the two approaches. Autocomplete typically receives only the current file and cursor position, usually 4K-8K tokens of context. Chat systems can incorporate entire repositories, documentation, and conversation history, which demands models that support 32K-128K token contexts. This distinction drives hardware requirements: autocomplete needs fast token generation (50+ tokens/second), while chat prioritizes context window size.

Latency targets vary by use case. Research shows developers tolerate autocomplete delays up to 150ms before perceiving sluggishness. Chat responses of 10-20 tokens/second feel responsive for most tasks, though complex refactoring may justify waiting longer.

Local hardware determines which model classes are viable:

| Use Case | Model Size | Latency Target | VRAM |
|----------|-----------|----------------|------|
| Autocomplete | 7-15B | <100ms/token | 8-16GB |
| Chat | 7-33B | 10-30 tok/sec | 16-48GB |

Security and privacy considerations drive many teams to local deployment. Code processed through external APIs leaves your infrastructure, potentially violating compliance requirements. Local models keep proprietary code, internal APIs, and architecture patterns within your network boundary. The tradeoff is that you manage model updates and infrastructure maintenance yourself, and you have to confirm that quality matches cloud alternatives.

This course uses Continue as the primary interface because it provides both autocomplete and chat capabilities through a unified plugin architecture. Alternative tools exist (Cursor, Cline, Roo Code), but Continue's open-source nature and configuration flexibility make it ideal for learning the underlying concepts that transfer across tools.

15 min
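The VRAM and latency figures in the table above can be sanity-checked with simple arithmetic. The sketch below is a rough estimate only: it counts memory for model weights and assumes nothing about KV cache, context length, or runtime overhead, all of which add to the real requirement. The function names are illustrative and not part of any tool mentioned in this course.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: ignores KV cache, activations, and runtime overhead.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def ms_per_token(tokens_per_second: float) -> float:
    """Convert a generation speed into per-token latency in milliseconds."""
    return 1000.0 / tokens_per_second


if __name__ == "__main__":
    for size in (7, 15, 33):
        fp16 = estimate_weight_memory_gb(size, 16)
        q4 = estimate_weight_memory_gb(size, 4)
        print(f"{size}B: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
    print(f"50 tok/s is {ms_per_token(50):.0f} ms per token")
```

Running the numbers this way before downloading a model gives a quick first answer to "will it run", which you then confirm by measuring on your own hardware.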
  2. 02 Continue.dev Installation

Continue is a VS Code and JetBrains extension that brings local AI coding capabilities directly into your development environment with a declarative configuration file.

Continue requires two components: the editor extension and a local model server. The extension handles UI, context management, and communication with the model backend. The backend runs the actual inference, typically through LM Studio, Ollama, or a custom API endpoint.

Install the VS Code extension by opening VS Code, pressing `Ctrl+Shift+X` (or `Cmd+Shift+X` on macOS), and searching for "Continue". Click Install on the Continue extension by Continue Dev. The extension appears in the sidebar after installation. Alternatively, for JetBrains IDEs, open Settings > Plugins, search for "Continue", and install the JetBrains-compatible version.

After installation, create the configuration file at `~/.continue/config.json`. This file controls model selection, context providers, and UI behavior.

```json
{
  "models": [
    {
      "title": "Codestral",
      "provider": "openai",
      "model": "codestral",
      "api_base": "http://localhost:1234/v1"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder",
    "provider": "openai",
    "model": "bigcode/starcoder",
    "api_base": "http://localhost:1234/v1"
  }
}
```

The `api_base` points to your local server. The configuration above uses port 1234, which is LM Studio's default. If using Ollama, the base would be `http://localhost:11434/v1`.

Test the installation by opening any Python or JavaScript file in VS Code. Type a function signature like:

```python
def calculate_fibonacci(n: int) -> list[int]:
```

After a brief pause, you should see autocomplete suggestions appearing. If suggestions don't appear, check the Continue status bar at the bottom of VS Code, which shows the current model and connection status. Red indicators mean the backend isn't reachable.

Common installation failures include the backend not running, firewall blocking localhost connections, and incorrect `api_base` URLs. Verify the backend with:

```bash
curl http://localhost:1234/v1/models
```

This should return a JSON list of available models. If you receive a connection refused error, start your model server (covered in Chapter 3).

The sidebar UI provides access to chat, context management, and extension settings. The chat panel sits on the right, context files appear in the main workspace, and a configuration button opens the `config.json` editor. Keyboard shortcuts include `Ctrl+L` to open chat and `Ctrl+K` to invoke autocomplete manually.

20 min
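The `curl` check against `/v1/models` can also be scripted, which is handy when you want the same check in a setup script. This is a minimal sketch using only the Python standard library; it assumes the server returns the OpenAI-style `{"data": [{"id": ...}]}` shape, and the function names are illustrative.

```python
import json
import urllib.error
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model identifiers out of an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_backend_models(api_base: str, timeout: float = 3.0) -> list[str]:
    """Return the model ids the server advertises, or raise ConnectionError."""
    url = api_base.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return extract_model_ids(json.load(response))
    except (urllib.error.URLError, OSError) as exc:
        # Covers "connection refused" when no server is listening on the port.
        raise ConnectionError(f"Backend not reachable at {url}: {exc}") from exc
```

Calling `list_backend_models("http://localhost:1234/v1")` should return a list of identifiers when the server is up, and raise `ConnectionError` with the underlying cause when it is not.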
  3. 03 Model Configuration

Configuring models in Continue involves matching the provider interface to your backend's API format and ensuring the model identifier matches what your server advertises.

Continue supports multiple backend providers: OpenAI-compatible API, Anthropic, Google, local providers, and custom endpoints. Most local model servers (LM Studio, Ollama, llama.cpp) expose OpenAI-compatible APIs, making the `openai` provider the most common choice.

The provider configuration requires three parameters: the provider name, the model identifier, and the API base URL. The model identifier must exactly match what your server reports when you query `/v1/models`.

```bash
# Query LM Studio for available models
curl http://localhost:1234/v1/models | python3 -m json.tool
```

The response looks like:

```json
{
  "object": "list",
  "data": [
    {
      "id": "codestral-22b-instruct",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local"
    }
  ]
}
```

Use `codestral-22b-instruct` as your model name in the Continue config.

Ollama exposes models differently. Run:

```bash
ollama list
```

You'll see model names like `codestral:latest` or `deepseek-coder:33b`. Configure these as:

```json
{
  "models": [{
    "title": "Deepseek Coder",
    "provider": "openai",
    "model": "deepseek-coder:33b",
    "api_base": "http://localhost:11434/v1"
  }]
}
```

Environment variables handle API keys for cloud providers. For local models, use `"api_key": "local"` or omit the field entirely. Some backends require specific authentication headers. In those cases, use the `request_options` configuration:

```json
{
  "models": [{
    "title": "Codestral",
    "provider": "openai",
    "model": "codestral",
    "api_base": "http://localhost:1234/v1",
    "request_options": {
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }]
}
```

Model-specific settings like temperature, top_p, and max tokens apply per-model:

```json
{
  "models": [{
    "title": "Code Assistant",
    "provider": "openai",
    "model": "codestral-22b-instruct",
    "api_base": "http://localhost:1234/v1",
    "temperature": 0.3,
    "max_tokens": 4096
  }]
}
```

Lower temperature (0.2-0.4) produces more deterministic code completion. Higher temperature (0.6-0.8) works better for creative tasks like suggesting alternative implementations.

For dual-model setups (separate autocomplete and chat models), configure both in the same file:

```json
{
  "models": [
    {
      "title": "Deepseek Coder Chat",
      "provider": "openai",
      "model": "deepseek-coder:33b",
      "api_base": "http://localhost:11434/v1"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder FIM",
    "provider": "openai",
    "model": "bigcode/starcoder",
    "api_base": "http://localhost:1234/v1",
    "useLegacyFill": true
  }
}
```

The `useLegacyFill` option enables the older fill-in-middle format that some models were trained with.

20 min
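Because the model identifier has to match what the server advertises, a small pre-flight check can catch typos before you open the editor. The sketch below reads the field names exactly as this chapter's examples write them (`models`, `tabAutocompleteModel`, `model`) and, as a simplification, assumes every configured model is served by one backend. The helper names are hypothetical.

```python
def configured_model_names(config: dict) -> list[str]:
    """Collect every model identifier referenced in the config."""
    names = [m["model"] for m in config.get("models", []) if "model" in m]
    autocomplete = config.get("tabAutocompleteModel")
    if autocomplete and "model" in autocomplete:
        names.append(autocomplete["model"])
    return names


def missing_models(config: dict, advertised_ids: list[str]) -> list[str]:
    """Return configured identifiers that the server does not advertise."""
    advertised = set(advertised_ids)
    return [name for name in configured_model_names(config) if name not in advertised]


if __name__ == "__main__":
    config = {
        "models": [{"title": "Codestral", "provider": "openai", "model": "codestral"}],
        "tabAutocompleteModel": {"title": "Starcoder", "model": "bigcode/starcoder"},
    }
    # Identifiers as they might come back from /v1/models (example values).
    advertised = ["codestral-22b-instruct", "bigcode/starcoder"]
    print("Not advertised by the server:", missing_models(config, advertised))
```

In this example `codestral` is flagged because the server reports `codestral-22b-instruct`, which is the kind of near-miss that otherwise shows up only as silent empty completions.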
  4. 04 Fill-in-Middle Models

Fill-in-middle (FIM) training enables models to predict code at arbitrary cursor positions by conditioning on both prefix and suffix, which matches how developers actually write code.

Traditional language models predict sequences from left to right. Given the code before the cursor, they predict what comes next. This works for line completion but fails when you want to insert code in the middle of an existing block, complete method bodies, or fill in gaps between existing code.

FIM-trained models receive three components: a prefix (code before cursor), a middle placeholder, and a suffix (code after cursor). The model learns to predict what belongs in the middle position given both surrounding contexts.

The training objective uses two distinct phases: infill span prediction and standard next-token prediction. During FIM training, the model learns to reconstruct masked regions while respecting surrounding context boundaries.

Inference requires specialized prompting formats. The `codegen` library and some model servers support FIM via special tokens:

```
<|prefix|>function calculateMetrics(data) {
  const results = [];
<|suffix|>
  return results;
}
<|middle|>
```

The model generates the `middle` section: the code that fits between prefix and suffix.

Continue's `useLegacyFill` option enables a simpler format used by older FIM models:

```json
{
  "tabAutocompleteModel": {
    "model": "bigcode/starcoder",
    "useLegacyFill": true
  }
}
```

The legacy format wraps the prefix and suffix in the model's own sentinel tokens, followed by the middle generation.

Not all models support FIM. Models trained specifically for code completion include:

| Model | Parameters | Context | FIM Support |
|-------|-----------|---------|-------------|
| Starcoder 2 | 3B, 7B, 15B | 8K | Native |
| Codestral | 22B | 32K | Native |
| Deepseek-Coder | 6.7B, 33B | 16K | Native |
| Qwen2.5-Coder | 7B, 32B | 128K | Via instruction tuning |

Codestral specifically excels at FIM tasks due to its training on extensive code completion data. The model handles both single-line and multi-line completions with natural code flow.

Context preparation for FIM matters significantly. The prefix typically includes 20-50 lines before the cursor, while the suffix captures the closing scope (function end, class brace, import block). Too little suffix context causes the model to generate incomplete code. Too much prefix reduces the tokens available for actual completion.

FIM performance degrades when the model confuses the surrounding context boundaries. This commonly occurs with large comment blocks, string literals containing code-like syntax, and deeply nested indentation.

Strategies to improve FIM quality:

1. Keep suffix context minimal, just enough to capture scope closure
2. Exclude large comment blocks from the prefix
3. Use language-specific truncation to preserve structural context

20 min
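To make the prefix/suffix/middle layout concrete, here is a minimal sketch that splits a buffer at the cursor and assembles a FIM prompt. The default sentinel strings are the ones shown in this chapter's example; real models each define their own sentinel tokens, so treat these defaults as placeholders to be replaced with the values from your model's documentation. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    """Sentinel strings for one model family (placeholder defaults)."""
    prefix: str = "<|prefix|>"
    suffix: str = "<|suffix|>"
    middle: str = "<|middle|>"


def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split the buffer into (text before cursor, text after cursor)."""
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix: str, suffix: str, tokens: FimTokens = FimTokens()) -> str:
    """Lay out prefix, then suffix, then the marker where generation begins."""
    return f"{tokens.prefix}{prefix}{tokens.suffix}{suffix}{tokens.middle}"


if __name__ == "__main__":
    source = (
        "function calculateMetrics(data) {\n"
        "  const results = [];\n"
        "  return results;\n"
        "}\n"
    )
    cursor = source.index("  return results;")  # cursor sits before the return
    before, after = split_at_cursor(source, cursor)
    print(build_fim_prompt(before, after))
```

Everything the model generates after the final marker is the candidate text to insert at the cursor.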
  5. 05 Autocomplete Setup

Autocomplete tuning involves balancing suggestion frequency, latency, and relevance through configuration parameters and model selection.

Effective autocomplete requires tuning three dimensions: trigger behavior, suggestion quality, and response speed. The goal is suggestions that feel anticipatory without being intrusive.

Trigger behavior controls when suggestions appear. Continue offers several modes:

```json
{
  "autocomplete": {
    "disabled": false,
    "triggerMode": "automatic",
    "maxPrefixLines": 50,
    "maxSuffixLines": 10
  }
}
```

`triggerMode` accepts `automatic` (trigger on keystroke), `manual` (trigger on Ctrl+K), or `always` (continuous streaming). Streaming mode shows partial completions as they're generated, which is useful for seeing long completions but potentially distracting.

The `maxPrefixLines` and `maxSuffixLines` settings control how much context gets sent to the model. Higher values improve suggestion relevance but increase latency and token usage. For typical Python files, 50 prefix lines captures most function and class context. The suffix should include enough lines to capture the current block's closing but not so much that it confuses the model.

Debounce settings prevent autocomplete from triggering on every keystroke:

```json
{
  "autocomplete": {
    "debounceDelay": 150,
    "quickShortcuts": ["Tab"],
    "multiline": true,
    "maxTokens": 300
  }
}
```

A `debounceDelay` of 150ms prevents rapid-fire requests during fast typing. The `quickShortcuts` array lets you immediately accept a suggestion with Tab. `multiline` enables multi-line completions, which many developers prefer for generating complete function bodies.

Model selection significantly impacts autocomplete quality. Smaller models (3-7B) generate faster but often produce irrelevant suggestions. Larger models (15B+) provide better suggestions but introduce latency. The sweet spot for most hardware is 7-15B models with FIM training.

Prompt engineering for autocomplete uses a compressed context format:

```
Current file context:
[recent imports and definitions]
[function/class being edited]
[...]
[the partial line being typed]
Suggested continuation:
```

The model receives only the most relevant context to keep token count low and inference fast.

Common autocomplete failures:

1. **Empty suggestions**: Model isn't receiving proper context. Check API connectivity and model availability.
2. **Irrelevant completions**: Prefix/suffix context includes too much noise. Reduce context line counts.
3. **Truncated completions**: `maxTokens` limit is too low. Increase to 300-500 for longer completions.
4. **Slow suggestions**: Model too large for hardware. Consider quantizing or using a smaller model.

Quality evaluation involves tracking suggestion acceptance rate. If you're accepting less than 20% of suggestions, either the model isn't matching your coding style or the trigger threshold is too aggressive. If you're accepting 60%+ but productivity feels unchanged, you may be using a verbose model that generates too much boilerplate.

20 min
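The acceptance-rate check described above is easy to turn into a small helper once you have counts of suggestions shown and accepted. How you collect those counts is left open here; the sketch assumes you already have the two numbers, and it uses the 20% and 60% thresholds from the text as defaults. The function names and the returned labels are illustrative.

```python
def acceptance_rate(shown: int, accepted: int) -> float:
    """Fraction of displayed suggestions that were accepted."""
    if shown <= 0:
        raise ValueError("shown must be positive")
    if not 0 <= accepted <= shown:
        raise ValueError("accepted must be between 0 and shown")
    return accepted / shown


def diagnose(rate: float, low: float = 0.20, high: float = 0.60) -> str:
    """Map a rate onto the rough guidance from the text."""
    if rate < low:
        return "low: check model fit or make triggering less aggressive"
    if rate >= high:
        return "high: check whether accepted output is mostly boilerplate"
    return "ok: no obvious tuning signal"


if __name__ == "__main__":
    rate = acceptance_rate(shown=200, accepted=30)
    print(f"{rate:.0%} -> {diagnose(rate)}")
```

Tracking the rate per model or per language, rather than as one global number, usually makes the signal easier to act on.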
  6. 06 Chat in Editor

In-editor chat combines natural language interaction with full code context, letting you explain, generate, and refactor code without leaving your workflow.

Continue's chat interface provides conversational AI assistance directly within VS Code or JetBrains. The chat panel appears on the right side and maintains conversation history throughout your session.

Opening chat (`Ctrl+L` or `Cmd+L`) presents an empty input field. Type your question or request naturally:

```
Explain what this function does:

def normalize_vector(v, target_magnitude=1.0):
    magnitude = sqrt(sum(x*x for x in v))
    if magnitude == 0:
        return [0.0] * len(v)
    return [x * (target_magnitude / magnitude) for x in v]
```

The model receives the selected code automatically when you have text selected. Without selection, the model uses the current file's context.

Chat capabilities include:

- **Code explanation**: Ask "What does this do?" or "How does this algorithm work?"
- **Code generation**: "Write a function that parses CSV with quoted fields"
- **Refactoring**: "Convert this to use type hints" or "Extract this into a separate module"
- **Debugging**: "Why is this returning None?" with error context
- **Documentation**: "Add docstrings to all public methods in this class"

Context attachment expands what the model can reason about. Click the paperclip icon or drag files into the chat panel to attach additional files:

```
Looking at auth.py and database.py together, suggest improvements to the
authentication flow to reduce database queries while maintaining security.
```

The model receives the full content of attached files, enabling cross-file analysis.

Multi-turn conversations work like standard chat: each message appends to the history, and the model maintains context from earlier exchanges. This enables iterative refinement:

```
User: Write a function to validate email addresses
Model: [generates email validation function]
User: Make it reject disposable email domains
Model: [modifies function to include disposable domain blocklist]
User: Extract the domain validation into a separate function we can reuse
Model: [refactors code into separate functions]
```

Context limits apply to conversations. Models with 32K+ context windows handle longer conversations better, but you'll eventually exhaust available tokens. When this happens, start a new conversation or use the `/clear` command to reset context.

Streaming responses display incrementally as the model generates tokens. This feels responsive but can be jarring for longer outputs. Disable streaming in settings for more deliberate, complete responses:

```json
{
  "models": [{
    "title": "Deepseek Coder",
    "provider": "openai",
    "model": "deepseek-coder:33b",
    "api_base": "http://localhost:11434/v1",
    "streaming": false
  }]
}
```

Inserting code from chat into the editor requires the `/insert` command or clicking the insertion button. The code appears at your cursor position. For multi-file changes, use the `/apply` command to apply patches to specific files.

20 min
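When a long conversation approaches the context limit, one simple alternative to clearing everything is to keep only the most recent messages that fit a token budget. The sketch below illustrates that idea and is not how Continue manages history internally. The token estimate of roughly four characters per token is a loose assumption; a real tokenizer for your model will give different counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose estimated total fits within max_tokens."""
    kept: list[dict] = []
    used = 0
    # Walk backwards so the most recent exchanges are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Write a function to validate email addresses"},
        {"role": "assistant", "content": "def is_valid_email(address): ..."},
        {"role": "user", "content": "Make it reject disposable email domains"},
    ]
    for message in trim_history(history, max_tokens=20):
        print(message["role"], "->", message["content"])
```

Dropping the oldest turns loses early instructions, so for long sessions it can help to restate key constraints in the latest message.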
  7. 07 Custom Commands

Custom commands extend Continue's capabilities by automating repetitive tasks, defining specialized prompts, and integrating external tools into your coding workflow.

Commands in Continue are reusable prompt templates with optional file targeting and code transformation logic. They appear in the command palette (`Ctrl+Shift+P` > "Continue: Custom Commands") and can be assigned keyboard shortcuts.

Define commands in `config.json` under the `commands` array:

```json
{
  "commands": [
    {
      "name": "Explain Code",
      "description": "Explain selected code with technical depth",
      "prompt": "{{{ selected }}}\n\nExplain this code in technical detail, focusing on:\n1. Algorithm complexity\n2. Edge cases handled\n3. Potential bugs or security issues\n4. Improvement suggestions"
    }
  ]
}
```

Commands receive variables through template syntax:

- `{{{ selected }}}` - currently selected text
- `{{{ clipboard }}}` - system clipboard contents
- `{{{ file }}}` - current file path
- `{{{ directory }}}` - current directory path

The `prompt` field contains the template that gets sent to the model. The selected code or clipboard content gets substituted where you place the variable.

More complex commands use the `run` field for file operations:

```json
{
  "commands": [
    {
      "name": "Add Type Hints",
      "description": "Add type hints to Python functions",
      "prompt": "Add thorough type hints to the following Python code:\n\n{{{ selected }}}",
      "run": "replace"
    }
  ]
}
```

The `run` field accepts `"replace"` (replace selection with model output), `"insert"` (insert at cursor), `"apply"` (patch files with diff output), `"query"` (display in chat without modification), or `"clipboard"` (copy output to the clipboard).

Parameterized commands accept user input before execution:

```json
{
  "commands": [
    {
      "name": "Generate Tests",
      "prompt": "{{{ selected }}}\n\nGenerate pytest tests for the above code. Test coverage should include:\n1. Happy path cases\n2. Edge cases\n3. Error conditions\n4. Edge case: {{ input:edge_case_description }}",
      "run": "insert"
    }
  ]
}
```

The `{{ input:field_name }}` syntax prompts for user input before the command runs. Multiple inputs are supported.

Real-world command examples:

```json
{
  "commands": [
    {
      "name": "Code Review",
      "description": "Perform a security-focused code review",
      "prompt": "Review the following code for security vulnerabilities, focusing on OWASP Top 10 categories:\n\n{{{ selected }}}",
      "run": "query"
    },
    {
      "name": "Write Docstring",
      "description": "Generate Google-style docstring",
      "prompt": "Write a Google-style docstring for:\n```\n{{{ selected }}}\n```",
      "run": "insert"
    },
    {
      "name": "Commit Message",
      "description": "Generate conventional commit message",
      "prompt": "Based on these changes:\n```\n{{{ clipboard }}}\n```\nGenerate a conventional commit message with:\n- Type (feat, fix, docs, refactor, test)\n- Short summary (50 chars max)\n- Detailed description\n\nFormat as:\n<type>: <short summary>\n\n<detailed description>",
      "run": "clipboard"
    }
  ]
}
```

The review command uses `"query"` and the docstring command uses `"insert"` so that neither overwrites the selected code with the model's output. The `run: "clipboard"` value copies output to the clipboard instead of inserting into the editor, which is useful for commit messages, PR descriptions, and documentation.

Assign keyboard shortcuts for frequently used commands:

1. Open VS Code keyboard shortcuts (`Ctrl+K Ctrl+S`)
2. Search for "Continue: Custom Commands"
3. Assign your preferred shortcut (e.g., `Ctrl+Shift+G` for commit message generation)

25 min
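To see what the triple-brace variables amount to, here is a minimal sketch of template substitution for placeholders such as `{{{ selected }}}`. It illustrates the idea only and is not Continue's implementation; the function name and the choice to raise on unknown variables are assumptions made for this example.

```python
import re

# Matches {{{ name }}} with optional whitespace around the variable name.
_TRIPLE_BRACE = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{{ name }}} placeholder with its value from `variables`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Unknown template variable: {name}")
        return variables[name]

    # A function replacement keeps backslashes in the inserted code literal.
    return _TRIPLE_BRACE.sub(substitute, template)


if __name__ == "__main__":
    template = "Add thorough type hints to the following Python code:\n\n{{{ selected }}}"
    print(render_prompt(template, {"selected": "def add(a, b):\n    return a + b"}))
```

Failing loudly on an unknown variable makes a mistyped placeholder visible immediately instead of sending a half-filled prompt to the model.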
  8. 08 Context Providers

Context providers extend Continue's awareness beyond the current file, enabling models to reason about entire repositories, documentation, git history, and external resources.

Context providers define how Continue gathers relevant information to include with user requests. Without context providers, the model sees only the current file and selection. With providers, you can give the model repository-wide awareness.

Configure context providers in `config.json`:

```json
{
  "contextProviders": [
    { "name": "open", "params": {} },
    { "name": "diff", "params": {} },
    { "name": "terminal", "params": {} }
  ]
}
```

The built-in providers include:

- **`open`**: Indexes all files in the workspace for semantic search. The model can find related files by asking questions like "Find the file that handles user authentication" or "Which files use this database model?"
- **`diff`**: Provides the current git changes. When you have uncommitted modifications, the model sees what you're changing and can suggest contextually appropriate completions based on your in-progress work.
- **`terminal`**: Captures terminal output from the last command. Useful for debugging: the error output reaches the chat without manual copying, so you can ask for solutions directly.
- **`google`**: Searches the web for documentation and examples. Useful for integrating external knowledge when local documentation is incomplete.
- **`search`**: Provides the results of ripgrep searches against the workspace. Ask "Where is the login function defined?" and the provider surfaces the relevant code locations.

Additional providers available through Continue's plugin ecosystem:

```json
{
  "contextProviders": [
    {
      "name": "codebase",
      "params": {
        "n": 20,
        "description": "The codebase context provides a brief overview of your codebase and relevant code from your codebase. Use this when you need to understand or reference parts of your codebase."
      }
    },
    {
      "name": "docs",
      "params": {
        "url": "https://docs.example.com"
      }
    }
  ]
}
```

Context providers add tokens to every request, which increases latency and may exceed model context limits in large repositories. Limit provider usage with `n` parameters to control how many results are retrieved.

The `@` syntax references context providers directly in chat. Type `@` followed by the provider name to activate it for the current message:

```
@codebase How is authentication handled across the codebase?
```

The model receives indexed information from the provider, enabling repository-aware responses without manual file attachment.

Workspace indexing builds a searchable representation of your codebase. Indexing happens on first use and updates incrementally as files change. The `codebase` provider uses embeddings to find semantically similar code: when you ask about "database connection pooling," it surfaces files discussing connection pools even if those exact words don't appear.

Debugging context providers:

1. Check provider configuration syntax in `config.json`
2. Verify workspace is indexed by asking simple questions like "What files are in this project?"
3. Monitor token usage in the UI to see how much context providers contribute
4. Disable providers selectively to isolate issues

20 min
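The embedding-based lookup and the `n` limit described above boil down to ranking indexed chunks by similarity to the query and keeping the top few. The sketch below shows that ranking step with cosine similarity over toy three-dimensional vectors. The vectors and file paths are invented for illustration, a real index would get its vectors from an embedding model, and this is not a description of Continue's internal indexer.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_chunks(query_vec: list[float], index: dict[str, list[float]], n: int) -> list[str]:
    """Return the n chunk names whose vectors are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (hypothetical values).
    index = {
        "db/pool.py": [0.9, 0.1, 0.0],
        "auth/login.py": [0.0, 1.0, 0.0],
        "README.md": [0.1, 0.1, 0.9],
    }
    query = [1.0, 0.0, 0.0]  # stands in for "database connection pooling"
    print(top_n_chunks(query, index, n=2))
```

A smaller `n` sends fewer chunks to the model, which trades some recall for lower latency and less context pressure.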
  9. 09 Code Review Automation

Local AI can automate code review tasks like bug detection, style checking, and security scanning by applying consistent criteria across every diff without cloud dependencies.

Automated code review with local AI processes pull requests and diffs through configurable prompts that detect specific issue categories. The workflow integrates with git hosting through webhooks or runs manually through Continue's chat interface.

Create a review command in `config.json`:

```json
{
  "commands": [
    {
      "name": "Code Review",
      "description": "Automated security and quality review",
      "prompt": "You are performing a code review. Analyze the following code changes for:\n\n1. **Security**: SQL injection, XSS, authentication bypass, sensitive data exposure\n2. **Correctness**: Logic errors, null handling, race conditions, edge cases\n3. **Performance**: N+1 queries, unnecessary iterations, missing indexes\n4. **Style**: Consistency with project conventions, naming, documentation\n\nFor each issue found, provide:\n- File and line number\n- Severity (critical, major, minor)\n- Description\n- Suggested fix\n\nCode changes:\n```\n{{{ clipboard }}}\n```\n\nOutput a summary table first, then detailed findings.",
      "run": "clipboard"
    }
  ]
}
```

Running this command on a diff:

1. Copy the diff output with `git diff`
2. Execute the Code Review command
3. Receive formatted findings in clipboard
4. Paste into PR comments or review tools

More sophisticated review involves checking against project-specific rules. Create a `.review-rules.md` file in your repository:

```markdown
# Code Review Rules

## Required Checks
- All public functions have type hints
- No `print` statements in production code
- Error handling uses specific exception types
- Database queries use parameterized queries
- Sensitive config values come from environment variables

## Naming Conventions
- Functions: snake_case
- Classes: PascalCase
- Constants: UPPER_SNAKE_CASE
- Tables: plural snake_case

## Security Requirements
- Input validation on all API endpoints
- Authentication required for mutation operations
- Rate limiting on public endpoints
```

Reference these rules in your review prompt:

```json
{
  "commands": [
    {
      "name": "Review Against Rules",
      "prompt": "Review the following changes against our project rules:\n\nRules from .review-rules.md:\n{{{{ file:.review-rules.md }}}}\n\nChanges to review:\n{{{ clipboard }}}\n\nReport any violations with file, line, and suggested fix."
    }
  ]
}
```

The `{{{{ file:path }}}}` syntax embeds file contents into the prompt.

Automated review scaling strategies:

1. **Split large diffs**: Review files in batches to avoid context overflow
2. **Prioritize by risk**: Review security-sensitive modules first
3. **Track recurring issues**: Build a database of patterns to catch in future reviews
4. **CI integration**: Run review on every PR automatically

CI integration example using a shell script:

```bash
#!/bin/bash
# .github/scripts/ai-review.sh
set -euo pipefail

# Capture the staged changes to review
git diff --cached > /tmp/changes.diff

# Build the request body with jq so quotes and newlines in the diff are JSON-escaped
jq -n --rawfile diff /tmp/changes.diff '{
  model: "deepseek-coder:33b",
  messages: [
    {role: "system", content: "You review code for security issues."},
    {role: "user", content: ("Review these changes:\n" + $diff)}
  ]
}' > /tmp/request.json

# Send the request to the local OpenAI-compatible server
curl -sS -X POST http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/request.json > /tmp/review.json

# Extract the review text into the job summary
jq -r '.choices[0].message.content' /tmp/review.json > "$GITHUB_STEP_SUMMARY"
```

This script runs as a GitHub Actions step and writes the review results to the job summary. Splicing the raw diff into a hand-written JSON string breaks as soon as the diff contains quotes or newlines, which is why the request body is built with `jq`. In CI there are usually no staged changes, so you will typically want to diff against the PR's base branch instead of using `--cached`.

25 min
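The "split large diffs" strategy and the request body used in the CI step can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: the diff is standard `git diff` output where each file section starts with a `diff --git` line, and the server accepts the OpenAI-style chat payload shown in this chapter. The function names are illustrative.

```python
import json


def split_diff_by_file(diff_text: str) -> list[str]:
    """Split unified `git diff` output into one chunk per file."""
    chunks: list[str] = []
    current: list[str] = []
    for line in diff_text.splitlines(keepends=True):
        # A new "diff --git" header closes the previous file's chunk.
        if line.startswith("diff --git ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def build_review_request(diff_chunk: str, model: str) -> str:
    """Serialize a chat-completions request body; json.dumps handles escaping."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You review code for security issues."},
            {"role": "user", "content": "Review these changes:\n" + diff_chunk},
        ],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    diff = (
        "diff --git a/app.py b/app.py\n"
        '+print("hello")\n'
        "diff --git a/db.py b/db.py\n"
        "+query = 'SELECT 1'\n"
    )
    for chunk in split_diff_by_file(diff):
        print(build_review_request(chunk, "deepseek-coder:33b"))
```

Sending one request per file keeps each prompt small, at the cost of the model not seeing cross-file interactions in a single review.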
  10. 10 PR Review with AI
AI PR review works best for pattern-based checks on focused diffs, not thorough architectural assessment.
15 min
  11. 11 Debugging Assistance
Debugging with AI works through rapid iteration (provide context, receive hypotheses, verify, repeat) rather than expecting single-shot solutions.
15 min
  12. 12 Error Explanation
Error explanation quality depends directly on context completeness; stack traces alone rarely suffice for accurate diagnosis.
15 min
  13. 13 Repo-Level RAG
Code RAG requires indexing structure and relationships, not just text; the chunking strategy matters more than the embedding model choice.
15 min
  14. 14 Indexing Codebase
Indexing is not a one-time operation; design refresh strategies and quality testing into your system from the start.
15 min
  15. 15 Custom Slash Commands
Slash commands encode team-specific workflows into reusable, documented actions, transforming tribal knowledge into automated process.
15 min
  16. 16 Multi-Model Setup
Multi-model routing optimizes cost-quality-latency tradeoffs by matching task requirements to appropriate model capabilities.
15 min
  17. 17 Performance Optimization
Performance optimization requires measurement before action: profile to identify actual bottlenecks rather than guessing.
15 min
  18. 18 Code Assistant Project
A production code assistant requires ongoing maintenance; design for evolution, not permanence.
15 min