HOW-TO · SUP

How to build a code generation agent with local models

Advanced · 35 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Local model running (Ollama), Python environment

What this does

A code generation agent built on local models writes and edits code fully offline, using open-weight models that run on local hardware. The agent accepts natural language descriptions of desired code changes, reads the existing codebase, generates new code or modifications, applies diff-based edits, and verifies the result by running tests. The entire pipeline operates without cloud API calls, so proprietary code stays on-premises.

Steps

  • Configure the model. Make sure the code model is pulled (ollama pull codellama:13b), then test it: ollama run codellama:13b "Write a Python function that calculates factorial."

  • Build the agent loop using the Ollama Python client: import ollama.

  • Define the system prompt for code generation with strict output formatting: system = "You are a code generation assistant. Output ONLY valid code in the requested language. Do not include explanations. Use the following format:\n```\n// code here\n```".

  • Implement the generate_code function with the signature def generate_code(spec: str, language: str, existing_code: str = "") -> str. It calls ollama.chat(model="codellama:13b", messages=[{"role": "system", "content": system}, {"role": "user", "content": f"Language: {language}\nSpecification: {spec}\n\nExisting code context:\n{existing_code}\n\nGenerate the code:"}]) and returns extract_code_block(response["message"]["content"]). Passing language into the prompt keeps the parameter from going unused.

  • Implement the extract_code_block function, which parses the model output using a regex: re.search(r"```(\w+)?\n(.*?)\n```", text, re.DOTALL). The second capture group holds the code. Handle the case where the output contains no fenced block.

  • Add a file editing capability: def apply_edit(filepath: str, old_code: str, new_code: str). The function reads the file, raises ValueError("Old code not found (possible hallucination)") if old_code is not in the content, replaces old_code with new_code, and writes the file back.

  • Implement a validation step that runs tests after code generation: subprocess.run(["python", "-m", "pytest", "-x", "--tb=short"], capture_output=True, text=True). If tests fail, feed the error output back to the model for correction in a retry loop with a maximum of 3 attempts.

  • Add a review step that presents the diff to the user before applying: use difflib.unified_diff to generate a human-readable diff.

  • Wrap everything in a CodeAgent class with a run(spec, target_files) method that orchestrates generation, diff display, approval, application, and testing.
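The parsing, diff, and edit steps above can be sketched without a model in the loop. This is a minimal sketch, not a definitive implementation: preview_edit is a hypothetical helper name that does not appear in the guide, and replacing only the first occurrence of old_code is an assumption made here to keep edits narrow.

```python
import difflib
import re
from pathlib import Path

# Three backticks, built this way so the example nests cleanly in a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(\w+)?\n(.*?)\n" + FENCE, re.DOTALL)


def extract_code_block(text: str) -> str:
    """Return the body of the first fenced block, or raise if none is found."""
    match = FENCE_RE.search(text)
    if match is None:
        raise ValueError("No fenced code block in model output")
    return match.group(2)


def preview_edit(filepath: str, old_code: str, new_code: str) -> str:
    """Build a unified diff for the proposed edit without touching the file.

    Hypothetical helper: the guide only says to use difflib.unified_diff.
    """
    content = Path(filepath).read_text()
    if old_code not in content:
        raise ValueError("Old code not found (possible hallucination)")
    updated = content.replace(old_code, new_code, 1)
    diff = difflib.unified_diff(
        content.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{filepath}",
        tofile=f"b/{filepath}",
    )
    return "".join(diff)


def apply_edit(filepath: str, old_code: str, new_code: str) -> None:
    """Replace the first occurrence of old_code with new_code in the file."""
    path = Path(filepath)
    content = path.read_text()
    if old_code not in content:
        raise ValueError("Old code not found (possible hallucination)")
    # Assumption: only the first match is replaced, so a repeated snippet
    # elsewhere in the file is left alone.
    path.write_text(content.replace(old_code, new_code, 1))
```

A caller would typically print the result of preview_edit, ask for approval, and only then call apply_edit with the same arguments.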

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
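One way to capture run evidence is to wrap each command in a small recorder. The sketch below is an illustration under assumptions: run_and_record, the JSON Lines log format, and the file name run_evidence.jsonl are all hypothetical choices, not part of the guide.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def run_and_record(command: list[str], model: str, log_path: str) -> dict:
    """Run a command and append what was run and observed to a JSON Lines log."""
    result = subprocess.run(command, capture_output=True, text=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "model": model,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    # One JSON object per line, so later runs append without rewriting the file.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

For example, run_and_record(["ollama", "--version"], "codellama:13b", "run_evidence.jsonl") would store the Ollama version alongside the Python version and platform string.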

Verification

  • Provide a specification: "Add a function is_palindrome(s) to utils.py that returns True if the string is a palindrome." The agent should generate correct code, show the diff, and, after approval, apply it and run tests.
  • Verify that the generated code is syntactically correct by running python -m py_compile <file>.
  • Test error recovery: intentionally break a test and verify that the agent retries with the error context.
  • Run a multi-file change: "Add an API endpoint and its corresponding service function", and verify that both files are modified correctly.
  • Check that the agent never modifies files without showing a diff first.
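The error-recovery check is easier to exercise when the retry loop takes its collaborators as arguments, so a fake generator and a fake test runner can stand in for the model and pytest. This is a sketch under assumptions: generate_until_green and its callable parameters are hypothetical names, and sys.executable is used in place of the bare "python" command so the tests run under the active interpreter.

```python
import subprocess
import sys
from typing import Callable

MAX_ATTEMPTS = 3  # retry budget stated in the guide


def run_tests(cwd: str = ".") -> tuple[bool, str]:
    """Run pytest and return (passed, combined output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-x", "--tb=short"],
        capture_output=True,
        text=True,
        cwd=cwd,
    )
    return result.returncode == 0, result.stdout + result.stderr


def generate_until_green(
    spec: str,
    generate: Callable[[str], str],
    apply: Callable[[str], None],
    test: Callable[[], tuple[bool, str]] = run_tests,
) -> tuple[bool, int]:
    """Generate, apply, and test; on failure, retry with the test output appended.

    Returns (passed, attempts_used).
    """
    prompt = spec
    for attempt in range(1, MAX_ATTEMPTS + 1):
        code = generate(prompt)
        apply(code)
        passed, output = test()
        if passed:
            return True, attempt
        # Feed the failure back so the next generation sees the error context.
        prompt = (
            f"{spec}\n\nThe previous attempt failed these tests:\n"
            f"{output}\n\nFix the code."
        )
    return False, MAX_ATTEMPTS
```

In the real agent, generate would wrap generate_code and apply would wrap the diff, approval, and apply_edit steps.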

Common failures

  • Model generates code in the wrong language - Strengthen the system prompt with explicit language constraints and include the target language in the specification.
  • Generated code uses unavailable imports - Pre-scan the project's requirements.txt or pyproject.toml and include the available libraries in the system prompt context.
  • Old code string not found during apply_edit - The model may have changed whitespace or formatting. Normalize whitespace on both sides before comparing, or, for Python snippets that parse on their own, compare the trees produced by ast.parse, which ignore formatting.
  • Model output not parseable as code - Retry with temperature=0 (options={"temperature": 0} in the Ollama Python client) and a stricter system prompt; validate syntax before applying.
  • Agent modifies too many files at once - Limit the agent to at most 3 files per run to reduce the risk of cascading errors.

  • Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
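Two of the fixes above lend themselves to small helpers. The sketch below is illustrative only: find_block and declared_packages are hypothetical names, the requirements parser is deliberately simplified (it skips option lines and does not handle URLs or pyproject.toml), and whitespace-insensitive matching only locates the block. For Python, the replacement still has to be re-indented to match the file.

```python
import re
from typing import Optional


def _normalize(line: str) -> str:
    """Collapse runs of whitespace and strip indentation."""
    return " ".join(line.split())


def find_block(content: str, old_code: str) -> Optional[tuple[int, int]]:
    """Return (start, end) line indices where old_code matches content,
    ignoring indentation and spacing differences, or None if there is no match."""
    lines = content.splitlines()
    target = [_normalize(line) for line in old_code.strip("\n").splitlines()]
    if not target:
        return None
    for start in range(len(lines) - len(target) + 1):
        window = [_normalize(line) for line in lines[start:start + len(target)]]
        if window == target:
            return start, start + len(target)
    return None


def declared_packages(requirements_text: str) -> list[str]:
    """Extract package names from requirements.txt-style text (simplified)."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names
```

The list from declared_packages can be joined into the system prompt, and the indices from find_block can drive a line-based replacement when the exact-string check in apply_edit fails.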

Related guides

  • build-langgraph-agent-scratch
  • setup-agent-tool-use-function-calling
  • implement-guardrails-ai-agents