What this does

This guide builds a code generation agent that writes and edits code fully offline, using open-weight models running on local hardware. The agent accepts a natural language description of the desired change, reads the existing codebase, generates new code or modifications, applies diff-based edits, and verifies the result by running tests. The entire pipeline runs without cloud API calls, so proprietary code stays on-premises.

Steps

Configure the model: ensure the code model is pulled, then test it: `ollama run codellama:13b "Write a Python function that calculates factorial."`.
Build the agent loop using the Ollama Python client: `import ollama`.
Define the system prompt for code generation with strict output formatting: `` system = "You are a code generation assistant. Output ONLY valid code in the requested language. Do not include explanations. Use the following format:\n```\n// code here\n```" ``.
Implement the `generate_code` function: `def generate_code(spec: str, language: str, existing_code: str = "") -> str`. It calls `ollama.chat(model="codellama:13b", messages=[{"role": "system", "content": system}, {"role": "user", "content": f"Language: {language}\nSpecification: {spec}\n\nExisting code context:\n{existing_code}\n\nGenerate the code:"}])` and returns `extract_code_block(response["message"]["content"])`. Pass `language` into the user message; otherwise the parameter is accepted but never reaches the model.
Parse the model output: the `extract_code_block` function uses a regex, `` re.search(r"```(\w+)?\n(.*?)\n```", text, re.DOTALL) ``, and returns the second capture group, which holds the code.
Add a file editing capability: `def apply_edit(filepath: str, old_code: str, new_code: str)` reads the file, raises `ValueError("Old code not found: possible hallucination")` if `old_code not in content`, replaces the old code with `content.replace(old_code, new_code)`, and writes the file back. Note that `str.replace` changes every occurrence; pass a count of 1 as the third argument to change only the first match.
Implement a validation step that runs tests after code generation: `subprocess.run(["python", "-m", "pytest", "-x", "--tb=short"], capture_output=True, text=True)`. If tests fail (non-zero `returncode`), feed the error output back to the model for correction in a retry loop with a maximum of 3 attempts.
Add a review step that presents the diff to the user before applying: use `difflib.unified_diff` to generate a human-readable diff.
Wrap everything in a `CodeAgent` class with a `run(spec, target_files)` method that orchestrates generation, diff display, approval, application, and testing.
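The parsing, diff, and edit helpers described above can be sketched without a model in the loop. This is a minimal sketch under a few assumptions that are not in the guide: `FENCE_RE` and `render_diff` are illustrative names, `extract_code_block` falls back to the stripped raw text when no fence is found, and `apply_edit` replaces only the first occurrence.

```python
import difflib
import re
from pathlib import Path

# Optional language tag is group 1; the code between the fences is group 2.
FENCE_RE = re.compile(r"```(\w+)?\n(.*?)\n```", re.DOTALL)


def extract_code_block(text: str) -> str:
    """Return the code in the first fenced block.

    Assumption: if the model ignored the format and sent no fence,
    fall back to the stripped raw text rather than failing.
    """
    match = FENCE_RE.search(text)
    return match.group(2) if match else text.strip()


def render_diff(old: str, new: str, filename: str) -> str:
    """Build a unified diff for review before anything is written to disk."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(lines)


def apply_edit(filepath: str, old_code: str, new_code: str) -> None:
    """Replace old_code in the file, failing loudly if it is absent."""
    path = Path(filepath)
    content = path.read_text()
    if old_code not in content:
        raise ValueError("Old code not found: possible hallucination")
    # Assumption: only the first occurrence is replaced (count=1).
    path.write_text(content.replace(old_code, new_code, 1))
```

A typical flow would call `render_diff` on the current and proposed file contents, show the result to the user, and call `apply_edit` only after approval.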

Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.
Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.
Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.
Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
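One way to capture the starting state and the run evidence in a single record is sketched below. The function names and the JSON layout are illustrative assumptions, not part of the guide; the only library calls used are from the Python standard library.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def run_record(command: str, model: str, output: str) -> dict:
    """Collect the details needed to reproduce a run later (hypothetical layout)."""
    return {
        "command": command,
        "python_executable": sys.executable,  # the active interpreter path
        "python_version": platform.python_version(),
        "ollama_package": package_version("ollama"),
        "model": model,
        "output": output,
    }


if __name__ == "__main__":
    record = run_record(
        command="python -m pytest -x --tb=short",
        model="codellama:13b",
        output="(paste the observed output here)",
    )
    print(json.dumps(record, indent=2))
```

Saving the printed JSON next to the generated change keeps the command, versions, and model name together with the observed output.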

Verification

Provide a specification: "Add a function is_palindrome(s) to utils.py that returns True if the string is a palindrome." The agent should generate correct code, show the diff, and, after approval, apply it and run the tests.
Verify that the generated code is syntactically valid by running `python -m py_compile <file>`.
Test error recovery: intentionally break a test and verify that the agent retries with the error output in its prompt.
Run a multi-file change such as "Add an API endpoint and its corresponding service function" and verify that both files are modified correctly.
Check that the agent never modifies a file without showing a diff first.
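The error-recovery check exercises the retry loop, which might look roughly like the sketch below. The `generate`, `apply`, and `run_tests` callables are hypothetical injection points (for example, thin wrappers around `generate_code` and `apply_edit`) so the loop can be tested without a model. `sys.executable` is used instead of a bare `python` so pytest runs under the same interpreter as the agent.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # retry cap from the steps section


def run_pytest() -> Tuple[bool, str]:
    """Run the suite, stopping at the first failure; return (passed, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-x", "--tb=short"],
        capture_output=True,
        text=True,
    )
    # pytest exits with 0 only when all collected tests pass.
    return result.returncode == 0, result.stdout + result.stderr


def generate_with_retries(
    spec: str,
    generate: Callable[[str], str],   # hypothetical: prompt -> code
    apply: Callable[[str], None],     # hypothetical: writes the code to disk
    run_tests: Callable[[], Tuple[bool, str]] = run_pytest,
) -> Optional[str]:
    """Return the first candidate whose tests pass, or None after MAX_ATTEMPTS."""
    prompt = spec
    for _ in range(MAX_ATTEMPTS):
        code = generate(prompt)
        apply(code)
        passed, output = run_tests()
        if passed:
            return code
        # Feed the failure back so the next attempt sees what went wrong.
        prompt = (
            f"{spec}\n\nThe previous attempt failed these tests:\n"
            f"{output}\n\nFix the code:"
        )
    return None
```

Returning `None` after the last attempt leaves the decision to the caller, which could revert the file or ask the user how to proceed.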

Common failures

Model generates code in the wrong language - Strengthen the system prompt with explicit language constraints and include the target language in the specification.
Generated code uses unavailable imports - Pre-scan the project's `requirements.txt` or `pyproject.toml` and include the available libraries in the system prompt context.
Old code string not found during `apply_edit` - The model may have changed whitespace or formatting. Normalize whitespace on both sides before comparing, or, for Python snippets that parse on their own, compare `ast.dump(ast.parse(...))` output instead of raw text.
Model output not parseable as code - Retry with temperature 0 (`options={"temperature": 0}` in the Ollama Python client) and a stricter system prompt; validate syntax before applying.
Agent modifies too many files at once - Limit the agent to at most 3 files per run to reduce the risk of cascading errors.
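For the "old code not found" failure, a whitespace-tolerant match could be sketched as follows. The helper names are hypothetical. The sketch compares line by line after collapsing whitespace, which also ignores indentation, so it assumes `new_code` already carries the correct indentation for the target file.

```python
import re
from typing import Optional, Tuple


def _normalize(line: str) -> str:
    """Collapse runs of whitespace and trim the ends so formatting drift is ignored."""
    return re.sub(r"\s+", " ", line).strip()


def find_block(content: str, old_code: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) line indices of the first normalized match, or None."""
    haystack = [_normalize(line) for line in content.splitlines()]
    needle = [_normalize(line) for line in old_code.strip().splitlines()]
    if not needle:
        return None
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start, start + len(needle)
    return None


def replace_block(content: str, old_code: str, new_code: str) -> str:
    """Swap the matched lines for new_code, keeping the file's trailing newline."""
    span = find_block(content, old_code)
    if span is None:
        raise ValueError("Old code not found even after whitespace normalization")
    start, end = span
    lines = content.splitlines()
    lines[start:end] = new_code.splitlines()
    trailing = "\n" if content.endswith("\n") else ""
    return "\n".join(lines) + trailing
```

Trying an exact match first and falling back to this only when it fails keeps the common case strict while tolerating reformatted model output.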

Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.

Related guides

build-langgraph-agent-scratch
setup-agent-tool-use-function-calling
implement-guardrails-ai-agents