Function Calling for Local Models
Learn function calling for local models through RunLocalAI's practical lens: tool definitions, JSON Schema, Ollama, hardware fit, runtime settings, verification habits, and local-vs-cloud tradeoffs.
Course I015: Function Calling for Local Models
Why this course exists
Function calling turns language models from text generators into agents that can take real actions. When a model can call functions (search databases, execute shell commands, query APIs, or manipulate files), it becomes a practical tool rather than a sophisticated autocomplete. Local models served by Ollama or vLLM now support structured function calling, which enables fully private agentic pipelines with no cloud dependencies and no network round trip to a remote provider.
This course covers the complete implementation stack: JSON Schema definitions for tools, client-side parsing of function calls, execution orchestration, and multi-tool coordination patterns. The material focuses on production patterns and failure modes encountered in real deployments.
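As a rough illustration of that stack, the sketch below pairs a JSON Schema tool definition with a small registry and a client-side dispatcher. The `get_weather` tool, its return values, and the hand-written `sample_call` are hypothetical, and the exact fields a server expects in the tool result message (for example a tool call id) vary between runtimes, so treat the message shape as an assumption.

```python
import json

# Hypothetical tool implementation; a real one would query a weather service.
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temperature": 21, "unit": unit}

# Tool definition in the OpenAI-style "function" format, with JSON Schema parameters.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Registry: maps the tool name the model emits to the Python callable.
REGISTRY = {"get_weather": get_weather}

def execute_tool_call(call: dict) -> dict:
    """Run one tool call and build a role="tool" message for the next model turn."""
    name = call["function"]["name"]
    args = call["function"]["arguments"]
    if isinstance(args, str):  # some servers return arguments as a JSON string
        args = json.loads(args)
    if name not in REGISTRY:
        content = {"error": f"unknown tool: {name}"}
    else:
        content = REGISTRY[name](**args)
    # Assumed message shape; some servers also require a tool call id.
    return {"role": "tool", "name": name, "content": json.dumps(content)}

# Hand-written stand-in for one entry of an assistant message's tool_calls list.
sample_call = {"function": {"name": "get_weather", "arguments": {"city": "Berlin"}}}
tool_message = execute_tool_call(sample_call)
print(tool_message["content"])
```

The client, not the model, decides whether a call runs: an unknown tool name produces an error payload for the model to read instead of raising an exception.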
What you will know after
- Define JSON Schema tool definitions that models can parse correctly
- Implement function calling with Ollama's tool-calling API (the `tools` request parameter and the `tool_calls` response field)
- Configure vLLM-served models with tool use enabled
- Build single-step tool execution pipelines with proper error handling
- Orchestrate multi-step agent loops with state management
- Handle parallel tool calls and aggregate results
- Design tool outputs that models consume effectively
- Debug function calling failures with concrete diagnostic techniques
- Implement streaming function calls for responsive interfaces
- Build retry logic and fallback patterns for production reliability
- 01 Function Calling Overview (15 min): Function calling is a client-side parsing pattern: the model outputs structured data, and the client decides whether to execute it and how to respond.
- 02 JSON Schema for Tools (15 min): Schema descriptions guide model decisions, so invest time in clear, specific descriptions for parameters that could be confused.
- 03 Defining Tool Functions (20 min): The registry pattern separates tool definitions from implementations, allowing you to modify code without changing the schema sent to the model.
- 04 Ollama Function Calling (15 min): Ollama handles tool formatting internally; pass JSON Schema definitions and parse the `tool_calls` field from responses.
- 05 vLLM Function Calling (20 min): vLLM requires `--enable-auto-tool-choice` and a `--tool-call-parser` value at serve time; choose the parser that matches the tool-call format your model was fine-tuned on.
- 06 Single Tool Execution (20 min): Single tool execution requires careful message formatting: append both the tool call and its result before the next model call.
- 07 Multi-Tool Orchestration (20 min): Multi-tool orchestration uses iteration loops with message accumulation, so each tool result informs subsequent model decisions.
- 08 Parallel Tool Calls (20 min): Parallel tool calls require extracting all calls before execution and aggregating all results for the model in a single response cycle.
- 09 Tool Output Handling (20 min): Consistent output structure with status, data, and error fields helps models parse results reliably across different tools.
- 10 Error Recovery (20 min): Error recovery is not about preventing all failures; it is about handling them gracefully while preserving system integrity and providing actionable feedback.
- 11 Retry Logic (25 min): Retry logic with exponential backoff and circuit breakers prevents cascading failures while giving transient issues time to resolve.
- 12 Streaming with Tools (25 min): Streaming tool calls require buffering partial model outputs and executing tools asynchronously without blocking the token stream to the client.
- 13 LangChain Tools Integration (25 min): LangChain provides standardized tool schemas and binding mechanisms that work with Ollama, enabling use of the broader LangChain tool ecosystem with local models.
- 14 Production Monitoring (25 min): Production monitoring combines metrics for dashboards, structured logs for debugging, and health endpoints for orchestration, each serving a different operational need.
- 15 Rate Limiting (25 min): Rate limiting protects infrastructure by enforcing per-user quotas, while token bucket algorithms handle both sustained rates and burst allowances.
- 16 Tool Security (25 min): Tool security combines input validation, least-privilege execution, allowlists, and audit logging to prevent function calling from becoming an attack vector.
- 17 Testing Function Calling (25 min): Testing function calling requires unit tests for tools, mocked integration tests for model behavior, and real integration tests with a running model to catch compatibility issues.
- 18 Tool Ecosystem Project (35 min): Building a complete function calling system requires integrating tools, error recovery, rate limiting, monitoring, and tests into a cohesive architecture where each component supports the others.
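
Lessons 09 and 11 describe a consistent result structure and retries with exponential backoff. A minimal sketch of how the two might combine is shown here; `TransientError`, `run_tool`, and `flaky_lookup` are hypothetical names chosen for illustration, and the circuit breaker mentioned in lesson 11 is left out to keep the example short.

```python
import json
import time

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, busy services)."""

def run_tool(func, args, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(**args); always return a dict with status, data, and error keys."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "ok", "data": func(**args), "error": None}
        except TransientError as exc:
            if attempt == max_attempts:
                return {"status": "error", "data": None,
                        "error": f"gave up after {attempt} attempts: {exc}"}
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
        except Exception as exc:
            # Non-transient failure (bad arguments, bug): report it, do not retry.
            return {"status": "error", "data": None,
                    "error": f"{type(exc).__name__}: {exc}"}

# Hypothetical flaky tool: fails twice, then succeeds.
_calls = {"n": 0}

def flaky_lookup(key):
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise TransientError("service busy")
    return {"key": key, "value": 42}

# Record the requested delays instead of actually sleeping.
delays = []
result = run_tool(flaky_lookup, {"key": "a"}, sleep=delays.append)
print(json.dumps(result))
print(delays)
```

Because every outcome, including a failure, comes back in the same three-field shape, the JSON-encoded result can be appended to the conversation as a tool message without special cases.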