COURSE · BLD · I015

Function Calling for Local Models

Learn function calling for local models through RunLocalAI's practical lens: function calling, tools, json schema and ollama, hardware fit, runtime settings, verification habits and local-vs-cloud tradeoffs.

18 chapters10hBuilder trackBy Fredoline Eruo
PREREQUISITES
  • B015

Course I015: Function Calling for Local Models

Why this course exists

Function calling transforms language models from text generators into interactive agents capable of executing real actions. When a model can call functions—search databases, execute shell commands, query APIs, or manipulate files—it becomes a practical tool rather than a sophisticated autocomplete. Local models running on Ollama or vLLM now support structured function calling, enabling fully private, latency-free agentic pipelines without cloud dependencies.

This course covers the complete implementation stack: JSON Schema definitions for tools, client-side parsing of function calls, execution orchestration, and multi-tool coordination patterns. The material focuses on production patterns and failure modes encountered in real deployments.

What you will know after

  • Define JSON Schema tool definitions that models can parse correctly
  • Implement function calling with Ollama's structured output mode
  • Configure vLLM served models with tool use enabled
  • Build single-step tool execution pipelines with proper error handling
  • Orchestrate multi-step agent loops with state management
  • Handle parallel tool calls and aggregate results
  • Design tool outputs that models consume effectively
  • Debug function calling failures with concrete diagnostic techniques
  • Implement streaming function calls for responsive interfaces
  • Build retry logic and fallback patterns for production reliability
CHAPTERS
  1. 01Function Calling OverviewFunction calling is a client-side parsing pattern—the model outputs structured data, and the client decides whether to execute and how to respond.15 min
  2. 02JSON Schema for ToolsSchema descriptions guide model decisions—invest time writing clear, specific descriptions for parameters that could be confused.15 min
  3. 03Defining Tool FunctionsThe registry pattern separates tool definitions from implementations, allowing you to modify code without changing the schema sent to the model.20 min
  4. 04Ollama Function CallingOllama handles tool formatting internally—pass JSON Schema definitions and parse the `tool_calls` field from responses.15 min
  5. 05vLLM Function CallingvLLM requires `--tool-call-format` during serving—choose the format matching your model's fine-tuning.20 min
  6. 06Single Tool ExecutionSingle tool execution requires careful message formatting—append both the tool call and its result before the next model call.20 min
  7. 07Multi-Tool OrchestrationMulti-tool orchestration uses iteration loops with message accumulation—each tool result informs subsequent model decisions.20 min
  8. 08Parallel Tool CallsParallel tool calls require extracting all calls before execution and aggregating all results for the model in a single response cycle.20 min
  9. 09Tool Output HandlingConsistent output structure with status, data, and error fields helps models parse results reliably across different tools.20 min
  10. 10Error RecoveryError recovery is not about preventing all failures—it is about handling them gracefully while preserving system integrity and providing actionable feedback.20 min
  11. 11Retry LogicRetry logic with exponential backoff and circuit breakers prevents cascading failures while giving transient issues time to resolve.25 min
  12. 12Streaming with ToolsStreaming tool calls require buffering partial model outputs and asynchronously executing tools without blocking the token stream to the client.25 min
  13. 13LangChain Tools IntegrationLangChain provides standardized tool schemas and binding mechanisms that work with Ollama, enabling use of the broader LangChain tool ecosystem with local models.25 min
  14. 14Production MonitoringProduction monitoring combines metrics for dashboards, structured logs for debugging, and health endpoints for orchestration—each serving different operational needs.25 min
  15. 15Rate LimitingRate limiting protects infrastructure by enforcing per-user quotas while token bucket algorithms handle both sustained rates and burst allowances gracefully.25 min
  16. 16Tool SecurityTool security combines input validation, least-privilege execution, allowlists, and audit logging to prevent function calling from becoming an attack vector.25 min
  17. 17Testing Function CallingTesting function calling requires unit tests for tools, mocked integration tests for model behavior, and real integration tests with a running model to catch compatibility issues.25 min
  18. 18Tool Ecosystem ProjectBuilding a complete function calling system requires integrating tools, error recovery, rate limiting, monitoring, and tests into a cohesive architecture where each component supports the others.35 min