Function Calling Overview — Function Calling for Local Models (Chapter 1)

Function calling enables a language model to request execution of specific actions by outputting structured JSON that maps to code functions. Instead of generating freeform text, the model produces a JSON object with a function name and arguments, which the client application then executes and returns results.

Three architectural patterns exist for function calling. The first pattern uses the model to output JSON directly in the response text, relying on parsing to extract the call. This approach breaks frequently because models struggle to produce valid JSON without constraints. The second pattern employs guided decoding or constrained sampling to force JSON output matching a schema—Ollama uses this with guided_decoding and grammar constraints. The third pattern uses tool-use message formats where the system prompt defines available tools and the model generates calls in a specific format—vLLM supports this via chat templates with tool definitions.

Modern function calling implementations follow a request-response cycle. The client sends a message history including system instructions defining available tools, the user query, and any prior tool results. The model generates a tool call message containing the function name and arguments. The client parses this call, executes the function, and appends the result to the message history. The model then generates a natural language response incorporating the tool output.

The critical difference between local and API-based function calling lies in control and latency. Local models eliminate per-call pricing, enable data sovereignty, and reduce network round-trips. However, local models may produce malformed calls more frequently than frontier models, requiring reliable client-side validation and retry mechanisms.

Function calling supports several use cases: database query tools that let models answer questions about structured data, file system operations for document processing, API clients for external services, code execution environments for computation, and web search tools for current information.