ReAct
ReAct (Reasoning + Acting) is a prompting technique that interleaves chain-of-thought reasoning with tool-use actions. In practice, the model outputs a reasoning trace (e.g., 'I need to find the current weather in Tokyo'), calls a tool (e.g., a weather API), receives the result, and continues reasoning. This loop lets the model gather external information and revise its reasoning when a tool result contradicts an earlier step. Operators encounter ReAct in agent frameworks like LangChain, or when using models fine-tuned for function calling (e.g., Llama 3.1 8B Instruct). The pattern matters because it turns a static text generator into an interactive agent that can query databases, run code, or browse the web.
Deeper dive
ReAct was introduced by Yao et al. (2022) to address a limitation of pure chain-of-thought reasoning: it cannot access external knowledge. The core mechanism is a loop. The model generates a thought (e.g., 'I need to compute 2+3'), then an action (e.g., a calculator call), then observes the result, and repeats until it produces a final answer. The original paper used plain-text Thought/Action/Observation lines. Current runtimes more often use structured output formats (e.g., JSON or XML tags) that they parse to invoke tools. Variants include ReAct with memory (e.g., storing past observations) and ReAct with planning (e.g., generating a multi-step plan before acting). For local operators, ReAct is relevant because it requires the model to follow a strict output format and handle tool-calling tokens. These capabilities vary by model and quantization level. A 7B model at Q4 may struggle with multi-step ReAct. Every step appends thoughts and observations to the prompt, so long tasks can exhaust the context window, and quantization can degrade instruction following enough to break the format.
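A minimal sketch of the thought-action-observation loop follows. It assumes a plain-text format (`Thought:`, `Action: tool[input]`, `Final Answer:`) in the spirit of the original paper's traces. So that the control flow runs offline, a hypothetical `scripted_model` stands in for the LLM. `run_react` and `calculator` are likewise illustrative names, not part of any library.

```python
import re
from typing import Callable, Dict

# Assumed plain-text format: "Action: tool_name[argument]" or "Final Answer: ...".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.*)$", re.MULTILINE)


def calculator(expression: str) -> str:
    """Toy tool: adds integers written as 'a+b' (avoids eval on model output)."""
    return str(sum(int(part) for part in expression.split("+")))


def run_react(model: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)          # model emits Thought + Action, or a final answer
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            raise ValueError(f"Unparseable model output: {step!r}")
        name, arg = action.group(1), action.group(2)
        observation = tools[name](arg)    # the runtime, not the model, executes the tool
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("No final answer within max_steps")


def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: acts first, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute 2+3.\nAction: calculator[2+3]"
    observed = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The calculator returned {observed}.\nFinal Answer: {observed}"


print(run_react(scripted_model, {"calculator": calculator}, "What is 2+3?"))
```

The model only proposes actions as text, and the runtime decides whether that text is a valid call. This is why format-following ability matters so much for small, quantized models.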
Practical example
An operator runs a local agent using Llama 3.1 8B (Q4_K_M, ~5 GB VRAM) via Ollama, with a custom tool that queries a local SQLite database. The prompt instructs the model to wrap its thoughts in one set of tags and its tool calls in another. When asked "What were the total sales in Q1?", the model first generates the thought "I need to run a SQL query to sum sales for Q1." It then emits a call to query_db with the parameter SELECT SUM(sales) FROM revenue WHERE quarter = 'Q1'. The runtime executes the query and returns the result to the model as an observation. The model then produces the final answer. If the model does not follow the format, the runtime logs an error, and the operator may need to adjust the prompt or switch to a larger model.
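The runtime side of this example can be sketched with Python's standard-library `sqlite3` module. The example above does not specify several details, so this sketch assumes them: the tag names `<thought>` and `<tool_call>`, the JSON shape of the tool call, the in-memory database, and its made-up sample rows. `handle_model_output` and `make_demo_db` are hypothetical helpers.

```python
import json
import logging
import re
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

# Hypothetical tag name; the prompt's real tags are not given in the example.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the operator's SQLite file, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revenue (quarter TEXT, sales REAL)")
    conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                     [("Q1", 1200.0), ("Q1", 800.0), ("Q2", 950.0)])
    return conn


def query_db(conn: sqlite3.Connection, sql: str) -> list:
    """The tool: run a query and return all rows (crude SELECT-only guard)."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("query_db only accepts SELECT statements")
    return conn.execute(sql).fetchall()


def handle_model_output(conn: sqlite3.Connection, output: str):
    """Parse one model turn; return the observation, or None if the format is wrong."""
    match = TOOL_CALL_RE.search(output)
    if match is None:
        log.error("No <tool_call> block found in model output: %r", output)
        return None
    try:
        call = json.loads(match.group(1))
        name, sql = call["name"], call["arguments"]["sql"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        log.error("Malformed tool call: %s", exc)
        return None
    if name != "query_db":
        log.error("Unknown tool: %r", name)
        return None
    return query_db(conn, sql)


model_output = (
    "<thought>I need to run a SQL query to sum sales for Q1.</thought>"
    '<tool_call>{"name": "query_db", "arguments": '
    '{"sql": "SELECT SUM(sales) FROM revenue WHERE quarter = \'Q1\'"}}</tool_call>'
)
conn = make_demo_db()
print(handle_model_output(conn, model_output))  # observation fed back to the model
```

Returning `None` and logging, rather than raising, mirrors the behavior described above. A format failure gets recorded for the operator instead of crashing the agent.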
Workflow example
In LangChain, an operator defines a ReAct agent from a tool list, an LLM instance, and a ReAct prompt. For example, with llama3.1:8b served by Ollama, agent = create_react_agent(llm, tools, prompt) builds the agent, and executor = AgentExecutor(agent=agent, tools=tools) runs the ReAct loop around it. When executor.invoke({'input': 'What is the weather in Paris?'}) runs, the executor sends the prompt in the ReAct format, parses the model's output for tool calls, executes them, and feeds the observations back to the model. The operator can monitor the loop by passing verbose=True to the executor, which prints each thought-action-observation step. If the model loops indefinitely, setting max_iterations (e.g., 10) on the executor halts the loop after that many steps. The executor then returns a fixed "agent stopped" message instead of a model-generated answer.
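The iteration-cap safeguard can be illustrated without LangChain. The sketch below is a plain-Python stand-in, not LangChain's `AgentExecutor` implementation. `run_with_cap`, `Step`, `stuck_model`, and `fake_weather` are hypothetical names, the weather observation is made up, and the stop message is this sketch's own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    action: Optional[str] = None        # tool name, or None when the model answers
    action_input: str = ""
    final_answer: Optional[str] = None


def run_with_cap(plan: Callable[[List[Tuple[Step, str]]], Step],
                 tools: dict, max_iterations: int = 10, verbose: bool = False) -> dict:
    history: List[Tuple[Step, str]] = []   # (step, observation) pairs
    for i in range(max_iterations):
        step = plan(history)
        if step.final_answer is not None:
            return {"output": step.final_answer, "steps": len(history)}
        observation = tools[step.action](step.action_input)
        if verbose:  # one line per thought-action-observation step
            print(f"[{i + 1}] Thought: {step.thought} | Action: {step.action}"
                  f"({step.action_input!r}) | Observation: {observation}")
        history.append((step, observation))
    # Cap reached: stop instead of letting a confused model loop forever.
    return {"output": "Stopped: iteration limit reached", "steps": len(history)}


def stuck_model(history):
    """A model that keeps re-checking the weather and never commits to an answer."""
    return Step(thought="I should check the weather again.",
                action="get_weather", action_input="Paris")


def fake_weather(city: str) -> str:
    return f"{city}: 18 C, cloudy"   # made-up observation


result = run_with_cap(stuck_model, {"get_weather": fake_weather},
                      max_iterations=3, verbose=True)
print(result)
```

The cap bounds worst-case cost, since every extra iteration is another full model call over a growing context. That matters most on local hardware, where a stuck 8B model can otherwise burn minutes of GPU time.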
Reviewed by Fredoline Eruo.