Intermediate · 5 sections · Updated Jun 15, 2025

ReAct Pattern

Reasoning and Acting — the agent interleaves chain-of-thought reasoning with concrete actions in an iterative loop, grounding each decision in observed evidence before proceeding.

Algorithm / Pseudocode
function ReAct(question, tools, max_steps):
    context = [system_prompt, question]

    for step in 1..max_steps:
        // Phase 1: Reason about the current state
        thought = LLM(context)
        context.append(thought)

        // Phase 2: Decide on an action
        action = LLM.select_tool(thought, tools)
        context.append(action)

        if action == FINAL_ANSWER:
            return action.answer

        // Phase 3: Execute and observe
        observation = execute_tool(action.tool, action.params)
        context.append(observation)

    // Fallback if the step limit is reached without a final answer
    return LLM.summarize(context)
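The pseudocode above can be sketched in runnable form. Everything here is illustrative: the scripted "LLM" is a hard-coded stand-in for a real model call, and the search tool is a canned lookup, not a live API.

```python
# Minimal runnable sketch of the ReAct loop. The "LLM" is scripted to
# return a fixed sequence of actions, and the tool is a canned lookup --
# all names here are illustrative stand-ins, not a real API.

def search(query):
    # Stubbed search tool: returns a canned observation for known queries.
    canned = {
        "2024 Cricket World Cup winner": "India won the 2024 ICC T20 World Cup.",
        "India GDP growth rate 2024": "India's FY2024 GDP growth was about 8.2%.",
    }
    return canned.get(query, "No results found.")

TOOLS = {"search": search}

def scripted_llm(context):
    # Stand-in for a model call: picks the next step based on how many
    # observations have accumulated so far.
    n_obs = sum(1 for line in context if line.startswith("Observation:"))
    if n_obs == 0:
        return ("search", "2024 Cricket World Cup winner")
    if n_obs == 1:
        return ("India GDP growth rate 2024", None) and ("search", "India GDP growth rate 2024")
    return ("FINAL_ANSWER", "India won; GDP growth was about 8.2%.")

def react(question, tools, max_steps=5):
    context = [f"Question: {question}"]
    for _ in range(max_steps):
        action, arg = scripted_llm(context)       # reason + choose an action
        if action == "FINAL_ANSWER":
            return arg
        context.append(f"Action: {action}({arg!r})")
        observation = tools[action](arg)          # execute the tool
        context.append(f"Observation: {observation}")
    return "Step limit reached: " + context[-1]   # fallback summary

answer = react("GDP growth of the 2024 Cricket World Cup winner?", TOOLS)
```

Swapping `scripted_llm` for a real model call (and `canned` for a real search backend) yields the same loop structure used by production frameworks.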

When to Use

  • Tasks requiring real-time information retrieval or fact-checking
  • Multi-step research where each step depends on prior findings
  • Problems that benefit from transparent, auditable reasoning traces
  • Scenarios where the agent must interact with external APIs, databases, or search engines
  • When you need to ground the LLM's responses in verified data rather than parametric memory

When NOT to Use

  • Simple, self-contained reasoning tasks that don't need external data
  • Latency-critical applications where multiple LLM calls are unacceptable
  • Tasks where the entire answer exists within the model's training data
  • High-throughput batch processing where cost per query must be minimized
  • When you need deterministic outputs — ReAct introduces variability through tool interactions

Core Mechanism

The ReAct pattern (Reasoning + Acting) was introduced by Yao et al. in 2022 and has become the foundational loop for most modern agent frameworks. The pattern fuses two capabilities that are individually powerful but become transformative when combined: chain-of-thought reasoning (the ability to decompose problems into intermediate steps) and action execution (the ability to interact with external environments and tools).

A pure chain-of-thought approach reasons internally but has no way to verify its assumptions against reality. Conversely, a pure action-based system takes actions without articulating why, making it brittle and hard to debug. ReAct solves both problems by structuring the agent into a three-phase loop:

  1. Thought: The agent reasons about the current state, evaluates what information is missing, and formulates a plan for the next step. This reasoning trace is emitted as text, making the agent's decision-making transparent and auditable.
  2. Action: Based on the thought, the agent selects and invokes a tool or function — a web search, a database query, a calculation, an API call — with specific parameters.
  3. Observation: The result of the action is returned to the agent as new context. The agent incorporates this evidence into its working memory and begins the next iteration.

This loop continues until the agent determines it has sufficient information to produce a final answer, or until a maximum iteration limit is reached. The explicit reasoning trace at each step distinguishes ReAct from simpler tool-use patterns and makes it far easier to diagnose failures.
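In practice, frameworks realize this loop by prompting the model to emit its trace in a fixed textual format. A representative (not canonical) system prompt template might look like:

```python
# A representative ReAct-style prompt template. The exact wording varies
# by framework; this is an illustrative sketch, not a standard.
REACT_PROMPT = """Answer the question by interleaving Thought, Action, and
Observation steps.

Thought: reason about what to do next
Action: tool_name(arguments)
Observation: (the tool result will be inserted here)

Repeat Thought/Action/Observation as needed, then finish with:
Final Answer: the answer to the original question

Available tools: {tool_descriptions}
Question: {question}"""

prompt = REACT_PROMPT.format(
    tool_descriptions="search(query): web search returning a short snippet",
    question="What was the GDP growth rate of the 2024 Cricket World Cup winner?",
)
```

The fixed `Thought:` / `Action:` / `Observation:` markers are what let the orchestrating code detect when the model wants to act versus when it has finished.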

Step-by-Step Execution Flow

Consider a concrete example: a user asks "What was the GDP growth rate of the country that won the 2024 Cricket World Cup?" A ReAct agent would proceed as follows:

  1. Thought 1: "I need to find which country won the 2024 Cricket World Cup. I'll search for this information."
  2. Action 1: search("2024 Cricket World Cup winner")
  3. Observation 1: "India won the 2024 ICC T20 Cricket World Cup."
  4. Thought 2: "Now I know India won. I need to find India's GDP growth rate. I should search for the most recent data."
  5. Action 2: search("India GDP growth rate 2024")
  6. Observation 2: "India's GDP growth rate for FY2024 was approximately 8.2%."
  7. Thought 3: "I now have both pieces of information. India won the 2024 Cricket World Cup and their GDP growth rate was approximately 8.2%. I can formulate the final answer."
  8. Final Answer: "India won the 2024 Cricket World Cup. India's GDP growth rate for 2024 was approximately 8.2%."

Each iteration builds on the previous observation. The reasoning trace makes it clear why each action was taken, and the observations ground the reasoning in real-world data rather than relying on the model's parametric memory alone.
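One practical detail the trace above implies: the orchestrator must parse the model's `Action:` lines into a tool name and arguments before it can execute them. A minimal regex-based parser (the line format is an assumption for illustration; real frameworks usually rely on structured function calling instead):

```python
import re

# Parse lines like: Action: search("India GDP growth rate 2024")
# This line format is assumed for illustration, not framework-specific.
ACTION_RE = re.compile(r'Action:\s*(\w+)\((.*)\)\s*$')

def parse_action(line):
    match = ACTION_RE.match(line.strip())
    if match is None:
        return None  # not an action line; may be a thought or final answer
    tool, raw_arg = match.groups()
    return tool, raw_arg.strip('"\' ')

tool, arg = parse_action('Action: search("2024 Cricket World Cup winner")')
```

Failed parses are worth surfacing back to the model as an observation ("could not parse action") so it can retry with the correct format.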

Implementation Approaches

Most modern agent frameworks provide first-class support for the ReAct loop. The implementation details vary, but the conceptual structure remains the same.

LangGraph implements ReAct as a cyclic state graph. You define an AgentState that carries messages, then wire a call_model node (which produces a thought and optional tool call) to a tool_executor node. A conditional edge checks whether the model's response contains a tool call; if so, it routes back to the tool node, and then back to the model. If there is no tool call, the loop terminates and the final response is returned. LangGraph's explicit graph structure gives you full control over the routing logic, making it straightforward to add human-in-the-loop gates, retry logic, or branching.
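The cyclic model → tools → model structure that LangGraph expresses as a state graph can be sketched framework-free. The node and state names below are illustrative and do not reflect LangGraph's actual API:

```python
# Framework-free sketch of a cyclic agent graph: a model node, a tool
# node, and a conditional edge that routes between them. All names are
# illustrative; this is not LangGraph's API.

def call_model(state):
    # Stand-in model node: requests one tool call, then finishes.
    if not state["observations"]:
        state["pending_tool"] = ("lookup", "ReAct")
    else:
        state["pending_tool"] = None
        state["answer"] = f"Based on: {state['observations'][-1]}"
    return state

def tool_executor(state):
    name, arg = state["pending_tool"]
    result = {"lookup": lambda q: f"{q}: reasoning + acting loop"}[name](arg)
    state["observations"].append(result)
    return state

def should_continue(state):
    # The conditional edge: route to the tool node if a call is pending,
    # otherwise terminate the loop.
    return "tools" if state["pending_tool"] else "end"

def run_graph(state):
    while True:
        state = call_model(state)
        if should_continue(state) == "end":
            return state
        state = tool_executor(state)

final = run_graph({"observations": [], "pending_tool": None})
```

The value of making the edge explicit, as LangGraph does, is that interception points (human approval, retries, branching) slot naturally into `should_continue`.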

OpenAI Agents SDK and Claude Agent SDK implement the ReAct loop natively at the API level through function calling. You register tools with their JSON schemas, and the model alternates between producing reasoning and emitting structured tool calls. The SDK manages the loop internally — you receive tool calls, execute them, and return results until the model emits a final text response. This approach requires less boilerplate but offers less control over the loop structure.
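The function-calling style can be sketched without any vendor SDK. The schema below follows the common JSON Schema convention for tool parameters, and the `tool_call` dict stands in for what a function-calling API would return:

```python
import json

# Shape of schema-based tool registration and dispatch, vendor-neutral.
# The tool_call dict is a stand-in for a model's structured tool call.

TOOL_SCHEMAS = [{
    "name": "search",
    "description": "Web search returning a short text snippet.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def search(query):
    return f"Top result for: {query}"  # stubbed implementation

REGISTRY = {"search": search}

def dispatch(tool_call):
    # The model emits a tool name plus JSON-encoded arguments; the host
    # executes the matching function and feeds the result back as context.
    args = json.loads(tool_call["arguments"])
    return REGISTRY[tool_call["name"]](**args)

result = dispatch({"name": "search",
                   "arguments": '{"query": "ReAct pattern"}'})
```

In a real integration, `TOOL_SCHEMAS` is passed to the API with each request, and `dispatch` runs inside the loop the SDK manages for you.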

Smolagents takes a code-centric approach: the agent writes and executes Python code rather than calling structured tool schemas. This makes the action space more flexible (the agent can compose tools in a single step) but harder to sandbox safely.
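The code-as-action idea can be illustrated with a toy: the "agent" emits a Python snippet that composes two tool calls in a single step, and the runtime executes it in a namespace restricted to whitelisted tools. This is a deliberately unsafe sketch of the concept, not Smolagents' actual sandbox:

```python
# Toy illustration of code-as-action: generated code composes two tools
# in one step. A real system needs a proper sandbox; plain exec() is NOT
# safe for untrusted model output.

def search(query):
    return "India" if "World Cup" in query else "unknown"

def gdp_growth(country):
    return {"India": "8.2%"}.get(country, "unknown")

# Code as the model might emit it: two tool calls chained in one action.
agent_code = """
winner = search("2024 Cricket World Cup winner")
result = gdp_growth(winner)
"""

# Only whitelisted tools are visible to the generated code.
namespace = {"search": search, "gdp_growth": gdp_growth}
exec(agent_code, namespace)
answer = namespace["result"]
```

Note how the structured-schema approach would need two full loop iterations for the same work, which is exactly the flexibility/safety trade-off described above.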

ReAct vs. Pure Chain-of-Thought

Understanding when to use ReAct versus pure chain-of-thought (CoT) reasoning is essential for designing effective agents:

Dimension            | Pure CoT                               | ReAct
---------------------|----------------------------------------|----------------------------------------------
External data access | None; relies on parametric knowledge   | Yes; retrieves real-time information
Factual grounding    | Prone to hallucination                 | Grounded in observations
Transparency         | Reasoning visible but unverified       | Reasoning visible and evidence-backed
Latency              | Single inference pass                  | Multiple inference passes (higher latency)
Cost                 | Lower (single call)                    | Higher (multiple calls + tool execution)
Best for             | Logic, math, self-contained reasoning  | Research, fact-checking, multi-step retrieval

In practice, many systems use CoT for planning and ReAct for execution — the agent reasons about the overall approach (CoT) and then uses the ReAct loop to carry out each step with tool access.

Limitations and Mitigations

The ReAct pattern is powerful but not without challenges:

  • Iteration runaway: An agent can enter infinite loops if its observations don't converge toward a final answer. Mitigation: Enforce a maximum step count (typically 5-15 iterations) and include a fallback that synthesizes the best available answer when the limit is reached.
  • Compounding errors: Early mistakes in reasoning or tool selection can propagate through subsequent steps. Mitigation: Implement self-reflection checkpoints where the agent reviews its progress every N steps and can course-correct.
  • Context window pressure: Each thought-action-observation cycle adds tokens to the context. For long tasks, the agent may exceed the context window. Mitigation: Summarize or truncate earlier observations, use a sliding window, or store intermediate results in external memory.
  • Tool selection errors: The agent may choose inappropriate tools or format parameters incorrectly. Mitigation: Provide clear tool descriptions, examples of correct usage, and validate tool inputs before execution.
  • Latency and cost: Multiple LLM calls increase both latency and API costs. Mitigation: Use smaller, faster models for simple reasoning steps and reserve larger models for complex decisions. Cache tool results when possible.
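The context-window mitigation above can be made concrete with a sliding-window trimmer: keep the system prompt and question intact, collapse all but the most recent cycles into a one-line marker. The budget numbers are arbitrary for illustration:

```python
# Sliding-window mitigation for context pressure: preserve the head of
# the context (system prompt + question), replace older steps with a
# summary marker, and keep only the most recent entries.

def trim_context(context, keep_recent=4):
    head, body = context[:2], context[2:]   # [system_prompt, question]
    if len(body) <= keep_recent:
        return context                      # under budget; no trimming
    dropped = len(body) - keep_recent
    summary = f"[{dropped} earlier steps summarized]"
    return head + [summary] + body[-keep_recent:]

ctx = ["system", "question"] + [f"step {i}" for i in range(10)]
trimmed = trim_context(ctx)
```

A production version would replace the marker with an actual LLM-generated summary of the dropped steps, or move them into external memory for retrieval on demand.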
