Planning & Reasoning
Chain of Thought, ReAct, Tree of Thoughts, and other reasoning strategies agents use to solve problems.
Why Planning and Reasoning Matter
Raw LLMs generate text token by token, which can lead to shallow, incorrect, or incomplete answers for complex problems. Planning and reasoning strategies are techniques that structure how the model thinks, allowing it to break down complex problems, consider alternatives, and arrive at more accurate solutions.
Think of it this way: if you ask someone to multiply 37 by 84 in their head, they might get it wrong. But if they write out the steps on paper, they almost always get it right. Reasoning strategies give LLMs that "paper" — structured thinking processes that improve reliability and accuracy.
For agents specifically, reasoning strategies determine how the agent decides what to do next. Should it call a tool immediately, or think through the problem first? Should it create a full plan upfront, or figure things out one step at a time? The choice of reasoning strategy has a major impact on agent performance, cost, and latency.
Chain of Thought (CoT)
Chain of Thought is the foundational reasoning technique, introduced by Wei et al. (2022). Instead of jumping directly to an answer, the model generates intermediate reasoning steps that lead to the conclusion. This simple idea dramatically improves performance on math, logic, commonsense reasoning, and multi-step problems.
CoT can be triggered in two ways:
- Few-shot CoT — Include examples with step-by-step reasoning in the prompt. The model follows the demonstrated reasoning pattern through in-context learning.
- Zero-shot CoT — Introduced separately by Kojima et al. (2022), you simply append "Let's think step by step" to the prompt. Remarkably effective even without examples.
# Without CoT
Q: "A store has 15 apples. 8 are sold in the morning,
and 3 more are delivered. How many apples remain?"
A: "10" (correct, but no explanation, fragile for harder problems)
# With CoT
Q: "A store has 15 apples. 8 are sold in the morning,
and 3 more are delivered. How many apples remain?
Let's think step by step."
A: "Starting apples: 15
Sold in morning: -8, so 15 - 8 = 7
New delivery: +3, so 7 + 3 = 10
The store has 10 apples remaining."
For agents, CoT is used within the reasoning phase of the agent loop. Before deciding which tool to call, the agent reasons through the problem: "The user wants to know their account balance. I need to first authenticate them, then query the accounts API. Let me start by checking if I have their user ID."
Note: CoT is most effective with larger models (roughly 100B+ parameters). Smaller models may not benefit significantly from explicit reasoning steps.
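Wiring the zero-shot trigger into a prompt is a one-liner. The sketch below is illustrative: `build_zero_shot_cot_prompt` and `COT_TRIGGER` are hypothetical names, not part of any specific library, and the call to an actual model is omitted.

```python
# Zero-shot CoT prompt construction (Kojima et al., 2022).
# Hypothetical helper -- a real agent would pass this prompt to an LLM client.
COT_TRIGGER = "Let's think step by step."

def build_zero_shot_cot_prompt(question: str) -> str:
    """Append the zero-shot CoT trigger so the model emits reasoning steps."""
    return f"Q: {question}\n{COT_TRIGGER}\nA:"

prompt = build_zero_shot_cot_prompt(
    "A store has 15 apples. 8 are sold and 3 are delivered. How many remain?"
)
# The model now produces intermediate steps ("15 - 8 = 7, then 7 + 3 = 10")
# before stating the final answer, instead of answering directly.
```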
ReAct: Reasoning + Acting
ReAct (Reason + Act), introduced by Yao et al. (2022), is the most widely used reasoning pattern for AI agents. It interleaves reasoning traces (thinking about what to do) with actions (tool calls) in a tight loop. The key insight is that reasoning helps the agent decide which action to take, and action results inform the next round of reasoning.
A ReAct step has three parts:
- Thought — The agent reasons about what it knows and what it needs to do next.
- Action — The agent calls a tool based on its reasoning.
- Observation — The tool result is fed back as an observation, triggering the next thought.
User: "Who won more Grand Slams, Federer or Nadal?"
Thought 1: I need to look up Grand Slam wins for both players.
Let me search for Federer first.
Action 1: search("Roger Federer Grand Slam titles total")
Observation 1: Roger Federer won 20 Grand Slam singles titles.
Thought 2: Now I need Nadal's count to compare.
Action 2: search("Rafael Nadal Grand Slam titles total")
Observation 2: Rafael Nadal won 22 Grand Slam singles titles.
Thought 3: Nadal has 22 vs Federer's 20. I can now answer.
Action 3: final_answer("Rafael Nadal won more Grand Slams (22)
compared to Roger Federer (20).")
ReAct's strength is its flexibility. The agent adapts its plan based on what it discovers — if the first search returns ambiguous results, it can refine its query. If an API call fails, it can try an alternative approach. This makes ReAct the default pattern for most agent frameworks, including LangGraph, OpenAI Agents SDK, and Claude Agent SDK. (Note: some frameworks like Smolagents default to code-based actions instead of JSON tool calls, but use the same Thought-Action-Observation structure.)
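The Thought-Action-Observation loop above can be sketched in a few lines. In this sketch, `think` and `search` are stubbed stand-ins for an LLM call and a real search tool respectively; only the loop structure reflects the ReAct pattern itself.

```python
# Minimal ReAct loop sketch. `think` stands in for the LLM (it would
# normally generate the Thought and choose the Action); `search` stands
# in for a real search API. Both are hypothetical stubs.
def search(query: str) -> str:
    facts = {"Federer": "20 Grand Slam singles titles",
             "Nadal": "22 Grand Slam singles titles"}
    return next((v for k, v in facts.items() if k in query), "no result")

TOOLS = {"search": search}

def think(history: list[str]) -> tuple[str, str, str]:
    """Stub LLM: returns (thought, action, action_input)."""
    if not any("Federer" in h for h in history):
        return ("Look up Federer first.", "search", "Federer Grand Slams")
    if not any("Nadal" in h for h in history):
        return ("Now get Nadal's count.", "search", "Nadal Grand Slams")
    return ("I can answer now.", "final_answer", "Nadal won more (22 vs 20).")

def react_loop(max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        thought, action, arg = think(history)      # Thought + Action
        if action == "final_answer":
            return arg
        observation = TOOLS[action](arg)           # Observation
        history.append(f"{thought} | {action}({arg}) -> {observation}")
    return "step budget exhausted"
```

Note the loop cap (`max_steps`): production frameworks enforce a similar budget so a confused agent cannot loop forever.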
Tree of Thoughts (ToT)
Tree of Thoughts (Yao et al., 2023) extends Chain of Thought by exploring multiple reasoning paths using tree search (BFS or DFS), like a chess player considering several possible moves before choosing the best one. Instead of a single linear chain, the model generates a tree of potential solutions and evaluates each branch.
The process works in three phases:
- Generation — At each step, generate multiple possible next thoughts or actions (branching).
- Evaluation — Score each branch on how promising it looks (using the LLM itself as an evaluator — the key innovation that makes ToT practical).
- Search — Use breadth-first search (BFS) or depth-first search (DFS) to explore the most promising branches and prune dead ends.
A critical advantage of ToT over linear CoT is backtracking — when a branch hits a dead end, the search can return to a prior state and explore alternatives. This is impossible in standard chain-of-thought reasoning.
ToT is most valuable for problems where:
- The solution space is large and the best path is not obvious.
- Backtracking is important — some approaches will hit dead ends.
- Creative or strategic thinking is needed (game playing, puzzle solving, complex code architecture decisions).
The trade-off is cost and latency: ToT requires many more LLM calls than linear CoT or ReAct. It is best reserved for high-stakes decisions where the extra computation is justified.
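The generate-evaluate-search cycle can be sketched as a BFS with a beam. Here `propose` and `score` are deterministic stand-ins for what would be LLM calls (generation and LLM-as-evaluator); the search mechanics are the point.

```python
# Tree of Thoughts BFS sketch with a fixed beam width.
# `propose` and `score` are hypothetical stubs for LLM calls.
def propose(state: str) -> list[str]:
    """Generation: branch into candidate next thoughts."""
    return [state + step for step in ("A", "B")]

def score(state: str) -> float:
    """Evaluation stub (LLM-as-evaluator): prefer states with more 'A's."""
    return state.count("A")

def tot_bfs(root: str, depth: int, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        # Generation: expand every state currently in the frontier.
        candidates = [c for s in frontier for c in propose(s)]
        # Evaluation + pruning: keep only the `beam` most promising branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```

The cost trade-off is visible in the code: each depth level multiplies the number of `propose`/`score` calls, which in a real system are full LLM invocations.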
Plan-and-Execute
The Plan-and-Execute pattern separates planning from execution into two distinct phases. First, a planner agent creates a complete step-by-step plan. Then, an executor agent carries out each step, reporting results back to the planner, which can revise the plan if needed.
# Phase 1: Planning
Planner: "To analyze Q3 sales data and create a report, I need to:
1. Query the sales database for Q3 transactions
2. Aggregate by region and product category
3. Calculate quarter-over-quarter growth rates
4. Generate visualizations (bar chart + trend line)
5. Write the executive summary
6. Format as PDF and email to the user"
# Phase 2: Execution
Executor: [Executes step 1] -> Got 15,243 transactions
Planner: Plan still valid, proceed to step 2.
Executor: [Executes step 2] -> Aggregated into 4 regions x 8 categories
Planner: Step 3 needs adjustment - also add year-over-year comparison.
Executor: [Executes revised step 3] -> ...
This pattern shines for complex, well-defined tasks where upfront planning reduces wasted effort. It is commonly used in coding agents (plan the full set of file changes before writing code), research agents (plan the research methodology before starting), and workflow automation (plan the entire pipeline before executing).
The main risk is plan fragility — the plan may be based on assumptions that prove wrong during execution. Mitigate this by having the planner re-evaluate after each step and revise as needed (the "re-planning" variant).
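The two phases, plus the re-planning hook, can be sketched as follows. `make_plan`, `execute_step`, and `revise` are hypothetical stand-ins for the planner and executor LLM calls, and the step names are illustrative.

```python
# Plan-and-Execute sketch with a re-planning hook (stubbed components).
def make_plan(task: str) -> list[str]:
    """Planner stub: a real planner would be an LLM call."""
    return ["query_data", "aggregate", "summarize"]

def execute_step(step: str) -> str:
    """Executor stub: a real executor would call tools / an LLM."""
    return f"{step}: done"

def revise(plan: list[str], result: str) -> list[str]:
    # Re-planning hook: inspect the latest result and adjust the
    # remaining steps. Here the plan always passes through unchanged.
    return plan

def plan_and_execute(task: str) -> list[str]:
    plan = make_plan(task)           # Phase 1: full plan upfront
    results = []
    while plan:
        step, *plan = plan
        result = execute_step(step)  # Phase 2: execute one step
        results.append(result)
        plan = revise(plan, result)  # Planner re-evaluates after each step
    return results
```

The `revise` call after every step is what distinguishes the re-planning variant from naive plan-then-execute, where a wrong assumption in step 1 silently poisons every later step.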
Reflection and Self-Critique
Reflection is a meta-reasoning strategy where the agent evaluates its own output and iterates on it. After generating a response or completing an action, the agent asks itself: "Is this correct? Is this complete? Could this be better?" If the answer is no, it revises its work.
Common reflection patterns include:
Self-Ask (Press et al., 2022) — The agent decomposes a complex question into simpler sub-questions, answers each one, then synthesizes the final answer. "To answer 'Is X a good investment?', I first need to ask: What is the company's revenue trend? What is the competitive landscape? What do analysts say?"
Reflexion (Shinn et al., 2023) — A verbal reinforcement learning framework where the agent re-attempts the same task iteratively. After a failed attempt, a self-reflection module generates a verbal critique analyzing what went wrong. This critique is stored in an episodic memory buffer and injected as context for the next attempt, enabling the agent to learn from failures without weight updates.
Critic-generator pattern — One LLM call generates a solution; a second call critiques it; a third call produces an improved version. This can be done within a single agent (different prompt roles) or across multiple agents.
# Reflection loop: generate, critique, and revise until the critique passes
def reflect_and_revise(agent, task, max_revisions=3):
    draft = agent.generate(task)
    for _ in range(max_revisions):
        critique = agent.reflect(draft, task)
        if is_satisfactory(critique):
            break
        draft = agent.revise(draft, critique)
    return draft
Reflection adds latency and cost (each revision is another LLM call), but for tasks where quality matters more than speed — writing, code review, analysis, and decision-making — it produces significantly better results. Many production agents combine ReAct for action selection with reflection for output quality.
Comparing Reasoning Strategies
| Approach | Best For | Tool Use | Backtracking | Cost |
|---|---|---|---|---|
| Chain of Thought (CoT) | Math, logic, multi-step reasoning | No | No | Low (1 LLM call) |
| ReAct | Tool-augmented tasks, dynamic problem solving | Yes | No | Medium (multiple LLM + tool calls) |
| Tree of Thoughts (ToT) | Creative/strategic problems, puzzles | Optional | Yes | High (many parallel LLM calls) |
| Plan-and-Execute | Complex well-defined tasks, workflows | Yes | Via re-planning | Medium-High (planner + executor calls) |
| Reflexion | Tasks requiring iterative improvement | Optional | Yes (retry-based) | High (multiple full attempts) |
Key Takeaways
- Planning and reasoning strategies structure how agents think, improving accuracy and reliability on complex tasks.
- Chain of Thought (CoT), introduced by Wei et al. (2022), generates intermediate reasoning steps — most effective in large models (100B+).
- ReAct interleaves reasoning and acting in a loop, making it the default pattern for most agent frameworks.
- Tree of Thoughts explores multiple reasoning paths via tree search (BFS/DFS), enabling backtracking at the cost of many more LLM calls.
- Plan-and-Execute separates planning from execution, reducing wasted effort on complex, well-defined tasks.
- Reflection and Reflexion allow agents to evaluate, critique, and iteratively improve their own outputs.