Tool-Augmented Generation
Agents extend their capabilities by iteratively selecting and invoking external tools — search engines, calculators, APIs, databases — integrating tool outputs into their reasoning to produce grounded, accurate responses.
function ToolAugmentedGeneration(task, tools):
    messages = [system_prompt, task]
    while true:
        // LLM decides: respond or use a tool
        response = LLM(messages, tools.schemas)
        if response.has_tool_calls():
            for tool_call in response.tool_calls:
                // Validate inputs
                validated = validate_params(
                    tool_call.params,
                    tools[tool_call.name].schema
                )
                // Execute with error handling
                try:
                    result = tools[tool_call.name].execute(validated)
                    messages.append(tool_result(result))
                catch error:
                    messages.append(tool_error(error.message))
        else:
            // No tool calls — return final response
            return response.text
When to Use
- Any task requiring access to external data, APIs, or real-time information
- When the LLM's parametric knowledge is insufficient or potentially outdated
- Tasks involving calculations, data lookups, or system interactions
- Building general-purpose assistants that need to interact with multiple services
- As a building block within more complex patterns (ReAct, Supervisor, Agent Teams)
When NOT to Use
- Pure reasoning or creative writing tasks with no external data needs
- When all required information is already in the prompt context
- Extremely latency-sensitive applications where tool calls add unacceptable delay
- When tools cannot be adequately sandboxed for safety
- Simple Q&A where the model can answer directly from training data
How Agents Use Tools
Tool-Augmented Generation (TAG) is the foundational pattern that makes LLMs genuinely useful as agents. A bare language model can only generate text based on its training data — it cannot access real-time information, perform precise calculations, interact with APIs, or modify external systems. Tools bridge this gap by giving the agent actuators that extend its reach into the real world.
The core mechanism works as follows: the agent receives a task, reasons about what information or actions it needs, selects an appropriate tool from its available toolkit, formulates the tool call with the correct parameters, receives the tool's output, and integrates that output into its ongoing reasoning. This cycle may repeat multiple times — the agent might search the web, then query a database based on the search results, then perform a calculation on the database data.
Modern LLMs implement tool use through function calling — the model is trained to emit structured JSON objects that specify a function name and its arguments, rather than generating freeform text. The application layer intercepts these structured outputs, executes the corresponding function, and feeds the result back to the model. This structured approach is more reliable than earlier methods that relied on parsing natural language tool calls from the model's text output.
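The intercept-execute-feed-back step can be sketched in a few lines. This is a minimal illustration, not any framework's real API: the tool registry, the schema shape, and the message format are all hypothetical, and the model's structured output is simulated as a JSON string.

```python
import json

# Hypothetical tool registry: name -> (callable, schema-like description).
TOOLS = {
    "add": (
        lambda a, b: a + b,
        {"description": "Add two numbers", "params": {"a": "number", "b": "number"}},
    ),
}

def run_tool_call(raw_call: str) -> str:
    """Parse a structured tool call emitted by the model and execute it.

    The model is assumed to emit JSON like {"name": ..., "arguments": {...}}
    instead of free text; the application layer never has to parse prose.
    """
    call = json.loads(raw_call)
    fn, _schema = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    # Package the result as a structured message to append to the conversation.
    return json.dumps({"role": "tool", "name": call["name"], "content": result})

# A structured call the model might emit in place of freeform text:
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
tool_message = run_tool_call(model_output)
```

Because the call is structured JSON, dispatch is a dictionary lookup and an unpacking of arguments, with no brittle text parsing in between.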
Every major agent framework — LangGraph, OpenAI Agents SDK, Claude Agent SDK, Vercel AI SDK — supports tool use as a primitive operation. The difference between TAG and more complex patterns like ReAct is scope: TAG focuses specifically on tool interaction mechanics, while ReAct wraps tool use in an explicit reasoning loop.
Tool Selection Strategies
As an agent's toolkit grows, selecting the right tool becomes a critical challenge. An agent with 3 tools can easily pick the right one; an agent with 50 tools needs a strategy.
Description-based selection: Each tool has a natural-language description that explains what it does, when to use it, and what parameters it expects. The LLM reads all descriptions and selects the most appropriate tool. This is the default approach in most frameworks and works well for up to 10-15 tools. Beyond that, the descriptions compete for context window space and the model's selection accuracy degrades.
Category-based filtering: Tools are organized into categories (search, math, data, communication). The agent first selects a category, then selects a specific tool within that category. This two-stage approach reduces the cognitive load at each decision point and scales to larger toolkits.
Retrieval-based selection: Tool descriptions are embedded in a vector store. When the agent needs a tool, it performs a similarity search against the task description to retrieve the top-k most relevant tools. Only these candidates are presented to the model for final selection. This scales to hundreds or thousands of tools.
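Retrieval-based selection can be sketched without a real vector store. In this toy version, a bag-of-words cosine similarity stands in for an embedding model, and the three tool descriptions are invented for illustration; in practice you would embed the descriptions with a real model and query an actual vector index.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toolkit; only descriptions are indexed, not implementations.
TOOL_DESCRIPTIONS = {
    "web_search": "search the web for pages matching a query",
    "stock_quote": "look up the current stock price for a ticker symbol",
    "unit_convert": "convert a value between measurement units",
}

def top_k_tools(task: str, k: int = 2) -> list:
    """Return the k tools whose descriptions are most similar to the task."""
    task_vec = embed(task)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(task_vec, embed(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]

candidates = top_k_tools("look up the stock price for AAPL")
```

Only the retrieved candidates are then placed in the model's context for final selection, which is what keeps the approach viable at hundreds of tools.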
Learned routing: A lightweight classifier (fine-tuned BERT or similar) maps task descriptions to tool IDs. This is the fastest approach but requires training data and doesn't generalize to novel task types as well as LLM-based selection.
Regardless of the selection strategy, tool descriptions should be precise and unambiguous. Include the tool's purpose, expected input format, output format, and examples of correct usage. Poorly written tool descriptions are the most common cause of tool selection errors.
Tool Chaining and Composition
Real-world tasks often require multiple tools used in sequence, where each tool's output feeds into the next tool's input. This is tool chaining, and designing your toolkit to support it is crucial for building capable agents.
Sequential chaining: The agent calls Tool A, processes the result, then calls Tool B with data derived from Tool A's output. Example: search for a company's stock ticker, then query a financial API with that ticker, then calculate return metrics from the financial data. The agent manages the chain implicitly through its reasoning loop.
Parallel tool calls: Modern function-calling APIs support multiple simultaneous tool calls. If the agent needs data from two independent sources (weather API and calendar API), it can request both in a single turn. This reduces latency by eliminating unnecessary sequential round-trips.
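The latency win from parallel calls is easy to see with concurrent execution of independent fetches. This sketch uses `asyncio.gather`; the two fetch functions are hypothetical, with sleeps standing in for network latency.

```python
import asyncio

# Hypothetical independent data sources; the sleeps simulate network latency.
async def fetch_weather(city: str) -> dict:
    await asyncio.sleep(0.1)
    return {"city": city, "temp_c": 21}

async def fetch_calendar(date: str) -> dict:
    await asyncio.sleep(0.1)
    return {"date": date, "events": ["standup"]}

async def gather_tool_results():
    # Both calls were requested in a single model turn, so run them
    # concurrently instead of waiting for one round-trip after another.
    weather, calendar = await asyncio.gather(
        fetch_weather("Berlin"),
        fetch_calendar("2024-06-01"),
    )
    return weather, calendar

weather, calendar = asyncio.run(gather_tool_results())
```

Run sequentially, the two calls would take roughly the sum of their latencies; gathered, they take roughly the maximum.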
Conditional chaining: The choice of the next tool depends on the result of the current tool. If a search returns no results, try a different search engine. If a database query returns an error, fall back to a cached version. The agent's reasoning at each step determines the next action based on what it observed.
Tool composition patterns: Some tools are designed to work together as a pipeline — a scraper feeds into a parser, which feeds into an analyzer. When designing your toolkit, consider which tools are frequently chained together and whether a composite tool (that combines multiple steps internally) would be more reliable than relying on the agent to chain them correctly.
Error Recovery and Robustness
Tools fail. APIs return errors, databases time out, search engines return irrelevant results, and calculations receive malformed inputs. A robust tool-augmented agent must handle these failures gracefully.
Input validation: Before executing a tool call, validate that the parameters conform to the expected schema. Type checking, range validation, and format verification can catch many errors before they reach the tool. Return a descriptive error message to the agent so it can reformulate its request.
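A minimal validator along these lines might look as follows. The schema format here ({name: (type, required)}) is a simplification invented for this sketch; real systems typically validate against JSON Schema.

```python
def validate_params(params: dict, schema: dict):
    """Check params against a minimal schema of {name: (type, required)}.

    Returns (validated_dict, None) on success, or (None, error_message) so
    the descriptive message can be fed back to the agent for reformulation.
    """
    errors = []
    for name, (expected_type, required) in schema.items():
        if name not in params:
            if required:
                errors.append(f"missing required parameter '{name}'")
            continue
        if not isinstance(params[name], expected_type):
            errors.append(
                f"parameter '{name}' should be {expected_type.__name__}, "
                f"got {type(params[name]).__name__}"
            )
    # Reject parameters the tool does not declare.
    for name in sorted(set(params) - set(schema)):
        errors.append(f"unknown parameter '{name}'")
    if errors:
        return None, "; ".join(errors)
    return dict(params), None

SEARCH_SCHEMA = {"search_query": (str, True), "max_results": (int, False)}
```

Note that the failure path returns a message the agent can act on ("should be int, got str"), rather than silently coercing or raising an opaque exception.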
Output validation: After receiving a tool's response, verify that it matches the expected format and contains useful data. An HTTP 200 status code doesn't guarantee useful content — the response might be an error page, rate-limit notice, or empty result set.
Retry strategies: For transient errors (timeouts, rate limits, temporary unavailability), implement exponential backoff retries. For persistent errors, inform the agent so it can try an alternative approach. A well-designed system distinguishes between retryable and non-retryable errors.
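A small backoff wrapper illustrates the retryable/non-retryable distinction. The `TransientError` class is a stand-in for whatever exception types your tools raise for timeouts and rate limits; anything else propagates immediately.

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout or rate limit."""

def call_with_backoff(tool_fn, max_attempts=4, base_delay=0.05):
    """Retry transient failures with exponential backoff (base * 2^attempt).

    Non-transient exceptions propagate unchanged; a transient failure on the
    final attempt is re-raised so the agent can try an alternative approach.
    """
    for attempt in range(max_attempts):
        try:
            return tool_fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # persistent: surface the failure to the agent
            time.sleep(base_delay * (2 ** attempt))

# Usage: a flaky tool that fails twice before succeeding.
attempts = {"count": 0}
def flaky_tool():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("timeout")
    return "ok"
```

Keeping the transient-error type explicit in the signature of the wrapper is what prevents the common bug of retrying errors (like validation failures) that will never succeed.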
Graceful degradation: When a tool is unavailable, the agent should fall back to its parametric knowledge (with an explicit caveat about reduced accuracy) or suggest an alternative approach. The worst outcome is a hard failure with no useful output; the agent should always provide its best available answer.
Sandboxing: Tools that execute code, modify files, or make API calls with side effects must be sandboxed. Use containerized execution environments, read-only filesystem mounts, and network policies to prevent accidental damage. This is especially important for code-execution tools where the agent generates the code to run.
Designing Effective Tool Interfaces
The design of your tool interfaces has an outsized impact on agent performance. Well-designed tools make the agent more capable; poorly designed tools cause frustration and errors.
Single responsibility: Each tool should do one thing well. A tool that searches the web, parses HTML, and summarizes content is doing three things — break it into three tools or implement it as a single well-tested pipeline. The agent needs to understand what each tool does, and compound tools are harder to reason about.
Clear parameter naming: Use descriptive parameter names (search_query not q, max_results not n). Include type annotations and constraints. The LLM reads these names and uses them to understand how to call the tool correctly.
Structured outputs: Return data in a structured format (JSON objects with labeled fields) rather than free text. Structured outputs are easier for the agent to parse and reason about in subsequent steps. Include metadata (source URLs, timestamps, confidence scores) alongside the data.
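The naming and structured-output guidance combine naturally in a single tool definition. Everything here is hypothetical (the function name, the canned article, the `source` label); the point is the shape: descriptive parameter names, a JSON object with labeled fields, and metadata alongside the data.

```python
import json
from datetime import datetime, timezone

def search_news(search_query: str, max_results: int = 5) -> str:
    """Search recent news articles.

    Descriptive names (search_query, max_results) tell the model how to call
    this tool; a canned article stands in for a real news API here.
    """
    articles = [
        {"title": "Example headline", "url": "https://example.com/1"},
    ][:max_results]
    return json.dumps({
        "results": articles,
        "result_count": len(articles),
        # Metadata alongside the data, so later reasoning steps can cite
        # sources and judge freshness.
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "source": "example-news-api",
    })
```

An agent receiving this output can read `result_count` to decide whether to refine the query, and carry `url` and `retrieved_at` forward into its final answer.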
Informative error messages: When a tool fails, return a message that helps the agent understand what went wrong and how to fix it. "Invalid date format: expected YYYY-MM-DD, got '2024/01/15'" is far more useful than "Error: 400 Bad Request".
Idempotency: Where possible, make tools idempotent — calling them multiple times with the same input produces the same result without side effects. This makes retry logic safe and simplifies the agent's reasoning about tool interactions.