Prompt Engineering for Agents
Craft system prompts that make your agents more reliable, capable, and predictable.
Prerequisites
1. Completed the Getting Started guide
2. Built at least one agent with tool use
3. Basic understanding of LLM behavior
What you will learn
- How to write effective system prompts for agents
- Techniques for role definition and behavior constraints
- How to write tool-use instructions that reduce errors
- Output formatting strategies for structured responses
- Few-shot prompting techniques for agents
System Prompts for Agents
The system prompt is the most important piece of an agent. It defines who the agent is, what it can do, and how it should behave. A good system prompt has four parts:
- Identity — Who is this agent?
- Capabilities — What tools does it have?
- Constraints — What should it NOT do?
- Output format — How should it respond?
SYSTEM_PROMPT = """You are a Senior Data Analyst agent.
## Identity
You help users analyze datasets and produce insights.
You have access to a SQL database and a charting tool.
## Tools Available
- query_database: Execute SQL queries against the analytics database
- create_chart: Generate charts from query results
## Constraints
- NEVER modify or delete data. Only SELECT queries are allowed.
- Always explain your reasoning before executing a query.
- If a query would return more than 1000 rows, add a LIMIT clause.
- If you are unsure about a column name, use the schema tool first.
## Output Format
1. State what you understand about the user's request
2. Explain your analysis approach
3. Execute queries and present results
4. Summarize key insights in bullet points
"""
Role Definition Best Practices
The identity section is not just flavor text — it significantly impacts agent behavior:
- Be specific about expertise — "Senior Data Analyst with 10 years of experience in SQL and data visualization" produces better results than "helpful assistant".
- Define the scope — Tell the agent what domains it covers and which it does not. "You handle analytics questions. For engineering questions, tell the user to contact the engineering team."
- Set the tone — "You communicate in a professional but friendly manner. You use technical terms when appropriate but always explain them."
Python
# Bad: Too vague
agent = Agent(
    instructions="You are a helpful assistant.",
)

# Good: Specific role with clear boundaries
agent = Agent(
    instructions="""You are a customer support agent for Acme Corp.
You handle billing, account, and product questions.
For technical issues, escalate to the engineering team.
Always verify the customer's account before making changes.
Never share internal pricing or roadmap information.""",
)
TypeScript
// Bad: Too vague
const { text } = await generateText({
  model: anthropic("claude-sonnet-4-20250514"),
  system: "You are a helpful assistant.",
  prompt: userMessage,
});

// Good: Specific role with clear boundaries
const { text } = await generateText({
  model: anthropic("claude-sonnet-4-20250514"),
  system: `You are a customer support agent for Acme Corp.
You handle billing, account, and product questions.
For technical issues, escalate to the engineering team.
Always verify the customer's account before making changes.
Never share internal pricing or roadmap information.`,
  prompt: userMessage,
});
Tool Use Instructions
LLMs make better tool-use decisions when you explicitly tell them when and how to use each tool:
SYSTEM_PROMPT = """You are a research assistant.
## Tool Usage Guidelines
### search_web
- Use this FIRST for any factual question
- Prefer specific search queries over broad ones
- Always search before making claims about current events
### read_document
- Use this to read uploaded files
- Read the file BEFORE attempting to answer questions about it
- For large documents, read specific sections rather than the whole file
### create_summary
- Use this AFTER gathering information, not before
- Include the sources in the summary
- Maximum 500 words per summary
## Decision Flow
1. Understand the user's question
2. If it requires current information -> search_web
3. If it references a document -> read_document
4. Gather all needed information
5. Synthesize and respond (or use create_summary for long answers)
"""
Explicit decision flows reduce hallucination and tool-use errors because the model does not have to guess when to use each tool.
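The decision flow above is ultimately a routing decision the model makes, but it can be sketched as plain Python to make the intended logic concrete. The function name and boolean flags here are illustrative, not part of any SDK:

```python
# Sketch of the decision flow above as deterministic routing logic.
# In practice the LLM makes this decision, guided by the prompt;
# choose_tool and its inputs are hypothetical names for illustration.

def choose_tool(needs_current_info: bool, references_document: bool,
                answer_is_long: bool) -> list[str]:
    """Return the tool sequence the guidelines would lead the agent to."""
    steps = []
    if needs_current_info:
        steps.append("search_web")      # current facts -> search first
    if references_document:
        steps.append("read_document")   # read the file before answering
    if answer_is_long:
        steps.append("create_summary")  # summarize only after gathering info
    return steps

print(choose_tool(needs_current_info=True, references_document=False,
                  answer_is_long=True))  # ['search_web', 'create_summary']
```

Writing the flow out like this before prompting is a useful exercise: if you cannot express the routing rules unambiguously, the model cannot follow them either.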
Output Formatting
Structure your agent's output to be consistent and machine-parseable when needed:
# For structured output, use explicit format instructions
agent = Agent(
    instructions="""You are a code review agent.
When reviewing code, ALWAYS use this exact format:

## Summary
One paragraph overview of the code quality.

## Issues Found
For each issue:
- **Severity**: critical | warning | info
- **Line**: line number or range
- **Issue**: description of the problem
- **Fix**: suggested fix

## Score
Overall quality score: X/10

If no issues are found, say "No issues found" and give a score of 10/10.
""",
)
For agents that need to return structured data to another system, combine prompt formatting with SDK features:
Python (Pydantic + OpenAI Agents SDK)
from pydantic import BaseModel
from agents import Agent, Runner

class ReviewResult(BaseModel):
    summary: str
    issues: list[dict]
    score: int

agent = Agent(
    name="CodeReviewer",
    instructions="Review code and identify issues.",
    output_type=ReviewResult,  # Enforces structured output
)

result = Runner.run_sync(agent, "Review this Python function: ...")
review = result.final_output_as(ReviewResult)
print(f"Score: {review.score}/10")
TypeScript (Zod + Vercel AI SDK)
import { generateObject } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const { object: review } = await generateObject({
  model: anthropic("claude-sonnet-4-20250514"),
  schema: z.object({
    summary: z.string().describe("One paragraph overview"),
    issues: z.array(
      z.object({
        severity: z.enum(["critical", "warning", "info"]),
        line: z.string(),
        issue: z.string(),
        fix: z.string(),
      })
    ),
    score: z.number().min(0).max(10),
  }),
  prompt: "Review this TypeScript function: ...",
});

console.log(`Score: ${review.score}/10`);
console.log(`Issues: ${review.issues.length}`);
The Zod schema approach gives you compile-time type safety and runtime validation — the SDK enforces that the LLM output matches your schema exactly.
Few-Shot Examples
Including examples in the system prompt teaches the agent your expected behavior patterns:
SYSTEM_PROMPT = """You are a SQL generation agent.
Given a natural language question, generate a SQL query.
## Examples
User: How many users signed up last month?
Thought: I need to count users where created_at is within the last month.
SQL: SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') AND created_at < DATE_TRUNC('month', CURRENT_DATE);
User: What are the top 5 products by revenue?
Thought: I need to join orders with products and sum the revenue, ordering by total.
SQL: SELECT p.name, SUM(o.amount) as total_revenue FROM orders o JOIN products p ON o.product_id = p.id GROUP BY p.name ORDER BY total_revenue DESC LIMIT 5;
User: Show me users who have never placed an order.
Thought: I need users that do not appear in the orders table. A LEFT JOIN with NULL check works.
SQL: SELECT u.id, u.email FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL;
"""
Key principles for few-shot examples:
- Include 2-4 examples that cover different patterns
- Show the reasoning process, not just the output
- Include edge cases (e.g., the NULL check example above)
- Keep examples realistic and relevant to your domain
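As the example set grows, it is easier to maintain the few-shot block as data than as a hand-edited string. A sketch of assembling it programmatically; the example records and `build_few_shot_prompt` name are illustrative:

```python
# Sketch: build the few-shot section of a system prompt from example records.
# Keeping examples as data makes it easy to add, remove, or A/B test them.

EXAMPLES = [
    {
        "user": "How many users signed up last month?",
        "thought": "Count users where created_at falls in the previous month.",
        "sql": "SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month');",
    },
    {
        "user": "Show me users who have never placed an order.",
        "thought": "Users absent from orders; a LEFT JOIN with a NULL check works.",
        "sql": "SELECT u.id, u.email FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL;",
    },
]

def build_few_shot_prompt(examples: list[dict]) -> str:
    """Render example records into the User/Thought/SQL few-shot format."""
    blocks = [
        f"User: {ex['user']}\nThought: {ex['thought']}\nSQL: {ex['sql']}"
        for ex in examples
    ]
    return "## Examples\n\n" + "\n\n".join(blocks)

print(build_few_shot_prompt(EXAMPLES))
```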
Common Prompt Anti-Patterns
Avoid these patterns that lead to unreliable agent behavior:
- "Be creative" — Agents need precision, not creativity. Say exactly what you want.
- Contradictory instructions — "Be concise" and "explain everything in detail" in the same prompt confuse the model.
- No constraints — Without explicit "do NOT" rules, the agent will attempt things you did not expect.
- Wall of text — If your system prompt exceeds 2000 words, refactor it. Use tools or resources for reference data instead.
- Assuming tool knowledge — Do not assume the model knows your tools. Describe when and why to use each one.
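The "wall of text" rule above is one anti-pattern you can catch mechanically in CI. A minimal sketch, using the 2000-word threshold from the guideline; the function name is illustrative:

```python
def check_prompt_length(prompt: str, max_words: int = 2000) -> bool:
    """Return True if the system prompt is within the word budget."""
    return len(prompt.split()) <= max_words

short_prompt = "You are a customer support agent for Acme Corp."
print(check_prompt_length(short_prompt))  # True
```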
Common Mistakes to Avoid
- Writing system prompts that are too short — agents need detailed instructions to be reliable
- Not testing prompt changes with a variety of inputs before deploying
- Including dynamic data in system prompts instead of user messages or tool results
- Relying on the model to infer tool usage patterns instead of spelling them out explicitly
- Forgetting to define error handling behavior — what should the agent do when a tool fails?
Recommended Next Steps
- Planning & Reasoning: Chain of Thought, ReAct, Tree of Thought, and other reasoning strategies agents use to solve problems.
- ReAct Pattern: Reasoning and Acting, where the agent thinks step-by-step, then acts on its reasoning in iterative loops.
- Claude Agent SDK: Anthropic's production agent runtime.