Glossary
Key terminology for AI agent development — 82 terms across 11 categories.
A
Agent
Core Concepts: An AI system that can perceive its environment, reason about observations, make decisions, and take autonomous actions to achieve specific goals. Unlike simple chatbots, agents maintain state, use tools, and operate in iterative loops.
Agent Loop
Core Concepts: The iterative cycle where an agent observes its environment, reasons about what to do next, takes an action, and processes the resulting feedback. This loop continues until the task is completed or a termination condition is met.
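The cycle above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: `reason` and `act` are hypothetical callables standing in for the LLM call and the tool execution, and returning `None` from `reason` models the termination condition.

```python
def run_agent_loop(goal, reason, act, max_steps=10):
    """Minimal agent loop: observe -> reason -> act until done.

    `reason(observation, history)` returns the next action, or None to stop.
    `act(action)` executes the action and returns the resulting observation.
    """
    observation = goal
    history = []
    for _ in range(max_steps):
        decision = reason(observation, history)  # decide what to do next
        if decision is None:                     # termination condition met
            break
        observation = act(decision)              # take the action, observe feedback
        history.append((decision, observation))
    return history
```

Real agent loops add error handling, state persistence, and a step budget tied to cost, but the observe/reason/act/feedback shape is the same.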
Agentic AI
Core Concepts: A class of AI systems that exhibit agency — the ability to independently plan, make decisions, use tools, and take multi-step actions with minimal human oversight. Agentic AI goes beyond single-turn question-answering to pursue complex goals over extended interactions.
Autonomous Agent
Core Concepts: An agent that operates with little to no human intervention, independently deciding which actions to take, when to use tools, and how to recover from errors. Fully autonomous agents are capable of planning, executing, and self-correcting over extended task horizons.
Action
Core Concepts: A discrete operation an agent performs to affect its environment, such as calling an API, writing to a file, executing code, or sending a message. Actions are the 'doing' step of the agent loop.
Agentic RAG
RAG: A RAG architecture where the retrieval process itself is managed by an agent that can decide what to search for, evaluate retrieval quality, reformulate queries, and iteratively retrieve until it has sufficient context. This goes beyond single-shot retrieval.
Agent Teams
Patterns: A group of specialized agents that work together on a shared task, each contributing domain-specific expertise. Agent teams can be organized as supervisor-worker, peer-to-peer, or hierarchical structures.
A2A (Agent-to-Agent Protocol)
Protocols: An open protocol by Google that enables AI agents built with different frameworks and by different vendors to communicate, collaborate, and delegate tasks to each other. A2A provides a standard for cross-agent interoperability.
Agent Communication
Multi-Agent: The mechanisms by which agents exchange information, delegate tasks, and share results. Communication can be direct (agent-to-agent messages), mediated (through a shared message bus), or implicit (through shared state or artifacts).
C
Chain of Thought (CoT)
Reasoning: A prompting technique that encourages the model to reason step-by-step before arriving at a final answer. By breaking down complex problems into intermediate reasoning steps, CoT significantly improves accuracy on math, logic, and multi-hop questions.
Computer Use
Tools: An agent's ability to interact with a computer's graphical user interface by reading the screen, clicking buttons, typing text, scrolling, and navigating applications. This enables agents to use any software a human would, without requiring dedicated APIs.
Code Interpreter
Tools: A sandboxed execution environment where an agent can write and run code (typically Python) to perform calculations, data analysis, generate visualizations, or manipulate files. The agent receives the execution output and can iterate on the code.
Context Window
Memory: The maximum number of tokens an LLM can process in a single request, encompassing both the input (prompt, system instructions, history) and the output. Context window sizes range from 4K tokens (older models) to 200K+ tokens (modern models like Claude).
Conversation History
Memory: The running record of all messages exchanged between a user and an agent within a session. Conversation history provides continuity and context but consumes tokens from the context window, often requiring summarization or truncation strategies.
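One common truncation strategy keeps the system message plus as many of the most recent messages as fit a token budget. The sketch below is a hypothetical helper, using the rough 4-characters-per-token estimate as a stand-in for a real tokenizer:

```python
def truncate_history(messages, max_tokens,
                     count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system message plus the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(count_tokens(m) for m in system)
    for msg in reversed(rest):          # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                       # budget exhausted: drop older messages
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))  # restore chronological order
```

Production systems often summarize the dropped messages instead of discarding them outright, trading a small summarization cost for retained context.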
Chunking
RAG: The process of splitting large documents into smaller, semantically coherent pieces (chunks) before embedding them. Chunking strategy — chunk size, overlap, and splitting method — significantly affects retrieval quality in RAG pipelines.
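The simplest strategy is a sliding character window with overlap, so that a sentence cut at a chunk boundary still appears whole in the neighboring chunk. A minimal sketch (the function name and defaults are illustrative, not from any library):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlapping edges."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break                        # last window reached the end
    return chunks
```

Semantic chunking (splitting on headings, paragraphs, or sentence boundaries) usually retrieves better than fixed windows, but the size/overlap trade-off is the same.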
Content Moderation
Safety: The automated process of scanning, classifying, and filtering AI-generated content for harmful, inappropriate, or policy-violating material. Content moderation can happen at both the input stage (filtering user prompts) and the output stage (filtering model responses).
Cost per Query
Production: The total monetary cost of processing a single agent request, including all LLM API calls (input and output tokens), tool invocations, embedding generation, vector database queries, and infrastructure costs. Multi-step agents can have high per-query costs.
Caching
Production: Storing and reusing the results of previous LLM calls, tool invocations, or embedding computations to reduce latency, costs, and redundant processing. Caching strategies include exact-match caching, semantic caching, and prompt caching.
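Exact-match caching is the simplest of the three: hash the full request (model, prompt, parameters) and reuse the stored response on an identical repeat. A toy sketch, with `call_fn` as a hypothetical stand-in for the real LLM call:

```python
import hashlib
import json

class ExactMatchCache:
    """Exact-match cache keyed on a hash of (model, prompt, params)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, prompt, **params):
        raw = json.dumps({"model": model, "prompt": prompt, "params": params},
                         sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_or_call(self, call_fn, model, prompt, **params):
        key = self._key(model, prompt, **params)
        if key in self._store:
            self.hits += 1
            return self._store[key]      # reuse cached response
        self.misses += 1
        result = call_fn(model, prompt, **params)
        self._store[key] = result
        return result
```

Semantic caching replaces the exact hash with an embedding-similarity lookup, trading occasional wrong hits for a much higher hit rate on paraphrased queries.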
E
Episodic Memory
Memory: Memory of specific past events, interactions, or experiences — the 'what happened' record. Episodic memory stores concrete instances (e.g., 'the user preferred dark mode last Tuesday') rather than generalized knowledge.
Embedding
RAG: A dense numerical vector representation of text (or other data) that captures semantic meaning in a high-dimensional space. Similar texts have embeddings that are close together, enabling similarity search, clustering, and classification.
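"Close together" is usually measured with cosine similarity: 1.0 means the vectors point in the same direction, 0.0 means they are orthogonal (unrelated). A minimal sketch over plain Python lists:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm
```

Real embeddings have hundreds to thousands of dimensions and are produced by a dedicated embedding model; the similarity math is the same.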
F
Function Calling
Tools: The ability of an LLM to output structured JSON payloads that match predefined function schemas, enabling the model to invoke external tools, APIs, or services. The model decides which function to call and what arguments to pass based on the user's intent.
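The application side of this is a dispatcher: parse the model's JSON payload and route it to the matching function. The schema and tool below are hypothetical examples in the JSON Schema style that function-calling APIs commonly use, not any specific provider's format:

```python
import json

# Hypothetical tool schema, in the style used by function-calling APIs.
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch_tool_call(raw_payload, registry):
    """Parse a model-emitted JSON tool call and invoke the matching function."""
    payload = json.loads(raw_payload)
    fn = registry[payload["name"]]       # look up the registered tool
    return fn(**payload["arguments"])    # call it with the model's arguments
```

The tool's return value is then appended to the conversation so the model can incorporate the result into its next response.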
Fine-tuning
LLM Fundamentals: The process of further training a pre-trained LLM on a domain-specific or task-specific dataset to improve its performance on particular tasks. Fine-tuning adjusts the model's weights, unlike prompting which only changes the input.
Few-shot Prompting
LLM Fundamentals: A prompting technique where several examples of the desired input-output format are included in the prompt. Few-shot examples teach the model the expected pattern without any weight updates, relying purely on in-context learning.
Fallback
Production: An alternative action or model that an agent system uses when the primary option fails, times out, or is rate-limited. Common fallbacks include using a different LLM provider, returning a cached response, or escalating to a human operator.
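A fallback chain can be as simple as trying providers in priority order and returning the first success. A minimal sketch, where each provider is a hypothetical `(name, callable)` pair:

```python
def call_with_fallbacks(providers, prompt):
    """Try each provider in order; return (name, response) from the first success."""
    errors = []
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except Exception as exc:          # timeout, rate limit, outage, ...
            errors.append((name, exc))    # record and try the next provider
    raise RuntimeError(f"All providers failed: {errors}")
```

In practice you would catch narrower exception types and add per-provider timeouts, but the ordered-retry shape is the core of the pattern.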
G
Guardrails
Safety: Safety mechanisms that validate, filter, and constrain agent inputs and outputs against defined policies, formats, and safety criteria. Guardrails can be rule-based (regex, schema validation) or model-based (a classifier that checks for harmful content).
Graceful Degradation
Production: A design principle where an agent system continues to provide useful (if reduced) functionality when a component fails, rather than crashing entirely. For example, falling back to a smaller model if the primary model is unavailable.
H
Hybrid Search
RAG: A retrieval approach that combines dense vector similarity search with traditional keyword-based (BM25/TF-IDF) search. Hybrid search captures both semantic meaning and exact keyword matches, yielding better recall than either method alone.
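One common way to merge the two result lists is reciprocal rank fusion (RRF): each document scores the sum of 1/(k + rank) over the rankings it appears in, so documents ranked well by both searches rise to the top. A toy sketch:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids (best first) into one list.

    A document's fused score is sum(1 / (k + rank)) over every ranking
    it appears in; k=60 is a conventional smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion sidesteps the problem that vector similarity scores and BM25 scores live on incomparable scales.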
Hierarchical Pattern
Patterns: A multi-agent architecture organized in layers: a top-level agent delegates to mid-level agents, which in turn delegate to specialized leaf agents. This mirrors organizational hierarchies and enables managing very complex, multi-domain tasks.
Human-in-the-Loop
Patterns: A design pattern where human review, approval, or intervention is required at critical decision points in an agent workflow. This provides safety guardrails while still leveraging agent autonomy for routine steps.
Handoff
Patterns: The transfer of control, context, and responsibility from one agent to another during a multi-agent workflow. A well-designed handoff includes passing relevant state, conversation history, and the specific sub-task the receiving agent should handle.
Hallucination
Safety: When an LLM generates information that is factually incorrect, fabricated, or not grounded in the provided context. Hallucinations are a fundamental challenge because models produce confident-sounding text even when they lack knowledge.
I
Inner Monologue
Reasoning: The internal reasoning trace that an agent generates before producing a visible output. Inner monologue lets the model 'think aloud' in a scratchpad, improving reasoning quality while keeping the final output clean and concise.
J
JSON Mode
Tools: A model configuration that constrains the LLM to always produce valid JSON in its response. JSON mode guarantees parseable output, eliminating the need for fragile regex-based extraction from free-form text.
Jailbreak
Safety: A technique that circumvents an LLM's safety training and content policies to produce restricted or harmful outputs. Jailbreaks exploit edge cases in the model's alignment, often through role-playing scenarios, encoding tricks, or multi-turn manipulation.
L
Long-term Memory
Memory: Persistent storage that allows an agent to retain and recall information across separate sessions, conversations, or even days. Long-term memory is typically implemented using external databases, vector stores, or dedicated memory services like Mem0 or Zep.
LLM (Large Language Model)
LLM Fundamentals: A neural network with billions of parameters trained on vast amounts of text data that can generate, understand, and reason about natural language. LLMs are the reasoning engine at the core of modern AI agents.
Latency
Production: The total time elapsed between when a user sends a request and when they receive the final response. Agent latency is often higher than simple LLM calls because it includes multiple reasoning steps, tool calls, and potential retries.
M
MCP Tool
Tools: A tool exposed through the Model Context Protocol that an LLM application can discover and invoke at runtime. MCP tools are self-describing: they include their name, description, and input schema, so the model knows how to call them.
MCP (Model Context Protocol)
Protocols: An open protocol created by Anthropic that standardizes how LLM applications connect to external tools, data sources, and services. MCP provides a universal interface so that any MCP-compatible client can use any MCP-compatible server, solving the M x N integration problem.
MCP Server
Protocols: A lightweight service that exposes tools, resources, and prompts through the Model Context Protocol. MCP servers wrap existing APIs, databases, or capabilities in a standardized interface that any MCP client can discover and use.
MCP Client
Protocols: An application or agent that connects to one or more MCP servers to discover and invoke their tools, read their resources, and use their prompt templates. Examples include Claude Desktop, VS Code extensions, and custom agent applications.
MCP Transport
Protocols: The communication layer used to exchange MCP messages between client and server. MCP supports two transport types: stdio (for local processes communicating over standard input/output) and HTTP with Server-Sent Events (for remote servers).
MCP Resource
Protocols: A read-only data source exposed by an MCP server, such as a file, database record, or API response. Resources provide contextual information that the LLM can read and reference, distinct from tools which perform actions.
MCP Prompt
Protocols: A reusable prompt template exposed by an MCP server that the client can discover and use. MCP prompts provide standardized ways to interact with a server's capabilities, including predefined arguments and descriptions.
Multi-Agent System
Multi-Agent: An architecture where multiple specialized AI agents collaborate, communicate, and delegate tasks to achieve complex goals that would be difficult for a single agent. Each agent typically has a focused role, specific tools, and a tailored system prompt.
O
Orchestration
Core Concepts: The coordination and management of multiple agents, tools, or workflow steps to accomplish a larger, composite task. An orchestrator decides which sub-tasks to delegate, in what order, and how to aggregate results.
Observation
Core Concepts: The feedback or data an agent receives after taking an action. Observations inform the next reasoning step and can include tool outputs, API responses, error messages, or environmental state changes.
Output Validation
Safety: The process of verifying that an agent's output meets expected format, safety, and correctness criteria before it is returned to the user or passed to the next step. Output validation can include schema checks, toxicity classifiers, and factual consistency verification.
Observability
Production: The ability to understand an agent system's internal state through its external outputs — logs, traces, metrics, and debugging tools. Observability is essential for diagnosing failures, optimizing performance, and ensuring reliability in production agent deployments.
P
Plan-and-Execute
Reasoning: A two-phase reasoning approach where the agent first generates a high-level plan (a sequence of steps), then executes each step individually. This separates planning from execution, allowing re-planning if a step fails.
Peer Collaboration
Patterns: A multi-agent pattern where agents of equal authority collaborate on a task by exchanging messages, sharing partial results, and building on each other's work. There is no central coordinator — agents self-organize through communication.
Prompt
LLM Fundamentals: The input text sent to an LLM that specifies the task, provides context, and guides the model's response. Effective prompts are the primary interface for controlling LLM behavior and output quality.
Prompt Injection
Safety: An attack where malicious instructions are embedded in user input (or retrieved content) to override the model's system prompt and intended behavior. Prompt injection is the most critical security risk for LLM applications.
PII Detection
Safety: The identification and handling of Personally Identifiable Information (names, emails, phone numbers, SSNs, etc.) in agent inputs and outputs. PII detection is critical for compliance with privacy regulations like GDPR and CCPA.
R
ReAct
Reasoning: A reasoning paradigm where the agent interleaves Reasoning (thinking about what to do), Acting (calling tools or taking actions), and Observing (processing results). ReAct bridges the gap between chain-of-thought reasoning and practical tool use.
Reflection
Reasoning: A technique where an agent reviews its own outputs, identifies mistakes or gaps, and revises its response. Reflection enables self-improvement within a single task by adding a critique-and-revision loop after initial generation.
RAG (Retrieval-Augmented Generation)
RAG: A technique that enhances LLM responses by first retrieving relevant documents from an external knowledge base, then including those documents in the prompt. RAG grounds the model's answers in real data, reducing hallucinations and enabling access to up-to-date or proprietary information.
Retrieval
RAG: The process of finding and fetching the most relevant documents or chunks from a knowledge base given a user query. Retrieval is the 'R' in RAG and can use vector similarity, keyword matching, or hybrid approaches.
Re-ranking
RAG: A second-stage relevance scoring step applied after initial retrieval. A re-ranker (often a cross-encoder model) evaluates each retrieved document against the query and reorders results by true relevance, significantly improving precision.
ReAct Pattern
Patterns: An implementation pattern based on the ReAct paradigm where an agent alternates between generating a thought (reasoning trace), executing an action (tool call), and processing the observation (tool result). This is the most common agent loop pattern in production systems.
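The thought/action/observation alternation can be sketched as a loop over a transcript. This is an illustrative skeleton, not a framework API: `llm` is a hypothetical callable that, given the transcript so far, returns either an action to take or a final answer.

```python
def react_loop(llm, tools, question, max_steps=5):
    """ReAct-style loop: the model emits actions until it emits a final answer.

    `llm(transcript)` returns either ("action", tool_name, tool_input)
    or ("final", answer). `tools` maps tool names to callables.
    """
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm("\n".join(transcript))
        if step[0] == "final":
            return step[1], transcript
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)   # execute the tool call
        transcript.append(f"Action: {tool_name}[{tool_input}]")
        transcript.append(f"Observation: {observation}")
    return None, transcript                          # step budget exhausted
```

A real implementation parses the action out of the model's free-form "Thought: ... Action: ..." text (or uses native function calling), but the control flow is the same.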
Rate Limiting
Production: Controlling the number of requests an agent system sends to upstream APIs (LLM providers, tools) within a time window. Rate limiting prevents quota exhaustion, avoids API bans, and ensures fair resource allocation across users.
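A common client-side mechanism is the token bucket: the bucket refills at a steady rate, each request spends one token, and short bursts are allowed up to the bucket's capacity. A minimal sketch (class name and parameters are illustrative):

```python
import time

class TokenBucket:
    """Allow ~`rate` requests/second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token for this request
            return True
        return False                      # caller should wait or back off
```

Callers typically pair this with exponential backoff: when `allow()` returns False, sleep and retry rather than dropping the request.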
S
State
Core Concepts: The accumulated information an agent tracks during execution, including conversation history, intermediate results, tool outputs, and any variables needed for decision-making. State management is critical for multi-step reasoning.
Self-Ask
Reasoning: A prompting method where the model decomposes a complex question into simpler sub-questions, answers each one independently (often with search), and then synthesizes a final answer from the sub-results.
Structured Output
Tools: LLM responses formatted as structured data (JSON, XML, YAML) rather than free-form text. Structured output enables reliable programmatic parsing and is essential for function calling, data extraction, and integration with downstream systems.
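Even with structured output enabled, defensive code should validate before trusting the payload. A minimal sketch of parse-then-check for JSON responses (a hypothetical helper; real systems often use a schema library such as Pydantic instead):

```python
import json

def parse_structured_output(raw, required_fields):
    """Parse a model response expected to be JSON and check required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    missing = [f for f in required_fields if f not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    return data
```

On failure, a common recovery is to feed the error message back to the model and ask it to re-emit a corrected response.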
Short-term Memory
Memory: Information that persists only for the duration of a single conversation or task session. In most LLM systems, short-term memory corresponds to the current context window — once the conversation ends or the context is cleared, this memory is lost.
Semantic Memory
Memory: General knowledge and facts that are not tied to a specific episode or event. Semantic memory stores concepts, relationships, and learned information (e.g., 'this user is a backend engineer who works with Go').
Semantic Search
RAG: Search based on the meaning of the query rather than exact keyword matching. Semantic search uses embeddings to find documents that are conceptually similar to the query, even if they use different words.
Supervisor Pattern
Patterns: A multi-agent architecture where a central 'supervisor' agent receives the user request, decomposes it into sub-tasks, delegates each sub-task to a specialized worker agent, and aggregates their results into a final response.
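The decompose/delegate/aggregate flow reduces to a few lines when the three pieces are abstracted as callables. This is a structural sketch with hypothetical names, not a framework API — in real systems each worker would be an LLM-backed agent with its own tools and system prompt:

```python
def supervise(request, decompose, workers, aggregate):
    """Supervisor flow: decompose the request, delegate, aggregate results.

    `decompose(request)` yields (worker_role, sub_task) pairs;
    `workers` maps roles to callables; `aggregate` merges the results.
    """
    results = []
    for role, sub_task in decompose(request):
        results.append(workers[role](sub_task))   # delegate to a specialist
    return aggregate(results)                     # combine into one response
```

Production supervisors also handle worker failures, re-planning, and parallel dispatch, which this sequential sketch omits.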
System Prompt
LLM Fundamentals: A special prompt set at the beginning of a conversation that defines the model's role, behavior, constraints, and personality. System prompts persist across all user messages in a session and take priority over user instructions.
Swarm Intelligence
Multi-Agent: A multi-agent approach inspired by biological swarms (ants, bees) where simple agents following local rules produce emergent intelligent behavior at the system level. OpenAI's Swarm framework implements this pattern for lightweight multi-agent orchestration.
T
Tree of Thought (ToT)
Reasoning: A reasoning strategy where the model explores multiple possible solution paths simultaneously, evaluates their promise, and prunes unproductive branches. ToT excels at problems requiring search, planning, or creative exploration.
Tool Use
Tools: An agent's ability to invoke external capabilities — APIs, databases, code execution, web browsing, or file systems — to accomplish tasks beyond pure text generation. Tool use is what transforms a language model into a capable agent.
Token
LLM Fundamentals: The basic unit of text processing for LLMs. Text is split into tokens before processing — roughly 4 characters or 3/4 of a word in English. Token counts determine context window usage, API costs, and processing speed.
Temperature
LLM Fundamentals: A parameter (typically 0.0 to 2.0) that controls the randomness of LLM output. Lower temperatures (e.g., 0.0) produce more deterministic, focused outputs; higher temperatures (e.g., 1.0+) produce more creative, varied, and sometimes unpredictable responses.
Top-p (Nucleus Sampling)
LLM Fundamentals: A sampling parameter that limits token selection to the smallest set of tokens whose cumulative probability exceeds the threshold p. Top-p = 0.9 means only the tokens comprising the top 90% of probability mass are considered, filtering out very unlikely tokens.
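The filtering step can be demonstrated on a toy distribution: sort tokens by probability, keep them until the cumulative mass reaches p, and renormalize the survivors before sampling. A minimal sketch:

```python
def top_p_filter(token_probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break                         # nucleus is complete
    total = sum(token_probs[t] for t in kept)
    return {t: token_probs[t] / total for t in kept}   # renormalize
```

With `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}` and p = 0.9, the nucleus is `{a, b, c}` (cumulative 0.95), and `d` is excluded; the model then samples only from the renormalized survivors.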
Tracing
Production: The practice of recording the full execution path of an agent request — every LLM call, tool invocation, reasoning step, and their timing. Traces provide a detailed timeline for debugging and performance analysis.
Throughput
Production: The number of agent requests a system can process per unit of time. Throughput is affected by model inference speed, tool call latency, concurrency limits, and rate limiting from upstream API providers.
Task Decomposition
Multi-Agent: The process of breaking a complex goal into smaller, manageable sub-tasks that can be assigned to individual agents or executed sequentially. Effective task decomposition is the foundation of both single-agent planning and multi-agent coordination.
V
Vector Database
RAG: A database optimized for storing, indexing, and querying high-dimensional vector embeddings using approximate nearest neighbor (ANN) algorithms. Vector databases power the retrieval component of RAG systems and enable sub-second similarity search over millions of documents.
W
Workflow
Core Concepts: A defined sequence of steps, decision points, and branching logic that an agent or system follows to complete a task. Workflows can be static (hardcoded) or dynamic (generated by the agent at runtime).
Working Memory
Memory: The actively maintained subset of information that an agent uses for its current reasoning step. Working memory is analogous to a scratchpad — it holds the most relevant context, intermediate results, and current goals.
Z
Zero-shot Prompting
LLM Fundamentals: A prompting technique where the model is asked to perform a task without any examples, relying solely on instructions and its pre-trained knowledge. Zero-shot works well for tasks the model has seen during training.
This glossary is continuously updated as the agentic AI field evolves.