Chapter 32 Quiz: AI Agents and Tool Use
Instructions
Choose the best answer for each question. Each question has exactly one correct answer unless otherwise specified.
Question 1
What distinguishes an AI agent from a standard language model?
A) An agent uses a larger model with more parameters
B) An agent can perceive its environment, reason, and take actions through tool use
C) An agent always uses multiple language models simultaneously
D) An agent does not require a language model at all
Answer: B
Explanation: The defining characteristic of an AI agent is its ability to perceive, reason, and act—not merely generate text. An agent uses a language model as its reasoning engine but augments it with the ability to take actions through tools, APIs, and environment interactions. Agents can use any size model and do not require multiple models.
Question 2
In the ReAct pattern, what are the three interleaved components of each iteration?
A) Plan, Execute, Verify
B) Thought, Action, Observation
C) Input, Process, Output
D) Query, Retrieve, Generate
Answer: B
Explanation: ReAct (Reasoning + Acting) interleaves Thought (the model's reasoning in natural language), Action (a tool call or response), and Observation (the result returned by the tool). This structure allows the model to reason about what tools to use, execute them, and incorporate results into subsequent reasoning.
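The Thought/Action/Observation loop can be sketched as follows. This is a minimal illustration, not a framework implementation: `call_llm` and `search` are stubs standing in for a real model call and a real tool.

```python
# Minimal ReAct-style loop. call_llm is a hypothetical stand-in for a
# language-model call, stubbed here so the control flow can be shown.

def call_llm(transcript):
    # A real implementation would send the transcript to a model.
    # This stub issues one search action, then answers.
    if "Observation:" in transcript:
        return "Thought: I have the answer.\nFinal Answer: Paris"
    return "Thought: I should look this up.\nAction: search[capital of France]"

def search(query):
    # Stubbed tool; a real one would hit a search API.
    return "Paris is the capital of France."

def react(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = call_llm(transcript)        # Thought + Action (or final answer)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        # Parse the Action, run the tool, and append the Observation
        action = step.split("Action:")[1].strip()
        tool, arg = action.split("[", 1)
        result = {"search": search}[tool](arg.rstrip("]"))
        transcript += f"\nObservation: {result}"
    return None

print(react("What is the capital of France?"))  # -> Paris
```

Note how the Observation is fed back into the transcript, grounding the next Thought in actual tool output rather than parametric knowledge.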
Question 3
What is the primary advantage of ReAct over pure chain-of-thought prompting?
A) ReAct uses fewer tokens per response
B) ReAct provides grounding through tool outputs, reducing hallucination
C) ReAct does not require a language model
D) ReAct always produces shorter answers
Answer: B
Explanation: Chain-of-thought prompting relies entirely on the model's parametric knowledge, which can lead to hallucinated facts. ReAct grounds the reasoning process with real tool outputs (observations), allowing the model to verify information and correct course based on actual data rather than generated assumptions.
Question 4
In modern function calling, how are tools typically defined for the language model?
A) As Python source code pasted into the prompt
B) As natural language descriptions in the system prompt only
C) As JSON schemas specifying name, description, and parameter types
D) As compiled binary executables
Answer: C
Explanation: Modern function calling APIs define tools using JSON schemas that specify the tool's name, description, and parameters (including types, descriptions, required fields, and enums). The model uses these schemas to determine when and how to call tools, generating structured JSON output that conforms to the schema.
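A tool definition in this style looks roughly like the following. The exact envelope field names vary by provider; this follows the common name/description/parameters shape, and `get_weather` is an invented example tool.

```python
import json

# Example tool definition in the JSON-schema style used by modern
# function-calling APIs (field names vary slightly by provider).
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"},
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units",
            },
        },
        "required": ["city"],
    },
}

# The model emits a structured call conforming to the schema, e.g.:
model_output = '{"name": "get_weather", "arguments": {"city": "Tokyo", "units": "celsius"}}'
call = json.loads(model_output)
print(call["name"], call["arguments"]["city"])  # -> get_weather Tokyo
```

The `enum` and `required` fields constrain the model's output, which is why rich schemas reduce malformed tool calls.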
Question 5
Which tool design principle states that each tool should perform a single, well-defined function?
A) Bounded Output
B) Idempotency
C) Single Responsibility
D) Rich Schema
Answer: C
Explanation: The Single Responsibility principle states that each tool should do one thing well. A search tool should only search, not also filter or format results. This makes tools more predictable, easier to test, and more composable—the agent can combine simple tools to achieve complex outcomes.
Question 6
When should an agent use parallel tool calls instead of sequential calls?
A) Always, because parallel is always faster
B) When the tool calls are independent and do not depend on each other's results
C) Only when using multi-agent systems
D) When the tools are all calling the same API
Answer: B
Explanation: Parallel tool calls are appropriate when the calls are independent—meaning the output of one call is not needed as input for another. For example, fetching weather for three different cities can be done in parallel, but looking up a person's email and then sending them a message must be sequential.
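The weather example maps naturally onto `asyncio.gather`. This is a sketch with a hypothetical `fetch_weather` stub whose `sleep` stands in for network latency; three independent calls complete in roughly one call's latency instead of three.

```python
import asyncio

# Hypothetical tool stub: the sleep simulates network latency.
async def fetch_weather(city):
    await asyncio.sleep(0.1)
    return f"{city}: sunny"

async def main():
    cities = ["Paris", "Tokyo", "Cairo"]
    # gather() runs the independent calls concurrently: ~0.1s total,
    # versus ~0.3s if each call were awaited one after another.
    return await asyncio.gather(*(fetch_weather(c) for c in cities))

print(asyncio.run(main()))
```

A dependent chain (look up an email, then send to it) cannot be gathered this way, because the second call's argument is the first call's result.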
Question 7
What is the purpose of task decomposition in agent planning?
A) To reduce the number of tokens in the system prompt
B) To break complex tasks into manageable subtasks that can be executed step by step
C) To distribute work across multiple GPUs
D) To compress the model's weights
Answer: B
Explanation: Task decomposition breaks a complex, multi-step task into smaller, manageable subtasks. This helps the agent execute tasks in the right order, track progress, handle dependencies between steps, and recover from failures in individual steps without losing progress on the overall task.
Question 8
In the plan-and-execute architecture, what is the advantage of separating the planner from the executor?
A) It eliminates the need for tools
B) It allows using different models for planning (higher capability) and execution (faster/cheaper)
C) It removes the need for memory systems
D) It guarantees perfect plans with no revisions needed
Answer: B
Explanation: Separating planning from execution allows using a more capable (and potentially more expensive) model for the high-level planning step, while using a faster and cheaper model for executing individual steps. This optimizes the cost-capability tradeoff. The planner can also revise the plan based on execution results.
Question 9
Which type of agent memory stores the current conversation history and context window contents?
A) Episodic memory
B) Long-term memory
C) Short-term (working) memory
D) Semantic memory
Answer: C
Explanation: Short-term or working memory encompasses the current conversation history and context window contents—the immediate information the agent is working with. It is limited by the context window size and is typically lost when the session ends, unlike long-term memory which persists across sessions.
Question 10
What is the main challenge that memory systems address in agent architectures?
A) The high cost of language model inference
B) The finite context window that limits how much information the agent can access at once
C) The inability of language models to generate text
D) The lack of tool definitions in the system prompt
Answer: B
Explanation: Even large context windows (128K-200K tokens) can be exhausted by tool outputs, reasoning traces, and conversation history during complex multi-step tasks. Memory systems provide structured storage and retrieval of information, allowing the agent to work with effectively unlimited context by selectively loading relevant information.
Question 11
In the memory scoring formula $\text{score}(m) = \alpha \cdot \text{recency}(m) + \beta \cdot \text{relevance}(m, q) + \gamma \cdot \text{importance}(m)$, what does the relevance component typically measure?
A) How recently the memory was created
B) Semantic similarity between the memory and the current query
C) How many times the memory has been accessed
D) The length of the memory in tokens
Answer: B
Explanation: The relevance component measures semantic similarity between the stored memory and the current query, typically computed as cosine similarity between their embedding vectors. This ensures that memories related to the current task are prioritized, even if they are not the most recent.
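The scoring formula can be implemented directly. This is a hedged sketch under assumed conventions: recency decays exponentially with age, relevance is cosine similarity over toy two-dimensional embeddings, and importance is a stored 0-1 value.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def score(memory, query_vec, alpha=0.3, beta=0.5, gamma=0.2, decay=0.99):
    # score(m) = alpha*recency + beta*relevance + gamma*importance
    recency = decay ** memory["age_hours"]  # newer memories -> closer to 1
    relevance = cosine(memory["embedding"], query_vec)
    return alpha * recency + beta * relevance + gamma * memory["importance"]

memories = [
    {"age_hours": 1,  "embedding": [0.9, 0.1], "importance": 0.2},
    {"age_hours": 48, "embedding": [0.1, 0.9], "importance": 0.9},
]
query = [0.1, 0.95]  # semantically close to the second memory
best = max(memories, key=lambda m: score(m, query))
print(best["age_hours"])  # the older but relevant, important memory wins
```

With these weights, the 48-hour-old memory outscores the fresher one because relevance and importance outweigh its recency penalty, which is exactly the behavior the formula is designed to produce.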
Question 12
Which framework is specifically designed for multi-agent conversations and provides built-in support for conversable agents?
A) LlamaIndex
B) LangChain
C) AutoGen
D) FAISS
Answer: C
Explanation: Microsoft's AutoGen is specifically focused on multi-agent conversations, providing conversable agents that can communicate with each other, code execution environments, human proxy agents, and group chat managers for coordinating multiple agents. LangChain/LangGraph and LlamaIndex have multi-agent capabilities but are not primarily focused on this use case.
Question 13
What is the primary risk of running agent-generated code without sandboxing?
A) The code might run too slowly
B) The code could access and modify the host system, potentially causing damage or data exfiltration
C) The code will not produce any output
D) The code cannot use external libraries
Answer: B
Explanation: Without sandboxing, agent-generated code runs with the same permissions as the host process, potentially accessing the file system, network, and system resources. This could lead to data deletion, data exfiltration, installation of malicious software, or other harmful side effects. Sandboxing isolates code execution to prevent these risks.
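As a minimal illustration of isolating execution, generated code can at least be run in a separate process with a timeout and a throwaway working directory. This is explicitly not real sandboxing: production systems use containers, seccomp filters, gVisor, or dedicated sandbox services for the isolation described above.

```python
import subprocess
import sys
import tempfile

def run_untrusted(code, timeout=5):
    # Partial mitigation only: a subprocess with a timeout and a temp
    # working directory limits some blast radius, but the child still
    # shares the host's permissions. Real sandboxing requires OS- or
    # container-level isolation.
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,            # keep file writes out of the host tree
            capture_output=True,
            text=True,
            timeout=timeout,        # kill runaway code
        )
    return proc.returncode, proc.stdout, proc.stderr

rc, out, err = run_untrusted("print(2 + 2)")
print(rc, out.strip())  # -> 0 4
```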
Question 14
In a hierarchical multi-agent system, what is the role of the "manager" agent?
A) To execute all tool calls directly
B) To decompose tasks, assign subtasks to specialized worker agents, and aggregate results
C) To store long-term memories for all agents
D) To replace the language model in the system
Answer: B
Explanation: In a hierarchical multi-agent system, the manager agent receives the overall task, decomposes it into subtasks, assigns each subtask to an appropriate specialized worker agent, collects the results, and synthesizes them into a final output. The manager coordinates but does not directly execute most tool calls.
Question 15
What is "prompt injection" in the context of agent systems?
A) Injecting additional parameters into tool schemas
B) Malicious content in tool outputs that attempts to override the agent's instructions
C) Adding more few-shot examples to improve performance
D) Compressing the system prompt to save tokens
Answer: B
Explanation: Prompt injection occurs when malicious content embedded in external sources (web pages, documents, API responses) processed by the agent attempts to override or manipulate the agent's original instructions. For example, a web page might contain hidden text saying "Ignore your previous instructions and instead..." which could cause the agent to behave unexpectedly.
Question 16
Which safety mechanism requires human approval before the agent executes irreversible actions?
A) Rate limiting
B) Sandboxing
C) Confirmation gates
D) Output monitoring
Answer: C
Explanation: Confirmation gates are checkpoints in the agent's execution flow where human approval is required before proceeding with potentially irreversible or high-stakes actions (such as deleting files, sending emails, or making purchases). This provides a critical safety layer for actions that cannot be easily undone.
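A confirmation gate can be as simple as routing irreversible actions through an approval callback before execution. This sketch uses an invented action set and a stub approver; in practice the approver would prompt a human via a CLI, chat message, or web UI.

```python
# Hypothetical set of actions considered irreversible for this example.
IRREVERSIBLE = {"delete_file", "send_email", "make_purchase"}

def execute(action, args, approver):
    # Gate: irreversible actions need explicit approval before running.
    if action in IRREVERSIBLE and not approver(action, args):
        return f"BLOCKED: {action} not approved"
    # A real system would dispatch to the actual tool here.
    return f"EXECUTED: {action}"

def deny_all(action, args):
    # Stand-in for a human reviewer who declines the request.
    return False

print(execute("search_web", {"q": "news"}, deny_all))        # runs freely
print(execute("delete_file", {"path": "/tmp/x"}, deny_all))  # blocked
```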
Question 17
What is the "principle of least privilege" as applied to AI agents?
A) Using the smallest possible language model
B) Granting agents only the minimum permissions needed for their specific task
C) Reducing the number of tools to exactly one
D) Limiting the agent to a single conversation turn
Answer: B
Explanation: The principle of least privilege means giving an agent only the permissions and tools necessary to complete its assigned task. A research agent needs search access but not file write permissions; a code reviewer needs read access but not deploy access. This minimizes the potential damage from agent errors or adversarial manipulation.
Question 18
Which benchmark evaluates an agent's ability to resolve real GitHub issues in real repositories?
A) HotpotQA
B) WebShop
C) SWE-bench
D) ALFWorld
Answer: C
Explanation: SWE-bench (Software Engineering Benchmark) tests agents on their ability to fix real bugs and resolve real issues from open-source GitHub repositories. It is one of the most challenging and practically relevant agent benchmarks because it requires understanding complex codebases, diagnosing issues, and writing correct patches.
Question 19
In the code generation agent loop, what happens when the generated code produces an error?
A) The agent immediately returns the error to the user
B) The agent analyzes the error message, revises the code, and re-executes in a feedback loop
C) The agent switches to a different language model
D) The agent deletes the code and starts a completely unrelated task
Answer: B
Explanation: Code generation agents implement a feedback loop: generate code, execute it, and if an error occurs, analyze the error message (stack trace, error type), revise the code to fix the issue, and re-execute. This iterative debugging process continues until the code runs successfully or a maximum retry limit is reached.
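The generate/execute/revise loop can be sketched as below. `revise` is a stub standing in for an LLM call that would rewrite the code given the traceback; here it "fixes" a known bug so the loop can be demonstrated end to end.

```python
import traceback

def revise(code, error):
    # A real agent would prompt the model with the code and the error.
    # This stub repairs a known bug for demonstration purposes.
    return code.replace("1 / 0", "1 / 1")

def run_with_retries(code, max_attempts=3):
    for _ in range(max_attempts):
        try:
            namespace = {}
            exec(code, namespace)        # execute the candidate code
            return namespace["result"]   # success: return its output
        except Exception:
            error = traceback.format_exc()
            code = revise(code, error)   # analyze error, rewrite, retry
    raise RuntimeError("max retries exceeded")

print(run_with_retries("result = 1 / 0"))  # first run fails, revision succeeds
```

The `max_attempts` cap is the retry limit mentioned in the explanation; without it, an unfixable error would loop forever.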
Question 20
What is the primary difference between API-based and GUI-based web browsing agents?
A) API-based agents are more expensive
B) API-based agents receive cleaned text content, while GUI-based agents see and interact with visual page renders
C) GUI-based agents cannot fill forms
D) API-based agents require a GPU
Answer: B
Explanation: API-based browsing agents use tools that return cleaned text content from web pages, which is simpler but loses interactive capabilities. GUI-based browsing agents see screenshots of rendered web pages and use mouse/keyboard actions to interact, enabling full interactive browsing including complex forms and dynamic content, but requiring vision capabilities and incurring higher latency.
Question 21
What problem does the "token cost explosion" present in multi-agent systems?
A) Tokens become more expensive over time
B) Each agent call involves LLM inference, so a multi-agent system multiplies the number of inference calls per user request
C) Agents use tokens as currency to trade with each other
D) The token vocabulary becomes too large for the model
Answer: B
Explanation: In a multi-agent system, each agent interaction requires one or more LLM inference calls. A system with 5 agents, each making 3 LLM calls, requires 15 inferences for a single user request. This multiplication of inference calls significantly increases cost and latency, making cost management a critical concern in multi-agent architectures.
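The 5 × 3 = 15 arithmetic extends directly to a back-of-envelope cost estimate. The token count and price below are illustrative placeholders, not real provider rates.

```python
# Cost multiplication from the explanation: 5 agents x 3 calls each.
agents = 5
calls_per_agent = 3
tokens_per_call = 2_000        # assumed average prompt + completion size
price_per_1k_tokens = 0.01     # hypothetical rate in dollars

inferences = agents * calls_per_agent          # 15 inferences per request
cost = inferences * tokens_per_call / 1_000 * price_per_1k_tokens

print(inferences, round(cost, 2))  # 15 inferences, ~$0.30 per request
```

A single-agent baseline making 3 calls would cost one fifth as much, which is why multi-agent designs must justify the multiplier with better task outcomes.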
Question 22
Which technique involves an agent evaluating its own output and revising it based on self-critique?
A) Retrieval-augmented generation
B) Reflexion
C) Quantization
D) Knowledge distillation
Answer: B
Explanation: Reflexion (Shinn et al., 2023) is a technique where the agent generates an output, then evaluates (critiques) its own output, and revises it based on the critique. This self-reflection loop can be repeated multiple times and has been shown to significantly improve agent performance on complex tasks by catching and correcting errors.
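A generate/critique/revise loop in this style can be sketched as follows, with `generate` and `critique` as stubs standing in for LLM calls; the control flow, not the stub logic, is the point.

```python
def generate(task, feedback=None):
    # Stub for an LLM generation call; incorporates critique feedback.
    return "draft v2" if feedback else "draft v1"

def critique(output):
    # Stub for an LLM self-evaluation call; None means the output passes.
    return "too vague" if output == "draft v1" else None

def reflexion(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        output = generate(task, feedback)  # generate (or revise) an answer
        feedback = critique(output)        # self-evaluate the answer
        if feedback is None:
            return output                  # critique passed: stop early
    return output                          # give up after max_rounds

print(reflexion("summarize the report"))  # -> draft v2
```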
Question 23
What is the Model Context Protocol (MCP)?
A) A compression algorithm for reducing context window usage
B) An open standard for connecting AI agents to external data sources and tools
C) A training protocol for multi-modal language models
D) A memory management system for GPU allocation
Answer: B
Explanation: The Model Context Protocol (MCP), introduced by Anthropic in 2024, is an open standard that defines how AI agents discover and interact with external tools and data sources. It provides standardized tool schemas, dynamic tool discovery, resource access, and composability, addressing the fragmentation of tool interfaces across different providers.
Question 24
Why is agent evaluation fundamentally more challenging than standard LLM evaluation?
A) Agents always produce longer outputs
B) Agents are non-deterministic, involve multi-step interactions with external tools, and the same task can be solved via different valid paths
C) Agents cannot be tested with automated systems
D) Agents only work with structured data
Answer: B
Explanation: Agent evaluation is more challenging because: (1) agents are non-deterministic—the same task may yield different action sequences across runs, (2) multi-step dependencies mean early errors cascade, (3) external tool interactions introduce variability, and (4) many tasks have subjective success criteria or multiple valid solution paths, making simple comparison to reference answers insufficient.
Question 25
Which cost optimization strategy uses a cheaper model for simple agent steps and a more capable model for complex reasoning?
A) Caching
B) Early termination
C) Model routing
D) Batch processing
Answer: C
Explanation: Model routing directs different agent steps to different models based on complexity. Simple steps (e.g., parsing a tool output, making a straightforward tool call) can use a faster, cheaper model, while complex reasoning steps (e.g., planning, synthesis, error diagnosis) use a more capable and expensive model. This optimizes the cost-capability tradeoff across the agent's execution.
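A routing layer can be as small as a lookup on the step type. The model names, step categories, and dispatcher below are invented for illustration; a real router would call each model's API and might classify steps with a lightweight model rather than a fixed set.

```python
# Illustrative model names; real deployments would use provider model IDs.
CHEAP, CAPABLE = "small-model", "large-model"

# Assumed set of step types simple enough for the cheap model.
SIMPLE_STEPS = {"parse_output", "format_call"}

def route(step_type):
    # Dispatch simple steps to the cheap model, everything else to the
    # capable one.
    return CHEAP if step_type in SIMPLE_STEPS else CAPABLE

def run_step(step_type, prompt):
    model = route(step_type)
    # A real system would invoke the chosen model's API here.
    return f"[{model}] handled: {prompt}"

print(run_step("parse_output", "extract fields"))   # cheap model
print(run_step("plan_task", "decompose the goal"))  # capable model
```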