Chapter 36: Quiz

AI Coding Agents and Autonomous Workflows

Test your understanding of AI coding agents, the plan-act-observe loop, tool use, guardrails, and agent architecture. Each question has one best answer unless otherwise noted; the answer and its explanation follow each question.


Question 1

What is the fundamental difference between an AI coding assistant and an AI coding agent?

A) Agents use larger language models than assistants
B) Agents can autonomously plan, execute, and iterate on multi-step tasks
C) Agents are always faster than assistants
D) Agents do not use natural language for interaction

Answer: **B) Agents can autonomously plan, execute, and iterate on multi-step tasks**

The defining characteristic of an agent is its ability to operate in a goal-directed, autonomous loop. While assistants respond to individual prompts, agents plan a sequence of actions, execute them via tools, observe the results, and iterate until the goal is achieved. Model size, speed, and interface are orthogonal concerns.

Question 2

Which of the following is the correct order of the plan-act-observe loop?

A) Act, Plan, Observe, Update
B) Observe, Plan, Act, Update
C) Plan, Observe, Act, Update
D) Plan, Act, Update, Observe

Answer: **B) Observe, Plan, Act, Update**

The agent first observes the current state of the environment, then plans based on that observation and its goal, then acts by executing one step of the plan, and finally updates its memory with the results. The "observe" step comes first because the agent needs current information before it can plan effectively.
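The observe-plan-act-update cycle can be sketched in a few lines of Python. The counter environment and the trivial planner below are purely illustrative stand-ins for real observations and tool calls, not code from the chapter:

```python
def run_agent(goal, observe, act, max_steps=10):
    """Drive the observe-plan-act-update loop until the goal is reached."""
    memory = []
    for _ in range(max_steps):
        state = observe()                                       # 1. observe
        if state == goal:                                       # goal check
            break
        action = "increment" if state < goal else "decrement"   # 2. plan
        result = act(action)                                    # 3. act
        memory.append((action, result))                         # 4. update memory
    return memory

# Toy environment: a counter the agent must drive to the goal value.
class Counter:
    def __init__(self):
        self.value = 0

    def observe(self):
        return self.value

    def act(self, action):
        self.value += 1 if action == "increment" else -1
        return self.value
```

In a real agent, `observe` would gather file contents or test output, and `plan` would be an LLM call rather than a one-line heuristic, but the loop shape is the same.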

Question 3

In the autonomy spectrum described in this chapter, a "Level 2 -- Supervised Agent" is best characterized by which behavior?

A) It only generates text responses without any tool use
B) It plans and executes multi-step tasks but pauses for human approval at key checkpoints
C) It executes end-to-end workflows with no human intervention at all
D) It improves its own capabilities over time

Answer: **B) It plans and executes multi-step tasks but pauses for human approval at key checkpoints**

Level 2 (Supervised Agent) represents the balance between autonomy and oversight. The agent can plan and execute complex tasks but requires human approval at defined checkpoints. This is how Claude Code operates in its default configuration. Level 0 is pure assistant, Level 3 is autonomous, and Level 4 is self-improving.

Question 4

What is function calling (tool use) in the context of AI agents?

A) The agent directly invokes operating system functions
B) The model generates structured requests to invoke predefined functions, whose results are returned to the model
C) The user calls Python functions that interact with the model
D) The agent writes function definitions in source code

Answer: **B) The model generates structured requests to invoke predefined functions, whose results are returned to the model**

Function calling is the mechanism by which a model expresses its intent to use a tool. The model generates a structured tool call (function name and arguments), the host system executes the function, and the result is fed back to the model. The model itself does not directly execute code.
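The host-side half of this mechanism is a small dispatcher. Here is a minimal sketch in which the JSON string stands in for a structured tool call returned by an LLM API; the tool names and registry are illustrative:

```python
import json

# Hypothetical tool registry mapping tool names to host-side functions.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "add": lambda a, b: a + b,
}

def dispatch(tool_call_json):
    """Execute a structured tool call and return the result to feed back to the model."""
    call = json.loads(tool_call_json)       # e.g. {"name": "add", "arguments": {...}}
    func = TOOLS[call["name"]]
    return func(**call["arguments"])
```

In a real system the returned value would be appended to the conversation as a tool-result message so the model can continue reasoning with it.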

Question 5

Which of the following is NOT a recommended principle for designing agent tools?

A) Clear, detailed descriptions
B) Typed, well-documented parameters
C) A single generic tool that can handle all operations
D) Informative return values

Answer: **C) A single generic tool that can handle all operations**

A single generic tool is the opposite of good tool design. Tools should be specific, well-described, and have clear parameter schemas. The model chooses tools based on their descriptions, so vague or overly broad tools lead to poor tool selection and unreliable behavior.
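A well-designed tool definition might look like the sketch below, written in the JSON-schema style commonly used for function calling. The tool name, description wording, and field layout are illustrative assumptions; check your provider's API documentation for the exact format it expects:

```python
# Illustrative tool definition: specific purpose, detailed description,
# typed and documented parameters.
search_code_tool = {
    "name": "search_code",
    "description": (
        "Search the project for a regular expression and return matching "
        "lines with their file paths. Use this to locate definitions or "
        "usages before editing a file."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {
                "type": "string",
                "description": "Regular expression to search for",
            },
            "path": {
                "type": "string",
                "description": "Directory to search, relative to the project root",
            },
        },
        "required": ["pattern"],
    },
}
```

Contrast this with a generic `{"name": "do_anything", "description": "Execute command"}` tool: the model has no signal about when to pick it or what arguments are valid.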

Question 6

In the "Issue-to-Pull-Request" workflow pattern, what is the typical FIRST step the agent takes?

A) Create a feature branch
B) Write tests for the new feature
C) Read the issue description and explore the relevant codebase
D) Open a pull request

Answer: **C) Read the issue description and explore the relevant codebase**

Before making any changes, the agent must understand the task (by reading the issue) and the context (by exploring the relevant code). Creating a branch, writing code, and opening a PR come later in the workflow. Understanding before acting is a fundamental principle of effective agent behavior.

Question 7

What is the primary purpose of sandboxing in agent systems?

A) To speed up the agent's execution
B) To confine the agent to a restricted environment so damage is contained even if guardrails fail
C) To improve the quality of generated code
D) To reduce the cost of API calls

Answer: **B) To confine the agent to a restricted environment so damage is contained even if guardrails fail**

Sandboxing is a defense-in-depth measure. Even if other guardrails (permission checks, blocked command lists) fail to prevent a destructive action, sandboxing ensures the damage is limited to the sandbox environment and cannot affect the host system or production infrastructure.

Question 8

Which guardrail approach follows the "defense in depth" principle?

A) Using only a blocked command list
B) Using only sandboxing
C) Using multiple layers: permission systems, sandboxing, cost limits, output validation, and time limits
D) Relying on the LLM to avoid dangerous actions

Answer: **C) Using multiple layers: permission systems, sandboxing, cost limits, output validation, and time limits**

Defense in depth means using multiple overlapping layers of protection, each catching failures that slip through the others. No single guardrail is sufficient on its own. Relying on the LLM's judgment alone is particularly unreliable since language models can be unpredictable.
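Layered checks compose naturally in code: every layer must pass before an action runs, so any single layer can block it. The layer functions, blocklist, and state fields below are illustrative, not from the chapter:

```python
# Each guardrail layer is an independent predicate over (action, state).
BLOCKED_COMMANDS = {"rm -rf /", "mkfs"}

def check_blocklist(action, state):
    return action["command"] not in BLOCKED_COMMANDS

def check_sandbox(action, state):
    # Only allow actions whose working directory is inside the sandbox.
    return action.get("cwd", "/sandbox").startswith("/sandbox")

def check_cost(action, state):
    return state["spent_usd"] < state["budget_usd"]

def check_time(action, state):
    return state["elapsed_s"] < state["limit_s"]

LAYERS = [check_blocklist, check_sandbox, check_cost, check_time]

def allowed(action, state):
    """Defense in depth: an action executes only if every layer approves it."""
    return all(layer(action, state) for layer in LAYERS)
```

The point is structural: a command missing from the blocklist can still be caught by the sandbox check, and a runaway-but-harmless loop is still caught by the cost and time limits.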

Question 9

In a tiered permission system, which category of action typically requires explicit human approval?

A) Reading source files
B) Searching code with grep
C) Deleting files or running arbitrary commands
D) Running read-only terminal commands

Answer: **C) Deleting files or running arbitrary commands**

In a tiered permission system, read-only actions (reading files, searching code, read-only commands) typically require no approval. Writing files in the project directory may require soft approval. Destructive or potentially dangerous actions (deleting files, running arbitrary commands, network access) require explicit human approval.
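A tier table can be as simple as a dictionary from tool name to approval level. The tool names and tier labels here are illustrative; note that an unknown tool defaults to the strictest tier, which is the safe failure mode:

```python
# Illustrative tier mapping: tool name -> approval requirement.
PERMISSION_TIERS = {
    "read_file": "auto",        # read-only: no approval needed
    "grep": "auto",
    "write_file": "soft",       # writes inside the project: soft approval
    "delete_file": "explicit",  # destructive: explicit human approval
    "run_command": "explicit",
}

def needs_human_approval(tool_name):
    """Unknown tools fall through to the strictest tier by default."""
    return PERMISSION_TIERS.get(tool_name, "explicit") == "explicit"
```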

Question 10

What is "confidence-based escalation" in human-in-the-loop patterns?

A) The human rates their confidence in the agent's ability
B) The agent self-assesses its confidence and only requests human input when confidence is low
C) The agent always asks for approval on every action
D) The system uses statistical models to predict whether the agent will succeed

Answer: **B) The agent self-assesses its confidence and only requests human input when confidence is low**

Confidence-based escalation allows the agent to operate autonomously for actions it is highly confident about while escalating uncertain decisions to a human. This balances efficiency (no unnecessary approval requests) with safety (human oversight on risky decisions).
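The escalation logic is a simple threshold gate. In this sketch the confidence score is assumed to come from the agent's self-assessment (e.g. a number it emits alongside each proposed action), and the threshold value is an illustrative choice:

```python
def decide(action, confidence, threshold=0.8, ask_human=None):
    """Execute autonomously above the threshold; escalate to a human below it."""
    if confidence >= threshold:
        return "execute"
    # Low confidence: ask a human for a yes/no decision; skip if unavailable.
    if ask_human is not None and ask_human(action):
        return "execute"
    return "skip"
```

Tuning the threshold trades efficiency against oversight: raise it and more actions pause for review, lower it and the agent runs more autonomously.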

Question 11

What problem does context window management solve for coding agents?

A) It reduces the cost of the language model
B) It prevents the conversation history from exceeding the model's input capacity during long-running tasks
C) It improves the speed of code generation
D) It ensures the agent uses the correct programming language

Answer: **B) It prevents the conversation history from exceeding the model's input capacity during long-running tasks**

During long-running tasks, the agent's conversation history grows with every file read, tool result, and reasoning step. Context window management techniques (summarization, selective retention, sliding windows) prevent this history from exceeding the model's maximum input size, which would cause the agent to fail.
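A sliding window combined with a summary placeholder can be sketched as below. The whitespace token counter and the summary stub are deliberate simplifications; a real implementation would use the model's tokenizer and an LLM-generated summary:

```python
def trim_history(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Keep the system prompt plus the newest messages that fit the budget,
    replacing anything dropped with a one-line summary placeholder."""
    system, rest = messages[0], messages[1:]
    kept, used = [], count_tokens(system)
    for msg in reversed(rest):              # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    dropped = len(rest) - len(kept)
    summary = [f"[summary of {dropped} earlier messages]"] if dropped else []
    return [system] + summary + list(reversed(kept))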

Question 12

What is the role of a CLAUDE.md file in agent memory?

A) It contains the agent's source code
B) It serves as long-term project memory, storing conventions, architecture information, and instructions that persist across sessions
C) It logs all agent actions for debugging
D) It stores the agent's API keys

Answer: **B) It serves as long-term project memory, storing conventions, architecture information, and instructions that persist across sessions**

The CLAUDE.md file is a persistent knowledge base that is read at the start of every agent interaction. It stores project-specific context, conventions, known issues, and instructions, giving the agent immediate access to accumulated knowledge without consuming context window space on re-exploration.

Question 13

When an agent encounters a "rate limit" error from an API, what is the most appropriate recovery strategy?

A) Abort the entire task immediately
B) Retry with exponential backoff
C) Escalate to a human immediately
D) Switch to a different API

Answer: **B) Retry with exponential backoff**

Rate limit errors are transient: they resolve when the rate limit window resets. Exponential backoff (waiting increasingly longer between retries) is the standard approach because it avoids hammering the API while still eventually completing the request. Aborting is premature, escalating is unnecessary, and switching APIs is disproportionate.
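Exponential backoff is a few lines of code. `RateLimitError` here is a hypothetical stand-in for whatever exception your API client raises on HTTP 429-style failures:

```python
import time

class RateLimitError(Exception):
    """Stand-in for an API client's rate-limit exception."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate limits, doubling the wait each attempt: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise                      # out of retries: surface the error
            sleep(base_delay * (2 ** attempt))
```

Production implementations usually add random jitter to the delay so that many clients retrying at once do not synchronize into repeated bursts.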

Question 14

What is "graceful degradation" in agent workflows?

A) The agent gradually slows down as it runs longer
B) When the agent cannot fully complete a task, it reports what was done, what remains, and what went wrong rather than failing silently
C) The agent reduces its output quality to save costs
D) The agent's performance naturally degrades over time

Answer: **B) When the agent cannot fully complete a task, it reports what was done, what remains, and what went wrong rather than failing silently**

Graceful degradation means providing maximum value even when full success is not possible. A partial result with clear documentation of completed steps, remaining work, and encountered errors is far more useful than an opaque failure. It allows the human to efficiently pick up where the agent left off.

Question 15

Which of the following is the MOST important metric for evaluating a coding agent?

A) Number of tokens consumed per task
B) Task completion rate with correct, tested code
C) Speed of code generation
D) Number of tools available to the agent

Answer: **B) Task completion rate with correct, tested code**

While efficiency, speed, and tool count all matter, the most fundamental metric is whether the agent actually completes its tasks successfully with code that works correctly and passes tests. An agent that is fast but produces broken code, or efficient but rarely completes tasks, is not useful.

Question 16

What is SWE-bench?

A) A software engineering certification exam
B) A benchmark of real-world GitHub issues used to evaluate coding agents' ability to produce working patches
C) A tool for measuring code performance
D) A development environment for building agents

Answer: **B) A benchmark of real-world GitHub issues used to evaluate coding agents' ability to produce working patches**

SWE-bench is considered the gold standard for evaluating autonomous coding agents. It consists of real GitHub issues from popular open-source projects, paired with their actual fixes. The agent receives the issue description and the full repository and must produce a patch that resolves the issue.

Question 17

In the simple agent architecture described in Section 36.10, what are the four main components?

A) Frontend, Backend, Database, API
B) Agent Controller, LLM Interface, Tool Registry, Guardrails Layer
C) Planner, Executor, Validator, Reporter
D) Input Parser, Code Generator, Test Runner, Output Formatter

Answer: **B) Agent Controller, LLM Interface, Tool Registry, Guardrails Layer**

The four-component architecture described in the chapter is: the Agent Controller (which manages the plan-act-observe loop), the LLM Interface (which handles API calls to the language model), the Tool Registry (which manages available tools like file I/O, terminal, and search), and the Guardrails Layer (which enforces permissions, limits, and validation).
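The four components map onto four small classes. The class names follow the answer above, but every method body here is an illustrative stub (the LLM interface just echoes a canned action), not the chapter's implementation:

```python
class ToolRegistry:
    """Manages the tools the agent may invoke."""
    def __init__(self):
        self.tools = {}
    def register(self, name, func):
        self.tools[name] = func
    def execute(self, name, **kwargs):
        return self.tools[name](**kwargs)

class Guardrails:
    """Enforces limits; here, only a maximum iteration count."""
    def __init__(self, max_iterations):
        self.max_iterations = max_iterations
    def allow(self, tool_name, iteration):
        return iteration < self.max_iterations

class LLMInterface:
    """Would wrap API calls to the language model; stubbed for illustration."""
    def next_action(self, history):
        return ("echo", {"text": f"step {len(history)}"})

class AgentController:
    """Runs the loop: ask the LLM for an action, check guardrails, execute."""
    def __init__(self, llm, tools, guardrails):
        self.llm, self.tools, self.guardrails = llm, tools, guardrails
    def run(self, steps):
        history = []
        for i in range(steps):
            name, args = self.llm.next_action(history)
            if not self.guardrails.allow(name, i):
                break
            history.append(self.tools.execute(name, **args))
        return history
```

The separation matters for testing: each component can be swapped for a stub, as the `LLMInterface` is here.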

Question 18

What is the "self-healing loop" pattern in agent error recovery?

A) The agent repairs its own source code
B) The agent writes code, runs it, detects errors in the output, analyzes the error, fixes the code, and reruns
C) The agent restores from a backup after a failure
D) The agent switches to a different model when it encounters errors

Answer: **B) The agent writes code, runs it, detects errors in the output, analyzes the error, fixes the code, and reruns**

The self-healing loop is one of the most powerful agent patterns. It mirrors how human developers work: write code, run it, read the error message, fix the issue, and try again. The key insight is that error messages contain diagnostic information that a capable agent can use to identify and correct problems.
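The loop's skeleton is short. In this sketch, `fix` stands in for an LLM call that receives the code and the error message and returns a revised version; the `exec`-based runner is a toy stand-in for executing code in a sandbox:

```python
def self_heal(code, fix, max_attempts=3):
    """Run candidate code; on failure, feed the error back into `fix` and retry."""
    for _ in range(max_attempts):
        try:
            namespace = {}
            exec(code, namespace)               # run the candidate code
            return code, namespace["result"]    # success: return code and result
        except Exception as err:
            code = fix(code, str(err))          # analyze the error, produce a fix
    raise RuntimeError("could not repair code within the attempt budget")
```

The `max_attempts` cap is essential: without it, a fixer that never converges turns the self-healing loop into an infinite one.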

Question 19

What is the principle of least privilege as applied to coding agents?

A) The agent should use the smallest possible language model
B) The agent should be granted only the minimum permissions necessary to complete its specific task
C) The agent should generate the least amount of code possible
D) The agent should make the fewest API calls possible

Answer: **B) The agent should be granted only the minimum permissions necessary to complete its specific task**

The principle of least privilege means that an agent performing a code review should have read-only access, while an agent implementing a feature should have write access only to specific directories. This minimizes the potential damage from bugs, misuse, or compromised agents.

Question 20

Which termination condition would MOST reliably prevent an infinite agent loop?

A) Goal achieved check
B) Maximum iteration count
C) Human intervention
D) Cost monitoring

Answer: **B) Maximum iteration count**

While all four are valid termination conditions, a maximum iteration count is the most reliable hard stop because it is simple and deterministic. Goal achievement checks depend on correctly identifying completion (which can fail). Human intervention depends on someone monitoring. Cost monitoring depends on accurate cost tracking. An iteration limit provides a guaranteed upper bound regardless of other factors.
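The conditions compose into one stop check, with the iteration cap acting as the deterministic backstop even when the other signals misbehave. The function signature is illustrative:

```python
def should_stop(iteration, max_iterations, goal_achieved, cost_usd, budget_usd):
    """Stop on goal, on budget exhaustion, or, as a hard backstop, on the
    iteration cap. The cap fires even if goal detection or cost tracking fail."""
    if goal_achieved:
        return True
    if cost_usd >= budget_usd:
        return True
    return iteration >= max_iterations
```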

Question 21

In the ReAct (Reasoning + Acting) pattern, what is the purpose of the explicit reasoning steps?

A) To reduce the number of API calls
B) To make the agent's decision process interpretable and to improve the quality of action selection
C) To slow down the agent for safety purposes
D) To generate documentation automatically

Answer: **B) To make the agent's decision process interpretable and to improve the quality of action selection**

In the ReAct pattern, the agent alternates between reasoning (thinking through the problem in natural language) and acting (using tools). The reasoning steps serve two purposes: they create an interpretable chain of thought useful for debugging, and they improve the quality of subsequent actions by encouraging the model to think before acting.
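One ReAct step records a thought, then an action and its observation, so the full trace can be inspected later. This is a structural sketch; in practice the thought and the tool call would both come from the model:

```python
def react_step(trace, thought, tool_call, tools):
    """Record one thought/action pair and return the observation."""
    trace.append(("thought", thought))
    observation = tools[tool_call["name"]](**tool_call["args"])
    trace.append(("action", tool_call["name"], observation))
    return observation
```

After a run, `trace` reads as an alternating thought/action log, which is exactly what makes ReAct-style agents debuggable.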

Question 22

What is "adaptive planning" in the context of coding agents?

A) Planning that adjusts the plan after each action based on new observations
B) Planning that uses machine learning to improve over time
C) Planning that adapts to different programming languages
D) Planning that adjusts the budget based on task complexity

Answer: **A) Planning that adjusts the plan after each action based on new observations**

Adaptive planning means the agent's plan is not fixed at the start. After each action, the agent observes the result and revises its plan accordingly. This is crucial for software development, where unexpected issues (a file in a different location, a test failing for an unexpected reason) require on-the-fly plan adjustments.

Question 23

The "80/20 rule of agent autonomy" states that:

A) 80% of agents fail and 20% succeed
B) Agents excel at automating 80% of routine development work while the remaining 20% of novel or ambiguous tasks benefits from human judgment
C) Agents should be autonomous 80% of the time and supervised 20%
D) 80% of agent cost comes from 20% of the tasks

Answer: **B) Agents excel at automating 80% of routine development work while the remaining 20% of novel or ambiguous tasks benefits from human judgment**

This principle recognizes that most development work is routine (boilerplate, standard patterns, test writing, formatting) and well-suited for automation, while a smaller proportion involves novel decisions, complex business logic, or ambiguous requirements that benefit from human expertise. Effective agent workflows are designed around this reality.

Question 24

What is the MOST important quality of a tool's description for an AI agent?

A) It should be as short as possible
B) It should include example inputs and outputs
C) It should be clear, specific, and detailed enough for the model to understand when and how to use the tool
D) It should be written in formal academic language

Answer: **C) It should be clear, specific, and detailed enough for the model to understand when and how to use the tool**

The model selects tools based on their descriptions. A description like "Execute command" is far less useful than one that explains what the tool does, what constraints it has, and when to use it. While examples can help, the core requirement is clarity and specificity. Brevity at the expense of clarity hurts tool selection.

Question 25

When building a coding agent from scratch, which component should be implemented and tested FIRST?

A) The LLM integration
B) The tool definitions and their execution
C) The guardrails layer
D) The user interface

Answer: **B) The tool definitions and their execution**

Tools form the foundation of the agent's ability to interact with the environment. Without working tools, the agent cannot take any actions. By implementing and testing tools first, you ensure the agent has reliable building blocks. The LLM integration, guardrails, and UI can then be layered on top of a solid tool foundation.