Chapter 8 Quiz: Self-Assessment

Instructions: Answer each question without looking back at the chapter. After completing all questions, check your answers against the key at the bottom. If you score below 80% (fewer than 20 correct), revisit the relevant sections before moving on to Chapter 9; the Scoring Guide at the end gives the full breakdown.


Multiple Choice

Q1. The explore/exploit tradeoff refers to:

a) The tension between theoretical research and practical application
b) The tension between gathering new information and acting on information already obtained
c) The tension between individual success and collective welfare
d) The tension between short-term and long-term planning

Q2. In the multi-armed bandit problem, the "arms" represent:

a) Different strategies for solving a single problem
b) Different options of unknown quality from which you must choose repeatedly
c) Different time periods in which decisions must be made
d) Different mathematical frameworks for optimization

Q3. In bacterial chemotaxis, the "tumble" phase serves the function of:

a) Exploitation -- the bacterium swims toward known food sources
b) Exploration -- the bacterium randomly reorients to try a new direction
c) Communication -- the bacterium signals other bacteria about food locations
d) Rest -- the bacterium conserves energy between runs

Q4. What triggers a bacterium to shift from running (exploitation) to tumbling (exploration)?

a) A timer that automatically triggers tumbling at regular intervals
b) Increasing attractant concentration (things getting better)
c) Flat or decreasing attractant concentration (things not improving)
d) The presence of other bacteria in the vicinity

Q5. Why do venture capitalists fund many startups knowing most will fail?

a) Because they have too much money to invest in a few companies
b) Because startup returns follow a power-law distribution, where rare extreme successes dominate total returns
c) Because government regulations require portfolio diversification
d) Because they cannot predict which startups will succeed based on available information

Q6. In jazz improvisation, a "lick" (a practiced, familiar phrase) represents:

a) Exploration -- trying something new
b) Exploitation -- deploying known-good material
c) A failure of creativity
d) An interruption of the improvisational flow

Q7. According to the chapter, why do toddlers explore more broadly than adults?

a) Because they lack the intelligence to focus on one thing
b) Because their brains are wired for novelty-seeking, and the optimal explore/exploit ratio favors exploration when knowledge is low and time horizon is long
c) Because their parents encourage exploration over exploitation
d) Because they have not yet learned the rules of social behavior

Q8. Premature convergence occurs when:

a) A system explores too many options and never commits to one
b) A system locks onto an option too early, before sufficient exploration has revealed the full landscape
c) A system converges on the mathematically optimal solution too quickly for human operators to verify
d) Multiple systems converge on the same solution independently

Q9. Exploitation myopia refers to:

a) The inability to see long-term consequences of exploitative behavior
b) The tendency to overvalue certain, immediate rewards from exploitation over uncertain, delayed rewards from exploration
c) The tendency of exploiting systems to develop tunnel vision about their environment
d) The inability to exploit known resources efficiently

Q10. The Upper Confidence Bound (UCB) strategy selects options based on:

a) The option with the highest known average reward
b) The option with the least uncertainty
c) The option with the highest plausible reward given its uncertainty -- favoring both known-good options and poorly understood options
d) A random selection weighted by past performance

Q11. Thompson sampling works by:

a) Systematically trying each option an equal number of times before choosing
b) Drawing random samples from probability distributions for each option and choosing the option with the highest sample
c) Computing exact optimal solutions using dynamic programming
d) Following the option most recently recommended by other users

Q12. The cooling schedule refers to:

a) The rate at which a system's performance degrades over time
b) The optimal shift from more exploration early to more exploitation later, as time passes and knowledge accumulates
c) The process of gradually reducing an organization's risk tolerance
d) The decreasing rate of innovation in mature industries

Q13. According to the chapter, the right balance between exploration and exploitation depends on:

a) Intelligence and education level
b) How much you already know, how much time you have left, how variable the environment is, and how fast it is changing
c) Cultural norms and institutional expectations
d) Whether the decision-maker is an individual or an organization

Q14. In the context of career decisions, David Epstein's Range argument suggests that:

a) Specialists always outperform generalists
b) Early specialization is the only path to excellence
c) Broad early exploration followed by later specialization often produces superior outcomes, because it samples more of the landscape before committing
d) Career exploration should continue indefinitely throughout one's life

Q15. Why does a non-stationary environment (one where conditions change over time) favor maintaining a permanent exploration budget?

a) Because exploration is inherently more enjoyable than exploitation
b) Because a system that has stopped exploring is optimized for conditions that may no longer exist and will be unable to adapt when they change
c) Because non-stationary environments are less profitable than stationary ones
d) Because mathematical proofs show that exploration is always optimal

Q16. In the chapter's analysis, what role does the rhythm section play in enabling jazz exploration?

a) It distracts the audience from the soloist's mistakes
b) It provides a stable exploitation base (reliable structure) from which the soloist can safely depart and return
c) It forces the soloist to stay within the chord changes
d) It reduces the musical complexity so the soloist can focus

Q17. Regret in multi-armed bandit theory is defined as:

a) The emotional response to a bad decision
b) The difference between the reward actually received and the reward that would have been received by always choosing the best arm
c) The total cost of all exploration activities
d) The probability of choosing the wrong arm on any given trial

Q18. 3M's "15 percent time" rule is an example of:

a) Exploitation -- maximizing productivity of existing employees
b) Institutionalized exploration -- accepting a guaranteed cost for uncertain future benefit
c) A cooling schedule -- reducing exploration over time
d) Premature convergence -- forcing employees into narrow specialization

Q19. Which of the following best describes the chapter's threshold concept, "The Optimal Balance Shifts"?

a) There is a single correct ratio of exploration to exploitation that applies universally
b) Exploration is always better than exploitation for organisms and organizations that want to survive
c) The right amount of exploration depends on time remaining, knowledge accumulated, and environmental stability -- and this balance should shift toward exploitation as the horizon shrinks
d) The balance between exploration and exploitation cannot be analyzed rationally and must be determined by intuition

Q20. A child who is forced into narrow specialization very early (for example, intensive training in a single sport from age three) risks:

a) Premature convergence -- locking into a single domain before adequate exploration has revealed whether it is the best fit
b) Excessive exploration -- sampling too many domains without developing depth
c) Exploitation myopia -- becoming too focused on immediate rewards
d) Both (a) and (c)


Short Answer

Q21. In one or two sentences, explain why the multi-armed bandit problem cannot be solved by simply "exploring first, then exploiting."

Q22. Give one example each of: (a) a system that under-explores, and (b) a system that under-exploits. For each, describe the consequence of the imbalance.

Q23. The chapter draws a parallel between the bacterium's adaptation mechanism (methylation that prevents permanent lock-in to exploitation) and institutional mechanisms that serve the same function. Name one such institutional mechanism and explain how it prevents lock-in.

Q24. Explain the difference between UCB and Thompson sampling in plain language (no mathematical notation).

Q25. The chapter states that "exploration is easier when exploitation provides a stable base." Give one example from the chapter and one original example from your own experience.


Answer Key

Q1: b) The tension between gathering new information and acting on information already obtained.

Q2: b) Different options of unknown quality from which you must choose repeatedly.

Q3: b) Exploration -- the bacterium randomly reorients to try a new direction.

Q4: c) Flat or decreasing attractant concentration (things not improving).

Q5: b) Because startup returns follow a power-law distribution, where rare extreme successes dominate total returns.

Q6: b) Exploitation -- deploying known-good material.

Q7: b) Because their brains are wired for novelty-seeking, and the optimal explore/exploit ratio favors exploration when knowledge is low and time horizon is long.

Q8: b) A system locks onto an option too early, before sufficient exploration has revealed the full landscape.

Q9: b) The tendency to overvalue certain, immediate rewards from exploitation over uncertain, delayed rewards from exploration.

Q10: c) The option with the highest plausible reward given its uncertainty -- favoring both known-good options and poorly understood options.

Q11: b) Drawing random samples from probability distributions for each option and choosing the option with the highest sample.

Q12: b) The optimal shift from more exploration early to more exploitation later, as time passes and knowledge accumulates.

Q13: b) How much you already know, how much time you have left, how variable the environment is, and how fast it is changing.

Q14: c) Broad early exploration followed by later specialization often produces superior outcomes, because it samples more of the landscape before committing.

Q15: b) Because a system that has stopped exploring is optimized for conditions that may no longer exist and will be unable to adapt when they change.

Q16: b) It provides a stable exploitation base (reliable structure) from which the soloist can safely depart and return.

Q17: b) The difference between the reward actually received and the reward that would have been received by always choosing the best arm.

Q18: b) Institutionalized exploration -- accepting a guaranteed cost for uncertain future benefit.

Q19: c) The right amount of exploration depends on time remaining, knowledge accumulated, and environmental stability -- and this balance should shift toward exploitation as the horizon shrinks.

Q20: d) Both (a) and (c).

Q21: Because exploration has a cost -- every trial spent on a suboptimal option is a missed opportunity to exploit the best-known option. Since you cannot explore all options exhaustively without consuming your entire budget, you must interleave exploration and exploitation, progressively shifting toward exploitation as you learn more.
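
For readers who want to see interleaving in action, the following is a minimal sketch and not a method taken from the chapter. It assumes arms that pay 0 or 1 with fixed hidden probabilities, and it uses a decaying epsilon-greedy rule as one simple stand-in for a cooling schedule. The function name `decaying_epsilon_greedy`, the schedule `n_arms / t`, and the example probabilities are all illustrative assumptions. The regret it reports is the expected shortfall against always pulling the best arm, as in Q17.

```python
import random


def decaying_epsilon_greedy(true_means, horizon, seed=0):
    """Interleave exploration and exploitation on 0/1-reward arms.

    true_means: hidden success probability of each arm (assumed for the demo).
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    best_mean = max(true_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        # Assumed cooling schedule: explore a lot early, rarely later.
        epsilon = min(1.0, n_arms / t)
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Expected regret: shortfall versus always pulling the best arm.
        regret += best_mean - true_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = decaying_epsilon_greedy([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative expected regret:", round(regret, 1))
```

Because exploration never stops entirely, the rule can still correct an early misjudgment, which a fixed "explore for N trials, then commit" plan cannot do.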

Q22: (a) Under-exploration: A company that never invests in R&D, relying entirely on existing products. Consequence: the company is blindsided when competitors innovate and its products become obsolete. (b) Under-exploitation: A researcher who reads widely across many fields but never writes papers or develops deep expertise in any one area. Consequence: the researcher generates no original contributions despite broad awareness.

Q23: Term limits (in government) or mandatory rotation policies (in corporate management) prevent leaders from permanently exploiting a single strategy by forcing periodic re-evaluation and fresh perspectives. Fund lifecycles in venture capital force the return of capital and the re-evaluation of strategy, preventing indefinite commitment to underperforming investments.

Q24: UCB looks at each option and asks: "Given what I know, what is the best this option could be?" It picks the option with the most optimistic plausible value, which means it naturally gravitates toward options that are either known to be good or not well enough understood to rule out. Thompson sampling instead rolls dice for each option -- the dice are weighted by what is known about each option -- and picks whichever option rolls highest. Options that could be great (because we are uncertain) have a chance of rolling high, which gives them a fair shot at being tried.
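
The contrast can also be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the chapter's own formulation: rewards are taken to be 0 or 1, the UCB rule uses the common UCB1-style bonus, and Thompson sampling uses a Beta distribution with a uniform starting belief. The function names `ucb_choice` and `thompson_choice` and the parameter `c` are hypothetical.

```python
import math
import random


def ucb_choice(counts, means, t, c=2.0):
    """Pick the arm whose optimistic estimate (mean + bonus) is highest.

    counts: pulls per arm; means: average reward per arm; t: total pulls so far.
    The bonus shrinks as an arm is tried more often (UCB1-style, assumed form).
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # an untried arm could be anything, so try it first
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_choice(successes, failures, rng):
    """Roll the 'weighted dice': one Beta draw per arm, highest draw wins.

    Beta(successes + 1, failures + 1) assumes a uniform prior on each arm.
    """
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


if __name__ == "__main__":
    # Arm 0 is well understood; arm 1 has barely been tried.
    print("UCB picks arm", ucb_choice([100, 2], [0.6, 0.5], t=102))
    rng = random.Random(0)
    picks = [thompson_choice([60, 1], [40, 1], rng) for _ in range(1000)]
    print("Thompson picked arm 1 in", picks.count(1), "of 1000 draws")
```

UCB is deterministic given the same history, while Thompson sampling can choose differently on each call, which is the "rolling dice" behavior described above.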

Q25: From the chapter: Securely attached toddlers explore more freely because the caregiver provides a reliable safe haven to return to. Original example (answers will vary): A professional with savings or a working spouse may take more career risks (exploration) because financial security (exploitation of existing resources) provides a safety net.


Scoring Guide

  • 20-25 correct (80-100%): Strong understanding. Proceed to Chapter 9.
  • 15-19 correct (60-79%): Adequate understanding with some gaps. Review the sections corresponding to missed questions before proceeding.
  • Below 15 (below 60%): Significant gaps in understanding. Re-read the chapter, focusing on the sections that correspond to missed questions, before attempting Chapter 9.