Chapter 14 Exercises
How to use these exercises: Work through the parts in order. Part A builds recognition skills, Part B develops analysis, Part C applies concepts to your own domain, Part D requires synthesis across multiple ideas, Part E stretches into advanced territory, and Part M provides interleaved practice that mixes skills from all levels.
For self-study, aim to complete at least Parts A and B. For a course, your instructor will assign specific sections. For the Deep Dive path, do everything.
Part A: Pattern Recognition
These exercises develop the fundamental skill of recognizing overfitting -- and distinguishing it from underfitting -- across domains.
A1. For each of the following scenarios, identify whether the primary risk is overfitting or underfitting. Explain your reasoning by identifying the model, the data, and the degrees of freedom.
a) A doctor treats all patients with chest pain as heart attack cases, regardless of age, medical history, or symptom profile.
b) A stock picker reads financial news, follows social media sentiment, tracks lunar cycles, monitors sunspot activity, and uses all of these to predict daily stock movements.
c) A teacher gives every student the same grade, reasoning that individual assignment scores fluctuate too much to be meaningful.
d) A political analyst attributes the outcome of every election to a single factor: the state of the economy.
e) A historian explains the French Revolution by citing 23 distinct causal factors, each supported by specific pieces of evidence.
f) A machine learning model with 50 million parameters is trained on a dataset of 500 examples.
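The parameter-to-data mismatch in scenario (f) can be seen at small scale. The sketch below is my own illustration (the data, polynomial degree, and seed are invented, not from the chapter): it fits a polynomial with exactly as many coefficients as training points.

```python
# Illustration: a model with as many parameters as data points can fit
# its training data almost perfectly yet still fail on fresh data.
import numpy as np

rng = np.random.default_rng(0)

# True relationship: y = x, observed with noise.
x_train = np.linspace(0, 1, 8)
y_train = x_train + rng.normal(0, 0.1, size=8)

# A degree-7 polynomial has 8 coefficients, one per training point,
# so it can interpolate the training data (near-)exactly.
coeffs = np.polyfit(x_train, y_train, deg=7)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# Fresh points from the same process play the role of test data.
x_test = np.linspace(0.05, 0.95, 8)
y_test = x_test + rng.normal(0, 0.1, size=8)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"train MSE: {train_mse:.2e}, test MSE: {test_mse:.2e}")
```

The training error is essentially zero while the test error is not: the extra coefficients memorized the noise.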
A2. For each of the following, identify the "training data" and the "test data." Then explain what it would mean for the model to overfit.
a) A chef develops a recipe by cooking for her family every night for a month, adjusting based on their feedback.
b) A basketball coach designs a game strategy based on reviewing footage of the opposing team's last ten games.
c) A therapist develops a treatment approach based on her first fifty patients.
d) A city planner designs traffic flow based on measurements taken during three consecutive weekdays.
e) A hiring manager develops a profile of "successful employees" based on the characteristics of the company's ten best performers.
A3. Classify each of the following as an example of apophenia, narrative overfitting, the multiple testing problem, or data snooping. Some may involve more than one.
a) A grandmother insists that her arthritis flares up before rain, even though weather records show no consistent correlation.
b) A pharmaceutical company tests its drug on 20 different patient subgroups and reports the one subgroup where the drug showed a significant effect.
c) A sports commentator explains a team's winning streak by pointing to the new assistant coach hired three weeks ago, a change in the pre-game meal routine, and the star player's new haircut.
d) A financial analyst tests 500 trading rules on historical data and presents the five most profitable ones to investors.
e) A conspiracy theorist notes that three powerful CEOs all traveled to the same city within the same month and interprets this as evidence of a secret summit.
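Scenario (d) can be checked by simulation. This is a minimal sketch with assumed numbers (no real market data): it backtests 500 rules that are, by construction, pure noise.

```python
# Data snooping in miniature: among many worthless rules, the best
# backtest result looks profitable by chance alone.
import numpy as np

rng = np.random.default_rng(42)
n_days, n_rules = 250, 500

# Daily market returns and rule signals are both pure noise.
market = rng.normal(0, 0.01, size=n_days)
signals = rng.choice([-1, 1], size=(n_rules, n_days))

# Each rule's backtest profit: sum over days of signal * return.
profits = signals @ market

print(f"best of {n_rules} random rules: {profits.max():+.3f}")
print(f"average rule: {profits.mean():+.3f}")
```

The best rule's edge is selected noise; on fresh data its expected profit is zero.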
A4. The chapter identifies Occam's razor as a regularization technique. For each of the following pairs of explanations, identify which is simpler (fewer degrees of freedom) and which is more complex. Then explain under what circumstances you might prefer the complex explanation.
a) "The patient has the flu" vs. "The patient has a rare autoimmune condition triggered by a specific environmental toxin."
b) "Sales declined because we raised prices" vs. "Sales declined because of a simultaneous shift in consumer preferences, a competitor's viral marketing campaign, seasonal demand variation, and a change in the algorithm of our primary advertising platform."
c) "The student failed because they didn't study" vs. "The student failed because of test anxiety, a learning disability, family stress, poor teaching, and an unfair grading rubric."
A5. The chapter discusses how the human brain evolved to overfit (to err on the side of seeing patterns that aren't there) because false negatives were more costly than false positives in the ancestral environment. For each of the following modern situations, evaluate whether the ancestral calibration is still appropriate or whether the cost structure has changed.
a) Hearing a noise in your house at night and assuming it is an intruder.
b) Noticing that your stomach feels odd after eating at a new restaurant and concluding the food was contaminated.
c) Observing that your investment portfolio dropped 5% on a day when a specific political event occurred and concluding that the event caused the drop.
d) Meeting three unfriendly people from a particular city and concluding that people from that city are unfriendly.
Part B: Analysis
These exercises require deeper analysis of overfitting concepts.
B1. The Degrees of Freedom Audit. Choose a claim that you have recently encountered in the news, in a book, or in a conversation. Conduct a "degrees of freedom audit":
a) Identify the claim and its source.
b) What data or evidence supports the claim?
c) How many interpretive choices were made in connecting the data to the claim? (These are the degrees of freedom.)
d) How large and representative is the underlying dataset?
e) Compute the informal ratio of degrees of freedom to data points. Is it high (risky) or low (safer)?
f) Has the claim been tested on independent data? If not, how could it be?
B2. Regularization Mapping. The chapter identifies regularization techniques in machine learning (L1/L2 penalties, dropout, early stopping), science (peer review, replication), finance (diversification), governance (constitutional constraints), and personal cognition (humility).
a) For each of the following domains, propose at least one regularization technique that operates there and explain how it functions to prevent overfitting: (i) journalism, (ii) law enforcement, (iii) education, (iv) software engineering, (v) relationship advice.
b) For each technique you identified, what is the cost of the regularization? What genuine patterns might it cause you to miss (bias increase)?
c) Are there domains where regularization is notably absent? What consequences would you predict?
B3. The Replication Crisis in Depth. The chapter describes the replication crisis as overfitting at the level of scientific fields.
a) Explain why small sample sizes contribute to overfitting in science. Use the degrees-of-freedom framework.
b) Explain how "researcher degrees of freedom" differ from model parameters in machine learning. Are they more or less dangerous? Why?
c) Pre-registration requires researchers to specify their hypotheses and analysis plans before collecting data. Explain why this reduces overfitting using the vocabulary of this chapter.
d) Some researchers argue that pre-registration is overly constraining and prevents serendipitous discoveries. Evaluate this argument using the bias-variance tradeoff. Is there merit to the concern?
B4. Conspiracy Thinking and Falsifiability. The chapter argues that conspiracy thinking is overfitting applied to the interpretation of events.
a) Choose a well-known conspiracy theory. Identify the "training data" (the events and facts the theory explains), the "model" (the theory itself), and the "degrees of freedom" (the interpretive flexibility in the theory).
b) Does the theory make falsifiable predictions -- predictions about new evidence that could prove it wrong? If not, why does this make it more likely to be overfit?
c) How does Occam's razor apply? Is there a simpler explanation that accounts for the same key evidence?
d) Why is the unfalsifiability of a theory related to overfitting? What is the connection between a model that can explain any data and a model that has too many degrees of freedom?
B5. The Bias-Variance Tradeoff in Practice. Consider the following claim: "All swans are white."
a) A person who has observed 10 swans (all white) makes this claim. What is the risk of overfitting? What is the risk of underfitting?
b) A person who has observed 10,000 swans (all white) makes this claim. Has the risk of overfitting decreased? Why or why not?
c) What kind of evidence would constitute an "out-of-sample test" for this claim?
d) In 1697, Dutch explorers discovered black swans in Australia. Using the vocabulary of this chapter, explain what happened. Was the claim "all swans are white" overfit, underfit, or something else?
Part C: Application
These exercises ask you to apply overfitting concepts to your own experience and context.
C1. Identify three beliefs you currently hold -- one personal, one professional, and one about the world -- and for each:
a) What data or experience is the belief based on?
b) How large and representative is that dataset?
c) Could the pattern you see be noise rather than signal? What would it take to test this?
d) What regularization technique could you apply to prevent overfitting?
C2. Think of a time when you changed your mind about something important. Using the vocabulary of this chapter:
a) What was the original model (belief)?
b) What was the "training data" it was based on?
c) What "test data" (new evidence or experience) caused the model to fail?
d) Was the original belief overfit, underfit, or appropriately fit?
C3. In your professional domain, identify one common belief or practice that might be overfit to a specific context. Explain:
a) What the belief or practice is.
b) The specific context (the "training data") where it developed and works well.
c) The contexts (the "test data") where it might fail.
d) What regularization technique could make the belief or practice more robust.
C4. Design a personal "regularization routine" -- a set of habits or practices that would help you avoid overfitting in your everyday reasoning. Include at least one technique from each of the following categories:
a) Seeking disconfirming evidence.
b) Reducing degrees of freedom (simplifying your explanations).
c) Out-of-sample testing (checking your beliefs against new or unfamiliar data).
d) Peer review (consulting others who might see things differently).
Part D: Synthesis
These exercises require integrating overfitting with concepts from multiple chapters.
D1. Overfitting and Signal Detection (Chapter 6). The chapter argues that overfitting is "confusing noise for signal."
a) Using the vocabulary of Chapter 6 (sensitivity, specificity, false positives, false negatives), explain overfitting and underfitting. What is the sensitivity and specificity of an overfit model? An underfit model?
b) In signal detection theory, the detection threshold determines the tradeoff between false positives and false negatives. What is the analogue of the detection threshold in the bias-variance tradeoff?
c) The chapter on signal and noise introduced the concept of the noise floor. How does the concept of the noise floor relate to the concept of irreducible error in the bias-variance decomposition?
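For part (c), the link between the noise floor and irreducible error can be demonstrated directly. In the sketch below (my own setup; the linear ground truth and sigma are invented), even the exact data-generating model cannot predict below the noise variance.

```python
# Irreducible error: the *true* model, with zero estimation error,
# still has mean squared error equal to the noise variance.
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.5  # noise standard deviation: the "noise floor"
x = rng.uniform(-1, 1, size=100_000)
y = 2 * x + rng.normal(0, sigma, size=x.size)

y_hat = 2 * x  # predict with the exact data-generating function
mse = np.mean((y - y_hat) ** 2)

print(f"MSE of the true model: {mse:.3f} (noise variance: {sigma**2:.3f})")
```

No model, however well fit, can do better than this floor on average; apparent improvements below it are overfitting.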
D2. Overfitting and Bayesian Reasoning (Chapter 10). The chapter connects overfitting to Bayesian reasoning through the concept of priors.
a) Explain how a strong prior functions as regularization. How does a strong prior reduce the risk of overfitting? What is the cost (in terms of bias)?
b) A Bayesian agent with a uniform prior (no prior information) is maximally flexible. Is this agent more or less prone to overfitting than one with a strong prior? Why?
c) Bayes' theorem says the posterior should combine the prior with the likelihood. If the prior is very strong and the data is noisy, the posterior will be dominated by the prior. Is this overfitting or underfitting? Explain.
d) The chapter on Bayesian reasoning discussed the replication crisis in terms of base rate neglect. How does this Bayesian perspective complement the overfitting perspective presented in this chapter?
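The shrinkage at issue in parts (a)-(c) can be computed by hand for a coin. The sketch below uses my own numbers and the standard Beta-Binomial model, in which a Beta(a, b) prior updated on h heads in n flips has posterior mean (a + h) / (a + b + n).

```python
# A strong prior acts as regularization: it shrinks the estimate of a
# coin's bias toward 0.5, resisting a fluke run of heads.
def posterior_mean(a, b, heads, flips):
    """Posterior mean of the heads probability under a Beta(a, b) prior."""
    return (a + heads) / (a + b + flips)

heads, flips = 3, 3  # three flips, all heads

mle = heads / flips                            # no prior: estimate 1.0
uniform = posterior_mean(1, 1, heads, flips)   # flat prior: 0.8
strong = posterior_mean(10, 10, heads, flips)  # strong fairness prior: ~0.57

print(mle, uniform, round(strong, 3))
```

The maximally flexible estimate chases the data all the way to certainty; the strong prior trades a little bias for much lower variance.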
D3. Overfitting and Satisficing (Chapter 12). The chapter argues that satisficing is a form of regularization.
a) Explain how "stopping early" (accepting a good-enough solution) prevents overfitting in machine learning. How does this map onto Herbert Simon's satisficing?
b) Schwartz showed that maximizers often make "better" choices but are less happy. Could this paradox be explained by overfitting? (Hint: is the maximizer overfitting their decision to a specific set of criteria that may not generalize to their future preferences?)
c) The 1/N rule (allocating equally across options) was discussed in Chapter 12 as a satisficing strategy. Explain why it is also a regularization strategy, using the vocabulary of this chapter.
d) If satisficing prevents overfitting, does optimizing always lead to overfitting? Under what conditions can optimization avoid overfitting?
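Part (c) can be illustrated with a toy portfolio (invented numbers; by construction every asset has the same true return, so there is no real edge to find). Concentrating on the best *estimated* asset fits noise in the sample means; equal weighting has no estimated parameters to overfit.

```python
# Why 1/N is a regularization strategy: the "optimizer" chases
# estimation noise, while equal weighting ignores it entirely.
import numpy as np

rng = np.random.default_rng(7)
n_assets, n_obs = 20, 60

# Every asset's true mean return is zero.
returns = rng.normal(0.0, 0.05, size=(n_obs, n_assets))
sample_mu = returns.mean(axis=0)

# "Optimizer": bet everything on the asset with the best sample mean.
opt_in_sample = sample_mu.max()

# 1/N: equal weights, nothing estimated from the data.
equal_in_sample = sample_mu.mean()

print(f"in-sample: optimizer {opt_in_sample:+.4f}, 1/N {equal_in_sample:+.4f}")
# Out of sample, both strategies' expected return is the true mean: zero.
# The optimizer's in-sample edge was entirely estimation noise.
```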
D4. Overfitting and Feedback Loops (Chapter 2). The chapter briefly mentions feedback loops in the context of financial markets.
a) Explain how a positive feedback loop between a trading strategy and market behavior can cause the strategy to appear to work in backtesting but fail in live trading. How is this related to overfitting?
b) Peer review is described as regularization. But peer review can also create feedback loops -- reviewers may favor papers that confirm the prevailing paradigm. Could peer review, intended as regularization, itself become a source of underfitting (suppressing genuine but surprising findings)?
c) How does the concept of overfitting relate to the echo chamber effect in social media? Is an echo chamber an overfitting problem, an underfitting problem, or both?
Part E: Extension
These exercises push beyond the chapter's content into more advanced territory.
E1. The No-Free-Lunch Theorem. The "No Free Lunch" theorem in machine learning states that no learning algorithm is universally better than any other when performance is averaged across all possible problems. It can be read as a formalization of the bias-variance tradeoff: the assumptions that help an algorithm on some problems necessarily hurt it on others.

a) Explain, in non-technical language, what the No Free Lunch theorem says and why it implies that the bias-variance tradeoff is inescapable.
b) What does this theorem imply for the search for a "theory of everything" in physics? Is a theory of everything possible in principle, or does the No Free Lunch theorem suggest that all theories must overfit some aspects of reality and underfit others?
c) If there is no universal best algorithm, how should a practitioner choose which algorithm to use? What role does domain knowledge play?
E2. Overfitting in Artificial Intelligence Alignment. The AI alignment problem asks how to build AI systems that pursue human-intended goals.
a) How could an AI system "overfit" to its training signal -- learning to optimize a reward function in ways that do not generalize to what humans actually want?
b) The concept of "reward hacking" -- where an AI finds an unintended way to maximize its reward -- is closely related to overfitting. Explain the connection.
c) What regularization techniques could be applied to AI alignment? How do they map onto the regularization techniques discussed in this chapter?
E3. Overfitting and the Philosophy of Science. Karl Popper argued that the mark of a scientific theory is falsifiability -- the ability to specify conditions under which the theory would be proven wrong.
a) Explain the connection between falsifiability and regularization. How does requiring falsifiability constrain a theory's degrees of freedom?
b) Thomas Kuhn argued that normal science operates within paradigms that are not easily falsified. Could a Kuhnian paradigm be understood as an overfitting problem? What would the "test data" be that would reveal the overfitting?
c) The chapter argues that Occam's razor is regularization. Is Occam's razor the same as falsifiability? If not, how do they differ as regularization techniques?
Part M: Mixed Practice (Interleaved Review)
These problems deliberately mix concepts from Chapters 6, 10, 12, and 14 to strengthen retrieval and transfer.
M1. A new medical screening test has been developed. In the development phase, the researchers tested 50 different biomarkers and found that biomarker #37 had a statistically significant correlation with the disease (p < 0.05). They developed a screening test based on biomarker #37 and reported high accuracy.
a) From an overfitting perspective (Chapter 14), what is the problem with this process? How does the multiple testing problem apply?
b) From a signal-and-noise perspective (Chapter 6), is biomarker #37 likely to be signal or noise? What would you need to determine this?
c) From a Bayesian perspective (Chapter 10), what is the prior probability that any randomly selected biomarker is associated with the disease? How does this prior affect the posterior probability that biomarker #37 is a real finding?
d) What regularization techniques should the researchers apply before deploying the test clinically?
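The scale of M1's multiple-testing problem follows from one line of probability. Assuming, for illustration, that the 50 tests are independent and that no biomarker is truly associated with the disease:

```python
# If no biomarker is real, the chance that at least one of 50
# independent tests clears p < 0.05 by luck alone is about 92%.
alpha, k = 0.05, 50
p_false_discovery = 1 - (1 - alpha) ** k
print(f"P(at least one 'significant' biomarker): {p_false_discovery:.3f}")

# A Bonferroni correction regularizes by shrinking the per-test threshold.
corrected_alpha = alpha / k
print(f"Bonferroni per-test threshold: {corrected_alpha:.4f}")
```

Under these assumptions, finding "biomarker #37" was the expected outcome, not a discovery.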
M2. A manager at a software company notices that the three most productive employees all have standing desks. She proposes buying standing desks for the entire engineering team to boost productivity.
a) Using the vocabulary of Chapter 14, explain why this conclusion might be overfit. Identify the training data, the model, and the degrees of freedom.
b) Using the vocabulary of Chapter 6, identify the signal and the noise in this observation. What is the signal-to-noise ratio?
c) Using the vocabulary of Chapter 12, is the manager maximizing or satisficing? Would a satisficing approach lead to a different conclusion?
d) Using the vocabulary of Chapter 10, what is the manager's implicit prior? How should she update it given such limited evidence?
M3. A history student writes an essay arguing that World War I was inevitable because of the alliance system, arms race, and imperial competition. The student supports each causal factor with specific evidence from the pre-war period.
a) From an overfitting perspective, evaluate the essay. Is it overfit, underfit, or appropriately fit? Consider the degrees of freedom and the sample size (one war).
b) From a signal-and-noise perspective, how can the student distinguish between causes that were genuine drivers of the war (signal) and causes that merely coincided with it (noise)?
c) From a Bayesian perspective, how should the student's prior beliefs about the causes of war affect the essay's argument?
d) What "regularization" techniques should the student apply to strengthen the essay? (Hint: think about counterfactual analysis and comparison with other cases.)
M4. You are advising a friend who has tried three different diets, each for two weeks. On the third diet, she lost four pounds. She is convinced that the third diet is the right one for her body and plans to follow it for a year.
a) Is this conclusion likely overfit? Identify the degrees of freedom and the data.
b) What is the "test data" that would validate or invalidate her conclusion?
c) From a satisficing perspective, is two weeks on each diet sufficient? What threshold should she use?
d) What role might confirmation bias (Chapter 10) play in her interpretation of the data?
e) Design an experiment she could run to reduce the overfitting risk.