Chapter 7 Exercises
How to use these exercises: Work through the parts in order. Part A builds recognition skills, Part B develops analysis, Part C applies concepts to your own domain, Part D requires synthesis across multiple ideas, Part E stretches into advanced territory, and Part M provides interleaved practice that mixes skills from all levels.
For self-study, aim to complete at least Parts A and B. For a course, your instructor will assign specific sections. For the Deep Dive path, do everything.
Part A: Pattern Recognition
These exercises develop the fundamental skill of recognizing gradient descent across domains.
A1. For each of the following scenarios, identify: (a) the system performing gradient descent, (b) what quantity is being minimized or maximized, (c) the "gradient" the system follows, and (d) what a local optimum would look like.
a) A plant growing toward sunlight, bending its stem in the direction of brightest light.
b) A shopper comparing prices at several grocery stores and gradually settling on a preferred store for each category of goods.
c) A river delta forming as water spreads across a flat coastal plain.
d) A species of moth gradually evolving darker coloration in an industrialized area where tree bark is soot-covered.
e) A thermostat adjusting a furnace to maintain a set temperature.
A2. The chapter describes water flowing downhill as the simplest example of gradient descent. Identify three other physical systems that perform gradient descent on energy landscapes (systems that naturally minimize some form of energy). For each, describe what "downhill" means.
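Before working A2, it may help to see gradient descent stripped to its bare mechanics. The toy sketch below (the quadratic "valley" and step size are illustrative choices, not taken from the chapter) steps against the slope of a one-dimensional landscape, just as water moves opposite the gradient of the terrain:

```python
# Minimal sketch: gradient descent on a 1-D "valley" f(x) = (x - 3)^2.
# The derivative points uphill, so each step moves against it.

def grad(x):
    return 2 * (x - 3)        # derivative of (x - 3)^2

x = 10.0                      # start high on the valley wall
step_size = 0.1               # how far each step moves
for _ in range(100):
    x -= step_size * grad(x)  # step downhill

print(round(x, 4))            # settles at the valley floor, x = 3
```

Every example in Part A is, at bottom, some physical or social realization of this loop: sense the local slope, take a small step against it, repeat.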
A3. Classify each of the following as a local optimum, a global optimum, or a saddle point. Justify your classification.
a) A ball resting at the bottom of a shallow dip on a hillside, with deeper valleys visible on either side.
b) A company that dominates its market and has the highest profit margins in its industry.
c) A mountain pass -- the lowest point along the ridge that connects two peaks, yet the highest point on the trail that crosses from one valley to the other.
d) A programming language that most developers know but that most language designers consider poorly designed.
e) A person stuck in a job they dislike but that pays well, in a field with few alternatives that do not require retraining.
A4. The chapter argues that the difficulty of an optimization problem is determined primarily by the landscape, not the algorithm. For each pair below, determine which problem has a more rugged landscape and explain why:
a) Finding the lowest point in a smooth, round valley vs. finding the lowest point in the Swiss Alps.
b) Optimizing a recipe by adjusting one ingredient at a time vs. optimizing a recipe where every ingredient's effect depends on the quantities of all other ingredients.
c) Finding the cheapest flight on a route with many daily departures vs. finding the cheapest combination of three connecting flights across different airlines.
A5. For each of the following, identify whether the system is performing gradient descent, gradient ascent, or both simultaneously:
a) Evolution by natural selection.
b) A ball rolling in a bowl.
c) A business maximizing profit while minimizing cost.
d) A neural network during training.
e) Water evaporating from a puddle on a warm day.
A6. Identify the "step size" in each of the following gradient descent processes. What determines how large or small each step is?
a) A mutation in a gene.
b) A daily price change in a stock market.
c) A single weight update in neural network training.
d) An ant laying pheromone on a trail.
e) A river eroding a channel through rock.
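A6's notion of "step size" can be made concrete with a short numeric sketch. The three step sizes below are illustrative values chosen to show the three regimes on the same quadratic valley, not thresholds stated in the chapter:

```python
# Effect of step size on gradient descent over f(x) = x^2 (gradient 2x).

def descend(step_size, steps=50, x=5.0):
    for _ in range(steps):
        x -= step_size * 2 * x
    return x

small = descend(0.01)   # creeps toward 0; still far away after 50 steps
good  = descend(0.3)    # converges quickly
big   = descend(1.1)    # overshoots farther on every step and diverges

print(abs(small), abs(good), abs(big))
```

The same trade-off shows up in each scenario of A6: a step too small and the system adapts too slowly; a step too large and it overshoots the optimum it is trying to track.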
Part B: Analysis
These exercises require deeper analysis of gradient descent concepts.
B1. The chapter describes the vertebrate eye's backward wiring as an example of a local optimum trap. Identify two other examples from biology where organisms appear to be stuck with suboptimal designs because evolution could not cross a fitness valley. For each, explain: (a) what the current design is, (b) what a better design might look like, (c) why the intermediate steps between the two would be detrimental, and (d) why natural selection cannot make the transition.
B2. Consider the QWERTY keyboard as a local optimum in the landscape of keyboard designs.
a) What is the "fitness function" for a keyboard layout? What would you measure to determine whether one layout is better than another?
b) Why is QWERTY a local optimum rather than a global one? What evidence exists for superior alternatives?
c) Map the forces that maintain the QWERTY local optimum. What would have to change for the market to escape this basin of attraction?
d) Is QWERTY strictly analogous to the vertebrate eye (a design flaw locked in by path dependence), or are there important differences? Explain.
B3. The chapter draws a parallel between market equilibrium and gradient descent. Analyze the limits of this analogy:
a) In what ways does a real market not behave like a simple gradient descent algorithm?
b) What role do information asymmetries play? (In textbook gradient descent, the algorithm reads the true gradient directly. Do market participants have access to the true supply-demand gradient?)
c) How do speculative bubbles relate to the gradient descent framework? Is a bubble a failure of gradient following or a result of following a distorted gradient?
d) When do markets fail to converge to equilibrium? What landscape features would cause this?
B4. Compare and contrast the gradient descent performed by evolution and the gradient descent performed by neural network training. Fill in a comparison table with the following categories: (a) what is being optimized, (b) what constitutes a "step," (c) what determines the step size, (d) how the gradient is estimated, (e) the typical dimensionality of the landscape, (f) the typical ruggedness of the landscape, (g) strategies for escaping local optima.
B5. The chapter mentions that the local optima problem may be less severe in very high-dimensional spaces. Analyze this claim:
a) Why does increasing the dimensionality of the landscape make true local minima (where every dimension curves upward) increasingly rare?
b) What are saddle points, and why do they dominate the landscape in high dimensions?
c) What does this imply about the relationship between the complexity of a system (number of adjustable parameters) and the difficulty of optimizing it?
d) Does this insight apply to biological evolution? Why or why not?
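The counting argument behind B5(a) can be simulated directly. The sketch below uses a deliberately crude assumption -- that each dimension at a random critical point independently curves up or down with probability 1/2 -- which is a toy model, not an exact property of real landscapes:

```python
# Toy model for B5: a local minimum requires *every* dimension to curve
# upward, so under independent coin-flip curvatures, minima become
# exponentially rare (about 0.5 ** dims) as dimensionality grows.

import random

random.seed(0)

def fraction_of_minima(dims, trials=100_000):
    """Fraction of simulated critical points where every dimension curves up."""
    minima = sum(
        all(random.random() < 0.5 for _ in range(dims))
        for _ in range(trials)
    )
    return minima / trials

fractions = {d: fraction_of_minima(d) for d in (1, 2, 5, 10)}
print(fractions)   # roughly 0.5 ** d for each dimension count d
```

All the other critical points -- up in some dimensions, down in others -- are saddles, which is why saddles dominate high-dimensional landscapes.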
B6. The chapter describes pheromone evaporation in ant colonies as a form of regularization that prevents lock-in. Identify analogous mechanisms in three other domains -- mechanisms that prevent systems from becoming permanently trapped in local optima by introducing controlled forgetting or decay.
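To anchor B6, here is a deliberately simplified two-trail model of the pheromone dynamics. The starting amounts, trail qualities, and evaporation rate are illustrative assumptions chosen to make the lock-in effect visible within a few hundred ticks:

```python
# Two trails: A has a large historical head start, but B is twice as good.
# Ants follow pheromone (positive feedback); evaporation is the controlled
# forgetting that lets the colony escape the stale choice.

def run(evaporation, steps=500):
    pheromone = {"A": 100.0, "B": 1.0}     # A's head start
    quality = {"A": 1.0, "B": 2.0}         # B is the better trail
    for _ in range(steps):
        total = pheromone["A"] + pheromone["B"]
        for trail in pheromone:
            share = pheromone[trail] / total            # traffic follows pheromone
            pheromone[trail] += share * quality[trail]  # deposit scales with traffic and quality
            pheromone[trail] *= 1 - evaporation         # decay of old information
    return pheromone

locked = run(evaporation=0.0)    # the head start dominates for the whole run
adaptive = run(evaporation=0.1)  # decay erases the head start; B takes over
print(locked, adaptive)
```

Your three analogous mechanisms should play the role the evaporation term plays here: a steady discounting of accumulated past signal so that a better current option can win.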
Part C: Application to Your Domain
These exercises ask you to apply gradient descent concepts to a field you know well.
C1. Choose a domain you are familiar with (your profession, a hobby, an academic field). Describe a specific optimization problem in that domain using the landscape metaphor. Your description should include:
a) What is being optimized (the "height" on the landscape).
b) What the dimensions of the landscape are (the variables that can be adjusted).
c) Whether the landscape is smooth or rugged, and why.
d) At least one local optimum that practitioners commonly get stuck in.
e) Any strategies that practitioners use, consciously or unconsciously, to escape local optima.
C2. Think of a time in your own life when you were stuck in a local optimum -- a situation that was "good enough" locally but prevented you from finding something better. Using the landscape metaphor:
a) Describe the local peak you were on. What made it stable?
b) What did the valley between your local peak and a better solution look like? What costs or risks would you have had to accept?
c) Did you escape the local optimum? If so, how? If not, why not?
d) Could any of the escape strategies discussed in Section 7.10 have helped?
C3. Identify a gradient that you follow regularly in your daily life -- a quantity that you unconsciously try to minimize or maximize, step by step, based on local information. Describe the gradient, the steps you take, and whether you have ever noticed yourself getting stuck in a local optimum as a result.
Part D: Synthesis
These exercises require integrating ideas from multiple chapters.
D1. Gradient Descent and Emergence (Ch. 3). The chapter argues that ant foraging is both an example of emergence and an example of gradient descent. Explain how these two perspectives complement each other. What does the emergence lens reveal that the gradient descent lens misses? What does the gradient descent lens reveal that the emergence lens misses? Can you find another example where both lenses apply simultaneously?
D2. Gradient Descent and Phase Transitions (Ch. 5). Consider a system performing gradient descent on a landscape that undergoes a phase transition (the landscape itself changes suddenly). Using examples from the chapter or your own knowledge:
a) What happens to a gradient-following system when the landscape beneath it transforms?
b) Can a phase transition help a system escape a local optimum? Explain.
c) Can a phase transition trap a system in a new local optimum? Explain.
d) How does this connect to the concept of "creative destruction" in economics?
D3. Gradient Descent and Signal and Noise (Ch. 6). When a system follows a gradient, the gradient signal may be corrupted by noise. Analyze this interaction:
a) What happens to gradient descent when the gradient estimate is noisy? Does the system still converge?
b) In evolution, the "gradient signal" is the fitness difference between variants. Under what conditions is this signal too weak to detect (i.e., buried in noise)?
c) The chapter mentions that stochastic gradient descent (adding noise to the gradient) can actually help by preventing entrapment in local optima. How does this relate to the signal-and-noise framework? When is noise a problem, and when is it a solution?
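The claim in D3(c) can be checked on a small example. The double-well curve, noise level, and step counts below are illustrative choices for this exercise set, not parameters from the chapter:

```python
# A 1-D curve f(x) = x^4/4 - x^2/2 + 0.2x with a shallow local minimum
# near x = 0.88 and a deeper global minimum near x = -1.09. Plain gradient
# descent started in the shallow basin stays there; noisy gradients can
# kick the walker over the barrier between the basins.

import random

random.seed(1)

def grad(x):
    return x**3 - x + 0.2      # derivative of x^4/4 - x^2/2 + 0.2x

def descend(noise, steps=5000, x=1.0, step_size=0.05):
    for _ in range(steps):
        x -= step_size * (grad(x) + random.gauss(0, noise))
    return x

plain = descend(noise=0.0)     # trapped in the shallow basin near 0.88
noisy = descend(noise=1.0)     # noise lets it hop into the deep basin
print(plain, noisy)
```

Note the asymmetry that makes this work: the barrier out of the shallow basin is low relative to the noise, while the barrier out of the deep basin is much higher, so the noisy walker escapes the bad optimum far more often than the good one. That is the signal-and-noise trade the exercise asks about.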
D4. Gradient Descent and Feedback Loops (Ch. 2). Gradient descent relies on feedback -- the system takes a step, observes the result, and adjusts. Analyze the type of feedback involved:
a) Is gradient descent driven by positive or negative feedback? Explain.
b) What happens when there is a delay in the feedback? (The system takes a step but does not observe the result until much later.)
c) How does the ant pheromone system combine both positive feedback (trail reinforcement) and implicit negative feedback (pheromone evaporation)? Why is this combination important?
D5. Write a short essay (500-800 words) arguing either for or against the following claim: "Gradient descent is the most fundamental optimization strategy in nature, and all other optimization strategies are variations or extensions of it." Use examples from at least three domains.
Part E: Advanced Extensions
These exercises push beyond the chapter's scope into challenging territory.
E1. Research the concept of NK landscapes introduced by Stuart Kauffman. Explain how the parameters N (number of components) and K (degree of interaction) affect landscape ruggedness. What value of K produces a smooth landscape? What value produces maximum ruggedness? What does this framework predict about the difficulty of optimization in systems with high interdependence?
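As a starting point for E1, here is one common way to implement the NK model and measure ruggedness by exhaustively counting local optima. The neighborhood convention (each component interacts with its K successors on a ring) and the sizes used are illustrative assumptions:

```python
# Kauffman's NK model: N binary components; component i's fitness
# contribution depends on its own state and the states of its K neighbors.
# Counting local optima (genomes beating all single-bit flips) shows
# ruggedness growing with K: K = 0 gives a single peak, K = N - 1 gives
# a maximally rugged random landscape.

import itertools
import random

random.seed(0)

def nk_landscape(n, k):
    # contributions[i]: random fitness value for each state of component i
    # and its k ring-neighbors
    contributions = [
        {bits: random.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]
    def fitness(genome):
        return sum(
            contributions[i][tuple(genome[(i + j) % n] for j in range(k + 1))]
            for i in range(n)
        ) / n
    return fitness

def count_local_optima(n, k):
    f = nk_landscape(n, k)
    count = 0
    for genome in itertools.product((0, 1), repeat=n):
        fit = f(genome)
        flips = (genome[:i] + (1 - genome[i],) + genome[i + 1:] for i in range(n))
        if all(fit > f(nb) for nb in flips):
            count += 1
    return count

smooth = count_local_optima(8, 0)   # K = 0: exactly one peak
rugged = count_local_optima(8, 7)   # K = N - 1: many peaks
print(smooth, rugged)
```

Running variants of this with intermediate K values is a good way to build intuition for the interdependence question the exercise poses.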
E2. The chapter mentions that momentum -- carrying forward velocity from previous steps -- can help gradient descent navigate complex landscapes. Analyze the physics analogy: compare gradient descent with momentum to a ball rolling downhill with inertia. What physical phenomena (oscillation, overshooting, damping) have analogues in optimization? Why do engineers often add a "damping" term to momentum-based methods?
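A minimal version of the momentum update in E2 makes the damping question concrete. The coefficients below are illustrative; the update rule is the standard heavy-ball form (velocity decays by a damping factor, then the gradient accelerates it):

```python
# Gradient descent with momentum on f(x) = x^2. The velocity carries
# inertia between steps; the damping factor bleeds it off so the "ball"
# does not oscillate around the valley floor indefinitely.

def descend_with_momentum(damping, steps=200, x=5.0, step_size=0.1):
    velocity = 0.0
    for _ in range(steps):
        velocity = damping * velocity - step_size * 2 * x  # gradient of x^2 is 2x
        x += velocity
    return x

underdamped = descend_with_momentum(damping=0.99)  # still oscillating after 200 steps
damped = descend_with_momentum(damping=0.8)        # settles at the minimum
print(abs(underdamped), abs(damped))
```

The physical analogues the exercise asks for map directly onto these two runs: with too little damping the ball overshoots and oscillates across the valley; with enough damping the oscillation dies out and the ball comes to rest at the bottom.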
E3. Consider the concept of a deceptive landscape -- a landscape in which the gradient consistently points away from the global optimum. (That is, following the gradient downhill reliably leads you to a local optimum that is far from the global one.) Give an example of a deceptive landscape from any domain. Why are deceptive landscapes particularly challenging? What strategies might work on a deceptive landscape where gradient descent fails?
E4. The chapter compares the local optima problem in low-dimensional versus high-dimensional landscapes. Research the concept of the loss surface in deep learning. What recent empirical evidence exists about the structure of loss surfaces in large neural networks? How does this evidence support or challenge the claim that local optima are less of a problem in high dimensions?
E5. Consider the philosophical question: Is evolution "optimal"? Given what you know about gradient descent and local optima, argue for or against the claim that natural selection produces the best possible organisms. What does "best possible" even mean in this context?
Part M: Mixed Practice (Interleaved Review)
These problems deliberately mix concepts from this chapter with concepts from Chapters 1-6 to strengthen cross-domain transfer.
M1. (Ch. 1 + Ch. 7) The chapter argues that gradient descent is substrate-independent. But are there features of specific substrates that affect how well gradient descent works? Compare the "hardware" on which gradient descent runs in three different domains (evolution, markets, neural networks) and explain how substrate-specific features (mutation rate, transaction costs, numerical precision) influence the algorithm's behavior.
M2. (Ch. 4 + Ch. 7) Chapter 4 discussed power law distributions. The chapter mentions that the distribution of local optima depths on rugged landscapes can follow power law patterns. What would this mean practically? If most local optima are shallow (easily escaped) but a few are very deep (nearly inescapable), how should this affect an optimization strategy?
M3. (Ch. 5 + Ch. 7) Design a thought experiment in which a phase transition (Ch. 5) helps a population escape a local optimum on a fitness landscape (Ch. 7). Be specific about what system you are considering, what the local optimum is, what the phase transition is, and how the transition reshapes the landscape.
M4. (Ch. 2 + Ch. 3 + Ch. 7) The chapter argues that ant foraging combines feedback loops (Ch. 2), emergence (Ch. 3), and gradient descent (Ch. 7). Identify another system that combines all three of these patterns. For your chosen system, explain how each pattern manifests and how they interact.
M5. (Ch. 6 + Ch. 7) A central bank adjusting interest rates to control inflation is performing gradient descent. Apply the signal-and-noise framework from Chapter 6: What is the signal (the true gradient the bank needs to follow)? What is the noise? How does the bank estimate the gradient in the presence of noise? What are the consequences of following a noisy gradient?