Case Study 10.1: David Learns Why Neural Networks Work the Way They Do

From Memorizing Steps to Understanding Mechanisms


David had been in tutorial hell for four months.

For those unfamiliar: tutorial hell is the state of consuming an endless stream of instructional content — videos, articles, courses, books — without making real progress. It feels productive. You're learning constantly. But you can't build anything you haven't seen before. You can't troubleshoot when something breaks. You can't explain what you've "learned" to someone else.

David had completed three separate machine learning courses online. He'd watched hours of video explanations of backpropagation. He'd run the code, gotten it to work, and moved on. By his own description: "I knew what the algorithm did, in the sense that I could follow the steps. I had no idea why it worked."

The distinction matters enormously. Someone who understands the steps of an algorithm can apply it when conditions are identical to what they learned. Someone who understands the mechanism can adapt, troubleshoot, extend, and transfer the knowledge to new situations.

David wanted the second kind of understanding. He just didn't know how to get it.


The Backpropagation Problem

Backpropagation is the algorithm used to train neural networks — to adjust the network's weights in response to prediction errors. It is the fundamental mechanism that makes modern deep learning possible.

Every introduction to neural networks explains backpropagation with the same general structure:

1. Make a forward pass (compute a prediction)
2. Calculate the error (how wrong is the prediction?)
3. Compute gradients (how much did each weight contribute to the error?)
4. Update the weights (adjust each weight by a small amount in the direction that reduces error)
5. Repeat
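The five steps can be sketched as a minimal training loop. This is an illustrative sketch, not code from David's courses: a one-weight linear model (y_pred = w * x) fitted to made-up data with squared error and plain gradient descent.

```python
# Minimal sketch of the five-step loop for a one-weight linear model.
# Data and values are illustrative; the true relationship is y = 2x.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs
w = 0.0    # the single weight, starting from scratch
lr = 0.1   # learning rate

for epoch in range(50):                # 5. repeat
    for x, target in data:
        y_pred = w * x                 # 1. forward pass (the prediction)
        error = y_pred - target        # 2. calculate the error
        grad = 2 * error * x           # 3. gradient of error**2 w.r.t. w
        w -= lr * grad                 # 4. nudge w to reduce the error

print(round(w, 3))  # converges toward 2.0
```

Each pass nudges `w` a little in the direction that shrinks the squared error, which is the whole of gradient descent in miniature.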

David could recite these steps. He could run backpropagation code. On a tutorial's exercises, he got everything right.

But he hit a wall when he tried to use a neural network on a real project and got poor training results. He couldn't diagnose the problem because he didn't have a working mental model of why each step produced the intended effect.

He was stuck not because he'd memorized the wrong steps, but because he'd memorized the steps without understanding the causal chain underneath them.


The Elaboration Shift: Asking "Why?" at Every Step

David had read Chapter 10 of a learning science text. He decided to apply it aggressively to backpropagation.

He printed out a worked example of backpropagation — a simple neural network with two layers and numerical values at every step. For each line of the computation, he forced himself to answer:

Why does this step follow from the previous one? What principle or concept is being applied here? What would happen if we did this step differently?

The first time through, he got stuck almost immediately.

Step 3 — "compute the gradient of the loss with respect to each weight" — had seemed straightforward in the tutorials. But when he asked himself "why does the chain rule apply here?" he realized he'd been treating the chain rule as a magical procedure rather than as a consequence of how derivatives work for composite functions.

He stopped and spent two hours specifically on calculus — not neural network calculus, just the chain rule, built up from first principles. He worked examples by hand. He asked: "Why does differentiating a composite function require multiplying derivatives? What does it mean to measure how much a change in x affects the output through an intermediate variable?"

By the end of those two hours, the chain rule wasn't a procedure — it was a logical consequence of how rates of change compose. He could see why it was the right tool for backpropagation.
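The insight David reached can be checked numerically. The sketch below (illustrative, not from his notes) computes the derivative of a composite function two ways: analytically via the chain rule, and empirically via a finite difference. The agreement is the point.

```python
import math

# Chain rule sketch for f(g(x)) with f(u) = u**2 and g(x) = sin(x).
# The chain rule says d/dx [sin(x)**2] = 2*sin(x) * cos(x):
# the outer derivative evaluated at the inner value, times the inner derivative.

def composite(x):
    return math.sin(x) ** 2

def chain_rule_derivative(x):
    return 2 * math.sin(x) * math.cos(x)

def numerical_derivative(fn, x, h=1e-6):
    # Central finite difference: how much a small change in x moves the output
    return (fn(x + h) - fn(x - h)) / (2 * h)

x = 0.7
print(chain_rule_derivative(x))             # analytic answer
print(numerical_derivative(composite, x))   # empirical answer; agrees closely
```

The finite difference measures exactly what David's question asked: how much a change in x affects the output through the intermediate variable.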

He went back to his backpropagation walkthrough.


Building the Analogy Framework

As he worked through the mechanism, David found that elaboration through analogy was particularly powerful. Some of the analogies he developed:

Backpropagation as credit assignment:

"Each weight in a neural network is like an employee in a company. The company (the network) made a bad prediction (lost money). Who's responsible? Backpropagation figures out each employee's 'blame' — how much did this weight's value contribute to the overall error? Then it adjusts each weight proportionally to how much blame it deserves.

"The chain rule is just the mechanism for computing that blame correctly when employees work in layers — my action affected your action affected the output, so I share proportional responsibility."

This framing — neural networks as organizations doing performance review — gave David a mental model he could think with, not just execute.
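The credit-assignment picture can be made concrete with a two-"employee" company: a scalar network y = w2 * (w1 * x) with squared-error loss. All values below are made up for illustration; the backward pass is just the chain rule distributing blame layer by layer.

```python
# Two weights in series, no activations: the smallest "company" with layers.
x, target = 1.5, 3.0
w1, w2 = 0.8, 1.2

# Forward pass
h = w1 * x                  # first employee's output
y = w2 * h                  # second employee's output (the prediction)
loss = (y - target) ** 2    # how much money the company lost

# Backward pass: assign blame via the chain rule
dL_dy = 2 * (y - target)    # how much the prediction itself is to blame
dL_dw2 = dL_dy * h          # w2's share: it scaled h directly into y
dL_dh = dL_dy * w2          # blame passed back through w2 to h
dL_dw1 = dL_dh * x          # w1's share: it affected y only via h

print(dL_dw1, dL_dw2)       # each weight's "performance review"
```

Note that w1's blame is computed through h: my action affected your action affected the output, exactly as the analogy says.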

Gradient descent as guided mountainside descent:

David had already had the hiking breakthrough. But he extended it when he started working on more advanced topics like learning rates:

"A high learning rate is like taking huge steps downhill. You get places fast, but you might overshoot the valley and end up on the other side. A low learning rate is like tiny steps — you won't overshoot, but you'll take forever. The ideal is to start big and get smaller as you get closer to the bottom — which is exactly what learning rate schedules do."

He hadn't needed to be told about learning rate schedules. His analogy generated the concept before he encountered it formally.
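The schedule his analogy generated can be sketched in a few lines. The decay formula and constants below are illustrative, not a recommendation: big steps early, shrinking steps as the valley gets close, minimizing f(w) = w**2.

```python
# "Start big and get smaller": gradient descent on f(w) = w**2
# with a simple 1/t-style learning rate decay. Constants are illustrative.

def grad(w):
    return 2 * w          # derivative of w**2

w = 10.0                  # start far up the mountainside
base_lr = 0.9             # big early steps (at lr >= 1.0 this problem diverges)

for step in range(30):
    lr = base_lr / (1 + 0.2 * step)   # steps shrink as training proceeds
    w -= lr * grad(w)

print(w)   # ends close to the minimum at 0
```

With a fixed lr of 0.9 the first updates overshoot the valley floor and land on the other side; the decay is what lets the walker settle at the bottom.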

Overfitting as memorization:

"An overfit model is like a student who memorizes every practice exam question verbatim but can't solve any novel problem. The student knows the training data perfectly — 100% accuracy on exercises they've seen before. But they've memorized the answers rather than learned the principles, so they fail on anything new."

This was, he later noted with amusement, an elaboration about learning science applied to machine learning. The domains had converged.


The Test: What Changes When You Have Mechanism

Three months after making the elaboration shift, David ran into a practical problem: a neural network he'd built for a project had validation accuracy that was much lower than training accuracy — classic overfitting. But it wasn't responding to the usual remedies.

Six months earlier, he would have googled "fix overfitting neural network" and tried each suggestion from a list, confused about why any of them would work or which to try first.

Instead, David reasoned from mechanism:

"My model has too much capacity for the amount of training data I have. It's learning noise patterns that are specific to the training set. The solutions should work by either reducing model capacity (fewer parameters, simpler architecture), adding noise during training (dropout — randomly turn off neurons so the model can't rely on any single neuron), or giving it more training data.

"I don't have more data. Let me try dropout first because it directly addresses the capacity problem without requiring me to redesign the architecture."

He implemented dropout. It worked.
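The remedy can be sketched without a framework. This is a minimal illustration of inverted dropout; the function name and values are hypothetical, but the mechanism is the one David reasoned his way to: randomly silence neurons during training so the model can't lean on any single one.

```python
import random

def dropout(activations, p_drop, training=True):
    """Zero each activation with probability p_drop during training,
    scaling survivors by 1/(1 - p_drop) so the expected magnitude
    matches what the next layer sees at test time."""
    if not training or p_drop == 0.0:
        return list(activations)     # test time: pass everything through
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)  # for a repeatable demo
layer_output = [0.5, -1.2, 0.8, 2.0]
print(dropout(layer_output, p_drop=0.5))                   # some neurons silenced
print(dropout(layer_output, p_drop=0.5, training=False))   # unchanged at test time
```

Because a different random subset survives each batch, no neuron can become the single "employee" the whole prediction depends on.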

What had changed: he wasn't following a list of remedies. He was reasoning from a model of why the problem existed to a solution that addressed the underlying cause. This is the difference between technical competence and genuine expertise.


David's Reflection

In a message to a colleague who'd asked how he'd gotten better at ML so quickly, David wrote:

"For the first few months, I learned by doing the tutorials and trying to memorize the outputs. I knew what gradient descent produced — I didn't know what gradient descent was. I could run backpropagation — I didn't know what it was computing or why.

"The shift was when I started asking 'why' for every step and not being satisfied with 'that's just how it works.' Sometimes I had to go three or four levels deep. Why does this formula calculate what it says it calculates? Why is that the right thing to calculate? Why does calculating it in this order rather than that order matter?

"At each 'why,' I tried to connect the answer to something I already understood. Some of those connections were analogies from software engineering. Some were analogies from everyday experience. Some just came from thinking hard about what 'rate of change' actually means as a concept rather than as a formula.

"What I noticed is that once I understood a mechanism, I stopped forgetting it. The steps are things I have to look up. The mechanisms stay."

That last observation — the steps are things I have to look up; the mechanisms stay — captures the practical dividend of elaboration.


Principles Illustrated by This Case Study

  • Tutorial hell is often a symptom of shallow processing. Consuming content without elaboration produces familiarity, not understanding or transfer.
  • Asking "why?" at every step is uncomfortable and slow — and produces durable understanding. The discomfort is the depth of processing.
  • Analogies from familiar domains scaffold new learning. Backpropagation as credit assignment, gradient descent as hillside navigation — these analogies gave David cognitive handholds in unfamiliar conceptual terrain.
  • Mechanism-level understanding enables troubleshooting and adaptation. Someone who understands why an algorithm works can reason about why it might fail. Someone who only knows the steps can only follow them.
  • Elaboration produces durable retention for concepts; procedural steps can remain reference-dependent. This is not a failure — it's efficient. Know the why deeply; look up the how when needed.