Appendix B: Key Studies Summary
A quick-reference guide to the research behind the book. For each study: what was found, why it matters, and how well it's held up.
This appendix summarizes the most important studies referenced throughout the book. Each entry includes the researchers, what they found, why it matters for your learning, and the current replication status — because science is a conversation, not a verdict.
How to read the replication status ratings:
- Well-replicated: Reproduced many times across labs, populations, and materials. High confidence.
- Replicated with caveats: Core finding holds, but the effect is smaller or more context-dependent than originally reported.
- Mixed/Debated: Some replications succeed, others fail. Active scientific discussion.
- Largely debunked: Failed to replicate in well-powered studies. Original claims are not supported.
Memory and Forgetting
Ebbinghaus — The Forgetting Curve (1885)
- Researcher: Hermann Ebbinghaus
- What he did: Memorized lists of nonsense syllables and tested himself at various intervals.
- What he found: Forgetting follows a predictable curve: steep at first (about 50% lost within an hour), then gradually leveling off. Each time material is relearned, the curve flattens and retention improves.
- Why it matters: The forgetting curve is the foundational insight behind spaced repetition. You will forget; the question is whether you plan for it or let it happen by surprise.
- Replication status: Well-replicated. The specific shape of the curve varies with material type and encoding quality, but the basic pattern is one of the most robust findings in all of psychology.
- Chapter reference: Ch 3
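
The shape of the forgetting curve can be sketched numerically. The following is a minimal illustration, assuming a simple exponential model of retention, which is a common textbook simplification and not Ebbinghaus's own fitted equation. The starting stability and the rule that each relearning session doubles it are hypothetical values chosen only to show the curve flattening.

```python
import math


def retention(hours_elapsed: float, stability_hours: float) -> float:
    """Fraction retained under an assumed exponential decay model."""
    return math.exp(-hours_elapsed / stability_hours)


# Hypothetical stability chosen so that half is lost after one hour.
initial_stability = 1 / math.log(2)

# Assumption for illustration: each relearning session doubles stability.
for review in range(4):
    stability = initial_stability * 2 ** review
    after_1h = retention(1, stability)
    after_24h = retention(24, stability)
    print(f"after {review} reviews: 1h={after_1h:.2f}, 24h={after_24h:.2f}")
```

With these assumed numbers, retention at both the one-hour and one-day marks rises with every review, which is the flattening the entry describes.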
Atkinson & Shiffrin — Multi-Store Model of Memory (1968)
- Researchers: Richard Atkinson, Richard Shiffrin
- What they proposed: Memory consists of three stores: sensory memory (brief), short-term/working memory (limited capacity, ~20 seconds), and long-term memory (essentially unlimited capacity).
- Why it matters: Provides the basic architecture for understanding why you forget (information never made it past working memory) and how to make sure it does get past (encoding strategies, rehearsal, elaboration).
- Replication status: Well-replicated as a useful model, though modern researchers recognize memory is more complex than three neat boxes. Working memory models (Baddeley & Hitch) have refined the short-term component significantly.
- Chapter reference: Ch 2
Craik & Lockhart — Levels of Processing (1972)
- Researchers: Fergus Craik, Robert Lockhart
- What they found: Information processed at a "deeper" level (thinking about meaning, connecting to prior knowledge) is remembered better than information processed "shallowly" (focusing on surface features like font or sound).
- Why it matters: Explains why rereading and highlighting are weak strategies: they encourage shallow processing. Elaboration, self-explanation, and teaching force deep processing.
- Replication status: Well-replicated. The core finding is solid. Debate continues about how to define and measure "depth" precisely.
- Chapter reference: Ch 12
Loftus & Palmer — Memory Reconstruction (1974)
- Researchers: Elizabeth Loftus, John Palmer
- What they found: The wording of a question about a car accident changed what people "remembered" seeing. Participants asked how fast cars were going when they "smashed" into each other gave higher speed estimates and were more likely to report (incorrectly) seeing broken glass than those asked about cars "hitting" each other.
- Why it matters: Memory is not a video recording. It is reconstructed each time you recall it, and the act of recall can alter the memory itself. This is why retrieval practice works (reconstruction strengthens the memory trace) and why re-studying feels deceptively effective (recognition is not the same as reconstruction).
- Replication status: Well-replicated across many variations.
- Chapter reference: Ch 2
Retrieval Practice and the Testing Effect
Roediger & Karpicke — The Testing Effect (2006)
- Researchers: Henry Roediger III, Jeffrey Karpicke
- What they found: Students who read a passage and then took a practice test retained significantly more one week later than students who re-read the passage multiple times, even though the re-readers felt more confident about their learning immediately after studying.
- Why it matters: This is one of the most important findings in the book. Testing isn't just a way to measure learning; it is learning. The act of retrieving information from memory strengthens the memory trace far more than passively reviewing it. Critically, the re-readers' false confidence illustrates why metacognitive monitoring is so important.
- Replication status: Well-replicated. Hundreds of studies across ages, subjects, and materials. Effect sizes typically d = 0.5–0.7. One of the most robust findings in cognitive psychology.
- Chapter references: Ch 7, 16
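
For readers unfamiliar with the d statistic quoted in this appendix, here is a minimal sketch of how Cohen's d is computed from two groups' scores using the pooled standard deviation. The score lists are invented for illustration and are not data from Roediger and Karpicke or any other study.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical test scores (percent correct), made up for this example.
retrieval_group = [72, 65, 80, 70, 68, 75]
rereading_group = [66, 60, 74, 63, 62, 71]

print(f"d = {cohens_d(retrieval_group, rereading_group):.2f}")
```

A d of 0.5 means the two group averages sit half a standard deviation apart, so a typical student in the higher group outscores roughly two-thirds of the lower group.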
Karpicke & Blunt — Retrieval Practice vs. Concept Mapping (2011)
- Researchers: Jeffrey Karpicke, Janell Blunt
- What they found: Retrieval practice produced better long-term retention than elaborative concept mapping, even for tasks that required drawing inferences and making connections, which is precisely the kind of "deep" learning you'd expect concept mapping to excel at.
- Why it matters: Retrieval practice isn't just good for rote memorization. It supports complex, meaningful learning too.
- Replication status: Well-replicated. The relative advantage of retrieval practice over other "active" strategies is consistently found, though the margin varies by task and material.
- Chapter reference: Ch 7
Spacing and Interleaving
Cepeda et al. — Spacing Effect Meta-Analysis (2006)
- Researchers: Nicholas Cepeda, Harold Pashler, Edward Vul, John Wixted, Doug Rohrer
- What they found: Across 254 studies involving over 14,000 participants, distributing practice over time (spacing) consistently produced better long-term retention than concentrating practice into a single session (massing/cramming). The optimal spacing interval depends on when you need to remember: for a test one week away, spacing sessions 1–2 days apart is effective; for retention over months, longer gaps are needed.
- Why it matters: This meta-analysis put the spacing effect beyond reasonable doubt. It's not a laboratory curiosity; it's a universal principle of memory.
- Replication status: Well-replicated. One of the most robust effects in all of learning science.
- Chapter reference: Ch 3
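
The one-week guidance above translates into a simple calendar. This sketch lists study-session dates leading up to a test, assuming a fixed gap between sessions. The function name `review_dates`, the example dates, and the fixed-gap rule are illustrative assumptions; the meta-analysis does not prescribe this exact schedule.

```python
from datetime import date, timedelta


def review_dates(first_study: date, test_day: date, gap_days: int) -> list[date]:
    """Return study-session dates spaced gap_days apart, all before the test."""
    if gap_days < 1:
        raise ValueError("gap_days must be at least 1")
    sessions = []
    current = first_study
    while current < test_day:
        sessions.append(current)
        current += timedelta(days=gap_days)
    return sessions


# Hypothetical example: test one week out, sessions two days apart.
for session in review_dates(date(2025, 3, 3), date(2025, 3, 10), gap_days=2):
    print(session.isoformat())
```

For retention over months, the same function can be called with a larger `gap_days`; how large is a judgment call this sketch does not settle.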
Rohrer & Taylor — Interleaving in Mathematics (2007)
- Researchers: Doug Rohrer, Kelli Taylor
- What they found: Students who practiced math problems in interleaved order (mixing different problem types) performed significantly better on a later test than students who practiced in blocked order (all problems of one type, then all of the next). Importantly, the blocked-practice group performed better during practice, creating the illusion that blocking was more effective.
- Why it matters: Demonstrates both the power of interleaving and the central paradox of this book: what feels effective during practice (blocking) is less effective for long-term learning.
- Replication status: Well-replicated across mathematics, category learning, and motor skills. Effects in other domains (e.g., text-based learning) are less consistent.
- Chapter references: Ch 7, 10
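
The difference between blocked and interleaved practice is easy to see in how a problem set is ordered. The following sketch shows one illustrative way to build both orderings. The problem types, labels, and the round-robin mixing rule are assumptions for this example, not the procedure used in the study.

```python
from itertools import chain, zip_longest


def blocked_order(problems_by_type: dict[str, list[str]]) -> list[str]:
    """All problems of one type, then all of the next."""
    return list(chain.from_iterable(problems_by_type.values()))


def interleaved_order(problems_by_type: dict[str, list[str]]) -> list[str]:
    """Round-robin across types, taking one problem of each type per pass."""
    passes = zip_longest(*problems_by_type.values())
    return [p for one_pass in passes for p in one_pass if p is not None]


# Hypothetical problem set.
problem_set = {
    "area": ["A1", "A2", "A3"],
    "volume": ["V1", "V2", "V3"],
    "perimeter": ["P1", "P2", "P3"],
}

print(blocked_order(problem_set))
print(interleaved_order(problem_set))
```

In the interleaved list, each problem forces you to decide which method applies before you solve it, which is the extra work blocked practice lets you skip.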
Learning Strategies: What Works and What Doesn't
Dunlosky et al. — Learning Strategy Effectiveness Review (2013)
- Researchers: John Dunlosky, Katherine Rawson, Elizabeth Marsh, Mitchell Nathan, Daniel Willingham
- What they found: Rated 10 common learning strategies based on available evidence. High utility: practice testing, distributed (spaced) practice. Moderate utility: elaborative interrogation, self-explanation, interleaved practice. Low utility: summarization, highlighting/underlining, keyword mnemonic, imagery for text, rereading.
- Why it matters: This is the most comprehensive evidence-based ranking of study strategies ever published. It confirmed that the strategies most students rely on (highlighting, rereading) are among the least effective, while the strategies most students avoid (self-testing, spacing) are the most effective.
- Replication status: Well-replicated. The rankings have been broadly supported by subsequent research. Some researchers argue the "low utility" ratings for summarization and imagery are too harsh for skilled users, but the overall pattern is robust.
- Chapter references: Ch 7, 8
Pashler et al. — Learning Styles Debunked (2008)
- Researchers: Harold Pashler, Mark McDaniel, Doug Rohrer, Robert Bjork
- What they found: After reviewing the learning styles literature, they concluded that while people have preferences for how information is presented, there is virtually no credible evidence that matching instruction to a student's preferred "style" (visual, auditory, kinesthetic, etc.) improves learning outcomes.
- Why it matters: Learning styles is perhaps the most widely believed myth in education. Teachers, students, and parents invest real time and money trying to match learning to supposed styles, when the evidence says it doesn't help. What does matter is matching the content to the best modality (anatomy should be visual; pronunciation should be auditory, regardless of the learner's "style").
- Replication status: The review's conclusion is well-replicated; the learning styles claim it examined is largely debunked. Multiple subsequent reviews and studies have confirmed the absence of evidence for the "meshing hypothesis" (the idea that instruction works best when matched to the learner's style). Despite this, belief in learning styles persists: surveys consistently show 80–90% of teachers endorse the concept.
- Chapter reference: Ch 8
Desirable Difficulties
Bjork — Desirable Difficulties Framework (1994)
- Researcher: Robert Bjork (and subsequently Elizabeth Bjork)
- What they proposed: Certain difficulties during learning (spacing, interleaving, testing, varying practice conditions, reducing feedback) slow down performance during practice but enhance long-term learning and transfer. These are "desirable" because they strengthen the retrieval and storage processes that support durable learning.
- Why it matters: This framework unifies many of the book's key findings under a single principle. It also explains the central paradox: effective learning strategies feel harder and produce worse performance during practice, which is why students and teachers often abandon them in favor of easier (but less effective) approaches.
- Replication status: Well-replicated. The framework is broadly supported, though the boundary conditions (when do difficulties become undesirable?) remain an active research area.
- Chapter reference: Ch 10
Metacognition and Self-Regulation
Flavell — Metacognition Defined (1979)
- Researcher: John Flavell
- What he proposed: Coined and defined "metacognition" as thinking about one's own thinking; specifically, knowledge about cognitive processes and the ability to monitor and regulate them.
- Why it matters: Gave learning science a name and framework for the most powerful lever in the entire book: the ability to accurately assess what you know and don't know, and to adjust your strategies accordingly.
- Replication status: Well-replicated as a conceptual framework. Subsequent research has refined the construct into components (metacognitive knowledge, metacognitive monitoring, metacognitive control).
- Chapter references: Ch 1, 2, 13
Koriat — Judgments of Learning (1997)
- Researcher: Asher Koriat
- What he found: People's judgments of how well they've learned something (Judgments of Learning, or JOLs) are inferences drawn from cues such as the fluency of processing: if the material felt easy to read, you predict you'll remember it, even if you won't. This makes immediate JOLs unreliable. JOLs made after a delay are more accurate, an effect first reported by Nelson and Dunlosky (1991).
- Why it matters: Explains why students systematically overestimate their knowledge right after studying (they confuse familiarity with true understanding). Delayed JOLs are a simple, powerful metacognitive strategy.
- Replication status: Well-replicated. The delayed-JOL effect is robust across many studies.
- Chapter references: Ch 13, 15
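
One way to put delayed JOLs into practice is to record a confidence rating for each item and compare it with what you actually recall later. The sketch below is a minimal illustration. The item names, the 0 to 1 confidence scale, and the `overconfidence` measure (mean predicted recall minus actual proportion recalled) are assumptions chosen for this example, not a standard instrument from the research.

```python
def overconfidence(predicted: dict[str, float], recalled: dict[str, bool]) -> float:
    """Mean predicted recall minus actual proportion recalled.

    Positive values mean the learner expected to remember more than they did.
    """
    items = predicted.keys() & recalled.keys()
    if not items:
        raise ValueError("no items in common")
    mean_predicted = sum(predicted[i] for i in items) / len(items)
    proportion_recalled = sum(recalled[i] for i in items) / len(items)
    return mean_predicted - proportion_recalled


# Hypothetical self-ratings (0 = will forget, 1 = will remember) and test results.
immediate_jol = {"mitosis": 0.9, "meiosis": 0.9, "osmosis": 0.8, "diffusion": 0.8}
delayed_jol = {"mitosis": 0.8, "meiosis": 0.4, "osmosis": 0.7, "diffusion": 0.3}
test_result = {"mitosis": True, "meiosis": False, "osmosis": True, "diffusion": False}

print(f"immediate: {overconfidence(immediate_jol, test_result):+.2f}")
print(f"delayed:   {overconfidence(delayed_jol, test_result):+.2f}")
```

In this made-up data the immediate ratings overshoot actual recall by a wide margin while the delayed ratings land close to it, which is the pattern the delayed-JOL effect predicts.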
Kruger & Dunning — The Dunning-Kruger Effect (1999)
- Researchers: Justin Kruger, David Dunning
- What they found: People with the least skill in a domain are the most likely to overestimate their competence, while experts tend to slightly underestimate theirs. The unskilled lack the very expertise needed to recognize their deficits.
- Why it matters: Highlights why metacognitive calibration is so difficult and so important. If you don't know much about a topic, you also don't know enough to realize how little you know.
- Replication status: Replicated with caveats. The basic pattern is real, but some of the original effect is due to statistical artifacts (regression to the mean). The "double curse" (unskilled and unaware) is genuine, though the effect size may be somewhat smaller than initially reported.
- Chapter references: Ch 1, 15
Motivation, Mindset, and Identity
Dweck — Growth Mindset (2006)
- Researcher: Carol Dweck (and many collaborators over decades)
- What she found: Students who believe intelligence is malleable (growth mindset) respond differently to challenges and setbacks than those who believe intelligence is fixed. Growth-mindset students are more likely to persist, seek challenges, and use effective strategies.
- Why it matters: Mindset shapes how you interpret difficulty. If you believe struggle means you're not smart enough, you'll quit. If you believe struggle means you're learning, you'll persist.
- Replication status: Replicated with caveats. The basic association between mindset and academic behaviors is real. However, large-scale mindset interventions (brief exercises designed to shift students' mindsets) produce smaller effects than originally reported. The national study by Yeager et al. (2019) found a statistically significant but modest effect (d = 0.08 overall, larger for certain subgroups). Dweck and colleagues acknowledge the interventions are not "magic bullets" and work best when the school environment supports a growth-mindset culture.
- Chapter reference: Ch 18
Deci & Ryan — Self-Determination Theory (1985, 2000)
- Researchers: Edward Deci, Richard Ryan
- What they proposed: Intrinsic motivation thrives when three basic psychological needs are met: autonomy (feeling in control of your choices), competence (feeling effective and capable), and relatedness (feeling connected to others). External rewards can actually undermine intrinsic motivation under certain conditions (the "overjustification effect").
- Why it matters: Explains why forced studying often backfires and why giving students meaningful choices increases engagement. Also explains why strategies that build genuine competence (like retrieval practice with feedback) can increase motivation over time.
- Replication status: Well-replicated. The broad framework is supported across cultures and domains. Some specific predictions (e.g., the conditions under which external rewards undermine motivation) have been refined.
- Chapter reference: Ch 17
Expertise and Deliberate Practice
Ericsson et al. — Deliberate Practice (1993)
- Researchers: K. Anders Ericsson, Ralf Krampe, Clemens Tesch-Römer
- What they found: Expert performance in music, chess, sports, and other domains is primarily explained by the accumulated amount of "deliberate practice": structured, effortful practice with feedback aimed at improving specific aspects of performance. The famous "10,000-hour" estimate for reaching expert level was a rough average for violin students, not a universal rule.
- Why it matters: Expertise isn't magic or talent. It's the product of the right kind of practice over a sustained period. "Deliberate practice" is distinct from mere repetition: it targets weaknesses, requires concentration, involves feedback, and is inherently uncomfortable.
- Replication status: Replicated with caveats. Deliberate practice is clearly important, but a large meta-analysis (Macnamara et al., 2014) found it explains at most about 26% of the variance in performance (in games), with smaller shares in music and sports and much smaller ones in education and professional work. Other factors (genetics, starting age, resources, coaching quality) also matter substantially. Ericsson himself objected to how the "10,000 hours" figure was popularized.
- Chapter references: Ch 21, 25
Cognitive Load
Sweller — Cognitive Load Theory (1988)
- Researcher: John Sweller
- What he proposed: Working memory has a limited capacity, and learning is impaired when this capacity is exceeded. He distinguished three types of cognitive load: intrinsic (inherent difficulty of the material), extraneous (unnecessary difficulty caused by poor instructional design), and germane (productive mental effort directed at building schemas).
- Why it matters: Provides a framework for understanding why some learning situations are overwhelming and others are productive. The goal is to minimize extraneous load, manage intrinsic load (through scaffolding and sequencing), and maximize germane load.
- Replication status: Well-replicated. Cognitive load theory has generated hundreds of empirical studies and is one of the most influential frameworks in instructional design.
- Chapter reference: Ch 5
Dual Coding
Paivio — Dual Coding Theory (1971, 1986)
- Researcher: Allan Paivio
- What he proposed: Information is stored in memory through two complementary channels: verbal (words, language) and non-verbal (images, spatial information). When both channels are used, memory is stronger because there are two independent routes to retrieve the same information.
- Why it matters: Explains why combining words with visuals (drawing diagrams while studying text, creating mental images for abstract concepts) improves retention beyond either modality alone. This is not the same as "learning styles" (which claims people learn better in their preferred modality). Dual coding says everyone benefits from combining words and images.
- Replication status: Well-replicated. The dual coding advantage is one of the most consistently observed effects in memory research.
- Chapter reference: Ch 9
Transfer of Learning
Thorndike & Woodworth — Identical Elements Theory (1901)
- Researchers: Edward Thorndike, Robert Woodworth
- What they found: Practice in one skill transfers to another skill only to the extent that the two skills share common elements. General "mental discipline" (the idea that studying Latin makes you smarter overall) is largely a myth.
- Why it matters: Transfer doesn't happen automatically. To learn something you can use in new contexts, you need to practice for transfer by varying examples, comparing cases, and extracting underlying principles.
- Replication status: Well-replicated. Over a century of research confirms that transfer is difficult, limited, and depends on shared structural features between the learning context and the transfer context.
- Chapter reference: Ch 11
Summary Table
| Study / Framework | Year | Core Finding | Replication Status |
|---|---|---|---|
| Ebbinghaus — Forgetting Curve | 1885 | Memory decays predictably; relearning slows decay | Well-replicated |
| Atkinson & Shiffrin — Memory Stores | 1968 | Three-store model: sensory, working, long-term | Well-replicated (refined) |
| Craik & Lockhart — Levels of Processing | 1972 | Deeper processing = stronger memories | Well-replicated |
| Loftus & Palmer — Memory Reconstruction | 1974 | Memory is reconstructive, not reproductive | Well-replicated |
| Flavell — Metacognition | 1979 | Thinking about thinking is a distinct skill | Well-replicated |
| Sweller — Cognitive Load Theory | 1988 | Working memory overload impairs learning | Well-replicated |
| Ericsson — Deliberate Practice | 1993 | Structured, effortful practice drives expertise | Replicated with caveats |
| Bjork — Desirable Difficulties | 1994 | Harder practice = better long-term learning | Well-replicated |
| Koriat — Delayed JOLs | 1997 | Delayed judgments of learning are more accurate | Well-replicated |
| Kruger & Dunning — Dunning-Kruger | 1999 | Low skill predicts overconfidence | Replicated with caveats |
| Roediger & Karpicke — Testing Effect | 2006 | Retrieval > rereading for long-term retention | Well-replicated |
| Cepeda et al. — Spacing Meta-Analysis | 2006 | Spaced practice > massed practice (254 studies) | Well-replicated |
| Dweck — Growth Mindset | 2006 | Believing ability is malleable aids persistence | Replicated with caveats |
| Rohrer & Taylor — Interleaving | 2007 | Mixed practice > blocked practice for retention | Well-replicated |
| Pashler et al. — Learning Styles | 2008 | No evidence for style-matched instruction | Well-replicated (learning styles largely debunked) |
| Dunlosky et al. — Strategy Ratings | 2013 | Ranked 10 strategies by evidence; testing and spacing top the list | Well-replicated |
For a plain-language explanation of the research methods behind these studies, see Appendix A. For full bibliographic citations, see Appendix J.