Appendix D: Key Studies Referenced in This Book

This appendix provides annotated descriptions of the primary research studies most frequently cited throughout the book. For readers who want to go beyond the summary level and engage with the original science, each entry includes the full citation, a plain-language summary of the key finding, an evidence grade consistent with the system described in Appendix A, and notes on the replication status of the finding.

Studies are organized by topic. Within topics, entries are roughly chronological to show how understanding developed over time.

Memory and Forgetting

Ebbinghaus, H. (1885/1913). Memory: A contribution to experimental psychology. Teachers College, Columbia University.

What it found: Hermann Ebbinghaus, a German psychologist, studied his own memory systematically for years using lists of nonsense syllables. He documented what is now called the "forgetting curve": a steep loss of newly learned material within the first hours and days, followed by a progressively slower decline. He also discovered the "spacing effect": material studied in distributed sessions was retained far better than material studied in a single concentrated session. By modern standards this is single-subject research with serious methodological limitations; its historical importance lies in being the first systematic empirical study of memory.

Evidence grade for the forgetting curve: [Evidence: Strong] — The general shape has been replicated continuously for 140 years. Evidence grade for the spacing effect: [Evidence: Strong] — Extensively replicated; see Cepeda et al. (2006) below. Replication status: The qualitative findings are robust; the specific quantitative parameters of Ebbinghaus's curve do not generalize well because the material (nonsense syllables) is atypically easy to forget.
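To make the shape of the forgetting curve concrete, later researchers often approximate it with a simple exponential function, R(t) = e^(-t/S), where R is the fraction retained after time t and S is a "stability" parameter. This is an assumption for illustration only: Ebbinghaus fitted a different formula to his own data, and power functions often fit retention data better. The function name retention and the stability value below are hypothetical and are not estimates from any study.

```python
import math


def retention(t_days: float, stability_days: float) -> float:
    """Fraction retained t_days after learning under R(t) = exp(-t / S).

    The exponential form and the stability value S are illustrative
    assumptions, not parameters estimated from Ebbinghaus's data.
    """
    if t_days < 0 or stability_days <= 0:
        raise ValueError("t_days must be >= 0 and stability_days must be > 0")
    return math.exp(-t_days / stability_days)


if __name__ == "__main__":
    # Hypothetical stability of 2 days: the printout shows only the curve's
    # shape, a large early drop followed by ever smaller losses.
    for day in [0, 1, 2, 7, 30]:
        print(f"day {day:>2}: {retention(day, 2.0):.2f} retained")
```

Under these assumed values the loss over the first day is far larger than the loss over any later day, which is the qualitative pattern the entry describes; the specific numbers carry no meaning outside the sketch.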

Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). Academic Press.

What it found: The multi-store model of memory proposes three sequential stores: sensory memory (extremely brief, high capacity), short-term memory (limited capacity, rapid decay), and long-term memory (effectively unlimited capacity and duration). Information moves from sensory to short-term through attention, and from short-term to long-term through rehearsal.

Evidence grade: [Evidence: Moderate] — The three-store architecture captures something real, but the model is now considered a first-generation framework. Working memory (see Baddeley & Hitch below) provides a more accurate account of short-term processes, and long-term memory is known to have more structure than the model implies. Replication status: The model itself is a theoretical framework rather than a single empirical claim. Its components have been supported and refined rather than refuted.

Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–89). Academic Press.

What it found: Baddeley and Hitch proposed that what Atkinson and Shiffrin called "short-term memory" is actually a multi-component working memory system with a central executive (attentional control), a phonological loop (verbal and acoustic information), and a visuo-spatial sketchpad (visual and spatial information). A fourth component, the episodic buffer, was added by Baddeley in 2000.

Evidence grade: [Evidence: Strong] — The working memory model is one of the most tested and supported frameworks in cognitive psychology. Its components have been dissociated in neuropsychological studies of patients with brain damage and in neuroimaging studies. Replication status: The basic architecture is robust. Debates continue about exact components and mechanisms, but the multi-component framework is well established.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). MIT Press.

What it found: Bjork built on the distinction, developed with Elizabeth Bjork in their "new theory of disuse," between storage strength (how well consolidated a memory is in long-term storage) and retrieval strength (how easily it can be accessed right now). A key insight is that high retrieval strength, when material is fresh and easy to recall, does not necessarily indicate high storage strength. Conversely, failing to recall something does not mean it is gone: retrieval strength may be low while storage strength remains intact. The framework offers an explanation for the spacing effect: a successful retrieval made after retrieval strength has declined produces a larger gain in storage strength than continuous review of material that is still easy to recall. The same chapter introduced the term "desirable difficulties," discussed below.

Evidence grade: [Evidence: Strong] — The storage/retrieval distinction provides the theoretical foundation for decades of spacing and retrieval practice research and is well supported by the empirical literature. Replication status: The framework is a theoretical model with strong empirical backing; not a single replicable experiment but a well-supported interpretive framework.
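The storage/retrieval account of spacing can be illustrated with a toy model. The sketch below is not Bjork's formalization; the class ToyItem, the function storage_after, the exponential decay, and the gain rule (one minus current retrieval strength) are all hypothetical choices made to capture two qualitative claims: retrieval strength fades with time, more slowly when storage strength is high, and a review adds more storage strength when retrieval strength has fallen further. It also assumes every review is a successful retrieval, which real spaced practice does not guarantee.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyItem:
    """Toy memory trace with hypothetical storage and retrieval strengths."""
    storage: float = 1.0  # storage strength: only ever increases in this sketch

    def retrieval_strength(self, days_since_review: float) -> float:
        # Retrieval strength decays with time since the last review;
        # higher storage strength slows the decay (assumed exponential form).
        return math.exp(-days_since_review / self.storage)

    def review(self, days_since_review: float) -> float:
        # Assumption: each review is a successful retrieval, and the gain in
        # storage strength grows as retrieval strength falls.
        gain = 1.0 - self.retrieval_strength(days_since_review)
        self.storage += gain
        return gain


def storage_after(gaps_in_days: list[float]) -> float:
    """Final storage strength after reviews separated by the given gaps."""
    item = ToyItem()
    for gap in gaps_in_days:
        item.review(gap)
    return item.storage


if __name__ == "__main__":
    # Three back-to-back reviews versus three reviews spread over a week.
    print(f"massed: {storage_after([0, 0, 0]):.2f}")
    print(f"spaced: {storage_after([1, 2, 4]):.2f}")
```

In this toy, massed reviews add nothing because retrieval strength is still at its maximum when each one happens. That extreme is an artifact of the simplified gain rule, not a claim that massed review is worthless; the point is only the direction of the difference.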

Retrieval Practice

Roediger, H. L., & Karpicke, J. D. (2006a). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.

What it found: In a landmark study, students who studied a prose passage once and then took three retrieval practice tests recalled 61% of the material a week later, compared with 40% for students who studied the passage four times. The testing condition produced significantly better long-term retention even though those students spent less time reading the passage itself.

Evidence grade: [Evidence: Strong] Replication status: Widely replicated across labs, subjects, formats, and age groups. The testing effect is one of the most robust findings in cognitive psychology.

Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1(3), 181–210.

What it found: A review article that synthesized decades of research on the testing effect, argued for its educational implications, and helped catalyze the wave of subsequent research on retrieval practice. Essential context for understanding how the testing effect research developed.

Evidence grade: [Evidence: Strong] (as a review of the evidence base) Replication status: Review article; the findings reviewed have replicated strongly.

Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772–775.

What it found: Students who practiced retrieving information performed significantly better on a final test one week later than students who spent the same time creating detailed concept maps of the material (67% vs. 45%). The finding was notable because concept mapping is an active, constructive activity, more effortful than simple rereading, yet retrieval practice outperformed it.

Evidence grade: [Evidence: Strong] Replication status: The general finding that retrieval practice outperforms other study strategies, including elaborative ones, has replicated consistently. Some follow-up research suggests concept mapping has benefits in specific contexts, but the relative advantage of retrieval practice is robust.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4–58.

What it found: Dunlosky and colleagues systematically evaluated ten commonly used study techniques, assessing how well the evidence for each generalized across learning conditions, student characteristics, materials, and criterion tasks, and rated each as high, moderate, or low utility. Only two received "high utility" ratings: practice testing (retrieval practice) and distributed practice (spaced repetition). Techniques widely used by students (highlighting, rereading, summarizing, and keyword mnemonics) received low utility ratings. This paper is the closest thing learning science has to a consumer report.

Evidence grade: [Evidence: Strong] (as a systematic review) Replication status: Systematic review methodology; the underlying findings are themselves the subject of ongoing research, with the core conclusions holding well.

Spaced Repetition

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380.

What it found: A meta-analysis of 254 studies involving more than 14,000 participants found that distributed (spaced) practice produced better retention than massed (concentrated) practice in nearly all conditions. The analysis also found that the optimal spacing interval is not fixed but depends on the retention interval — how long you need to remember the material. As the target retention interval increases, longer spacing gaps become more effective.

Evidence grade: [Evidence: Strong] Replication status: The spacing effect itself is among the most replicated findings in all of cognitive psychology. The specific optimal interval parameters are harder to generalize and remain an active area of research.

Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the "enemy of induction"? Psychological Science, 19(6), 585–592.

What it found: Participants learned to recognize the styles of different artists from examples of their paintings. Most believed that studying in massed blocks (all paintings by one artist, then all paintings by another) was more effective than interleaving paintings by different artists. In fact, interleaved practice produced substantially better classification of new paintings by the same artists. The study demonstrated not only the interleaving advantage but also what is sometimes called the "interleaving illusion": the systematic metacognitive error of believing that blocked practice is more effective.

Evidence grade: [Evidence: Moderate] (for interleaving specifically; the metacognitive finding is strong) Replication status: The metacognitive finding (feeling that blocking is better while interleaving actually performs better) has replicated well. The magnitude of the interleaving advantage varies across task types.

Interleaving

Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35(6), 481–498.

What it found: College students who practiced mathematics problems in interleaved (shuffled) order — multiple problem types mixed together — performed better on a test one week later than students who practiced in blocked order (all problems of one type, then another), despite having equivalent practice time.

Evidence grade: [Evidence: Moderate] Replication status: The interleaving advantage in mathematics has been replicated in several studies. Effect sizes vary; the conditions under which interleaving is most beneficial are still being characterized.

Taylor, K., & Rohrer, D. (2010). The effects of interleaved practice. Applied Cognitive Psychology, 24(6), 837–848.

What it found: A replication and extension of the interleaving advantage in a fourth-grade classroom using mathematics problems. Interleaved practice produced substantially better performance one day after the practice session, even in young children. This study was important in extending findings from laboratory and college contexts to real classrooms and younger learners.

Evidence grade: [Evidence: Moderate] Replication status: Well-replicated for mathematics. Generalization to other domains is less fully established.

Elaboration and Dual Coding

Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.

What it found: Proposed that memory strength depends on the "depth" of processing during encoding, where depth refers to the degree of semantic (meaning-based) elaboration. Material processed for meaning (e.g., "Does this word fit in the sentence ___?") is remembered far better than material processed for surface features (e.g., "Does this word contain the letter E?"). This framework challenged the earlier view that memory strength depends mainly on the amount of rehearsal.

Evidence grade: [Evidence: Strong] (for the basic depth-of-processing phenomenon) Replication status: The basic finding is robust. The theoretical framework has been refined; "depth" is now understood as a metaphor for elaborative, self-referential, or distinctive processing rather than a literal neural mechanism.

Paivio, A. (1971). Imagery and verbal processes. Holt, Rinehart and Winston.

What it found: Paivio's dual coding theory proposes that humans process and store verbal and visual/imagistic information in two separate but interconnected systems. Material that engages both systems simultaneously — such as concrete words that readily evoke mental images, or information presented both verbally and visually — is encoded more richly and is easier to retrieve.

Evidence grade: [Evidence: Strong] (for the dual coding phenomenon); [Evidence: Moderate] (for the specific architectural claims about two distinct coding systems) Replication status: The memory advantage for concrete over abstract words, and for information presented in multiple formats, is well-replicated. The exact neural and cognitive architecture is an ongoing area of investigation.

Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13(2), 145–182.

What it found: Students who spontaneously generated more explanations for why each step in a worked example was taken (self-explaining) learned significantly more than students who produced few such explanations. The strong self-explainers identified gaps in their own understanding and generated inferences that filled those gaps. Because students were not assigned to conditions, this original finding was correlational; later experiments that prompted students to self-explain confirmed the effect. This is the original documentation of the "self-explanation effect."

Evidence grade: [Evidence: Strong] Replication status: The self-explanation effect has been replicated across domains, age groups, and formats.

Desirable Difficulties

Bjork, R. A. (1994). See above under Memory and Forgetting.

Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). Worth Publishers.

What it found: Summarizes the "desirable difficulties" framework: the principle that conditions that appear to impair performance during learning often produce better long-term retention and transfer. Desirable difficulties include spacing, interleaving, retrieval practice, generation, and varying conditions of practice. The paper is a readable introduction to the framework for a general audience.

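Evidence grade: [Evidence: Strong] (for the framework as a synthesis of established findings) Replication status: A synthesis rather than a single experiment; most of its component findings (spacing, interleaving, retrieval practice, and generation) have their own entries in this appendix.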

Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4(6), 592–604.

What it found: Participants who generated target words from partial cues (e.g., completing "hot–c___" to generate "cold") remembered them significantly better than participants who simply read the word pairs. The act of generation — even when the generated answer is predictable — produces better retention than passive reading.

Evidence grade: [Evidence: Strong] Replication status: The generation effect has been replicated extensively.

Note-Taking

Mueller, P. A., & Oppenheimer, D. M. (2014). The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. Psychological Science, 25(6), 1159–1168.

What it found: Three experiments found that students who took notes by hand (longhand) outperformed students who typed notes on laptops on tests of conceptual understanding (but not simple factual recall). The proposed mechanism is that typing allows verbatim transcription, which is cognitively shallow, while handwriting forces paraphrasing, which requires deeper processing.

Evidence grade: [Evidence: Moderate] Replication status: A 2019 replication and extension by Morehead, Dunlosky, and Rawson found the handwriting advantage to be weaker and less consistent than in the original study. The finding is best treated as a modest, context-dependent effect rather than a reliable prescription.

Expertise and Deliberate Practice

Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81.

What it found: Chess experts could reconstruct the positions of pieces in a briefly viewed chess game far better than novices — but only when the positions came from real games. When pieces were placed randomly, experts showed no advantage over novices. This demonstrated that expertise relies on recognition of meaningful patterns (chunks) stored in long-term memory, not superior general memory.

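Evidence grade: [Evidence: Strong] Replication status: The chunking finding and the domain-specificity of expert memory advantages have replicated across chess and other domains. Later studies found that experts retain a small advantage even for random positions, which refines rather than overturns the original conclusion.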

Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.

What it found: A landmark study of violinists at the Berlin Academy, extended with a second study of pianists, found that accumulated hours of "deliberate practice" (focused, effortful practice designed to improve specific weaknesses, ideally with feedback from a teacher) was the strongest predictor of performance level, more than years of experience, age of first study, or hours of playful informal practice. Based on retrospective estimates, the best violinists had accumulated approximately 10,000 hours of deliberate practice by age 20; amateur pianists had accumulated approximately 2,000 hours by the same age.

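Evidence grade: [Evidence: Strong] (for the importance of deliberate practice as a construct); [Evidence: Moderate] (for the specific 10,000-hour figure and its generalizability) Replication status: Deliberate practice is consistently associated with higher performance, but a 2014 meta-analysis by Macnamara, Hambrick, and Oswald found that it accounts for a substantial share of performance differences in domains such as music and games and a much smaller share in education and professional work. The 10,000-hour number is often misunderstood: it is an observed average, not a threshold or guarantee, and it refers specifically to deliberate practice, not any practice.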

Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2), 121–152.

What it found: Experts categorize physics problems by deep structural features (the underlying principle involved), while novices categorize them by surface features (the objects and terminology mentioned). This work established that expertise involves fundamentally different knowledge organization, not just more knowledge.

Evidence grade: [Evidence: Strong] Replication status: Expert-novice differences in knowledge organization have been replicated across medicine, mathematics, chess, and other domains.

Motivation and Mindset

Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. Plenum Press.

Deci, E. L., & Ryan, R. M. (2000). The "what" and "why" of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227–268.

What it found: Self-determination theory proposes that human motivation is supported by three basic psychological needs: autonomy (feeling that one's actions are self-chosen), competence (feeling effective and capable), and relatedness (feeling connected to others). Satisfaction of these needs supports intrinsic motivation, well-being, and sustained effort. Conditions that undermine autonomy — especially controlling or contingent external rewards — tend to undermine intrinsic motivation.

Evidence grade: [Evidence: Strong] (for the basic need structure and its effects on motivation) Replication status: Self-determination theory is one of the most extensively tested frameworks in motivational psychology, with thousands of studies across cultures and contexts.

Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41(10), 1040–1048.

Dweck, C. S. (2006). Mindset: The new psychology of success. Random House.

What it found: Dweck's research distinguishes between fixed mindsets (belief that abilities are innate and unchangeable) and growth mindsets (belief that abilities can be developed through effort and strategy). Children with fixed mindsets tend to avoid challenges (where failure would threaten their identity), attribute failure to lack of ability, and show deteriorating performance under difficulty. Children with growth mindsets tend to embrace challenges, attribute failure to effort and strategy, and show resilient or improving performance.

Evidence grade: [Evidence: Strong] (for the descriptive theory of how mindsets affect learning behavior); [Evidence: Contested] (for large-scale mindset intervention programs producing reliable academic outcomes) Replication status: The core descriptive findings about how mindsets relate to goals and responses to difficulty are widely reported, although meta-analyses find that the overall association between mindset and academic achievement is weak. Large-scale attempts to improve academic outcomes through brief mindset interventions have produced small and inconsistent effects, concentrated mainly among lower-achieving students. The descriptive theory is well supported; the assumption that brief interventions reliably shift academic outcomes at scale has not been confirmed.

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215.

What it found: Self-efficacy — the belief in one's capacity to execute behaviors necessary to produce specific outcomes — powerfully predicts whether people attempt challenging tasks, how much effort they invest, and how long they persist in the face of difficulty. Self-efficacy is domain-specific (high self-efficacy in mathematics does not imply high self-efficacy in writing), and is built through four sources: mastery experiences (successfully doing the task), vicarious experiences (seeing similar others succeed), social persuasion (credible encouragement), and physiological states (interpreting arousal as excitement rather than anxiety).

Evidence grade: [Evidence: Strong] Replication status: Self-efficacy's role in motivation and performance is among the most replicated findings in psychology. It predicts academic achievement across subjects, cultures, and age groups.