Chapter 40: Further Reading

Test-Time Compute and Inference Scaling

  • Snell, C., Lee, J., Xu, K., & Kumar, A. (2024). "Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters." arXiv preprint arXiv:2408.03314. A rigorous analysis demonstrating that allocating compute at inference time can be more cost-effective than training larger models.

  • Wei, J., Wang, X., Schuurmans, D., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022. The foundational paper on chain-of-thought prompting, showing that intermediate reasoning steps improve performance on multi-step problems.

  • Brown, B., Juravsky, J., Ehrlich, R., et al. (2024). "Large Language Monkeys: Scaling Inference Compute with Repeated Sampling." arXiv preprint arXiv:2407.21787. Demonstrates that repeated sampling with verification can dramatically improve performance on coding and math benchmarks.
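
The repeated-sampling idea in Brown et al. can be sketched in a few lines: draw many candidate answers and select by majority vote (or, when one is available, an external verifier). The `sample_answer` stub below is an assumption standing in for a real stochastic model call, with made-up answer probabilities.

```python
import random
from collections import Counter

def sample_answer(rng: random.Random) -> str:
    # Hypothetical stand-in for one stochastic model sample: the correct
    # answer "42" appears with probability 0.4, distractors otherwise.
    return rng.choices(["42", "41", "43"], weights=[0.4, 0.3, 0.3])[0]

def best_of_n(n: int, seed: int = 0) -> str:
    """Draw n samples and return the most frequent answer (majority vote)."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n))
    return votes.most_common(1)[0][0]
```

With a verifier available (e.g., unit tests for generated code), the vote is replaced by "return the first sample that passes verification," which is the regime where Brown et al. report the largest gains.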

World Models

  • Ha, D. & Schmidhuber, J. (2018). "World Models." arXiv preprint arXiv:1803.10122. A seminal paper introducing the modern concept of learned world models for reinforcement learning agents.

  • Bruce, J., Dennis, M., Edwards, A., et al. (2024). "Genie: Generative Interactive Environments." arXiv preprint arXiv:2402.15391. Demonstrates learning interactive world models from unlabeled video data.

  • NVIDIA. (2024). "Cosmos: World Foundation Models." Technical Report. NVIDIA's approach to building world foundation models for physical AI and robotics.

Neurosymbolic AI

  • Garcez, A. d'A. & Lamb, L. C. (2023). "Neurosymbolic AI: The 3rd Wave." Artificial Intelligence Review, 56, 12387--12406. A comprehensive survey of neurosymbolic integration approaches and their historical context.

  • Li, Z., Huang, J., & Naik, M. (2023). "Scallop: A Language for Neurosymbolic Programming." PLDI 2023. Introduces Scallop, a probabilistic programming language for neurosymbolic applications.

  • Gao, L., Madaan, A., Zhou, S., et al. (2023). "PAL: Program-Aided Language Models." ICML 2023. Demonstrates LLMs generating code to offload computation to symbolic executors.
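
The PAL pattern — generate code, then delegate execution to a symbolic interpreter — can be illustrated with a minimal sketch (not the paper's implementation): a hypothetical model-generated arithmetic expression is evaluated exactly by Python's `ast` machinery instead of being "computed" by the language model itself.

```python
import ast
import operator

# Whitelisted operators: the symbolic executor only evaluates pure arithmetic.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def run_symbolic(expr: str) -> float:
    """Evaluate a model-generated arithmetic expression with exact semantics."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported construct: {ast.dump(node)}")
    return ev(ast.parse(expr, mode="eval"))

# e.g. a hypothetical reasoning step "23 * 17 + 4" is offloaded here:
result = run_symbolic("23 * 17 + 4")  # 395
```

The division of labor is the point: the neural component decides *what* to compute, while the symbolic executor guarantees the computation is *correct*.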

Continual Learning

  • Kirkpatrick, J., Pascanu, R., Rabinowitz, N., et al. (2017). "Overcoming Catastrophic Forgetting in Neural Networks." PNAS, 114(13), 3521--3526. The original EWC paper, introducing Fisher information regularization for continual learning.

  • De Lange, M., Aljundi, R., Masana, M., et al. (2022). "A Continual Learning Survey: Defying Forgetting in Classification Tasks." IEEE TPAMI, 44(7), 3366--3385. A comprehensive survey covering regularization, replay, and architecture-based continual learning methods.

  • Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). "A Comprehensive Survey of Continual Learning: Theory, Method and Application." IEEE TPAMI. An updated survey covering continual learning in the era of foundation models.
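
The EWC idea in Kirkpatrick et al. — penalize movement of parameters that carry high Fisher information for earlier tasks — reduces to a quadratic regularizer. The sketch below assumes the Fisher estimates and the old-task anchor parameters are already available as arrays; estimating the Fisher information itself is omitted.

```python
import numpy as np

def ewc_penalty(theta: np.ndarray, theta_star: np.ndarray,
                fisher: np.ndarray, lam: float) -> float:
    """Quadratic EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))

def total_loss(task_loss: float, theta, theta_star, fisher, lam: float) -> float:
    # Loss on the new task plus the penalty anchoring important old-task weights.
    return task_loss + ewc_penalty(np.asarray(theta), np.asarray(theta_star),
                                   np.asarray(fisher), lam)
```

Parameters with large Fisher values (important for old tasks) are held near their anchors, while unimportant parameters remain free to adapt to the new task.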

AI for Science

  • Jumper, J., Evans, R., Pritzel, A., et al. (2021). "Highly Accurate Protein Structure Prediction with AlphaFold." Nature, 596, 583--589. A landmark paper achieving near-experimental accuracy in protein structure prediction with deep learning.

  • Merchant, A., Batzner, S., Schoenholz, S. S., et al. (2023). "Scaling Deep Learning for Materials Discovery." Nature, 624, 80--85. GNoME's discovery of millions of new stable materials using graph neural networks.

  • Lu, C., Lu, C., Lange, R. T., et al. (2024). "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery." arXiv preprint arXiv:2408.06292. An early prototype of an autonomous AI system for scientific research.

Autonomous AI Systems

  • Yao, S., Zhao, J., Yu, D., et al. (2023). "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. A foundational framework for language model agents that interleave reasoning and tool use.

  • Jimenez, C. E., Yang, J., Wettig, A., et al. (2024). "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" ICLR 2024. A benchmark for evaluating coding agents on real software engineering tasks.

  • Zhou, S., Xu, F. F., Zhu, H., et al. (2024). "WebArena: A Realistic Web Environment for Building Autonomous Agents." ICLR 2024. A benchmark for evaluating web-browsing AI agents.
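
The ReAct control loop — alternate a model "thought/action" step with a tool call and feed the observation back into context — can be sketched as below. The `model` callable, the `ACT`/`FINISH` step format, and the toy calculator tool are all illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

# Toy tool registry; real agents would wire in search, code execution, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda q: str(eval(q, {"__builtins__": {}})),  # demo only
}

def run_agent(model: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Interleave model steps with tool observations (ReAct-style loop)."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(context)  # e.g. "ACT calculator: 23*17" or "FINISH 391"
        context += step + "\n"
        if step.startswith("FINISH"):
            return step.removeprefix("FINISH").strip()
        if step.startswith("ACT"):
            tool, _, arg = step.removeprefix("ACT").strip().partition(":")
            obs = TOOLS[tool.strip()](arg.strip())
            context += f"Observation: {obs}\n"
    return "no answer"

# A scripted stand-in model: call the calculator once, then report the result.
def scripted(context: str) -> str:
    if "Observation:" in context:
        last_obs = context.rsplit("Observation: ", 1)[1].splitlines()[0]
        return f"FINISH {last_obs}"
    return "ACT calculator: 23*17"

answer = run_agent(scripted, "What is 23 * 17?")  # "391"
```

Benchmarks such as SWE-bench and WebArena evaluate exactly this kind of loop, with real repositories or web pages in place of the toy tool.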

The Path to AGI

  • Morris, M. R., Sohl-Dickstein, J., Fiedel, N., et al. (2024). "Levels of AGI: Operationalizing Progress on the Path to AGI." arXiv preprint arXiv:2311.02462. A framework for measuring AGI progress using a graduated capability scale.

  • Sutskever, I. (2023). "An Observation on Generalization." Talk at the Simons Institute, UC Berkeley. A discussion of unsupervised learning, compression, and the scaling hypothesis and its limits.

  • Chollet, F. (2019). "On the Measure of Intelligence." arXiv preprint arXiv:1911.01547. A thoughtful analysis of what intelligence measurement should capture, beyond narrow benchmarks.

Quantum Machine Learning

  • Cerezo, M., Arrasmith, A., Babbush, R., et al. (2021). "Variational Quantum Algorithms." Nature Reviews Physics, 3, 625--644. A comprehensive review of variational quantum algorithms including quantum ML approaches.

  • Schuld, M. & Petruccione, F. (2021). Machine Learning with Quantum Computers, 2nd ed. Springer. A textbook covering quantum ML from foundations to applications.

  • McClean, J. R., Boixo, S., Smelyanskiy, V. N., et al. (2018). "Barren Plateaus in Quantum Neural Network Training Landscapes." Nature Communications, 9, 4812. The paper identifying the barren plateau problem in parameterized quantum circuits.
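
The barren-plateau phenomenon identified by McClean et al. has a compact statement: for sufficiently random (approximately 2-design) parameterized circuits on n qubits, the gradient of the cost with respect to any parameter has zero mean and exponentially vanishing variance. In hedged, ensemble-agnostic form:

```latex
\mathbb{E}\!\left[\partial_{\theta_k} E(\boldsymbol{\theta})\right] = 0,
\qquad
\operatorname{Var}\!\left[\partial_{\theta_k} E(\boldsymbol{\theta})\right]
\in O\!\left(b^{-n}\right)
\quad \text{for some } b > 1,
```

so resolving a gradient above measurement noise requires exponentially many shots as circuits grow, which is why circuit structure and initialization are central concerns in variational quantum ML.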

Career Development

  • Ng, A. (2023). How to Build Your Career in AI. DeepLearning.AI eBook. Practical advice on building an AI career from one of the field's most influential educators.

  • Huyen, C. (2022). Designing Machine Learning Systems. O'Reilly Media. An excellent guide to the practical challenges of building production ML systems.

AI Governance and Ethics

  • European Union. (2024). "EU Artificial Intelligence Act." Official regulation text. The first comprehensive AI regulation, establishing risk-based requirements for AI systems.

  • Weidinger, L., Mellor, J., Rauh, M., et al. (2021). "Ethical and Social Risks of Harm from Language Models." arXiv preprint arXiv:2112.04359. A taxonomy of risks from large language models with mitigation strategies.

  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" FAccT 2021. An influential paper on the environmental and social costs of large language models.