Appendix I: Bibliography
References are organized by topic and chapter relevance, then alphabetically within each section. All entries, including books and web sources, follow APA 7th edition format. See the note on citations at the end of this appendix for how preprints and URLs are handled.
Part 1: Foundational AI and LLM Technology
Seminal Papers
Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR) 2015. https://arxiv.org/abs/1409.0473
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://arxiv.org/abs/1810.04805
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851. https://arxiv.org/abs/2006.11239
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners [Technical report]. OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://arxiv.org/abs/2201.11903
Technical Reports and Model Documentation
Anthropic. (2024). Claude 3 model card. https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
Anthropic. (2023). Claude's constitution. https://www.anthropic.com/news/claudes-constitution
Google DeepMind. (2023). Gemini: A family of highly capable multimodal models. https://arxiv.org/abs/2312.11805
OpenAI. (2023). GPT-4 technical report. https://arxiv.org/abs/2303.08774
OpenAI. (2022). Introducing ChatGPT. https://openai.com/blog/chatgpt
Part 2: Prompting Research and Techniques
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35, 22199–22213. https://arxiv.org/abs/2205.11916
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Proceedings of EMNLP 2021, 3045–3059. https://arxiv.org/abs/2104.08691
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://arxiv.org/abs/2307.03172
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://arxiv.org/abs/2203.02155
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv. https://arxiv.org/abs/1707.06347
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations 2023. https://arxiv.org/abs/2203.11171
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., & Fedus, W. (2022). Emergent abilities of large language models. Transactions on Machine Learning Research. https://arxiv.org/abs/2206.07682
Part 3: Hallucination, Accuracy, and Trust
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1–38. https://doi.org/10.1145/3571730
Kryscinski, W., McCann, B., Xiong, C., & Socher, R. (2020). Evaluating the factual consistency of abstractive text summarization. Proceedings of EMNLP 2020, 9332–9346. https://arxiv.org/abs/1910.12840
Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). On faithfulness and factuality in abstractive summarization. Proceedings of ACL 2020, 1906–1919. https://arxiv.org/abs/2005.00661
Metz, C. (2023, February 3). What makes A.I. chatbots go wrong? The New York Times. https://www.nytimes.com/2023/02/03/technology/chatgpt-openai-artificial-intelligence.html
Yin, Z., Sun, Q., Guo, Q., Wu, J., Qiu, X., & Huang, X. (2023). Do large language models know what they don't know? Findings of ACL 2023. https://arxiv.org/abs/2305.18153
Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. arXiv. https://arxiv.org/abs/2401.11817
Part 4: AI Productivity and Workplace Research
Brynjolfsson, E., Li, D., & Raymond, L. R. (2023). Generative AI at work (NBER Working Paper No. 31161). National Bureau of Economic Research. https://www.nber.org/papers/w31161
Dell'Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the jagged technological frontier: Field experimental evidence on the effects of AI on knowledge worker productivity and quality (Harvard Business School Working Paper No. 24-013). Harvard Business School. https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf
Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative AI. Science, 381(6654), 187–192. https://doi.org/10.1126/science.adh2586
Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv. https://arxiv.org/abs/2302.06590
Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., & Aftandilian, E. (2022). Productivity assessment of neural code completion. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, 21–29. https://arxiv.org/abs/2205.06537
Part 5: AI Bias and Fairness Research
Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 298–306. https://arxiv.org/abs/2101.05783
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of FAccT 2021, 610–623. https://doi.org/10.1145/3442188.3445922
Blodgett, S. L., Barocas, S., Daumé, H., III, & Wallach, H. (2020). Language (technology) is power: A critical survey of "bias" in NLP. Proceedings of ACL 2020, 5454–5476. https://arxiv.org/abs/2005.14050
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29. https://arxiv.org/abs/1607.06520
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 77–91. http://proceedings.mlr.press/v81/buolamwini18a.html
Part 6: Cognitive Science and Human-AI Interaction
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253. https://doi.org/10.1518/001872097778543886
Paul, A. M. (2021). The extended mind: The power of thinking outside the brain. Houghton Mifflin Harcourt.
Sparrow, B., Liu, J., & Wegner, D. M. (2011). Google effects on memory: Cognitive consequences of having information at our fingertips. Science, 333(6043), 776–778. https://doi.org/10.1126/science.1207745
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4
Part 7: AI Ethics, Safety, and Governance
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv. https://arxiv.org/abs/1606.06565
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., Donahue, C., … Liang, P. (2021). On the opportunities and risks of foundation models. arXiv. https://arxiv.org/abs/2108.07258
European Parliament. (2024). Artificial Intelligence Act (Regulation (EU) 2024/1689). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437. https://doi.org/10.1007/s11023-020-09539-2
National Institute of Standards and Technology. (2023). Artificial intelligence risk management framework (AI RMF 1.0). NIST. https://doi.org/10.6028/NIST.AI.100-1
Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Viking.
Part 8: Organizational AI Adoption and Strategy
Acemoglu, D., & Restrepo, P. (2022). Tasks, automation, and the rise in U.S. wage inequality. Econometrica, 90(5), 1973–2016. https://doi.org/10.3982/ECTA19815
Davenport, T. H., & Mittal, N. (2022). All in on AI: How smart companies win big with artificial intelligence. Harvard Business Review Press.
McKinsey Global Institute. (2021). The future of work after COVID-19. McKinsey & Company. https://www.mckinsey.com/featured-insights/future-of-work/the-future-of-work-after-covid-19
McKinsey Global Institute. (2023). The economic potential of generative AI: The next productivity frontier. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
MIT Sloan Management Review & Boston Consulting Group. (2023). The AI-powered organization: Lessons from the frontier. MIT Sloan Management Review.
Part 9: Recommended Books for Practitioners
Christian, B. (2020). The alignment problem: Machine learning and human values. W. W. Norton.
Daugherty, P. R., & Wilson, H. J. (2018). Human + machine: Reimagining work in the age of AI. Harvard Business Review Press.
Fry, H. (2018). Hello world: Being human in the age of algorithms. W. W. Norton.
Karpathy, A. (n.d.). Neural networks: Zero to hero [Video course]. https://karpathy.ai/zero-to-hero.html
Lee, K.-F. (2018). AI superpowers: China, Silicon Valley, and the new world order. Houghton Mifflin Harcourt.
Marcus, G., & Davis, E. (2019). Rebooting AI: Building artificial intelligence we can trust. Pantheon.
Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Farrar, Straus and Giroux.
Mollick, E. (2024). Co-intelligence: Living and working with AI. Portfolio/Penguin.
Murphy, K. P. (2022). Probabilistic machine learning: An introduction. MIT Press.
Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.
Raschka, S., Liu, Y. H., & Mirjalili, V. (2022). Machine learning with PyTorch and Scikit-Learn. Packt Publishing.
Russell, S., & Norvig, P. (2020). Artificial intelligence: A modern approach (4th ed.). Pearson.
Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. Knopf.
Part 10: Online Resources, Courses, and Documentation
Anthropic. (2024). Anthropic documentation. https://docs.anthropic.com
Anthropic. (2024). Claude prompt engineering guide. https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
DeepLearning.AI. (2023). ChatGPT prompt engineering for developers [Course]. https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/
Google. (2024). Introduction to generative AI [Course]. Google Cloud Skills Boost. https://www.cloudskillsboost.google/paths/118
Hugging Face. (2024). NLP course. https://huggingface.co/learn/nlp-course
Karpathy, A. (2023). Let's build GPT: From scratch, in code, spelled out [Video]. YouTube. https://www.youtube.com/watch?v=kCc8FmEb1nY
Liang, P., et al. (2023). Holistic evaluation of language models (HELM). Stanford CRFM. https://crfm.stanford.edu/helm/
OpenAI. (2024). OpenAI documentation. https://platform.openai.com/docs
OpenAI. (2024). Prompt engineering guide. https://platform.openai.com/docs/guides/prompt-engineering
Wolfram, S. (2023). What is ChatGPT doing… and why does it work? Wolfram Media. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Part 11: Key Articles and Reports
Metz, C., & Nicas, J. (2023, March 14). How ChatGPT suddenly got much better at its job. The New York Times.
Doshi, A. R., & Hauser, O. (2023). Generative AI enhances individual creativity but reduces the collective diversity of novel content (SSRN Working Paper). https://doi.org/10.2139/ssrn.4535536
Heaven, W. D. (2023, January 27). ChatGPT is everywhere. Here's where it came from. MIT Technology Review. https://www.technologyreview.com/2023/01/27/1066538/chatgpt-is-everywhere-heres-where-it-came-from/
Lawton, G. (2023). What is prompt engineering? A detailed overview. TechTarget. https://www.techtarget.com/searchenterpriseai/definition/prompt-engineering
Mollick, E., & Mollick, L. (2023). Using AI to implement effective teaching strategies in classrooms: Five strategies, including prompts. SSRN. https://doi.org/10.2139/ssrn.4391243
Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Red teaming language models with language models. arXiv. https://arxiv.org/abs/2202.03286
Roose, K. (2023, May 30). A.I. poses 'risk of extinction,' industry leaders warn. The New York Times. https://www.nytimes.com/2023/05/30/technology/ai-threat-warning.html
Suleyman, M., & Bhaskar, M. (2023). The coming wave: Technology, power, and the twenty-first century's greatest dilemma. Crown.
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, A., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., … Gabriel, I. (2021). Ethical and social risks of harm from language models. arXiv. https://arxiv.org/abs/2112.04359
Note on citations: The AI research landscape moves quickly, and preprints (arXiv papers) often appear months or years before formal journal publication. Where both a preprint and a formal publication exist, the formal publication is cited. All URLs were verified as of early 2025; link stability is not guaranteed for external sources.