Appendix I: Bibliography

References are organized by topic and chapter relevance, then alphabetically within each section. All entries, including books, follow APA 7th edition format. For web sources whose content may change, URL verification is noted at the end of this appendix.


Part 1: Foundational AI and LLM Technology

Seminal Papers

Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR) 2015. https://arxiv.org/abs/1409.0473

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://arxiv.org/abs/1810.04805

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851. https://arxiv.org/abs/2006.11239

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners [Technical report]. OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://arxiv.org/abs/2201.11903

Technical Reports and Model Documentation

Anthropic. (2024). Claude 3 model card. https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf

Anthropic. (2024). Claude's constitution. https://www.anthropic.com/index/claudes-constitution

Google DeepMind. (2023). Gemini: A family of highly capable multimodal models. https://arxiv.org/abs/2312.11805

OpenAI. (2023). GPT-4 technical report. https://arxiv.org/abs/2303.08774

OpenAI. (2022). Introducing ChatGPT. https://openai.com/blog/chatgpt


Part 2: Prompting Research and Techniques

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35, 22199–22213. https://arxiv.org/abs/2205.11916

Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Proceedings of EMNLP 2021, 3045–3059. https://arxiv.org/abs/2104.08691

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://arxiv.org/abs/2307.03172

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://arxiv.org/abs/2203.02155

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv. https://arxiv.org/abs/1707.06347

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations (ICLR) 2023. https://arxiv.org/abs/2203.11171

Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., & Fedus, W. (2022). Emergent abilities of large language models. Transactions on Machine Learning Research. https://arxiv.org/abs/2206.07682


Part 3: Hallucination, Accuracy, and Trust

Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1–38. https://doi.org/10.1145/3571730

Kryscinski, W., McCann, B., Xiong, C., & Socher, R. (2020). Evaluating the factual consistency of abstractive text summarization. Proceedings of EMNLP 2020, 9332–9346. https://arxiv.org/abs/1910.12840

Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). On faithfulness and factuality in abstractive summarization. Proceedings of ACL 2020, 1906–1919. https://arxiv.org/abs/2005.00661

Metz, C. (2023, February 3). What makes A.I. chatbots go wrong? The New York Times. https://www.nytimes.com/2023/02/03/technology/chatgpt-openai-artificial-intelligence.html

Yin, Z., Sun, Q., Guo, Q., Wu, J., Qiu, X., & Huang, X. (2023). Do large language models know what they don't know? Findings of ACL 2023. https://arxiv.org/abs/2305.18153

Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. arXiv. https://arxiv.org/abs/2401.11817


Part 4: AI Productivity and Workplace Research

Brynjolfsson, E., Li, D., & Raymond, L. R. (2023). Generative AI at work (NBER Working Paper No. 31161). National Bureau of Economic Research. https://www.nber.org/papers/w31161

Dell'Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the jagged technological frontier: Field experimental evidence on the effects of AI on knowledge worker productivity and quality (Harvard Business School Working Paper No. 24-013). Harvard Business School. https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf

Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative AI. Science, 381(6654), 187–192. https://doi.org/10.1126/science.adh2586

Peng, S., Kalliamvakou, E., Croft, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv. https://arxiv.org/abs/2302.06590

Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., & Aftandilian, E. (2022). Productivity assessment of neural code completion. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, 21–29. https://arxiv.org/abs/2205.06537


Part 5: AI Bias and Fairness Research

Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 298–306. https://arxiv.org/abs/2101.05783

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of FAccT 2021, 610–623. https://doi.org/10.1145/3442188.3445922

Blodgett, S. L., Barocas, S., Daumé, H., III, & Wallach, H. (2020). Language (technology) is power: A critical survey of "bias" in NLP. Proceedings of ACL 2020, 5454–5476. https://arxiv.org/abs/2005.14050

Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29. https://arxiv.org/abs/1607.06520

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 77–91. http://proceedings.mlr.press/v81/buolamwini18a.html


Part 6: Cognitive Science and Human-AI Interaction

Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.

Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253. https://doi.org/10.1518/001872097778543886

Paul, A. M. (2021). The extended mind: The power of thinking outside the brain. Houghton Mifflin Harcourt.

Sparrow, B., Liu, J., & Wegner, D. M. (2011). Google effects on memory: Cognitive consequences of having information at our fingertips. Science, 333(6043), 776–778. https://doi.org/10.1126/science.1207745

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4


Part 7: AI Ethics, Safety, and Governance

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv. https://arxiv.org/abs/1606.06565

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., … Liang, P. (2021). On the opportunities and risks of foundation models. arXiv. https://arxiv.org/abs/2108.07258

European Parliament. (2024). Artificial Intelligence Act (Regulation (EU) 2024/1689). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689

Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437. https://doi.org/10.1007/s11023-020-09539-2

National Institute of Standards and Technology. (2023). Artificial intelligence risk management framework (AI RMF 1.0). NIST. https://doi.org/10.6028/NIST.AI.100-1

Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Viking.


Part 8: Organizational AI Adoption and Strategy

Acemoglu, D., & Restrepo, P. (2022). Tasks, automation, and the rise in U.S. wage inequality. Econometrica, 90(5), 1973–2016. https://doi.org/10.3982/ECTA19815

Davenport, T. H., & Mittal, N. (2022). All in on AI: How smart companies win big with artificial intelligence. Harvard Business Review Press.

McKinsey Global Institute. (2021). The future of work after COVID-19. McKinsey & Company. https://www.mckinsey.com/featured-insights/future-of-work/the-future-of-work-after-covid-19

McKinsey Global Institute. (2023). The economic potential of generative AI: The next productivity frontier. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier

MIT Sloan Management Review & Boston Consulting Group. (2023). The AI-powered organization: Lessons from the frontier. MIT Sloan Management Review.


Part 9: Further Reading

Christian, B. (2020). The alignment problem: Machine learning and human values. W. W. Norton.

Daugherty, P. R., & Wilson, H. J. (2018). Human + machine: Reimagining work in the age of AI. Harvard Business Review Press.

Fry, H. (2018). Hello world: Being human in the age of algorithms. W. W. Norton.

Karpathy, A. (n.d.). Neural networks: Zero to hero [Video course]. https://karpathy.ai/zero-to-hero.html

Lee, K.-F. (2018). AI superpowers: China, Silicon Valley, and the new world order. Houghton Mifflin Harcourt.

Marcus, G., & Davis, E. (2019). Rebooting AI: Building artificial intelligence we can trust. Pantheon.

Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Farrar, Straus and Giroux.

Mollick, E. (2024). Co-intelligence: Living and working with AI. Portfolio/Penguin.

Murphy, K. P. (2022). Probabilistic machine learning: An introduction. MIT Press.

Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.

Raschka, S., Liu, Y. H., & Mirjalili, V. (2022). Machine learning with PyTorch and Scikit-Learn. Packt Publishing.

Russell, S., & Norvig, P. (2020). Artificial intelligence: A modern approach (4th ed.). Pearson.

Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. Knopf.


Part 10: Online Resources, Courses, and Documentation

Anthropic. (2024). Anthropic documentation. https://docs.anthropic.com

Anthropic. (2024). Claude prompt engineering guide. https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview

DeepLearning.AI. (2023). ChatGPT prompt engineering for developers [Course]. https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/

Google. (2024). Introduction to generative AI [Course]. Google Cloud Skills Boost. https://www.cloudskillsboost.google/paths/118

Hugging Face. (2024). NLP course. https://huggingface.co/learn/nlp-course

Karpathy, A. (2023). Let's build GPT: From scratch, in code, spelled out [Video]. YouTube. https://www.youtube.com/watch?v=kCc8FmEb1nY

Liang, P., et al. (2023). Holistic evaluation of language models (HELM). Stanford CRFM. https://crfm.stanford.edu/helm/

OpenAI. (2024). OpenAI documentation. https://platform.openai.com/docs

OpenAI. (2024). Prompt engineering guide. https://platform.openai.com/docs/guides/prompt-engineering

Wolfram, S. (2023). What is ChatGPT doing… and why does it work? Wolfram Media. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/


Part 11: Key Articles and Reports

Metz, C., & Nicas, J. (2023, March 14). How ChatGPT suddenly got much better at its job. The New York Times.

Doshi, A. R., & Hauser, O. (2023). Generative AI enhances individual creativity but reduces the collective diversity of novel content (SSRN Working Paper). https://doi.org/10.2139/ssrn.4535536

Heaven, W. D. (2023, January 27). ChatGPT is everywhere. Here's where it came from. MIT Technology Review. https://www.technologyreview.com/2023/01/27/1066538/chatgpt-is-everywhere-heres-where-it-came-from/

Lawton, G. (2023). What is prompt engineering? A detailed overview. TechTarget. https://www.techtarget.com/searchenterpriseai/definition/prompt-engineering

Mollick, E., & Mollick, L. (2023). Using AI to implement effective teaching strategies in classrooms: Five strategies, including prompts. SSRN. https://doi.org/10.2139/ssrn.4391243

Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Red teaming language models with language models. arXiv. https://arxiv.org/abs/2202.03286

Roose, K. (2023, May 30). A.I. poses 'risk of extinction,' industry leaders warn. The New York Times. https://www.nytimes.com/2023/05/30/technology/ai-threat-warning.html

Suleyman, M., & Bhaskar, M. (2023). The coming wave: Technology, power, and the twenty-first century's greatest dilemma. Crown.

Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, A., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., … Gabriel, I. (2021). Ethical and social risks of harm from language models. arXiv. https://arxiv.org/abs/2112.04359


Note on citations: The AI research landscape moves quickly, and preprints (arXiv papers) often appear months or years before formal journal publication. Where both a preprint and a formal publication exist, the formal publication is cited. All URLs were verified as of early 2025; link stability is not guaranteed for external sources.