Chapter 31: Further Reading

Foundational RAG

  • Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. The foundational paper that formalized the RAG paradigm, combining a pre-trained seq2seq model with a dense retriever and demonstrating improvements on knowledge-intensive benchmarks.

  • Gao, Y., Xiong, Y., Gao, X., et al. (2024). "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv preprint arXiv:2312.10997. A comprehensive survey covering RAG architectures, retrieval strategies, generation techniques, and evaluation methods.

  • Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." ICLR 2024. Introduces a model that decides when to retrieve, evaluates retrieval quality, and critiques its own generation.
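The retrieve-then-generate loop these papers formalize can be sketched in a few lines of pure Python. Here `embed` and `generate` are toy stand-ins (a bag-of-characters vector and a format string) for a real embedding model and LLM; only the control flow mirrors the RAG paradigm.

```python
# Minimal retrieve-then-generate loop. `embed` and `generate` are toy
# stand-ins for a real embedding model and LLM.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would call a model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Dense retrieval: rank the corpus by similarity to the query vector.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the generator: a real system prompts an LLM with the
    # retrieved passages prepended to the query.
    return f"Answer to {query!r} grounded in {len(context)} passages."

corpus = ["RAG combines retrieval with generation.",
          "BM25 is a sparse retrieval function.",
          "Paris is the capital of France."]
context = retrieve("What is retrieval-augmented generation?", corpus)
answer = generate("What is retrieval-augmented generation?", context)
```

A Self-RAG-style system would add a step between `retrieve` and `generate` in which the model decides whether the retrieved context is worth using at all.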

Embedding Models and Dense Retrieval

  • Reimers, N. & Gurevych, I. (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP 2019. The foundational paper for sentence transformers, enabling efficient dense retrieval through siamese and triplet network fine-tuning.

  • Xiao, S., Liu, Z., Zhang, P., & Muennighoff, N. (2024). "C-Pack: Packaged Resources to Advance General Chinese Embedding." arXiv preprint arXiv:2309.07597. Introduces the BGE embedding models that have topped the MTEB leaderboard, along with training data and methodology.

  • Karpukhin, V., Oguz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering." EMNLP 2020. The DPR paper that demonstrated dense retrieval outperforming BM25 for open-domain QA, establishing the bi-encoder paradigm.

  • Muennighoff, N., Tazi, N., Magne, L., & Reimers, N. (2023). "MTEB: Massive Text Embedding Benchmark." EACL 2023. The standard benchmark for evaluating text embedding models across retrieval, classification, clustering, and semantic similarity tasks.

  • Johnson, J., Douze, M., & Jégou, H. (2021). "Billion-Scale Similarity Search with GPUs." IEEE Transactions on Big Data, 7(3), 535--547. The FAISS paper describing GPU-accelerated approximate nearest neighbor search algorithms used in production at Meta.

  • Malkov, Y. A. & Yashunin, D. A. (2020). "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs." IEEE TPAMI, 42(4), 824--836. The HNSW algorithm paper, the most widely used ANN index in production vector databases.

  • Robertson, S. E. & Zaragoza, H. (2009). "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in Information Retrieval, 3(4), 333--389. The definitive reference for BM25, the sparse retrieval algorithm used in hybrid search systems.
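The BM25 function from Robertson & Zaragoza can be implemented directly from its formula; the sketch below uses the common variant with a +1 inside the idf logarithm to keep scores non-negative, and whitespace tokenization as a simplifying assumption.

```python
# BM25 scoring in the spirit of Robertson & Zaragoza (2009).
# Uses naive whitespace tokenization; k1 and b are the usual defaults.
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                       # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores

docs = ["sparse retrieval with bm25",
        "dense retrieval with embeddings",
        "bm25 ranks documents by term overlap"]
scores = bm25_scores("bm25 retrieval", docs)
```

In a hybrid search system, these sparse scores would be fused (e.g. via reciprocal rank fusion) with the dense scores from a bi-encoder like DPR.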

Chunking and Document Processing

  • Kamradt, G. (2024). "5 Levels of Text Splitting." Blog post and tutorial. A practical guide to chunking strategies from simple character splitting to semantic chunking with embedding-based boundary detection.

  • Shi, W., Min, S., Yasunaga, M., et al. (2024). "REPLUG: Retrieval-Augmented Black-Box Language Models." NAACL 2024. Demonstrates that retrieval augmentation can improve black-box LLMs without access to model weights or gradients.
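The simplest of Kamradt's chunking levels, fixed-size splitting with overlap, fits in a few lines; the sizes here are arbitrary illustrative values, and production systems typically split on token or sentence boundaries instead of raw characters.

```python
# Fixed-size character chunking with overlap -- the first of the
# "5 Levels of Text Splitting". Overlap preserves context that would
# otherwise be cut at a chunk boundary.
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 100, size=40, overlap=10)
```

Semantic chunking, the most advanced level, instead embeds candidate sentences and places boundaries where adjacent embeddings diverge sharply.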

Query Transformation and Advanced Retrieval

  • Gao, L., Ma, X., Lin, J., & Callan, J. (2023). "Precise Zero-Shot Dense Retrieval without Relevance Labels." ACL 2023. The HyDE paper, showing that generating a hypothetical document and using its embedding for retrieval outperforms direct query embedding.

  • Ma, X., Gong, Y., He, P., Zhao, H., & Duan, N. (2023). "Query Rewriting for Retrieval-Augmented Large Language Models." EMNLP 2023. Demonstrates that LLM-based query rewriting significantly improves retrieval quality in RAG systems.

  • Creswell, A., Shanahan, M., & Higgins, I. (2023). "Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning." ICLR 2023. Multi-step reasoning with interleaved retrieval and generation, relevant to query decomposition approaches.
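The HyDE idea reduces to one change in the retrieval loop: embed a hypothetical answer document rather than the raw query. In this sketch, `hypothesize` and `embed` are toy stand-ins for an LLM and an embedding model; only the control flow follows the paper.

```python
# HyDE control flow: embed a generated hypothetical document instead of
# the query itself. `hypothesize` and `embed` are toy stand-ins.
from math import sqrt

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def hypothesize(query: str) -> str:
    # A real system would prompt an LLM:
    # "Write a passage that answers: {query}".
    return f"A passage that answers the question: {query}"

def hyde_retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    q_vec = embed(hypothesize(query))    # embed the hypothetical document
    sim = lambda d: sum(x * y for x, y in zip(q_vec, embed(d)))
    return sorted(corpus, key=sim, reverse=True)[:k]

corpus = ["dense retrieval uses embeddings to match queries to passages",
          "bm25 ranks documents by exact term overlap"]
hits = hyde_retrieve("how does dense retrieval work?", corpus)
```

The intuition is that a hypothetical answer, even a wrong one, lives closer in embedding space to real answer passages than a short question does.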

Reranking

  • Nogueira, R. & Cho, K. (2019). "Passage Re-ranking with BERT." arXiv preprint arXiv:1901.04085. The foundational paper on using cross-encoders for passage reranking, demonstrating large improvements over bi-encoder retrieval alone.

  • Sun, W., Yan, L., Ma, X., et al. (2023). "Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents." EMNLP 2023. Explores using LLMs as rerankers through listwise and pointwise prompting strategies.
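The two-stage retrieve-then-rerank pattern these papers describe looks like the sketch below. `cross_encoder_score` is a toy stand-in (Jaccard token overlap); a real cross-encoder feeds the concatenated (query, passage) pair through a fine-tuned transformer, which is why it is too slow to score the whole corpus and is applied only to the first-stage candidates.

```python
# Second-stage reranking over first-stage candidates.
# `cross_encoder_score` is a toy stand-in for a real cross-encoder model.
def cross_encoder_score(query: str, passage: str) -> float:
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    if not q_terms | p_terms:
        return 0.0
    return len(q_terms & p_terms) / len(q_terms | p_terms)  # Jaccard proxy

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    # Score every (query, candidate) pair jointly, then keep the best.
    ranked = sorted(candidates,
                    key=lambda p: cross_encoder_score(query, p),
                    reverse=True)
    return ranked[:top_k]

candidates = ["bert reranks passages",
              "dense retrieval",
              "passage reranking with bert"]
top = rerank("passage reranking with bert", candidates, top_k=2)
```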

Evaluation

  • Es, S., James, J., Espinosa-Anke, L., & Schockaert, S. (2024). "RAGAS: Automated Evaluation of Retrieval Augmented Generation." EACL 2024. The RAGAS framework for evaluating RAG systems across faithfulness, answer relevance, context precision, and context recall.

  • Saad-Falcon, J., Khattab, O., Potts, C., & Zaharia, M. (2024). "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems." NAACL 2024. An evaluation framework that uses LLM judges calibrated with human annotations for scalable RAG evaluation.
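One RAGAS-style retrieval metric can be sketched without any model in the loop: context precision averages precision@k over the ranks of relevant chunks, rewarding retrievers that place relevant context early. The boolean relevance judgments below are assumed inputs; in RAGAS and ARES they come from an LLM judge.

```python
# Context precision in the spirit of RAGAS: mean precision@k over the
# ranks at which relevant chunks appear. Relevance flags are assumed to
# come from an LLM judge (or human annotation).
def context_precision(relevant: list[bool]) -> float:
    hits, total = 0, 0.0
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / k        # precision@k at this relevant rank
    return total / hits if hits else 0.0

# Relevant chunks at ranks 1 and 3, irrelevant at rank 2.
score = context_precision([True, False, True])
```

A retriever that returned the same two relevant chunks at ranks 1 and 2 would score a perfect 1.0, which is exactly the ordering sensitivity the metric is designed to capture.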

Production RAG Systems

  • Barnett, S., Kurniawan, S., Thudumu, S., Brannelly, Z., & Abdelrazek, M. (2024). "Seven Failure Points When Engineering a Retrieval Augmented Generation System." arXiv preprint arXiv:2401.05856. Identifies common failure modes in production RAG systems with practical mitigation strategies.

  • Anthropic. (2024). "Contextual Retrieval." Technical blog post. Describes techniques for adding contextual information to chunks before embedding, improving retrieval quality by grounding each chunk in its broader document context.
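The contextual retrieval idea amounts to a preprocessing pass over the chunks before indexing. In this sketch `situate` is a hypothetical stand-in that just prepends the document title; the technique described in the post uses an LLM to write a short chunk-specific context from the full document.

```python
# Preprocessing pass for contextual retrieval: prepend document-level
# context to each chunk before it is embedded and indexed.
# `situate` is a hypothetical stand-in for an LLM-written context.
def situate(doc_title: str, chunk: str) -> str:
    # A real system would ask an LLM to describe where this chunk sits
    # within the whole document, then prepend that description.
    return f"From the document '{doc_title}': {chunk}"

def contextualize(doc_title: str, chunks: list[str]) -> list[str]:
    return [situate(doc_title, c) for c in chunks]

contextual_chunks = contextualize(
    "Q2 Financial Report",
    ["Revenue grew 3% over the quarter.", "Churn declined slightly."])
```

The payoff is that an otherwise ambiguous chunk like "Revenue grew 3%" now carries enough grounding to be retrieved for queries about the specific document it came from.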