Chapter 31: Further Reading
Foundational RAG
- Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. The foundational paper that formalized the RAG paradigm, combining a pre-trained seq2seq model with a dense retriever and demonstrating improvements on knowledge-intensive benchmarks.
- Gao, Y., Xiong, Y., Gao, X., et al. (2024). "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv preprint arXiv:2312.10997. A comprehensive survey covering RAG architectures, retrieval strategies, generation techniques, and evaluation methods.
- Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." ICLR 2024. Introduces a model that decides when to retrieve, evaluates retrieval quality, and critiques its own generation.
Embedding Models and Dense Retrieval
- Reimers, N. & Gurevych, I. (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP 2019. The foundational paper for sentence transformers, enabling efficient dense retrieval through siamese and triplet network fine-tuning.
- Xiao, S., Liu, Z., Zhang, P., & Muennighoff, N. (2024). "C-Pack: Packaged Resources to Advance General Chinese Embedding." arXiv preprint arXiv:2309.07597. Introduces the BGE embedding models, which have ranked at the top of the MTEB leaderboard, along with their training data and methodology.
- Karpukhin, V., Oguz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering." EMNLP 2020. The DPR paper that demonstrated dense retrieval outperforming BM25 for open-domain QA, establishing the bi-encoder paradigm.
- Muennighoff, N., Tazi, N., Magne, L., & Reimers, N. (2023). "MTEB: Massive Text Embedding Benchmark." EACL 2023. The standard benchmark for evaluating text embedding models across retrieval, classification, clustering, and semantic similarity tasks.
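To make the bi-encoder retrieval paradigm from the DPR and Sentence-BERT papers concrete, here is a minimal sketch. The 3-dimensional vectors and document names are toy stand-ins for real embedding-model outputs, not values from any of the papers above; in practice the vectors would come from a model such as those benchmarked on MTEB.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d "embeddings" standing in for real embedding-model outputs.
corpus = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax":  [0.0, 0.1, 0.95],
}
query_vec = [0.85, 0.15, 0.05]  # embedding of the user query

# Bi-encoder retrieval: query and documents are embedded independently,
# then documents are ranked by similarity to the query vector.
ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
print(ranked)
```

The key property of the bi-encoder design is that document embeddings can be precomputed and indexed offline; only the query is embedded at search time.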
Vector Databases and Search
- Johnson, J., Douze, M., & Jégou, H. (2021). "Billion-Scale Similarity Search with GPUs." IEEE Transactions on Big Data, 7(3), 535--547. The FAISS paper describing GPU-accelerated approximate nearest neighbor search algorithms used in production at Meta.
- Malkov, Y. A. & Yashunin, D. A. (2020). "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs." IEEE TPAMI, 42(4), 824--836. The HNSW algorithm paper, the most widely used ANN index in production vector databases.
- Robertson, S. E. & Zaragoza, H. (2009). "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in Information Retrieval, 3(4), 333--389. The definitive reference for BM25, the sparse retrieval algorithm used in hybrid search systems.
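The Okapi BM25 scoring function that Robertson & Zaragoza formalize is compact enough to sketch directly. This is a simplified toy implementation (no stemming, stopwords, or inverted index; the example documents are invented), but the formula itself follows the standard definition with parameters k1 and b:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            # Smoothed IDF, then term-frequency saturation with length normalization.
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = ["the cat sat".split(), "dogs chase the cat".split(), "tax law overview".split()]
scores = bm25_scores("cat".split(), docs)
print(scores)
```

Note how the shorter of the two matching documents scores higher: the b parameter penalizes longer documents, while k1 caps how much repeated term occurrences can help.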
Chunking and Document Processing
- Kamradt, G. (2024). "5 Levels of Text Splitting." Blog post and tutorial. A practical guide to chunking strategies from simple character splitting to semantic chunking with embedding-based boundary detection.
- Shi, W., Min, S., Yasunaga, M., et al. (2024). "REPLUG: Retrieval-Augmented Black-Box Language Models." NAACL 2024. Demonstrates that retrieval augmentation can improve black-box LLMs without access to model weights or gradients.
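The simplest of the chunking strategies Kamradt surveys, fixed-size splitting with overlap, can be sketched in a few lines. The function name and parameter defaults here are illustrative choices, not taken from the tutorial:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks, with overlap so that
    content near a boundary appears in two adjacent chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

text = "".join(chr(65 + i % 26) for i in range(500))  # 500 chars of dummy text
chunks = chunk_text(text, chunk_size=200, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The overlap guards against a sentence being cut exactly at a boundary and lost to both chunks; the more advanced levels in the tutorial replace the fixed boundaries with sentence, structural, or embedding-based splits.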
Query Transformation and Advanced Retrieval
- Gao, L., Ma, X., Lin, J., & Callan, J. (2023). "Precise Zero-Shot Dense Retrieval without Relevance Labels." ACL 2023. The HyDE paper, showing that generating a hypothetical document and using its embedding for retrieval outperforms direct query embedding.
- Ma, X., Gong, Y., He, P., Zhao, H., & Duan, N. (2023). "Query Rewriting for Retrieval-Augmented Large Language Models." EMNLP 2023. Demonstrates that LLM-based query rewriting significantly improves retrieval quality in RAG systems.
- Creswell, A., Shanahan, M., & Higgins, I. (2023). "Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning." ICLR 2023. Multi-step reasoning with interleaved retrieval and generation, relevant to query decomposition approaches.
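The HyDE control flow is easy to sketch: generate a hypothetical answer, embed it, and retrieve with that embedding instead of the query's. Everything in this sketch is a stand-in: `generate_hypothetical_doc` stubs out an LLM call with a hard-coded passage, and `embed` is a toy bag-of-words counter over an invented vocabulary, not a real embedding model.

```python
import math

def embed(text):
    """Toy bag-of-words 'embedding' over a tiny fixed vocabulary (a stand-in
    for a real embedding model)."""
    vocab = ["rag", "retrieval", "chunk", "tax", "medicine"]
    words = text.lower().split()
    return [words.count(v) for v in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def generate_hypothetical_doc(query):
    """Stub for an LLM call: write a passage that *answers* the query.
    A real implementation would prompt a generative model here."""
    return "retrieval augmented generation combines retrieval with a chunk of context"

corpus = ["rag retrieval chunk", "tax medicine"]
query = "how does rag work"

# HyDE: embed the hypothetical answer, not the query itself, then retrieve.
hyde_vec = embed(generate_hypothetical_doc(query))
best = max(corpus, key=lambda d: cosine(hyde_vec, embed(d)))
print(best)
```

The intuition from the paper is that a hypothetical answer, even if factually wrong, lives in the same region of embedding space as real answer documents, whereas a short question often does not.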
Reranking
- Nogueira, R. & Cho, K. (2019). "Passage Re-ranking with BERT." arXiv preprint arXiv:1901.04085. The foundational paper on using cross-encoders for passage reranking, demonstrating large improvements over bi-encoder retrieval alone.
- Sun, W., Yan, L., Ma, X., et al. (2023). "Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents." EMNLP 2023. Explores using LLMs as rerankers through listwise prompting (instructional permutation generation).
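The retrieve-then-rerank pipeline these papers study can be sketched with a stub in place of the cross-encoder. Here `overlap_score` is a toy stand-in (a real cross-encoder, as in Nogueira & Cho, feeds the query and passage jointly through one transformer), and the candidate passages are invented:

```python
def overlap_score(query, passage):
    """Stand-in for a cross-encoder: jointly score a (query, passage) pair.
    Here: fraction of query tokens that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

candidates = [  # assume these came back as the bi-encoder's top-k
    "dense retrieval uses embeddings",
    "bm25 ranks passages by term statistics",
    "cats sleep a lot",
]
query = "how does bm25 rank passages"

# Rerank: rescore every (query, candidate) pair and sort by the new score.
reranked = sorted(candidates, key=lambda p: overlap_score(query, p), reverse=True)
print(reranked[0])
```

The two-stage design exists because joint scoring is far more accurate but cannot be precomputed: it must run once per (query, candidate) pair, so it is only applied to a small first-stage shortlist.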
Evaluation
- Es, S., James, J., Espinosa-Anke, L., & Schockaert, S. (2024). "RAGAS: Automated Evaluation of Retrieval Augmented Generation." EACL 2024. The RAGAS framework for evaluating RAG systems across faithfulness, answer relevance, context precision, and context recall.
- Saad-Falcon, J., Khattab, O., Potts, C., & Zaharia, M. (2024). "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems." NAACL 2024. An evaluation framework that uses LLM judges calibrated with human annotations for scalable RAG evaluation.
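At their core, the retrieval-side metrics in these frameworks reduce to precision and recall over retrieved chunks. This sketch uses simplified set-based definitions with hand-labeled relevance; RAGAS itself derives relevance judgments with an LLM rather than ground-truth labels, so treat this only as the underlying arithmetic:

```python
def context_precision(retrieved_ids, relevant_ids):
    """Simplified context precision: share of retrieved chunks that are relevant."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for c in retrieved_ids if c in relevant_ids)
    return hits / len(retrieved_ids)

def context_recall(retrieved_ids, relevant_ids):
    """Simplified context recall: share of relevant chunks that were retrieved."""
    if not relevant_ids:
        return 0.0
    retrieved = set(retrieved_ids)
    hits = sum(1 for c in relevant_ids if c in retrieved)
    return hits / len(relevant_ids)

retrieved = ["c1", "c2", "c3", "c4"]   # what the retriever returned
relevant = {"c1", "c3", "c9"}          # chunks actually needed for the answer
print(context_precision(retrieved, relevant), context_recall(retrieved, relevant))
```

The two metrics pull in opposite directions: retrieving more chunks raises recall but tends to lower precision, which is why RAG evaluations report both.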
Production RAG Systems
- Barnett, S., Kurniawan, S., Thudumu, S., Brannelly, Z., & Abdelrazek, M. (2024). "Seven Failure Points When Engineering a Retrieval Augmented Generation System." arXiv preprint arXiv:2401.05856. Identifies common failure modes in production RAG systems with practical mitigation strategies.
- Anthropic. (2024). "Contextual Retrieval." Technical blog post. Describes techniques for adding contextual information to chunks before embedding, improving retrieval quality by grounding each chunk in its broader document context.