Chapter 31: Further Reading

Foundational RAG

  • Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. The foundational paper that formalized the RAG paradigm, combining a pre-trained seq2seq model with a dense retriever and demonstrating improvements on knowledge-intensive benchmarks.

  • Gao, Y., Xiong, Y., Gao, X., et al. (2024). "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv preprint arXiv:2312.10997. A comprehensive survey covering RAG architectures, retrieval strategies, generation techniques, and evaluation methods.

  • Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." ICLR 2024. Introduces a model that decides when to retrieve, evaluates retrieval quality, and critiques its own generation.
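The retrieve-then-generate loop these papers formalize can be sketched in a few lines of pure Python. Here `embed` and `generate` are toy stand-ins (a bag-of-characters vector and a format string) for a real embedding model and LLM; only the control flow mirrors the RAG paradigm.

```python
# Minimal retrieve-then-generate loop. `embed` and `generate` are toy
# stand-ins for a real embedding model and LLM.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would call a model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Dense retrieval: rank the corpus by similarity to the query vector.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the generator: a real system prompts an LLM with the
    # retrieved passages prepended to the query.
    return f"Answer to {query!r} grounded in {len(context)} passages."

corpus = ["RAG combines retrieval with generation.",
          "BM25 is a sparse retrieval function.",
          "Paris is the capital of France."]
context = retrieve("What is retrieval-augmented generation?", corpus)
answer = generate("What is retrieval-augmented generation?", context)
```

A Self-RAG-style system would add a step between `retrieve` and `generate` in which the model decides whether the retrieved context is worth using at all.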

Embedding Models and Dense Retrieval

  • Reimers, N. & Gurevych, I. (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP 2019. The foundational paper for sentence transformers, enabling efficient dense retrieval through siamese and triplet network fine-tuning.

  • Xiao, S., Liu, Z., Zhang, P., & Muennighoff, N. (2024). "C-Pack: Packaged Resources to Advance General Chinese Embedding." arXiv preprint arXiv:2309.07597. Introduces the BGE embedding models that have topped the MTEB leaderboard, along with training data and methodology.

  • Karpukhin, V., Oguz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering." EMNLP 2020. The DPR paper that demonstrated dense retrieval outperforming BM25 for open-domain QA, establishing the bi-encoder paradigm.

  • Muennighoff, N., Tazi, N., Magne, L., & Reimers, N. (2023). "MTEB: Massive Text Embedding Benchmark." EACL 2023. The standard benchmark for evaluating text embedding models across retrieval, classification, clustering, and semantic similarity tasks.

  • Johnson, J., Douze, M., & Jégou, H. (2021). "Billion-Scale Similarity Search with GPUs." IEEE Transactions on Big Data, 7(3), 535--547. The FAISS paper describing GPU-accelerated approximate nearest neighbor search algorithms used in production at Meta.

  • Malkov, Y. A. & Yashunin, D. A. (2020). "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs." IEEE TPAMI, 42(4), 824--836. The HNSW algorithm paper, the most widely used ANN index in production vector databases.

  • Robertson, S. E. & Zaragoza, H. (2009). "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in Information Retrieval, 3(4), 333--389. The definitive reference for BM25, the sparse retrieval algorithm used in hybrid search systems.
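The BM25 function from Robertson & Zaragoza can be implemented directly from its formula; the sketch below uses the common variant with a +1 inside the idf logarithm to keep scores non-negative, and whitespace tokenization as a simplifying assumption.

```python
# BM25 scoring in the spirit of Robertson & Zaragoza (2009).
# Uses naive whitespace tokenization; k1 and b are the usual defaults.
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                       # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores

docs = ["sparse retrieval with bm25",
        "dense retrieval with embeddings",
        "bm25 ranks documents by term overlap"]
scores = bm25_scores("bm25 retrieval", docs)
```

In a hybrid search system, these sparse scores would be fused (e.g. via reciprocal rank fusion) with the dense scores from a bi-encoder like DPR.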

Chunking and Document Processing

  • Kamradt, G. (2024). "5 Levels of Text Splitting." Blog post and tutorial. A practical guide to chunking strategies from simple character splitting to semantic chunking with embedding-based boundary detection.

  • Shi, W., Min, S., Yasunaga, M., et al. (2024). "REPLUG: Retrieval-Augmented Black-Box Language Models." NAACL 2024. Demonstrates that retrieval augmentation can improve black-box LLMs without access to model weights or gradients.
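The simplest of Kamradt's chunking levels, fixed-size splitting with overlap, fits in a few lines; the sizes here are arbitrary illustrative values, and production systems typically split on token or sentence boundaries instead of raw characters.

```python
# Fixed-size character chunking with overlap -- the first of the
# "5 Levels of Text Splitting". Overlap preserves context that would
# otherwise be cut at a chunk boundary.
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 100, size=40, overlap=10)
```

Semantic chunking, the most advanced level, instead embeds candidate sentences and places boundaries where adjacent embeddings diverge sharply.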

Query Transformation and Advanced Retrieval

  • Gao, L., Ma, X., Lin, J., & Callan, J. (2023). "Precise Zero-Shot Dense Retrieval without Relevance Labels." ACL 2023. The HyDE paper, showing that generating a hypothetical document and using its embedding for retrieval outperforms direct query embedding.

  • Ma, X., Gong, Y., He, P., Zhao, H., & Duan, N. (2023). "Query Rewriting for Retrieval-Augmented Large Language Models." EMNLP 2023. Demonstrates that LLM-based query rewriting significantly improves retrieval quality in RAG systems.

  • Creswell, A., Shanahan, M., & Higgins, I. (2023). "Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning." ICLR 2023. Multi-step reasoning with interleaved retrieval and generation, relevant to query decomposition approaches.
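The HyDE idea reduces to one change in the retrieval loop: embed a hypothetical answer document rather than the raw query. In this sketch, `hypothesize` and `embed` are toy stand-ins for an LLM and an embedding model; only the control flow follows the paper.

```python
# HyDE control flow: embed a generated hypothetical document instead of
# the query itself. `hypothesize` and `embed` are toy stand-ins.
from math import sqrt

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def hypothesize(query: str) -> str:
    # A real system would prompt an LLM:
    # "Write a passage that answers: {query}".
    return f"A passage that answers the question: {query}"

def hyde_retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    q_vec = embed(hypothesize(query))    # embed the hypothetical document
    sim = lambda d: sum(x * y for x, y in zip(q_vec, embed(d)))
    return sorted(corpus, key=sim, reverse=True)[:k]

corpus = ["dense retrieval uses embeddings to match queries to passages",
          "bm25 ranks documents by exact term overlap"]
hits = hyde_retrieve("how does dense retrieval work?", corpus)
```

The intuition is that a hypothetical answer, even a wrong one, lives closer in embedding space to real answer passages than a short question does.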

Reranking

  • Nogueira, R. & Cho, K. (2019). "Passage Re-ranking with BERT." arXiv preprint arXiv:1901.04085. The foundational paper on using cross-encoders for passage reranking, demonstrating large improvements over bi-encoder retrieval alone.

  • Sun, W., Yan, L., Ma, X., et al. (2023). "Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents." EMNLP 2023. Explores using LLMs as rerankers through listwise and pointwise prompting strategies.
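The two-stage retrieve-then-rerank pattern these papers describe looks like the sketch below. `cross_encoder_score` is a toy stand-in (Jaccard token overlap); a real cross-encoder feeds the concatenated (query, passage) pair through a fine-tuned transformer, which is why it is too slow to score the whole corpus and is applied only to the first-stage candidates.

```python
# Second-stage reranking over first-stage candidates.
# `cross_encoder_score` is a toy stand-in for a real cross-encoder model.
def cross_encoder_score(query: str, passage: str) -> float:
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    if not q_terms | p_terms:
        return 0.0
    return len(q_terms & p_terms) / len(q_terms | p_terms)  # Jaccard proxy

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    # Score every (query, candidate) pair jointly, then keep the best.
    ranked = sorted(candidates,
                    key=lambda p: cross_encoder_score(query, p),
                    reverse=True)
    return ranked[:top_k]

candidates = ["bert reranks passages",
              "dense retrieval",
              "passage reranking with bert"]
top = rerank("passage reranking with bert", candidates, top_k=2)
```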

Evaluation

  • Es, S., James, J., Espinosa-Anke, L., & Schockaert, S. (2024). "RAGAS: Automated Evaluation of Retrieval Augmented Generation." EACL 2024. The RAGAS framework for evaluating RAG systems across faithfulness, answer relevance, context precision, and context recall.

  • Saad-Falcon, J., Khattab, O., Potts, C., & Zaharia, M. (2024). "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems." NAACL 2024. An evaluation framework that uses LLM judges calibrated with human annotations for scalable RAG evaluation.
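One RAGAS-style retrieval metric can be sketched without any model in the loop: context precision averages precision@k over the ranks of relevant chunks, rewarding retrievers that place relevant context early. The boolean relevance judgments below are assumed inputs; in RAGAS and ARES they come from an LLM judge.

```python
# Context precision in the spirit of RAGAS: mean precision@k over the
# ranks at which relevant chunks appear. Relevance flags are assumed to
# come from an LLM judge (or human annotation).
def context_precision(relevant: list[bool]) -> float:
    hits, total = 0, 0.0
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / k        # precision@k at this relevant rank
    return total / hits if hits else 0.0

# Relevant chunks at ranks 1 and 3, irrelevant at rank 2.
score = context_precision([True, False, True])
```

A retriever that returned the same two relevant chunks at ranks 1 and 2 would score a perfect 1.0, which is exactly the ordering sensitivity the metric is designed to capture.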

Production RAG Systems

  • Barnett, S., Kurniawan, S., Thudumu, S., Brannelly, Z., & Abdelrazek, M. (2024). "Seven Failure Points When Engineering a Retrieval Augmented Generation System." arXiv preprint arXiv:2401.05856. Identifies common failure modes in production RAG systems with practical mitigation strategies.

  • Anthropic. (2024). "Contextual Retrieval." Technical blog post. Describes techniques for adding contextual information to chunks before embedding, improving retrieval quality by grounding each chunk in its broader document context.
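The contextual retrieval idea amounts to a preprocessing pass over the chunks before indexing. In this sketch `situate` is a hypothetical stand-in that just prepends the document title; the technique described in the post uses an LLM to write a short chunk-specific context from the full document.

```python
# Preprocessing pass for contextual retrieval: prepend document-level
# context to each chunk before it is embedded and indexed.
# `situate` is a hypothetical stand-in for an LLM-written context.
def situate(doc_title: str, chunk: str) -> str:
    # A real system would ask an LLM to describe where this chunk sits
    # within the whole document, then prepend that description.
    return f"From the document '{doc_title}': {chunk}"

def contextualize(doc_title: str, chunks: list[str]) -> list[str]:
    return [situate(doc_title, c) for c in chunks]

contextual_chunks = contextualize(
    "Q2 Financial Report",
    ["Revenue grew 3% over the quarter.", "Churn declined slightly."])
```

The payoff is that an otherwise ambiguous chunk like "Revenue grew 3%" now carries enough grounding to be retrieved for queries about the specific document it came from.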