A Practitioners Guide to Retrieval Augmented Generation (RAG)

关键字: Retrieval Augmented Generation, RAG, Large Language Models, LLMs, Knowledge Injection

概述: 本文深入探讨了检索增强生成（RAG）技术，这是一种通过将大型语言模型（LLMs）与外部数据源相结合来提高其性能的方法。文章详细解释了RAG的工作原理，包括数据预处理、检索机制和生成过程。RAG通过在提示中加入相关上下文，显著减少了LLM的幻觉问题，并使其能够访问最新的信息。此外，RAG还提供了数据安全性和易于实现的优势，使其成为知识注入的首选方法。文章还回顾了RAG的起源，并探讨了其在现代LLM应用中的演变。最后，文章提供了RAG应用的实用技巧，包括优化检索管道、上下文窗口和数据处理。

分节阅读:

What is Retrieval Augmented Generation? RAG通过将相关上下文插入提示中来增强LLM的知识库，利用LLM的上下文学习能力来产生更好的输出。RAG结合了LLM和可搜索的知识库，首先使用输入查询在外部数据集中搜索相关信息，然后将找到的信息添加到模型提示中。RAG需要访问正确且有用的信息数据集，并构建一个管道来搜索相关数据，通常需要对数据进行清理和分块。
The Benefits of RAG RAG的主要优点是减少LLM的幻觉，通过提供数据引用来提高输出的可验证性。RAG还允许LLM访问最新的信息，而无需重新训练模型，并提高了数据安全性，因为它避免了在模型训练中包含专有数据。RAG的实现相对简单，不需要对LLM进行微调，只需专注于改进检索模型。
From the Origins of RAG to Modern Usage RAG最初是为了解决知识密集型任务而提出的，通过将预训练的语言模型与非参数存储器连接起来，动态检索相关信息。RAG的早期版本使用DPR模型进行检索，BART模型进行生成，并且在训练过程中对检索器和生成器进行了微调。现代RAG应用通常依赖于LLM的上下文学习能力，并且可以同时将多个文档传递到模型的输入中。
Using RAG in the Age of LLMs 现代RAG系统通常不进行微调，而是依赖于LLM的上下文学习能力，并且可以使用多种检索方法，包括混合检索。研究表明，RAG在知识注入方面优于微调，并且可以与外部工具结合使用。RAG的评估是一个复杂的过程，需要考虑检索的相关性、答案的质量和上下文的忠实度。
Practical Tips for RAG Applications RAG的检索管道本质上是一个搜索引擎，应该使用混合检索方法，并使用用户反馈来优化检索结果。RAG的评估应该包括检索和生成两个方面，并使用自动化指标进行评估。RAG的迭代改进应该包括模型和数据的改进，并使用AB测试来验证改进效果。
Optimizing the Context Window RAG需要一个较大的上下文窗口，以便在提示中包含足够多的文本块。为了最大化上下文的多样性，可以使用多样性排序器来选择最不相似的文本块。为了优化上下文布局，可以将最相关的文本块放在上下文窗口的开头和结尾。
Data Cleaning and Formatting 数据清理和格式化对于RAG的性能至关重要，可以显著提高LLM生成答案的正确性，并减少传递到模型中的令牌数量。数据清理管道应该根据应用程序和数据进行定制，并可以使用LLM作为评估器来自动化数据清理过程。

相关工具:

LangChain: https://python.langchain.com/docs/use_cases/summarization
Neo4J: https://neo4j.com/
Serper: https://serper.dev/
RAGAS: https://github.com/explodinggradients/ragas
RAGAS Docs: https://docs.ragas.io/en/stable/
ColBERT: https://arxiv.org/abs/2004.12832

参考文献:

[1] Lewis, Patrick, et al. “Retrieval-augmented generation for knowledge-intensive nlp tasks.” Advances in Neural Information Processing Systems 33 (2020): 9459-9474.
[2] Karpukhin, Vladimir, et al. “Dense passage retrieval for open-domain question answering.” arXiv preprint arXiv:2004.04906 (2020).
[3] Lewis, Mike, et al. “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.” arXiv preprint arXiv:1910.13461 (2019).
[4] Petroni, Fabio, et al. “How context affects language models’ factual predictions.” arXiv preprint arXiv:2005.04611 (2020).
[5] Patil, Shishir G., et al. “Gorilla: Large language model connected with massive apis.” arXiv preprint arXiv:2305.15334 (2023).
[6] Wang, Yizhong, et al. “Self-instruct: Aligning language model with self generated instructions.” arXiv preprint arXiv:2212.10560 (2022).
[7] Ovadia, Oded, et al. “Fine-tuning or retrieval? comparing knowledge injection in llms.” arXiv preprint arXiv:2312.05934 (2023).
[8] Es, Shahul, et al. “Ragas: Automated evaluation of retrieval augmented generation.” arXiv preprint arXiv:2309.15217 (2023).
[9] Zheng, Lianmin, et al. “Judging LLM-as-a-judge with MT-Bench and Chatbot Arena.” arXiv preprint arXiv:2306.05685 (2023).
[10] Khattab, Omar, and Matei Zaharia. “Colbert: Efficient and effective passage search via contextualized late interaction over bert.” Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 2020.
[11] Liu, Nelson F., et al. “Lost in the middle: How language models use long contexts.” arXiv preprint arXiv:2307.03172 (2023).
[12] Leng, Quinn, el al. “Announcing MLflow 2.8 LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications, Part 2.” https://www.databricks.com/blog/announcing-mlflow-28-llm-judge-metrics-and-best-practices-llm-evaluation-rag-applications-part (2023).
[13] Brown, Tom, et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.
[14] Wang, Yan, et al. “Enhancing recommender systems with large language model reasoning graphs.” arXiv preprint arXiv:2308.10835 (2023).
[15] Zhou, Chunting, et al. “Lima: Less is more for alignment.” arXiv preprint arXiv:2305.11206 (2023).
[16] Guu, Kelvin, et al. “Retrieval augmented language model pre-training.” International conference on machine learning. PMLR, 2020.
[17] Glaese, Amelia, et al. “Improving alignment of dialogue agents via targeted human judgements.” arXiv preprint arXiv:2209.14375 (2022).

原文链接: https://cameronrwolfe.substack.com/p/a-practitioners-guide-to-retrieval?utm_source=profile&utm_medium=reader2

source: https://cameronrwolfe.substack.com/p/a-practitioners-guide-to-retrieval?utm_source=profile&utm_medium=reader2