Introducing Contextual Retrieval Anthropic

Okay, here’s the analysis of the provided text, following your specified format:

语言: 中文

关键字: RAG, Contextual Retrieval, Embeddings, BM25, Reranking

概述: 本文介绍了 Anthropic 提出的一种名为“Contextual Retrieval”的新方法，旨在显著提高 RAG（Retrieval-Augmented Generation）系统中信息检索的准确性。传统 RAG 方法在编码信息时会丢失上下文，导致检索失败。Contextual Retrieval 通过 Contextual Embeddings 和 Contextual BM25 两种子技术，在分块文本中加入特定上下文，从而提升检索效果。实验表明，该方法能显著降低检索失败率，结合重排序技术后效果更佳。此外，文章还讨论了实施 Contextual Retrieval 的一些注意事项，并提供了一个 cookbook 供开发者参考。

分节阅读:

A primer on RAG: scaling to larger knowledge bases: 针对大型知识库，RAG 是一种典型的解决方案，它通过将知识库分解为小文本块、将这些文本块转换为向量嵌入并存储在向量数据库中来实现。在运行时，RAG 系统会根据用户查询的语义相似性找到最相关的文本块，并将它们添加到发送给生成模型的提示中。结合使用嵌入和 BM25 技术可以更准确地检索最适用的文本块，从而平衡精确的术语匹配和更广泛的语义理解。
The context conundrum in traditional RAG: 传统的 RAG 系统通常将文档分割成更小的块以实现高效检索，但当单个块缺乏足够的上下文时，可能会导致问题。例如，如果一个块只包含“公司收入比上一季度增长了 3%”，而没有指定公司名称或时间段，那么很难检索到正确的信息或有效地使用该信息。
Introducing Contextual Retrieval: Contextual Retrieval 通过在嵌入（“Contextual Embeddings”）和创建 BM25 索引（“Contextual BM25”）之前，将特定于块的解释性上下文添加到每个块来解决此问题。这种方法通过为每个块提供额外的上下文信息，从而提高检索的准确性和相关性。
Implementing Contextual Retrieval: 使用 Claude 模型生成简洁的、特定于块的上下文，该上下文使用整个文档的上下文来解释该块。生成的上下文文本（通常为 50-100 个 token）会添加到块的前面，然后再进行嵌入和创建 BM25 索引。
Using Prompt Caching to reduce the costs of Contextual Retrieval: 借助 Claude 的 prompt caching 功能，Contextual Retrieval 可以在低成本下实现，因为无需为每个块传递参考文档。只需将文档加载到缓存一次，然后引用之前缓存的内容即可。
Performance improvements: 实验表明，Contextual Embeddings 将 top-20-chunk 检索失败率降低了 35%，而结合使用 Contextual Embeddings 和 Contextual BM25 则将 top-20-chunk 检索失败率降低了 49%。
Implementation considerations: 实施 Contextual Retrieval 时，需要考虑块边界、嵌入模型、自定义上下文提示和块数量等因素，并始终运行评估以优化响应生成。
Further boosting performance with Reranking: 通过将 Contextual Retrieval 与重排序技术相结合，可以进一步提高性能，重排序是一种常用的过滤技术，可确保只有最相关的块传递给模型。
Performance improvements: 实验表明，经过重排序的 Contextual Embedding 和 Contextual BM25 将 top-20-chunk 检索失败率降低了 67%。
Cost and latency considerations: 使用重排序的一个重要考虑因素是它对延迟和成本的影响，尤其是在对大量块进行重排序时，建议针对特定用例尝试不同的设置，以找到合适的平衡。
Conclusion: 为了最大限度地提高性能，可以将来自 Voyage 或 Gemini 的 Contextual Embeddings 与 Contextual BM25 以及重排序步骤相结合，并将 20 个块添加到提示中。

相关工具:

Claude: https://www.anthropic.com/
Gemini: https://ai.google.dev/gemini-api/docs/embeddings
Voyage: https://www.voyageai.com/
Cohere reranker: https://cohere.com/rerank

参考文献:

Chunking strategies: https://www.pinecone.io/learn/chunking-strategies/
Evaluating chunking: https://research.trychroma.com/evaluating-chunking
adding generic document summaries to chunks: https://aclanthology.org/W02-0405.pdf
hypothetical document embedding: https://arxiv.org/abs/2212.10496
summary-based indexing: https://www.llamaindex.ai/blog/a-new-document-summary-index-for-llm-powered-qa-systems-9a32ece2f9ec

原文链接: https://www.anthropic.com/news/contextual-retrieval

source: https://www.anthropic.com/news/contextual-retrieval