Over 700 million events/second: How we make sense of too much data

Published: at 01:00

Language: Chinese

Keywords: data pipeline, downsampling, Horvitz-Thompson estimator, confidence intervals, adaptive sampling

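Overview: Cloudflare's data pipeline processes more than 700 million events per second. To cope with that volume, the article describes how Cloudflare uses downsampling to control data loss at each stage of the pipeline, and how it uses the Horvitz-Thompson estimator to extract meaningful analytics from the downsampled data. It also highlights the errors that sampling can introduce, and how adaptive sampling and confidence intervals improve the accuracy of the analysis. With these techniques, Cloudflare can provide reliable analytics to customers even when data volumes are huge and systems are overloaded.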
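
To make the estimator mentioned in the overview concrete, here is a minimal Python sketch of a Horvitz-Thompson total with a normal-approximation confidence interval. It is an illustration under stated assumptions, not Cloudflare's implementation: the function names (`ht_total`, `ht_variance_poisson`, `ht_confidence_interval`) and the `(value, inclusion_probability)` sample layout are hypothetical, and the variance formula assumes each event is kept independently of the others (Poisson sampling).

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical layout: each retained event carries its value and the
# probability with which it was kept during downsampling.
Sample = Tuple[float, float]  # (value, inclusion_probability)


def ht_total(samples: Iterable[Sample]) -> float:
    """Horvitz-Thompson estimate of the population total.

    Each retained value is weighted by 1 / inclusion_probability, so an
    event kept with probability 0.1 stands in for ten events.
    """
    total = 0.0
    for value, prob in samples:
        if not 0.0 < prob <= 1.0:
            raise ValueError(f"inclusion probability must be in (0, 1], got {prob}")
        total += value / prob
    return total


def ht_variance_poisson(samples: Iterable[Sample]) -> float:
    """Variance estimate for the total, ASSUMING independent (Poisson) sampling."""
    return sum((1.0 - prob) / (prob * prob) * value * value for value, prob in samples)


def ht_confidence_interval(samples: Iterable[Sample], z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval; z = 1.96 gives roughly 95% coverage."""
    kept: List[Sample] = list(samples)
    estimate = ht_total(kept)
    half_width = z * math.sqrt(ht_variance_poisson(kept))
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Two retained events: one kept with probability 0.5, one with 0.1.
    kept_events = [(10.0, 0.5), (4.0, 0.1)]
    print(ht_total(kept_events))
    print(ht_confidence_interval(kept_events))
```

Events kept with probability 1 contribute nothing to the variance, which matches the intuition that unsampled data carries no sampling error. The interval widens as inclusion probabilities shrink, which is why heavier downsampling calls for reporting a confidence interval alongside the estimate.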

Original link: https://blog.cloudflare.com/how-we-make-sense-of-too-much-data/?utm_campaign=cf_blog&utm_content=20250127&utm_medium=organic_social&utm_source=twitter/

