Discussion about this post

User's avatar
Nitesh Tripathi's avatar

Hi Ravi, thanks for sharing this insightful blog.

One question:

Computing global top k elements using local top k elements might not be always correct.

eg.

Suppose we are interested in top 2 elements and local frequency count is as below,

Local1 = A:8, B:7, C:6

Local2 = A:3, B:2, C:6

Top 2 from local1 is A, B and from local2 is C,A

Global top 2 will be A:11, B:7

But it should be C:12, A:11.

Please let me know what exactly i am missing here. Thanks

Expand full comment
Stanley's avatar

i think the major problem of the approach is

let say i have a CMS for every 1 min , K = 2

1:00pm - 1:01, total frequency: A:8, B:7, C:6, top K = A:8, B:7

1:01pm - 1:02, total frequency: A:3, B:2, C:6, top K = C:6, A:3

when user wanna see topK from 1:00pm - 1:02pm, we merge every topk list from each min

A:8, B:7 + C:6, A:3 = A:11, B:7,C:6 = A:11, B:7

but in fact, it should be C12, A11, how do we solve this problem? relie on slow path? it may have the same problem if we ask top K for 1:00pm - 3:02pm, it is never accuracy

Expand full comment
7 more comments...

No posts