STREAM SAMPLING FOR FREQUENCY CAP STATISTICS Edith Cohen ענבר שולמן
Frequency statistics
• Distinct – cap 1
• Sum – cap ∞
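To make the two endpoints concrete, the sketch below assumes the usual reading of a frequency cap statistic: each key x contributes min(cap, w_x), where w_x is its total weight in the stream. The function name `cap_statistic` and the sample weights are illustrative, not taken from the poster.

```python
import math


def cap_statistic(weights, cap):
    """Sum of per-key weights, each clipped at `cap` (assumed definition)."""
    return sum(min(cap, w) for w in weights.values())


# Hypothetical per-key totals, used only for illustration.
weights = {"a": 3, "b": 1, "c": 5}

distinct = cap_statistic(weights, 1)        # cap 1: every key counts once
total = cap_statistic(weights, math.inf)    # cap infinity: plain sum of weights
capped = cap_statistic(weights, 2)          # an intermediate cap
```

Under this reading, cap 1 counts the distinct keys and cap ∞ gives the sum, with intermediate caps interpolating between the two.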
• Fixed-threshold
• Fixed-size
Algo 1
Input:
    Size – k
    Stream – S of h = (x, w)
Output:
    Result – (x, wx)

Counters = { }
T = +∞
Foreach h = (x, w) in S:
    if x is in Counters then
        seed(x) = min{seed(x), ES(h)}
    else
        s = ES(h)
        if (s < T) then
            seed(x) = s; Counters[x] = 0
            if (|Counters| > k) then
                Delete highest in Counters
                update T
Foreach h = (x, w) in S:
    if x is in Counters then
        Counters[x] += w
Return (T, {(x, Counters[x]) for x in Counters})
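Algo 1 appears to be a two-pass scheme: the first loop keeps the k keys with the smallest seeds, and the second loop accumulates their exact weights. The Python sketch below follows that reading. Two points are assumptions rather than statements from the poster: "update T" is taken to mean that T becomes the seed of the deleted key, and the element score ES(h) is passed in as a caller-supplied function `element_score(x, w)`, since the poster does not define it.

```python
def two_pass_sample(stream, k, element_score):
    """Two-pass fixed-size sample over (key, weight) elements (sketch)."""
    stream = list(stream)          # the stream is read twice
    counters, seed = {}, {}
    T = float("inf")

    # First loop: track the keys with the smallest seeds.
    for x, w in stream:
        s = element_score(x, w)
        if x in counters:
            seed[x] = min(seed[x], s)
        elif s < T:
            seed[x] = s
            counters[x] = 0.0
            if len(counters) > k:
                y = max(counters, key=seed.__getitem__)  # highest seed
                T = seed[y]        # assumed meaning of "update T"
                del counters[y], seed[y]

    # Second loop: exact weights of the keys that survived.
    for x, w in stream:
        if x in counters:
            counters[x] += w
    return T, counters


# Deterministic, hypothetical scores so the run is reproducible.
scores = {"a": 0.5, "b": 0.1, "c": 0.3}
S = [("a", 2.0), ("b", 1.0), ("c", 4.0), ("a", 3.0), ("b", 2.5)]
T, sample = two_pass_sample(S, 2, lambda x, w: scores[x])
```

With these fixed scores, key `a` has the highest seed and is dropped when `c` arrives, so the sample holds the exact weights of `b` and `c`.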
Algo 2
Input:
    Threshold – T
    Stream – S of h = (x, 1)
Output:
    Result – (x, cx)

Counters = { }
Foreach h = (x, 1) in S:
    if x is in Counters then
        Counters[x]++
    else if (ES(h) < T) then
        Counters[x] = 1
Return ({(x, Counters[x]) for x in Counters})
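Algo 2 translates almost line for line into Python. In the minimal sketch below the stream is an iterable of keys (each element has weight 1), and the element score ES(h) is again a caller-supplied function, here called `element_score`; that name and signature are assumptions made for illustration.

```python
def fixed_threshold_sample(stream, T, element_score):
    """Fixed-threshold sample of a unit-weight stream (sketch)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1            # key already sampled: count it
        elif element_score(x) < T:
            counters[x] = 1             # key enters the sample
    return counters


# Deterministic, hypothetical scores so the run is reproducible.
scores = {"a": 0.2, "b": 0.9, "c": 0.4}
S = ["a", "b", "a", "c", "b", "a"]
sample = fixed_threshold_sample(S, 0.5, scores.__getitem__)
```

A key is counted only from the first element whose score falls below T. The sample size is not bounded in advance; it depends on T and on the stream.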
Algo 3
Input:
    Size – k
    Stream – S of h = (x, 1)
Output:
    Result – (x, cx)

Counters = { }
T = 1
Foreach h = (x, 1) in S:
    if x is in Counters then
        Counters[x]++
    else
        score = ES(h)
        if (score < T) then
            seed(x) = score; Counters[x] = 1
            while (|Counters| > k):
                y = argmax{seed(x) | x in Counters}
                T = seed(y)
                while (Counters[y] > 0 and seed(y) >= T):
                    Counters[y] -= 1
                    seed(y) = ES(y)
                if (Counters[y] == 0) then
                    delete Counters[y], seed(y)
Return (T, {(x, Counters[x]) for x in Counters})
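The fixed-size procedure of Algo 3 can be sketched in Python as follows. The stream is an iterable of keys with unit weight, and `element_score` stands in for ES; its name and signature are assumptions, since the poster does not define the score function.

```python
def fixed_size_sample(stream, k, element_score):
    """Fixed-size sample of a unit-weight stream, following Algo 3 (sketch)."""
    counters, seed = {}, {}
    T = 1.0
    for x in stream:
        if x in counters:
            counters[x] += 1
            continue
        score = element_score(x)
        if score < T:
            seed[x] = score
            counters[x] = 1
            while len(counters) > k:
                # Lower the threshold to the largest seed in the sample.
                y = max(counters, key=seed.__getitem__)
                T = seed[y]
                # Give back counted elements of y until a fresh score
                # falls below the new threshold or nothing is left.
                while counters[y] > 0 and seed[y] >= T:
                    counters[y] -= 1
                    seed[y] = element_score(y)
                if counters[y] == 0:
                    del counters[y], seed[y]
    return T, counters


# Deterministic, hypothetical scores so the run is reproducible.
scores = {"a": 0.1, "b": 0.2, "c": 0.3}
S = ["a", "b", "a", "c", "b", "a"]
T, sample = fixed_size_sample(S, 2, scores.__getitem__)
```

Each pass of the outer `while` removes at least one counted element from the key with the largest seed, so the loop ends once at most k keys remain. With the fixed scores above, `c` is evicted as soon as it arrives and T drops to 0.3.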