STREAM SAMPLING FOR FREQUENCY CAP STATISTICS Edith Cohen

  • Slides: 21
STREAM SAMPLING FOR FREQUENCY CAP STATISTICS Edith Cohen ענבר שולמן

Frequency statistics

• Distinct – cap_1 • Sum – cap_∞ (where cap_T(w) = min{w, T})
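
To make the cap statistics concrete, here is a minimal Python sketch. It assumes the usual definition: a key's frequency w_x is the total weight of its elements, and the statistic is the sum over keys of cap_t(w_x) = min{w_x, t}. The function name cap_statistic and the parameter name t (standing in for the cap T) are ours. It computes the exact value with one counter per key, which is the memory cost that stream sampling is meant to avoid.

```python
import math
from typing import Dict, Hashable, Iterable, Tuple

def cap_statistic(stream: Iterable[Tuple[Hashable, float]], t: float) -> float:
    # Exact, memory-hungry baseline: one counter per distinct key.
    weights: Dict[Hashable, float] = {}
    for x, w in stream:
        weights[x] = weights.get(x, 0) + w
    # Sum of cap_t(w_x) = min(w_x, t) over all keys.
    return sum(min(w_x, t) for w_x in weights.values())

stream = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]
print(cap_statistic(stream, 1))         # distinct keys: 3
print(cap_statistic(stream, 2))         # cap_2: 2 + 1 + 1 = 4
print(cap_statistic(stream, math.inf))  # total weight: 5
```

Setting t = 1 counts distinct keys and t = ∞ sums all weights, matching the two special cases on the slide.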

• Fixed-threshold • Fixed-size

Algo 1
Input: Size – k; Stream – S of h = (x, w)
Output: Result – (x, w_x)
Counters = { }; T = +∞
Foreach h = (x, w) in S:
    if x is in Counters then
        Counters[x] += w
        seed(x) = min{seed(x), ES(h)}
    else
        s = ES(h)
        if (s < T) then
            seed(x) = s; Counters[x] = 0
            if (|Counters| > k) then
                y = argmax{seed(x) | x in Counters}
                T = seed(y)
                delete Counters[y], seed(y)
Return (T, {(x, Counters[x]) for x in Counters})
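
A minimal Python transcription of Algo 1, offered as a sketch. The element score ES is passed in as a function because the slide does not fix it; the default exp_score is a hypothetical choice (an exponential draw with rate w). The names algo1, exp_score and Element are ours. Following the slide, a newly admitted key's counter starts at 0, so the weight of the element that admits the key is not included in its count.

```python
import math
import random
from typing import Callable, Dict, Hashable, Iterable, Tuple

Element = Tuple[Hashable, float]

def exp_score(h: Element) -> float:
    # Hypothetical ES: exponential draw with rate w (mean 1/w),
    # so heavier elements tend to get smaller scores.
    _, w = h
    return random.expovariate(w)

def algo1(stream: Iterable[Element], k: int,
          es: Callable[[Element], float] = exp_score
          ) -> Tuple[float, Dict[Hashable, float]]:
    counters: Dict[Hashable, float] = {}
    seed: Dict[Hashable, float] = {}
    T = math.inf
    for h in stream:
        x, w = h
        if x in counters:
            counters[x] += w
            seed[x] = min(seed[x], es(h))  # a key's seed is its minimum element score
        else:
            s = es(h)
            if s < T:
                seed[x] = s
                counters[x] = 0  # initial value as it appears on the slide
                if len(counters) > k:
                    # Evict the key with the highest seed; its seed becomes the new T.
                    y = max(counters, key=lambda z: seed[z])
                    T = seed[y]
                    del counters[y]
                    del seed[y]
    return T, counters

if __name__ == "__main__":
    random.seed(0)
    demo = [(random.choice("abcdefgh"), random.randint(1, 5)) for _ in range(100)]
    print(algo1(demo, k=3))
```

Keeping seeds in a dict and scanning for the maximum is O(k) per eviction; a heap would be the usual choice for large k, at the cost of handling seeds that decrease in place.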

Algo 2
Input: Threshold – T; Stream – S of h = (x, 1)
Output: Result – (x, c_x)
Counters = { }
Foreach h = (x, 1) in S:
    if x is in Counters then
        Counters[x]++
    else if (ES(h) < T) then
        Counters[x] = 1
Return ({(x, Counters[x]) for x in Counters})
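
Algo 2 can be sketched in Python as follows. This assumes a hypothetical ES that returns an independent uniform draw in [0, 1) per element, in which case each element of a key that is not yet held admits the key with probability T. The names algo2 and uniform_score are ours.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, Tuple

Element = Tuple[Hashable, int]

def uniform_score(h: Element) -> float:
    # Hypothetical ES: an independent uniform draw in [0, 1).
    return random.random()

def algo2(stream: Iterable[Element], T: float,
          es: Callable[[Element], float] = uniform_score) -> Dict[Hashable, int]:
    counters: Dict[Hashable, int] = {}
    for h in stream:
        x, _ = h  # every element has unit weight
        if x in counters:
            counters[x] += 1  # a held key counts every later occurrence
        elif es(h) < T:
            counters[x] = 1  # the key enters the sample at this occurrence
    return counters

stream = [("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)]
print(algo2(stream, 1.0))  # T = 1 admits every key at its first occurrence
```

With T = 1 the counts are exact; smaller T lowers the expected number of held keys, and a held key's count misses only the occurrences before it was admitted.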

Algo 3
Input: Size – k; Stream – S of h = (x, 1)
Output: Result – (x, c_x)
Counters = { }; T = 1
Foreach h = (x, 1) in S:
    if x is in Counters then
        Counters[x]++
    else
        score = ES(h)
        if (score < T) then
            seed(x) = score; Counters[x] = 1
            while (|Counters| > k):
                y = argmax{seed(x) | x in Counters}
                T = seed(y)
                while (Counters[y] > 0 and seed(y) >= T):
                    Counters[y] -= 1
                    seed(y) = ES(y)
                if (Counters[y] == 0) then
                    delete Counters[y], seed(y)
Return (T, {(x, Counters[x]) for x in Counters})
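
A Python sketch of Algo 3, again assuming a hypothetical uniform ES on [0, 1); the names algo3 and uniform_score are ours, and the redraw ES(y) is modeled as scoring a fresh unit element (y, 1). When the sample overflows, T drops to the largest seed, and that key's count is thinned one unit at a time with a fresh score until a score falls below the new T or the count reaches 0. Each inner pass decrements at least once, since the evicted key's seed equals the new T, so the loop terminates.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, Tuple

Element = Tuple[Hashable, int]

def uniform_score(h: Element) -> float:
    # Hypothetical ES: an independent uniform draw in [0, 1).
    return random.random()

def algo3(stream: Iterable[Element], k: int,
          es: Callable[[Element], float] = uniform_score
          ) -> Tuple[float, Dict[Hashable, int]]:
    counters: Dict[Hashable, int] = {}
    seed: Dict[Hashable, float] = {}
    T = 1.0
    for h in stream:
        x, _ = h
        if x in counters:
            counters[x] += 1
            continue
        score = es(h)
        if score < T:
            seed[x] = score
            counters[x] = 1
            # Too many keys: lower T to the largest seed and thin that key.
            while len(counters) > k:
                y = max(counters, key=lambda z: seed[z])
                T = seed[y]
                # Decrement y's count and redraw its seed, i.e. re-test the
                # next occurrence against the new T, until a draw is below T
                # or the count reaches 0.
                while counters[y] > 0 and seed[y] >= T:
                    counters[y] -= 1
                    seed[y] = es((y, 1))
                if counters[y] == 0:
                    del counters[y]
                    del seed[y]
    return T, counters

if __name__ == "__main__":
    random.seed(7)
    demo = [(random.randrange(20), 1) for _ in range(200)]
    print(algo3(demo, k=5))
```

Unlike Algo 2, the threshold here is an output: it adapts downward so that at most k keys are held at any time.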

An estimator – discrete SH_ℓ
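
The estimator itself is not recoverable from the extracted slide text. As a hedged stand-in, not necessarily the SH_ℓ estimator presented, here is a standard unbiased estimator for a per-key function f with f(0) = 0 under the fixed-threshold discrete sample-and-hold of Algo 2, assuming ES is uniform on [0, 1) so that each element of a key not yet held admits it with probability p = T. A held key with count c contributes f(c) + (1/p - 1)(f(c) - f(c - 1)), and unsampled keys contribute 0. If a key has n occurrences and enters at occurrence i (probability (1 - p)^(i-1) p, leaving c = n - i + 1), the expectation telescopes to f(n) - (1 - p)^n f(0) = f(n). With f = cap_t this estimates the cap statistic; for f(w) = w it reduces to the familiar c - 1 + 1/p. The function names are ours.

```python
import math
from typing import Dict, Hashable

def cap(w: float, t: float) -> float:
    # cap_t(w) = min(w, t)
    return min(w, t)

def sh_cap_estimate(counters: Dict[Hashable, int], p: float, t: float) -> float:
    # Sum of per-key unbiased terms f(c) + (1/p - 1) * (f(c) - f(c - 1)),
    # with f = cap_t and p the per-element admission probability.
    total = 0.0
    for c in counters.values():
        fc = cap(c, t)
        total += fc + (1.0 / p - 1.0) * (fc - cap(c - 1, t))
    return total

sample = {"a": 1, "b": 4}  # counts as returned by a run of Algo 2 with T = p
print(sh_cap_estimate(sample, 0.5, 1))         # distinct-count estimate
print(sh_cap_estimate(sample, 0.5, math.inf))  # sum estimate
```

For the distinct count (t = 1), a key held with count 1 contributes 1/p and a key with a larger count contributes 1, since it must have been admitted before its last occurrence.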

Questions?