DISTRIBUTED EVENT AGGREGATION FOR CONTENT-BASED PUBLISH/SUBSCRIBE SYSTEMS

Navneet Kumar Pandey (1), Stéphane Weiss (1), Roman Vitenberg (1), Kaiwen Zhang (2), Hans-Arno Jacobsen (2)
(1) University of Oslo
(2) University of Toronto

Motivation: Intelligent Transport System (ITS)
• Information providers: road sensors, crowdsourced mobile apps
• Information seekers: commuters, police, first responders, radio networks, etc.
• Aggregate subscriptions
  – Number of cars passing a street light per hour
  – Average speed of cars on a road segment per day
• Non-aggregate subscriptions
  – Accident reports
  – Traffic violation reports
Image source: http://www.wired.com/images_blogs/autopia/2012/08/12A914.jpg

Aggregation in pub/sub
• Pub/sub is well known for efficient content filtering and dissemination between distributed event sources and sinks.
• However, pub/sub does not support aggregation, which is required in emerging applications.
• Our primary objective is to retain the traditional pub/sub focus on low communication cost while adding support for aggregation.

Contributions: aggregation in pub/sub
• We propose a framework and baseline approaches for aggregation in content-based pub/sub systems (CBPS).
• We show that the relative performance of the baseline approaches varies with workload properties.
• We propose a per-broker, distributed, adaptive approach.

Advertisement-based pub/sub model
[Figure: broker overlay showing brokers Bp, Bq, BI, and BS, the advertisement A[val, >, 4], the publication P[val, 8], the subscription S[val, >, 3], and the subscription delivery tree (SDT).]
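The content-based matching shown in the figure can be illustrated with a minimal sketch. It is not the PADRES API: the `Predicate` class, its fields, and the operator table are hypothetical names chosen for this example, and only single-attribute comparisons are modelled.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported by this sketch (an assumption, not a PADRES list).
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


@dataclass(frozen=True)
class Predicate:
    """A single-attribute filter such as A[val, >, 4] or S[val, >, 3]."""
    attribute: str
    op: str
    value: float

    def matches(self, publication: dict) -> bool:
        # A publication without the attribute cannot match the filter.
        if self.attribute not in publication:
            return False
        return _OPS[self.op](publication[self.attribute], self.value)


advertisement = Predicate("val", ">", 4)   # A[val, >, 4]
subscription = Predicate("val", ">", 3)    # S[val, >, 3]
publication = {"val": 8}                   # P[val, 8]

# The publication conforms to its advertisement and matches the subscription.
conforms = advertisement.matches(publication)
delivered = subscription.matches(publication)
```

In this sketch, P[val, 8] satisfies both filters, so it would be routed along the SDT toward the subscriber.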

Comparison with stream processing

Stream processing | Aggregation in pub/sub
Requires global view of topology | Topology is not known to individual broker nodes
Requires a priori knowledge of publication sources | Publication sources and sinks are dynamic
Needs control layer | Brokers are loosely coupled
Usually have a static query plan | SDTs are dynamic and determined by the pub/sub implementation
Optimized for continuous data streams | Publications come at an irregular rate

Proposed aggregation framework
Procedures: publication filtering procedure (PFP)
Subscription: { Road.ID = 101, speed > 10, op = ‘avg’, duration (ω) = 2 hours, shift size (δ) = 1 hour }
[Figure: timeline from 0 to 3 showing publications Pub 1, Pub 2, Pub 3 and the notification window ranges NWR 1, NWR 2, NWR 3 of the subscription.]
A single publication can participate in several notification window ranges (NWRs), even for the same subscription.
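How one publication falls into several NWRs can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: windows are assumed to start at multiples of δ counted from time 0 and to be half-open intervals [start, start + ω), and the function name `matching_windows` is hypothetical.

```python
import math


def matching_windows(t, omega, delta):
    """Return the (start, end) ranges of all windows that contain time t.

    Assumes window k covers [k * delta, k * delta + omega) for k = 0, 1, 2, ...
    """
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    if t < 0:
        return []
    # Last window that has already started at time t.
    last = math.floor(t / delta)
    # First window that has not yet ended at time t.
    first = max(0, math.floor((t - omega) / delta) + 1)
    return [(k * delta, k * delta + omega) for k in range(first, last + 1)]


# Subscription from the slide: duration 2 hours, shift size 1 hour.
windows = matching_windows(1.5, omega=2, delta=1)
```

With ω = 2 and δ = 1, a publication at time 1.5 belongs to the windows [0, 2) and [1, 3), which is why the same publication can contribute to more than one notification.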

Proposed aggregation framework
Procedures: publication filtering procedure (PFP), initial computation procedure (ICP)
[Figure: timeline from 0 to 3 showing publications Pub 1, Pub 2, Pub 3 and notification window ranges NWR 1, NWR 2, NWR 3, with the processing start time marked by an x.]
Outgoing messages, depending on the processing start time:
  – { avg(Pub 1, Pub 2, Pub 3), avg(Pub 2, Pub 3) }
  – { avg(Pub 1, Pub 2), avg(Pub 2), Pub 3 }
Processing start time presents a trade-off between communication cost and end-to-end delay.

Proposed aggregation framework
Procedures: publication filtering procedure (PFP), initial computation procedure (ICP), recurrent processing procedure (RPP)
[Figure: brokers Bp and Bq produce partial results avgp and avgq; broker BI produces the combined result avgpq after a collection delay.]
Collection delay is another parameter affecting the delay-communication trade-off.
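Combining avgp and avgq into avgpq requires more than the two averages themselves. A minimal sketch, assuming partial results travel as (sum, count) pairs, is shown below; the slides do not state the actual representation, and the `PartialAvg` class is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Partial average carried as a sum and a count so that it can be merged."""
    total: float
    count: int

    @classmethod
    def of(cls, values):
        values = list(values)
        return cls(sum(values), len(values))

    def merge(self, other):
        # Merging partials is just adding sums and counts.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def value(self):
        if self.count == 0:
            raise ValueError("average of zero publications is undefined")
        return self.total / self.count


# Partial results computed at two publisher-side brokers (illustrative values).
avg_p = PartialAvg.of([3, 5])
avg_q = PartialAvg.of([9, 2])

# Recurrent processing at the downstream broker merges the partials.
avg_pq = avg_p.merge(avg_q)
```

Averaging the two averages directly (4 and 5.5) would give 4.75 here only because both partials have the same count; carrying the count keeps the merge correct when the partials differ in size.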

Late aggregation approach
[Figure: publishers, brokers Bp, Bq, BI, BS, and a subscriber with subscription Smin[val, >, 2]; publications P[val, 3], P[val, 5], P[val, 9], P[val, 2]; result P[valmin, 3]; X marks indicate which of PFS, ICP, and RPP run at each broker.]
Messages exchanged in Late aggregation: 6
The Late approach aggregates messages at subscriber-edge brokers.

Early aggregation approach
[Figure: publishers, brokers Bp, Bq, BI, BS, and a subscriber with subscription Smin[val, >, 2]; publications P[val, 3], P[val, 5], P[val, 9], P[val, 2]; aggregated messages P[valmin, 3] and P[valmin, 9]; X marks indicate which of PFS, ICP, and RPP run at each broker.]
Messages exchanged in Late aggregation: 6
Messages exchanged in Early aggregation: 3
The Early approach aggregates messages at publisher-edge brokers.

Early does not always outperform Late
[Figure: publications P[val, 3], P[val, 5], P[val, 9], P[val, 2] through brokers Bp, Bq, BI, BS; subscriptions Smin[val, >, 2], Smax[val, >, 2], Scount[val, >, 2]; aggregated messages P[valmin, 3], P[valmin, 9], P[valmax, 5], P[valmax, 9], P[valcount, 1], P[valcount, 2], P[valcount, 3].]
Late aggregation, messages exchanged: 6
Early aggregation, messages exchanged: 9
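The message counts on the Early and Late slides can be reproduced with a small counting sketch. It rests on assumptions read off the figures rather than stated in the text: the overlay is the tree Bp → BI → BS and Bq → BI → BS, publications 3 and 5 enter at Bp while 9 and 2 enter at Bq, only broker-to-broker messages within one notification window are counted, and every subscription has the form val > threshold. All names are hypothetical.

```python
# Assumed overlay: each broker maps to its next hop toward the subscriber edge.
PARENT = {"Bp": "BI", "Bq": "BI", "BI": "BS"}
# Assumed publication values entering at each publisher-edge broker.
PUBS = {"Bp": [3, 5], "Bq": [9, 2]}


def hops_to_root(broker):
    """Number of broker-to-broker links from a broker to the subscriber edge."""
    hops = 0
    while broker in PARENT:
        broker = PARENT[broker]
        hops += 1
    return hops


def late_messages(thresholds):
    """Late: every publication matching some subscription is forwarded once per hop."""
    total = 0
    for broker, values in PUBS.items():
        for v in values:
            if any(v > t for t in thresholds):
                total += hops_to_root(broker)
    return total


def early_messages(thresholds):
    """Early: per subscription, each broker with matching input sends one aggregate."""
    total = 0
    for t in thresholds:
        senders = set()
        for broker, values in PUBS.items():
            if any(v > t for v in values):
                node = broker
                while node in PARENT:
                    senders.add(node)
                    node = PARENT[node]
        total += len(senders)
    return total


one_sub = [2]            # Smin[val, >, 2]
three_subs = [2, 2, 2]   # Smin, Smax, Scount, all with val > 2
counts = {
    "late_one": late_messages(one_sub),
    "early_one": early_messages(one_sub),
    "late_three": late_messages(three_subs),
    "early_three": early_messages(three_subs),
}
```

Under these assumptions the sketch yields 6 versus 3 messages for a single aggregate subscription and 6 versus 9 for three overlapping ones: Late forwards each matching publication once regardless of how many subscriptions share it, while Early sends one aggregate per subscription per hop.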

Comparison between Early and Late
Several parameters affect the performance of our baselines:

Increasing parameter | Favors
Publication matching rate | Early
Number of matching NWRs | Late
Overlap among aggregate subscriptions | Late
Ratio between aggregate and regular subscriptions | Early

Reducing the communication cost requires an adaptive solution.

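Benefits of adaptive aggregation
[Figure: brokers Bp, BI, and BS labelled with superscript A and broker Bq labelled with superscript F; publications P[val, 3], P[val, 5], P[val, 9], P[val, 2]; subscriptions Smin[val, >, 2] and S[val, >, 6]; messages P[valmin, 3], P[valmin, 9], P[val, 9]; message counts shown for Early and Late: 6 and 5.]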

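Benefits of adaptive aggregation
[Figure: brokers Bp and BS labelled with superscript A, brokers BI and Bq unlabelled; publications P[val, 3], P[val, 5], P[val, 9], P[val, 2]; subscriptions Smin[val, >, 2] and S[val, >, 6]; messages P[valmin, 3], P[val, 9]; message counts shown for Early, Late, and Adaptive: 6, 5, and 4.]
Per-broker adaptation reduces communication cost.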

Adaptation process (MAPE-K)
Monitor
• Matching publications within sampling period
• Changes in subscription set
Analyze
• Compare the ratio between publications and NWRs
• Estimate the notification rate
Plan
• Choose the suitable mode
• Transition between aggregate and forward mode
Execute
• Start/stop aggregation at broker
Knowledge (information at a broker)
• Registered subscriptions
• Current execution mode
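A per-broker adaptation loop along these lines might look like the sketch below. The decision rule is an assumption made for illustration: the slide only says that publications are compared against NWRs, so the sketch aggregates when the matching NWRs in a sampling period are fewer than the matching publications. The class and function names are hypothetical.

```python
from dataclasses import dataclass

AGGREGATE = "aggregate"
FORWARD = "forward"


@dataclass
class BrokerKnowledge:
    """Knowledge kept at one broker: current mode plus counters for one sampling period."""
    mode: str = FORWARD
    matching_pubs: int = 0
    matching_nwrs: int = 0


def monitor(knowledge, pubs, nwrs):
    """Monitor: record matching publications and matching NWRs seen in the period."""
    knowledge.matching_pubs += pubs
    knowledge.matching_nwrs += nwrs


def analyze_and_plan(knowledge):
    """Analyze/Plan: assumed rule, aggregate only if it would send fewer messages.

    Forwarding costs about one message per matching publication, aggregation
    about one message per matching NWR (both are simplifying assumptions).
    """
    if knowledge.matching_nwrs < knowledge.matching_pubs:
        return AGGREGATE
    return FORWARD


def execute(knowledge, new_mode):
    """Execute: switch mode, reset the counters, report whether the mode changed."""
    changed = new_mode != knowledge.mode
    knowledge.mode = new_mode
    knowledge.matching_pubs = 0
    knowledge.matching_nwrs = 0
    return changed


broker = BrokerKnowledge()
monitor(broker, pubs=10, nwrs=2)      # many publications, few windows
first_plan = analyze_and_plan(broker)
first_changed = execute(broker, first_plan)

monitor(broker, pubs=1, nwrs=3)       # few publications, several windows
second_plan = analyze_and_plan(broker)
```

Each broker runs the loop on its own counters, so no global view of the topology is needed. The flip side is that a decision based on a short or unrepresentative sampling period can be wrong for that broker.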

Experimental setup
• Implemented in Java over the PADRES framework
• Topology: 16 brokers
  – Combination of publisher-edge only, subscriber-edge only, and mixed brokers
• Real-life datasets:
  – Traffic dataset from the ONE-ITS service (http://one-its-webapp1.transport.utoronto.ca)
  – Yahoo! Finance Stock dataset
• Metrics:
  – Number of messages exchanged
  – Processing overhead
  – End-to-end delay

Results (Stock dataset)
[Figure: two plots, one varying publications per second and one varying the number of subscriptions.]
The adaptation decision becomes more accurate when sufficient information is available.
• Early performs better at high publication rates, whereas Late is better with a large number of subscriptions.
• Adaptive aggregation performs close to the better of Early and Late in all settings.

Results (Traffic dataset)
[Figure: two plots, one varying publications per second and one varying the number of subscriptions.]
Per-broker adaptation causes individual brokers to make incorrect decisions.

Processing overhead (Stock)
[Figure: processing overhead broken down into predicate matching cost and aggregation-related overhead.]
Adaptation overhead dominates the aggregation overhead.

Conclusions
• We provide an aggregation framework for CBPS with baseline solutions.
• We demonstrate that neither baseline is dominant; which one performs better depends on workload parameters.
• We provide a generic adaptive aggregation framework.
• We experimentally demonstrate that our distributed adaptive solution performs close to the best baseline across all settings.

Thank you!
For questions and comments, contact: navneet@ifi.uio.no