ApproxIoT: Approximate Analytics for Edge Computing
https://ApproxIoT.github.io/ApproxIoT/
Zhenyu Wen, Do Le Quoc, Pramod Bhatotia, Ruichuan Chen, Myungjin Lee

Modern online services
• Processing streaming data from different sources
• Stream aggregator → stream analytics system → useful information

Modern online services
• Tension: efficient resource utilization vs. low latency
• Approximate computing

Approximate computing
• Many applications: approximate output is good enough!
• Example: live taxi heatmap, where the proportion of data is what is useful for the application

Approximate computing
• Idea: to achieve low latency, compute over a subset of the data items instead of the entire dataset
• Approximate computing (sampling) → analyze → approximate output ± error bound

State-of-the-art system: StreamApprox [Middleware '17]
• S1, S2, …, Sn → stream aggregator → data stream → StreamApprox (cloud datacenter) → approximate output ± error bound
• Limitations:
  • It wastes bandwidth
  • It utilizes only cloud datacenter resources

Edge computing
• Allows data to be processed at the edge node before it is sent to the cloud
• Source of data → edge node (local processing) → gateway → cloud
• Opportunities:
  • Providing more computing resources
  • Saving bandwidth

Edge infrastructure
• Azure IoT Edge, Watson IoT, AWS IoT
• Source: https://peering.google.com/#/infrastructure

Problem statement
To build a stream analytics system
• By utilizing the cloud and edge computing resources
• By leveraging approximate computing
Design goals
• Efficiency: efficient utilization of computing resources
• Adaptability: adaptive execution based on the available resources
• Transparency: no code changes or resource management required

Outline • Motivation • Design • Implementation • Evaluation

ApproxIoT: Overview
• Diagram: sub-streams S1, …, Si, …, Sm, …, Sn; edge nodes; regional edge; continental node; central node (cloud)
• Query → approximate output ± error bound
• ApproxIoT employs sampling in the distributed environment of edge + cloud

Naïve algorithm
• Simple random sampling (SRS)
• Under SRS, some sub-streams are overlooked and others are sampled unfairly
• Query → approximate output ± error bound, with low accuracy

Background: Stratified sampling
• Advantage: the sub-streams are sampled fairly
• Disadvantage: requires knowledge of each sub-stream's size
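The trade-off on this slide can be illustrated with a minimal sketch of stratified sampling with proportional allocation. The function name `stratified_sample`, the `fraction` parameter, and the "at least one item per sub-stream" rule are assumptions made for illustration; they do not come from the slides. Note that the code has to call `len(items)` on every sub-stream, which is exactly the size knowledge an unbounded stream cannot provide up front.

```python
import random

def stratified_sample(substreams, fraction, rng=None):
    """Sample the same fraction from every sub-stream (stratum).

    substreams: dict mapping a sub-stream name to a list of items.
    fraction:   share of each sub-stream to keep (hypothetical parameter).
    """
    rng = rng or random.Random()
    sample = {}
    for name, items in substreams.items():
        # The size of each sub-stream must be known before sampling starts.
        size = len(items)
        # Assumption: keep at least one item from every non-empty sub-stream.
        k = max(1, round(size * fraction)) if size else 0
        sample[name] = rng.sample(items, k)
    return sample

substreams = {"busy": list(range(100)), "quiet": list(range(4))}
picked = stratified_sample(substreams, 0.1, random.Random(1))
```

With these inputs the busy sub-stream contributes 10 items and the quiet one contributes 1, so neither is overlooked.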

Background: Reservoir sampling
• Diagram: the 5th item and the 6th item arriving at the reservoir
• Advantage:
  • No prior knowledge of the sub-stream size required
• Disadvantages:
  • The sub-streams are sampled unfairly
  • Difficult to run on multiple nodes
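For reference, here is a minimal sketch of classic reservoir sampling (often called Algorithm R). The slides do not say which variant is used, so treat this as the textbook version; the name `reservoir_sample` and its parameters are chosen here for illustration.

```python
import random

def reservoir_sample(stream, n, rng=None):
    """Keep a uniform random sample of up to n items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for count, item in enumerate(stream, start=1):
        if count <= n:
            # The first n items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item number `count` replaces a random slot with probability n / count.
            j = rng.randrange(count)
            if j < n:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(100), 5, random.Random(0))
```

The stream is consumed once and its length is never needed, which is the advantage listed on the slide. Applied to a merged stream, however, a rarely active sub-stream can easily end up with no items in the reservoir.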

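ApproxIoT sampling algorithm
• Weighted hierarchical sampling (WHS)
• Combining stratified and reservoir sampling
• Every item starts with initial weight W=1
• Weight after sampling, for a sub-stream with C items and reservoir size N: W = C/N if C > N; W = 1 if C <= N
• Example with reservoir size N=4: a sub-stream with C=6 gets W=6/4; a sub-stream with C <= N keeps W=1
• Easy to parallelize, requires no synchronization between sub-streams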
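The weight rule on this slide can be sketched as follows: one reservoir per sub-stream, plus a weight of C/N when the sub-stream overflows its reservoir. This is an illustration under assumptions, not the project's implementation; `whs_weight`, `whs_sample`, and the dictionary-based input are hypothetical names and shapes.

```python
import random

def whs_weight(count, reservoir_size):
    """Weight from the slide: C/N if C > N, otherwise 1."""
    return count / reservoir_size if count > reservoir_size else 1

def whs_sample(substreams, reservoir_size, rng=None):
    """Sample each sub-stream into its own reservoir and attach a weight.

    substreams: dict mapping a sub-stream name to an iterable of items.
    Returns a dict mapping each name to (sampled_items, weight).
    """
    rng = rng or random.Random()
    result = {}
    for name, stream in substreams.items():
        reservoir, count = [], 0
        for item in stream:
            count += 1
            if len(reservoir) < reservoir_size:
                reservoir.append(item)
            else:
                j = rng.randrange(count)
                if j < reservoir_size:
                    reservoir[j] = item
        # Each sub-stream is handled independently of the others.
        result[name] = (reservoir, whs_weight(count, reservoir_size))
    return result

out = whs_sample({"a": range(6), "b": range(3)}, 4, random.Random(0))
```

Sub-stream `a` has C=6 and N=4, so it keeps 4 items with weight 6/4; sub-stream `b` fits in its reservoir and keeps weight 1. Because no sub-stream reads another's state, the loop body could run in parallel.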

WHS on edge nodes
• Diagram: edge nodes, regional edge, continental node, central node (cloud)
• Reservoir size equals 2
• Initial weight: W=1
• WHS: W=6/2=3, W=4/2=2, W=1
• Carried weight: W=4, W=1, W=3
• Current weight: W=4*5/2=10, W=1*3/2=3/2, W=3
• Easy to parallelize, requires no synchronization between computing nodes
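The weights on this slide read as "carried weight × C/N", applied again at each node. The sketch below assumes that reading: for example, W=4*5/2=10 is taken to mean a carried weight of 4, 5 arriving items, and a reservoir of size 2. The function `node_step` is a hypothetical name, and it buffers the items and uses `random.sample` rather than a streaming reservoir, purely to keep the example short.

```python
import random

def node_step(values, carried_weight, reservoir_size, rng=None):
    """One WHS step for a single sub-stream at one node.

    values:         items of this sub-stream that reached the node.
    carried_weight: weight attached by the previous node (1 at the sources).
    Returns (sampled_values, current_weight).
    """
    rng = rng or random.Random()
    count = len(values)
    if count > reservoir_size:
        # Down-sample and scale the carried weight by C/N.
        sample = rng.sample(values, reservoir_size)
        weight = carried_weight * count / reservoir_size
    else:
        # Everything fits, so the carried weight is passed on unchanged.
        sample = list(values)
        weight = carried_weight
    return sample, weight

sample, weight = node_step([3, 1, 4, 1, 5], carried_weight=4, reservoir_size=2)
```

Here `weight` is 10, matching the slide. The product `weight * len(sample)` equals `carried_weight * count` whenever down-sampling happens, so each node preserves the estimated number of original items without contacting any other node.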

ApproxIoT in the cloud
• Diagram: edge nodes, regional edge, continental node, central node (cloud)
• The weights are carried; reservoir size equals 1
• Carried weights: W=4/3, W=1
• WHS at the central node: W=4/3*6/1=8, W=1*4/1=4, W=1*2/1=2
• Query (sum): approximate output = 8*[item] + 4*[item] + 2*[item] ± error bound
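The sum query on this slide amounts to multiplying every sampled item by its weight and adding the results. The sketch below reproduces the three weights from the slide with exact fractions; the item values (10, 5, 1) are made up for illustration because the slide shows the items only as icons, and the function names are hypothetical. The error bound is not computed here.

```python
from fractions import Fraction

def current_weight(carried_weight, count, reservoir_size):
    """Carried weight scaled by C/N when the sub-stream overflows the reservoir."""
    if count > reservoir_size:
        return carried_weight * Fraction(count, reservoir_size)
    return carried_weight

def approximate_sum(weighted_items):
    """Estimate the sum of the full stream from (value, weight) pairs."""
    return sum(weight * value for value, weight in weighted_items)

# Weights from the slide, with reservoir size 1 at the central node.
weights = [
    current_weight(Fraction(4, 3), 6, 1),  # 4/3 * 6/1 = 8
    current_weight(Fraction(1), 4, 1),     # 1 * 4/1 = 4
    current_weight(Fraction(1), 2, 1),     # 1 * 2/1 = 2
]
values = [10, 5, 1]  # hypothetical sampled item values
estimate = approximate_sum(zip(values, weights))
```

Each weight states how many original items a sampled item stands for, so the weighted sum scales the sample back up to the size of the full stream.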

Outline • Motivation • Design • Implementation • Evaluation

Implementation
• Diagram: S1, S2, …, Sn → data stream → edge nodes → sampled data stream → cloud datacenter
• Components: Kafka cluster (stream pub/sub), Kafka Streams
• See the paper for more details

Experimental setup
• Evaluation questions
  • Accuracy vs. sample size
  • Throughput vs. sample size
• Testbed: 25 nodes
  • 15 nodes for ApproxIoT deployment
  • 10 nodes for Kafka cluster
• Datasets
  • Synthetic: Poisson and Gaussian distributions
  • Real: Brasov pollution and New York taxi ride
See the paper for more results!

Accuracy vs. sample size
• [Plot: accuracy loss (%) vs. sampling fraction (%), SRS vs. ApproxIoT; lower is better]
• Plot annotation: the average is 0.035%
• ApproxIoT: ~2600X higher accuracy over SRS

Throughput vs. sample size
• [Plot: throughput (k items/s) vs. sampling fraction (%), native execution vs. SRS vs. ApproxIoT; higher is better]
• ApproxIoT has low overhead compared to the native execution
• ApproxIoT has similar throughput to SRS

Conclusion
ApproxIoT: Approximate analytics for edge computing
• Efficiency: efficient computing and bandwidth resource utilization
• Adaptability: adaptive execution based on the available resources
• Transparency: requires no code changes or resource management
Thank you!
More details on the project website: https://ApproxIoT.github.io/ApproxIoT/