ApproxIoT: Approximate Analytics for Edge Computing

ApproxIoT: Approximate Analytics for Edge Computing, by Zhenyu Wen, Do Le Quoc, Pramod Bhatotia, Ruichuan Chen, and Myungjin Lee. Project website: https://ApproxIoT.github.io/ApproxIoT/
Modern online services: a stream aggregator collects streaming data from different sources, and a stream analytics system processes it to extract useful information.
Modern online services: there is a tension between efficient resource utilization and low latency. Approximate computing is the approach proposed to address it.
Approximate computing: for many applications, an approximate output is good enough. For a live taxi heatmap, for example, a portion of the data is enough to produce a useful result.
Approximate computing: the idea is to achieve low latency by computing over a subset of the data items instead of the entire dataset. The data is sampled, the sample is analyzed, and the result is an approximate output ± an error bound.
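
The slides do not say how the error bound is computed. As an illustration only, here is a minimal sketch that assumes plain simple random sampling and a normal-approximation (central limit theorem) 95% bound on an estimated sum; `srs_estimate_sum` is a hypothetical helper, not part of ApproxIoT.

```python
import math
import random

def srs_estimate_sum(population, fraction, seed=0):
    """Hypothetical helper: estimate sum(population) from a simple random sample.

    Returns (estimate, bound), where bound is an approximate 95% error bound
    from the normal approximation. This is an assumed method for illustration,
    not necessarily how ApproxIoT computes its bound.
    """
    rng = random.Random(seed)
    total_n = len(population)
    n = min(total_n, max(2, int(total_n * fraction)))  # sample size
    sample = rng.sample(population, n)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    estimate = total_n * mean  # scale the sample mean up to the whole dataset
    # Standard error of the estimated total, with finite-population correction.
    fpc = (total_n - n) / total_n
    std_err = total_n * math.sqrt(variance / n * fpc)
    return estimate, 1.96 * std_err

readings = list(range(10_000))
est, bound = srs_estimate_sum(readings, 0.1)
print(f"estimate {est:.0f} ± {bound:.0f}, exact {sum(readings)}")
```

Sampling a smaller fraction makes the computation cheaper but widens the bound, which is the latency/accuracy trade-off the slides describe.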
State-of-the-art system: StreamApprox [Middleware '17]. Sub-streams S1, S2, …, Sn are merged by a stream aggregator, and the whole data stream is shipped to StreamApprox in the cloud datacenter, which returns an approximate output ± error bound. Limitations: • it wastes bandwidth • it uses only cloud datacenter resources
Edge computing: allows data to be processed at an edge node before it is sent to the cloud. Data flows from its source to an edge node for local processing, then through a gateway to the cloud. Opportunities: • more computing resources • bandwidth savings
Edge infrastructure: Azure IoT Edge, Watson IoT, AWS IoT. Source: https://peering.google.com/#/infrastructure
Problem statement: build a stream analytics system that • utilizes both cloud and edge computing resources • leverages approximate computing. Design goals: • Efficiency: efficient utilization of computing resources • Adaptability: adaptive execution based on the available resources • Transparency: no code changes and no manual resource management required
Outline • Motivation • Design • Implementation • Evaluation
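ApproxIoT overview: data sources S1 … Si … Sm … Sn feed edge nodes, which forward data through regional edge nodes and continental nodes to a central node in the cloud. The query runs at the central node, which returns an approximate output ± error bound. ApproxIoT employs sampling across this distributed edge + cloud environment.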
Naïve algorithm: simple random sampling (SRS). SRS samples sub-streams unfairly, so a small sub-stream can be overlooked entirely; the query's approximate output ± error bound then has low accuracy.
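
To make "overlooked" concrete, this sketch runs SRS over a merged stream in which one sub-stream is much smaller than the other and counts how often the small one gets no sampled items at all. The stream sizes and the helpers `srs_counts` and `miss_rate` are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

def srs_counts(items, sample_size, rng):
    """Simple random sample over the merged stream; count sampled items per
    sub-stream. Items are (substream_id, value) pairs."""
    sample = rng.sample(items, sample_size)
    return Counter(sid for sid, _ in sample)

def miss_rate(items, sample_size, substream, trials=1000, seed=0):
    """Fraction of trials in which SRS picks no item at all from `substream`."""
    rng = random.Random(seed)
    misses = sum(
        1 for _ in range(trials)
        if srs_counts(items, sample_size, rng)[substream] == 0
    )
    return misses / trials

# Hypothetical stream: S1 is large, S2 is small but carries large values.
items = [("S1", 1.0)] * 990 + [("S2", 100.0)] * 10
print(miss_rate(items, sample_size=10, substream="S2"))
```

The analytic probability that a 10-item SRS sample misses S2 here is about 0.90, so the printed rate should land near that value: most of the time, S2's contribution to any sum is simply lost.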
Background: stratified sampling. Advantage: the sub-streams are sampled fairly. Disadvantage: it requires knowing the size of each sub-stream.
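
A minimal sketch of proportional stratified sampling, assuming each sub-stream is available as a complete list. The allocation needs `len(items)` for every sub-stream before sampling starts, which is the disadvantage the slide names. The "at least one slot per stratum" rule and the name `stratified_sample` are illustrative choices, not taken from the slides.

```python
import random

def stratified_sample(substreams, sample_size, seed=0):
    """Hypothetical helper: proportional stratified sampling.

    `substreams` maps each sub-stream id to the complete list of its items.
    Allocating slots needs len(items) for every sub-stream up front, which an
    unbounded stream cannot provide.
    """
    rng = random.Random(seed)
    total = sum(len(items) for items in substreams.values())
    sample = {}
    for sid, items in substreams.items():
        if not items:
            sample[sid] = []
            continue
        # Slots proportional to sub-stream size, but at least one per stratum
        # so that small sub-streams are never overlooked.
        k = max(1, round(sample_size * len(items) / total))
        sample[sid] = rng.sample(items, min(k, len(items)))
    return sample

strata = {"S1": list(range(990)), "S2": list(range(10))}
print({sid: len(s) for sid, s in stratified_sample(strata, 10).items()})
# → {'S1': 10, 'S2': 1}
```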
Background: reservoir sampling. When the 5th and then the 6th item arrive, reservoir sampling decides whether each one replaces an item already in the reservoir. Advantage: • no prior knowledge of sub-stream sizes is required. Disadvantages: • the sub-streams are sampled unfairly • it is difficult to run on multiple nodes
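
The classic single-stream version of this technique is Algorithm R, sketched below. It needs no stream length in advance, but it treats the merged stream as one population, so it inherits the unfairness across sub-streams. The function name is hypothetical.

```python
import random

def reservoir_sample(stream, n, seed=0):
    """Algorithm R: a uniform random sample of n items from a stream whose
    length is not known in advance, using O(n) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= n:
            reservoir.append(item)  # the first n items fill the reservoir
        else:
            j = rng.randrange(i)     # uniform in [0, i)
            if j < n:                # happens with probability n / i
                reservoir[j] = item  # evict a random resident
    return reservoir

print(reservoir_sample(range(1, 7), 4))
```

With a reservoir of 4, as on the slide, the 5th item enters with probability 4/5 and the 6th with probability 4/6.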
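ApproxIoT sampling algorithm: weighted hierarchical sampling (WHS), which combines stratified and reservoir sampling. Every item starts with weight 1, and each sub-stream gets its own reservoir of size N. If a sub-stream has C items, its sampled items get weight C/N when C > N and keep weight 1 when C ≤ N. In the example, N = 4: a sub-stream with C = 6 items gets W = 6/4, while a smaller sub-stream keeps W = 1. WHS is easy to parallelize and requires no synchronization between sub-streams.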
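
The weighting rule above can be sketched for a single node as follows. This is a minimal illustration of the rule as the slide states it, not ApproxIoT's code; the name `whs_node`, the input layout, and the use of `Fraction` for exact weights are assumptions.

```python
import random
from fractions import Fraction

def whs_node(substreams, n, seed=0):
    """Hypothetical sketch of weighted hierarchical sampling at one node.

    `substreams` maps a sub-stream id to an iterable of raw items, each with
    initial weight 1. Every sub-stream gets its own reservoir of size n, so
    sub-streams never need to coordinate. Returns {id: [(item, weight), ...]}
    with weight C/n if the sub-stream had C > n items, otherwise 1.
    """
    rng = random.Random(seed)
    result = {}
    for sid, stream in substreams.items():
        reservoir, count = [], 0
        for item in stream:          # reservoir sampling per sub-stream
            count += 1
            if count <= n:
                reservoir.append(item)
            else:
                j = rng.randrange(count)
                if j < n:
                    reservoir[j] = item
        weight = Fraction(count, n) if count > n else Fraction(1)
        result[sid] = [(item, weight) for item in reservoir]
    return result

# The slide's example: reservoir size N = 4, one sub-stream with C = 6 items.
sample = whs_node({"S1": range(6), "S2": range(3)}, n=4)
print({sid: str(pairs[0][1]) for sid, pairs in sample.items()})
# → {'S1': '3/2', 'S2': '1'}
```

Because each sub-stream has its own reservoir and its own count, sub-streams can be handled by independent workers, which matches the "no synchronization" claim.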
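WHS on edge nodes: data flows from edge nodes through regional edge and continental nodes to the central node in the cloud, and WHS runs at every level (here with reservoir size 2). At the first level, items start with W = 1: a sub-stream with 6 items gets W = 6/2 = 3, one with 4 items gets W = 4/2 = 2, and one with at most 2 items keeps W = 1. At the next level, each item's carried weight is multiplied by the new ratio: carried W = 4 with 5 items gives a current weight of W = 4 × 5/2 = 10, carried W = 1 with 3 items gives W = 1 × 3/2 = 3/2, and carried W = 3 in a sub-stream that fits in the reservoir stays W = 3. WHS remains easy to parallelize and requires no synchronization between computing nodes.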
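
The weight update at each level reduces to one line of arithmetic: the current weight is the carried weight times C/N when the sub-stream is down-sampled. The hypothetical helper below reproduces the slide's numbers; the two-level chain at the end uses made-up counts.

```python
from fractions import Fraction

def current_weight(carried, count, n):
    """Hypothetical helper: weight after one WHS step. The carried weight is
    multiplied by C/N when the sub-stream had C > N items at this node and
    is left unchanged otherwise."""
    carried = Fraction(carried)
    return carried * Fraction(count, n) if count > n else carried

# Numbers from the hierarchy slide (reservoir size N = 2).
print(current_weight(4, 5, 2))  # → 10
print(current_weight(1, 3, 2))  # → 3/2
print(current_weight(3, 2, 2))  # → 3

# Hypothetical chain: 6 items at the first level, then 5 at the next.
print(current_weight(current_weight(1, 6, 2), 5, 2))  # → 15/2
```

Weights multiply along the path, so a node only needs the weight it received and its own local count; no global information is required.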
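ApproxIoT in the cloud: the weights are carried all the way to the central node, where WHS runs once more (here with reservoir size 1). A sub-stream with carried weight W = 4/3 and 6 items gets W = 4/3 × 6/1 = 8; one with W = 1 and 4 items gets W = 1 × 4/1 = 4; one with W = 1 and 2 items gets W = 1 × 2/1 = 2. A sum query then returns the approximate output 8 × (first sampled value) + 4 × (second) + 2 × (third) ± error bound.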
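
Once every sampled item carries its final weight, a sum query is a weighted sum, and a count query is the sum of the weights. The sketch uses the slide's final weights 8, 4, and 2; the item values 10, 20, and 30 are hypothetical, as are the helper names.

```python
from fractions import Fraction

def approximate_count(weighted_sample):
    """Each sampled item stands in for `weight` original items."""
    return sum(Fraction(w) for _, w in weighted_sample)

def approximate_sum(weighted_sample):
    """Answer a sum query as the weighted sum of the sampled values."""
    return sum(Fraction(w) * value for value, w in weighted_sample)

# Final weights from the cloud slide; the values 10, 20, 30 are hypothetical.
sample = [(10, 8), (20, 4), (30, 2)]
print(approximate_count(sample))  # → 14
print(approximate_sum(sample))    # → 220
```

The count of 14 matches the number of original items the three survivors represent (6 × 4/3 + 4 + 2), which is a quick sanity check that the weights were propagated correctly.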
Implementation: data streams from sources S1, S2, …, Sn are sampled at the edge nodes. The sampled data streams are exchanged through a Kafka cluster (stream pub/sub) and processed in the cloud datacenter with Kafka Streams. See the paper for more details.
Experimental setup. Evaluation questions: • accuracy vs. sample size • throughput vs. sample size. Testbed: 25 nodes • 15 nodes for the ApproxIoT deployment • 10 nodes for the Kafka cluster. Datasets: • synthetic: Poisson and Gaussian distributions • real: Brasov pollution and New York taxi rides. See the paper for more results!
Accuracy vs. sample size: [Figure: accuracy loss (%) vs. sampling fraction (%) for SRS and ApproxIoT; lower is better.] ApproxIoT's average accuracy loss is 0.035%, giving roughly 2600× higher accuracy than SRS.
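
The slides do not define accuracy loss. A common choice, assumed here rather than taken from the slides, is the relative error of the approximate result $\hat{y}$ against the exact result $y$ computed over all of the data:

$$
\text{accuracy loss} = \frac{\lvert \hat{y} - y \rvert}{\lvert y \rvert} \times 100\%
$$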
Throughput vs. sample size: [Figure: throughput (k items/s) vs. sampling fraction (%) for native execution, SRS, and ApproxIoT; higher is better.] • ApproxIoT has low overhead compared to native execution • ApproxIoT has throughput similar to SRS
Conclusion: ApproxIoT provides approximate analytics for edge computing. • Efficiency: efficient utilization of computing and bandwidth resources • Adaptability: adaptive execution based on the available resources • Transparency: no code changes and no manual resource management required. Thank you! More details on the project website: https://ApproxIoT.github.io/ApproxIoT/