From Rivulets to Rivers: Elastic Stream Processing in Heron
Bill Graham, Twitter (@billgraham)
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft
“Prediction is very difficult, especially if it’s about the future.” - Niels Bohr
“We cannot direct the wind, but we can adjust the sails.” - Dolly Parton
Outline
§ Heron Overview
§ Elastic Scaling Challenges
§ Current Implementation
§ Work in Progress: Auto-scaling
Heron
A realtime, distributed, fault-tolerant stream processing engine.
About Heron
§ Developed by Twitter in 2014
§ Open-sourced in May 2016
§ Storm API compatible
§ Isolation at all levels:
  - Topology
  - Container
  - Task (process-based)
§ At-least-once and at-most-once semantics
§ Backpressure
§ Low resource overhead (< 10%)
Logical Topology
[Diagram: a logical topology with Spout 1, Spout 2, and Bolts 1 to 5]
Physical Execution
[Diagram: the same spouts and bolts running as parallel physical instances]
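The step from logical topology to physical execution can be pictured as expanding each component into as many instances as its parallelism. The sketch below is illustrative only (Java 16+): the `Instance` record, the component names, and the sequential task-id scheme are assumptions, not Heron's internal data model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopologyExpansion {
    // One physical instance of a logical component (spout or bolt). Hypothetical model.
    record Instance(String component, int taskId, int componentIndex) {}

    // Expands each logical component into `parallelism` physical instances,
    // assigning task ids sequentially in the map's iteration order.
    static List<Instance> expand(Map<String, Integer> parallelism) {
        List<Instance> instances = new ArrayList<>();
        int nextTaskId = 1;
        for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                instances.add(new Instance(e.getKey(), nextTaskId++, i));
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so task ids are predictable.
        Map<String, Integer> logical = new LinkedHashMap<>();
        logical.put("spout1", 2);
        logical.put("bolt1", 3);
        logical.put("bolt2", 1);
        expand(logical).forEach(System.out::println);
    }
}
```

Here three logical components become six physical instances; changing a component's parallelism changes only how many instances it expands into.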
Packing Plan
§ How to distribute instances onto containers?
§ IPacking.pack()
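To make the packing question concrete, here is a minimal round-robin packer. It is a sketch under stated assumptions: the `Packer` interface and its `pack(instanceIds, numContainers)` signature are hypothetical simplifications and do not reproduce the real `IPacking.pack()` signature, and per-instance resource requirements (CPU, RAM, disk) are ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class RoundRobinPackingSketch {
    // Hypothetical, simplified stand-in for a packing algorithm:
    // returns one list of instance ids per container.
    interface Packer {
        List<List<String>> pack(List<String> instanceIds, int numContainers);
    }

    // Deals instances onto containers in turn, so container sizes differ by at most one.
    static final Packer ROUND_ROBIN = (instanceIds, numContainers) -> {
        if (numContainers <= 0) {
            throw new IllegalArgumentException("numContainers must be positive");
        }
        List<List<String>> containers = new ArrayList<>();
        for (int c = 0; c < numContainers; c++) {
            containers.add(new ArrayList<>());
        }
        for (int i = 0; i < instanceIds.size(); i++) {
            containers.get(i % numContainers).add(instanceIds.get(i));
        }
        return containers;
    };

    public static void main(String[] args) {
        List<String> ids = List.of("S1", "S2", "B1", "B2", "B3", "B4", "B5");
        List<List<String>> plan = ROUND_ROBIN.pack(ids, 3);
        for (int c = 0; c < plan.size(); c++) {
            // Numbered from 1 because container 0 is reserved for the Topology Master.
            System.out.println("container " + (c + 1) + ": " + plan.get(c));
        }
    }
}
```

Round-robin is only one possible policy. A resource-aware packer would instead fill containers up to a capacity limit, trading even spread for fewer containers.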
Topology Submission
§ heron submit → Heron Client → Packing Plan → Heron Scheduler
§ Containers allocated → Processes initialize → Instances register → Stream Manager registers → Data flows
[Diagram: Container 0 runs the Topology Master; Containers 1 to 3 each run a Stream Manager alongside spout (S) and bolt (B) instances]
Data Rate Variations
Parallelism Challenges
§ Anticipating component parallelism is difficult
§ Changing parallelism is costly: O(hour)
  - Code change, review, merge, build, kill, submit
§ Tuning for load spikes or valleys is manual: O(day)
§ Under-provisioning leads to backpressure, which leads to support costs
§ Over-provisioning is the norm
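A back-of-the-envelope calculation shows why a fixed parallelism is hard to get right. This sketch assumes each instance sustains a known, constant rate and that throughput scales linearly with parallelism; both are simplifications, and the rates in `main` are hypothetical numbers, not measurements.

```java
final class ParallelismEstimate {
    // Instances needed to keep up with `tuplesPerSec`, assuming each instance sustains
    // `perInstanceRate` tuples/s and throughput scales linearly (simplifying assumption).
    static int requiredParallelism(double tuplesPerSec, double perInstanceRate) {
        if (perInstanceRate <= 0) {
            throw new IllegalArgumentException("perInstanceRate must be positive");
        }
        return Math.max(1, (int) Math.ceil(tuplesPerSec / perInstanceRate));
    }

    // Fraction of provisioned capacity actually used at a given load.
    static double utilization(double tuplesPerSec, double perInstanceRate, int parallelism) {
        return tuplesPerSec / (perInstanceRate * parallelism);
    }

    public static void main(String[] args) {
        double perInstance = 10_000;  // hypothetical tuples/s per bolt instance
        double peak = 95_000;         // hypothetical daily peak load
        double valley = 20_000;       // hypothetical daily trough
        int provisioned = requiredParallelism(peak, perInstance);
        System.out.println("provisioned for peak: " + provisioned);
        System.out.printf("utilization in valley: %.0f%%%n",
                100 * utilization(valley, perInstance, provisioned));
        System.out.println("needed in valley: " + requiredParallelism(valley, perInstance));
    }
}
```

Provisioning for the peak (10 instances) leaves the bolt at about 20% utilization in the valley, where 2 instances would suffice. Provisioning for anything less than the peak risks backpressure during spikes.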
Over-provisioning
[Chart: CPU requested vs. CPU used; values shown: 40% and 25%]
Elastic Scaling Opportunity
§ Reduce administration cost
§ Reduce support cost
§ Reduce hardware cost
§ Provide better SLAs
Ordinary Topology Management Process
[Diagram: user tasks and Heron system tasks: Monitor / Estimate, Kill Topology, Release Resources, Submit Topology, Create Packing, Acquire Resources, Install Topology, Start Topology, Build State; the time-consuming tasks are highlighted]
Low-cost Topology “update”
[Diagram: component parallelism values changing during a topology update]
Optimized Topology Scale-up Process
[Diagram: user tasks and Heron system tasks: Monitor / Estimate, Update Topology, Create Packing, Pause Topology, Add / Reduce Resources, Prepare Components, Un-Pause Topology, shown alongside the ordinary tasks (Kill Topology, Submit Topology, Acquire Resources, Install Topology, Start Topology, Build State)]
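The optimized scale-up path can be sketched as a fixed sequence of scheduler hooks that replaces kill-and-resubmit. Everything named here is hypothetical: the `Cluster` interface and its method names only mirror the task labels on the slide and are not Heron's API. The ordering (computing the packing before pausing) and the try/finally that always un-pauses are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class UpdateSequenceSketch {
    // Hypothetical hooks a scheduler might expose; names are illustrative, not Heron's API.
    interface Cluster {
        void createPacking();
        void pause();
        void addOrReduceResources();
        void prepareComponents();
        void unpause();
    }

    // Optimized update path: compute the new packing while the topology still runs,
    // then pause only for the resize and component preparation.
    static void update(Cluster cluster) {
        cluster.createPacking();
        cluster.pause();
        try {
            cluster.addOrReduceResources();
            cluster.prepareComponents();
        } finally {
            // Resume even if resizing fails, so the topology is never left paused.
            cluster.unpause();
        }
    }

    // Test double that records the order in which the steps run.
    static final class RecordingCluster implements Cluster {
        final List<String> log = new ArrayList<>();
        public void createPacking() { log.add("create packing"); }
        public void pause() { log.add("pause"); }
        public void addOrReduceResources() { log.add("add/reduce resources"); }
        public void prepareComponents() { log.add("prepare components"); }
        public void unpause() { log.add("un-pause"); }
    }

    public static void main(String[] args) {
        RecordingCluster cluster = new RecordingCluster();
        update(cluster);
        System.out.println(cluster.log);
    }
}
```

The point of the design is that no step releases and re-acquires every container. Only the paused window interrupts processing.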
heron “update” …
$ heron update my_cluster/user/dev MyTopology --component-parallelism=bolt1:20 --component-parallelism=bolt2:40
§ Available in 0.14.5
§ Aims to maintain uniform component distribution
§ Execution time: O(mins)
§ Aggressively prunes containers
§ Minimizes disruption
§ Customizable through IRepacking.repack()
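The repacking goals listed above (keep other instances in place, spread the scaled component evenly, prune empty containers) can be illustrated with a small greedy algorithm. This is a hypothetical sketch, not the algorithm behind `IRepacking.repack()`. The plan is modeled as a list of containers holding component names, and `maxPerContainer` is an assumed stand-in for real resource limits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RepackSketch {
    // Changes one component's instance count in an existing plan while leaving
    // every other instance where it is. Returns a new plan; the input is not modified.
    static List<List<String>> repack(List<List<String>> plan, String component,
                                     int newParallelism, int maxPerContainer) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> c : plan) {
            result.add(new ArrayList<>(c));
        }
        int current = (int) result.stream().flatMap(List::stream)
                .filter(component::equals).count();
        // Scale down: remove from the container holding the most copies of the component.
        while (current > newParallelism) {
            List<String> target = result.stream()
                    .max(Comparator.comparingLong(c -> c.stream().filter(component::equals).count()))
                    .orElseThrow();
            target.remove(component);
            current--;
        }
        // Scale up: add to the least-loaded container with room; open a new one if none has room.
        while (current < newParallelism) {
            List<String> target = result.stream()
                    .filter(c -> c.size() < maxPerContainer)
                    .min(Comparator.comparingInt(List::size))
                    .orElse(null);
            if (target == null) {
                target = new ArrayList<>();
                result.add(target);
            }
            target.add(component);
            current++;
        }
        // Prune containers that are now empty.
        result.removeIf(List::isEmpty);
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> plan = List.of(List.of("spout", "bolt1"), List.of("bolt1", "bolt2"));
        System.out.println(repack(plan, "bolt1", 4, 3));
    }
}
```

Because untouched instances never move, only the containers that gain or lose instances are disrupted. That matches the "minimizes disruption" goal, though a production repacker would also weigh CPU, RAM, and disk.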
Current Limitations
§ Automated state transition is not yet supported
  - Component scaling event notification: IUpdatable.update()
  - Example: KafkaSpout queue partition mappings
§ Fields grouping routing might change
  - Workaround: pause the topology for longer than the cache flush interval before scaling
§ Algorithmic auto-scaling
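Both limitations come from the same arithmetic: anything assigned by "index modulo parallelism" moves when the parallelism changes. The sketch below assumes hash-modulo routing for fields grouping, which is the usual approach but is not a claim about Heron's exact hash function. It also assumes a hypothetical modulo partition assignment for a Kafka-style spout, the kind of mapping that would need recomputing on an `IUpdatable.update()` notification.

```java
import java.util.ArrayList;
import java.util.List;

final class RescaleRouting {
    // Fields grouping (assumed hash-modulo): the same key always reaches the same
    // instance, but only for a fixed parallelism.
    static int fieldsGroupingTarget(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Hypothetical partition assignment: instance i of n owns partitions p with p % n == i.
    static List<Integer> partitionsFor(int instanceIndex, int parallelism, int numPartitions) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<Integer> owned = new ArrayList<>();
        for (int p = instanceIndex; p < numPartitions; p += parallelism) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"a", "b", "c", "d"}) {
            // Target instance before (2 instances) and after (3 instances) scaling.
            System.out.printf("%s: %d -> %d%n", key,
                    fieldsGroupingTarget(key, 2), fieldsGroupingTarget(key, 3));
        }
        System.out.println(partitionsFor(0, 2, 6) + " -> " + partitionsFor(0, 3, 6));
    }
}
```

For example, key `"b"` routes to instance 0 with two instances but to instance 2 with three. Any per-key state cached on instance 0 is stranded, which is why the workaround drains caches before scaling.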
Algorithmic Auto-Scaling …
[Diagram: the optimized scale-up tasks (Monitor / Estimate, Update Topology, Create Packing, Pause Topology, Add / Reduce Resources, Prepare Components, Un-Pause Topology) performed by the Heron system rather than by the user]
Auto-Scaling
Heron should automatically identify variations in the incoming load and react to them. Heron uses Dhalion to adjust to external shocks. Dhalion is a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future. Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down as needed, while keeping the topology in a steady state where backpressure is not observed.
Using Dhalion to Auto-Scale
§ Symptom detection (Metrics → Symptoms): Pending Packets Detector, Backpressure Detector, Processing Rate Skew Detector
§ Diagnosis generation (Symptoms → Diagnosis): Resource Overprovisioning Diagnoser, Resource Underprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser
§ Resolver invocation (Diagnosis → Resolution): Bolt Scale Down Resolver, Bolt Scale Up Resolver, Data Skew Resolver, Restart Instances Resolver
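The detect → diagnose → resolve pipeline can be sketched for a single bolt as follows (Java 16+). This is a simplified illustration, not Dhalion's API. The `BoltMetrics` record, the enums, the pending-packets threshold, and the one-instance step size are all assumptions. The data-skew and slow-instance paths are omitted.

```java
import java.util.Optional;

final class AutoScalePolicySketch {
    // Hypothetical per-bolt metrics snapshot for one observation interval.
    record BoltMetrics(String bolt, int parallelism, boolean backpressure, long pendingPackets) {}

    enum Symptom { BACKPRESSURE, LOW_PENDING_PACKETS }
    enum Diagnosis { UNDERPROVISIONED, OVERPROVISIONED, HEALTHY }

    // Symptom detection: turn raw metrics into at most one symptom.
    static Optional<Symptom> detect(BoltMetrics m, long lowQueueThreshold) {
        if (m.backpressure()) {
            return Optional.of(Symptom.BACKPRESSURE);
        }
        if (m.pendingPackets() <= lowQueueThreshold) {
            return Optional.of(Symptom.LOW_PENDING_PACKETS);
        }
        return Optional.empty();
    }

    // Diagnosis generation: explain the symptom in terms of provisioning.
    static Diagnosis diagnose(Optional<Symptom> symptom, int parallelism) {
        if (symptom.isEmpty()) {
            return Diagnosis.HEALTHY;
        }
        return switch (symptom.get()) {
            case BACKPRESSURE -> Diagnosis.UNDERPROVISIONED;
            // Only diagnose over-provisioning if there is an instance to remove.
            case LOW_PENDING_PACKETS -> parallelism > 1 ? Diagnosis.OVERPROVISIONED : Diagnosis.HEALTHY;
        };
    }

    // Resolver invocation: propose the bolt's new parallelism, one step at a time.
    static int resolve(Diagnosis diagnosis, int parallelism) {
        return switch (diagnosis) {
            case UNDERPROVISIONED -> parallelism + 1;
            case OVERPROVISIONED -> parallelism - 1;
            case HEALTHY -> parallelism;
        };
    }

    // One iteration of the periodic policy loop for a single bolt.
    static int step(BoltMetrics m, long lowQueueThreshold) {
        return resolve(diagnose(detect(m, lowQueueThreshold), m.parallelism()), m.parallelism());
    }

    public static void main(String[] args) {
        System.out.println(step(new BoltMetrics("splitter", 2, true, 5000), 100)); // prints 3
        System.out.println(step(new BoltMetrics("counter", 4, false, 10), 100));   // prints 3
        System.out.println(step(new BoltMetrics("counter", 5, false, 800), 100));  // prints 5
    }
}
```

Running `step` on every observation interval, and feeding the new parallelism to an update like the one above, gives a closed loop. Small steps converge slowly but make it less likely to overshoot into backpressure.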
Initial Results
[Figure: normalized throughput (millions) of the Spout, Splitter Bolt and Counter Bolt over time (in minutes, 0 to 120), with scale-up and scale-down events and phases S1, S2 and S3 marked]
Dhalion is able to adjust the topology resources on the fly when workload spikes occur. Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.
Future Plans
§ Open-source Dhalion and the auto-scaling policy as part of Heron.
§ Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies.
§ Combine scaling with stateful stream processing.
Get Involved
http://github.com/twitter/heron
http://heronstreaming.io
@heronstreaming
Up Next
Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine Zone
Karthik Ramasamy, Twitter
Questions?