The SEEC Computational Model Henry Hoffmann Anant Agarwal
The SEEC Computational Model Henry Hoffmann, Anant Agarwal PEMWS-2 April 6, 2011
In the beginning…*
• Application programmers had one goal: Performance
*The beginning, in this case, refers to the beginning of my career (1999)
But Modern Systems Have Increased the Burden on Application Programmers
[Figure: Lo-to-Hi gauges for power, resiliency, quality, and performance (beat/s)]
• Many additional constraints
• Even worse, constraints can change dynamically, e.g. power cap, workload fluctuation, core failure
Most Execution Models Designed for Performance
[Figure: a coherent shared memory machine (cores with register files and local caches sharing a global cache) beside a message passing machine (cores with register files and local memories linked by network interfaces)]
Aspect | Coherent Shared Memory | Message Passing
Concurrency | Multi-threaded | Multi-process
Communication | Through memory | Through network
Coordination | Locks | Messages
Control | Procedural | Procedural
• Procedural control is insufficient to meet the needs of modern systems
Angstrom Replaces Procedural Control with Self-Aware Control
Procedural Control (Decide, Act):
• Run in open loop
• Assumptions made at design time
• Based on guesses about future
• Application optimized for system
• No flexibility to adapt to changes
Self-Aware Control (Observe, Decide, Act):
• Run in closed loop
• Understand user goals
• Monitor the environment
• System optimizes for application
• Flexibly adapt behavior
The self-aware model allows the system to solve constrained optimization problems dynamically
Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
Angstrom’s SElf-awarE Computing (SEEC) Model
• Goal: Reduce programmer burden by continuously optimizing online
• Key Features:
1. Applications explicitly state goals and progress
2. System software and hardware state available actions
3. The SEEC runtime system dynamically selects actions to maintain goals
[Figure: the application developer (Observe) and the systems developer (Act) connect through an API to the SEEC runtime (Decide)]
A unified decision engine adapts applications, system software, and hardware
Example Self-Aware System Built from SEEC
• Observe: video encoder heart rate (figure shows readings of 20, 30, and 33 beat/s)
• Goals: 30 beat/s, minimize power
• Decide: control and learning system
• Act: actuators for algorithm, cores, frequency, and bandwidth
Roles in the SEEC Model
• Observe
– Application Developer: express application goals and progress (e.g. frames/second)
– SEEC Runtime System: read goals and performance
• Decide
– SEEC Runtime System: determine how to adapt (e.g. how much to speed up the application)
• Act
– Systems Developer: provide a set of actions and a callback function (e.g. allocation of cores to process)
– SEEC Runtime System: initiate actions based on results of decision phase
Registering Application Goals
[Figure: the application (Observe) reports performance, quality, and power, each on a Lo-to-Hi gauge, to the SEEC decision engine]
• Performance
– Goals: target heart rate and/or latency between tagged heartbeats
– Progress: issue heartbeats at important intervals
• Quality
– Goals: distortion (distance from application-defined nominal value)
– Progress: distortion over last heartbeat
• Power
– Goals: target heart rate / Watt and/or target energy between tagged heartbeats
– Progress: power/energy over last heartbeat interval
Research to date focuses on meeting performance while minimizing power/maximizing quality
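As a rough illustration of the performance bullets above, the sketch below registers a target heart rate and derives progress from heartbeat timestamps. The class `HeartbeatMonitor` and its methods are hypothetical names invented for this sketch, not the Application Heartbeats API, and the 20-beat window is an arbitrary assumption.

```python
from collections import deque


class HeartbeatMonitor:
    """Hypothetical stand-in for a heartbeat interface (not the real API)."""

    def __init__(self, target_rate, window=20):
        self.target_rate = target_rate          # goal, in beats per second
        self.timestamps = deque(maxlen=window)  # most recent heartbeat times

    def heartbeat(self, now):
        """Record one unit of progress (e.g. one encoded frame) at time `now`."""
        self.timestamps.append(now)

    def heart_rate(self):
        """Average beats per second over the window, or None if too few beats."""
        if len(self.timestamps) < 2:
            return None
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return None
        return (len(self.timestamps) - 1) / elapsed

    def error(self):
        """Goal minus observed rate; positive means the application is too slow."""
        rate = self.heart_rate()
        return None if rate is None else self.target_rate - rate


monitor = HeartbeatMonitor(target_rate=30.0)
for frame in range(11):
    monitor.heartbeat(now=frame * 0.05)   # simulated: one frame every 50 ms
```

In this simulated run the encoder issues a beat every 50 ms, so the monitor reports a rate below the 30 beat/s goal and a positive error that a decision engine could act on.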
Registering System Actions
[Figure: actuators (algorithm, cores, frequency, bandwidth) register estimated speedup, cost, and a callback with the SEEC decision engine]
Each action has the following attributes:
• Estimated Speedup
– Predicted benefit of taking an action
• Cost
– Predicted downside of taking an action (increased power, lowered quality)
• Callback
– A function that takes an id and implements the associated action
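The three attributes listed above map naturally onto a small record type. The following is a minimal sketch under assumptions: `Action`, `Actuator`, and `set_cores` are hypothetical names, and the linear speedup and cost numbers are made up for illustration rather than taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    ident: int                       # id passed to the callback
    speedup: float                   # estimated speedup relative to the baseline action
    cost: float                      # estimated cost (e.g. relative power)
    callback: Callable[[int], None]  # implements the action when given its id


class Actuator:
    """Hypothetical registry of actions for one resource (here: core allocation)."""

    def __init__(self, name):
        self.name = name
        self.actions: List[Action] = []

    def register(self, speedup, cost, callback):
        action = Action(len(self.actions), speedup, cost, callback)
        self.actions.append(action)
        return action

    def apply(self, ident):
        self.actions[ident].callback(ident)


state = {"cores": 1}


def set_cores(ident):
    # Assumed mapping for this sketch: action id k means k + 1 cores.
    state["cores"] = ident + 1


cores = Actuator("cores")
for n in range(1, 5):
    # Made-up model: linear speedup and linear cost in the number of cores.
    cores.register(speedup=float(n), cost=float(n), callback=set_cores)

cores.apply(2)   # the runtime picks action 2, so the callback assigns 3 cores
```

Keeping the callback behind an id means the decision logic only ever sees speedup and cost numbers, which is what lets one engine drive very different actuators.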
Picking Actions to Meet Goals
[Figure: feedback loop inside the SEEC decision engine. The current performance is subtracted from the performance goal; the controller (Decide) drives the actuator (Act), which affects the application (Observe)]
Decisions are made to select actions given observations:
• Read application goals and heartbeats
• Determine speedup with control system
• Translate speedup into one or more actions
The control system provides predictable and analyzable behavior
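The three decision steps above can be sketched as follows. This is an illustration under stated assumptions, not SEEC's actual controller: the integral control law in `next_speedup`, the `base_rate` parameter, and the bracketing rule in `schedule` (which ignores cost for simplicity) are all choices made for this sketch, and the action models are made-up numbers.

```python
def next_speedup(prev_speedup, goal_rate, observed_rate, base_rate):
    """Integral control step: raise the speedup in proportion to the rate error."""
    error = goal_rate - observed_rate
    return prev_speedup + error / base_rate


def schedule(speedup, actions):
    """Split one time quantum between the two actions that bracket `speedup`.

    `actions` is a list of (speedup, cost) pairs sorted by strictly increasing
    speedup. Returns a list of (action_index, fraction_of_quantum) pairs.
    """
    speedups = [s for s, _cost in actions]
    target = min(max(speedup, speedups[0]), speedups[-1])  # clamp to achievable range
    for i in range(len(speedups) - 1):
        lo, hi = speedups[i], speedups[i + 1]
        if lo <= target <= hi:
            frac_hi = (target - lo) / (hi - lo)
            return [(i, 1.0 - frac_hi), (i + 1, frac_hi)]
    return [(len(speedups) - 1, 1.0)]  # only one action registered


actions = [(1.0, 1.0), (2.0, 2.2), (4.0, 5.0)]   # made-up (speedup, cost) models
base_rate = 10.0                                  # assumed beats/s at speedup 1.0
speedup = 1.0
observed = base_rate * speedup                    # simulated application response
speedup = next_speedup(speedup, 30.0, observed, base_rate)
plan = schedule(speedup, actions)
```

With a 30 beat/s goal and an observed 10 beat/s, the controller asks for a speedup of 3, and the plan spends half of the quantum in the 2x action and half in the 4x action. When the base rate estimate is wrong, the same loop still corrects itself over several steps, which is the predictability argument the slide makes for control systems.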
Optimizing Resource Allocation with SEEC
• SEEC can observe, decide and act
• How does this enable optimal resource allocation?
• Let’s implement the video encoder example from the introduction
Performance/Watt Adaptation in Video Encoding
[Figure: performance (frame/s) and power (W) plotted against time (in seconds and in heartbeats), with the performance goal marked]
System Models in SEEC
• SEEC’s control system chooses actions using the speedup and cost models associated with each action
• What if the models are inaccurate?
Updating Models in SEEC
[Figure: SEEC decision engine with three adaptation levels (0, 1, 2) comprising the feedback controller, an application model, and a system model, placed between the application and the system services]
• After every action, SEEC updates application and system models
• Kalman filters used to estimate true state of application and system
– We are exploring a number of different alternatives here
SEEC combines predictability of control systems with adaptability of learning systems
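To make the Kalman filter bullet concrete, here is a minimal scalar sketch. It assumes, purely for illustration, that the hidden state is the application's base speed (its heart rate at speedup 1) and that the observed rate equals the applied speedup times that base speed plus noise; the class name `BaseSpeedFilter` and the noise variances are hypothetical and are not taken from the SEEC implementation.

```python
class BaseSpeedFilter:
    """Scalar Kalman filter for the application's base speed (rate at speedup 1)."""

    def __init__(self, initial_estimate, variance=1.0,
                 process_noise=1e-3, measurement_noise=1e-2):
        self.estimate = initial_estimate
        self.variance = variance
        self.q = process_noise
        self.r = measurement_noise

    def update(self, applied_speedup, observed_rate):
        # Predict: the base speed is modelled as a slowly drifting constant.
        variance = self.variance + self.q
        # Correct: measurement model is observed_rate = applied_speedup * base_speed.
        s = applied_speedup
        gain = variance * s / (s * s * variance + self.r)
        self.estimate += gain * (observed_rate - s * self.estimate)
        self.variance = (1.0 - gain * s) * variance
        return self.estimate


true_base = 10.0
kf = BaseSpeedFilter(initial_estimate=20.0)   # deliberately wrong starting model
for _ in range(50):
    kf.update(applied_speedup=2.0, observed_rate=2.0 * true_base)
```

Starting from an estimate that is twice the true value, the filter settles near the true base speed after a few updates, so a controller that divides its error by this estimate would stop over- or under-reacting.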
SEEC Online Learning of Speedup Model for Application with Local Minima
Handling Multiple Applications
[Figure: the SEEC decision engine, with adaptation levels 0 to 2, replicated for several applications that share the system services]
• Control actions computed separately for each application
• For finite resources, several alternatives:
– Priorities (for real-time, admission control)
– Weighting/Centroid method (for overall system throughput)
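The two alternatives for finite resources can be contrasted with a small sketch. The slide does not define the weighting/centroid method, so `allocate_by_weight` below uses plain proportional scaling as a stand-in assumption; both function names, the request tuples, and the core counts are hypothetical.

```python
def allocate_by_priority(requests, total):
    """Serve requests in descending priority until the resource runs out.

    `requests` is a list of (name, demand, priority) tuples.
    """
    remaining = total
    grants = {}
    for name, demand, _priority in sorted(requests, key=lambda r: -r[2]):
        grants[name] = min(demand, remaining)
        remaining -= grants[name]
    return grants


def allocate_by_weight(requests, total):
    """Scale every demand by the same factor when demands exceed `total`."""
    wanted = sum(demand for _name, demand, _priority in requests)
    scale = min(1.0, total / wanted) if wanted > 0 else 0.0
    return {name: demand * scale for name, demand, _priority in requests}


# Each application's controller has computed its own demand (in cores).
requests = [("bodytrack", 6, 2), ("x264", 4, 1)]
by_priority = allocate_by_priority(requests, total=8)
by_weight = allocate_by_weight(requests, total=8)
```

With 8 cores and demands of 6 and 4, the priority rule fully serves bodytrack and leaves x264 short, while the proportional rule cuts both demands by the same 20%.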
Systems Built with SEEC
System | Actions | Tradeoff | Benchmarks
Dynamic Loop Perforation | Skip some loop iterations | Performance vs. Quality | 7/13 PARSECs
Dynamic Knobs | Make static parameters dynamic | Performance vs. Quality | bodytrack, swaptions, x264, SWISH++
Core Scheduler | Assign N cores to application | Compute vs. Power | 11/13 PARSECs
Clock Scaler | Change processor speed | Compute vs. Power | 11/13 PARSECs
Bandwidth Allocator | Assign memory controllers to application | Memory vs. Power | STREAM (doesn’t make a difference for PARSEC)
Power Manager | Combination of the three above | Performance vs. Power | PARSEC, STREAM, simple test apps (mergesort, binary search)
Learned Models | Power Manager with speedup and cost learned online | Performance vs. Power | PARSECs
Multi-App Control | Power Manager with multiple applications | Performance vs. Power and Quality for multiple applications | Combinations of PARSECs
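The first entry above, skipping some loop iterations to trade quality for performance, can be illustrated with a toy computation. This is a sketch of the general idea only: the function `mean_with_perforation`, the stride-based skipping, and the relative-error distortion metric are assumptions for this example, not the mechanism evaluated on PARSEC.

```python
def mean_with_perforation(values, stride):
    """Mean computed from every `stride`-th element; stride=1 is the exact loop."""
    total, visited = 0.0, 0
    # Perforated loop: executes one of every `stride` iterations.
    for i in range(0, len(values), stride):
        total += values[i]
        visited += 1
    return total / visited, visited


data = [float(x) for x in range(100)]
exact, full_iters = mean_with_perforation(data, stride=1)
approx, fewer_iters = mean_with_perforation(data, stride=2)

# Distortion: relative distance of the perforated result from the exact one.
distortion = abs(approx - exact) / abs(exact)
```

Halving the iteration count changes the result by about 1% on this input. A runtime could treat each stride as an action whose speedup is the saved work and whose cost is the measured distortion.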
Dynamic Knobs: Creating Adaptive Applications
• Turn static command line parameters into dynamic structure
• Application Goals: maintain performance and minimize quality loss
• System Actions: adjust memory locations to change application settings
• Benchmarks: bodytrack, swaptions, SWISH++, x264
• Experiment: maintain performance when clock speed changes
Detail in Hoffmann et al. “Dynamic Knobs for Power-Aware Computing”, ASPLOS 2011
Results: Enabling Dynamic Applications (bodytrack)
• Clock drops from 2.4 to 1.6 GHz: without SEEC, performance drops; with SEEC, performance recovers
• Clock rises from 1.6 to 2.4 GHz: SEEC returns quality to baseline
Results: Enabling Dynamic Applications
[Figure: panels for SWISH++, swaptions, and x264, annotated “Maintains performance despite noise”, “Perfect behavior”, and “Maintains baseline performance”]
Dynamic knobs automatically enable dynamic response for a range of applications using a single mechanism
Optimizing Performance per Watt for Video Encoding
• Adapt system behavior to needs of individual inputs
• Application Goals: maintain 30 frame/s while minimizing power
• System Actions: change cores, clock speed, and memory bandwidth
• Benchmark: x264 with 16 different 1080p inputs
• Experiment: compare performance/Watt with SEEC to best static allocation of resources for each input
Average Performance per Watt for Video Encode
[Figure: normalized performance/Watt, averaged over inputs, for static worst, static oracle, AL 0, AL 1, and AL 2]
Performance per Watt for All Inputs
Learning Models Online
• Adapt system behavior when initial models are wrong
• Application Goals: four performance targets (Min, Median, Max, Max+); minimize power consumption for each target
• System Actions: change cores, clock speed, and memory bandwidth
• Initial models are incredibly optimistic (assume linear speedup with any resource increase)
• Benchmark: STREAM
• Experiment: converge to within 5% of target performance, measure power and convergence time
Complex Response to Resources
[Figure: speedup versus number of cores (1 to 8) for four configurations: 2.4 GHz with 2 memory controllers (MC), 2.4 GHz with 1 MC, 1.6 GHz with 2 MC, and 1.6 GHz with 1 MC]
• Adding cores can slow the app down
• Adding memory bandwidth can slow the app down
Power Savings
[Figure: total system power (Watts) for Oracle, AL 1, and AL 2 at each performance target (Min, Median, Max, Max+)]
• AL 2 can provide 30 Watt power savings over AL 1
Convergence Time
[Figure: convergence time (quanta) for Oracle, AL 1, and AL 2 at each performance target (Min, Median, Max, Max+)]
Managing Application and System Resources Concurrently
• Manage multiple applications when clock frequency changes
• Application Goals:
– bodytrack: maintain performance, minimize power
– x264: maintain performance, minimize quality loss
• System Actions:
– Change core allocation to both applications
– Change x264’s algorithms
• Experiment: maintain performance of both applications when clock frequency changes
Results: SEEC Management of Multiple Applications
• Clock drops from 2.4 to 1.6 GHz
• bodytrack: without SEEC the app misses goals; SEEC allocates cores to bodytrack
• x264: without SEEC the app exceeds goals; SEEC removes cores from x264 and adjusts its algorithm to meet goals
Summary of Case Studies
Experiment | Demonstrated Benefit of SEEC
Dynamic Knobs | Maintains application performance in the face of loss of compute resources
Performance/Watt | Outperforms oracle for static allocation of resources by adapting to fluctuations in input data
Performance/Watt with learning | Learns models online starting from an initial model that is ~10x off true values
Multi-App Control | Maintains performance of multiple apps by managing algorithm and system resources to adapt to loss of compute resources
Angstrom’s Hardware Support for Self-Awareness
• “If you want to change something, measure it.”
• We need sensors to measure everything we want to manage:
– Performance (there is a rich body of work in this area)
– Power
– Quality
– Resiliency
– …
Need Performance-level Support for Power Analysis
1. gprof -> pprof?
[Figure: sample gprof call-graph output (columns: index, % time, self, children, called, name), annotated “Power??” and “Energy???”]
Angstrom’s Hardware Support for SEEC
• Low-power core for SEEC runtime system
Angstrom’s System Software and Hardware Support for SEEC
• Actuators: algorithm, cores, cache, network bandwidth, TLB size, frequency, memory bandwidth, registers, ALU width, …
• Enable tradeoffs wherever possible
• Support fine-grain allocation of resources
Conclusions
• SEEC is designed to help ease programmer burden
– Solves resource allocation problems
– Adapts to fluctuations in environment
• SEEC has three distinguishing features
– Incorporates goals and feedback directly from the application
– Allows independent specification of adaptation
– Uses an adaptive second-order control system to manage adaptation
• Demonstrated the benefits of SEEC in several experiments
– SEEC can optimize performance per Watt for video encoding
– SEEC can adapt algorithms and resource allocation to meet goals in the face of power caps or other changes in environment
• SEEC navigates tradeoff spaces
– So build more tradeoffs into apps, system software, system hardware, etc.
SEEC References
• Application Heartbeats Interface:
– Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. ICAC 2010.
• Controlling Heartbeat-enabled Systems:
– Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling software applications within the Heartbeats framework. CDC 2010.
– Maggio, Hoffmann, Agarwal, Leva. Control theoretic allocation of CPU resources. FeBID 2011.
• Creating Adaptive Applications:
– Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042. 2009.
– Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware Computing. ASPLOS 2011.
• The SEEC Framework:
– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Management of Multicore Resources. MIT-CSAIL-TR-2011-016. 2011.
– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Computing. MIT-CSAIL-TR-2010-049. 2010.