Starfish: A Self-tuning System for Big Data Analytics

Analysis in the Big Data Era
Massive Data -> Analysis -> Insight
Key to Success = Timely and Cost-Effective Analysis
9/26/2011 Starfish

Hadoop MapReduce Ecosystem
• Popular solution to Big Data Analytics
[Figure: software stack. Components: Java / C++ / R / Python, Pig, Hive, Jaql, Oozie, Elastic MapReduce, Hadoop, HBase, MapReduce Execution Engine, Distributed File System]

Practitioners of Big Data Analytics
• Who are the users?
  • Data analysts, statisticians, computational scientists…
  • Researchers, developers, testers…
  • You!
• Who performs setup and tuning?
  • The users!
  • They usually lack the expertise to tune the system

Tuning Challenges
• Heavy use of programming languages for MapReduce programs (e.g., Java/Python)
• Data loaded/accessed as opaque files
• Large space of tuning choices
• Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)
• Terabyte-scale data cycles

Starfish: Self-tuning System
• Our goal: Provide good performance automatically
[Figure: the same software stack with Starfish added. Components: Java / C++ / R / Python, Pig, Hive, Jaql, Oozie, Elastic MapReduce, Analytics System, Starfish, Hadoop, HBase, MapReduce Execution Engine, Distributed File System]

What are the Tuning Problems?
• Job-level MapReduce configuration
• Cluster sizing
• Data layout tuning
• Workflow optimization
• Workload management
[Figure: a workflow of jobs J1, J2, J3, J4]

Starfish’s Core Approach to Tuning
• Optimizers: search through the space of tuning choices (cluster, job, data layout, workflow, workload)
• Profiler: collects concise summaries of execution
• What-if Engine: estimates the impact of hypothetical changes on execution
  1) if Δ(conf. parameters) then what …?
  2) if Δ(data properties) then what …?
  3) if Δ(cluster properties) then what …?

Starfish Architecture
[Figure: architecture diagram. Components:]
• Workload Optimizer, Workflow Optimizer, Job Optimizer, Elastisizer
• Profiler, What-if Engine
• Data Manager: Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.

MapReduce Job Execution
job j = < program p, data d, resources r, configuration c >
[Figure: four input splits (split 0 to split 3) are processed by map tasks in two map waves; two reduce tasks run in one reduce wave and write out 0 and out 1]
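
The wave counts in the figure follow from simple arithmetic: tasks are scheduled in rounds, one task per free slot. A minimal sketch, assuming the cluster in the figure has two map slots and two reduce slots in total (the slide does not state slot counts, and `num_waves` is a hypothetical helper name):

```python
import math


def num_waves(num_tasks: int, total_slots: int) -> int:
    """Number of scheduling rounds needed to run num_tasks on total_slots slots."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return math.ceil(num_tasks / total_slots)


# Assumed slot counts; only the task counts come from the figure.
map_waves = num_waves(num_tasks=4, total_slots=2)     # 4 splits -> 4 map tasks
reduce_waves = num_waves(num_tasks=2, total_slots=2)  # 2 reduce tasks
print(map_waves, reduce_waves)
```

Under these assumptions the sketch reproduces the figure: two map waves and one reduce wave. Changing the number of reduce tasks or slots, both of which are configuration or resource choices, changes the wave count.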

What Controls MR Job Execution?
job j = < program p, data d, resources r, configuration c >
• Space of configuration choices:
  • Number of map tasks
  • Number of reduce tasks
  • Partitioning of map outputs to reduce tasks
  • Memory allocation to task-level buffers
  • Multiphase external sorting in the tasks
  • Whether output data from tasks should be compressed
  • Whether the combine function should be used

Effect of Configuration Settings
• Use defaults or set manually (rules-of-thumb)
• Rules-of-thumb may not suffice
[Figure: two-dimensional projection of a multidimensional surface for the Word Co-occurrence MapReduce program, with the rules-of-thumb settings marked]

MapReduce Job Tuning in a Nutshell
• Goal: find the best configuration settings for running program p on input data d and cluster resources r
• Challenges: p is an arbitrary MapReduce program; c is high-dimensional; …
• Profiler: runs p to collect a job profile (concise execution summary) of <p, d1, r1, c1>
• What-if Engine: given the profile of <p, d1, r1, c1>, estimates a virtual profile for <p, d2, r2, c2>
• Optimizer: enumerates and searches through the optimization space S efficiently
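
The tuning goal on this slide can be written as an optimization problem over the space S. This is a sketch using the slide's symbols; F is an assumed name for the performance (for example, running time) of job <p, d, r, c> and is not defined on the slide.

```latex
\[
  c_{\mathrm{opt}} \;=\; \operatorname*{arg\,min}_{c \,\in\, S} \; F(p, d, r, c)
\]
```

In this reading, the What-if Engine supplies estimates of F for configurations that were never actually run, and the Optimizer decides which points of S to ask about.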

Job Profile
• Concise representation of program execution as a job
• Records information at the level of “task phases”
• Generated by the Profiler through measurement or by the What-if Engine through estimation
[Figure: map task phases Read, Map, Collect, Spill, Merge. Labels: split, DFS, map, memory buffer; Serialize, Partition; Sort, [Combine], [Compress]; Merge]

Job Profile Fields
• Dataflow: amount of data flowing through task phases
  • Map output bytes
  • Number of spills
  • Number of records in buffer per spill
• Costs: execution times at the level of task phases
  • Read phase time in the map task
  • Map phase time in the map task
  • Spill phase time in the map task
• Dataflow Statistics: statistical information about dataflow
  • Width of input key-value pairs
  • Map selectivity in terms of records
  • Map output compression ratio
• Cost Statistics: statistical information about resource costs
  • I/O cost for reading from local disk per byte
  • CPU cost for executing the Mapper per record
  • CPU cost for uncompressing the input per byte

Generating Profiles by Measurement
• Goals
  • Have zero overhead when profiling is turned off
  • Require no modifications to Hadoop
  • Support unmodified MapReduce programs written in Java or Hadoop Streaming/Pipes (Python/Ruby/C++)
• Approach: Dynamic (on-demand) instrumentation
  • Event-condition-action rules are specified (in Java)
  • Leads to run-time instrumentation of Hadoop internals
  • Monitors task phases of MapReduce job execution
  • We currently use BTrace (Hadoop internals are in Java)

Generating Profiles by Measurement
[Figure: with profiling enabled, ECA rules instrument the JVMs running the map tasks (split 0, split 1) and the reduce task (out 0); the raw data they emit is combined into map profiles and reduce profiles, and then into a job profile]
Use of Sampling
• Profile fewer tasks
• Execute fewer tasks
JVM = Java Virtual Machine, ECA = Event-Condition-Action

What-if Engine
• Inputs (possibly hypothetical):
  • Job Profile <p, d1, r1, c1>
  • Input Data Properties <d2>
  • Cluster Resources <r2>
  • Configuration Settings <c2>
• What-if Engine components:
  • Job Oracle: produces the Virtual Job Profile for <p, d2, r2, c2>
  • Task Scheduler Simulator
• Output: Properties of the Hypothetical Job
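
The slides do not say how the Task Scheduler Simulator works internally. As an illustration only, the sketch below simulates a greedy FIFO scheduler: given per-task running times (such as those a virtual job profile could provide) and a number of task slots, it estimates when the last task finishes. The function name and the scheduling policy are assumptions.

```python
import heapq


def simulate_makespan(task_times, num_slots):
    """Greedy FIFO simulation: each task starts on the slot that frees up first.

    Returns the time at which the last task finishes.
    """
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    slots = [0.0] * num_slots  # time at which each slot becomes free
    heapq.heapify(slots)
    for duration in task_times:
        start = heapq.heappop(slots)            # earliest free slot
        heapq.heappush(slots, start + duration)  # slot is busy until start + duration
    return max(slots)


# Four equal map tasks on two slots run as two waves of 10 time units each.
map_finish = simulate_makespan([10.0, 10.0, 10.0, 10.0], num_slots=2)
print(map_finish)
```

A simulation like this lets the engine answer "what if the cluster had more slots?" without running the job: only `num_slots` changes, while the task times come from the (virtual) profile.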

Virtual Profile Estimation
Given the profile for job j = <p, d1, r1, c1>, estimate the profile for job j' = <p, d2, r2, c2>
[Figure: the profile for j (Dataflow Statistics, Cost Statistics, Dataflow, Costs), together with Input Data d2, Resources r2, and Configuration c2, is mapped to the (virtual) profile for j' (Dataflow Statistics, Cost Statistics, Dataflow, Costs) using Cardinality Models, Relative Black-box Models, and White-box Models]

Job Optimizer
• Inputs:
  • Job Profile <p, d1, r1, c1>
  • Input Data Properties <d2>
  • Cluster Resources <r2>
• Just-in-Time Optimizer:
  • Subspace Enumeration
  • Recursive Random Search
  • What-if calls
• Output: Best Configuration Settings <c_opt> for <p, d2, r2>
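
To make the search step concrete, here is a simplified sketch in the spirit of Recursive Random Search: sample the space at random, then repeatedly shrink the sampling box around the best point found. It omits the restart and re-alignment logic of the full algorithm, and it is not the Starfish implementation. The cost function `toy_what_if` is a made-up stand-in for a What-if Engine call, and the two parameters (number of reducers, memory in MB) and their bounds are hypothetical.

```python
import random


def recursive_random_search(cost, bounds, samples=30, rounds=5, shrink=0.5, seed=0):
    """Random sampling with a box that shrinks around the best point so far."""
    rng = random.Random(seed)
    best_point, best_cost = None, float("inf")
    box = list(bounds)
    for _ in range(rounds):
        for _ in range(samples):
            point = [rng.uniform(lo, hi) for lo, hi in box]
            c = cost(point)  # one "what-if call" per sampled configuration
            if c < best_cost:
                best_point, best_cost = point, c
        # Shrink the box around the best point, clipped to the original bounds.
        new_box = []
        for (lo, hi), (glo, ghi), x in zip(box, bounds, best_point):
            half = (hi - lo) * shrink / 2
            new_box.append((max(glo, x - half), min(ghi, x + half)))
        box = new_box
    return best_point, best_cost


def toy_what_if(point):
    """Hypothetical cost surface with its minimum at 20 reducers, 300 MB."""
    reducers, memory_mb = point
    return (reducers - 20) ** 2 + ((memory_mb - 300) / 10) ** 2 + 100


bounds = [(1, 100), (100, 1000)]  # hypothetical ranges: reducers, memory (MB)
best_point, best_cost = recursive_random_search(toy_what_if, bounds)
print(best_point, best_cost)
```

Because every evaluation is an estimate rather than a real job run, the optimizer can afford many samples; the shrinking box spends later samples near configurations that already look good.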

Workflow Optimization Space
[Figure: tree of the optimization space. Labels: Physical, Logical, Job-level Configuration, Dataset-level Configuration, Vertical Packing, Partition Function Selection, Join Selection, Inter-job]

Optimizations on TF-IDF Workflow
[Figure: the TF-IDF workflow before and after optimization. Datasets: D0 <{D}, {W}>, D1 <{D, W}, {f}>, D2 <{D}, {W, f, c}>, D4 <{W}, {D, t}>. Jobs: J1 (M1, R1), J2 (M2, R2), and J3, J4 (M3, R3, M4). After logical optimization, J1 and J2 are shown together (M1, R1, M2, R2) with Partition: {D} and Sort: {D, W}. Physical optimization assigns per-job settings such as Reducers = 50, Compress = off, Memory = 400 and Reducers = 20, Compress = on, Memory = 300]
Legend: D = docname, W = word, f = frequency, c = count, t = TF-IDF

New Challenges
• What-if challenges:
  • Support concurrent job execution
  • Estimate intermediate data properties
• Optimization challenges:
  • Interactions across jobs
  • Extended optimization space
  • Find good configuration settings for individual jobs
[Figure: a workflow of jobs J1, J2, J3, J4]

Cluster Sizing Problem
• Use-cases for cluster sizing
  • Tuning the cluster size for elastic workloads
  • Workload transitioning from development cluster to production cluster
  • Multi-objective cluster provisioning
• Goal
  • Determine cluster resources and job-level configuration parameters to meet workload requirements

Multi-objective Cluster Provisioning
• Cloud enables users to provision clusters in minutes
[Figure: two bar charts over EC2 instance type (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge). Left: Running Time (min), axis 0 to 1,200. Right: Cost ($), axis 0.00 to 10.00]

Experimental Evaluation
• Starfish (versions 0.1, 0.2) to manage Hadoop on EC2
• Different scenarios: Cluster × Workload × Data

EC2 Node Type   CPU (EC2 units)   Mem      I/O Perf.   Cost/hour   #Maps/node   #Reds/node   Max. Mem/task
m1.small        1 (1 x 1)         1.7 GB   moderate    $0.085      2            1            300 MB
m1.large        4 (2 x 2)         7.5 GB   high        $0.34       3            2            1024 MB
m1.xlarge       8 (4 x 2)         15 GB    high        $0.68       4            4            1536 MB
c1.medium       5 (2 x 2.5)       1.7 GB   moderate    $0.17       2            2            300 MB
c1.xlarge       20 (8 x 2.5)      7 GB     high        $0.68       8            6            400 MB
cc1.4xlarge     33.5 (8)          23 GB    very high   $1.60       8            6            1536 MB
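
The hourly prices in this table are what turn a predicted running time into a predicted cost for cluster provisioning. A minimal sketch, assuming billing per started instance-hour; the predicted running times in the usage example are invented for illustration and are not results from the talk, and both function names are hypothetical.

```python
import math

# Hourly prices (USD) copied from the node-type table on this slide.
PRICE_PER_HOUR = {
    "m1.small": 0.085,
    "m1.large": 0.34,
    "m1.xlarge": 0.68,
    "c1.medium": 0.17,
    "c1.xlarge": 0.68,
    "cc1.4xlarge": 1.60,
}


def cluster_cost(instance_type, num_nodes, running_time_min):
    """Cost of a cluster run, assuming billing per started instance-hour."""
    billed_hours = math.ceil(running_time_min / 60)
    return num_nodes * PRICE_PER_HOUR[instance_type] * billed_hours


def cheapest_within_deadline(predicted_minutes, num_nodes, deadline_min):
    """Cheapest instance type whose predicted running time meets the deadline."""
    feasible = {
        itype: cluster_cost(itype, num_nodes, minutes)
        for itype, minutes in predicted_minutes.items()
        if minutes <= deadline_min
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Made-up predictions (minutes) for a 10-node cluster of each type.
predicted = {"m1.small": 900, "m1.large": 300, "m1.xlarge": 200}
choice = cheapest_within_deadline(predicted, num_nodes=10, deadline_min=360)
print(choice)
```

With these invented numbers, m1.small is cheapest overall but misses the 360-minute deadline, so the sketch picks m1.large. This is the kind of time-versus-cost trade-off that multi-objective provisioning has to resolve.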

Experimental Evaluation
• Starfish (versions 0.1, 0.2) to manage Hadoop on EC2
• Different scenarios: Cluster × Workload × Data

Abbr.   MapReduce Program                       Domain                  Dataset
CO      Word Co-occurrence                      Natural Lang Proc.      Wikipedia (10 GB – 22 GB)
WC      WordCount                               Text Analytics          Wikipedia (30 GB – 1 TB)
TS      TeraSort                                Business Analytics      TeraGen (30 GB – 1 TB)
LG      LinkGraph                               Graph Processing        Wikipedia (compressed ~6x)
JO      Join                                    Business Analytics      TPC-H (30 GB – 1 TB)
TF      Term Freq. - Inverse Document Freq.     Information Retrieval   Wikipedia (30 GB – 1 TB)

Job Optimizer Evaluation
Hadoop cluster: 30 nodes, m1.xlarge
Data sizes: 60-180 GB
[Figure: bar chart of speedup over default settings (axis 0 to 60) for MapReduce programs TS, WC, LG, JO, TF, CO. Series: Default Settings, Rule-based Optimizer, Cost-based Optimizer]

Estimates from the What-if Engine
Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia
[Figure: two surface plots, the true surface and the estimated surface]

Profiling Overhead vs. Benefit
Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia
[Figure: two plots against Percent of Tasks Profiled (1, 5, 10, 20, 40, 60, 80, 100). Left: Percent Overhead over Job Running Time with Profiling Turned Off, axis 0 to 35. Right: Speedup over Job Run with RBO Settings, axis 0.0 to 2.5]

Multi-objective Cluster Provisioning
Instance Type for Source Cluster: m1.large
[Figure: two bar charts of Actual vs. Predicted values over EC2 Instance Type for Target Cluster (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge). Left: Running Time (min), axis 0 to 1,200. Right: Cost ($), axis 0.00 to 10.00]

More info: www.cs.duke.edu/starfish
• Job-level MapReduce configuration
• Cluster sizing
• Data layout tuning
• Workflow optimization
• Workload management
[Figure: a workflow of jobs J1, J2, J3, J4]