Introduction to Experiment Design (Refs: Chap. 16-17)

Introduction to Experiment Design
Refs: Chap. 16-17 of Raj Jain’s book
Shiv Kalyanaraman
Rensselaer Polytechnic Institute
shivkuma@ecse.rpi.edu
http://www.ecse.rpi.edu/Homepages/shivkuma
Adapted from Prof. Raj Jain’s slides

Why Experiment Design?
• Problem: validation and results are only as good as your test cases!
• How do we design a critical set of test cases?
• Idea: parameterize the system and use a black-box strategy
[Figure: Parameters or Factors → System → Metrics]

Performance, Metrics and Parameters
• Performance questions:
  • Absolute: How fast does computer A run MY program?
  • Relative: Is machine A faster than machine B, and if so, how much faster?
• Parameters: factors or inputs
  • E.g., clock rate, Poisson inter-arrivals
• Metrics: functions of factors or parameters
  • E.g., throughput, response time, queue length, ...
• A metric should characterize the design tradeoffs adequately
• Metrics are usually functions of many factors. Use of one factor alone may be misleading.

Choose Metrics to Reflect Appropriate Tradeoffs
• Network users: services and performance that their applications need, e.g., a guarantee that each message sent will be delivered without error within a certain amount of time
• Network designers: cost-effective design, e.g., that network resources are efficiently utilized and fairly allocated to different users
  • Users + designer perspectives enough to meet 3 factors of interface/architecture design. But...
• Network providers: a system that is easy to administer and manage, e.g., that faults can be easily isolated and it is easy to account for usage
  • Require management tools and interfaces.
  • Often considered to the basic protocol interface design

Goals of Experiment Design
• Design a proper set of (simulation or measurement) experiments: maximize the information gained with the minimum number of experiments!
• Develop a (regression) model that best describes the data obtained
• Estimate the contribution of each factor and its values (i.e., alternatives) to the performance variation
• Isolate measurement errors
• Estimate confidence intervals for model parameters
• Check if the alternatives are significantly different
• Check if the model is adequate

Example: TCP/AQM Design Problem
• TCP versions: Reno or SACK
• Max segment sizes: 100 B vs 1000 B
• Buffer size: 10 pkts vs 100 pkts
• AQM: Drop tail vs RED vs REM
• AQM parameters …
• Workload: FTP
• Configuration parameters: 3 flows vs 10 flows
• Metrics: total throughput (efficiency), C.o.V. of per-flow throughputs (fairness)
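The fairness metric named on this slide, the coefficient of variation (C.o.V.) of per-flow throughputs, is the standard deviation divided by the mean. A minimal Python sketch follows; the use of the population standard deviation is an assumption (the slide does not say which estimator is meant), and the throughput values are made up for illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(throughputs):
    """Return stddev/mean of per-flow throughputs (0 means perfectly fair)."""
    avg = mean(throughputs)
    if avg == 0:
        raise ValueError("mean throughput is zero; C.o.V. is undefined")
    return pstdev(throughputs) / avg

# Hypothetical per-flow throughputs in Mb/s, for illustration only.
equal_share = [3.0, 3.0, 3.0]
unequal_share = [1.0, 3.0, 5.0]

fair_cov = coefficient_of_variation(equal_share)      # identical flows
unfair_cov = coefficient_of_variation(unequal_share)  # spread-out flows
total_throughput = sum(unequal_share)                 # the efficiency metric
```

A lower C.o.V. indicates a fairer allocation; both metrics are computed from the same per-flow measurements, so one experiment yields both response variables.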

Terminology
• Response variable: the outcome. E.g., response time
• Factors: variables that affect the response variable
  • E.g., buffer size, RED parameters
• Levels: the values that a factor can assume
  • E.g., TCP type has 2 levels: SACK or Reno
• Primary factors: the “interesting” factors whose effects need to be quantified
• Secondary factors: factors whose impact need not be quantified
  • E.g., we have excluded minor factors like receiver window size or delayed ACK (whose effects we roughly know and/or consider minor). Some workload parameters are also secondary factors.
• Replication: repetition of all or some experiments
• Design: the number of experiments, the factor levels, and the number of replications for each experiment
  • E.g., full factorial design with replications
• Interaction: the effect of one factor depends upon other factors!

Experiment Design Problem
• A system with $k$ parameters, $n_i$ levels for parameter $x_i$, and a performance metric $f$
• $f = s(x_1, x_2, \ldots, x_k)$ can be evaluated by an experiment, i.e., simulation, measurement, or analytical formula
• Find the effect of each parameter or parameter interaction on the system performance, the best parameter combination, etc.
[Figure: Parameters or Factors → System → Metrics]

Simple Experiment Design
• Start with a typical configuration
• Vary one parameter at a time
• Number of experiments: $n = 1 + \sum_{i=1}^{k} (n_i - 1)$
• Disadvantage: only good for systems with no interaction between parameters
  • Wrong conclusions if the factors interact
• Not statistically efficient

Full Factorial Design
• Try every possible combination of all the parameters.
• Number of experiments: $n = \prod_{i=1}^{k} n_i$
• Disadvantage: too many experiments
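To make the cost difference between the simple (one-at-a-time) design and the full factorial design concrete, the sketch below counts and enumerates runs for the TCP/AQM example. The factor names and level labels are illustrative spellings of the slide's values, and the open-ended "AQM parameters" factor is left out, which is an assumption of this sketch.

```python
from itertools import product
from math import prod

# Factor levels taken from the TCP/AQM example; "AQM parameters" is omitted
# because the slide does not list its levels (assumption of this sketch).
factors = {
    "tcp": ["Reno", "SACK"],
    "mss_bytes": [100, 1000],
    "buffer_pkts": [10, 100],
    "aqm": ["DropTail", "RED", "REM"],
    "flows": [3, 10],
}

def full_factorial(factors):
    """Return every combination of levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def one_at_a_time_count(factors):
    """Runs needed when varying one factor at a time from a base configuration."""
    return 1 + sum(len(levels) - 1 for levels in factors.values())

runs = full_factorial(factors)
n_full = len(runs)                                       # enumerated count
n_full_formula = prod(len(v) for v in factors.values())  # product of level counts
n_simple = one_at_a_time_count(factors)                  # 1 + sum of (levels - 1)
```

With these five factors the full factorial design needs 48 runs while the one-at-a-time design needs 7, but only the former can expose interactions between factors.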

Reduce the Number of Experiments?
• Reduce the number of levels for each parameter, e.g., reduce $n_i$ to 2 levels for each parameter
• Reduce the number of parameters
• Use fractional factorial design:
  • Try a subset of all the possible parameter combinations
  • Less information
  • May not get all interactions
  • Not a problem if interactions are negligible

$2^k$ Full Factorial Design
• Using a nonlinear regression model (assuming 3 parameters, $x_1$, $x_2$, $x_3$):
  $y = q_0 + q_1 x_1 + q_2 x_2 + q_3 x_3 + q_{12} x_1 x_2 + q_{23} x_2 x_3 + q_{13} x_1 x_3 + q_{123} x_1 x_2 x_3$
• Goal: run $2^k$ experiments (e.g., take the two extreme values of each parameter) and solve the above equation for the regression parameters: $q_0, q_1, q_2, q_3, q_{12}, q_{13}, q_{23}, q_{123}$.
• Disadvantage: only good for systems where the effects of parameters or interactions are monotonic
  • i.e., the system has to be consistent with the model
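The regression model has one coefficient per subset of factors, which is why $2^k$ experiments are exactly enough to solve for all of them. The short sketch below lists the coefficient names for a given $k$; the naming scheme (`q0`, `q12`, ...) simply mirrors the slide's subscripts and the function name is hypothetical.

```python
from itertools import combinations

def model_terms(k):
    """List the coefficient names of the 2^k regression model.

    One coefficient per subset of factors: the empty subset is the mean q0,
    single factors are main effects, larger subsets are interactions.
    """
    terms = ["q0"]
    for order in range(1, k + 1):
        for subset in combinations(range(1, k + 1), order):
            terms.append("q" + "".join(str(i) for i in subset))
    return terms

terms_k3 = model_terms(3)  # 8 unknowns, matching the 2^3 = 8 experiments
```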

$2^{k-p}$ Fractional Factorial Design
• Use a simplified regression model that ignores some interactions (especially high-order interactions, since their effects are usually small)
• For example, a $2^{3-1}$ design: $y = q_0 + q_1 x_1 + q_2 x_2 + q_3 x_3$
  • All interactions are ignored, leaving only 4 unknowns, so only $2^{3-1} = 4$ experiments are needed to solve for them
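One common way to pick the 4 runs of a $2^{3-1}$ design is to take a full factorial in $x_1$ and $x_2$ and set $x_3 = x_1 x_2$. The slide does not say which fraction is used, so that generator is an assumption of the sketch below, and the responses are synthetic values generated from a known model so the recovered coefficients can be checked.

```python
from itertools import product

def half_fraction_2_3():
    """Build a 2^(3-1) design: full factorial in x1, x2, with x3 = x1 * x2."""
    return [(x1, x2, x1 * x2) for x1, x2 in product((-1, 1), repeat=2)]

def main_effects(design, responses):
    """Solve y = q0 + q1*x1 + q2*x2 + q3*x3 using the orthogonal sign columns."""
    n = len(design)
    q0 = sum(responses) / n
    effects = [
        sum(row[j] * y for row, y in zip(design, responses)) / n
        for j in range(3)
    ]
    return [q0] + effects

design = half_fraction_2_3()
# Hypothetical responses generated from y = 10 + 3*x1 + 2*x2 + 1*x3.
responses = [10 + 3 * a + 2 * b + 1 * c for a, b, c in design]
coeffs = main_effects(design, responses)
```

Because $x_3$ is set equal to $x_1 x_2$, the estimate of $q_3$ cannot be separated from the $x_1 x_2$ interaction; this is the information that a fractional design gives up.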

Simple Full Factorial Problem
• Question: What is the effect of memory & cache on workstation performance?

Underlying Regression Model
• Interpretation:
  • Mean performance = 40 MIPS
  • Effect of memory = 20 MIPS
  • Effect of cache = 10 MIPS
  • Interaction between memory and cache = 5 MIPS
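Written out, this interpretation corresponds to the model below. The coding $x_A, x_B \in \{-1, +1\}$ for the low and high levels of memory (A) and cache (B) is an assumption of this sketch (it is the usual convention for 2-level designs), and the four corner values are computed from the stated coefficients rather than copied from the slide's table.

```latex
\begin{align*}
y &= q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B \\
  &= 40 + 20\,x_A + 10\,x_B + 5\,x_A x_B \\[4pt]
y(-1,-1) &= 40 - 20 - 10 + 5 = 15 \\
y(+1,-1) &= 40 + 20 - 10 - 5 = 45 \\
y(-1,+1) &= 40 - 20 + 10 - 5 = 25 \\
y(+1,+1) &= 40 + 20 + 10 + 5 = 75
\end{align*}
```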

Computation of Effects (i.e., coefficients)

Computation of Effects (cont’d)
• Note: effects ($q_j$) are linear combinations of responses ($y_i$)
• Sum of the coefficients is zero: a.k.a. “contrasts”
• Note that the above equations can be expressed in terms of columns A, B, etc.! => Sign table method!

Sign Table Method
• Each column multiplies the responses ($y_i$);
• Sum the multiples, and
• Divide the sums by $2^k$
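The three steps above can be sketched for a $2^2$ design as follows. The responses are the values implied by the coefficients on the "Underlying Regression Model" slide (40, 20, 10, 5 MIPS) under a $-1/+1$ level coding, which is an assumption here; the table layout and function name are illustrative.

```python
# Sign table for a 2^2 design: one row per experiment, columns I, A, B, AB.
SIGN_TABLE = [
    # I   A   B  AB
    (1, -1, -1,  1),
    (1,  1, -1, -1),
    (1, -1,  1, -1),
    (1,  1,  1,  1),
]

def effects_from_sign_table(responses):
    """Multiply each column by the responses, sum, and divide by 2^k (here 4)."""
    n = len(SIGN_TABLE)
    return tuple(
        sum(row[col] * y for row, y in zip(SIGN_TABLE, responses)) / n
        for col in range(4)
    )

# Responses (MIPS) implied by q0 = 40, qA = 20, qB = 10, qAB = 5.
responses = [15, 45, 25, 75]
q0, qA, qB, qAB = effects_from_sign_table(responses)
```

The A, B, and AB columns each sum to zero, which is what makes the corresponding effects contrasts; the I column is all ones and yields the mean.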

Allocation of Variation
• A.k.a. which factors and alternatives matter!
• Importance of a factor: proportion of variation explained by that factor
• Note: variation is not variance!

Allocation of Variation: $2^2$ Design
• Fractions explained by B and by the interaction between A & B can be obtained similarly
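For reference, the standard decomposition for a $2^2$ design can be written as below. The symbols SST, SSA, SSB, and SSAB are the conventional names for the total and per-effect sums of squares and are assumed here, since the slide's own equations did not survive extraction.

```latex
\begin{align*}
\mathrm{SST} &= \sum_{i=1}^{2^2} (y_i - \bar{y})^2
              = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2
              = \mathrm{SSA} + \mathrm{SSB} + \mathrm{SSAB} \\[4pt]
\text{Fraction of variation explained by } A
             &= \frac{\mathrm{SSA}}{\mathrm{SST}}
              = \frac{2^2 q_A^2}{\mathrm{SST}} \\[4pt]
\text{Sample variance of } y &= \frac{\mathrm{SST}}{2^2 - 1}
\end{align*}
```

The last line shows why variation is not variance: the variation SST is the sum of squared deviations, while the variance divides it by the degrees of freedom.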

Allocation of Variation: Example $2^2$ Design
• Note: A is the most important factor!
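A small sketch of the allocation computation, assuming each effect $q$ contributes $2^k q^2$ to the total variation (the standard result for $2^k$ designs without replication). The coefficients are those of the memory (A) and cache (B) example; the function name is hypothetical.

```python
def allocate_variation(effects, k):
    """Return the percentage of total variation explained by each effect.

    `effects` maps a label (e.g. "A", "AB") to its coefficient q; the mean q0
    must not be included. Each term contributes 2^k * q^2 to SST.
    """
    n = 2 ** k
    sums_of_squares = {name: n * q * q for name, q in effects.items()}
    sst = sum(sums_of_squares.values())
    if sst == 0:
        raise ValueError("no variation to allocate")
    return {name: 100.0 * ss / sst for name, ss in sums_of_squares.items()}

# Coefficients of the memory (A) / cache (B) example from the earlier slides.
percentages = allocate_variation({"A": 20, "B": 10, "AB": 5}, k=2)
```

Rounded to two decimals, the result agrees with the percentages shown in the get-effects.pl example output later in the deck.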

General $2^k$ Factorial Designs: Example

$2^k$ Factorial Design Example

Allocation of Variation
• Factor C (i.e., number of processors) is the most important (accounts for 71% of performance variation!)

Exercise

Experiment Designer Package
• Run a $2^k$ full factorial or $2^{k-p}$ fractional factorial design
• Executable: edesign -s <configuration_file>
• Experiment results are in designer.res
[Figure: Configuration File and Experiment Script → edesign → designer.res]

Example Configuration File
  ScriptFile my_script.tcl
  LogOn y
  Designer Factorial P 0
  [parameters]
  #name min max
  x1 -2 2
  x2 -2 2
  [/parameters]

Example Configuration File (cont’d)
  Farmer localhost:6666:farmer:any
  Worker localhost:worker
  [deployer]
  ./get_effects.pl designer.res
  [/deployer]
[Figure: edesign, farmer, and workers exchanging experiments (E) and results (R)]

Example Experiment Script
  #SETPARAMETERS
  return_result [expr $x1*$x1+$x2*$x2]
[Figure: edesign generates a set of parameter values, the original experiment script runs on the farmer-worker system, and the experiment result is returned]

Example Output (designer.res)
• Format of designer.res: 1st parameter, 2nd parameter, …, metric
• Example:
    -1, -1, -0.999
    1, -1, -1, 1.001
    -1, 1, -1, 1.001
• get-effects.pl will process this file and output the effect and variation percentage of each parameter (interaction).
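As an illustration of the file format only, the sketch below parses lines of the form "1st parameter, ..., metric" and checks that they form a complete 2-level design. It is not the get-effects.pl script; the function names are hypothetical, the sample contents are made up rather than taken from the slide, and it assumes the parameter columns hold coded levels of -1 and 1.

```python
from itertools import product

def parse_designer_res(text):
    """Parse lines of 'p1, p2, ..., metric' into (levels, metric) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        *levels, metric = [field.strip() for field in line.split(",")]
        rows.append((tuple(int(v) for v in levels), float(metric)))
    return rows

def is_full_two_level_design(rows):
    """Check that the rows cover every -1/+1 combination exactly once."""
    if not rows:
        return False
    k = len(rows[0][0])
    seen = sorted(levels for levels, _ in rows)
    return seen == sorted(product((-1, 1), repeat=k))

# Hypothetical file contents (not the slide's example rows).
sample = """\
-1, -1, 15
1, -1, 45
-1, 1, 25
1, 1, 75
"""
rows = parse_designer_res(sample)
complete = is_full_two_level_design(rows)
```

Validating completeness before computing effects matters because the sign table method assumes every combination of levels appears exactly once.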

Example get-effects.pl Output
  The parameter index represents:
  1 x1
  2 x2
  Parameter(s)   Effect   Variation Percentage
  --------------------------------------------
  none           40
  1              20       76.19%
  2              10       19.05%
  12             5        4.76%
  --------------------------------------------

What Do You Need to Do?
• Finish the Tcl script
• Open a terminal window and run start_system.sh
• Open another terminal window and run: edesign -s ex_ff.conf
• Sit and wait to see the output of get-effects.pl
• If you have any problems, ask Yong or me.