Parallelization in Action with SAS Analytic Procedures
Robert Cohen, Senior Research Statistician, Linear Models R&D
Copyright © 2003, SAS Institute Inc. All rights reserved. SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
Your Rise and Shine Menu
§ Parallelization adds value to the IVC
  Marketing: I should have slept in
§ Multithreading to provide parallel execution
  Boring: I should have left when I had the chance
§ How do you measure scalability?
  Insulting: This guy thinks I’m a 10-year-old
§ Selected demonstrations
  Deceiving: The truth, but not the whole truth
IVC: Parallelization Adds Value
Parallel access to data; multithreaded procedures
§ Complete today’s analyses faster
§ Analyze tomorrow’s problems within today’s time constraints
The IVC in Action
[Figure: diagram with components labeled I, V, and C]
Changes You Have to Make in Your Legacy Code
TINSTAAFL
There are
Unthreaded GLM: 2 CPU Box
[Figure: thread view. Legend: Exited, Running, Waiting, I/O Blocked]
§ GLM runs in a single thread
§ GLM never blocks this thread
§ GLM work is NOT done in parallel
Unthreaded GLM: 2 CPU Box
[Figure: thread view with CPU utilization for CPU 1 and CPU 2. Legend: Exited, Running, Waiting, I/O Blocked]
Unthreaded GLM: 2 CPU Box
[Figure: thread view with combined CPU utilization. Legend: Exited, Running, Waiting, I/O Blocked]
Multithreaded GLM: 1 Active Thread, 2 CPU Box
[Figure: thread view with rows labeled GLM Thread, X’X, and Invert matrix. Legend: Exited, Running, Waiting, I/O Blocked]
§ Worker threads used for specific tasks
§ GLM thread blocks while a worker thread is active
§ GLM does not execute in parallel
Multithreaded GLM: 1 Active Thread, 2 CPU Box
[Figure: thread view with CPU utilization for CPU 1 and CPU 2. Legend: Exited, Running, Waiting, I/O Blocked]
Multithreaded GLM: 1 Active Thread, 2 CPU Box
[Figure: thread view with combined CPU utilization. Legend: Exited, Running, Waiting, I/O Blocked]
Multithreaded GLM: 2 Active Threads, 2 CPU Box
[Figure: thread view with rows labeled GLM Thread, X’X, and Invert matrix. Legend: Exited, Running, Waiting, I/O Blocked]
§ GLM thread spawns off worker threads
§ Two independent worker threads per task
§ Work is done in parallel
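The pattern on this slide (a controlling thread hands one task to two independent workers and waits for both) can be sketched in Python. This is only an illustration under stated assumptions, not how PROC GLM is implemented: the function names `partial_crossproduct` and `crossproduct` are hypothetical, and the row-chunking strategy for forming X’X is an assumption chosen because per-chunk cross-products can simply be added together. In CPython, pure-Python threads do not run CPU-bound work concurrently, so the sketch shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_crossproduct(rows):
    """Return X_c' X_c for one non-empty chunk of rows (equal-length lists)."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    for row in rows:
        for i in range(p):
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    return xtx


def crossproduct(rows, workers=2):
    """Form X'X by summing per-chunk cross-products built by worker threads.

    Hypothetical helper: `rows` must be non-empty and `workers` at least 1.
    """
    chunk_size = (len(rows) + workers - 1) // workers  # ceiling division
    chunks = [rows[k:k + chunk_size] for k in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The calling thread waits here until every worker has finished.
        partials = list(pool.map(partial_crossproduct, chunks))
    p = len(rows[0])
    return [[sum(part[i][j] for part in partials) for j in range(p)]
            for i in range(p)]
```

Calling `crossproduct(X, workers=2)` on a small design matrix returns the same matrix as a single-threaded pass over all rows, because the chunk results are independent and additive.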
Multithreaded GLM: 2 Active Threads, 2 CPU Box
[Figure: thread view with CPU utilization for CPU 1 and CPU 2. Legend: Exited, Running, Waiting, I/O Blocked]
Multithreaded GLM: 2 Active Threads, 2 CPU Box
[Figure: thread view with combined CPU utilization. Legend: Exited, Running, Waiting, I/O Blocked]
Multithreaded GLM: 4 Active Threads, 2 CPU Box
[Figure: thread view. Legend: Exited, Running, Waiting, I/O Blocked]
Threading Comparison
Multithreaded GLM: 2 CPU Box
[Figure: thread view. Legend: Exited, Running, Waiting, I/O Blocked]
Amdahl’s Law
Not Scalable: PF = 80%
CPUs   Speedup
   1   1.00
   2   1.67
   4   2.50
   8   3.33
  16   4.00
  32   4.44
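The speedups on this slide follow from the standard statement of Amdahl’s Law, speedup = 1 / ((1 - PF) + PF / N), where PF is the parallelizable fraction and N is the number of CPUs. A minimal Python sketch that recomputes the column for PF = 80% is shown here; the function name `amdahl_speedup` is illustrative and does not come from the slides.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Predicted speedup on `cpus` processors under Amdahl's Law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Recompute the slide's column for PF = 80%.
for cpus in (1, 2, 4, 8, 16, 32):
    print(f"{cpus:>4}   {amdahl_speedup(0.80, cpus):.2f}")
```

With 20% of the work serial, doubling from 16 to 32 CPUs raises the predicted speedup only from 4.00 to 4.44, which is why the slide labels this case not scalable.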
Amdahl’s Law
[Figure: Amdahl’s Law curves for parallelizable fractions of 100%, 99%, 95%, 90%, 80%, and 60%]
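One way to read these curves: as the CPU count grows without bound, the Amdahl speedup 1 / ((1 - PF) + PF / N) approaches 1 / (1 - PF), so each parallelizable fraction has a ceiling. The sketch below computes that ceiling for the six fractions named on the slide; `max_speedup` is a hypothetical helper name.

```python
import math


def max_speedup(parallel_fraction):
    """Limit of the Amdahl speedup as the number of CPUs goes to infinity."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return math.inf  # fully parallel work has no ceiling
    return 1.0 / serial_fraction


for pf in (1.00, 0.99, 0.95, 0.90, 0.80, 0.60):
    print(f"PF = {pf:.0%}: ceiling = {max_speedup(pf):.1f}")
```

The ceilings are unbounded, 100, 20, 10, 5, and 2.5 respectively, so even a 1% serial portion caps the speedup at 100 regardless of how many CPUs are available.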
Scalability in PROC REG: Wide Data and Scalar I/O
Test details: 50,000 observations; 500 predictors; stepwise selection; scalar I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=93%]
Scalability in PROC REG: Wide Data and Scalar I/O
Test details: 50,000 observations; 500 predictors; stepwise selection; scalar I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=93%; Achieved]
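The slides do not say how the PF value for the Amdahl reference curve was chosen. One plausible approach, offered here only as an assumption, is to invert Amdahl’s Law: given a measured speedup S on N CPUs, the implied parallelizable fraction is PF = (1 - 1/S) / (1 - 1/N). The helper name and the sample speedup below are hypothetical and are not measurements from the presentation.

```python
def implied_parallel_fraction(speedup, cpus):
    """Parallelizable fraction that Amdahl's Law needs to explain `speedup`."""
    if cpus < 2:
        raise ValueError("need at least 2 CPUs to infer a parallel fraction")
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cpus)


# Made-up example value, not taken from the slides.
hypothetical_speedup = 5.37
print(f"implied PF = {implied_parallel_fraction(hypothetical_speedup, 8):.3f}")
```

A result above 1 means the measured speedup exceeded the CPU count, which Amdahl’s Law cannot represent.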
Scalability in PROC REG: Narrow Data, Parallel I/O
Test details: 4 million observations; 20 predictors; parallel I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=99.9%]
Scalability in PROC REG: Narrow Data, Parallel I/O
Test details: 4 million observations; 20 predictors; parallel I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=99.9%; Achieved]
Scalability in PROC DMREG
Test details: 500,000 observations; predictors: 50 continuous, 15 classification; logistic model; parallel I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=93%]
Scalability in PROC DMREG
Test details: 500,000 observations; predictors: 50 continuous, 15 classification; logistic model; parallel I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=93%; Achieved]
Baseline Speedup and Scalability in PROC DMREG
Test details: 500,000 observations; predictors: 50 continuous, 15 classification; logistic model; parallel I/O
[Figure: speedup curves. Series: Linear; Amdahl, PF=93%; Achieved; V9/V8***]
Scalability in PROC GLM
Test details: 6000 observations; 4 classification variables; 2000 parameters
[Figure: speedup curves. Series: Linear; Amdahl, PF=98%]
Scalability in PROC GLM
Superlinear Scalability!
Test details: 6000 observations; 4 classification variables; 2000 parameters
[Figure: speedup curves. Series: Linear; Amdahl, PF=98%; Achieved]
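Superlinear here means the achieved speedup exceeds the CPU count, that is, parallel efficiency (speedup divided by CPUs) is greater than 1. The slide does not state the cause; a commonly cited one in general is that each CPU works on a smaller slice of data that fits better in cache. The sketch below shows how such a check could be made from timings. The function names and the timings in the usage line are hypothetical, not measurements from the presentation.

```python
def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-CPU run time to multi-CPU run time."""
    if serial_seconds <= 0.0 or parallel_seconds <= 0.0:
        raise ValueError("run times must be positive")
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds, parallel_seconds, cpus):
    """Speedup per CPU; 1.0 corresponds to the Linear reference line."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return speedup(serial_seconds, parallel_seconds) / cpus


def is_superlinear(serial_seconds, parallel_seconds, cpus):
    """True when the achieved speedup is larger than the number of CPUs."""
    return parallel_efficiency(serial_seconds, parallel_seconds, cpus) > 1.0


# Made-up timings: 100 s on one CPU, 20 s on four CPUs.
print(is_superlinear(100.0, 20.0, 4))
```

Because Amdahl’s Law never predicts more than N-fold speedup on N CPUs, a superlinear result lies above both the Amdahl and the Linear reference curves.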
Scalability in PROC LOESS
Test details: 4000 observations; 18 models evaluated; confidence limits for selected model
[Figure: speedup curves. Series: Linear; Amdahl, PF=95%]
Scalability in PROC LOESS
Test details: 4000 observations; 18 models evaluated; confidence limits for selected model
[Figure: speedup curves. Series: Linear; Amdahl, PF=95%; Achieved]
Scalability in PROC LOESS
Test details: 4000 observations; 1 model specified; confidence limits for specified model
[Figure: speedup curves. Series: Linear; Amdahl, PF=99%]
Scalability in PROC LOESS
Test details: 4000 observations; 1 model specified; confidence limits for specified model
[Figure: speedup curves. Series: Linear; Amdahl, PF=99%; Achieved]
Partially Multithreaded Procedures
§ Base SAS
  • PROC SORT
  • PROC SUMMARY
  • SQL (Group by, Order by)
§ Enterprise Miner
  • PROC DMDB
  • PROC DMREG
  • PROC DMINE
§ SAS/STAT
  • PROC GLM
  • PROC LOESS
  • PROC REG
  • PROC ROBUSTREG
NOTE: Not all usages of these procedures are scalable. Your mileage may vary!
Reading Between the Lines
§ Parallelization adds value to the IVC
  Analyze bigger volumes of data
§ Multithreading to provide parallel execution
  Not as boring as I feared
§ How do you measure scalability?
  Predicting scalability is a subtle task
§ Selected demonstrations
  Some of my jobs will run faster in SAS 9
Questions and hopefully answers