Workload Characterization and Performance Assessment of Yellowstone using XDMoD and Exploratory Data Analysis (EDA)
Ying Yang, SUNY, University at Buffalo
Mentor: Tom Engel, NCAR
Co-Mentors: Shawn Strande, Dave Hart, NCAR
1 August 2014

Big Picture
• Background
• XDMoD and Yellowstone Job Data
• Enhancement of XDMoD for Yellowstone
• Additional Analyses of Yellowstone Job Data
• Summary & Future Work

Background
• What is XDMoD?
Open XDMoD is an open source tool designed to audit and facilitate the utilization of supercomputers by providing a wide range of metrics on resources, including resource utilization, resource performance, and impact on scholarship and research.
XDMoD is an acronym for "XSEDE Metrics on Demand". It was developed by the University at Buffalo for NSF's XSEDE under NSF grant OCI 1025159.

Background
• XDMoD Architecture Details

XDMoD and Yellowstone Job Data
• XDMoD runs on a dedicated server at NWSC; the software was installed and configured by the SSG group.
• Collaborated with CISL and SUNY at Buffalo developers to test a new shredder for ingesting LSF job termination accounting records.
• Shredded and ingested all of the LSF accounting data from Yellowstone, Geyser, and Caldera (November 2012 to the present) into Open XDMoD. In total, 7,111,011 job records were shredded and 6,810,231 jobs were ingested.
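As a quick check on the two counts above, the short Python sketch below works out how many shredded records did not make it through ingestion and what fraction did. Only the two totals come from the slide; the variable names are illustrative, and the slide does not say why some records were not ingested.

```python
# Counts quoted on the slide.
shredded = 7_111_011  # job records shredded
ingested = 6_810_231  # jobs ingested

# Records that were shredded but not ingested, and the ingested fraction.
not_ingested = shredded - ingested
ingest_rate = ingested / shredded

print(f"not ingested: {not_ingested}")
print(f"ingest rate:  {ingest_rate:.2%}")
```

Roughly 300 thousand records, a little over 4% of the total, were shredded but not ingested.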

XDMoD and Yellowstone Job Data
[Figure: data flow diagram. Labels: LSF; Yellowstone Shredded Data; SuperMoD REST Service API; Yellowstone Ingested Data]

XDMoD and Yellowstone Job Data
• XDMoD’s Summary tab

XDMoD and Yellowstone Job Data
• XDMoD’s Metric Explorer (CPU time group by user)

Enhancement of XDMoD for Yellowstone
• Two new metrics
(1) Job Size: Weighted By Core Hours (Core Count): The average NCAR job size weighted by core hours. Defined as:
sum(i = 0 to n){job i core count * job i core hours consumed} / sum(i = 0 to n){job i core hours consumed}
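The definition above can be sketched in a few lines of Python. This is an illustration of the formula only, not the XDMoD implementation; the (core_count, core_hours) pair layout, the function names, and the sample jobs are all assumptions made for the example.

```python
def weighted_job_size(jobs):
    """Average core count weighted by core hours consumed.

    `jobs` is an iterable of (core_count, core_hours) pairs. This layout
    is an assumption for illustration, not XDMoD's schema.
    """
    jobs = list(jobs)
    total_hours = sum(hours for _, hours in jobs)
    if total_hours <= 0:
        raise ValueError("total core hours must be positive")
    return sum(cores * hours for cores, hours in jobs) / total_hours


def mean_job_size(jobs):
    """Plain (unweighted) average core count, for comparison."""
    jobs = list(jobs)
    return sum(cores for cores, _ in jobs) / len(jobs)


# Hypothetical jobs: a small job using 1 core hour, a large one using 9.
example_jobs = [(10, 1.0), (100, 9.0)]
print(mean_job_size(example_jobs))      # unweighted average core count
print(weighted_job_size(example_jobs))  # weighted toward the large job
```

Weighting by core hours pulls the average toward the jobs that consume most of the machine, so a few large jobs are not hidden by many small ones.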

Enhancement of XDMoD for Yellowstone
• XDMoD’s Average Job Size

Enhancement of XDMoD for Yellowstone
• Sophie’s Job Size Weighted By Core Hours (Core Count)

Enhancement of XDMoD for Yellowstone
• Two new metrics
(2) Yellowstone %Scheduled: The percentage of resources scheduled to be utilized by jobs running on Yellowstone.
Scheduled Utilization: The ratio of the total CPU hours scheduled to Yellowstone jobs over a given time period to the total CPU hours that the system could have potentially provided during that period.
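A minimal Python sketch of the ratio described above follows. It assumes that the CPU hours the system could have provided equal the core count times the length of the period, which ignores outages and reservations. The function name, parameters, and numbers are hypothetical and do not describe Yellowstone or the XDMoD code.

```python
def percent_scheduled(scheduled_cpu_hours, total_cores, period_hours):
    """Scheduled CPU hours as a percentage of potential CPU hours.

    Potential capacity is assumed to be total_cores * period_hours.
    """
    capacity = total_cores * period_hours
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * scheduled_cpu_hours / capacity


# Hypothetical system: 1000 cores over one day, 18000 CPU hours scheduled.
print(percent_scheduled(18_000, total_cores=1_000, period_hours=24))
```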

Enhancement of XDMoD for Yellowstone
• Yellowstone %Scheduled (by job size)
Many 144-node jobs (only 1 core per node) are running.

Additional Analyses of Yellowstone Job Data
Exploratory data analysis with ingested data using R
➢ Question: What is the average job size and how has it varied over time?
➢ Methods:
• Forecasting Using Exponential Smoothing
• Forecasting Using ARIMA Model
• Multiple Linear Regression
• K-Nearest Neighbor
➢ Experiments and Results

Additional Analyses of Yellowstone Job Data
➢ Methods: Exponential Smoothing
a) Simple Exponential Smoothing: an additive model with constant level and no seasonality
b) Holt’s Exponential Smoothing: an additive model with increasing or decreasing trend and no seasonality
c) Holt-Winters Exponential Smoothing: an additive model with increasing or decreasing trend and seasonality
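To make the first two models concrete, here is a small Python sketch of the standard update rules for (a) simple and (b) Holt's exponential smoothing. The talk's analysis was done in R, so this is a stand-in and not the code that was used. The smoothing parameters alpha and beta are fixed by hand here, the initialization (first value as level, first difference as trend) is one common choice among several, and the seasonal Holt-Winters variant (c) is omitted.

```python
def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level with the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def holt_forecast(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # initial trend: first difference
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


# Hypothetical series with a steady upward trend.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(simple_exp_smoothing(data, alpha=0.5))     # lags behind the trend
print(holt_forecast(data, alpha=0.5, beta=0.5))  # follows the trend
```

On a trending series the simple model lags because it has no trend term, which is the gap Holt's method closes.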

Additional Analyses of Yellowstone Job Data
➢ Methods: ARIMA Model
Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical model for the irregular component of a time series, which allows for non-zero autocorrelations in the irregular component.
Building the Model:
Step 1: Differencing a Time Series (diff() function)
Step 2: Selecting a Candidate ARIMA Model (acf(), pacf() functions)
Step 3: Forecasting Using an ARIMA Model
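Steps 1 and 2 can be illustrated with a short Python sketch of differencing and the sample autocorrelation function. These are hand-written stand-ins for what R's diff() and acf() compute, not the code from the talk; the function names and the toy series are assumptions, and the partial autocorrelation and the model fitting in Step 3 are left out.

```python
def difference(series, lag=1):
    """Lagged differences: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def autocorrelation(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical series with a growing trend; differencing removes the trend.
raw = [1, 3, 6, 10]
print(difference(raw))
print(autocorrelation([1, 2, 3, 4], max_lag=1))
```

Differencing is repeated until the series looks stationary, and the pattern of significant lags in the autocorrelations then suggests candidate model orders.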

Additional Analyses of Yellowstone Job Data
➢ Experiments
• Naive method
• Mean method
• Drift method (week and month)
• Simple Exponential Smoothing (SES)
• Holt’s Exponential Smoothing (HES)
• Holt-Winters Exponential Smoothing (HWES)
• ARIMA Model
• Multiple Linear Regression
• K-Nearest Neighbor
Descriptions:
• Data: 2013 data, 364 days in total. Days 1-308 are training data; days 309-364 are testing data.
• Prediction error: the difference between the predicted value and the true value, expressed as a percentage of the true value.
• The Naive, Mean, and Drift methods serve as baselines for performance comparison.
• ES forecasts use all days before the day being predicted (e.g., days 1-100 predict day 101, days 1-101 predict day 102, ...).
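The baseline methods and the expanding-window evaluation described above can be sketched in Python as follows. Several details are assumptions, since the slides do not specify them: the error is taken as an absolute percentage, the drift slope is computed over the whole history (the week and month variants are not reproduced), and the toy series is invented. With the real data, the first test index would be 308 (0-based) for days 309-364.

```python
def naive_forecast(history):
    """Predict the last observed value."""
    return history[-1]


def mean_forecast(history):
    """Predict the mean of all observed values."""
    return sum(history) / len(history)


def drift_forecast(history, horizon=1):
    """Extend the line through the first and last observations."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + horizon * slope


def percent_error(predicted, actual):
    """Absolute difference as a percentage of the true value."""
    if actual == 0:
        raise ValueError("true value must be non-zero")
    return 100.0 * abs(predicted - actual) / abs(actual)


def rolling_mean_error(series, first_test_index, forecaster):
    """Predict each test point from all earlier points; average the errors."""
    errors = [
        percent_error(forecaster(series[:t]), series[t])
        for t in range(first_test_index, len(series))
    ]
    return sum(errors) / len(errors)


# Hypothetical daily values; the last two points are the test set.
toy = [10, 12, 14, 16, 18, 20]
for method in (naive_forecast, mean_forecast, drift_forecast):
    print(method.__name__, rolling_mean_error(toy, 4, method))
```

Any forecaster that maps a history to a one-step prediction can be passed in, so the smoothing and ARIMA models can be scored with the same loop.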

Additional Analyses of Yellowstone Job Data
➢ Experiment Results

Summary & Future Work
Summary:
• Ingested all Yellowstone accounting data into XDMoD (November 2012-Present)
• Developed two new metrics for Yellowstone and contributed them back to the open source project
• Performed exploratory data analysis using R
Future Work:
• Enhancement of XDMoD
• Further data analysis on Yellowstone data
• Integrate EDA into XDMoD

Acknowledgements
HSS and USS: Tom Engel (Mentor), Shawn Strande (Co-Mentor), Dave Hart (Co-Mentor), Davide Del Vento, Pamela Gillman, Erich Thanhardt, Irfan Elahi
IMAGe: Doug Nychka
HSS, USS, CSG, DASG, MSSG, SCSG, IMAGe

Ying Yang
yyang25@buffalo.edu