Comparing machine learning methods for dynamically identifying future water supply vulnerabilities in the northern California reservoir system
Bethany Robinson*¹, Jon Herman¹, Jon Cohen¹
¹University of California, Davis, Civil & Environmental Engineering
CWEMF Annual Meeting, April 23, 2019

Uncertainty in water resources systems
50-year moving average of inflow to Folsom Reservoir, CA [Data sources: CDEC, USBR CMIP5 simulations]
CWEMF 2019 | Bethany Robinson (bjrobins@ucdavis.edu)

Previous Research – Testing Thresholds
From Robinson and Herman, 2018

Water Systems
Streamflow, Snowpack, Precipitation, Air Temperature, Reservoir Storage → Black Box → Water Supply Reliability in 10 years

Research Questions
• Can we use hydrologic variables with machine learning techniques to predict reliability of the system? How accurate is this prediction?
• What are the most important hydrologic variables for predicting the northern California reservoir system reliability? Why?
• Can we simplify this analysis (retaining accuracy) so that it can be widely used by water planners?

ORCA System Overview
• Simulates Shasta, Oroville, and Folsom Reservoirs
• SWP and CVP pumping from Delta
• Relies heavily on snowpack-to-streamflow forecasting
• Input data include streamflow, snowpack, precipitation, and temperature
Figures from Jon Cohen

How do we use the ORCA model with ML methods?

Experiment Design
• Reliability is expressed as a 30-yr rolling average
• Each of the inputs is expressed as multiple rolling averages (AVG) and multiple rolling standard deviations (SD):
  • 10-, 20-, 30-, 40-, and 50-year rolling windows
  • This means that each input is now 10 different features instead of one
• Regression: the 30-yr reliability will be predicted for lead times of 0, 1, 5, 10, and 20 years
• Classification: the 30-yr reliability will be classified as below (positive) or above (negative) a specified threshold for lead times of 0, 1, 5, 10, and 20 years
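The feature construction described on this slide can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `rolling_features` and `pair_with_lead`, the use of the sample standard deviation, and the synthetic input series are all assumptions made for the example.

```python
from statistics import mean, stdev

WINDOWS = (10, 20, 30, 40, 50)  # rolling-window lengths in years, from the slide


def rolling_features(series, windows=WINDOWS):
    """Return one dict of AVG/SD features per year that has a full longest window."""
    longest = max(windows)
    rows = []
    for end in range(longest, len(series) + 1):
        row = {}
        for w in windows:
            window = series[end - w:end]  # the w most recent annual values
            row[f"{w}yr_AVG"] = mean(window)
            row[f"{w}yr_SD"] = stdev(window)  # assumption: sample standard deviation
        rows.append(row)
    return rows


def pair_with_lead(features, reliability, lead):
    """Pair features at step t with reliability at step t + lead.

    Assumes `reliability` is already aligned with `features`
    (same index = same year).
    """
    n = min(len(features), len(reliability)) - lead
    return [(features[t], reliability[t + lead]) for t in range(n)]


# Synthetic annual series, for illustration only (not real hydrologic data).
annual_inflow = [float(100 + (i * 7) % 13) for i in range(60)]
feats = rolling_features(annual_inflow)
reliability = [0.9 - 0.01 * i for i in range(len(feats))]
pairs = pair_with_lead(feats, reliability, lead=5)
print(len(feats), len(feats[0]), len(pairs))
```

Each input series yields 5 windows × 2 statistics = 10 features, matching the count on the slide. A lead time of 0 pairs each feature row with the reliability of the same year.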

Machine Learning Methods
Classification Methods
• K Nearest Neighbors
• Logistic Regression
• Linear SVM
• Gaussian Process Classifier
• Decision Tree Classifier
• Random Forest Classifier
• MLP Classifier
• AdaBoost Classifier
• Gaussian Naïve Bayes
• Quadratic Discriminant Analysis
Regression Methods
• Linear Regression
• Polynomial Regression (2nd Degree)
• SVM Regression (linear)
• SVM Regression (2nd Degree)
• SVM Regression (3rd Degree)
• Decision Tree Regression
• Random Forest Regression
More information at: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model
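For readers who want to try the same families of methods, the lists above map onto scikit-learn estimators roughly as sketched here. The mapping is an assumption: the slides do not give hyperparameters or exact classes, so default settings are used throughout, "Linear SVM" is taken to be `SVC(kernel="linear")`, and the polynomial regression is built as a `PolynomialFeatures` plus `LinearRegression` pipeline.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              RandomForestRegressor)
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Default hyperparameters everywhere; the settings used in the study are not stated.
classifiers = {
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(),
    "Linear SVM": SVC(kernel="linear"),  # assumption: could also be LinearSVC
    "Gaussian Process Classifier": GaussianProcessClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
    "Random Forest Classifier": RandomForestClassifier(),
    "MLP Classifier": MLPClassifier(),
    "AdaBoost Classifier": AdaBoostClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Quadratic Discriminant Analysis": QuadraticDiscriminantAnalysis(),
}

regressors = {
    "Linear Regression": LinearRegression(),
    "Polynomial Regression (2nd Degree)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()),
    "SVM Regression (linear)": SVR(kernel="linear"),
    "SVM Regression (2nd Degree)": SVR(kernel="poly", degree=2),
    "SVM Regression (3rd Degree)": SVR(kernel="poly", degree=3),
    "Decision Tree Regression": DecisionTreeRegressor(),
    "Random Forest Regression": RandomForestRegressor(),
}
```

Keeping the estimators in dictionaries lets the same fit-and-score loop run over every method and lead time.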

Training vs. Testing Sets
The Training Set
• All of the data is used to train the ML method
• All of the data (the same data) is used to predict reliability
• Scores on the training set should be better than scores on the test set
The Test Set
• 70% of the data is used to train the ML method
• 30% of the data (different data) is used to predict reliability
• Scores on the test set should better represent how the method would perform in the real world
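The two evaluation modes can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: the data are synthetic, the model is a one-feature least-squares line rather than any of the methods in the study, and the 70/30 split is random (the slides do not say whether the split was random or chronological).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def split_70_30(xs, ys, seed=0):
    """Shuffle indices and return (x_train, y_train, x_test, y_test)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = (7 * len(idx)) // 10
    train, test = idx[:cut], idx[cut:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


# Synthetic data: a noisy linear relation (illustrative only).
rng = random.Random(1)
xs = [float(i) for i in range(100)]
ys = [0.5 * x + rng.gauss(0, 5) for x in xs]

# "Training set" score: fit on all data, predict the same data.
a, b = fit_line(xs, ys)
r2_train = r_squared(ys, [a + b * x for x in xs])

# "Test set" score: fit on 70% of the data, predict the held-out 30%.
x_tr, y_tr, x_te, y_te = split_70_30(xs, ys)
a, b = fit_line(x_tr, y_tr)
r2_test = r_squared(y_te, [a + b * x for x in x_te])

print(f"R^2 on training data: {r2_train:.3f}")
print(f"R^2 on held-out data: {r2_test:.3f}")
```

For a simple model like this the two scores are usually close. The gap tends to grow for flexible methods such as decision trees, which is why the held-out score is the more honest estimate.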

PRELIMINARY RESULTS

Regression Results – Predicting in Advance

Reducing Features

Regression Results – Reducing Features

Classification Results

Classification Results – Thresholds

Classification Results – Reducing Features

Classification Results – Reducing Features (continued)

The Most Important Features
Lead times (columns): 0-yr, 1-yr, 5-yr, 10-yr, 20-yr
Rank 1: 30-yr AVG of Shasta Storage; 30-yr AVG of Oroville Storage; 20-yr AVG of Shasta Storage; 10-yr AVG of Folsom Storage
Rank 2: 30-yr AVG of Oroville Storage; 30-yr AVG of Shasta Storage; 20-yr AVG of Folsom Storage; 10-yr AVG of BKL snow-water-eq
Rank 3: 30-yr AVG of Folsom Storage; 20-yr AVG of Oroville Storage; 20-yr AVG of Shasta Storage
Rank 4: 20-yr AVG of Folsom Storage; 30-yr AVG of Folsom Storage; 20-yr SD of Shasta Storage; 10-yr AVG of Shasta Storage
Rank 5: 30-yr SD of Shasta Storage; 20-yr AVG of Shasta Storage; 10-yr AVG of Folsom Storage; 20-yr AVG of Folsom Storage
The reliability value can be predicted well using only the last 30 years of reservoir storage from Shasta, Oroville, and Folsom

The Most Important Features (change in reliability)
Lead times (columns): 1-yr, 5-yr, 10-yr, 20-yr
Rank 1: 50-yr SD of Shasta min air temp
Rank 2: 30-yr SD of Oroville Storage; 30-yr SD of Delta X2; 50-yr SD of Folsom min air temp
Rank 3: 10-yr SD of Folsom Storage; 20-yr SD of Delta X2; 10-yr AVG of Shasta Storage; 10-yr AVG of Shasta Storage
Rank 4: 50-yr SD of Folsom max air temp; 30-yr SD of Oroville Storage; 50-yr SD of Folsom Storage; 30-yr SD of Oroville Storage
Rank 5: 30-yr SD of Shasta Storage; 30-yr SD of Delta X2; 40-yr SD of Delta X2
Change in reliability value can be predicted well with only the past 50 years of minimum air temperature at Shasta

Conclusions (so far)
Can we use hydrologic variables with machine learning techniques to predict reliability of the system? How accurate is this prediction?
• Yes!
• Regression R-squared values are between ~0.75 and ~0.98 depending on the lead time and the machine learning method used
• Classification True Positive and True Negative Ratios are all above 0.8 for some thresholds
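The classification scores quoted above can be computed as in the sketch below, which follows the deck's convention that reliability below the threshold is the positive class. The function names, the threshold of 0.85, and the reliability values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classify_below(reliability, threshold):
    """Label each value True (positive) when reliability is below the threshold."""
    return [r < threshold for r in reliability]


def tpr_tnr(actual, predicted):
    """Return (true positive ratio, true negative ratio) for boolean labels."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Hypothetical simulated and predicted reliability values.
threshold = 0.85
simulated = [0.95, 0.80, 0.70, 0.92, 0.60, 0.88]
predicted = [0.93, 0.78, 0.86, 0.91, 0.65, 0.80]

tpr, tnr = tpr_tnr(classify_below(simulated, threshold),
                   classify_below(predicted, threshold))
print(f"TPR = {tpr:.2f}, TNR = {tnr:.2f}")
```

In this toy case two of three vulnerable years and two of three non-vulnerable years are identified correctly, so both ratios are 2/3.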

Conclusions (so far)
What are the most important hydrologic variables for predicting the northern California reservoir system reliability? Why?
• For predicting reliability value: the average storage of the three reservoirs in the system
• For predicting the change in the reliability value: the minimum air temperature at Shasta
• The reasons for this are still being investigated

Conclusions (so far)
Can we simplify this analysis (retaining accuracy) so that it can be widely used by water planners?
• Yes!
• Some machine learning methods are almost as accurate with only one feature as they are with all of them
• This could mean that there are very few factors driving overall reliability of this system
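One simple way to explore how far a single feature can go is to score each feature on its own and rank the results. The slides do not say how feature importance was measured, so the sketch below substitutes a plain technique: the R-squared of a one-feature least-squares line, which equals the squared Pearson correlation. The feature names and values are hypothetical.

```python
from statistics import mean


def r2_single_feature(xs, ys):
    """R-squared of a one-feature least-squares line (squared Pearson r)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def rank_features(features, target):
    """Return (name, R-squared) pairs sorted from best to worst single feature."""
    scores = {name: r2_single_feature(vals, target)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical feature columns and target, for illustration only.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "storage_30yr_AVG": [2.0, 4.0, 6.0, 8.0, 10.0],
    "snow_10yr_AVG": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for name, score in rank_features(features, target):
    print(f"{name}: R^2 = {score:.2f}")
```

A ranking like this only captures linear, one-at-a-time relationships. Comparing the best single-feature score against the all-features score gives a rough sense of how much accuracy is lost by simplifying.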

Future/Continuing Work
• Expand the classification and regression methods used
• Create a temporal analysis showing how accurate the predictions are over the 2000–2100 time period
• Select the “best” machine learning method
• Integrate the chosen ML method into ORCA to make adaptation decisions during simulations
• Compare simulation performances with and without adaptations

Thank you!
References:
Robinson, B., and Herman, J. (2018). A framework for testing dynamic classification of vulnerable scenarios in ensemble water supply projections. Climatic Change, 1-18.
Brekke, L., Wood, A., and Pruitt, T. (2014). Downscaled CMIP3 and CMIP5 Hydrology Climate Projections: Release of Hydrology Projections, Comparison with Preceding Information, and Summary of User Needs. US Bureau of Reclamation. Available at: https://gdo-dcp.ucllnl.org/downscaled_cmip_projections/techmemo/BCSD5HydrologyMemo.pdf
Haasnoot, M., van’t Klooster, S., and van Alphen, J. (2018). Designing a monitoring system to detect signals to adapt to uncertain climate change. Global Environmental Change, 52, 273-285.
https://watershed.ucdavis.edu/files/DroughtReport_20160812.pdf
