Principal Components Regression
Simon Mason
simon@iri.columbia.edu
Seasonal Forecasting Using the Climate Predictability Tool

Linear Regression in CPT
In CPT, linear regression is performed using the MLR (multiple linear regression) option, which allows for more than one predictor. But what happens when we have lots of predictors (k is large)?

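A regression with k predictors is usually written y = b0 + b1 x1 + ... + bk xk. The sketch below fits that form by ordinary least squares with NumPy. The function name `fit_mlr` and the synthetic data are assumptions made for illustration; this is not CPT's own code.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X has shape (n_years, k); y has shape (n_years,).
    Returns the intercept b0 and the array of k slopes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic illustration (not real data): 40 "years", k = 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.25])  # noise-free, so the fit is exact
b0, b = fit_mlr(X, y)
print(b0, b)
```

With noise-free data the fit recovers the coefficients used to build y. With real data, every extra predictor adds one more coefficient to be estimated from the same number of years.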
Problems with Multiple Linear Regression (MLR)
• Multicollinearity - Predictors are strongly correlated.
Predicting MAM 1961–2010 rainfall for Thailand from NIÑO 4 SSTs: the correlation between NIÑO 4 Jan and NIÑO 4 Feb is 0.97.
For the first half of the data (1961–1985) only:

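The effect of strongly correlated predictors can be sketched with synthetic data (not the Thailand rainfall or NIÑO 4 series). The example builds two predictors with a correlation near 0.97, fits the regression to the full sample and to each half, and prints the coefficients so their sensitivity to the sample can be inspected. It also computes the standard two-predictor variance inflation factor 1/(1 - r^2). All names and data here are illustrative assumptions.

```python
import numpy as np

def variance_inflation(r):
    """Factor by which the sampling variance of each coefficient grows
    when two predictors have correlation r (two-predictor case)."""
    return 1.0 / (1.0 - r ** 2)

def fit(X, y):
    """Least-squares coefficients [b0, b1, b2] for y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.25 * rng.normal(size=n)  # nearly a copy of x1
y = x1 + rng.normal(size=n)          # only x1 truly matters
X = np.column_stack([x1, x2])

r = np.corrcoef(x1, x2)[0, 1]
coef_all = fit(X, y)
coef_first = fit(X[:25], y[:25])     # first half of the sample
coef_second = fit(X[25:], y[25:])    # second half of the sample

print("correlation between predictors:", round(r, 2))
print("variance inflation for r = 0.97:", round(variance_inflation(0.97), 1))
print("all years  :", coef_all)
print("first half :", coef_first)
print("second half:", coef_second)
```

For r = 0.97 the inflation factor is about 17, so the coefficient estimates are far noisier than they would be with uncorrelated predictors.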
Problems with Multiple Linear Regression
• Multiplicity - Too many predictors from which to choose.

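The multiplicity problem can be illustrated with pure noise. The sketch below correlates a random predictand with a pool of random, unrelated candidate predictors and reports the largest absolute correlation found. The function name `best_chance_correlation` and the sample sizes are assumptions chosen for illustration.

```python
import numpy as np

def best_chance_correlation(n_years, n_predictors, seed=0):
    """Largest |correlation| between a random predictand and a set of
    random predictors that have no real relationship with it."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n_years)
    X = rng.normal(size=(n_years, n_predictors))
    y_std = (y - y.mean()) / y.std()
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    r = X_std.T @ y_std / n_years  # Pearson correlation for each predictor
    return np.abs(r).max()

best_1 = best_chance_correlation(30, 1)
best_200 = best_chance_correlation(30, 200)
print("best of 1 candidate   :", round(best_1, 2))
print("best of 200 candidates:", round(best_200, 2))
```

None of the candidates has any real skill, yet the best of a large pool will typically look like a useful predictor in a short record.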
Exercise
• Using the NINO indices, how well can we predict rainfall over Thailand at increasing lead-times?
• Create a file combining 2 or more lead-times as separate predictors. Repeat the calculations using this new file. Does the skill improve? Compare the regression equation for the three predictors with the equations for the three months individually.
• Now try calculating a seasonal average of the predictors for lead-times of interest. Which gives the best results: one-month predictors, predictors for multiple months, or seasonal predictors?

Principal Components
A principal component is defined much like a weighted average of the original data. If the "weights" summed to 1.0, the principal component would be a true weighted average. Instead, the squares of the weights are made to sum to 1.0, so that the variance of the original data is retained.

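The two properties described above (squared weights summing to 1.0, and total variance being retained) can be checked numerically. This is a minimal sketch using an eigendecomposition of the covariance matrix of synthetic data; the function name `principal_components` is an assumption, and CPT's internal calculation may differ in detail.

```python
import numpy as np

def principal_components(X):
    """Return (weights, scores, variances) for data X of shape
    (n_years, n_variables), ordered from largest to smallest variance."""
    X = np.asarray(X, dtype=float)
    anomalies = X - X.mean(axis=0)
    cov = np.cov(anomalies, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    weights = eigvecs[:, order]             # one column of weights per component
    scores = anomalies @ weights            # weighted sums of the original data
    return weights, scores, eigvals[order]

# Synthetic, mutually correlated variables (not real SSTs).
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))
weights, scores, variances = principal_components(X)

print("sum of squared weights per component:", (weights ** 2).sum(axis=0))
print("total variance of original data:", X.var(axis=0, ddof=1).sum())
print("total variance of the components:", scores.var(axis=0, ddof=1).sum())
```

The weights of each component are the entries of a unit-length eigenvector, which is why their squares sum to 1.0 and why the components together carry exactly the variance of the original variables.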
Principal Components Regression
Instead of using the original data as predictors, we can use the principal components as predictors in the same simple regression model. The PCR option uses principal components, each of which contains information from many of the original predictors, and so a complex MLR model can be simplified considerably.

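A minimal sketch of principal components regression, assuming the components are obtained from a singular value decomposition of the predictor anomalies and only the leading `n_modes` are kept. The names `fit_pcr` and `predict_pcr` and the synthetic data are hypothetical, and this is not CPT's implementation.

```python
import numpy as np

def fit_pcr(X, y, n_modes):
    """Regress y on the scores of the leading n_modes principal components of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    anomalies = X - x_mean
    # Rows of vt are the loading patterns, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    loadings = vt[:n_modes].T
    scores = anomalies @ loadings
    A = np.column_stack([np.ones(len(y)), scores])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return {"x_mean": x_mean, "loadings": loadings, "coef": coef}

def predict_pcr(model, X_new):
    """Project new predictor values onto the retained modes and apply the regression."""
    scores = (np.asarray(X_new, dtype=float) - model["x_mean"]) @ model["loadings"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic illustration: 5 original predictors reduced to 2 modes.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 5))
y = X[:, 0] + 0.5 * rng.normal(size=40)
model = fit_pcr(X, y, n_modes=2)
fitted = predict_pcr(model, X)
print("coefficients (intercept + 2 modes):", model["coef"])
```

The regression now has one coefficient per retained mode rather than one per original predictor. If every mode is retained, the fitted values are the same as those from MLR on the original predictors.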
Principal Components
A principal component is a weighted sum of a set of original variables, with the weights set so that the principal component has maximum variance.
Scores and loadings for the first principal component of February 1961–2000 sea-surface temperatures.

Principal Components
The score indicates how strongly the loading pattern is developed in each year.

Principal Components
Separate patterns ("modes") of variability can be defined. We can use just a few of these modes to represent the SST variability throughout the domain.
Scores and loadings for the second principal component of February 1961–2000 sea-surface temperatures.

Why PCR?
When principal components of sea-surface temperatures are used as predictors, the components have desirable features:
1. They explain maximum amounts of variance, and are therefore representative of sea temperature variability over large areas;
2. They are uncorrelated, and so errors in estimating the regression parameters are minimized;
3. Only a few need be retained, and so the dangers of fishing (searching through many candidate predictors until one appears to work) are minimized.

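The second feature listed above can be checked numerically. In this sketch on synthetic data, the component scores are mutually uncorrelated, so the regression coefficient estimated for the first mode does not change when further modes are added to the model. The helper name `coefs` and the data are assumptions for illustration.

```python
import numpy as np

# Synthetic, mutually correlated predictors (not real SSTs).
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 6)) @ rng.normal(size=(6, 6))
y = rng.normal(size=40)

anomalies = X - X.mean(axis=0)
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
scores = u * s  # identical to anomalies @ vt.T

# Correlations between different modes should be zero to rounding error.
corr = np.corrcoef(scores, rowvar=False)
off_diagonal = corr - np.diag(np.diag(corr))

def coefs(n_modes):
    """Least-squares coefficients [b0, b1, ...] using the leading n_modes scores."""
    A = np.column_stack([np.ones(len(y)), scores[:, :n_modes]])
    return np.linalg.lstsq(A, y, rcond=None)[0]

print("largest correlation between modes:", np.abs(off_diagonal).max())
print("one mode   :", coefs(1))
print("three modes:", coefs(3))
```

With correlated original predictors, adding or dropping a predictor generally changes every other coefficient; with uncorrelated scores it does not.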
Summary
• Multiple regression has two serious problems:
– multicollinearity: if predictors are correlated, the coefficients become difficult to interpret and can be very sensitive to the sample;
– multiplicity: if there are lots of predictors, the chance of one or more of them working well by accident becomes very large.
• Principal components regression can resolve the multicollinearity problem, and it can reduce the multiplicity problem.

Exercise
• Use gridded SSTs to predict Thailand rainfall. What considerations should guide the selection of an appropriate SST domain and the number of modes to retain?

web: iri.columbia.edu/cpt/ @climatesociety …/climatesociety
CPT Help Desk: cpt@iri.columbia.edu