
Introduction to the Climate Predictability Tool (CPT)
Simon J. Mason
simon@iri.columbia.edu
International Research Institute for Climate and Society
The Earth Institute of Columbia University
CIMH Workshop on the Climate Predictability Tool
Bridgetown, Barbados, July/August 2013

Workshop Goal
Understanding, sustainable creation, and communication of quality-controlled seasonal climate forecasts for the Caribbean.

What is CPT?
The Climate Predictability Tool (CPT) is an easy-to-use software package for making tailored and downscaled seasonal climate forecasts.
Versions:
• Windows 95+
• Batch

CPT Use
Figure: CPT downloads since 2005.

Seasonal forecasting I: empirical

Seasonal forecasting II: dynamical

What is CPT?
CPT is designed to produce statistical forecasts of seasonal climate using either the output from a general circulation model (GCM) or empirical predictors.
Features:
• Model training
• Validation
• Verification
• Flexible forecasts

Linear Regression
We can use simple linear regression to predict rainfall from a single predictor, such as the Niño 3.4 index.
Figure: June Niño 3.4 index as a predictor of July–September 1971–2010 rainfall over the eastern Caribbean.
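As a minimal sketch of the idea (not CPT's code), the ordinary least-squares fit of rainfall on a single index takes only a few lines of Python with NumPy. The helper name `fit_simple_regression` is hypothetical, and the index and rainfall values are invented for illustration; they are not the 1971–2010 data in the figure.

```python
import numpy as np


def fit_simple_regression(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_anom = x - x.mean()
    # Slope = covariance(x, y) / variance(x); the n - 1 factors cancel.
    b = np.sum(x_anom * (y - y.mean())) / np.sum(x_anom ** 2)
    # The fitted line passes through the point of means.
    a = y.mean() - b * x.mean()
    return a, b


# Hypothetical values for illustration only (not the data in the figure):
# June Nino 3.4 anomalies (degC) and Jul-Sep rainfall totals (mm).
nino34 = np.array([-1.2, -0.5, 0.0, 0.4, 1.1, 1.8])
rain = np.array([610.0, 560.0, 530.0, 505.0, 470.0, 420.0])

a, b = fit_simple_regression(nino34, rain)
print(f"rainfall = {a:.1f} + ({b:.1f}) * Nino3.4")
```

The same function applies unchanged when the predictor is GCM rainfall instead of an index.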

Linear Regression
Or we can use the GCM rainfall as the predictor.
Figure: GCM prediction of July–September 1971–2010 rainfall over the eastern Caribbean.

Linear regression in CPT
In CPT, linear regression is performed using the MLR (multiple linear regression) option. Similarly, the GCM option applies simple linear regression to each station, using the nearest or an interpolated gridbox as the predictor. Through the goodness index, CPT indicates how well the model predicts the Y data, not how well it describes (fits) the Y data.
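The GCM option described above can be sketched in two steps: pick the gridbox nearest to a station, then regress the station record on that gridbox's GCM rainfall. This is a hypothetical illustration, not CPT's implementation. The helper names, the grid, the station location, and the synthetic data are all assumptions. The distance is a simple cosine-weighted degree distance rather than a great-circle distance, and the interpolated-gridbox alternative is not shown.

```python
import numpy as np


def nearest_gridbox(lats, lons, station_lat, station_lon):
    """Return (i, j) indices of the grid point closest to the station.

    Uses squared degree differences with longitude scaled by cos(latitude);
    adequate for a small domain, not a great-circle distance.
    """
    lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
    dlat = lat_grid - station_lat
    dlon = (lon_grid - station_lon) * np.cos(np.deg2rad(station_lat))
    flat_index = np.argmin(dlat ** 2 + dlon ** 2)
    return np.unravel_index(flat_index, lat_grid.shape)


def gcm_station_regression(gcm_rain, lats, lons, station_lat, station_lon, obs):
    """Regress station observations on the nearest-gridbox GCM rainfall.

    gcm_rain has shape (n_years, n_lat, n_lon); obs has shape (n_years,).
    Returns (intercept, slope, (i, j)).
    """
    i, j = nearest_gridbox(lats, lons, station_lat, station_lon)
    x = gcm_rain[:, i, j]
    slope, intercept = np.polyfit(x, obs, 1)
    return intercept, slope, (i, j)


# Hypothetical 2.5-degree grid and synthetic GCM rainfall for 30 seasons.
lats = np.arange(10.0, 20.0, 2.5)     # 10, 12.5, 15, 17.5
lons = np.arange(-65.0, -55.0, 2.5)   # -65, -62.5, -60, -57.5
rng = np.random.default_rng(0)
gcm_rain = rng.gamma(2.0, 100.0, size=(30, lats.size, lons.size))
obs = 50.0 + 0.8 * gcm_rain[:, 1, 2]  # synthetic station record
intercept, slope, idx = gcm_station_regression(gcm_rain, lats, lons, 13.1, -59.6, obs)
print(idx, round(slope, 3), round(intercept, 3))
```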

Cross-validation
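One common way to measure how well a model predicts, rather than describes, the data is leave-k-out cross-validation, in which each year is predicted by a model fitted without that year and its neighbours. The sketch below assumes simple linear regression and a hypothetical `window` parameter giving the number of years withheld. CPT's own cross-validation settings and goodness index may be defined differently.

```python
import numpy as np


def cross_validated_predictions(x, y, window=5):
    """Leave-`window`-out predictions from a simple linear regression.

    For each year t, the model is refitted without a block of `window` years
    centred on t (truncated at the ends of the record) and then used to
    predict y[t], so y[t] never influences its own prediction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = window // 2
    predictions = np.empty(n)
    for t in range(n):
        keep = np.ones(n, dtype=bool)
        keep[max(0, t - half): t + half + 1] = False   # withhold the block
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        predictions[t] = intercept + slope * x[t]
    return predictions


# Hypothetical data: a weak relationship plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.6 * x + rng.normal(size=30)

slope, intercept = np.polyfit(x, y, 1)
fitted_r = np.corrcoef(intercept + slope * x, y)[0, 1]           # describes y
cv_r = np.corrcoef(cross_validated_predictions(x, y), y)[0, 1]   # predicts y
print(f"fitted r = {fitted_r:.2f}, cross-validated r = {cv_r:.2f}")
```

The cross-validated correlation is usually lower than the fitted one, because the fitted correlation benefits from the model having already seen every year it is scored on.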

Retroactive forecasting
Given data for 1951–2000, it is possible to calculate a retroactive set of probabilistic forecasts. CPT uses an initial training period to cross-validate a model and make predictions for the subsequent year(s), then updates the training period and predicts additional years, repeating until all possible years have been predicted.
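The bookkeeping of a retroactive exercise can be sketched as a list of (training period, forecast period) pairs. This pure-Python sketch assumes that the training period always starts in the first year and grows by `update_interval` years after each step. The function and parameter names are hypothetical, and the cross-validation within each training period is not shown.

```python
def retroactive_schedule(first_year, last_year, initial_training, update_interval=1):
    """Return (training_period, forecast_period) pairs as (start, end) year tuples.

    The first model is trained on `initial_training` years starting at
    `first_year`; it forecasts the next `update_interval` years, those years
    are then added to the training period, and the process repeats until
    `last_year` has been forecast.
    """
    schedule = []
    train_end = first_year + initial_training - 1
    while train_end < last_year:
        forecast_end = min(train_end + update_interval, last_year)
        schedule.append(((first_year, train_end), (train_end + 1, forecast_end)))
        train_end = forecast_end          # predicted years join the training set
    return schedule


# Hypothetical setup: 1951-2000 data, first 30 years used for initial training.
for training, forecast in retroactive_schedule(1951, 2000, 30, update_interval=5):
    print(f"train {training[0]}-{training[1]} -> forecast {forecast[0]}-{forecast[1]}")
```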

Principal Components
A principal component is like a weighted average of a set of original variables. (More strictly, it is a weighted sum, because the weights do not add up to 1.0.)
Figure: Scores and loadings for the first principal component of August 1961–2000 sea-surface temperatures.
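A minimal NumPy sketch of the weighted-sum idea: the loadings from a singular value decomposition of the anomaly matrix have unit length (their squares, not the weights themselves, sum to 1), and each score is the weighted sum of the anomalies for that time. The synthetic "SST" array and the helper name are hypothetical. Conventions for scaling scores and loadings vary, and CPT's own scaling may differ.

```python
import numpy as np


def principal_components(data, n_modes):
    """Principal components of a (n_times, n_points) data matrix via SVD.

    Returns (scores, loadings): loadings[:, k] holds the weights for mode k
    (unit length, so they generally do not sum to 1), and
    scores[:, k] = anomalies @ loadings[:, k] is the weighted sum.
    """
    anomalies = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    loadings = vt[:n_modes].T
    scores = anomalies @ loadings
    return scores, loadings


# Hypothetical field: 40 Augusts x 6 gridpoints of random "SST" values.
rng = np.random.default_rng(2)
sst = rng.normal(size=(40, 6))
scores, loadings = principal_components(sst, n_modes=2)
print("weights for PC1:", np.round(loadings[:, 0], 2))
print("sum of weights:", round(loadings[:, 0].sum(), 2),
      "sum of squared weights:", round((loadings[:, 0] ** 2).sum(), 2))
```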

Principal Components
Separate patterns (“modes”) of variability can be defined. We can use just a few of these patterns to represent the SST variability everywhere in the domain.
Figure: Scores and loadings for the second principal component of August 1961–2000 sea-surface temperatures.
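To illustrate why a few modes can represent the variability everywhere in the domain, the sketch below builds a hypothetical field from two spatial patterns plus a little noise, computes the fraction of variance explained by each mode, and reconstructs the field from the first two modes only. The data and the function names are assumptions made for illustration.

```python
import numpy as np


def explained_variance_fraction(data):
    """Fraction of total variance explained by each mode, largest first."""
    anomalies = data - data.mean(axis=0)
    s = np.linalg.svd(anomalies, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)


def reconstruct(data, n_modes):
    """Rebuild the field from its mean plus the first n_modes modes."""
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean + (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]


# Hypothetical field: two patterns with random yearly amplitudes plus noise.
rng = np.random.default_rng(3)
n_years, n_points = 40, 50
pattern1, pattern2 = rng.normal(size=(2, n_points))
amp1, amp2 = rng.normal(size=(2, n_years))
sst = (10 * np.outer(amp1, pattern1) + 5 * np.outer(amp2, pattern2)
       + 0.1 * rng.normal(size=(n_years, n_points)))

frac = explained_variance_fraction(sst)
approx = reconstruct(sst, n_modes=2)
print("variance explained by first two modes:", round(frac[:2].sum(), 3))
```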

Why PCR?
When principal components of sea-surface temperatures are used as predictors, the components have desirable features:
1. They explain maximum amounts of variance, and therefore are representative of sea-temperature variability over large areas;
2. They are uncorrelated, and so errors in estimating the regression parameters are minimized;
3. Only a few need be retained, and so the dangers of fishing (searching through many candidate predictors until one happens to correlate with the predictand) are minimized.

Principal Components Regression (PCR)
• The weights U are defined so that the principal components have maximum variance and are uncorrelated.
• In CPT, to use the kth principal component, the first k − 1 principal components must also be used.
• Suitable for multiple predictors and one or a small number of uncorrelated predictands.
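A minimal PCR sketch under stated assumptions: principal components of the predictor anomalies are computed by SVD, the first k of them (modes 1 to k, mirroring the rule that the first k − 1 must be kept) are used as predictors in an ordinary least-squares regression, and new predictor values are projected onto the same loadings. The function name and synthetic data are hypothetical. CPT's own preprocessing and its selection of k are not reproduced here.

```python
import numpy as np


def pcr_fit_predict(x_train, y_train, x_new, k):
    """Principal components regression on the first k principal components.

    Returns (predictions for x_new, training PC scores).
    """
    x_mean = x_train.mean(axis=0)
    anomalies = x_train - x_mean
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    loadings = vt[:k].T                   # modes 1..k, never a subset that skips one
    scores = anomalies @ loadings         # uncorrelated predictors
    design = np.column_stack([np.ones(len(scores)), scores])
    coefs, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    # Project new predictors onto the training loadings before predicting.
    new_scores = (np.atleast_2d(x_new) - x_mean) @ loadings
    return coefs[0] + new_scores @ coefs[1:], scores


# Hypothetical data: 40 years x 8 SST predictors, one predictand.
rng = np.random.default_rng(4)
x_train = rng.normal(size=(40, 8))
w = rng.normal(size=8)
y_train = 1.0 + x_train @ w + 0.1 * rng.normal(size=40)
x_new = rng.normal(size=(3, 8))
pred, scores = pcr_fit_predict(x_train, y_train, x_new, k=3)
print(np.round(pred, 2))
```

Because the PC scores are uncorrelated, each regression coefficient can be estimated without the others absorbing its signal, which is the property listed under "Why PCR?".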

Selecting models in CPT
MLR can be used when there is one predictor or a very small number of predictors. GCM is a special case: a single predictor is used from the nearest, or an interpolated, gridpoint. PCR can be used to address the problems that arise with MLR when there are many predictors. But what if there are many predictands?

What is Canonical Correlation Analysis (CCA)?
Figure: June SSTs and July–September (JAS) rainfall; r = 0.97.

What is CCA?
Figure: GCM and GPCC July–September (JAS) rainfall; r = 0.83.

Canonical Correlation Analysis (CCA)
• The weights V_X and V_Y are defined so that Z_X and Z_Y have maximum correlation.
• In CPT, the CCA is performed using principal components of X and Y to avoid over-fitting.
• Suitable for multiple predictors and multiple predictands.
• Predictions are spatially consistent.
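One standard way to do CCA on principal components is to rescale the leading PC scores of X and Y to unit variance and take the singular value decomposition of their cross-covariance matrix. The singular values are then the canonical correlations. The sketch below follows that textbook formulation under stated assumptions: the function names are hypothetical, the two synthetic fields share one common signal, and the numbers of retained modes are arbitrary. It is not CPT's code.

```python
import numpy as np


def whitened_pcs(data, n_modes):
    """Leading principal-component scores of `data`, rescaled to unit variance."""
    anomalies = data - data.mean(axis=0)
    u, _, _ = np.linalg.svd(anomalies, full_matrices=False)
    # Columns of u are orthonormal with zero mean, so multiplying by
    # sqrt(n - 1) gives scores with sample variance 1 and zero correlation.
    return u[:, :n_modes] * np.sqrt(len(data) - 1)


def cca_on_pcs(x, y, x_modes, y_modes):
    """CCA between the leading PCs of x (n, p) and y (n, q).

    Returns the canonical correlations and the canonical variates for x and y.
    """
    zx = whitened_pcs(x, x_modes)
    zy = whitened_pcs(y, y_modes)
    cross_cov = zx.T @ zy / (len(x) - 1)
    # Singular vectors give the weights on the PCs; singular values are the
    # canonical correlations, in decreasing order.
    a, canon_corr, bt = np.linalg.svd(cross_cov, full_matrices=False)
    return canon_corr, zx @ a, zy @ bt.T


# Hypothetical fields sharing one common signal.
rng = np.random.default_rng(5)
n_years = 40
signal = rng.normal(size=n_years)
sst = np.outer(signal, rng.normal(size=12)) + rng.normal(size=(n_years, 12))
rain = np.outer(signal, rng.normal(size=6)) + rng.normal(size=(n_years, 6))
canon_corr, z_x, z_y = cca_on_pcs(sst, rain, x_modes=3, y_modes=2)
print("canonical correlations:", np.round(canon_corr, 2))
```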

Prediction
If we construct a regression model, we can obtain a best-guess estimate of Y given a new value of X.
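For simple linear regression, the best guess and its prediction error variance have a textbook closed form: ŷ = a + b x₀ and s²(1 + 1/n + (x₀ − x̄)²/Sxx), where s² is the residual variance with n − 2 degrees of freedom and Sxx is the sum of squared deviations of x from its mean. The pure-Python sketch below implements those formulas for a single predictor only. The function name and data are hypothetical, and CPT's calculations for MLR, PCR, and CCA are more general.

```python
def predict_with_error(x, y, x_new):
    """Best guess and prediction error variance from simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Residual variance, with n - 2 degrees of freedom for the two parameters.
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    best_guess = a + b * x_new
    # Error variance grows as x_new moves away from the training mean.
    error_variance = s2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return best_guess, error_variance


# Hypothetical four-year training sample and a new predictor value.
best, var = predict_with_error([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0], 3.5)
print(best, var)
```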

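Prediction error variance
If the best-guess value lies exactly on the upper tercile, the above-normal category will have a 50% probability: the prediction errors are assumed to be normally distributed, and hence symmetric, around the best guess, so half of the forecast distribution lies above that boundary.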
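Assuming the forecast distribution is Gaussian, with mean equal to the best guess and variance equal to the prediction error variance, the three tercile-category probabilities follow from its cumulative distribution. This sketch uses Python's standard-library `statistics.NormalDist`. The function name and the numbers are hypothetical, and CPT may use a different distribution.

```python
from statistics import NormalDist


def tercile_probabilities(best_guess, error_variance, lower_tercile, upper_tercile):
    """Below-, near- and above-normal probabilities from a Gaussian forecast."""
    forecast = NormalDist(mu=best_guess, sigma=error_variance ** 0.5)
    p_below = forecast.cdf(lower_tercile)
    p_above = 1.0 - forecast.cdf(upper_tercile)
    p_normal = 1.0 - p_below - p_above
    return p_below, p_normal, p_above


# Hypothetical numbers (mm): best guess exactly on the upper tercile.
print(tercile_probabilities(500.0, 40.0 ** 2, 450.0, 500.0))
# A very large error variance leaves little probability in the normal category.
print(tercile_probabilities(500.0, 1000.0 ** 2, 450.0, 550.0))
```

When the best guess equals the upper tercile, the above-normal probability is exactly 0.5, whatever the error variance.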

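Low probability of normal
This problem often occurs if there are “outliers” in the data, in which case the assumption of normally distributed data is invalid. Outliers inflate the prediction error variance, so the forecast distribution becomes wide and little probability falls in the normal category.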

Low probability of normal
The problem can often be solved by switching on Options ~ Data ~ Transform Y Data.
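One way to make skewed data with outliers closer to normally distributed is a rank-based normal-score transform, sketched here with Python's standard library. This is an assumed, illustrative method, not necessarily the transformation CPT applies when Transform Y Data is switched on. The function name and rainfall values are hypothetical, and ties are broken by order of appearance.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map each value to the standard-normal quantile of its rank.

    Uses the plotting position (rank - 0.5) / n, so the largest value maps
    to a moderate quantile however extreme it is.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores


# Hypothetical seasonal rainfall totals (mm) with one extreme outlier.
rain = [80.0, 95.0, 100.0, 110.0, 120.0, 130.0, 900.0]
z = normal_scores(rain)
print([round(v, 2) for v in z])
```

Forecasts made in transformed units would have to be mapped back to rainfall amounts before being reported.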