Deutscher Wetterdienst
Bootstrapping – using different methods to estimate statistical differences between model errors
Ulrich Damrath, COSMO GM Rome 2011

Some typical situations occurring during operational verification:

Ulrich Damrath: Bootstrapping – using different methods to estimate statistical differences between model errors, COSMO GM Rome September 2011

Questions:
- 1. Question: Are the differences between scores due to noise, or are they statistically significant?
- 2. Question: Are there significant differences between the quality of different models? (Of interest to users of forecasts)
- 3. Question: Are there significant differences between the quality of models for different situations? (Of interest to developers of models)
- Problem: BIASes may be normally distributed, but RMSEs?
- A possible solution: Application of bootstrap techniques to get confidence intervals or quantiles of the distribution
- 1. Question concerning the bootstrap method: How many replications are necessary to get stable statistical results?
- 2. Question concerning the bootstrap method: How should the sample data be grouped in order to avoid autocorrelation effects?

The principle of bootstrapping for a sample with 10 elements

Realisation 1: mean value using elements: 5 3 8 7 8 4 7 0 4 3
Realisation 2: mean value using elements: 3 2 0 5 1 2 0 2 2 8
Realisation 3: mean value using elements: 5 2 3 6 8 3 8 0 8 6
Realisation 4: mean value using elements: 7 5 1 6 4 0 1 2 1 6
Realisation 5: mean value using elements: 6 5 8 6 1 0 0 2 3 2
Realisation 6: mean value using elements: 1 0 5 5 6 5 8 5 5 8
Realisation 7: mean value using elements: 3 4 4 4 2 8 5 3 2 6
Realisation 8: mean value using elements: 0 8 2 0 6 4 1 6 6 5
Realisation 9: mean value using elements: 0 7 5 6 3 2 2 3 8 8
Realisation 10: mean value using elements: 2 2 3 6 6 2 0 0

The mean value of all realisations (replications) gives the bootstrap mean. The standard deviation of all mean values gives the bootstrap standard deviation.
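The resampling principle can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: the sample values are made up, the function name `bootstrap_means` is hypothetical, and the standard deviation is computed with the usual n-1 denominator, which is an assumption.

```python
import random
import statistics


def bootstrap_means(sample, n_replications, rng):
    """Return the mean of each bootstrap replication of `sample`."""
    n = len(sample)
    means = []
    for _ in range(n_replications):
        # Draw n element indices with replacement, one realisation per loop.
        indices = [rng.randrange(n) for _ in range(n)]
        means.append(statistics.fmean(sample[i] for i in indices))
    return means


rng = random.Random(2011)  # fixed seed, only for reproducibility
# Made-up forecast errors standing in for a sample with 10 elements.
sample = [1.2, -0.4, 0.3, 2.1, -1.0, 0.8, 0.0, 1.5, -0.7, 0.6]
means = bootstrap_means(sample, 500, rng)

boot_mean = statistics.fmean(means)  # bootstrap mean
boot_std = statistics.stdev(means)   # bootstrap standard deviation (n-1 form)
print(f"bootstrap mean = {boot_mean:.3f}, bootstrap std = {boot_std:.3f}")
```

Replacing the mean inside the loop by another score (for example an RMSE) gives the bootstrap distribution of that score in the same way.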

Bootstrap properties for three analytical cases. Number of sample values: 31

Bootstrap properties for three analytical cases. Number of sample values: 310

Bootstrap properties for three analytical cases. Number of sample values: 3100

Bootstrap properties for three analytical cases. Number of sample values: 31000

Bootstrap properties for three analytical cases. Number of sample values: 310000
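How the bootstrap standard deviation settles as the number of replications grows can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the three analytical cases of the slides are not specified in the text, so a standard normal sample of 31 values is used as a hypothetical stand-in, and the replication counts are arbitrary.

```python
import random
import statistics


def bootstrap_std_of_mean(sample, n_replications, rng):
    """Standard deviation of the resampled means for a given replication count."""
    n = len(sample)
    means = [
        statistics.fmean(rng.choices(sample, k=n))  # n draws with replacement
        for _ in range(n_replications)
    ]
    return statistics.stdev(means)


rng = random.Random(31)  # fixed seed, only for reproducibility
# Hypothetical test sample: 31 values from a standard normal distribution.
sample = [rng.gauss(0.0, 1.0) for _ in range(31)]

results = {}
for n_rep in (10, 50, 100, 500, 1000):
    results[n_rep] = bootstrap_std_of_mean(sample, n_rep, rng)
    print(f"{n_rep:5d} replications: bootstrap std = {results[n_rep]:.4f}")
```

For small replication counts the printed values typically scatter noticeably from run to run, while the values for several hundred replications lie close together.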

Conclusion concerning the convergence of the method: About 500 replications seem to be sufficient to get a stable value for the bootstrap variance.

Setting the sample characteristics: Treating each pair of observation and forecast as a single sample member leads to large sample sizes with relatively high autocorrelation. Therefore, values are grouped into blocks of one, two and four days. Additionally, a block size was constructed using the optimal block length L_OPT, which can be estimated from 'a', a function of the autocorrelation, and the sample size N.
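A block bootstrap with an estimated block length might look as follows. The formula for the optimal block length is not recoverable from the text above; the sketch assumes the estimator commonly quoted in the time series literature (for example in Mudelsee, 2010), L_OPT = NINT{[sqrt(6) a / (1 - a^2)]^(2/3) N^(1/3)}, with 'a' taken as the lag-1 autocorrelation coefficient. Whether the talk used exactly this expression is an assumption, and all function names and the synthetic AR(1) error series are hypothetical.

```python
import math
import random
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation coefficient of a sequence."""
    mean = statistics.fmean(series)
    num = sum((x - mean) * (y - mean) for x, y in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def optimal_block_length(a, n):
    """Assumed estimator: NINT{[sqrt(6) * a / (1 - a^2)]^(2/3) * n^(1/3)}."""
    a = abs(a)  # illustrative choice: use the magnitude of the autocorrelation
    if a >= 1.0:
        return n
    length = (math.sqrt(6.0) * a / (1.0 - a * a)) ** (2.0 / 3.0) * n ** (1.0 / 3.0)
    return max(1, min(n, round(length)))


def moving_block_resample(series, block_length, rng):
    """Concatenate randomly chosen contiguous blocks until n values are drawn."""
    n = len(series)
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n - block_length + 1)
        resampled.extend(series[start:start + block_length])
    return resampled[:n]


rng = random.Random(42)  # fixed seed, only for reproducibility
# Hypothetical autocorrelated error series (AR(1) process with coefficient 0.6).
errors = []
previous = 0.0
for _ in range(360):
    previous = 0.6 * previous + rng.gauss(0.0, 1.0)
    errors.append(previous)

a = lag1_autocorrelation(errors)
l_opt = optimal_block_length(a, len(errors))
means = [
    statistics.fmean(moving_block_resample(errors, l_opt, rng))
    for _ in range(500)
]
print(f"a = {a:.2f}, block length = {l_opt}, "
      f"bootstrap std of the mean = {statistics.stdev(means):.3f}")
```

Fixed blocks of one, two or four days correspond to calling `moving_block_resample` with a block length equal to the number of values in that many days. Resampling whole blocks keeps the dependence between neighbouring values inside each block, which is the purpose of the grouping.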

The real world: Dependence of bootstrap standard deviation and bootstrap confidence intervals on the number of replications, for 2 m temperature forecasts during summer 2010 and 10 m wind speed during winter 2010/2011. BIASes for different periods, models and weather elements.

The real world: Dependence of bootstrap standard deviation and bootstrap confidence intervals on the number of replications, for 2 m temperature forecasts during summer 2010 and 10 m wind speed during winter 2010/2011. RMSEs for different periods, weather elements and types of mean wind direction over Germany (700 hPa).

Quantiles 10% and 90% for different bootstrap types. Period: 01.06.2010 – 31.08.2010. COSMO-EU (solid), COSMO-DE (dotted). Element: temperature 2 m.
Top: median and quantiles (green: overlapping quantiles, red: no overlapping quantiles).
Bottom: another visualisation of the overlapping intervals (bluish: overlapping intervals, deep red: no overlapping intervals).

Quantiles 10% and 90% for different bootstrap types. Period: 01.06.2010 – 31.08.2010. COSMO-EU (solid), COSMO-DE (dotted). Element: wind speed 10 m.
Top: median and quantiles (green: overlapping quantiles, red: no overlapping quantiles).
Bottom: another visualisation of the overlapping intervals (bluish: overlapping intervals, deep red: no overlapping intervals).
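The comparison of 10% and 90% quantiles can be sketched as follows. This is an illustration under stated assumptions, not the verification code behind the figures: the two error samples are synthetic and stand in for two hypothetical models, the resampling treats every value as independent (no blocks), and the quantiles are taken by simple index lookup without interpolation.

```python
import math
import random


def rmse(errors):
    """Root mean square error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_quantiles(errors, n_replications, rng, lower=0.10, upper=0.90):
    """Lower and upper quantile of the bootstrap distribution of the RMSE."""
    n = len(errors)
    scores = sorted(
        rmse(rng.choices(errors, k=n)) for _ in range(n_replications)
    )
    # Simple empirical quantiles by index (no interpolation).
    lo = scores[int(lower * (n_replications - 1))]
    hi = scores[int(upper * (n_replications - 1))]
    return lo, hi


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


rng = random.Random(7)  # fixed seed, only for reproducibility
# Synthetic errors for two hypothetical models, one value per day of a season.
errors_model_a = [rng.gauss(0.0, 1.0) for _ in range(92)]
errors_model_b = [rng.gauss(0.0, 3.0) for _ in range(92)]

interval_a = bootstrap_quantiles(errors_model_a, 500, rng)
interval_b = bootstrap_quantiles(errors_model_b, 500, rng)
print("model A:", interval_a)
print("model B:", interval_b)
print("overlap:", intervals_overlap(interval_a, interval_b))
```

Intervals that do not overlap correspond to the red cases in the figures, where the difference between the two scores is unlikely to be noise.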

Comparison of overlapping quantile intervals for different wind directions. NW: north-westerly flow, SW: south-westerly flow, NO: north-easterly flow, SO: south-easterly flow.

Some typical situations occurring during operational verification in 2009, 2010 and 2011: Modification of turbulent mixing length, May 2009:

Conclusions:
- Different ways of grouping the samples lead to different results concerning the statistical significance of the model errors.
- Block methods give more or less equivalent results.
- The results of the comparison of different models may help users decide which model should be used.
- The results for different weather types (flow directions) may give developers some hints concerning the development of model physics.

References:
Efron, B., Tibshirani, R. J. (1993): An Introduction to the Bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability
Mudelsee, M. (2010): Climate Time Series Analysis – Classical Statistical and Bootstrap Methods, Springer, Dordrecht, Heidelberg, London, New York