Statistical Post-Processing of Ensembles
Bo Cui 1, Zoltan Toth 2, Yuejian Zhu 2, Stéphane Beauregard 3, David Unger 4, Richard Wobus 1
1 SAIC at Environmental Modeling Center, NCEP/NWS
2 Environmental Modeling Center, NCEP/NWS
3 Canadian Meteorological Centre, Meteorological Service of Canada
4 Climate Prediction Center, NCEP/NWS
Acknowledgements
Richard Verret, Poulin Lewis (CMC/MSC)
Dingchen Hou (EMC/NCEP/NWS/NOAA)
David Michaud, Brent Gorden, Luke Lin (NCO/NCEP/NWS/NOAA)

Statistical Post-Processing Issues
GOAL
• Improve reliability while maintaining resolution in NWP forecasts: reduce systematic errors (improve reliability) while not increasing random errors (maintain resolution)
• Retain all useful information in the NWP forecast
METHODOLOGY
• Use bias-free estimators of systematic error
• Need methods with fast convergence using small samples
APPROACH (computational efficiency)
• Bias correction: remove lead-time dependent bias on the model grid
  – Working on the coarser model grid allows use of more complex methods
  – Feedback on systematic errors to model development
• Downscaling: downscale the bias-corrected forecast to a finer grid
  – Further refinement/complexity added
  – No dependence on lead time

Bias Correction Method & Application
Bias correction techniques: an array of methods
• Estimate/correct bias moment by moment (e.g., D. Unger et al.)
  – Simple approach, implemented partially
  – May be less applicable for extreme cases
• Bayesian approach (e.g., Roman Krzysztofowicz)
  – Allows simultaneous adjustment of all modes considered; under development
Moment-based method at NCEP: apply an adaptive (Kalman-filter type) algorithm
  decaying-average mean error = (1 - w) * prior t.m.e. + w * (f - a)
  where t.m.e. = time mean error, f = forecast, a = analysis, w = decaying weight; computed separately for each cycle, each lead time and each individual grid point
[Figure labels: 6.6%, 3.3%, 1.6% (Toth, Z., and Y. Zhu, 2001)]
• Tested different decaying weights: 0.25%, 0.5%, 1%, 2%, 5% and 10%
• Decided to use 2% (~50 days) decaying accumulation for bias estimation
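
The decaying-average update on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the operational NCEP code: the function names, the flat list standing in for a grid, the zero-bias cold start, the toy values, and the correction step (forecast minus estimated bias) are all assumptions made for the example.

```python
def update_bias(prior_bias, forecast, analysis, w=0.02):
    """One decaying-average update per grid point: (1 - w) * prior + w * (f - a)."""
    return [(1.0 - w) * b + w * (f - a)
            for b, f, a in zip(prior_bias, forecast, analysis)]


def bias_correct(forecast, bias):
    """Subtract the estimated time-mean error from a new forecast (assumed step)."""
    return [f - b for f, b in zip(forecast, bias)]


# Toy case: three grid points, forecast is 1.0 too warm at every point.
forecast = [11.0, 21.0, 31.0]
analysis = [10.0, 20.0, 30.0]

bias = [0.0, 0.0, 0.0]            # cold start (assumption)
for _ in range(200):              # 200 updates for one cycle and one lead time
    bias = update_bias(bias, forecast, analysis)

corrected = bias_correct(forecast, bias)
print(bias[0], corrected[0])
```

In practice one such bias field would be kept per variable, cycle and lead time, so the state to store is small compared with a full training archive.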

First Implementation of Bias Correction in NAEFS (May 2006)
1. Bias-corrected members of the joint MSC-NCEP ensemble
• Decaying accumulated bias (~past 50 days) for each variable at each grid point
• For 35 selected variables out of 50 NAEFS variables
• 32 (00Z), 15 (06Z), 32 (12Z) and 15 (18Z) joint ensemble members
• Bias correction against each center's own operational analysis
2. Weights for each member for creating the joint ensemble (equal weights now; unequal weights to be added later)
• Ridging method, optimal weights (e.g., M. Pena and H. van den Dool et al.)
• Weights don't depend on the variables
• Weights depend on geographical location
• Weights depend on the lead time
• Bayesian approach
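
As a rough sketch of the member weighting described above, the snippet below forms a joint ensemble mean from bias-corrected members, using equal weights by default. The function name, the list-of-lists layout (one list of grid-point values per member) and the toy numbers are hypothetical; the slide does not specify how unequal weights would be computed, so they are simply passed in.

```python
def joint_ensemble_mean(members, weights=None):
    """Weighted mean over members; equal weights when none are given."""
    if weights is None:
        weights = [1.0] * len(members)
    total = sum(weights)
    n_points = len(members[0])
    return [sum(w * m[i] for w, m in zip(weights, members)) / total
            for i in range(n_points)]


# Toy bias-corrected members on a two-point grid (values are made up).
ncep_members = [[1.0, 2.0], [3.0, 4.0]]
cmc_members = [[5.0, 6.0]]
members = ncep_members + cmc_members

mean_equal = joint_ensemble_mean(members)                    # equal weights
mean_weighted = joint_ensemble_mean(members, [1.0, 1.0, 2.0])  # illustrative weights
print(mean_equal, mean_weighted)
```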

List of Variables for Bias Correction, Weights and Forecast Anomalies for CMC & NCEP Ensemble

2 meter temperature: 120 hours forecast (ini: 2006100700) Shaded: left – raw bias right – bias after correction

Bias Before/After Bias Correction (NCEP NH)
[Figure panels: 500 hPa height; 850 hPa temperature; 2 m temperature; sea level pressure. Curves: before bias correction (1 x 1); after bias correction (1 x 1); before downscaling (5 x 5 km); after downscaling (5 x 5 km)]

Bias Improvement (absolute value) after Bias Correction (NCEP)
[Figure panels: 500 hPa height; 850 hPa temperature; sea level pressure; 2 m temperature. Panel annotations in extraction order, panel assignment not recoverable: 82%, 78%, 50%, 40%, 82%, 65%, 38%, 35%]
Overall bias reduction (globally): D0-3: 75%; D3-8: 60%; D8-15: 45%

RPSS Before/After Bias Correction (NCEP 500 mb Height)
[Figure panels: Northern Hemisphere; Southern Hemisphere; Tropics]
• 1-6 day gain in skill due to bias correction
• RMS reduced for first week
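
For readers unfamiliar with the score, the following is a minimal sketch of how a ranked probability skill score (RPSS) can be computed from categorical forecast probabilities. The slides do not state the categories or the reference forecast used; the three equally likely climatological categories, the function names and the toy probabilities here are assumptions for illustration only.

```python
def rps(probs, obs_category):
    """Ranked probability score for one forecast over ordered categories."""
    cum_fcst = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cum_fcst += p
        cum_obs = 1.0 if k >= obs_category else 0.0
        score += (cum_fcst - cum_obs) ** 2
    return score


def rpss(fcst_probs, ref_probs, obs_categories):
    """Skill relative to a reference: 1 - RPS_forecast / RPS_reference."""
    rps_fcst = sum(rps(p, o) for p, o in zip(fcst_probs, obs_categories))
    rps_ref = sum(rps(p, o) for p, o in zip(ref_probs, obs_categories))
    return 1.0 - rps_fcst / rps_ref


# Two toy cases with three ordered categories (e.g. below/near/above normal).
forecasts = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
observed = [2, 0]                              # index of the observed category
climatology = [[1 / 3, 1 / 3, 1 / 3]] * 2      # assumed reference forecast

skill = rpss(forecasts, climatology, observed)
print(round(skill, 3))
```

A positive value means the forecast beats the reference; 1 is a perfect forecast and 0 is no better than the reference.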

RPSS Before/After Bias Correction (NCEP, CMC & Combined Ensemble)
[Figure panels: Northern Hemisphere; Tropics; labeled before and after bias correction]
• Bias correction improves RPSS for all lead times vs. the raw ensemble
• Combined ensemble (raw or bias-corrected) is better than a single ensemble in most cases
• NCEP has higher skill than CMC in most cases
Limitation
• Combined ensemble: 14 members, with 7 from NCEP and 7 from CMC
• NCEP analysis is used to verify the combined ensemble; the mean of the NCEP and CMC analyses is to be used in the future

Downscaling Method & Application
True = high-resolution analysis
• Operational North American Real-Time Mesoscale Analysis (RTMA)
• 5 x 5 km National Digital Forecast Database (NDFD) grid (e.g., G. DiMego et al.)
• 4 variables available: surface pressure, T2m, 10 m U and V
• Other data can also be used
Downscaling method: apply the decaying-average algorithm
  downscaling vector = (1 - w) * prior d.v. + w * (GDAS - RTMA)
• Four cycles, individual grid points; d.v. = downscaling vector
• GDAS analysis and forecast interpolated to the RTMA grid
• Regime (not flow) dependent
Downscaling process
  downscaled forecast = bias-corrected forecast - downscaling vector
Application
• Off-line experiments starting from 08/11/2006, with decaying weights of 2%, 5% and 10%
• Baseline for evaluating other, more sophisticated flow-dependent downscaling methods
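
The two downscaling formulas on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not the operational implementation: the function names are hypothetical, the GDAS and forecast fields are assumed to be already interpolated to the RTMA grid (represented here as flat lists), the downscaling vector starts from zero, and the values are made up.

```python
def update_downscaling_vector(prior_dv, gdas_on_rtma_grid, rtma, w=0.02):
    """Decaying average per grid point: (1 - w) * prior d.v. + w * (GDAS - RTMA)."""
    return [(1.0 - w) * d + w * (g - r)
            for d, g, r in zip(prior_dv, gdas_on_rtma_grid, rtma)]


def downscale(bias_corrected_forecast, dv):
    """Downscaled forecast = bias-corrected forecast - downscaling vector."""
    return [f - d for f, d in zip(bias_corrected_forecast, dv)]


# Toy 2 m temperatures (K) at two fine-grid points.
gdas = [280.0, 275.0]     # coarse analysis interpolated to the fine grid
rtma = [278.0, 276.0]     # high-resolution analysis treated as truth

dv = [0.0, 0.0]           # cold start (assumption)
for _ in range(500):      # repeated updates for one cycle
    dv = update_downscaling_vector(dv, gdas, rtma)

downscaled = downscale([281.0, 274.0], dv)
print(dv, downscaled)
```

Because the vector is built from analyses only, the same d.v. can be applied at every lead time, which matches the slide's note that downscaling has no lead-time dependence.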

GDAS Analysis & Downscaling Vector

Ensemble Mean Forecast & Bias Before/After RTMA Downscaling
2% experiments
[Figure panels] Left top: operational ens. mean and its bias; Right top: bias-corrected ens. mean and its bias; Left bottom: bias-corrected ens. mean after downscaling and its bias toward RTMA
After downscaling
• More detailed forecast information
• Bias reduced, especially in high-topography areas

Accumulated Bias Before/After RTMA Downscaling
[Figure legend]
Black: operational ensemble mean, 2%
Red: NAEFS bias-corrected ensemble mean, 2%
Blue: bias-corrected ens. mean after downscaling, 2%
Pink: bias-corrected ens. mean after downscaling, 5%
Yellow: bias-corrected ens. mean after downscaling, 10%

Summary of Bias Correction
1. Adaptive, regime-dependent bias correction works well for the first few days
• Frequent updates of the analysis/modeling system are possible
2. Regime-dependent vs. climate-mean bias correction
• Climate-mean bias correction adds value, especially for week-2 probabilistic forecasts
• Short range: no need for a large hind-cast data set?
• Generating a large hind-cast ensemble is expensive but helpful for extended-range forecasts
3. Can we have the good aspects of both worlds?
• Latest analysis/model AND large hind-cast data set?
4. Real-time generation of hind-cast data
• Assume the biggest problem is bias in the first moment
  – Can be estimated using a large sample of single forecasts
• Focus on the long-lead bias estimation problem
  – Assume long-lead bias depends only on the model (and not on the quality of the initial condition)

THORPEX LINKS: STATISTICAL POST-PROCESSING
• Goals:
  – Bias correct ensemble forecasts on model grid
  – Downscale bias-corrected ensemble onto finer (NDFD) grid
• Participants / Contributions
  – Richard Verret, Laurie Wilson et al. (Meteorological Service of Canada): NAEFS bias correction (BMA, etc.)
  – Roman Krzysztofowicz (Univ. VA): Bayesian Processor for Ensembles research and method development (NSF funding)
  – Yulia Gel (Univ. Waterloo): bias correction and downscaling research (self-supported)
  – David Unger (CPC): moment-based bias correction for NAEFS
  – Bo Cui (NCEP): moment-based bias correction, downscaling
• Funded partially by NOAA THORPEX program

Background

500 hPa height: 120 hours forecast (ini: 2006100700) Shaded: left – raw bias right – bias after correction

CMC Control: Bias Improvement (absolute value) after Bias Correction
[Figure panels: 500 hPa height; 850 hPa temperature; sea level pressure; 2 m temperature]
Overall bias reduction (globally): D0-3: 55%; D3-8: 50%; D8-15: 40%

RPSS Before/After Bias Correction (NCEP 500 mb Height)
[Figure panels: Northern Hemisphere; Southern Hemisphere; Tropics]
• Probabilistic skill extended by 20 h for day 7
• RMS errors for ensemble mean reduced for 48-h forecast (~9%)

RPSS After Bias Correction (NCEP, CMC & Combined Ensemble), 500 hPa height
[Figure panels: Northern Hemisphere; Southern Hemisphere; Tropics; Northern Hemisphere (raw NCEP, CMC & combined ensemble)]

ROC Before/After Bias Correction (NCEP 500 mb Height) Northern Hemisphere Tropics Southern Hemisphere

ROC Before/After Bias Correction ( NCEP 850 mb Temperature ) Northern Hemisphere Tropics Southern Hemisphere

RMS Before/After Bias Correction (NCEP 500 mb Height) Northern Hemisphere Tropics Southern Hemisphere

RPSS Before/After Bias Correction (CMC 500 mb Height)
[Figure panels: Northern Hemisphere; Southern Hemisphere; Tropics]
• RMS errors for ensemble mean reduced for 48-h forecast (~9%)

Ensemble Mean Forecast & Bias Before/After RTMA Downscaling

Accumulated Bias toward RTMA Before/After RTMA Downscaling
[Figure legend]
Black: operational ensemble mean, 2%
Red: NAEFS bias-corrected ensemble mean, 2%
Blue: bias-corrected ens. mean after downscaling, 2%
Pink: bias-corrected ens. mean after downscaling, 5%
Yellow: bias-corrected ens. mean after downscaling, 10%

Accumulated Bias Before/After RTMA Downscaling
[Figure legend]
Black: operational ensemble mean, 2%
Red: NAEFS bias-corrected ensemble mean, 2%
Blue: bias-corrected ens. mean after downscaling, 2%
Pink: bias-corrected ens. mean after downscaling, 5%
Yellow: bias-corrected ens. mean after downscaling, 10%

Raw, Optimal & Actual Bias Corrected Ensembles
Annual mean RPSS (20040301 – 20050228), 500 mb height over Northern Hemisphere
[Figure legend: 3 OPR ENS.; 3 RFC ENS.]
Decaying average
• Bias correction improves RPSS for all lead times vs. the raw operational ensemble
Climate error removed
• Bias-corrected reforecast gains significant improvement for all lead times vs. the raw reforecast
Operational vs. reforecast ensemble
• Operational forecast is better than the bias-corrected reforecast out to 9-10 days
• Beyond 10 days, the bias-corrected reforecast becomes competitive with or better than the operational forecast
• Improvement is larger for the CDC reforecast

Comparison of Different Decaying Weights
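
To make the effect of the decaying weight concrete: under the decaying-average recursion, a sample that is k updates old carries weight w * (1 - w)^k, so the most recent n samples together carry 1 - (1 - w)^n of the total. The sketch below tabulates this for the weights tested in this study. The function names and the 50-sample window are illustrative choices, and equating one update with one day per cycle is an assumption.

```python
import math


def recent_weight_fraction(w, n_samples):
    """Share of total weight carried by the most recent n_samples updates."""
    return 1.0 - (1.0 - w) ** n_samples


def e_folding_samples(w):
    """Number of updates after which a sample's weight has decayed by 1/e."""
    return -1.0 / math.log(1.0 - w)


for w in (0.0025, 0.005, 0.01, 0.02, 0.05, 0.10):
    print(f"w = {w:6.2%}  e-folding = {e_folding_samples(w):6.1f} updates  "
          f"weight in last 50 = {recent_weight_fraction(w, 50):.2f}")
```

For w = 2% the e-folding memory comes out at about 49.5 updates, which is in line with the "~50 days" quoted for the operational choice.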

Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 850 mb temperature, Northern Hemisphere, 2004 summer
[Figure legend: 3 OPR ENS.; 3 RFC ENS.]
Decaying average
• Operational ensemble with bias correction performs better than the raw forecast
Climate error removed
• Bias-corrected reforecast gains significant improvement for most lead times vs. the raw reforecast
Operational vs. reforecast ensemble
• Both the raw and the post-processed operational forecasts are better than the bias-corrected reforecast

Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 2 m temperature, Northern Hemisphere, annual mean for 20040301 – 20050228
Raw operational ensemble
• Abrupt drop of RPSS around day 7 is caused by the model configuration change at 180 h (resolution from 1° to 2.5°)
Decaying average
• Gives a pretty good bias correction as compared to the verifying analysis

Preliminary Results
1. Decaying averaging (2% weight, ~46-day oper. training data):
• Short range: works very well, all measures improved (~day 5)
• Week 2: limited success; improves probabilistic performance (i.e., RPSS, outlier stats)
2. Climatological mean error removed (25-yr CDC training data):
• RMS and PAC: very limited improvement
• Probabilistic measures (RPSS, etc.): significant gain
3. Bias correction algorithm
• Use of most recent data is better out to ~5 days
• Use of a large sample works for week 2
4. Operational vs. reforecast performance:
• Ensemble mean: operational much better than CDC hindcast
  – CDC has ~50% larger initial error
• Probabilistic scores: operational much better out to day 10
  – For some measures, CDC hindcasts are better beyond day 10
5. "Hybrid" system (large reforecast archive & most recent operational data)
• No major improvement

Tentative Conclusion
1. Adaptive, regime-dependent bias correction works well for the first few days (almost as good as "optimal")
• Frequent updates of the analysis/modeling system are possible
2. Climate-mean bias correction can add value, especially for week-2 probabilistic forecasts
• Short range: no need for a large hind-cast data set
• Generating a large hind-cast ensemble is expensive but can be helpful for extended-range forecasts
• Take the upgraded analysis/modeling system into account when designing the reforecast experiment
3. Hybrid system tested
• May not be helpful if the 2 systems are dissimilar?

500 hPa height: 120 hours forecast (ini: 2006043000) Shaded: left – raw bias right – bias after correction

2 meter temperature: 120 hours forecast (ini: 2006043000) Shaded: left – raw bias right – bias after correction

NCEP Bias Improvement (absolute value) after Bias Correction
[Figure panels: 500 hPa height; 850 hPa temperature; sea level pressure; 2 m temperature]
Overall bias reduction (globally): D0-3: 50%; D3-8: 40%; D8-15: 30%
• There is daily variation after bias correction; more bias is reduced for the cycle valid at 12Z

Bias Improvement (absolute value) after Bias Correction
[Figure panels: 10 m U-component; 10 m V-component; sea level pressure; 2 m temperature]
Overall bias reduction (Tropics): D0-3: 50%; D3-8: 45%; D8-15: 40%

Evaluation after Bias Correction (16 cases)
[Figure panels: Northern Hemisphere; Southern Hemisphere; Tropics]
Black: operational ensemble (10 m)
Red: real-time parallel ensemble (14 m)
Green: real-time parallel ensemble after bias correction (14 m)
• Probabilistic skill extended by 20 h for day 7
• RMS errors for ensemble mean reduced for 48-h forecast (~9%)

Bias Improvement (absolute value) after Bias correction Overall bias reduction for Northern Hemisphere Bias – before/after bias correction

RMS errors before/after bias correction Opr. ROC scores for past 5 years

RPSS before/after bias correction RPSS performance for past 5 years
