Calibration of PQPF from the NCEP Short-Range Ensemble

Calibration of PQPF from the NCEP Short-Range Ensemble Forecasting (SREF) in 2006
Huiling Yuan, NOAA/Earth System Research Lab (ESRL) and National Research Council Associate
In collaboration with NOAA/ESRL/Global Systems Division/Forecast Applications Branch and the NOAA/NCEP/EMC ensemble group

Introduction
• NCEP Short-Range Ensemble Forecasting (SREF)
• Postprocessing of the Probabilistic Quantitative Precipitation Forecast (PQPF) from the new SREF system
  Method: adjust the probability at a threshold using a series of probabilities (PDF) through linear regression or neural networks
• Results of calibrated PQPF during the warm and cool seasons in 2006

Ensemble and model configurations
NCEP SREF since March 6, 2006
• Ensemble configuration:
  Initialization: 09Z, 21Z
  Output interval: 3 h
  Forecast length: 87 h
• Breeding perturbations
• 5 model (physics) configurations, 21 ensemble members:
  WRF-ARW: 3 members (1 control, 1 pair of breeding members)
  WRF-NMM: 3 members (1 control, 1 pair of breeding members)
  ETA-BMJ: 5 members (1 control, 2 pairs of breeding members)
  ETA-KF: 5 members (1 control, 2 pairs of breeding members)
  RSM: 5 members (1 control, 2 pairs of breeding members)
• CONUS, GRIB grid 212 (185 x 129 grid points, ~40 km)
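The slides do not state how the uncalibrated PQPF is derived from the 21 members. A minimal sketch, assuming the common convention that the probability at a threshold is the fraction of members whose 24-h total reaches it (the function name, the `>=` comparison, and the member values are all hypothetical):

```python
def ensemble_pqpf(member_precip_in, thresholds_in):
    """Fraction of ensemble members at one grid point reaching each threshold."""
    n = len(member_precip_in)
    return {t: sum(1 for p in member_precip_in if p >= t) / n
            for t in thresholds_in}

# 21 made-up 24-h totals in inches, one per SREF member (illustration only).
members = [0.0] * 6 + [0.05] * 5 + [0.3] * 6 + [0.6] * 3 + [1.2] * 1
probs = ensemble_pqpf(members, [0.01, 0.25, 0.5, 1.0])

for threshold, p in probs.items():
    print(f"P(precip >= {threshold} in) = {p:.3f}")
```

With 21 members the raw probabilities can only take multiples of 1/21, which is one reason a calibration step can change how the high-probability end of a reliability curve is populated.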

Observation and calibration
• Observation: NCEP Stage IV 6-hourly precipitation analyses, aggregated to 24-h totals on the CONUS GRIB 212 grid
• Calibration methods: linear regression and an artificial neural network
• Calibration results of 24-h PQPF for day 1 (3-27 h), day 2 (27-51 h), and day 3 (51-75 h), valid at 00Z
• Warm season (April-September) calibration: test data from April-September 2006
• Cool season (October-March) calibration: test data from October-December 2006
• Cross-validation: each month in turn is held out for validation, and the remaining months of that season are used for training
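The month-by-month rotation described in the cross-validation bullet can be sketched as follows. This is only an illustration of the splitting logic; the function name and the month labels are assumptions, not part of the original codes.

```python
def monthly_cv_splits(months):
    """Yield (validation_month, training_months), holding each month out once."""
    for held_out in months:
        yield held_out, [m for m in months if m != held_out]

warm_season = ["Apr", "May", "Jun", "Jul", "Aug", "Sep"]   # 2006 warm-season test data
cool_season = ["Oct", "Nov", "Dec"]                        # 2006 cool-season test data

warm_splits = list(monthly_cv_splits(warm_season))
cool_splits = list(monthly_cv_splits(cool_season))

for validation, training in warm_splits:
    print(f"validate on {validation}, train on {', '.join(training)}")
```

Each warm-season fold trains on five months, and each cool-season fold trains on only two, so the cool-season weights rest on a smaller sample.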

Linear regression and an artificial neural network (ANN)
• Linear regression equation: [equation image not preserved]
• A three-layered neural network: input, neural (hidden), and output layers
  Optimization in the neural layer: minimization of the root mean squared error (RMSE; for PQPF, equivalent to minimizing the Brier score) through the weights, which link the input and output variables
  Least-squares simplex search algorithm and global optimization
  4 nodes (more can be added depending on convergence), up to 5,000 iterations
[Figure: network diagram with input nodes, hidden nodes h1(p), h2(p), ..., hj(p) with weights w0, w1, w2, ..., wj, and one output probability P1]
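Since the regression equation itself did not survive, the following is only a sketch of one plausible form: an ordinary least-squares fit of the binary observation on several input probabilities plus an intercept, with the output clipped to [0, 1]. The function names, the three-column input, and the synthetic data are assumptions for illustration; the fit is evaluated in-sample here, whereas the study uses cross-validation.

```python
import numpy as np

def fit_linear_calibration(X, y):
    """Least-squares weights [w0, w1, ..., wn] for y ~ w0 + sum_i w_i * X[:, i]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def apply_linear_calibration(w, X):
    """Apply the weights, then clip because a linear model can leave [0, 1]."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.clip(A @ w, 0.0, 1.0)

def brier_score(p, o):
    """Mean squared difference between forecast probability and 1/0 outcome."""
    return float(np.mean((np.asarray(p) - np.asarray(o)) ** 2))

# Synthetic illustration only: an over-forecasting raw PQPF at three thresholds.
rng = np.random.default_rng(0)
true_p = rng.uniform(0.0, 0.6, size=2000)
obs = (rng.uniform(size=2000) < true_p).astype(float)   # binary 1/0 target
X = np.column_stack([
    np.clip(1.4 * true_p + 0.10, 0, 1),   # probability at the selected threshold
    np.clip(1.6 * true_p + 0.20, 0, 1),   # neighbouring lower threshold
    np.clip(1.2 * true_p, 0, 1),          # neighbouring higher threshold
])

w = fit_linear_calibration(X, obs)
calibrated = apply_linear_calibration(w, X)
bs_raw = brier_score(X[:, 0], obs)
bs_cal = brier_score(calibrated, obs)
rmse_cal = float(np.sqrt(np.mean((calibrated - obs) ** 2)))
print(f"Brier score raw={bs_raw:.4f}, calibrated={bs_cal:.4f}, RMSE^2={rmse_cal**2:.4f}")
```

The last line makes the slide's equivalence concrete: with a 1/0 target, the squared RMSE is the Brier score, so minimizing one minimizes the other.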

Three-layered feedforward artificial neural network (Hsu et al. 1995)
Sigmoid function: S = 1/(1 + exp(-a))
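A forward pass through such a network can be sketched in a few lines. The weight layout (bias first, then one weight per input) and the use of the sigmoid on the output node are assumptions made for this illustration; the slides give only the sigmoid itself.

```python
import math

def sigmoid(a):
    """S = 1 / (1 + exp(-a)), as on the slide."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w_hidden, w_out):
    """Three-layer feedforward pass.

    x        : list of input probabilities
    w_hidden : one list [bias, w1, ..., wn] per hidden node (assumed layout)
    w_out    : [bias, v1, ..., vm] for the single output node (assumed layout)
    """
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in w_hidden]
    return sigmoid(w_out[0] + sum(v * h for v, h in zip(w_out[1:], hidden)))

# 7 input probabilities and 4 hidden nodes, with made-up weights.
x = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1, 0.05]
w_hidden = [[0.1 * (j + 1)] + [0.2 * (j - 1)] * 7 for j in range(4)]
w_out = [-1.0, 0.5, -0.3, 0.8, 0.2]
p_out = forward(x, w_hidden, w_out)
print(f"calibrated probability: {p_out:.4f}")
```

Because the final activation is a sigmoid, the output always lies strictly between 0 and 1, so no extra constraint on the output is needed.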

Test of bias correction for PQPF
• Input: 7 probabilities of precipitation for all grids over each region
  Probabilities at the 0.01, 0.25, 0.75, 1, 1.25, 1.5, 2.0, 2.5, 3.0, 3.5, and 4.0 inch thresholds
  e.g., if 0.5 inches is the selected threshold, the input is the nearest 7 probabilities: at 0.01, 0.25, 0.75, 1, 1.25 inches
[Figure panels: input for 1 inch; input for thresholds 0.01, 0.25, 0.5 in.]
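How the "nearest 7" probabilities are chosen is not spelled out. One reading, sketched here, is a window of 7 consecutive thresholds centered on the selected one and shifted inward at either end of the list. The function name and the windowing rule are assumptions, and the threshold list below adds 0.5, which the slide names as a selected threshold but does not print in its list.

```python
# Assumed threshold list in inches (0.5 added; see the note above).
THRESHOLDS = [0.01, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

def nearest_thresholds(all_thresholds, selected, k=7):
    """Return k consecutive thresholds centered on `selected`, clamped at the ends."""
    ordered = sorted(all_thresholds)
    i = ordered.index(selected)
    start = max(0, min(i - k // 2, len(ordered) - k))
    return ordered[start:start + k]

for selected in (0.01, 0.25, 0.5, 1.0):
    print(selected, "->", nearest_thresholds(THRESHOLDS, selected))
```

Under this reading the 0.01, 0.25, and 0.5 inch thresholds all share one input set while 1 inch gets a shifted one, which agrees with the two figure panels named on the slide.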

Test of bias correction for PQPF (cont.)
• Input target probability: 1/0, the binary observation at the selected threshold (1 if the event occurred in the observation, otherwise 0)
• To increase efficiency, only the grids with non-zero probabilities for the selected threshold are used in both the training and verification data.
• Calculate the weights for each threshold over a region, then apply the weights to the data during the verification period.
• After the calibration is complete for all selected thresholds, a program checks that the probabilities are monotonic across the selected thresholds and corrects outliers (e.g., prob(1 inch) < prob(0.5 inches) < … < prob(0.01 inches)).
• In this study, bias correction of PQPF by linear regression gave similar (or somewhat better) results compared with a three-layered neural network. Linear regression is fast and easy to operate.
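The monotonicity check can be illustrated with a running minimum from the lowest threshold upward. The slides do not say how outliers are corrected, so capping each probability at the value of the next lower threshold is an assumption, as is the function name.

```python
def enforce_monotonic(probs_by_threshold):
    """Make probabilities non-increasing as the threshold increases.

    Each probability is capped at the smallest value seen at any lower
    threshold (assumed correction rule).
    """
    corrected = {}
    cap = 1.0
    for threshold in sorted(probs_by_threshold):
        cap = min(cap, probs_by_threshold[threshold])
        corrected[threshold] = cap
    return corrected

# Calibrating each threshold separately can leave an outlier such as 0.5 in. here.
calibrated = {0.01: 0.60, 0.25: 0.40, 0.5: 0.45, 1.0: 0.10}
fixed = enforce_monotonic(calibrated)
print(fixed)
```

Because each threshold is calibrated with its own weights, nothing in the fit itself guarantees a consistent ordering, which is why a separate pass of this kind is needed.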

Study domain: CONUS divided into west, central, and east
CONUS is roughly divided by hydrological regions (USGS HUC4 data)
[Map] Blue: west, green: central, red: east

Calibration for warm season, 2006
Attributes diagrams (reliability curves), Brier scores
Regional calibration: west, central, and east
Cross-validation for April-September 2006: the verification period rotates through each month, and data from the remaining five months are used as the training period.
Thresholds: 0.1, 0.25, 0.5, 1 inches/24 h
Composite results from linear regression:
(1) results for each region, composited from the individual months of the whole 2006 warm season
(2) results for CONUS, composited from the whole 2006 warm season and the three regions
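The two verification measures used in the following slides can be sketched as below. The bin count of 10 and the function names are assumptions; the forecasts and observations are made up to keep the arithmetic easy to follow.

```python
def brier_score(probs, obs):
    """Mean squared difference between forecast probability and 1/0 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)

def reliability_curve(probs, obs, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) per non-empty bin."""
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, o in zip(probs, obs):
        b = min(int(p * n_bins), n_bins - 1)   # p = 1.0 falls in the last bin
        bins[b][0] += p
        bins[b][1] += o
        bins[b][2] += 1
    return [(sp / n, so / n, n) for sp, so, n in bins if n]

# Made-up forecasts and binary observations at one threshold.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
obs = [0, 0, 0, 1, 1, 1, 1, 0]

bs = brier_score(probs, obs)
curve = reliability_curve(probs, obs)
print(f"Brier score: {bs:.3f}")
for mean_p, freq, count in curve:
    print(f"forecast {mean_p:.2f} -> observed {freq:.2f} (n={count})")
```

A perfectly reliable forecast has the observed frequency equal to the mean forecast probability in every bin; bins with few samples, as at high thresholds, give noisy points on the curve.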

Bias correction of PQPF over the west U.S., day 1
Reliability diagrams (blue: old, red: new), April-Sept 2006
Thresholds: 0.1, 0.25, 0.5, 1 inches/24 h
Reliability improved for lower thresholds
Because of small samples, higher thresholds did not improve and lack high probabilities

Bias correction of PQPF over the west U.S., day 2
Reliability diagrams (blue: old, red: new), April-Sept 2006
Reliability improved for lower thresholds
Because of small samples, higher thresholds did not improve and lack high probabilities

Bias correction of PQPF over the west U.S., day 3
Reliability diagrams (blue: old, red: new), April-Sept 2006
Reliability improved for lower thresholds
Because of small samples, higher thresholds did not improve and lack high probabilities

Bias correction of PQPF over the central U.S., day 1
Reliability diagrams (blue: old, red: new), April-Sept 2006
Reliability improved
Higher thresholds lack high probabilities

Bias correction of PQPF over the central U.S., day 2
Reliability diagrams (blue: old, red: new), April-Sept 2006
Reliability improved
Higher thresholds lack high probabilities

Bias correction of PQPF over the central U.S., day 3
Reliability diagrams, cross-validation (blue: old, red: new), April-Sept 2006
Reliability improved
Higher thresholds lack high probabilities

Bias correction of PQPF over the east U.S., day 1
Reliability diagrams, cross-validation (blue: old, red: new), April-Sept 2006
Reliability improved

Bias correction of PQPF over the east U.S., day 2
Reliability diagrams, cross-validation (blue: old, red: new), April-Sept 2006
Reliability improved
Higher thresholds lack high probabilities

Bias correction of PQPF over the east U.S., day 3
Reliability diagrams, cross-validation (blue: old, red: new), April-Sept 2006
Reliability improved for lower thresholds
Higher thresholds lack high probabilities
Overcorrection for higher thresholds

Calibration for warm season, 2006, CONUS, day 1
Bias correction of PQPF over CONUS, day 1
Attributes diagrams (blue: old, red: linear regression)
Cross-validation: Apr-Sept 2006
Brier scores (BS) decreased
Reliability improved

Calibration for warm season, 2006, CONUS, day 2
Bias correction of PQPF over CONUS, day 2
Attributes diagrams (blue: old, red: linear regression)
Cross-validation: Apr-Sept 2006
Brier scores (BS) decreased
Reliability improved

Calibration for warm season, 2006, CONUS, day 3
Bias correction of PQPF over CONUS, day 3
Attributes diagrams (blue: old, red: linear regression)
Cross-validation: Apr-Sept 2006
Brier scores (BS) decreased
Overcorrection: dry bias for higher thresholds

Calibration for cool season, 2006
Attributes diagrams
Regional calibration: west, central, and east
Cross-validation for Oct-Dec 2006: one month is selected as the validation period, and the remaining two months are used as the training data (some data are missing for December 2006).
Thresholds: 0.1, 0.25, 0.5, 1 inches/24 h
Results for CONUS are composited from the whole cool season and the three regions in 2006:
(1) results by linear regression
(2) comparison of results by linear regression and neural networks

Calibration for cool season, using the warm season as training data
Bias correction of PQPF over CONUS, day 1
Attributes diagrams, linear regression (blue: old, red: new)
Oct-Dec 2006, using training data from Mar-Sept 2006
Brier scores decreased for lower thresholds
Overcorrection!

Bias correction of PQPF over CONUS, day 1
Attributes diagrams (blue: old, red: linear regression)
Cross-validation: Oct-Dec 2006
Improved reliability curves
Brier scores decreased

Bias correction of PQPF over CONUS, day 2
Attributes diagrams (blue: old, red: linear regression)
Thresholds: 0.1, 0.25, 0.5, 1 inch/24 h
Cross-validation: Oct-Dec 2006
Improved reliability curves
Brier scores decreased

Bias correction of PQPF over CONUS, day 3
Attributes diagrams (blue: old, red: linear regression)
Thresholds: 0.1, 0.25, 0.5, 1 inch/24 h
Cross-validation: Oct-Dec 2006
Reliability curves improved and Brier scores decreased, except for 1 inch

Linear regression vs. neural networks (cont.)
Bias correction of PQPF over CONUS, day 1
Attributes diagrams (blue: old, green: ANN, red: linear regression)
Cross-validation: Oct-Dec 2006
ANN: overfitting problem; larger BS; for 1 inch, calibration gave worse results than the original forecasts

Linear regression vs. neural networks
Bias correction of PQPF over CONUS, day 3
Attributes diagrams (blue: old, green: ANN, red: linear regression)
Cross-validation: Oct-Dec 2006
ANN: overfitting problem; larger BS; calibration gave worse BS and reliability curves for thresholds > 0.25 inches

Weights in linear regression and neural networks
Linear regression: needs constraints to keep the output within [0, 1]; simple weights
ANN: output is within [0, 1]; complicated weights; overfitting problem, sensitive to the number of nodes and parameters
Number of ANN parameters: npa = (nInput + 1) * nHidden + (nHidden + 1) * nOutput
e.g., weights for 0.01 in. by ANN (4 hidden nodes) in a):
-0.789112 2.564361 -2.533994 0.158898 3.922368 2.309217 3.901703 -0.373235 3.325226 -2.178053 0.051663 3.143624 0.654948 3.123569 3.480210 1.493520 -0.042751 0.318624 0.597600 0.203516 3.993636 -2.894761 2.564028 0.528968 2.015937 -0.403388 -2.075513 -0.385473 1.548653 1.432035 1.895414 -1.607493 7.364338 83.107254 25.844067 -7.248751 -96.884842
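As a quick check of the parameter-count formula, the sketch below evaluates it for 7 inputs, 4 hidden nodes, and 1 output, and compares the result with the number of weights listed on the slide. The function and variable names are chosen for this illustration only.

```python
def ann_param_count(n_input, n_hidden, n_output):
    """npa = (nInput + 1) * nHidden + (nHidden + 1) * nOutput; the +1 terms are biases."""
    return (n_input + 1) * n_hidden + (n_hidden + 1) * n_output

# Weights listed on the slide for the 0.01 in. threshold (4 hidden nodes).
listed_weights = [
    -0.789112, 2.564361, -2.533994, 0.158898, 3.922368, 2.309217, 3.901703, -0.373235,
    3.325226, -2.178053, 0.051663, 3.143624, 0.654948, 3.123569, 3.480210, 1.493520,
    -0.042751, 0.318624, 0.597600, 0.203516, 3.993636, -2.894761, 2.564028, 0.528968,
    2.015937, -0.403388, -2.075513, -0.385473, 1.548653, 1.432035, 1.895414, -1.607493,
    7.364338, 83.107254, 25.844067, -7.248751, -96.884842,
]

npa = ann_param_count(n_input=7, n_hidden=4, n_output=1)
print(npa, len(listed_weights))
```

Both numbers come out to 37, against 8 weights (7 inputs plus an intercept) for a linear regression on the same inputs, which is consistent with the slide's point that the ANN is harder to constrain with small samples.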

Summary
• Overall, daily PQPF from the SREF system has good reliability with slight overestimation, and results are better for thresholds smaller than 1 inch.
• Calibration of PQPF: using training data from the same season (warm or cool), the reliability curves of PQPF for days 1, 2, and 3 improved after bias correction for lower thresholds, but did not improve much for higher thresholds, especially over the drier western U.S.
• Such calibration aims to decrease the conditional bias in PQPF. Most calibrations decrease the Brier score; a few calibrations for higher thresholds show somewhat increased Brier scores because of overcorrection.
• Sample size for higher thresholds is a problem for building stable training weights in both linear regression and neural networks, and it causes more severe overfitting in neural networks (the neural network parameters need careful testing). Linear regression is easy to operate and can be updated quickly and adaptively.

Future work
• Next step: install the calibration codes (linear regression first) in the real-time postprocessing of the NCEP SREF system
  update daily
  calibrate PQPF for 3-h, 6-h, and 24-h accumulations
  thresholds: 0.01, 0.25, 0.5, 1, and 2 inches
  regional calibration (by individual RFC?) as more data become available
• Any suggestions for improving short-range PQPF or QPF?