Verification of nowcasts and very short range forecasts

Verification of nowcasts and very short range forecasts
Beth Ebert, BMRC, Australia
WWRP International Symposium on Nowcasting and Very Short Range Forecasting, Toulouse, 5-9 September 2005

Why verify forecasts?
• To monitor performance over time → summary scores
• To evaluate and compare forecast systems → continuous and categorical scores
• To show the impact of the forecast → skill & value scores
• To understand error in order to improve the forecast system → diagnostic methods
The verification approach taken depends on the purpose of the verification.

Verifying nowcasts and very short range forecasts

Nowcast characteristic                       | Impact on verification
concerned mainly with high impact weather    | rare events are difficult to verify in a systematic manner
may detect severe weather elements           | storm spotter observations & damage surveys required
observations-based                           | same observations often used to verify the nowcasts
high temporal frequency                      | many nowcasts to verify
high spatial resolution                      | observation network usually not dense enough (except radar)
small spatial domain                         | relatively small number of standard observations

Observations – issues for nowcasts

Thunderstorms and severe weather (mesocyclones, hail, lightning, damaging winds)
• Spotter observations may contain error
• Biased observations:
  • More observations during daytime & in populated areas
  • More storm reports when warnings were in effect
• Cell mis-association by cell tracking algorithms

Precipitation
• Radar rain rates contain error
• Scale mismatch between gauge observations and radar pixels

Observation error can be large but is usually neglected → more research required on handling observation error

Matching forecasts and observations
• The matching approach depends on:
  • Nature of the forecasts and observations – scale, consistency, sparseness
  • Verification goals
  • Use of the forecasts
  • Other matching criteria
• Matching options include point-to-grid, grid-to-point, and grid-to-grid
• Grid-to-grid approach: overlay the forecast and observed grids and match each forecast and observation point
The matching approach can impact the verification results.

Forecast Quality Definitions

Wilson subjective categories:
1 – forecast and observed show almost perfect overlap.
2 – the majority of observed and forecast echoes overlap, or offsets are < 50 km.
3 – forecast and observed look similar, but there are a number of echo offsets and several areas may be missing or extra.
4 – the forecast and observed are significantly different with very little overlap, but some features are suggestive of what actually occurred.
5 – there is no resemblance between forecast and observed.

First rule of forecast verification – look at the results!

Systematic verification – many cases

Aggregation and stratification
• Aggregation: more samples → more robust statistics
  • Across time – results for each point in space
  • Across space – results for each time
  • Across space and time – results summarized across the spatial region and across time
• Stratification: homogeneous subsamples → better understanding of how errors depend on regime
  • By location or region
  • By time period (diurnal or seasonal variation)

Real-time nowcast verification
• Rapid feedback from the latest radar scan
• Evaluate the latest objective guidance while it is still "fresh"
• Better understand strengths and weaknesses of the nowcast system
• Tends to be subjective in nature
• Not commonly performed!
A Real Time Forecast Verification system (RTFV) is under development in BMRC.

Post-event verification
• More observations may be available → verification results more robust
• No single measure is adequate! Several metrics are needed:
  • Frequency bias, POD, FAR, CSI
  • Distributions-oriented verification: scatter plots, (multi-category) contingency tables, box-whisker plots
• Confidence intervals are recommended, especially when comparing one set of results with another
  • The bootstrap (resampling) method is simple to apply
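
The bootstrap (resampling) approach mentioned above is easy to sketch in code. A minimal Python illustration, assuming paired forecast/observation samples and a generic score function (the names, synthetic data and number of resamples are illustrative, not from the presentation):

```python
import numpy as np

def bootstrap_ci(score_fn, fcst, obs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a verification score."""
    rng = np.random.default_rng(seed)
    n = len(fcst)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample forecast/obs pairs with replacement
        scores.append(score_fn(fcst[idx], obs[idx]))
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Example: 95% interval for the mean absolute error of synthetic data
fcst = np.random.default_rng(1).normal(size=200)
obs = fcst + np.random.default_rng(2).normal(scale=0.5, size=200)
mae = lambda f, o: np.mean(np.abs(f - o))
print(bootstrap_ci(mae, fcst, obs))
```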

Accuracy – categorical verification

2×2 contingency table of forecast (yes/no) versus observed (yes/no):
  H = hits, M = misses, F = false alarms, CR = correct rejections, N = total

Standard categorical verification scores:
  PC = (H + CR) / N                              proportion correct (accuracy)
  Bias = (F + H) / (M + H)                       frequency bias
  POD = H / (H + M)                              probability of detection
  POFD = F / (CR + F)                            probability of false detection
  FAR = F / (H + F)                              false alarm ratio
  CSI = H / (H + M + F)                          critical success index (threat score)
  ETS = (H – Hrandom) / (H + M + F – Hrandom)    equitable threat score
  HSS = (H + CR – PCrandom) / (N – PCrandom)     Heidke skill score
  HK = POD – POFD                                Hanssen and Kuipers discriminant
  OR = (H × CR) / (F × M)                        odds ratio
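
As a worked illustration of the scores above, a minimal Python sketch that computes them from the four cell counts (the function name, variable names and example counts are my own, not from the presentation):

```python
def categorical_scores(H, M, F, CR):
    """Standard 2x2 contingency-table scores from hits, misses, false alarms, correct rejections."""
    N = H + M + F + CR
    H_random = (H + M) * (H + F) / N                               # hits expected by chance
    PC_random = ((H + M) * (H + F) + (CR + M) * (CR + F)) / N      # correct forecasts expected by chance
    POD = H / (H + M)
    POFD = F / (CR + F)
    return {
        "PC":   (H + CR) / N,                                      # proportion correct
        "Bias": (F + H) / (M + H),                                 # frequency bias
        "POD":  POD,                                               # probability of detection
        "POFD": POFD,                                              # probability of false detection
        "FAR":  F / (H + F),                                       # false alarm ratio
        "CSI":  H / (H + M + F),                                   # critical success index
        "ETS":  (H - H_random) / (H + M + F - H_random),           # equitable threat score
        "HSS":  (H + CR - PC_random) / (N - PC_random),            # Heidke skill score
        "HK":   POD - POFD,                                        # Hanssen and Kuipers discriminant
        "OR":   (H * CR) / (F * M),                                # odds ratio
    }

print(categorical_scores(H=55, M=23, F=39, CR=222))                # illustrative counts
```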

Accuracy – continuous verification

Standard continuous verification scores, computed over the entire domain from the forecast F and the observations O:
  bias = mean error
  MAE  = mean absolute error
  RMSE = root mean square error
  r    = correlation coefficient
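
A corresponding Python sketch for these continuous scores, assuming the forecast and observation fields are arrays on the same grid (names are illustrative):

```python
import numpy as np

def continuous_scores(F, O):
    """Bias, MAE, RMSE and correlation between a forecast field F and observations O."""
    F, O = np.ravel(F), np.ravel(O)
    return {
        "bias": np.mean(F - O),                  # mean error
        "MAE":  np.mean(np.abs(F - O)),          # mean absolute error
        "RMSE": np.sqrt(np.mean((F - O) ** 2)),  # root mean square error
        "r":    np.corrcoef(F, O)[0, 1],         # correlation coefficient
    }
```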

Accuracy – probabilistic verification

Standard probabilistic verification scores/methods:
• Reliability diagram
• Relative operating characteristic (ROC)
• Brier score
• Brier skill score
• Ranked probability score
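
For two of the listed measures, the Brier score and Brier skill score, a minimal Python sketch (using a climatological base-rate reference for the skill score is a common convention assumed here, not something stated on the slide):

```python
import numpy as np

def brier_score(p, o):
    """Brier score for probability forecasts p and binary observations o (0/1)."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_skill_score(p, o):
    """Skill relative to a constant climatological (base-rate) forecast."""
    o = np.asarray(o, float)
    bs = brier_score(p, o)
    bs_ref = brier_score(np.full_like(o, o.mean()), o)
    return 1.0 - bs / bs_ref
```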

Skill

A forecast has skill if it is more accurate than a reference forecast (usually persistence, cell extrapolation, or random chance). Skill scores measure the relative improvement of the forecast over the reference forecast.

Strategy 1: Plot the performance of the forecast system and the unskilled reference on the same diagram.
Strategy 2: Plot the value of the skill score itself.

[Figure: Hanssen & Kuipers score for the nowcast, extrapolation and gauge persistence, and skill with respect to gauge persistence, as a function of forecast lead time (30–180 min) for thresholds > 0 mm, > 1 mm and > 5 mm]
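
The generic skill-score formula implied here (the equation on the slide is an image, so this is the standard textbook form rather than a quotation):

```latex
SS = \frac{\text{score}_{\text{forecast}} - \text{score}_{\text{reference}}}
          {\text{score}_{\text{perfect}} - \text{score}_{\text{reference}}}
```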

Practically perfect hindcast – an upper bound on accuracy

Approach: If the forecaster had all of the observations in advance, what would the "practically perfect" forecast look like?
• Apply a smoothing function to the observations to get probability contours, then choose an appropriate yes/no threshold
• Did the actual forecast look like the practically perfect forecast?
• How did the performance of the actual forecast compare to the performance of the practically perfect forecast?

Example: SPC convective outlook CSI = 0.34, practically perfect hindcast CSI = 0.48 – the convective outlook was 75% of the way to being "practically perfect" (Kay and Brooks, 2000).
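
A minimal Python sketch of the "practically perfect" construction, assuming gridded binary observations, a Gaussian smoother, and an arbitrary probability threshold (none of these choices come from Kay and Brooks, 2000; they only illustrate the recipe above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def practically_perfect(obs_binary, sigma=2.0, threshold=0.25):
    """Smooth binary observations into a probability-like field, then threshold to a yes/no forecast."""
    prob = gaussian_filter(np.asarray(obs_binary, float), sigma=sigma)  # smoothed "probability" contours
    return prob, prob >= threshold                                      # practically perfect yes/no field
```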

"Double penalty" Event predicted where it did not occur, no event predicted where it

"Double penalty" Event predicted where it did not occur, no event predicted where it did occur Big problem for nowcasts and other high resolution forecasts Ex: Two rain forecasts giving the same volume fcst obs 10 10 High resolution forecast RMS ~ 4. 7 POD=0, FAR=1, CSI=0 fcst 3 obs 10 Low resolution forecast RMS ~ 2. 7 POD~1, FAR~0. 7, CSI~0. 3 15

Value

A forecast has value if it helps a user make a better decision. Value scores measure the relative economic value of the forecast over some reference forecast. The expense depends on the cost of taking preventative action and the loss incurred for a missed event.
• For small or rare events with high losses, value is maximized by over-prediction
• For events with high costs where displacement error is likely, value is maximized by under-prediction

The most accurate forecast is not always the most valuable! (Baldwin and Kain, 2004)
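
Value scores of this kind are usually written in cost-loss terms; the formula below is the standard relative-value form (the slide's own equation is an image), where E denotes the expected expense of a user following each forecasting strategy:

```latex
V = \frac{E_{\text{climate}} - E_{\text{forecast}}}{E_{\text{climate}} - E_{\text{perfect}}}
```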

Exact match vs. "close enough"

Need we get a high resolution forecast exactly right? Often "close" is still useful to a forecaster.

YES, an exact match is needed for:
• High stakes situations (e.g. space shuttle launch, hurricane landfall)
• Hydrological applications (e.g. flash floods)
• Topographically influenced weather (valley winds, orographic rain, etc.)
→ Standard verification methods are appropriate (POD, FAR, CSI, bias, RMSE, correlation, etc.)

NO, "close enough" is adequate for:
• Guidance for forecasters
• Model validation (does it predict what we expect it to predict?)
• Cases where the observations may not allow standard verification of high resolution forecasts
→ "Fuzzy" verification methods and diagnostic methods verify attributes of the forecast

"Fuzzy" verification methods Large forecast and observed variability at high resolution Sydney • Fuzzy

"Fuzzy" verification methods Large forecast and observed variability at high resolution Sydney • Fuzzy verification methods don't require an exact match between forecasts and observations to get a good score • Vary the size of the space / time neighborhood around a point • • • Damrath, 2004 Rezacova and Sokol, 2004 * Theis et al. , 2005 Roberts, 2004 * Germann and Zawadski, 2004 t-1 t t+1 Forecast value Also vary magnitude, other elements • • Forecasters don't (shouldn't!) take a high resolution forecast at face value – instead they interpret it in a probabilistic way. Frequency • Atger, 2001 Evaluate using categorical, continuous, probabilistic scores / methods * Giving a talk in this Symposium 18

Spatial multi-event contingency table (Atger, 2001)

Verify using the Relative Operating Characteristic (ROC), which measures how well the forecast can separate events from non-events based on some decision threshold.

Decision thresholds to vary:
• magnitude (e.g. 1 mm h-1 to 20 mm h-1)
• distance from the point of interest (e.g. within 10 km, ..., within 100 km)
• timing (e.g. within 1 h, ..., within 12 h)
• anything else that may be important in interpreting the forecast

Can be applied to ensembles, and to compare deterministic forecasts with ensemble forecasts.

[Figures: ROC curves for a single threshold, for varying rain threshold, and for an ensemble forecast with varying rain threshold]
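
A minimal Python sketch of sweeping one decision threshold (here forecast magnitude) to build ROC points from matched forecast/observation pairs; the event definition and names are illustrative, and both events and non-events are assumed to be present in the sample:

```python
import numpy as np

def roc_points(fcst, obs, obs_threshold, decision_thresholds):
    """(POFD, POD) pairs for a set of forecast decision thresholds."""
    fcst, obs = np.asarray(fcst), np.asarray(obs)
    event = obs >= obs_threshold
    points = []
    for t in decision_thresholds:
        warn = fcst >= t
        H = np.sum(warn & event)
        M = np.sum(~warn & event)
        F = np.sum(warn & ~event)
        CR = np.sum(~warn & ~event)
        points.append((F / (F + CR), H / (H + M)))   # false alarm rate, hit rate
    return points
```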

Object- and entity-based verification
• Consistent with human interpretation
• Provides diagnostic information on whole-system properties: location, amplitude, size, shape

Techniques:
• Contiguous Rain Area (CRA) verification (Ebert and McBride, 2000)
• NCAR object-oriented approach* (Brown et al., 2004)
• Cluster analysis (Marzban and Sandgathe, 2005), e.g. 8 clusters identified in x-y-p space for MM5
• Composite method (Nachamkin, 2004)

[Figure: NCAR example with matched forecast objects (Af–Df) and observed objects (Ao–Do)]

Contiguous Rain Area (CRA) verification (Ebert and McBride, 2000)
• Define entities using a threshold (Contiguous Rain Areas)
• Horizontally translate the forecast until a pattern matching criterion is met:
  • minimum total squared error
  • maximum correlation
  • maximum overlap
• The displacement is the vector difference between the original and final locations of the forecast
• Compare properties of the matched entities: area, mean intensity, maximum intensity, shape, etc.
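
A minimal Python sketch of the translation step, using the minimum-total-squared-error criterion from the list above; the search box and the wrap-around shifting via np.roll are simplifying assumptions, not details of the CRA method itself:

```python
import numpy as np

def best_shift(fcst, obs, max_shift=10):
    """Displacement (dy, dx) that minimizes the squared error between the shifted forecast and obs."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(fcst, dy, axis=0), dx, axis=1)  # note: wraps at domain edges
            err = np.sum((shifted - obs) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err
```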

Error decomposition methods
• Attempt to quantify the causes of the errors
• Some approaches:
  • CRA verification (Ebert and McBride, 2000):
    MSE_total = MSE_displacement + MSE_volume + MSE_pattern
  • Feature calibration and alignment (Nehrkorn et al., 2003):
    E(x, y) = E_phase(x, y) + E_local_bias(x, y) + E_residual(x, y)
  • Acuity-fidelity approach (Marshall et al., 2004): minimize the cost function
    J = J_distance + J_timing + J_intensity + J_misses
    from the perspectives of both the forecast (fidelity) and the observations (acuity)
  • Error separation (Ciach and Krajewski, 1999):
    MSE_forecast = MSE_true + MSE_reference
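
Continuing the CRA sketch above, the split of total MSE into displacement, volume and pattern terms can be computed from the original and optimally shifted forecasts; this follows my reading of the Ebert and McBride (2000) definitions and should be treated as a sketch rather than a reference implementation:

```python
import numpy as np

def cra_decomposition(fcst, fcst_shifted, obs):
    """CRA-style split of total MSE into displacement, volume and pattern components."""
    mse_total = np.mean((fcst - obs) ** 2)
    mse_shifted = np.mean((fcst_shifted - obs) ** 2)
    mse_displacement = mse_total - mse_shifted                  # error removed by the optimal shift
    mse_volume = (np.mean(fcst_shifted) - np.mean(obs)) ** 2    # bias in the mean amount after shifting
    mse_pattern = mse_shifted - mse_volume                      # remaining fine-scale (pattern) error
    return mse_total, mse_displacement, mse_volume, mse_pattern
```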

Scale separation methods
• Measure correspondence between forecast and observations at a variety of spatial scales
• Some approaches:
  • Multiscale statistical properties (Zepeda-Arce et al., 2000; Harris et al., 2001)
  • Scale recursive estimation (Tustison et al., 2003)
  • Intensity-scale approach* (Casati et al., 2004)

[Figure: scales l = 0 to 3, associated with satellite, model, radar and rain gauge observations]
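
A minimal Python sketch of the general idea, comparing forecast and observations after averaging to successively coarser scales by block averaging; the cited approaches use more sophisticated multiscale and wavelet decompositions, so this only illustrates the concept:

```python
import numpy as np

def block_average(field, factor):
    """Average a 2-D field over non-overlapping factor x factor blocks."""
    ny, nx = field.shape
    ny2, nx2 = ny - ny % factor, nx - nx % factor      # trim to a multiple of the block size
    f = field[:ny2, :nx2].reshape(ny2 // factor, factor, nx2 // factor, factor)
    return f.mean(axis=(1, 3))

def mse_by_scale(fcst, obs, factors=(1, 2, 4, 8)):
    """MSE between forecast and observations at successively coarser scales."""
    return {k: np.mean((block_average(fcst, k) - block_average(obs, k)) ** 2) for k in factors}
```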

Summary
• Nowcasts and very short range forecasts present some unique challenges for verification:
  • High impact weather
  • High resolution forecasts
  • Imperfect observations
• There is still a place for standard scores:
  • Historical reasons
  • When highly accurate forecasts are required
  • Useful for monitoring improvement
• Must use several metrics
• Please quantify uncertainty, especially when intercomparing forecast schemes
• Compare with an unskilled forecast such as persistence

Summary (cont'd)
• Evolving concept of what makes a "good" forecast:
  • Recognizing the value of "close enough"
  • Probabilistic view of deterministic forecasts
• Exciting new developments in diagnostic methods to better understand the nature and causes of forecast errors:
  • Object- and entity-based verification
  • Error decomposition
  • Scale separation

http://www.bom.gov.au/bmrc/wefor/staff/eee/verif_web_page.html