SPF Development Making an SPF

Overview
• Tools available for highway safety analysis
• Why develop Safety Performance Functions (SPFs)?
• How to improve the development process
• Demonstration of SPF development tool

There are limited resources for SPF development
• SPF Application
  – What are SPFs
  – How are SPFs used
• SPF Calibration
  – Why calibrate
  – How to calibrate
• SPF Development
  – When to calibrate/develop
  – What are considerations
  – How to develop
Resources
• Guides
• Software
• Training
Resources
• Guides

IHSDM and Safety Analyst can calibrate existing SPFs (but provide no goodness-of-fit measures)
SPF(calibrated) = SPF(uncalibrated) × Calibration Factor
Not supported (Х):
• Calibration Function Х
• Goodness-of-fit Measures Х
• SPF Development Х

The Calibrator is good for calibrating existing SPFs (planning or design-level)
SPF(calibrated) = SPF(uncalibrated) × Calibration Factor (or Function)
• Calibration Factor
• Calibration Function
• Goodness-of-fit Measures
• SPF Development Х
https://safety.fhwa.dot.gov/rsdp/toolbox-content.aspx?toolid=150

What calibration can do…
[Figure: predicted crash frequency vs. traffic volume]

What calibration can’t do…
[Figure: predicted crash frequency vs. traffic volume]
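A calibration factor is commonly computed as the ratio of total observed crashes to total predicted crashes across the calibration sites. The sketch below assumes that definition; the site data and the function name `calibration_factor` are invented for illustration.

```python
# Hypothetical calibration sites: observed crash counts and the counts
# predicted by an uncalibrated SPF for the same sites and period.
observed = [3, 0, 5, 2, 7]
predicted = [2.1, 0.8, 3.9, 1.6, 4.6]


def calibration_factor(observed, predicted):
    """Ratio of total observed to total predicted crashes (assumed definition)."""
    return sum(observed) / sum(predicted)


c = calibration_factor(observed, predicted)

# Calibrated SPF = uncalibrated SPF * calibration factor, site by site.
calibrated = [c * p for p in predicted]

print(f"Calibration factor: {c:.3f}")
print(f"Total observed: {sum(observed)}, total calibrated: {sum(calibrated):.1f}")
```

Because every prediction is multiplied by the same constant, the ratio between the predictions at any two traffic volumes is unchanged. A single factor can raise or lower the curve, but it cannot change its shape.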

What if the decision is to develop?
• Limited SPF development resources
  – Guides
    • 2 primary resources
  – Software
    • General statistical packages can be challenging for engineers and planners
  – Training
    • No continual training
    • Drs. Hauer and Srinivasan offered two courses that served as the basis for the textbook

SPF Development
• Negative binomial regression
• Relate crashes to road volume
  – Parameters help describe this relationship
• Can be developed using:
  – Statistical software
  – Excel
• Whether calibrating or developing…
  – Consider segmentation
  – Consider assessment

Considerations for SPF Development
• Segmentation
  – How similar are your road segments or intersections?
• Model type
  – Regression technique
  – Model form
• Assessment
  – How to improve

Segmentation for Safety

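Classic Segmentation Quandary
[Figure: one route broken wherever an attribute changes: lane width (10’ vs. 12’), number of lanes (2 vs. 4), shoulder width (2’ vs. 4’), ADT (400, 600, 1000), area type (rural vs. urban), alignment (curve vs. tangent), and county (County 1 vs. County 2). Result: 10 segments!]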
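The quandary can be pictured as a merge rule: adjacent pieces of road stay in one segment only while every tracked attribute is unchanged. The sketch below assumes the route is already stored as ordered pieces with begin and end mileposts; the field names and values are hypothetical.

```python
def merge_homogeneous(pieces, attributes):
    """Merge adjacent pieces that share the same value for every listed attribute."""
    segments = []
    for piece in pieces:
        key = tuple(piece[name] for name in attributes)
        last = segments[-1] if segments else None
        if last and last["key"] == key and last["end"] == piece["begin"]:
            last["end"] = piece["end"]  # same attributes: extend the current segment
        else:
            segments.append({"begin": piece["begin"], "end": piece["end"], "key": key})
    return segments


# Hypothetical route inventory, ordered by milepost.
pieces = [
    {"begin": 0.0, "end": 1.0, "lanes": 2, "lane_width": 10, "adt": 400},
    {"begin": 1.0, "end": 1.5, "lanes": 2, "lane_width": 10, "adt": 400},
    {"begin": 1.5, "end": 2.0, "lanes": 2, "lane_width": 12, "adt": 400},
    {"begin": 2.0, "end": 3.0, "lanes": 4, "lane_width": 12, "adt": 600},
]

by_lanes = merge_homogeneous(pieces, ["lanes"])
by_all = merge_homogeneous(pieces, ["lanes", "lane_width", "adt"])

print(len(by_lanes), "segments when breaking on lanes only")
print(len(by_all), "segments when breaking on lanes, lane width, and ADT")
```

Each attribute added to the break list can only keep or increase the segment count, which is how a short route ends up as many short segments.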

Roadway Homogeneity
• Roadways with similar geometrics are considered homogeneous
  – Number of lanes
  – Medians
  – Roadway widths
  – Terrain
• This similarity is found to be helpful in safety modeling
• Heterogeneous roads produce poor models

Modeling Technique
• A Poisson distribution is ideal for a specific roadway segment
  – The variance is equal to the mean
• At the network level, crashes exhibit a large variance and a small mean
  – The variance is greater than the mean
• This is known as overdispersion
• A more appropriate distribution is the Poisson-gamma, or negative binomial, distribution
• Results in two parameters:
  – The mean and the overdispersion (or shape) parameter
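A quick way to see overdispersion in a set of counts is to compare the sample variance with the sample mean. The sketch below also backs out a method-of-moments value for k, assuming the common negative binomial variance form Var = mean + k × mean². The crash counts are invented, and an actual SPF would estimate the parameter through regression on AADT rather than this way.

```python
from statistics import mean, variance

# Hypothetical crash counts for a group of similar segments.
counts = [0, 0, 1, 0, 3, 0, 7, 1, 0, 12, 2, 0]

m = mean(counts)
v = variance(counts)  # sample variance

# Poisson would imply variance == mean; a larger variance signals overdispersion.
overdispersed = v > m

# Method-of-moments estimate under Var = mean + k * mean^2 (assumed form).
k_mom = (v - m) / m**2
theta_mom = 1.0 / k_mom  # inverse overdispersion

print(f"mean={m:.3f}, variance={v:.3f}, overdispersed={overdispersed}")
print(f"k={k_mom:.3f}, theta={theta_mom:.3f}")
```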

Evaluating State SPFs
• The overdispersion parameter (OP) is a crude way to evaluate SPFs
• OP is reported as Theta in RStudio
• Theta is the inverse of k as reported by many other statistics packages
• As such, as Theta gets larger, more weight is assigned to the SPF in the Empirical Bayes analysis

Overdispersion Distinctions
More intuitive: as k increases, so does overdispersion
Overdispersion parameter
• HSM
• Statistical packages: SPSS, SAS
• Lower value -> more statistically reliable model
Inverse overdispersion
• AKA shape parameter
• Statistical packages: Stata, R
• Higher value -> more statistically reliable model
References
Ezra Hauer, Douglas Harwood, Forrest Council, and Michael Griffith. Estimating Safety by the Empirical Bayes Method: A Tutorial. Transportation Research Record: Journal of the Transportation Research Board, 2002.
AASHTO. Highway Safety Manual. First Edition. American Association of State Highway and Transportation Officials, Washington, D.C., 2010.

Empirical Bayes
Estimate crashes (corrected for regression to the mean)
EB = (Weight) × SPF + (1 − Weight) × Observed Crashes

Empirical Bayes Weight
• Equation 3-10 from HSM

Overdispersion
• High theta: the SPF is a good representation of the data
• Low theta: greater reliance on the crash data
[Figures: observed crashes and SPF for each case]
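The effect of Theta on the Empirical Bayes estimate can be illustrated with the form commonly given for the HSM weight, w = 1 / (1 + k × predicted crashes), where k = 1/Theta. That form, the helper names, and the site values (10 predicted, 16 observed) are assumptions for this sketch; the two Theta values are those reported for the Rural Parkway and Rural 4-Lane Divided SPFs in this deck.

```python
def eb_weight(theta, predicted):
    """Weight given to the SPF prediction, assuming w = 1 / (1 + k * predicted)."""
    k = 1.0 / theta  # overdispersion parameter is the inverse of Theta
    return 1.0 / (1.0 + k * predicted)


def eb_estimate(theta, predicted, observed):
    """EB = weight * SPF + (1 - weight) * observed crashes."""
    w = eb_weight(theta, predicted)
    return w * predicted + (1.0 - w) * observed


# Hypothetical site: SPF predicts 10 crashes, 16 were observed.
predicted, observed = 10.0, 16.0

for theta in (7.96, 0.9345):
    w = eb_weight(theta, predicted)
    eb = eb_estimate(theta, predicted, observed)
    print(f"Theta={theta}: weight={w:.3f}, EB estimate={eb:.2f}")
```

With the higher Theta the estimate sits closer to the SPF prediction; with the lower Theta it sits closer to the observed count.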

SPF and Crash Data - Rural Parkway
y = x^(1.3288) * e^(-9.7247), Theta = 7.96
[Figure: crash data and SPF; crashes per mile (5 years) vs. AADT]

SPF and Crash Data - Rural 4-Lane Divided
y = x^(0.73827) * e^(-3.20716), Theta = 0.9345
[Figure: crash data and SPF; crashes per mile (5 years) vs. AADT]

Comparing Models
• Theta can be used to distinguish good models from bad models
• What about improving the models for a single dataset?

Blind SPF Development
               Scenario 1   Scenario 2   Scenario 3
Theta (1/k)*   1.313776     1.556977     1.50734
Alpha          -5.23151     -5.24279     -4.01983
Beta           0.97871      0.97832      0.760655

CURE Plots
• Calculate residual
  – Difference between the observed number of crashes and the predicted number of crashes from the SPF
• Sort list of segments by AADT
• Calculate cumulative residual
• Can indicate in what AADT ranges the model performs well
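The three steps above can be sketched in a few lines. The site data and the function name `cure_points` are hypothetical; the residual is taken as observed minus predicted, as on the slide.

```python
from itertools import accumulate

# Hypothetical sites: (AADT, observed crashes, SPF-predicted crashes).
sites = [
    (5200, 4, 3.1),
    (800, 1, 0.6),
    (12000, 5, 6.8),
    (2500, 1, 1.7),
    (9000, 7, 5.2),
]


def cure_points(sites):
    """Return (AADT, cumulative residual) pairs ordered by AADT."""
    ordered = sorted(sites, key=lambda s: s[0])                # sort by AADT
    residuals = [obs - pred for _, obs, pred in ordered]       # observed - predicted
    cumulative = accumulate(residuals)                         # running sum
    return [(aadt, cum) for (aadt, _, _), cum in zip(ordered, cumulative)]


for aadt, cum in cure_points(sites):
    print(f"AADT {aadt:>6}: cumulative residual {cum:+.2f}")
```

Plotting the second value against the first gives the CURE plot; stretches where the line climbs or falls steadily mark AADT ranges where the SPF under- or over-predicts.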

Enter… Assessment
[Figure panels: Scenario 1, Scenario 2, Scenario 3]
• Scenario 3 is the most desirable model
  – CURE plots
  – Also comparing other GOF measures
• Consider the impact of the regression coefficients…

Impact of Coefficients
                          Scenario 2   Scenario 3
a                         -5.24        -4.02
b                         0.98         0.76
SPF @ 1000 ADT/1 mile     4.55         3.44
SPF @ 5000 ADT/2 mile     43.94        23.38
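The predictions in this table can be reproduced by assuming the typical model form with length as an offset, SPF = L × e^a × ADT^b, and using the unrounded Alpha and Beta values from the Blind SPF Development table. The function name `spf` is illustrative.

```python
from math import exp, log


def spf(adt, length_miles, a, b):
    """Predicted crashes, assuming SPF = L * exp(a) * ADT^b (length as an offset)."""
    return length_miles * exp(a + b * log(adt))


# Unrounded coefficients from the Blind SPF Development table.
scenarios = {
    "Scenario 2": (-5.24279, 0.97832),
    "Scenario 3": (-4.01983, 0.760655),
}

for name, (a, b) in scenarios.items():
    low = spf(1000, 1.0, a, b)
    high = spf(5000, 2.0, a, b)
    print(f"{name}: {low:.2f} at 1000 ADT/1 mile, {high:.2f} at 5000 ADT/2 miles")
```

The two models differ by about a third at 1000 ADT but by nearly a factor of two at 5000 ADT, which is why similar Theta values alone do not settle which model to use.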

Improving SPFs • Goodness-of-fit (GOF) measures and CURE plots can be used to improve SPFs

Metrics (GOF)
• Modified R²
• Mean Absolute Deviation (MAD)
• Akaike Information Criterion (AIC)*
• CURE Plot
• Percentage CURE Deviation (PCD)
• Maximum Absolute CURE Deviation
*Useful when comparing model form where the sample size remains constant

CURE Plots – first line of defense
• Expected to oscillate about 0
• Drifting up indicates more observed than predicted
• Drifting down indicates fewer observed than predicted
• Plot is a reflection of the functional form of the particular explanatory variable
• Can also indicate whether other relevant explanatory factors have been excluded in the model (omitted variable bias)
• Plot should stay within 2 standard deviations
https://safety.fhwa.dot.gov/rsdp/downloads/spf_development_guide_final.pdf
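The 2-standard-deviation limits are usually computed from the running sum of squared residuals, σ(i) = sqrt(S(i) × (1 − S(i)/S(N))), where S(i) is the sum of squared residuals up to site i and S(N) is the total. The sketch below assumes that formula, takes PCD to be the share of points outside the limits, and expects residuals already sorted by the explanatory variable. The function names and data are illustrative.

```python
from itertools import accumulate
from math import sqrt


def cure_limits(residuals):
    """Return (cumulative residuals, 2-sigma limits) for residuals sorted by AADT."""
    cum = list(accumulate(residuals))
    cum_sq = list(accumulate(r * r for r in residuals))
    total_sq = cum_sq[-1]  # assumed non-zero: at least one residual differs from 0
    limits = [2.0 * sqrt(s * (1.0 - s / total_sq)) for s in cum_sq]
    return cum, limits


def percent_cure_deviation(residuals, tol=1e-9):
    """Percentage of CURE points lying outside the +/- 2-sigma limits."""
    cum, limits = cure_limits(residuals)
    # tol keeps floating-point noise at the final point (limit = 0) from counting.
    outside = sum(1 for c, lim in zip(cum, limits) if abs(c) > lim + tol)
    return 100.0 * outside / len(residuals)


# Hypothetical residuals (observed - predicted), already sorted by AADT.
oscillating = [1.0, -2.0, 3.0, -1.0, -1.0]
drifting = [2.0, 2.0, 2.0, -3.0, -3.0]

print("PCD, oscillating:", percent_cure_deviation(oscillating))
print("PCD, drifting:", percent_cure_deviation(drifting))
```

The limits are zero at the last point by construction, so the check there is only meaningful when the residuals sum to roughly zero.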

A Review
Consider the following progression
• Filters for Homogeneity

Rural 2 Lane

Rural 2 Lane – No HC

Rural 2 Lane – LW=9

Rural 2 Lane – No median, SW=2, LW=9, no VC, no HC

Rural 2 Lane – No median, SW=2, LW=9, no VC, no HC, AADT < 500

SPF Development Automation
• RStudio was used to develop an R script that automated tasks:
  – SPF development
  – CURE plots
  – Scatter plots
  – Box plots
  – Calculate metrics
  – Filtered data

SPF-R
• Requires some programming/scripting knowledge
• https://safety.fhwa.dot.gov/rsdp/toolbox-content.aspx?toolid=210

Inputs
Required
• Crashes
• AADT
• Length (if segment)
• Minor road AADT (if intersection)
Helpful
• Geometrics
• Severity Classes
• Variable/Constant Dispersion

Model Form
Description    Functional Form** (R Code)
Typical        SPF=glm.nb(crash~lnADT+offset(lnL))
Alternate      SPF=glm.nb(crash~lnADT+lnL)
HSM            SPF=glm.nb(crash~offset(HSM*))
Intersection   SPF=glm.nb(crash~lnADT1+lnADT2)
Shoulder       SPF=glm.nb(crash~lnADT+SW+offset(lnL))
Interaction    SPF=glm.nb(crash~lnADT+SW+LW+SW*LW+offset(lnL))
*HSM = log(data2[[AADTColumn]]*data2[[LengthColumn]]*365*10^-6)
**LW = lane width, SW = shoulder width

Complex Models

Loops - Intersections
• Kentucky’s intersection database used to develop SPFs
• 20 intersection types created
• SPF-R used to loop through
• https://uknowledge.uky.edu/ktc_researchreports/1490/

Loops - KABCO
Injury Severity Level    Comprehensive Crash Cost
Fatality (K)             $4,008,900
Disabling Injury (A)     $216,000
Evident Injury (B)       $79,000
Fatal/Injury (K/A/B)     $158,200
Possible Injury (C)      $44,900
PDO (O)                  $7,400

Length-Based Overdispersion
• Overdispersion has been observed to be higher in shorter segments than in longer ones (Hauer, 2001)
• Stata supports this: gnbreg
• R requires an additional library
• When comparing R to Stata, the same results are returned to several decimal places
  – WARNING: the variable dispersion coefficients are negative in R and positive in Stata
• This is based on several tests and seems to be consistent
• SPF coefficients are equal, however
• Possible bug in R (the Stata values are more intuitive)

Code for Variable Dispersion
library(gnlm)
# Point to variables
crash = data2[[CrashColumn]]
lnADT = log(data2[[AADTColumn]])
lnL = log(data2[[LengthColumn]])
# Negative binomial model with a length-dependent shape (dispersion) term
SPF = gnlr(crash, dist="negative binomial",
           mu=~exp(a+b*lnADT+c*lnL),
           shape=~(const+b1*lnL),
           pmu=list(a=0, b=0, c=0), pshape=c(0, 0))

SPF-R User’s Guide
https://github.com/irkgreen/SPF-R/blob/master/SPF-R%20UsersGuide.pdf

Demonstration
http://spfr.uky.edu/