Fundamentals and Crash Count Distributions Fall 2019 HSM

HSM - Fundamentals CRASHES AS THE BASIS OF SAFETY ANALYSIS Crash frequency is used as a fundamental indicator of “safety” in the evaluation and estimation methods presented in the HSM. Where the term “safety” is used in the HSM, it refers to the crash frequency or crash severity, or both, and collision type for a specific time period, a given location, and a given set of geometric and operational conditions.

HSM - Fundamentals Objective and Subjective Safety The HSM focuses on how to estimate and evaluate the crash frequency and crash severity for a particular roadway network, facility, or site, in a given period, and hence the focus is on “objective” safety. Objective safety refers to use of a quantitative measure that is independent of the observer. In contrast, “subjective” safety concerns the perception of how safe a person feels on the transportation system. Assessment of subjective safety for the same site will vary between observers.

HSM - Fundamentals Definition of a Crash In the HSM, a crash is defined as a set of events that result in injury or property damage due to the collision of at least one motorized vehicle and may involve collision with another motorized vehicle, a bicyclist, a pedestrian, or an object. The terms used in the HSM do not include crashes between cyclists and pedestrians, or vehicles on rails.

HSM - Fundamentals Definition of Crash Frequency In the HSM, “crash frequency” is defined as the number of crashes occurring at a particular site, facility, or network in a one-year period. Crash frequency is calculated according to Equation 3-1 and is measured in number of crashes per year.

HSM - Fundamentals Definition of Predictive Method The term “predictive method” refers to the methodology in Part C of the HSM that is used to estimate the “expected average crash frequency” of a site, facility, or roadway under given geometric design and traffic volumes for a specific period of time.
Definition of Expected Average Crash Frequency The term “expected average crash frequency” is used in the HSM to describe the estimate of long-term average crash frequency of a site, facility, or network under a given set of geometric design and traffic volumes in a given time period (in years).

HSM - Fundamentals Definition of Crash Severity Crashes vary in the level of injury or property damage. The American National Standard ANSI D16.1-1996 defines injury as “bodily harm to a person” (7). The level of injury or property damage due to a crash is referred to in the HSM as “crash severity.” While a crash may cause a number of injuries of varying severity, the term crash severity refers to the most severe injury caused by a crash.

HSM - Fundamentals Definition of Crash Severity Crash severity is often divided into categories according to the KABCO scale, which provides five levels of injury severity. Even if the KABCO scale is used, the definition of an injury may vary between jurisdictions.

HSM - Fundamentals Definition of Crash Severity The five KABCO crash severity levels are:
K - Fatal injury: an injury that results in death;
A - Incapacitating injury: any injury, other than a fatal injury, that prevents the injured person from walking, driving, or normally continuing the activities the person was capable of performing before the injury occurred;
B - Non-incapacitating evident injury: any injury, other than a fatal injury or an incapacitating injury, that is evident to observers at the scene of the crash in which the injury occurred;
C - Possible injury: any injury reported or claimed that is not a fatal injury, incapacitating injury, or non-incapacitating evident injury and includes claim of injuries not evident;
O - No Injury/Property Damage Only (PDO).

HSM - Fundamentals Crashes Are Rare and Random Events Crashes are rare and random events. By rare, it is implied that crashes represent only a very small proportion of the total number of events that occur on the transportation system. Random means that crashes occur as a function of a set of events influenced by several factors, which are partly deterministic (they can be controlled) and partly stochastic (random and unpredictable). An event refers to the movement of one or more vehicles and/or pedestrians and cyclists on the transportation network.

HSM - Fundamentals Crashes Are Rare and Random Events A crash is one possible outcome of a continuum of events on the transportation network during which the probability of a crash occurring may change from low risk to high risk. Crashes represent a very small proportion of the total events that occur on the transportation network. For example, for a crash to occur, two vehicles must arrive at the same point in space at the same time. However, arrival at the same time does not necessarily mean that a crash will occur. The drivers and vehicles have different properties (reaction times, braking efficiencies, visual capabilities, attentiveness, speed choice), that will determine whether or not a crash occurs.

HSM - Fundamentals Natural Variability in Crash Frequency Because crashes are random events, crash frequencies naturally fluctuate over time at any given site. The randomness of crash occurrence indicates that short-term crash frequencies alone are not a reliable estimator of long-term crash frequency. If a three-year period of crashes were used as the sample to estimate crash frequency, it would be difficult to know if this three-year period represents a typically high, average, or low crash frequency at the site. This year-to-year variability in crash frequencies adversely affects crash estimation based on crash data collected over short periods. The short-term average crash frequency may vary significantly from the long-term average crash frequency. This effect is magnified at study locations with low crash frequencies where changes due to variability in crash frequencies represent an even larger fluctuation relative to the expected average crash frequency.
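The effect of natural variability on short study periods can be illustrated with a small simulation. This is only a sketch under stated assumptions: yearly counts at one site are assumed to be Poisson distributed, the long-term mean of 2 crashes per year is a made-up value, and `poisson_sample` is a helper written for this example.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count by inverting the CDF (adequate for small lam)."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)  # P(Y = 0)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k    # P(Y = k) from P(Y = k - 1)
        cdf += p
    return k


rng = random.Random(2019)
LONG_TERM_MEAN = 2.0  # hypothetical long-term average, crashes per year

# Thirty simulated years at a single site with a constant long-term mean.
yearly = [poisson_sample(LONG_TERM_MEAN, rng) for _ in range(30)]

# Average crash frequency for each consecutive three-year window.
three_year_means = [sum(yearly[i:i + 3]) / 3 for i in range(0, 30, 3)]

print("yearly counts:      ", yearly)
print("three-year averages:", [round(m, 2) for m in three_year_means])
print("30-year average:    ", round(sum(yearly) / 30, 2))
```

Even though the underlying mean never changes, the three-year averages differ from window to window, which is the reason a single short period is a poor estimator of the long-term crash frequency.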

Theoretical Process of Motor Vehicle Crashes Each time a vehicle enters an intersection, a highway segment, or any other type of entity (a trial) on a given transportation network, it will either crash or not crash. For purposes of consistency, a crash is termed a “success” while failure to crash is a “failure.” For the Bernoulli trial, a random variable, defined as X, can be generated with the following probability model: if the outcome ω is a particular event outcome (e.g., a crash), then X(ω) = 1, whereas if the outcome is a failure then X(ω) = 0. Thus, the probability model becomes:

x           1    0
P(X = x)    p    q

where p is the probability of success (a crash) and q = (1 - p) is the probability of failure (no crash).

Binomial distribution Consider N independent trials (vehicles passing through an intersection, road segment, etc.), each of which gives rise to a Bernoulli distribution. We define Z as the number of successes over the N trials. Under the assumption that all trials are characterized by the same failure process (this assumption is revisited later), the appropriate probability model that accounts for a series of Bernoulli trials is known as the binomial distribution, and is given as:

$$P(Z = n) = \binom{N}{n} p^{n} (1 - p)^{N - n} \qquad \text{(Equation 1)}$$

Where, n = 0, 1, 2, …, N (the number of successes or crashes)

Poisson Approximation For typical motor vehicle crashes where the event has a very low probability of occurrence and a large number of trials exist (e.g., million entering vehicles, vehicle-miles-traveled, etc.), it can be shown that the binomial distribution is approximated by a Poisson distribution. Under the binomial distribution with parameters N and p, let p = λ/N, so that a large sample size N will be offset by the diminution of p to produce a constant mean number of events λ for all values of p. Then as N → ∞, it can be shown that:

$$P(Z = n) = \binom{N}{n} \left(\frac{\lambda}{N}\right)^{n} \left(1 - \frac{\lambda}{N}\right)^{N - n} \approx \frac{\lambda^{n}}{n!} e^{-\lambda} \qquad \text{(Equation 2)}$$

Where,
n = 0, 1, 2, …, N (the number of successes or crashes)
λ = the mean of a Poisson distribution
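The convergence of the binomial to the Poisson can be checked numerically. The sketch below holds the mean fixed at an illustrative λ = 3 (a value chosen for this example, not taken from the slides), sets p = λ/N, and reports the largest gap between the two probability functions for n = 0, …, 10 as N grows.

```python
import math


def binomial_pmf(n, N, p):
    """P(Z = n) for N independent trials with common success probability p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def poisson_pmf(n, lam):
    """P(Z = n) for a Poisson distribution with mean lam."""
    return lam**n * math.exp(-lam) / math.factorial(n)


lam = 3.0  # illustrative mean number of crashes
gaps = {}
for N in (10, 100, 10_000, 1_000_000):
    p = lam / N  # the crash probability shrinks as the number of trials grows
    gaps[N] = max(
        abs(binomial_pmf(n, N, p) - poisson_pmf(n, lam)) for n in range(11)
    )
    print(f"N = {N:>9,d}  p = {p:.2e}  max |binomial - Poisson| = {gaps[N]:.2e}")
```

With few trials and a large p the two distributions differ visibly; with exposure on the order of a million entering vehicles the difference is negligible.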

Poisson Approximation The approximation illustrated in Equation (2) works well when the mean λ and p are assumed to be constant. In practice, however, it is not reasonable to assume that crash probabilities across drivers and across road segments (intersections, etc.) are constant. Specifically, each driver-vehicle combination is likely to have a probability that is a function of driving experience, attentiveness, mental workload, risk aversion, vision, sobriety, reaction times, vehicle characteristics, etc. Furthermore, crash probabilities are likely to vary as a function of the complexity and traffic conditions of the transportation network (road segment, intersection, etc.). All these factors and others will affect to various degrees the individual risk of a crash. These and other characteristics affecting the crash process create inconsistencies with the approximation illustrated in Equation (2). Bernoulli trials whose outcome probabilities vary from trial to trial are known as Poisson trials (note: Poisson trials are not the summation of independent Poisson distributions; this term is used to designate Bernoulli trials with unequal probability of events).

Poisson Approximation The equation below is used for determining whether independent events with unequal probabilities can be approximated by a Poisson process:

$$d_{TV}\big(L(Z), Po(\lambda)\big) \le \min\{1, \lambda^{-1}\} \sum_{i=1}^{N} p_i^{2} \qquad \text{(Equation 3)}$$

Where,
d_TV = total variation distance between the two probability measures L(Z) and Po(λ);
L(Z) = count data generated by unequal probability of events;
Po(λ) = count data generated by a Poisson distribution with λ = E(Z);
p_i = probability of success (a crash) on trial i.
See Barbour et al. (1992) Poisson Approximation. Clarendon Press, New York, NY for additional information.
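To make the Poisson-trials comparison concrete, the sketch below computes the exact total variation distance between the count from independent trials with unequal probabilities and a Poisson distribution with the same mean, and compares it with the bound min{1, 1/λ} Σ p_i². The form of the bound is assumed here to be the standard one from the Poisson approximation literature, and the 200 trial probabilities are invented for illustration.

```python
import math


def poisson_binomial_pmf(probs):
    """PMF of Z, the number of successes in independent trials with unequal p_i."""
    pmf = [1.0]  # before any trial, Z = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial is a failure (no crash)
            nxt[k + 1] += mass * p     # this trial is a success (a crash)
        pmf = nxt
    return pmf


def poisson_pmf(n, lam):
    """Poisson probability computed on the log scale to avoid overflow."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


# Hypothetical crash probabilities that vary from trial to trial.
probs = [0.001 + 0.019 * i / 199 for i in range(200)]
lam = sum(probs)  # lambda = E(Z)

pb = poisson_binomial_pmf(probs)
po = [poisson_pmf(n, lam) for n in range(len(pb))]

# Total variation distance: half the sum of absolute differences, plus the
# Poisson mass beyond N, where the trial-based count has probability zero.
tail = max(0.0, 1.0 - sum(po))
d_tv = 0.5 * (sum(abs(a - b) for a, b in zip(pb, po)) + tail)
bound = min(1.0, 1.0 / lam) * sum(p * p for p in probs)

print(f"lambda = {lam:.3f}, d_TV = {d_tv:.5f}, bound = {bound:.5f}")
```

When every p_i is small, the sum of squared probabilities is small and the Poisson approximation remains close even though the trials are not identical.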

Poisson Approximation The equation below is used for determining if the unequal event of independent probabilities leads to over-dispersion, VAR(Z) > E(Z). If Then For any r > 2, where

Crash Data as Poisson Process Given the characteristics described in the previous overheads, it is often assumed that crash data on a given site (or entity) follow a Poisson distribution. In other words, if one were to count data over time for one site, the data are assumed to be Poisson distributed.

[Figure: example of crash counts at one site over successive time periods i, i+1, …; axes: Crash Count versus Time t; counts shown include 3, 7, 0, 1, 2, 3, 3, 4, 1, 4.]

Poisson assumption:

$$p(y) = \frac{\lambda^{y} e^{-\lambda}}{y!}$$

Where,
λ = mean of the Poisson distribution
y = crash count (0, 1, 2, …)

Crash Data as Poisson Process If we have counts = 3, 7, 0, and 3 on an entity, what is λ?

Crash Data as Poisson Process We can plot P(3, 7, 0, and 3) as a function of λ (the likelihood function). The likelihood is maximum at λ* = 3.25, the mean of the four counts.

[Figure: likelihood p(y) plotted against λ, with the maximum marked at λ*.]
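The likelihood curve for the counts 3, 7, 0, and 3 can be evaluated directly. The following is a minimal sketch that assumes the four counts are independent Poisson observations with a common mean; the grid of candidate λ values (0.01 to 10.00) is an arbitrary choice for this example.

```python
import math

counts = [3, 7, 0, 3]


def poisson_likelihood(lam, ys):
    """Joint probability of independent Poisson counts ys for a given mean lam."""
    return math.prod(math.exp(-lam) * lam**y / math.factorial(y) for y in ys)


# Evaluate the likelihood on a grid of candidate means and keep the best one.
grid = [i / 100 for i in range(1, 1001)]  # 0.01, 0.02, ..., 10.00
lam_star = max(grid, key=lambda lam: poisson_likelihood(lam, counts))

print(lam_star)                    # 3.25
print(sum(counts) / len(counts))   # 3.25, the sample mean
```

The grid search lands on the sample mean, which is the maximum likelihood estimate of λ for Poisson counts.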

Crash Data as Poisson Process Accuracy of estimation: Counts and Predicted Values

i    y_i    λ̂ (mean of y_1 … y_i)    λ̂ / i (estimated variance of λ̂)
1    3      3                         3
2    7      5                         2.5
3    0      3.3                       1.1
4    3      3.25                      0.8
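The values in the table can be reproduced with a few lines of arithmetic. This sketch assumes that the third column is the running mean of the counts and that the fourth column is that mean divided by the number of observations (the variance of the sample mean when counts are Poisson); both readings are inferred from the numbers themselves.

```python
counts = [3, 7, 0, 3]

rows = []
total = 0
for i, y in enumerate(counts, start=1):
    total += y
    lam_hat = total / i    # running estimate of the Poisson mean
    var_hat = lam_hat / i  # estimated variance of lam_hat after i counts
    rows.append((i, y, lam_hat, var_hat))
    print(f"{i}  {y}  {lam_hat:.2f}  {var_hat:.2f}")
```

The estimate settles and its variance shrinks as more periods of data are added, which is the point of the table.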

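Crash Data as Poisson Process Method to calculate the mean and variance observed in crash data

Finding the mean:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Finding the variance:

$$s^{2} = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^{2}$$

Where:
n = number of observations
y_i = crash count for observation i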

Overdispersion (aka heterogeneity) Crash data rarely follow a pure Poisson distribution. Usually, the data display a variance that is greater than the mean, VAR(Y) > E(Y). This is known as over-dispersion. Sometimes, the data can show under-dispersion, but this is very rare. The principal cause of over-dispersion was explained in the previous overheads (Bernoulli process with unequal probability of events). Over-dispersion can also be caused by numerous other factors. For other types of processes (not based on a Bernoulli trial), over-dispersion can be explained by the clustering of data (neighborhood, regions, wiring boards, etc.), unaccounted temporal correlation, and model mis-specification. These factors also influence the heterogeneity found in crash data.

Overdispersion (aka heterogeneity) In order to account for over-dispersion commonly found in crash data, it has been hypothesized that the mean (λ) found in a population of sites follows a gamma probability density function. In other words, if we have a population of entities (say 100 intersections), their means λ (if everything else remains constant) would follow a gamma distribution. The gamma probability density function is defined by:

$$f(\lambda; \phi, \delta) = \frac{\delta^{\phi}}{\Gamma(\phi)} \lambda^{\phi - 1} e^{-\delta \lambda} \quad \text{for } \lambda > 0$$

Where,
λ = the mean of the selected site
φ, δ = parameters of the gamma distribution [gamma(φ, δ)]
Γ(φ) = gamma function, $\int_{0}^{\infty} e^{-u} u^{\phi - 1} \, du$

Overdispersion (aka heterogeneity) [Figure: gamma density f(λ) plotted against λ. The distribution of the Poisson means follows the gamma distribution.]

As discussed in the previous slide, 100 intersections with the exact same characteristics (traffic flow, geometric design, etc.), including the number of crashes per year, will (or are expected to) have different Poisson mean λ values. The distribution of these means is assumed to be gamma distributed.
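The gamma-mixing hypothesis can be illustrated by simulation. The sketch below is built on assumptions made for this example only: 100 sites, 50 years of counts per site, and a gamma distribution of site means with shape 2 and mean 3. The helper `poisson_sample` is written here because the Python standard library has no Poisson generator.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count by inverting the CDF (adequate for small lam)."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k


rng = random.Random(42)
SHAPE = 2.0       # hypothetical gamma shape parameter
MEAN = 3.0        # hypothetical average of the site means
N_SITES = 100
N_YEARS = 50

# Each site gets its own Poisson mean; gammavariate takes (shape, scale).
site_means = [rng.gammavariate(SHAPE, MEAN / SHAPE) for _ in range(N_SITES)]

# Yearly counts at every site, pooled across the whole population of sites.
counts = [poisson_sample(lam, rng) for lam in site_means for _ in range(N_YEARS)]

pooled_mean = statistics.mean(counts)
pooled_var = statistics.variance(counts)
print(f"mean = {pooled_mean:.2f}, variance = {pooled_var:.2f}")
```

Counts at any single site are Poisson, yet the pooled counts show a variance well above the mean because the site means themselves differ.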

Overdispersion (aka heterogeneity) There are three reasons why the gamma probability function has been a popular assumption:
1. The mean λ is allowed only to take a positive value;
2. The gamma PDF is very flexible; it can be moved and stretched to fit a variety of shapes; and
3. It makes the algebra simple and often yields “closed form” results.
Note: Nobody has proved so far that the mean varies according to a gamma probability function. People use it because it is easy to manipulate. Some researchers have used a lognormal function to characterize the distribution of the mean, which is a little more complex. You can also use more complicated distributions (e.g., Conway-Maxwell-Poisson).

Overdispersion (aka heterogeneity) The Poisson-gamma distribution is usually characterized by using one shape parameter. The shape parameter φ can be estimated as follows:

$$\hat{\phi} = \frac{\bar{y}^{2}}{s^{2} - \bar{y}}$$

where the sample mean ȳ and sample variance s² are estimated using the equations shown above for the Poisson.
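A moment-based estimate of the shape parameter can be sketched as follows. The code assumes the usual sample mean and the (n - 1) sample variance, and assumes the estimator φ = ȳ² / (s² - ȳ), which follows from setting the observed variance equal to ȳ + ȳ²/φ. The four counts reuse the earlier example and are far too few for a reliable estimate in practice.

```python
import statistics

counts = [3, 7, 0, 3]

y_bar = statistics.mean(counts)      # sample mean
s2 = statistics.variance(counts)     # sample variance with the n - 1 divisor

if s2 > y_bar:
    # Over-dispersed data: solve s2 = y_bar + y_bar**2 / phi for phi.
    phi_hat = y_bar**2 / (s2 - y_bar)
else:
    # No over-dispersion observed, so the moment estimator is undefined.
    phi_hat = float("inf")

print(f"mean = {y_bar}, variance = {s2}, phi = {phi_hat:.4f}")
```

A small φ indicates strong over-dispersion, and a very large φ indicates data that are close to Poisson.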

Negative Binomial (or Poisson-gamma) The mean and variance of the Negative Binomial distribution (one parameter) are estimated using the following equations:

$$E(Y) = \mu$$

$$VAR(Y) = \mu + \frac{\mu^{2}}{\phi}$$

where μ is the mean and φ is the shape parameter.
Note: if φ → ∞, the second part of the variance function tends towards 0. This means that the Negative Binomial becomes a Poisson distribution since the mean and variance are now equal.
Note: For modeling purposes, the term φ is usually estimated directly from the data. This will be addressed later in the course.
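The mean-variance relationship and the Poisson limit can be checked numerically. The sketch below assumes one common parameterization of the Negative Binomial probability function in terms of a mean μ and a shape parameter φ; it is not necessarily the form used elsewhere in these slides, and the values μ = 3 and φ = 2 are illustrative.

```python
import math


def nb_pmf(y, mu, phi):
    """Negative Binomial probability of count y (mean mu, shape phi), on the log scale."""
    log_p = (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )
    return math.exp(log_p)


def moments(mu, phi, y_max=2000):
    """Mean and variance obtained by summing the probability function up to y_max."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mean_a, var_a = moments(mu=3.0, phi=2.0)   # strongly over-dispersed
mean_b, var_b = moments(mu=3.0, phi=1e6)   # very large shape parameter

print(f"phi = 2:    mean = {mean_a:.4f}, variance = {var_a:.4f}")
print(f"phi = 1e6:  mean = {mean_b:.4f}, variance = {var_b:.4f}")
```

With φ = 2 the variance matches μ + μ²/φ = 7.5, and with a very large φ the variance collapses onto the mean, as expected for a Poisson distribution.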

Negative Binomial (or Poisson-gamma) Alternative forms of the PDF:

Other Distributions/Models
• Dispersed Data
  ◦ Poisson-Lognormal
  ◦ Poisson-Weibull
  ◦ Negative multinomial
  ◦ Finite Mixture/Latent Class
  ◦ Bi- and Multi-variate
  ◦ Generalized Waring
  ◦ Poisson Inverse Gaussian (PIG)
  ◦ Random Effects/Random Parameters
  ◦ Poisson-weighted exponential
• Under-Dispersion
  ◦ Gamma (time dependent)
  ◦ Conway-Maxwell-Poisson
  ◦ Double Poisson
  ◦ Hyper-Poisson

Other Distributions/Models
• Highly Dispersed Data and Excess Zeros
  ◦ Zero-Inflated/Markov Switching
  ◦ NB-Lindley
  ◦ NB-Generalized Exponential
  ◦ NB-Crack
  ◦ NB-DP
  ◦ Sichel (PIG is a special case)
• Semi- and Non-Parametric
  ◦ Support Vector Machine
  ◦ Neural Network/Bayesian Neural Network
  ◦ Regression Tree
  ◦ Generalized Additive Model
  ◦ NB-DP