**Suggested Citation:**"Chapter 6. Reliability Associated with Using a CPM to Estimate Frequency of Rare Crash Types and Severities: Overview of the Problem with Possible Solutions." National Academies of Sciences, Engineering, and Medicine. 2021.

*Understanding and Communicating Reliability of Crash Prediction Models*. Washington, DC: The National Academies Press. doi: 10.17226/26440.


Chapter 6. Reliability Associated with Using a CPM to Estimate Frequency of Rare Crash Types and Severities: Overview of the Problem with Possible Solutions

Introduction

In the 1st edition of the HSM (AASHTO, 2010), average predicted crash frequencies by type and severity are estimated from a safety performance function (SPF) using a simple two-stage approach. This approach first estimates the total number of crashes for a specific entity and then estimates crashes by type or severity by applying a fixed proportion for each type or severity to the estimate for total crashes. However, since the crash type or severity distribution, and consequently the proportion for each type or severity level, may differ by segment characteristics and other factors, using a pre-specified fixed proportion is questionable. This chapter addresses the assessment of the reliability of this simple two-stage approach.

Table 26. Influence of Applying the Two-Stage CPM Approach on the Reliability of an Estimated Frequency of Rare Crash Types and Severities.

| Influence Category | Factor | Effect on Reliability of CPM: Bias | Effect on Reliability of CPM: Variance | Effect on Reliability of CPM: Repeatability |
|---|---|---|---|---|
| Application-related factors influencing reliability | Estimating CPMs for rare crash types and severities | Less reliable | No effect | Less reliable, especially for rarer crash types and severities |

This problem has been resolved to a large extent by CPMs estimated in NCHRP Project 17-62, "Improved Prediction Models for Crash Types and Crash Severities," for various crash types and severities. However, there are two types of cases where SPFs could not be reliably estimated, and a third type of case pertaining to crash types and severities for which estimation of SPFs was not even considered. For example, Figure 3 and Figure 4 depict some SPFs that could not be estimated due to small sample sizes or odd estimation results.
The following abbreviations are used in these figures:

- 3ST: 3-leg stop-controlled intersection
- 4ST: 4-leg stop-controlled intersection
- 3SG: 3-leg signalized intersection
- 4SG: 4-leg signalized intersection
- OD: Opposite direction crashes
- SD: Same direction crashes
- ID: Intersecting direction crashes
- SV: Single-vehicle crashes
- KA: Crashes involving a fatality or injury severity A
- KAB: Crashes involving a fatality, injury severity A, or injury severity B
- KABC: Crashes involving a fatality or injury
- KABCO: Crashes of all severities, including property-damage-only crashes

Figure 3. Crash Type and Severity SPFs for Signalized Intersections that could not be estimated in NCHRP Project 17-62

Figure 4. Crash Type and Severity SPFs for Signalized Intersections that could not be estimated in NCHRP Project 17-62

[Figure labels as extracted, grouped by intersection type and crash type. 3ST: OD (KA). 4ST: Total (KAB, KA); SD (KA); OD (KA); ID (KABCO, KABC, KAB, KA). 3SG: Total (KA); SV (KABC, KAB, KA); SD (KA); OD (KA). 4SG: SV (KA); ID (KA).]

Method to Assess Potential Reliability

This chapter addresses the reliability of predictions obtained from the simple two-stage approach as it applies to rare crash types, for which one of three cases may prevail:

- Case A: Models did not converge or were illogical (e.g., AADT exponents were negative or statistically insignificant at the 10% level), and as such there are no recommended SPFs.
- Case B: There is low confidence in an SPF because it did not validate well or had poor goodness-of-fit (GOF) statistics.
- Case C: For numerous crash types and severities, estimation of SPFs was not considered, either because they were not of primary interest generally (e.g., nighttime crashes) or because there are typically too few crashes to attempt SPF development (e.g., bicycle, pedestrian, and fatal crashes). This case also applies to the bicycle and pedestrian crashes addressed in NCHRP Project 17-70 (Ferguson et al., 2019).

In all three cases, a reliable "parent" SPF is available in the HSM to which a crash type/severity proportion using the application jurisdiction's data can be applied. A "parent" SPF is the one with the lowest crash frequency that includes the crash type/severity of interest. For example, a KAB parent SPF, if reliable and presented as such in the HSM, would be considered for KA crashes. Otherwise, a KABC parent SPF, if reliable, would be considered for both KA and KAB crashes, and so on.

If Case A or Case C pertains, a crash type/severity proportion developed from the jurisdiction's data is applied to a prediction from the recommended and calibrated "parent" SPF. It is recommended that the validity of the resulting SPF be assessed using the FHWA Calibrator Tool (Lyon et al., 2016) before adopting it, and that due caution be exercised in applying it should the assessment indicate that it may be unreliable.
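The parent-SPF selection rule and the proportion step described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the SPF functional form `exp(a) * L * AADT^b`, and every numeric value (coefficients, crash counts, calibration factor) are hypothetical and are not taken from the HSM or from NCHRP Project 17-62.

```python
import math

def choose_parent(candidates, target):
    """Pick the reliable candidate SPF whose severity levels include every
    level in `target` and that has the fewest observed crashes.
    Returns None when no candidate qualifies."""
    eligible = [c for c in candidates
                if c["reliable"] and target <= c["levels"]]
    return min(eligible, key=lambda c: c["observed"]) if eligible else None

def parent_prediction(intercept, exponent, aadt, length_mi):
    """Hypothetical segment SPF: crashes/year = exp(a) * length * AADT^b."""
    return math.exp(intercept) * length_mi * aadt ** exponent

def two_stage_prediction(parent_pred, target_count, parent_count,
                         calibration=1.0):
    """Apply a locally derived crash type/severity proportion (and an
    optional calibration factor) to the parent SPF prediction."""
    if parent_count <= 0:
        raise ValueError("parent_count must be positive")
    proportion = target_count / parent_count
    return calibration * proportion * parent_pred

# Hypothetical candidate parents for a KA target; KAB is flagged unreliable,
# so the rule falls through to KABC.
candidates = [
    {"name": "KAB", "levels": {"K", "A", "B"}, "observed": 40, "reliable": False},
    {"name": "KABC", "levels": {"K", "A", "B", "C"}, "observed": 120, "reliable": True},
    {"name": "KABCO", "levels": set("KABCO"), "observed": 400, "reliable": True},
]
parent = choose_parent(candidates, target={"K", "A"})

# Hypothetical coefficients and local counts (6 KA crashes among 120 KABC).
parent_pred = parent_prediction(-8.0, 0.9, aadt=20000, length_mi=1.5)
ka_pred = two_stage_prediction(parent_pred, target_count=6,
                               parent_count=120, calibration=1.1)
print(parent["name"], ka_pred)
```

The fixed proportion is the weak point the chapter examines: it is estimated once from the jurisdiction's data and then applied to every site, regardless of site characteristics.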
If Case B pertains, the question for the analyst is which of two potential approaches and SPFs produces the most reliable crash predictions.

- Approach 1: A Case B uncalibrated SPF that did not validate well or has poor GOF statistics. Such an SPF may not be presented in the HSM but may be retrieved from another source, such as NCHRP Web-Only Document 295: Improved Prediction Models for Crash Types and Crash Severities (Ivan et al., 2021).
- Approach 2: A modified SPF in which a crash type/severity proportion developed from the jurisdiction's data is applied to a prediction from the HSM-recommended and uncalibrated "parent" SPF.

Goodness-of-Fit Measures

Several goodness-of-fit measures can be used, including the coefficient of variation of the calibration factor (defined as CV) and cumulative residual (CURE) plots, which indicate the percent of CURE plot ordinates for fitted values (after calibration) exceeding the 2σ limits. The definitions of these criteria can be found in the User Guide for the FHWA Calibrator Tool (Lyon et al., 2016) and are described below.

Because the impact on reliability can vary widely depending on the relative frequencies of the rare crash type and the parent crash type, it is not possible to give strict, step-by-step guidance that an analyst might follow to assess how the use of fixed crash type/severity proportions will affect the reliability of an SPF. Thus, the fundamental objective here is to demonstrate, using actual data and estimated SPFs, a heuristic procedure that an analyst can use to assess how reliability may be affected for a very specific application. Two illustrations are presented, one where Case A or C pertains and one where Case B pertains.

Coefficient of Variation of Calibration Factor (Defined as CV)

The CV of the calibration factor is the standard deviation of the calibration factor divided by the estimate of the calibration factor, as shown in the following equation.

Equation 53

$$CV = \frac{\sqrt{V(C)}}{C}$$

Where:

- CV = coefficient of variation of the calibration factor.
- C = estimate of the calibration factor.
- V(C) = variance of the calibration factor, which can be calculated as follows:

Equation 54

$$V(C) = \frac{\sum_{i} \left( y_i + k \, y_i^{2} \right)}{\left( \sum_{i} \hat{y}_i \right)^{2}}$$

Where:

- $y_i$ = observed counts.
- $\hat{y}_i$ = uncalibrated predicted values from the SPF.
- k = dispersion parameter (recalibrated, as shown above).

CURE Plots and Related Measures

A CURE plot is a graph of the cumulative residuals (observed minus predicted crashes) against a variable of interest sorted in ascending order (e.g., major road traffic volume). CURE plots provide a visual representation of GOF over the range of a given variable and help to identify potential concerns, such as the percent of points exceeding the 95% confidence limits. Cumulative residuals outside the 95% confidence limits indicate a poor fit over that range of the variable of interest. If these residuals are frequently outside the confidence limits, possible bias in the SPF is indicated. The FHWA Calibrator Tool (Lyon et al., 2016) provides CURE plots and evaluates the percent of cumulative residual points exceeding the 95% confidence limits.

Illustration Where Case A or C Pertains

As noted above, where Case A or Case C pertains, a crash type/severity proportion developed from a subject jurisdiction's data is applied to a prediction from the recommended and calibrated "parent" SPF from another jurisdiction. The validity of the resulting SPF is assessed by applying the FHWA Calibrator Tool (Lyon et al., 2016) to a dataset from the subject jurisdiction and evaluating the various goodness-of-fit measures provided by the tool.
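The two goodness-of-fit measures defined in this section (the CV of the calibration factor from Equations 53 and 54, and the percent of CURE ordinates outside the 2σ limits) can be sketched as follows. This is not the FHWA Calibrator Tool's implementation. The function names and sample data are hypothetical, the calibration factor is taken as total observed divided by total predicted crashes, and the 2σ limit formula is the commonly used cumulative-squared-residual formulation, which this chapter does not spell out and which is therefore an assumption here.

```python
import math

def calibration_stats(observed, predicted, k):
    """Return (C, V(C), CV) following Equations 53 and 54.
    observed: site crash counts; predicted: uncalibrated SPF predictions;
    k: dispersion parameter (assumed common to all sites)."""
    sum_pred = sum(predicted)
    c = sum(observed) / sum_pred            # assumed: C = sum(obs) / sum(pred)
    var_c = sum(y + k * y ** 2 for y in observed) / sum_pred ** 2
    cv = math.sqrt(var_c) / c
    return c, var_c, cv

def cure_percent_outside(observed, predicted, c=1.0, tol=1e-9):
    """Percent of CURE ordinates outside +/-2 sigma limits, with sites
    sorted by calibrated fitted value. Limit formula is an assumption:
    sigma*(n)^2 = S(n) * (1 - S(n) / S(N)), S = cumulative squared residuals."""
    pairs = sorted((c * p, y) for p, y in zip(predicted, observed))
    cum_res, cum_sq = [], []
    run_res = run_sq = 0.0
    for fitted, y in pairs:
        r = y - fitted                      # residual: observed minus fitted
        run_res += r
        run_sq += r * r
        cum_res.append(run_res)
        cum_sq.append(run_sq)
    total_sq = cum_sq[-1]
    if total_sq == 0.0:
        return 0.0                          # perfect fit: nothing can exceed
    outside = 0
    for cr, s in zip(cum_res, cum_sq):
        limit = 2.0 * math.sqrt(max(0.0, s * (1.0 - s / total_sq)))
        if abs(cr) > limit + tol:           # tol absorbs floating-point noise
            outside += 1
    return 100.0 * outside / len(pairs)

# Hypothetical six-site dataset.
observed = [0, 1, 0, 2, 0, 1]
predicted = [0.2, 0.5, 0.3, 1.0, 0.4, 0.6]
c, var_c, cv = calibration_stats(observed, predicted, k=0.5)
pct = cure_percent_outside(observed, predicted, c=c)
print(c, var_c, cv, pct)
```

The tolerance matters at the final ordinate, where the limit collapses to zero and the calibrated cumulative residual is zero only up to rounding error.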
This illustration is based on SPF predictions for same direction (SD), killed and seriously injured (KA) crashes on 4-lane divided (4D) segments. An HSM base model for these crashes could not be estimated in NCHRP Project 17-62 because there were none in the database, which came from California. However, the database for another jurisdiction (Illinois) that was used for model validation in that project contained eight such crashes. So the question is: what SPFs can be used for estimating KA-SD crashes for base conditions in Illinois?

To address this question, as noted, a crash type/severity proportion developed from the subject jurisdiction's data (Illinois) is applied to a prediction from the recommended and calibrated "parent" SPF from another jurisdiction (California). Two parent SPFs estimated for California could be considered and compared: one for SD crashes of all severities combined, and one for KA crashes of all types combined.

The Illinois validation dataset is used in this illustration for estimating the crash type/severity proportion as well as for assessing the resulting SPF when the proportion is applied to the parent SPF from California estimation data. The eight crashes constituted 17.02% of all SD crashes and 38.01% of all KA crashes in the Illinois validation dataset. Applying these proportions to the respective parent SPFs estimated for NCHRP Project 17-62 for California, the following base condition CPMs are considered for estimating SD-KA crashes in Illinois:

Equation 55

$$\text{Option 1: Crashes/year} = 0.1702 \times \exp(-14.701) \times \text{segment length} \times \text{AADT}^{[\ldots]}$$

Equation 56

$$\text{Option 2: Crashes/year} = 0.3801 \times \exp(-7.690) \times \text{segment length} \times \text{AADT}^{[\ldots]}$$

[The AADT exponents in Equations 55 and 56 are illegible in the source text.]

Using the FHWA Calibrator Tool (Lyon et al., 2016), these SPFs are calibrated to the Illinois base condition validation data used for the illustration. The data used for this should be, as is the case here, the largest set that is feasibly available.

The assessment of which option is best, and whether either provides reliable predictions in the first place, is based on several goodness-of-fit (GOF) measures estimated by the FHWA Calibrator Tool (Lyon et al., 2016). The key ones for this exercise are the coefficient of variation (CV) of the estimated calibration factor and measures from cumulative residual (CURE) plots. The Calibrator GOF outputs and CURE plots based on the SPF predicted values (shown on the x-axis) for the two options are shown below in Table 27, Figure 5, and Figure 6.

Table 27. GOF Outputs for the Two Options (SD_KA).

| SPF Option | Total observed crashes | Total predicted crashes | Calibration factor | V(C) | CV(C) |
|---|---|---|---|---|---|
| SD_KA Option 1 | 8 | 12.86 | 0.62 | 0.05 | 0.36 |
| SD_KA Option 2 | 8 | 4.67 | 1.71 | 0.37 | 0.36 |
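As a quick arithmetic check on Table 27, the short Python sketch below recomputes the calibration factor as total observed divided by total predicted crashes and the CV from Equation 53 using the rounded V(C) values printed in the table. The variable names are hypothetical, and the 0.15 value is the upper threshold for CV attributed in this chapter to the FHWA Calibrator User Guide (Lyon et al., 2016). Because the inputs are rounded, the results are only expected to agree with the table to two decimals.

```python
import math

# Values copied from Table 27; var_c is the rounded V(C) column.
table_27 = {
    "SD_KA Option 1": {"observed": 8, "predicted": 12.86, "var_c": 0.05},
    "SD_KA Option 2": {"observed": 8, "predicted": 4.67, "var_c": 0.37},
}
CV_THRESHOLD = 0.15  # upper threshold for CV cited in the chapter

results = {}
for name, row in table_27.items():
    c = row["observed"] / row["predicted"]   # calibration factor
    cv = math.sqrt(row["var_c"]) / c         # Equation 53
    results[name] = (round(c, 2), round(cv, 2), cv <= CV_THRESHOLD)
    print(f"{name}: C={c:.2f}, CV={cv:.2f}, within threshold={cv <= CV_THRESHOLD}")
```

Both options reproduce the tabulated calibration factors and a CV of about 0.36, well above the threshold.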

Figure 5. FHWA Calibrator Tool CURE plot of residuals based on modified NCHRP Project 17-62 estimated base condition SPF (Option 1, Equation 55) predictions (x-axis)

Figure 6. FHWA Calibrator Tool CURE plot of residuals based on modified NCHRP Project 17-62 estimated base condition SPF (Option 2, Equation 56) predictions (x-axis)

The first thing to note is that the coefficient of variation [CV(C)] of the calibration factors (0.36 in each case) is suggestive of an unsuccessful calibration, based on the suggested upper threshold of 0.15 for the CV in the FHWA Calibrator User Guide (Lyon et al., 2016). Nevertheless, an analyst may still decide to select an SPF and use it with due caution. In this case, the CURE plots suggest that Option 1 would be somewhat superior to Option 2 in that the cumulative residuals oscillate closer to the x-axis and stay largely within the two standard deviation (2σ) limits. (Additional Calibrator output, not shown, indicates that the percent of CURE data points exceeding the 2σ limits is 41% for Option 2 and 23% for Option 1.)

Illustration Where Case B Pertains

As noted above, where Case B pertains, an SPF is available but is thought not to be reliable because it did not validate well or has poor GOF statistics. The question for the analyst in a given jurisdiction is whether it is better to use that unreliable SPF or to apply a modified SPF in which a crash type/severity proportion developed from the jurisdiction's data is applied to a prediction from the recommended and uncalibrated "parent" SPF. The two SPF approaches are assessed and compared by applying the FHWA Calibrator Tool (Lyon et al., 2016) to the jurisdiction's data and evaluating the various goodness-of-fit measures provided by the tool.

Several base condition SPFs developed in NCHRP Project 17-62 did not validate well or had poor GOF statistics. The illustration here is for one of those, pertaining to same direction (SD), KAB (killed, seriously injured, or moderately injured) crashes at 4-leg stop-controlled (4ST) intersections on multilane roads. The NCHRP Project 17-62 estimated base condition SPF for this crash type and severity was:

Equation 57

$$\text{Crashes/year} = \exp(-9.502) \times \text{Total Entering AADT}^{[\ldots]}$$

[The AADT exponents in Equations 57, 58, and 59 are illegible in the source text.]

The SPF was estimated based on only 12 crashes at 139 sites in Minnesota, and the large standard error for the AADT exponent (0.558) indicates not only that the exponent is highly insignificant statistically, but also that the SPF is likely highly unreliable.

To illustrate the assessment of the validity of applying this SPF to another jurisdiction, NCHRP Project 17-62 validation data for Ohio are used. That dataset contained 12 KAB-SD crashes at 83 sites. To address the question at hand, as noted, a crash type/severity proportion developed from the subject jurisdiction's data (Ohio) is applied to modify predictions from the recommended and calibrated "parent" SPF from another jurisdiction (Minnesota). These predictions are then assessed and compared to those from the unreliable SPF (Equation 57).

Two parent SPFs from Minnesota data could be considered and compared for the modified SPF predictions: one for SD crashes of all severities combined and one for KAB crashes of all types combined. The 12 KAB-SD crashes constituted 17.39% of all KAB crashes and 28.57% of all SD crashes in the Ohio validation dataset. Applying these proportions to the respective parent SPFs, the following modified base condition SPFs are considered for estimating SD-KAB crashes in Ohio:

Equation 58

$$\text{Option 1: Crashes/year} = 0.1739 \times \exp(-8.843) \times \text{Major AADT}^{[\ldots]} \times \text{Minor AADT}^{[\ldots]}$$

Equation 59

$$\text{Option 2: Crashes/year} = 0.2857 \times \exp(-14.343) \times \text{Major AADT}^{[\ldots]} \times \text{Minor AADT}^{[\ldots]}$$

Using the FHWA Calibrator Tool (Lyon et al., 2016), these two SPFs, along with the unreliable one estimated from the Minnesota base condition data (Equation 57), are calibrated to the NCHRP Project 17-62 Ohio base condition validation data used for the illustration. The data used for this should be, as is the case here, the largest set that is feasibly available.
The assessment of which of the three SPFs is best, and whether any of them provides reliable predictions in the first place, is based on several goodness-of-fit measures estimated by the Calibrator tool. The key ones for this exercise are the coefficient of variation (CV) of the estimated calibration factor and measures based on cumulative residual (CURE) plots. The calibration factor results and CURE plots based on predicted values (shown on the x-axis) for the original NCHRP Project 17-62 estimated base condition SPF (Equation 57) and the two modified SPF options (Equation 58 and Equation 59) are shown below in Table 28, Figure 7, Figure 8, and Figure 9.

Table 28. GOF Outputs for Original and Options 1 and 2 (SD_KAB).

| SPF Option | Total observed crashes | Total predicted crashes | Calibration factor | V(C) | CV(C) |
|---|---|---|---|---|---|
| Original | 12 | 7.11 | 1.69 | 0.24 | 0.29 |
| Option 1 | 12 | 11.57 | 1.04 | 0.09 | 0.29 |
| Option 2 | 12 | 19.14 | 0.63 | 0.03 | 0.29 |

Figure 7. Calibrator CURE plot of residuals based on original NCHRP Project 17-62 estimated base condition SPF (Equation 57) predictions (x-axis)

Figure 8. Calibrator CURE plot of residuals based on modified NCHRP Project 17-62 estimated base condition SPF (Option 1, Equation 58) predictions (x-axis)

Figure 9. Calibrator CURE plot of residuals based on modified NCHRP Project 17-62 estimated base condition SPF (Option 2, Equation 59) predictions (x-axis)

The first thing to note is that the coefficient of variation [CV(C)] of the calibration factors (0.29 in each case) is indicative of an unsuccessful calibration, based on the suggested upper threshold of 0.15 for the CV in the FHWA Calibrator User Guide (Lyon et al., 2016). Nevertheless, an analyst may still decide to select an SPF and use it with due caution. In this case, the CURE plots suggest that Option 1 or Option 2 would be somewhat superior to the original model in that the cumulative residuals for these options oscillate closer to the x-axis and stay within the two standard deviation (2σ) limits. (Additional Calibrator output, not shown, indicates that the percent of CURE points exceeding the 2σ limits is 29% for the original model, 19% for Option 1, and 13% for Option 2.) There is little to choose between Options 1 and 2, except that the calibration factor of 1.04 for Option 1 is closer to 1.0 than that for Option 2 (0.63), and so may give more comfort to an analyst in the subject jurisdiction in spite of the slightly larger percent of CURE points exceeding the 2σ limits.
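One way to see why CV(C) is identical across the SPF options within each illustration is to substitute Equation 54 into Equation 53. The short derivation below is a sketch under two assumptions that the chapter does not state outright: that the calibration factor is total observed divided by total predicted crashes (which matches the tabulated values), and that the same dispersion parameter k ≥ 0 is used for each option.

$$
CV = \frac{\sqrt{V(C)}}{C}
   = \frac{\sqrt{\sum_i \left( y_i + k\,y_i^{2} \right)}}{\sum_i \hat{y}_i} \cdot \frac{\sum_i \hat{y}_i}{\sum_i y_i}
   = \frac{\sqrt{\sum_i \left( y_i + k\,y_i^{2} \right)}}{\sum_i y_i}
   \;\ge\; \frac{1}{\sqrt{\sum_i y_i}}
$$

Under these assumptions the predicted values cancel, so the CV depends only on the observed counts and k, and it cannot fall below one over the square root of the total observed crashes. With 12 observed crashes that floor is about 0.29, and with 8 observed crashes it is about 0.35, so neither illustration could have met the 0.15 threshold regardless of which SPF was chosen. By the same bound, roughly 45 or more observed crashes would be needed before a CV of 0.15 becomes attainable.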