**Suggested Citation:**"Chapter 5. Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on How the Number of Variables in CPM Affects Reliability." National Academies of Sciences, Engineering, and Medicine. 2021.

*Understanding and Communicating Reliability of Crash Prediction Models*. Washington, DC: The National Academies Press. doi: 10.17226/26440.


Chapter 5. Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on How the Number of Variables in CPM Affects Reliability

Introduction

In some instances, CPMs may include only traffic volumes as predictor variables, traffic volumes plus a limited number of geometric and traffic control variables, or traffic volumes and a large number of geometric and traffic control variables. One question a practitioner may face is whether a CPM with more variables is more reliable than a simpler CPM. Likewise, if a user does not have the information for each variable in a CPM, there is a question as to how reliable its application may be.

For practitioners, the reliability of a CPM concerns its bias and the precision of the estimate, expressed by the estimate variance. Assuming that a CPM was developed with an appropriate functional form and estimation method, additional variables in a CPM would be expected to reduce any biases in its application. However, additional variables may also increase the variance of the estimates provided, and CPMs with many variables may also be in danger of being overfit to the calibration data and thus not perform as well when applied elsewhere. A CPM that was overfit to the estimation data will result in more bias when applied.

Hauer (2015) addresses the issue of bias in CPMs in the context of modeling, where the question asked is whether to add a variable to a CPM equation. Bias-in-fit arises when, for some range of values of a predictor variable, the CPM either consistently over- or under-predicts the number of crashes. Bias-in-use arises when the user has information on a safety-related variable that is not included in the CPM, or the CPM includes a CMF for which the user does not have information.
Hauer notes that while the addition of variables with explanatory power to the CPM will reduce bias-in-use, there is the potential that the variance of the prediction will increase because some variables themselves are measured with uncertainty (e.g., AADT) and each parameter estimate has an associated variance. The net effect is, however, difficult to predict given the complex interactions and correlations between variables.

Because the impact on reliability can vary so much depending on the relative impact of a variable on the overall CPM prediction, it is not possible to give strict guidance on how the number of variables in a CPM will affect its reliability. For this reason, the evaluation methods will be demonstrated using actual data and a heuristic procedure that practitioners can use to assess how their data may affect reliability. The procedure also provides a related rating of reliability that could contribute to an overall rating system of CPM reliability.

The subject of this chapter is the development of an evaluation and rating system for the impact of the number of variables in a CPM on the predictions from CPMs. Several quantitative measures are recommended for use, and thresholds are suggested for providing a rating system. The remainder of this chapter discusses factors affecting the potential magnitude of reliability issues, the procedure developed for assessment, and a worked example.

Factors Affecting the Potential Magnitude of Reliability Issues

How significant the addition or absence of a variable is to the reliability of a CPM may depend on the context of its use. There are several ways in which reliability is influenced, as shown in Table 18.

Relative Impact of the Variable

The importance of the variable to the expected number of crashes influences the reliability to a large degree. For example, traffic volumes have been shown to be the most influential predictor of crashes.
The absence of appropriate traffic volume variables, for example, left-turn volumes in a CPM for intersection left-turn crashes, would result in larger biases in application and a less reliable CPM. On the other hand, there are variables with a relatively low impact on expected crashes, and the inclusion of those variables would do little to reduce bias and increase reliability. For example, the shoulder type may have little impact on the frequency of total crashes for rural multilane roads.

Table 18. Factors Related to How Number of Variables in a CPM Influences the Reliability of an Estimated Value Using a CPM.

| Influence Category | Factor | Effect on Bias | Effect on Variance | Effect on Repeatability |
|---|---|---|---|---|
| Application-related factors influencing reliability | 1. Relative impact of a CPM variable | Less reliable for more influential variables not included | Less reliable for more influential variables not included | No effect if variable has relatively low impact on CPM. Less reliable for more influential variables not included |
| Application-related factors influencing reliability | 2. Omitted variables in CPM | Less reliable | Less reliable | Less reliable if the omitted CMF variable is prominent in application sites |
| Application-related factors influencing reliability | 3. Missing application data | Less reliable | Less reliable | Less reliable for more influential variables and depending on similarity between application and model estimation sites |

Omitted Variables in the CPM

Reliability is also influenced if the application sites differ from the sites used to develop the CPM in characteristics that are not included as variables in the model. For example, consider a CPM for two-lane rural roads that does not include horizontal curvature and was developed from a dataset containing 10 percent curved segments and 90 percent tangent segments. If the CPM were applied to a group of segments that are all horizontal curves, bias would almost certainly exist, given that expected crash frequencies are higher on curved segments than on tangents.

Missing Application Data

If the user does not have data for one or more of the variables in the CPM, then bias may result.
While it would be prudent to acquire that data, there may be instances where this is not possible. For example, a designer may be applying the CPM to estimate the safety of a site before all design elements have been finalized. In such a case, a user may “remove” the variable with missing data from the CPM by substituting the average value of that variable from the estimation data (if available). The extent of the bias will depend on the importance of the variable to the estimate of expected crashes and the similarity between the application and model estimation sites.

Methods to Assess Potential Reliability

Because the impact on reliability can be so variable depending on the relative impact of a variable to the overall CPM prediction, it is not possible to give strict guidance on how the number of variables in a CPM will affect its reliability.

The guidance developed is a heuristic procedure that practitioners can use to assess how the use or absence of additional variables in a CPM affects reliability. This procedure essentially answers two questions:

1. Which of multiple CPMs to apply, particularly when the number of variables varies between SPFs?
2. What are the impacts on reliability of using a CPM when not all the variables in the CPM are known?

Goodness-of-Fit (GOF) Measure Definitions

Modified R2

Fridstrom et al. (1995) introduced a modified R2 value. This GOF measure subtracts the normal amount of random variation that would be expected if the SPF were 100 percent accurate. Even with a perfect SPF, some variation in observed crash counts would be observed due to the random nature of crashes. As a result, the amount of systematic variation explained by the SPF is measured. Larger values indicate a better fit to the data in comparing two or more competing SPFs. Values greater than 1.0 indicate that the SPF is over-fit and some of the expected random variation is incorrectly explained as the systematic variation.

$$R^2 = \frac{\sum_i (y_i - \bar{y})^2 - \sum_i \hat{\mu}_i^2}{\sum_i (y_i - \bar{y})^2 - \sum_i \hat{y}_i}$$

where:
$y_i$ = observed counts.
$\hat{y}_i$ = predicted values from the SPF.
$\bar{y}$ = sample average.
$\hat{\mu}_i = y_i - \hat{y}_i$.

Mean Absolute Deviation (MAD)

The mean absolute deviation is a measure of the average value of the absolute difference between observed and predicted crashes.

$$MAD = \frac{\sum_i \left| \hat{y}_i - y_i \right|}{n}$$

where:
$\hat{y}_i$ = predicted values from the SPF.
$y_i$ = observed counts.
$n$ = validation data sample size.

Dispersion Parameter

The dispersion parameter, f(k), in the negative binomial distribution is reported from the variance equation expressed as follows:

$$\mathrm{Var}\{m\} = E\{m\} + f(k) \times E\{m\}^2$$

Or,

$$f(k) = \frac{\mathrm{Var}\{m\} - E\{m\}}{E\{m\}^2}$$

where:
f(k) = estimate of the dispersion parameter.
Var{m} = estimated variance of mean crash rate.
E{m} = estimated mean crash rate.

The estimated variance increases as dispersion increases, and consequently the standard errors of estimates are inflated. As a result, all else being equal, an SPF with lower dispersion parameter estimates (i.e., smaller values of f(k)) is preferred to an SPF with more dispersion. Note that f(k) can be specified as a constant or as a function of site characteristics. When f(k) is a constant or a constant per length this may be used to easily compare multiple CPMs.

CURE Plot Measures

Another tool to assess GOF is the cumulative residual (CURE) plot. A CURE plot is a graph of the cumulative residuals (observed minus predicted crashes) against a variable of interest sorted in ascending order. CURE plots provide a visual representation of GOF over the range of a given variable, and help to identify potential concerns such as the following:

- Long trends: long trends in the CURE plot (increasing or decreasing) indicate regions of bias that should be rectified through improvement to the SPF either by the addition of new variables or by a change of functional form.
- Percent exceeding the confidence limits: cumulative residuals outside the confidence limits indicate a poor fit over that range in the variable of interest. Cumulative residuals frequently outside the confidence limits indicate notable bias in the SPF. Recommended thresholds for the percent of cumulative residuals exceeding the 95 percent confidence limits are provided below.
- Vertical changes: Large vertical changes in the CURE plot are potential indicators of outliers, which require further examination.

Further information on this topic can be found in Chapter 7 of Hauer (2015).
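The GOF measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not the Calibrator's implementation. The function names are hypothetical, and the CURE limits use the commonly cited form of plus or minus two times sigma*, where sigma*(n) squared equals s2(n) times (1 minus s2(n) divided by s2(N)) and s2 is the running sum of squared residuals. That form is an assumption here, since this chapter does not state the formula.

```python
import math

def modified_r2(observed, predicted):
    """Modified R2: share of systematic variation explained by the model."""
    y_bar = sum(observed) / len(observed)
    total_ss = sum((y - y_bar) ** 2 for y in observed)
    resid_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    random_var = sum(predicted)  # random-variation term subtracted in the denominator
    return (total_ss - resid_ss) / (total_ss - random_var)

def mad(observed, predicted):
    """Mean absolute deviation between predicted and observed counts."""
    return sum(abs(p - y) for y, p in zip(observed, predicted)) / len(observed)

def cure_summary(observed, predicted):
    """Return (max |cumulative residual|, percent of points outside +/-2 sigma*).

    Sites are sorted by fitted value, matching a CURE plot for fitted values.
    """
    pairs = sorted(zip(predicted, observed))
    cum_resid, cum_sq = [], []
    running, running_sq = 0.0, 0.0
    for p, y in pairs:
        r = y - p  # residual = observed minus predicted
        running += r
        running_sq += r * r
        cum_resid.append(running)
        cum_sq.append(running_sq)
    total_sq = cum_sq[-1]
    outside = 0
    for c, s2 in zip(cum_resid, cum_sq):
        # Assumed limit: 2 * sqrt(s2 * (1 - s2 / total)); zero at the last point
        limit = 2.0 * math.sqrt(s2 * (1.0 - s2 / total_sq)) if total_sq > 0 else 0.0
        if abs(c) > limit + 1e-9:
            outside += 1
    return max(abs(c) for c in cum_resid), 100.0 * outside / len(pairs)

# Made-up illustrative counts, not data from the report
observed = [0, 2, 4, 10, 14]
predicted = [3, 3, 6, 6, 12]
print(modified_r2(observed, predicted), mad(observed, predicted))
print(cure_summary(observed, predicted))
```

The sketch expects a model that has already been calibrated, so that the cumulative residuals return to approximately zero at the last site.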
The Calibrator (Lyon et al., 2016) automatically provides a CURE plot for fitted values and two summary measures:

- The maximum deviation
- The percent of observations outside the 95 percent confidence limits

Spearman’s Correlation Coefficient

The Spearman’s correlation coefficient is used to compare network screening rankings using the CPM with all potential variables and alternate CPMs. Note that the same sites must be represented on both ranked lists.

$$\text{Spearman's correlation coefficient} = \rho = 1 - \frac{6 \sum_i \left( \mathrm{Rank}_{full,i} - \mathrm{Rank}_{alt,i} \right)^2}{n \left( n^2 - 1 \right)}$$

where:
$\mathrm{Rank}_{full}$ = rank number using the full CPM with all variables
$\mathrm{Rank}_{alt}$ = rank number using the alternate CPM
$n$ = number of sites in ranked list

Percentage of False Positives

For network screening, for the top 30, 50 and 100 sites ranked using the full CPM with all variables, the percentage of those sites not included in the ranked lists using the alternate CPMs is tabulated for both EB Expected and EB Excess screening.
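The two network screening measures can be sketched as follows. This is an illustrative sketch only: the function names, the site identifiers, and the scores are hypothetical, and the Spearman formula above assumes no tied ranks, so ties are simply broken by site identifier here.

```python
def ranks_descending(scores):
    """Map each site id to its rank (1 = highest screening measure)."""
    ordered = sorted(scores, key=lambda site: (-scores[site], site))
    return {site: pos for pos, site in enumerate(ordered, start=1)}

def spearman_rho(rank_full, rank_alt):
    """Spearman's coefficient from two rank dictionaries over the same sites."""
    n = len(rank_full)
    d2 = sum((rank_full[s] - rank_alt[s]) ** 2 for s in rank_full)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def pct_false_positives(rank_full, rank_alt, top_n):
    """Percent of the full-CPM top_n sites missing from the alternate top_n."""
    top_full = {s for s, r in rank_full.items() if r <= top_n}
    top_alt = {s for s, r in rank_alt.items() if r <= top_n}
    return 100.0 * len(top_full - top_alt) / top_n

# Hypothetical screening measures (e.g., EB Expected per mile) for five sites
full_scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
alt_scores = {"A": 4.0, "B": 5.0, "C": 3.0, "D": 1.0, "E": 2.0}

rank_full = ranks_descending(full_scores)
rank_alt = ranks_descending(alt_scores)
print(spearman_rho(rank_full, rank_alt))
print(pct_false_positives(rank_full, rank_alt, top_n=4))
```

In practice the same comparison would be repeated for each alternate CPM and for top_n values of 30, 50 and 100.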

Procedural Steps

It is important to first mention that the focus of the procedure is not on developing a CPM but on helping analysts determine which of multiple candidate CPMs to use or, if not all variables are readily available for applying a CPM, how much reliability is lost if those variables are not used in applying the CPM.

The methodology to evaluate reliability issues related to the number of variables in a CPM makes use of the FHWA Calibrator Tool (https://safety.fhwa.dot.gov/rsdp/toolbox-content.aspx?toolid=150) (Lyon et al., 2016). The Calibrator is a spreadsheet tool for calibrating CPMs to new jurisdictions and assessing their predictive performance. The goodness-of-fit measures provided can be used to assess the bias that may arise from excluding variables and, conversely, how much the performance of a CPM improves when additional variables are included.

Step 1: Assemble all data required for applying the CPM. HSM guidance on minimum sample sizes can be used to determine the number of locations required. All variables required for applying the full CPM are collected. This may mean a sub-sample of all sites in a jurisdiction is used to assess the reliability of alternate CPMs.

Step 2: Decide how many alternate CPMs are to be compared and which variables will be included in each. One of the CPMs will be the full CPM with all variables, and another should include only traffic volume variables, plus length if the site type is a segment. The other CPMs of interest are derived from the full CPM by removing one or more variables. For these “derived” CPMs, the variables to exclude may be those that are difficult to obtain for all sites or whose estimated values are suspect. A variable is “removed” from a CPM by substituting the average value for that variable, taken from the data used to develop the CPM, when applying the CPM to all sites.
Step 3: For each CPM being considered, estimate the Modified R2, MAD, dispersion parameter, and the percent of observations outside of two standard deviation limits for the CURE plot for the fitted values. It is recommended to use The Calibrator tool to perform this step. For ease of comparison, it is recommended to calibrate a constant dispersion parameter. Note that each CPM should first be calibrated to the data prior to the calculation of these GOF measures. For each of these measures, divide the values by the value for the full CPM with all variables. This step is undertaken to assess the changes in each measure compared to the full model, which should in theory have the best fit to the data.

Step 4: The analyst should decide how many years of observed crash data will be used in their Network Screening program and whether sites are to be screened by the EB Expected or the EB Excess methods. Then, for each CPM applied, compute the screening measure for each site as outlined in the 1st edition of the HSM. If sites are road segments, divide the estimates by length to normalize by length.

Step 5: For each ranked list, determine the Spearman’s correlation coefficient, comparing the rankings using the CPM with all variables to those using each of the other CPMs in turn.

Step 6: For each ranked list, for the top 30, 50 and 100 sites ranked using the full CPM with all variables, the percentage of sites not included in the ranked lists using the alternate CPMs is tabulated.

Step 7: Using the goodness-of-fit measures calculated in Steps 3, 5 and 6, evaluate the alternate CPMs using the guidance in Table 19 and Table 20. Each CPM is evaluated for goodness-of-fit using Table 19 and for Network Screening applications using Table 20. Table 19 guidance is for CPMs that will be used for design applications or evaluation of countermeasures. For these measures, the CPM is evaluated by examining each measure relative to that for the full CPM with all pertinent variables included. Table 20 guidance is for CPMs that will be used for Network Screening. The guidance classifies each measure for reliability as High, Medium, Low or Critically Low. The most reliable rating is High, while the worst is Critically Low. The overall rating for each table is the lowest rating for the CPM across all measures.

Table 19. Goodness-of-Fit Evaluation Guidance.

| Measure | High | Medium | Low | Critically Low |
|---|---|---|---|---|
| Modified R2 relative to full CPM | >= 0.90 | 0.76-0.90 | 0.50-0.75 | <0.50 |
| MAD relative to full CPM | <1.20 | 1.20-1.50 | 1.51-2.00 | >2.00 |
| Overdispersion relative to full CPM | <1.20 | 1.20-1.50 | 1.51-2.00 | >2.00 |
| % values outside of CURE plot vs fitted values relative to full CPM | <1.20 | 1.20-1.50 | 1.51-2.00 | >2.00 |

Table 20. Network Screening Evaluation Guidance.

| Measure | High | Medium | Low | Critically Low |
|---|---|---|---|---|
| Spearman’s correlation coefficient | 0.90 to 1.00 | 0.70 to 0.89 | 0.40 to 0.69 | <0.40 |
| % False Positives in Top 30 Sites | <10% | 11% to 25% | 26% to 40% | >40% |
| % False Positives in Top 50 Sites | <7.5% | 7.6% to 20% | 21% to 40% | >40% |
| % False Positives in Top 100 Sites | <5% | 6% to 15% | 15% to 40% | >40% |

Example

For the example, the data and the CPM used were adopted from the FHWA study of improvements to pavement friction, specifically the CPM for run-off-road crashes on two-lane rural roads in California. The question is whether the CPM can be applied when not all variables are available in the data, and which variables may be worth the cost of data collection. The California CPM is:

Crashes/mile_year = exp([…] 4.3617 […] 0.2162 Urbrur […] 0.1872 Surftype […] 0.0448 Avgshldwid […] 0.0852 Lanewid […] Terrain) […] AADT[…]

(The operators between terms and any exponent on AADT are not recoverable from the source text and are marked […].)

where:
AADT = Average Annual Daily Traffic

Urbrur = 0 if rural environment; 1 if urban
Surftype = 1 if asphalt; 0 if concrete
Avgshldwid = average of left and right shoulder width in feet
Lanewid = lane width in feet
Terrain = -0.3181 if flat, 0.0000 if rolling, 0.3464 if mountainous
overdispersion parameter, k = 0.7667

To remove variables from the CPM to compare its reliability when using fewer variables, the average value for that variable was used. For non-continuous variables such as terrain, the average value of that variable multiplied by its parameter estimate was used to in effect remove that variable from the CPM. These average values were as follows:

- Urbrur = value of 0.0231
- Surftype = value of 0.1831
- Avgshldwid = 9.8 feet
- Lanewid = 11.8 feet
- Terrain = value of 0.0046

For network screening, the EB Excess measure requires that the EB estimate has the CPM value subtracted from it. This CPM should be a simple CPM representing an average site, with only AADT and length (if a segment) as predictor variables.

For the example, a series of CPMs were applied to the same data, starting with only AADT as a predictive variable and adding one variable at a time. As noted above, variables were removed from the full CPM by substituting the average values for variables not included. These CPMs were:

- CPM 1: AADT
- CPM 2: AADT, AREA TYPE
- CPM 3: AADT, AREA TYPE, TERRAIN
- CPM 4: AADT, AREA TYPE, TERRAIN, LANEWIDTH
- CPM 5: AADT, AREA TYPE, TERRAIN, LANEWIDTH, SHOULDERWIDTH
- CPM 6: AADT, AREA TYPE, TERRAIN, LANEWIDTH, SHOULDERWIDTH, SURFACE TYPE

Step 3 Results: Looking at Table 21 and Table 22 below, it is apparent that the GOF statistics improved noticeably for CPM 3 (addition of Terrain) and CPM 5 (addition of shoulder width). For the other variables, the GOF statistics did not change much, so their impact on reliability is smaller.

Table 21. General GOF Statistics.

| CPM | Modified R2 | Modified R2 relative to CPM 6 (i.e., full model) | MAD | MAD relative to CPM 6 (i.e., full model) | Overdispersion | Overdispersion relative to CPM 6 (i.e., full model) |
|---|---|---|---|---|---|---|
| 1 | 0.30 | 0.57 | 2.88 | 1.14 | 1.03 | 1.32 |
| 2 | 0.30 | 0.57 | 2.90 | 1.15 | 1.05 | 1.35 |
| 3 | 0.42 | 0.79 | 2.70 | 1.07 | 0.89 | 1.14 |
| 4 | 0.44 | 0.83 | 2.66 | 1.05 | 0.87 | 1.12 |
| 5 | 0.53 | 1.00 | 2.53 | 1.00 | 0.78 | 1.00 |
| 6 | 0.53 | 1.00 | 2.53 | 1.00 | 0.78 | 1.00 |
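The average-substitution approach used to build the alternate CPMs in this example can be written out as a short derivation. This is a sketch that assumes the CPM is log-linear in its covariates. The symbols $\beta_0$, $\beta_j$, $x_j$, $\bar{x}_j$, $R$ (the set of removed variables) and $g(\mathrm{AADT})$ are notation introduced here for illustration and do not appear in the source.

```latex
\begin{align}
\hat{y}_{\text{full}} &= g(\mathrm{AADT}) \,
  \exp\Big( \beta_0 + \sum_{j} \beta_j x_j \Big) \\
\hat{y}_{\text{alt}} &= g(\mathrm{AADT}) \,
  \exp\Big( \underbrace{\beta_0 + \sum_{j \in R} \beta_j \bar{x}_j}_{\text{constant for all sites}}
  + \sum_{j \notin R} \beta_j x_j \Big) \\
\frac{\hat{y}_{\text{alt}}}{\hat{y}_{\text{full}}} &=
  \exp\Big( \sum_{j \in R} \beta_j \left( \bar{x}_j - x_j \right) \Big)
\end{align}
```

Under this assumed form, removing a variable only shifts the intercept. Sites whose value of the removed variable is close to the estimation-data average are barely affected, while the prediction changes more for sites far from the average or for variables with a large parameter estimate. For a categorical term such as terrain, the averaged term value reported above (0.0046) takes the place of $\beta_j \bar{x}_j$.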

Table 22. General GOF Statistics Cont'd.

| CPM | % values outside of CURE plot vs fitted values | % values outside of CURE plot vs fitted values relative to CPM 6 (i.e., full model) |
|---|---|---|
| 1 | 96 | 2.46 |
| 2 | 97 | 2.49 |
| 3 | 38 | 0.97 |
| 4 | 35 | 0.90 |
| 5 | 39 | 1.00 |
| 6 | 39 | 1.00 |

Step 4: The EB Expected estimates are calculated for all sites using the past 3 years of observed crash data. All estimates are divided by segment length to normalize by length.

Steps 5 and 6 Results: The Spearman’s correlation coefficient, Rho, is estimated from the ranked lists in Step 4. For the top 30, 50 and 100 sites ranked using the full CPM, the percentage of sites not included in the ranked lists using the alternate CPMs is tabulated.

Table 23. Comparison of Network Screening Results by EB as a Percentage.

3 yr

| CPM | Rho | 30 | 50 | 100 |
|---|---|---|---|---|
| 1 | 0.87 | 33 | 16 | 21 |
| 2 | 0.87 | 27 | 20 | 23 |
| 3 | 0.95 | 23 | 12 | 14 |
| 4 | 0.96 | 17 | 14 | 15 |
| 5 | 1.00 | 0 | 0 | 1 |
| 6 | 1.00 | 0 | 0 | 0 |

Step 7: The goodness-of-fit measures are evaluated using the guidance provided (Table 19 and Table 20). The worst rating for all measures in the table is used to rate the reliability of the alternate CPMs. These are highlighted in the tables for each CPM (Table 24 and Table 25).

Table 24. General GOF Evaluation for Example.

| CPM | Modified R2 relative to CPM 6 | MAD relative to CPM 6 | Overdispersion relative to CPM 6 | % values outside of CURE plot vs fitted values relative to CPM 6 |
|---|---|---|---|---|
| 1 | Low | High | Medium | Critically Low |
| 2 | Low | High | Medium | Critically Low |
| 3 | Medium | High | High | High |
| 4 | Medium | High | High | High |
| 5 | High | High | High | High |
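The goodness-of-fit ratings in Table 24 can be reproduced by applying the Table 19 thresholds to the relative values in Tables 21 and 22. The sketch below is illustrative: the function names are hypothetical, and because the published bands leave small gaps (for example between 0.75 and 0.76, or between 1.50 and 1.51), the handling of values that fall in a gap is an assumption.

```python
RATINGS = ["High", "Medium", "Low", "Critically Low"]  # best to worst

def rate_modified_r2(relative_value):
    """Table 19 bands for Modified R2 relative to the full CPM (higher is better)."""
    if relative_value >= 0.90:
        return "High"
    if relative_value > 0.75:   # assumption: gap 0.75-0.76 treated as Medium
        return "Medium"
    if relative_value >= 0.50:
        return "Low"
    return "Critically Low"

def rate_ratio(relative_value):
    """Table 19 bands for MAD, overdispersion and CURE % outside (lower is better)."""
    if relative_value < 1.20:
        return "High"
    if relative_value <= 1.50:
        return "Medium"
    if relative_value <= 2.00:  # assumption: gap 1.50-1.51 treated as Low
        return "Low"
    return "Critically Low"

def worst(ratings):
    """Overall rating is the lowest rating across the measures."""
    return max(ratings, key=RATINGS.index)

def rate_gof(values):
    r2, mad, dispersion, cure = values
    return [rate_modified_r2(r2), rate_ratio(mad),
            rate_ratio(dispersion), rate_ratio(cure)]

# Relative values (R2, MAD, overdispersion, CURE % outside) from Tables 21 and 22
relative = {
    1: (0.57, 1.14, 1.32, 2.46),
    2: (0.57, 1.15, 1.35, 2.49),
    3: (0.79, 1.07, 1.14, 0.97),
    4: (0.83, 1.05, 1.12, 0.90),
    5: (1.00, 1.00, 1.00, 1.00),
}

for cpm, values in relative.items():
    ratings = rate_gof(values)
    print(cpm, ratings, "overall:", worst(ratings))
```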

Table 25. Network Screening Evaluation for Example.

| Measure | CPM 1 | CPM 2 | CPM 3 | CPM 4 | CPM 5 |
|---|---|---|---|---|---|
| Spearman’s correlation coefficient | Medium | Medium | High | High | High |
| % False Positives in Top 30 Sites | Low | Low | Medium | Medium | High |
| % False Positives in Top 50 Sites | Medium | Medium | Medium | Medium | High |
| % False Positives in Top 100 Sites | Low | Low | Medium | Medium | High |

The results indicate that, among the alternate CPMs evaluated, CPMs 1 and 2 should not be applied, as their reliability ratings are critically low for the general GOF measures and low for the Network Screening measures. CPMs 3 and 4 are rated medium for both sets of measures, while CPM 5 is rated high for both. Depending on the cost of data collection and the importance of accuracy in the applications, an analyst may decide that either CPM 3 or CPM 4 can be applied, or that CPM 5 should be applied. Since CPM 5 is rated high by both sets of measures, it may not be worth the extra effort in data collection to use CPM 6.
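The Network Screening ratings in Table 25 follow from applying the Table 20 thresholds to the values in Table 23. The sketch below shows one way to do this. The function names are hypothetical, and the treatment of boundary values that Table 20 leaves unassigned or assigns twice (for example 10 percent for the top 30 sites, or 15 percent for the top 100 sites) is an assumption chosen to match the ratings reported in the example.

```python
ORDER = ["High", "Medium", "Low", "Critically Low"]  # best to worst

def rate_spearman(rho):
    """Table 20 bands for Spearman's correlation coefficient."""
    if rho >= 0.90:
        return "High"
    if rho >= 0.70:
        return "Medium"
    if rho >= 0.40:
        return "Low"
    return "Critically Low"

# Per list size: (High if below, Medium if at or below, Low if at or below)
FP_CUTS = {30: (10.0, 25.0, 40.0), 50: (7.5, 20.0, 40.0), 100: (5.0, 15.0, 40.0)}

def rate_false_positives(pct, top_n):
    """Table 20 bands for percent false positives in the top_n sites."""
    high_below, medium_max, low_max = FP_CUTS[top_n]
    if pct < high_below:
        return "High"
    if pct <= medium_max:  # assumption: 15% in the top 100 counts as Medium
        return "Medium"
    if pct <= low_max:
        return "Low"
    return "Critically Low"

def rate_screening(rho, fp30, fp50, fp100):
    return [rate_spearman(rho), rate_false_positives(fp30, 30),
            rate_false_positives(fp50, 50), rate_false_positives(fp100, 100)]

def worst(ratings):
    """Overall rating is the lowest rating across the measures."""
    return max(ratings, key=ORDER.index)

# (Rho, % false positives in top 30, top 50, top 100) from Table 23
table_23 = {
    1: (0.87, 33, 16, 21),
    2: (0.87, 27, 20, 23),
    3: (0.95, 23, 12, 14),
    4: (0.96, 17, 14, 15),
    5: (1.00, 0, 0, 1),
}

for cpm, row in table_23.items():
    ratings = rate_screening(*row)
    print(cpm, ratings, "overall:", worst(ratings))
```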