**Suggested Citation:**"Chapter 4. Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on Error in Estimated Input Values." National Academies of Sciences, Engineering, and Medicine. 2021.

*Understanding and Communicating Reliability of Crash Prediction Models*. Washington, DC: The National Academies Press. doi: 10.17226/26440.


**Chapter 4. Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on Error in Estimated Input Values**

**Introduction**

The application of crash prediction models (CPMs) involves the use of a safety performance function (SPF), optionally in combination with crash modification factors (CMFs). Both the SPF and CMFs require the input values, e.g., segment AADT or lane width, for each location to be known. In some cases, these variables can be known with certainty, e.g., whether a left-turn lane is or is not present on the major road. Other variables, such as segment AADT, are only estimated and/or subject to measurement error. In other cases, the value may have been accurate when it was measured but no longer reflects the real value, e.g., when old traffic counts are used. Since the predicted value of the CPM is reliant on the input values, the reliability of the prediction is directly dependent on the accuracy of the input values.

The accuracy of input values affects the reliability of a CPM, as shown in Table 12. There are three aspects of reliability: prediction bias, variance of the predictions (greater uncertainty), and repeatability. Error in estimated input values will make the CPM less reliable in terms of bias. The CPM will also be less reliable in terms of repeatability if the error is due to a poor description of input values. Input values must be clearly understood in terms of how to measure them and the units of measurement. Errors in input values will not, however, impact the variance of the CPM's predicted value.

Table 12. Influence of Error in Estimated Input Values on the Reliability of an Estimated Value Using a CPM.

| Influence Category | Factor | Effect on Reliability: Bias | Effect on Reliability: Variance | Effect on Reliability: Repeatability |
|---|---|---|---|---|
| Application-related factors influencing reliability | Error in estimated input values | Less reliable | No effect | Less reliable if error is due to poor description of input value (units, source, etc.) |

Because the impact on reliability can be so variable depending on the degree of uncertainty or error and its relative impact on the overall CPM prediction, it is not possible to give strict guidance on how uncertainty in input values will affect the reliability of a CPM. For this reason, the evaluation methods will be demonstrated using actual data and a heuristic procedure that practitioners can use to assess how their data may affect reliability. The procedure also provides a related rating of reliability that could contribute to an overall rating system of CPM reliability. As an example, a jurisdiction may have reason to believe that its AADT counts can be off by as much as 30 percent. The procedure will allow for a sensitivity analysis to determine how large an issue this is in terms of the reliability of the CPM results.

The goal is the development of an evaluation and rating system for the impact of error in input values on the predictions from CPMs. Several quantitative measures are recommended for use, and thresholds are suggested for providing a rating system. The remainder of this chapter discusses factors affecting the potential magnitude of reliability issues, the procedure developed for assessment, and a worked example.

**Factors Affecting the Potential Magnitude of Reliability Issues**

How significantly erroneous input values affect the reliability of a CPM may depend on the context of its use and, in the case of estimating the safety effects of countermeasures or design decisions, potentially on the direction of the effect on the CPM, i.e., whether more or fewer crashes are predicted. For example, CPMs that are being used to evaluate a contemplated countermeasure may prove much more unreliable if incorrect values are used, since a decision on implementing the countermeasure is based on cost-effectiveness, which is directly tied to the empirical Bayes estimate of impacted crashes without the countermeasure, for which a CPM is used. However, if the likely effect of errors is to underestimate crashes, then the impact of errors is minimized, as a decision-maker will tend to err on the side of caution by not implementing a countermeasure in the face of uncertainty. On the other hand, network screening applications, whereby a number of sites are ranked by the potential for safety improvement, may be less impacted, since the relative rankings between sites may not change materially from those obtained when the input values are known with more accuracy.

It is difficult to assess how much erroneous input values affect the reliability of a CPM. The impact on reliability is largely dependent on:

1. The degree to which the value is uncertain or erroneous. For example, are AADT estimates off by an average of 5% or 40%?
2. How much the variable in question contributes to the CPM prediction. For example, AADT-related variables typically account for most of the magnitude of a CPM prediction, while other variables may not materially affect the CPM prediction.
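To illustrate how these two factors interact, consider a simplified CPM in which predicted crashes are proportional to a power of AADT. Both the functional form and the exponent b = 1.3 are assumptions chosen purely for illustration, not values taken from a specific model. Under that assumption, a relative AADT error e scales the prediction by (1 + e) raised to the power b:

```latex
N_{pred} = a \cdot AADT^{\,b}
\quad\Longrightarrow\quad
\frac{N_{pred}\bigl(AADT\,(1+e)\bigr)}{N_{pred}\bigl(AADT\bigr)} = (1+e)^{b}

b = 1.3:\qquad (1.05)^{1.3} \approx 1.065, \qquad (1.40)^{1.3} \approx 1.549
```

With this hypothetical exponent, a 5% AADT overestimate inflates the prediction by about 6.5%, while a 40% overestimate inflates it by about 55%. A variable with a small coefficient would produce a much smaller change for the same relative error.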
**Methods to Assess Potential Reliability**

Because the impact on reliability can be so variable depending on the degree of uncertainty or error and its relative impact on the overall CPM prediction, it is not possible to give strict guidance on how this will affect the reliability of a CPM. The guidance developed is a heuristic procedure that practitioners can use to assess how uncertainty or error in their data may affect reliability. This procedure essentially performs a sensitivity analysis to see how an anticipated possible magnitude of measurement error in input values affects the following types of analyses:

1. Applying a CPM to predict crash frequency
2. Applying a CPM along with crash data for Network Screening using empirical Bayes estimates of expected and expected excess crash frequency

The error of interest is that which may occur due to some type of failure in the measurement process (e.g., defective measuring device, inadequate personnel training). It may also result from a misinterpretation or misapplication of recorded archival data (e.g., misunderstanding nuances of agency data definitions, extrapolation of agency data in time or space).

**Goodness-of-Fit Measure Definitions**

The procedure applies several goodness-of-fit measures. This allows for a quantitative assessment of the degree of bias introduced when measurement error is present.

**Variable Definitions**

$PRED_{error}$ = predicted value from CPM with measurement error;
$PRED_{est}$ = predicted value from CPM with estimated value;
$n$ = data sample size.

**Root Mean Squared Difference (RMSD)**

The root mean squared difference is a measure of the variability of the difference between the CPM prediction with error in input values and the CPM prediction with the estimated values.

$$\text{Root Mean Squared Difference} = \sqrt{\frac{\sum_{i=1}^{n}\left(PRED_{error,i} - PRED_{est,i}\right)^{2}}{n}}$$

Note that if expressing on a per year basis, divide the RMSD by the number of years of data.

**Mean Absolute Difference**

The mean absolute difference is a measure of the average absolute difference between the CPM prediction with error in input values and the CPM prediction with the estimated values.

$$\text{Mean Absolute Difference} = \frac{\sum_{i=1}^{n}\left|PRED_{error,i} - PRED_{est,i}\right|}{n}$$

Note that if expressing on a per year basis, divide the mean absolute difference by the number of years of data.

**Extreme Value**

The extreme value is a measure of the magnitude of a high value of the absolute difference. It is recommended to use the 85th percentile value, although any percentile value desired by the analyst may be selected. The calculation is based on an assumed Gamma distribution of the values of the absolute difference and uses the method of moments to determine the alpha and theta parameters of the Gamma distribution using the following equations:

$$\alpha = \left(\frac{\text{Mean Absolute Difference}}{\text{Root Mean Squared Difference}}\right)^{2}$$

$$\theta = \frac{\left(\text{Root Mean Squared Difference}\right)^{2}}{\text{Mean Absolute Difference}}$$

The value of the absolute difference at the desired percentile level can be determined using online calculators such as https://homepage.divms.uiowa.edu/~mbognar/applets/gamma.html, or using statistical textbooks. For example, using the 85th percentile and the estimated Gamma distribution parameters, the analyst estimates the value of the absolute difference that 85% of sites would be expected to be less than or equal to, or conversely, the value that 15% of sites may exceed.
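The measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the percentile is obtained by a series expansion of the Gamma distribution function plus bisection, which is an assumed stand-in for the online calculator or tables mentioned in the text.

```python
import math


def rmsd(pred_error, pred_est):
    """Root mean squared difference between the two sets of predictions."""
    n = len(pred_est)
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(pred_error, pred_est)) / n)


def mean_abs_diff(pred_error, pred_est):
    """Mean absolute difference between the two sets of predictions."""
    n = len(pred_est)
    return sum(abs(e - s) for e, s in zip(pred_error, pred_est)) / n


def gamma_params(mad, rmsd_value):
    """Alpha and theta from the method-of-moments equations in the text."""
    alpha = (mad / rmsd_value) ** 2
    theta = rmsd_value ** 2 / mad
    return alpha, theta


def gamma_cdf(x, alpha, theta, terms=200):
    """Gamma distribution function via the series
    P(a, z) = z^a * e^(-z) * sum_k z^k / Gamma(a + k + 1), with z = x / theta.
    A fixed number of terms is adequate for the moderate z values met here."""
    if x <= 0:
        return 0.0
    z = x / theta
    term = 1.0 / math.gamma(alpha + 1.0)  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= z / (alpha + k)  # Gamma(a+k+1) = (a+k) * Gamma(a+k)
        total += term
    return (z ** alpha) * math.exp(-z) * total


def gamma_percentile(p, alpha, theta):
    """Value x such that gamma_cdf(x) = p, found by bisection."""
    lo, hi = 0.0, theta
    while gamma_cdf(hi, alpha, theta) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, alpha, theta) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With the rounded parameters from the worked example later in this chapter (alpha = 0.12, theta = 3.90), `gamma_percentile(0.85, 0.12, 3.90)` should come out close to the 0.73 reported there.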

**Spearman's Correlation Coefficient (Rho)**

Spearman's correlation coefficient is used to compare Network Screening rankings using the CPMs with measurement error to the ranking using the CPM with the original estimated values. Note that the same sites must be represented on both ranked lists.

$$\rho = 1 - \frac{6\sum_{i=1}^{n}\left(Rank_{error,i} - Rank_{est,i}\right)^{2}}{n\left(n^{2} - 1\right)}$$

where:
$Rank_{error}$ = rank number using CPM with measurement error;
$Rank_{est}$ = rank number using CPM with estimated value(s);
$n$ = number of sites in ranked list.

**Percentage of False Positives**

For Network Screening, for the top 30, 50, and 100 sites ranked using the CPM with estimated values, the percentage of those sites not included in the ranked lists using the CPM with measurement error is tabulated.

**Procedural Steps**

The procedural steps for each of the analyses are outlined in this section. Note that the measures of effectiveness are based on predicted values, since the objective is to determine how the predicted values would differ if errors were present in the measured values of predictor variables in the model.

**Sensitivity Analysis of Predicted Crash Values**

Step 1: Assemble all data required for applying the CPM. HSM guidance on minimum sample sizes can be used to determine the number of locations required.

Step 2: Follow the HSM guidance for calibrating the CPM if it was developed in another jurisdiction or a different time period than the data to which it will be applied. The calibration step is done using the estimated values for each variable in the CPM.

Step 3: For each variable in the CPM where measurement error is of concern, assign a random number reflecting the degree to which measurement error is suspected for that variable. For example, if the measurement error for AADT is thought to be of the order of 10 to 20 percent, then each site is assigned a random number between 0.80 and 0.90 or between 1.10 and 1.20. In Excel, this can be accomplished using the following formula:

    =1+CHOOSE(RANDBETWEEN(1,2),-1,1)*RANDBETWEEN(10,20)/100

The analyst should select an appropriate representation of potential error for each variable, which may differ from one variable to another. For example, error in traffic volumes may be better represented as a percentage, while for shoulder width the possible error may be better represented by a given number of inches. Below are four additional formulations of measurement error. Xest refers to the estimated value of a variable, while Xerror is the value of that variable with measurement error assigned.

Case A. Error for variable of interest is P%: Xerror = Xest x (1 ± P/100)

Case B. Error for variable of interest is P% +/- Q%: Xerror = Xest x (1 ± P/100 + RandBetween(-Q,Q)/100)

Case C. Error for variable of interest is R: Xerror = Xest ± R

Case D. Error for variable of interest is R +/- S%: Xerror = Xest ± R x [1 + RandBetween(-S,S)/100]

Step 4: For each variable in the CPM where measurement error is of concern, multiply the recorded value by the random number generated for that variable in Step 3.

Step 5: Apply the CPM twice: once using the original estimated variable values, and a second time using the new variable values generated in Step 4. For any variables not thought to be subject to measurement error, use their original values.

Step 6a: Use the values from Step 5 to estimate the mean difference, the root mean squared difference, the mean absolute difference, and the extreme value at the desired percentile.

Step 7a: Divide the root mean squared difference and the extreme value estimates from Step 6a by the average value of the crash predictions with known values and multiply by 100. This step normalizes the measures from Step 6a by comparing the differences in predictions to the magnitude of expected crashes.

Step 8a: Using the goodness-of-fit measures calculated in Step 7a, assess the impact of measurement errors on the CPM using the guidance in Table 13. The values of the measures in Step 7a are used to classify the CPM performance as High, Medium, Low, or Critically Low for each measure. The most reliable rating is High, while the worst is Critically Low. The overall rating for the CPM is defined by the lowest reliability rating of the two measures.

Table 13. Sensitivity Analysis Predictions Evaluation Guidance.

| Measure | High | Medium | Low | Critically Low |
|---|---|---|---|---|
| Percentage of Root Mean Squared Difference/Avg. Prediction | <15% | 16% to 25% | 26% to 50% | >50% |
| Percentage of Extreme Value/Avg. Prediction | <15% | 16% to 25% | 26% to 50% | >50% |

**Sensitivity Analysis for Network Screening**

For this analysis, Steps 1 to 5 of the procedure for Sensitivity Analysis of Predicted Crash Values are the same. At this point, the empirical Bayes (EB) method is applied to combine the CPM prediction with the observed crash frequency.
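The error assignment in Step 3, which both analyses share, can be sketched in Python as follows. This is an illustrative translation of the Excel formula and of Cases A to D, not part of the published procedure; the function names and the sample AADT values are hypothetical, and the sign of each error is assumed to be equally likely positive or negative, as in the Excel formula.

```python
import random


def signed(rng):
    """Return -1 or +1 with equal probability, like CHOOSE(RANDBETWEEN(1,2),-1,1)."""
    return rng.choice((-1, 1))


def range_factor(low_pct, high_pct, rng):
    """Analogue of =1+CHOOSE(RANDBETWEEN(1,2),-1,1)*RANDBETWEEN(low,high)/100."""
    return 1 + signed(rng) * rng.randint(low_pct, high_pct) / 100


def case_a(x_est, p, rng):
    """Case A: error of P percent."""
    return x_est * (1 + signed(rng) * p / 100)


def case_b(x_est, p, q, rng):
    """Case B: error of P percent, plus or minus Q percent."""
    return x_est * (1 + signed(rng) * p / 100 + rng.randint(-q, q) / 100)


def case_c(x_est, r, rng):
    """Case C: error of a fixed amount R in the variable's own units."""
    return x_est + signed(rng) * r


def case_d(x_est, r, s, rng):
    """Case D: error of R, itself varied by plus or minus S percent."""
    return x_est + signed(rng) * r * (1 + rng.randint(-s, s) / 100)


rng = random.Random(42)  # fixed seed so that a run can be repeated
aadt_est = [12000, 18500, 23000]  # hypothetical estimated AADT values
# Steps 3 and 4: one factor per site, multiplied by the estimated value
aadt_error = [x * range_factor(10, 20, rng) for x in aadt_est]
```

Passing the random generator explicitly keeps the perturbed inputs reproducible, which is convenient when the two CPM applications in Step 5 need to be compared more than once.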

The analyst should decide how many years of observed crash data will be used in their Network Screening program and whether sites are to be screened by the EB Expected or the EB Excess methods.²

Step 6b: For each CPM applied, compute either the EB Expected or EB Excess estimate for each site by combining the CPM predicted crash estimate with the observed crash data. The number of years of data used should reflect the number of years to be used in the network screening program. If sites are road segments, divide the estimates by segment length to normalize by length.

Step 7b: For each CPM applied, rank all locations separately by the network screening measure used (EB Expected or EB Excess).

Step 8b: For each ranked list, determine Spearman's correlation coefficient, comparing the rankings using the CPMs with measurement error to the ranking using the CPM with the original estimated values.

Step 9b: For each ranked list, for the top 30, 50, and 100 sites ranked using the base CPM, the percentage of sites not included in the ranked lists using the CPMs with measurement error is tabulated.

Step 10b: Using the goodness-of-fit measures calculated in Steps 8b and 9b, assess the impact of measurement errors on the CPM using the guidance provided in Table 14. The most reliable rating is High, while the worst is Critically Low. The worst rating for all cells in the table is used to rate the reliability of the CPM with errors for network screening applications.

Table 14. Network Screening Evaluation Guidance.

| Measure | High | Medium | Low | Critically Low |
|---|---|---|---|---|
| Spearman's correlation coefficient | 0.90 to 1.00 | 0.70 to 0.89 | 0.40 to 0.69 | <0.40 |
| % False Positives in Top 30 Sites | <10% | 11% to 25% | 26% to 40% | >40% |
| % False Positives in Top 50 Sites | <7.5% | 7.6% to 20% | 21% to 40% | >40% |
| % False Positives in Top 100 Sites | <5% | 6% to 15% | 15% to 40% | >40% |

² EB Expected and EB Excess Methods are two of the methods recommended by the 1st edition of the HSM (AASHTO, 2010). These methods account for possible bias due to regression to the mean (RTM). Further discussion of these methods can be found in Chapter 4 of the 1st edition of the HSM.
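Steps 6b to 9b can be sketched in Python as below. The chapter does not spell out the EB formula, so the weighting used here, w = 1 / (1 + k x predicted), is an assumption based on the commonly used HSM form. The function names and the five-site data set are hypothetical, and ties in rank are broken by site identifier for simplicity.

```python
def eb_expected(predicted_total, observed_total, dispersion):
    """EB Expected estimate for one site over the study period.

    Assumes the weight w = 1 / (1 + k * predicted); this formula is not
    stated in the chapter itself. For segments, divide the result by
    segment length before ranking (Step 6b).
    """
    w = 1.0 / (1.0 + dispersion * predicted_total)
    return w * predicted_total + (1.0 - w) * observed_total


def rank_sites(scores):
    """Map site id -> rank (1 = highest score); ties are broken by site id."""
    ordered = sorted(scores, key=lambda site: (-scores[site], site))
    return {site: position for position, site in enumerate(ordered, start=1)}


def spearman_rho(rank_error, rank_est):
    """Spearman's rho from two rankings of the same sites (no tied ranks)."""
    n = len(rank_est)
    d2 = sum((rank_error[s] - rank_est[s]) ** 2 for s in rank_est)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def pct_false_positives(rank_error, rank_est, top_n):
    """Percentage of the top_n sites by estimated values that are missing
    from the top_n of the list built with measurement error."""
    top_est = {s for s, r in rank_est.items() if r <= top_n}
    top_err = {s for s, r in rank_error.items() if r <= top_n}
    return 100.0 * len(top_est - top_err) / top_n


# Hypothetical EB estimates per mile for five sites
eb_est = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
eb_err = {"A": 4.5, "B": 2.5, "C": 3.5, "D": 2.0, "E": 1.2}
r_est, r_err = rank_sites(eb_est), rank_sites(eb_err)
rho = spearman_rho(r_err, r_est)                 # sites B and C swap places
fp_top2 = pct_false_positives(r_err, r_est, 2)   # site B drops out of the top 2
```

In this toy data set only two adjacent sites swap ranks, so rho stays high (0.9) even though half of the top-2 list changes, which is why the procedure tabulates both measures.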

**Example**

For the example, the CPM that may be included in the HSM 2nd edition for total crashes on urban four-lane divided arterials is applied to each site. The dataset includes 5 years of observed crash data and 358 segments. The CPM applied is:

$$\text{Crashes per year} = length \times \exp(-11.9469) \times AADT^{1.3272} \times \exp(0.0182 \times dwydens - 0.0054 \times medwid)$$

The dispersion parameter is modeled as:

$$\text{Dispersion parameter} = \exp(-0.6179) \times length^{-0.5502}$$

where:
length = segment length in miles;
AADT = annual average daily traffic;
dwydens = number of driveways within the segment per mile (6.35 is the average density);
medwid = median width in feet (33.27 is the average width).

For network screening, the EB Expected method is selected and 3 years of observed crash data are to be considered.

Step 3: The levels of measurement error assumed were 20-30% for AADT, 10-20% for dwydens, and 0-10% for medwid. The factors for each site were determined in Microsoft Excel using the following formulae:

    AADTfactor=1+CHOOSE(RANDBETWEEN(1,2),-1,1)*RANDBETWEEN(20,30)/100
    DWYDENSfactor=1+CHOOSE(RANDBETWEEN(1,2),-1,1)*RANDBETWEEN(10,20)/100
    MEDWIDfactor=1+CHOOSE(RANDBETWEEN(1,2),-1,1)*RANDBETWEEN(0,10)/100

Step 4: The random numbers generated for each site in Step 3 were multiplied by their respective variable values.

Step 5: The CPM was applied to each site, now using the variable values generated in Step 4.

Step 6a: After applying the CPMs with known values and with measurement error values, the root mean squared difference, the mean absolute difference, and the extreme value at the 85th percentile were estimated on a per year basis using all 5 years of data.

Root Mean Squared Difference per year = 1.34

Mean Absolute Difference per year = 0.46

The values of root mean squared difference per year and mean absolute difference per year are used to estimate the parameters of the gamma distribution:

$$\alpha = (0.46/1.34)^{2} = 0.12$$

$$\theta = 1.34^{2}/0.46 = 3.90$$

Using the values of alpha and theta and tables for the gamma distribution, the 85th percentile value is determined to be 0.73.

Extreme Value (85th percentile) per year = 0.73

Step 7a: The root mean squared difference and extreme value were divided by the average predicted value per year using the CPM with known values, 1.33, and multiplied by 100 to express them as percentages.

% Root Mean Squared Difference/Avg. Prediction = 1.34/1.33 x 100 = 101%

% Extreme Value (85th percentile)/Avg. Prediction = 0.73/1.33 x 100 = 55%

Step 8a: The goodness-of-fit measures for the sensitivity of predictions calculated in Step 7a are evaluated. Both measures exceed 50%, so the reliability for the sensitivity of predictions would be Critically Low. Therefore, for analyses where the magnitude of the prediction is important, the CPM should not be used, and a better performing CPM should be adopted.

Table 15. Sensitivity Analysis of Predictions Evaluations for Example.

| Measure | High | Medium | Low | Critically Low |
|---|---|---|---|---|
| Percentage of Root Mean Squared Difference/Avg. Prediction | <15% | 16% to 25% | 26% to 50% | >50% |
| Percentage of Extreme Value/Avg. Prediction | <15% | 16% to 25% | 26% to 50% | >50% |

Step 6b: The EB Expected estimates are calculated for all sites using the past 3 years of observed crash data. The estimates for all sites are divided by segment length to normalize by length.

Step 7b: All sites are ranked in descending order by the EB Expected estimates from Step 6b for both the CPM with known values and the CPM with measurement errors.

Step 8b: Spearman's correlation coefficient (Rho) is estimated from the ranked lists in Step 7b.

Step 9b: For the top 30, 50, and 100 sites ranked using the base CPM, the percentage of sites not included in the ranked lists using the CPM with measurement error is tabulated.

Step 10b: The goodness-of-fit measures for Network Screening are evaluated.

Table 16. Example Network Screening Results.

| CPM | Rho | % False Positives in Top 30 | % False Positives in Top 50 | % False Positives in Top 100 |
|---|---|---|---|---|
| Prediction With Errors | 0.69 | 20 | 16 | 16 |

Comparing to the guidance in Table 17, the results indicate reliability ratings of Medium for % false positives in the top 30 and top 50 sites, and ratings of Low for Spearman's correlation coefficient and % false positives in the top 100 sites. Using the lowest rating to characterize the CPM with errors results in a reliability of Low for Network Screening. Therefore, for network screening applications the CPM should not be used, and a better performing CPM should be adopted.

Table 17. Network Screening Evaluation for Example.

| Measure | High | Medium | Low | Critically Low |
|---|---|---|---|---|
| Spearman's correlation coefficient | 0.90 to 1.00 | 0.70 to 0.89 | 0.40 to 0.69 | <0.40 |
| % False Positives in Top 30 Sites | <10% | 11% to 25% | 26% to 40% | >40% |
| % False Positives in Top 50 Sites | <7.5% | 7.6% to 20% | 21% to 40% | >40% |
| % False Positives in Top 100 Sites | <5% | 6% to 15% | 15% to 40% | >40% |
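The pieces of this worked example can be tied together in a short Python sketch: the example CPM and dispersion parameter as stated above, and helpers that map the reported measures onto the rating bands. The function names and the single test site are hypothetical. Because the published bands leave small gaps (for example, between 15% and 16%), the sketch assumes that a value falling in a gap receives the lower of the two neighbouring ratings.

```python
import math

RATINGS = ["High", "Medium", "Low", "Critically Low"]  # best to worst


def predicted_crashes_per_year(length, aadt, dwydens, medwid):
    """Example CPM for total crashes, with coefficients as stated in the example."""
    return (length * math.exp(-11.9469) * aadt ** 1.3272
            * math.exp(0.0182 * dwydens - 0.0054 * medwid))


def dispersion_parameter(length):
    """Example dispersion parameter as a function of segment length in miles."""
    return math.exp(-0.6179) * length ** (-0.5502)


def rate_pct(value, high_below, medium_max, low_max):
    """Rate a percentage measure where smaller is better.
    Assumption: values in the gaps between bands get the lower rating."""
    if value < high_below:
        return "High"
    if value <= medium_max:
        return "Medium"
    if value <= low_max:
        return "Low"
    return "Critically Low"


def rate_rho(rho):
    """Rate Spearman's rho using the band edges of the evaluation guidance."""
    if rho >= 0.90:
        return "High"
    if rho >= 0.70:
        return "Medium"
    if rho >= 0.40:
        return "Low"
    return "Critically Low"


def worst(ratings):
    """Overall rating is the least reliable of the individual ratings."""
    return max(ratings, key=RATINGS.index)


# Hypothetical 1-mile site at the dataset's average driveway density and median width
base = predicted_crashes_per_year(1.0, 20000, 6.35, 33.27)
with_error = predicted_crashes_per_year(1.0, 20000 * 1.25, 6.35, 33.27)

# Measures reported in the example (Steps 7a, 8b, and 9b)
prediction_rating = worst([rate_pct(101, 15, 25, 50), rate_pct(55, 15, 25, 50)])
screening_rating = worst([
    rate_rho(0.69),
    rate_pct(20, 10, 25, 40),    # top 30 sites
    rate_pct(16, 7.5, 20, 40),   # top 50 sites
    rate_pct(16, 5, 15, 40),     # top 100 sites
])
print(prediction_rating, "/", screening_rating)  # prints: Critically Low / Low
```

For the hypothetical site, a 25% AADT overestimate raises the prediction by roughly a third, which shows how strongly the AADT exponent drives the result. The two overall ratings agree with those stated in the example.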