**Suggested Citation:**"Chapter 1. Introduction and Background." National Academies of Sciences, Engineering, and Medicine. 2021.

*Understanding and Communicating Reliability of Crash Prediction Models*. Washington, DC: The National Academies Press. doi: 10.17226/26440.


**Chapter 1. Introduction and Background**

**Highway Safety Manual and the Crash Prediction Methodology**

The 1st edition of the Highway Safety Manual (HSM) (AASHTO, 2010) is the product of over 10 years of effort and thousands of volunteer hours to provide fact-based analytical tools and techniques to quantify the potential safety impacts of planning, design, operations, and maintenance decisions. Part C of the 1st edition of the HSM contains the predictive methods for rural two-lane roads, rural multilane highways, and urban and suburban arterials. Since the publication of the 1st edition, two chapters with crash prediction models (CPMs) for freeways and ramps that were developed through NCHRP Project 17-45 have been approved for inclusion as a supplement to Part C. The 2nd edition of the HSM is expected to be published in the next two years.

In both the 1st edition and the upcoming 2nd edition of the HSM, the predictive method uses three components to estimate the average predicted crash frequency at a site (the product of these three components is a CPM):

- A base model, which is a safety performance function (SPF);
- Crash modification factors (CMFs) (expected to be called SPF adjustment factors in the 2nd edition of the HSM), which adjust the estimate for site-specific conditions that may differ from the base conditions; and
- A calibration factor (C), which adjusts the estimate for local conditions.

**Lack of Guidance About Reliability**

The 1st edition of the HSM does not include methods to consistently convey model reliability. During initial implementation of the 1st edition of the HSM, model results have been generated and used without fully understanding and communicating their accuracy, which can erode the credibility of this new and rapidly growing field.
Since the publication of the 1st edition of the HSM, the state of the art of safety analysis has progressed, and more has been learned about how the assumptions made when developing crash prediction models with HSM procedures affect accuracy. Practitioners are also striving to fully understand and appropriately communicate the benefits of the HSM methods and the results derived from them.

Many factors affect reliability. Case studies presented at various conferences, including the Transportation Research Board (TRB) Annual Meetings, and through other initiatives demonstrate that some practitioners are using the models incorrectly or in ways not recommended, and are displaying crash prediction results without properly understanding the model reliability. Some crash prediction models have been reported with measures of model reliability while others have not. Understanding and consistently communicating the reliability of crash prediction results is critical to credible analysis and to overcoming barriers that keep some transportation agencies or professionals from using these models.

This report has been developed to fill this void and give practitioners guidance on assessing and understanding the reliability of CPMs. Before an overview of the Guide, the following sections discuss the meaning of reliability, the factors that affect reliability, and goodness-of-fit measures.

**What Is Reliability?**

In general, the reliability of the prediction from a CPM can be described in terms of bias, variance, and repeatability:

- Bias represents the difference between the CPM estimate and the true value.
- Variance describes the extent of uncertainty in the estimate due to unexplained or random influences.

- Repeatability describes the extent to which multiple analysts using the same CPM with the same training, data sources, and site of interest obtain the same results (as measured by the number of significant figures showing agreement among results).

A more reliable estimate has little bias, a small variance, and results that agree to several significant figures across repeated independent applications. This report focuses on how to estimate the bias and the variance for certain conditions.

Two categories of factors influence the reliability of a CPM: model-related factors and application-related factors. Model-related factors describe the components of a CPM that an agency specifies for use by practitioners to evaluate sites in the corresponding jurisdiction. Application-related factors describe techniques that practitioners apply when using a CPM to evaluate a site. The factors in both categories combine to define the reliability of a CPM estimate when it is used to evaluate a given site. In this regard, the estimate's reliability is likely to vary from site to site and analyst to analyst, depending on how the model is configured and applied to a given site.

Examples of model-related factors include:

- Agency specification of an HSM CPM without local calibration.
- Agency specification of an HSM CPM with local calibration. Calibration would reduce the bias of the prediction, but the variance of the prediction could depend on the calibration sample. Both bias and variance also depend on whether calibration functions are used instead of calibration factors.
- Agency specification of jurisdiction-specific SPFs instead of the default SPFs from the HSM.
- Agency specification requiring the use of the empirical Bayes (EB) method with the CPM.
- Agency specification directing the use of a CMF associated with a treatment (e.g., increase lane width) in a previously developed CPM where the base condition is recognized but the CPM was developed using sites with a significantly different distribution of treatment characteristics (e.g., lane width variance) than is represented at the sites being evaluated.
- Agency specification directing the use of a CMF associated with a novel treatment in a previously developed CPM where the base condition of "treatment not present" is inherently satisfied since the treatment is new. The effect of a novel treatment may overlap that of the treatments associated with the CMFs in the CPM and, as a result, the CPM may overestimate the effectiveness of the novel treatment. The CMF will also increase the uncertainty of the estimate obtained from the CPM (by an amount that is proportional to the uncertainty associated with the CMF value).
- Agency specification of calibration factors or functions that update CPMs for changes that affect safety over time.

Examples of application-related factors include the following:

- Error and/or uncertainty in the input values. For example, the error and uncertainty of Annual Average Daily Traffic (AADT) may be very different on low-volume roads than on high-volume roads, because roads with higher volumes are usually counted more often.
- Use of CMFs that are inconsistent with the base conditions of the HSM.
- Application beyond the range of an input variable. For example, this could include applying the CPM to roads with an AADT higher than the maximum value documented in the HSM. There are also parameters (e.g., curve radius, grade) used to estimate the CMFs for which the HSM does not provide a range.
- Application of the CPMs for rare crash types such as fatal crashes. Toward Zero Deaths is a national vision for the United States.
Vision Zero has been adopted by several states and cities throughout the United States. A Road to Zero coalition of more than 1,500 members (and growing continuously) has been formed and is managed by the National Safety Council. Many states have goals to substantially reduce fatal and serious injury crashes using a combination of education, enforcement, engineering, emergency services, and other initiatives. The CPMs in the HSM can be used to predict the effect of design alternatives on fatal crashes, but the reliability of these estimates needs to be communicated because many of the CPMs in the HSM were estimated based on limited samples of fatal crashes.
- Application to sites with characteristics that are not represented by the CPM. For example, using the CPMs for six-lane roads although they were developed for four-lane roads.

**Factors That Influence Reliability**

Table 1 and Table 2 show some of the factors that influence reliability (mentioned earlier in the introduction), along with an assessment of the effect of each factor on the bias, variance, and repeatability of the predicted value from a CPM, and the source for further information (including the appropriate chapter in this report). For each factor listed in a table, the effect it typically has on a reliability measure is indicated in the corresponding row and column. The overall reliability of a CPM estimate can vary widely, depending on the number of factors present and the degree to which each factor influences reliability. In some instances, the factors can offset each other (or combine) to improve the overall reliability of the estimate.

The effect of bias and variance can be mathematically combined to compute the mean square error of an estimate (iTRANS, 2006). The equation for this calculation is:

Equation 1

$$\text{mean square error} = \text{variance} + \text{bias}^2$$

The uncertainty of the estimate from a CPM can be described in terms of: (1) the variance of the predicted mean of all similar sites, and (2) the variance of the predicted mean for a new site.

In Table 1, the "add EB Method" factor represents the use of the empirical Bayes method and the predicted value from a CPM to produce a more reliable estimate of the expected average crash frequency (AASHTO, 2010).
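Returning to Equation 1, a short numerical sketch may help show how bias and variance trade off in the mean square error. The function name and all numbers below are hypothetical, chosen only for illustration; they are not results from this report. The example assumes a case like the "add local calibration" row of Table 1, where bias falls while variance rises.

```python
import math


def mean_square_error(variance: float, bias: float) -> float:
    """Equation 1: mean square error = variance + bias^2."""
    return variance + bias ** 2


# Hypothetical values; bias in crashes/year, variance in (crashes/year)^2.
# Uncalibrated CPM: smaller variance but larger bias.
uncalibrated = mean_square_error(variance=0.40, bias=1.00)  # 0.40 + 1.00 = 1.40
# Locally calibrated CPM: bias is reduced, variance grows somewhat.
calibrated = mean_square_error(variance=0.55, bias=0.20)    # 0.55 + 0.04 = 0.59

print(f"uncalibrated: MSE={uncalibrated:.2f}, RMSE={math.sqrt(uncalibrated):.2f}")
print(f"calibrated:   MSE={calibrated:.2f}, RMSE={math.sqrt(calibrated):.2f}")
```

With these illustrative inputs, the added variance is more than offset by the reduction in bias, so the overall mean square error decreases. Different inputs could reverse that conclusion.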
The "add CMFs from other sources" factor represents the case where the analyst desires to mathematically combine the CPM and one or more CMFs (e.g., multiply the original CPM estimate by the new CMFs) for new treatments that are not included in the HSM. In general, if the treatment corresponding to the CMF is consistent with the base conditions for the SPF in the CPM, then the use of this CMF will not bias the estimate, but it is likely to increase the variance (Lord, 2008). Lord (2008) assumed that the CMFs are independent. However, recently completed work in NCHRP Project 17-63 (Carter et al., forthcoming) has developed procedures for estimating the prediction from a CPM by accounting for the correlation between the different CMFs that are part of the CPM. Further discussion about CMF variability can be found in Hauer et al. (2012). The recent book by Hauer (2015) also includes a chapter on accuracy, which is a component of the reliability of CPMs.

Instead of calibrating the CPMs from the HSM, practitioners may choose to estimate "jurisdiction-specific SPFs" to replace the base SPF from the HSM. This approach is discussed in Srinivasan and Bauer (2013). If proper statistical approaches are used, this approach is expected to be more reliable with regard to bias and variance.

Instead of a CMF, a crash modification function (CMFunction) can be used. With a CMFunction, it is recognized that the effect of a treatment may vary depending on the characteristics of the site where it is applied. In general, the CMFunction is expected to improve reliability compared to a CMF.

The "add local calibration" factor represents the case where a local calibration factor is included with the CPM. The inclusion of this factor is the most direct means of removing the bias associated with applying a CPM to a new site of interest.
However, the calibration user guide developed by Bahar and Hauer (2014) indicates that the calibration factor is likely to be associated with some uncertainty, which, in turn, will increase the uncertainty (i.e., variance) of the estimate.

Finally, the "update CPM to correct for changes over time" factor represents the case where road safety features, driver behavior, and vehicle conditions change over time to the extent that the CPM estimates become less reliable over the same time period. In recognition of this issue, the HSM recommends the recalibration of CPMs every two to three years (AASHTO, 2010). Wood et al. (2013) showed that updating the calibration factor periodically does improve model reliability and provides a cost-effective alternative to developing new models.

Application-related factors are listed in Table 2. As suggested in the last column, the effect of these factors on reliability can be minimized by improving the guidelines that describe the correct input data and application of the CPM.

Table 1. Model-Related Factors Influencing the Reliability of an Estimated Value Using a CPM.

| Factor | Effect on Bias | Effect on Variance | Effect on Repeatability | Chapter and Supplemental Resources |
|---|---|---|---|---|
| Published CPM (not locally calibrated) | Baseline (likely present when uncalibrated CPM applied to local site) | Baseline (based on sample size used to develop CPM) | Baseline (may be highly repeatable if CPM is well documented and requires readily available data) | Partially addressed in Bahar and Hauer (2014) |
| Add EB Method | More reliable | More reliable | No effect | Partially addressed in Hauer (1997) |
| Add CMFs from other sources (consistent with base conditions) | No effect; possibly less reliable | Less reliable | No effect if CMFs are well documented. Less reliable if CMFs are not well documented | Partially addressed in Chapter 3 of this Guide; partially addressed in NCHRP Project 17-63 (Carter et al., forthcoming) |
| Add local calibration by developing a calibration factor | More reliable | Less reliable | No effect | Partially addressed in Bahar and Hauer (2014) |
| Use jurisdiction-specific base condition SPFs | More reliable if estimated with appropriate statistical procedures | More reliable if estimated with appropriate statistical procedures | Less reliable since different analysts may develop jurisdiction-specific SPFs with different jurisdiction data samples | Partially addressed in Srinivasan and Bauer (2013); partially being addressed in ongoing NCHRP Project 17-93 (Updating Safety Performance Functions for Data-Driven Safety Analysis) |
| Use CMFunctions instead of CMFs | More reliable | More reliable | More reliable | Partially addressed in NCHRP Project 17-63 (Carter et al., forthcoming) |
| Update CPM to correct for changes over time | More reliable | More reliable | No effect | Partially addressed in Wood et al. (2013); partially being addressed in ongoing NCHRP Project 17-93 (Updating Safety Performance Functions for Data-Driven Safety Analysis) |
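The "add CMFs from other sources" row of Table 1 notes that multiplying a CPM estimate by an additional CMF tends to increase variance. A minimal sketch of why, assuming the CPM estimate and the CMF are statistically independent (the same simplification the text attributes to Lord, 2008), uses the exact variance of a product of two independent random variables. The function name and the numbers are hypothetical, and the correlation-aware procedures from NCHRP Project 17-63 are not represented here.

```python
def product_mean_and_variance(mean_x, var_x, mean_y, var_y):
    """Mean and variance of X*Y for independent X and Y.

    Var(XY) = Var(X)Var(Y) + Var(X)E[Y]^2 + Var(Y)E[X]^2
    ASSUMPTION: X and Y are independent.
    """
    mean = mean_x * mean_y
    var = var_x * var_y + var_x * mean_y ** 2 + var_y * mean_x ** 2
    return mean, var


# Hypothetical inputs (not from the report).
n_cpm, var_cpm = 5.0, 0.25   # CPM estimate (crashes/year) and its variance
cmf, se_cmf = 0.90, 0.05     # external CMF and its standard error

n_adj, var_adj = product_mean_and_variance(n_cpm, var_cpm, cmf, se_cmf ** 2)

# Coefficient of variation before and after applying the external CMF.
cv_before = var_cpm ** 0.5 / n_cpm
cv_after = var_adj ** 0.5 / n_adj
print(f"adjusted estimate = {n_adj:.2f}, variance = {var_adj:.4f}")
print(f"CV before = {cv_before:.3f}, CV after = {cv_after:.3f}")
```

In this illustration the relative uncertainty of the adjusted estimate is larger than that of the original CPM estimate, and it grows with the standard error of the added CMF.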

Table 2. Application-Related Factors Influencing the Reliability of an Estimated Value Using a CPM.

| Factor | Effect on Bias | Effect on Variance | Effect on Repeatability | Chapter and Supplemental Resources |
|---|---|---|---|---|
| Error in estimated input values | Less reliable | No effect | Less reliable if error is due to poor description of input value (units, source, etc.) | Chapter 4 of this report |
| Uncertainty in estimated input values | Less reliable | Less reliable | Less reliable if error is due to poor description of required precision of input | Topic for future research |
| Use of CMFs that are inconsistent with SPF base conditions | Less reliable | Less reliable | Less reliable if CMFs are not well documented | Chapter 3 of this report |
| Relative impact of a CPM variable | Less reliable for more influential variables not included | Less reliable for more influential variables not included | No effect if variable has relatively low impact on CPM. Less reliable for more influential variables not included | Chapter 5 of this report |
| Omitted variables in CPM | Less reliable | Less reliable | Less reliable if omitted variable is prominent in application sites | Chapter 5 of this report |
| Missing application data | Less reliable | Less reliable | Less reliable for more influential variables and depending on similarity between application and model estimation sites | Chapter 5 of this report |
| Estimating CPMs for rare crash types | Less reliable | Less reliable | Less reliable because different methods may be used to estimate the predicted values | Chapter 6 of this report |
| Application exceeds the range of an input variable | Less reliable | Less reliable | Less reliable if error is due to poor description of range associated with each input variable | Partly addressed in Chapter 7 of this report |
| Application site has characteristics that are not represented by CPM | Less reliable | Less reliable | Less reliable if error is due to poor description of conditions to which CPM applies | Partly addressed in Chapter 8 of this report |

**Research Objectives and Approach**

The objectives of this study are to:

- Develop guidance for the quantification of the reliability of crash prediction models (including crash modification factors, crash modification functions, and safety performance functions) for practitioner use;
- Develop guidance for user interpretation of model reliability; and
- Develop guidance for the application of crash prediction models accounting for, but not limited to, assumptions, data ranges, and intended and unintended uses.

This was a two-phase effort. Phase I included a kickoff call, a review and assimilation of the literature and the state of the art, an analysis of relevant resource data to identify gaps, development of a work plan, development of annotated outlines of the guidance document and the communications plan, preparation of an interim report, and a face-to-face interim meeting. Phase II involved the development of the guidance document that accompanies this conduct of research report, development of the communications plan, and preparation of the final documents and reports.

Following a kickoff call with the panel early in this study, the project team decided to conduct a survey of practitioners to obtain insight into their priorities and concerns, and used the results to identify the research issues to be addressed in Phase II (see Chapter 2 for a summary of the survey results). The project team conducted a limited literature review at the beginning of the project, but most of the literature review was specific to the research issues addressed in the plans for Phase II.
**Scope of This Report and Audience**

Based on a survey conducted at the beginning of NCHRP Project 17-78, this report focuses on the following model-related and application-related factors:

- Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on Mismatch between CMFs and SPF Base Conditions (Chapter 3)
- Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on Error in Estimated Input Values (Chapter 4)
- Procedures for Quantifying the Reliability of Crash Prediction Model Estimates with a Focus on How the Number of Variables in CPM Affects Reliability (Chapter 5)
- Reliability Associated with Using a CPM to Estimate Frequency of Rare Crash Types and Severities: Overview of the Problem with Possible Solutions (Chapter 6)
- Reliability Associated with Predicting Outside the Range of Independent Variables: Problem Description and Procedure for Practitioners (Chapter 7)
- Reliability Associated with Predictions Using CPMs Estimated for Other Facility Types: Problem Illustration with Possible Solutions (Chapter 8)

Chapter 2 provides a summary of the results of a survey conducted in Phase I to obtain insight into practitioners' priorities and concerns. Chapters 3 through 8 each describe a problem along with procedures for quantifying the reliability or possible solutions to improve the reliability. The discussion of some topics is more involved than others; for example, the treatment in Chapter 3 of the mismatch between CMFs and SPF base conditions is very detailed. Each chapter is self-contained, so readers can consult the chapter that suits their need.

The primary audience for this report is practitioners and researchers. These readers should be familiar with Chapter 3 (Fundamentals) and the Part C chapters of the AASHTO Highway Safety Manual, 1st Edition (AASHTO, 2010).
As much as possible, this report avoids complicated jargon. However, some of the procedures described in this document are very detailed and may require a high level of statistical understanding. Accompanying this report is NCHRP Research Report 983: Reliability of Crash Prediction Models: A Guide for Quantifying and Improving the Reliability of Model Results. The rest of this chapter is a note about goodness-of-fit measures, since they are used extensively in this report.

**A Note about Goodness-of-Fit Measures**

Here is a brief note about the different goodness-of-fit (GOF) measures used in this report and in other reports and papers.

For linear regression models, the $R^2$ statistic, the proportion of the total variation in the dependent variable explained by the model, is the GOF measure typically reported. For Generalized Linear Models (GLMs), a directly analogous $R^2$ statistic is not available. However, other GOF measures have been suggested to assess how well a GLM fits the data. This document uses many such GOF statistics in the different chapters:

*Increased root mean square error.* The increased root mean square error measure is an indication of the overall reliability of the CPM estimate, i.e., the estimated average crash frequency. It is a measure of the uncertainty added to the estimate due to the use of a biased value of the overdispersion parameter, predicted crash frequency, or both.

$$\sigma_{e,I} = \left( \Delta_V + e^2 \right)^{0.5}$$

with

$$\Delta_V = \left| k_{reported} N_p^2 - k_{p,true} N_{p,true}^2 \right|$$

$$e = N_p - N_{p,true}$$

where

- $\sigma_{e,I}$ = increased root mean square error;
- $e$ = error in predicted crash frequency;
- $\Delta_V$ = absolute difference of the change in variance of the predicted value;
- $k_{reported}$ = reported overdispersion parameter for CPM;
- $k_{p,true}$ = predicted true overdispersion parameter;
- $N_p$ = predicted crash frequency from CPM, crashes/year; and
- $N_{p,true}$ = predicted true crash frequency, crashes/year.

*Coefficient of variation for the increased root mean square error.* The increased root mean square error can be normalized by dividing it by the predicted true crash frequency. This division produces a coefficient of variation that facilitates the relative comparison of alternative CPMs and alternative applications of a given CPM.

$$CV_I = \frac{\sigma_{e,I}}{N_{p,true}}$$

where

- $CV_I$ = coefficient of variation for the increased root mean square error;
- $\sigma_{e,I}$ = increased root mean square error; and
- $N_{p,true}$ = predicted true crash frequency, crashes/year.

A coefficient value of 0 indicates that there is no bias or additional uncertainty in the predicted crash frequency obtained from the CPM.
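The two measures defined above can be sketched in a few lines. This is an illustration under a stated assumption, not the report's implementation: it assumes the increased root mean square error is the square root of the absolute change in variance (taken here as the difference between k·N² terms) plus the squared error, in the spirit of Equation 1. The function names and input values are hypothetical.

```python
import math


def increased_rmse(k_reported, n_p, k_true, n_true):
    """Increased root mean square error (assumed functional form).

    ASSUMPTION: change in variance = |k_reported*N_p^2 - k_true*N_true^2|,
    and error e = N_p - N_true. This form is an interpretation of the
    variable definitions in the text.
    """
    delta_v = abs(k_reported * n_p ** 2 - k_true * n_true ** 2)
    e = n_p - n_true
    return math.sqrt(delta_v + e ** 2)


def cv_increased(k_reported, n_p, k_true, n_true):
    """Coefficient of variation: increased RMSE divided by the true frequency."""
    return increased_rmse(k_reported, n_p, k_true, n_true) / n_true


# Hypothetical case: the CPM overpredicts by 10% with an unbiased k.
sigma = increased_rmse(k_reported=0.5, n_p=2.2, k_true=0.5, n_true=2.0)
cv = cv_increased(k_reported=0.5, n_p=2.2, k_true=0.5, n_true=2.0)
print(f"increased RMSE = {sigma:.3f} crashes/year, CV_I = {cv:.3f}")

# With no bias in either input, both measures are zero.
print(cv_increased(k_reported=0.5, n_p=2.0, k_true=0.5, n_true=2.0))
```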
As the coefficient of variation value increases, the prediction becomes less reliable because the bias has increased, the uncertainty has increased, or both. Values in excess of 0.20 are considered unreliable for most applications.

*Percent bias.* The percent bias measure provides an indication of the relative error in the prediction. This measure is expressed as a percentage because the magnitude of the error is often correlated with the true (i.e., unbiased) value of the estimate. This characteristic facilitates the relative comparison of alternative CPMs and alternative applications of a given CPM. The percent bias measure is computed using the following equation:

Equation 2

$$Bias = 100 \times \frac{N_p - N_{p,true}}{N_{p,true}}$$

where

- $Bias$ = percent bias in reported value;
- $N_p$ = predicted crash frequency from CPM, crashes/yr; and
- $N_{p,true}$ = predicted true crash frequency, crashes/yr.

*Root mean squared difference (RMSD).* The root mean squared difference is a measure of the variability of the difference between the CPM prediction with error in input values and the CPM prediction with the estimated values.

Equation 3

$$\text{Root Mean Squared Difference} = \sqrt{\frac{\sum_{i=1}^{n} \left( N_{error,i} - N_{est,i} \right)^2}{n}}$$

where

- $N_{error,i}$ = CPM prediction for site $i$ with error in input values;
- $N_{est,i}$ = CPM prediction for site $i$ with the estimated values; and
- $n$ = number of sites.

Note that if expressing on a per year basis, divide the RMSD by the number of years of data.

*Mean absolute difference.* The mean absolute difference is a measure of the average absolute difference between the CPM prediction with error in input values and the CPM prediction with the estimated values.

Equation 4

$$\text{Mean Absolute Difference} = \frac{\sum_{i=1}^{n} \left| N_{error,i} - N_{est,i} \right|}{n}$$

Note that if expressing on a per year basis, divide the mean absolute difference by the number of years of data.

*Spearman's correlation coefficient.* The Spearman's correlation coefficient is used to compare network screening rankings using the CPM with measurement error to the ranking using the CPM with the original estimated values. Note that the same sites must be represented on both ranked lists.

Equation 5

$$\text{Spearman's correlation coefficient} = 1 - \frac{6 \sum_{i=1}^{n} \left( Rank_{error,i} - Rank_{est,i} \right)^2}{n \left( n^2 - 1 \right)}$$

where

- $Rank_{error}$ = rank number using CPM with measurement error;
- $Rank_{est}$ = rank number using CPM with estimated value(s); and
- $n$ = number of sites in ranked list.

*Extreme value.* The extreme value is a measure of the magnitude of a high value of the absolute difference. It is recommended to use the 85th percentile value, although any percentile value desired by the analyst may be selected. The calculation is based on an assumed Gamma distribution of the values of the absolute difference and uses the method of moments to determine the alpha and theta parameters of the Gamma distribution using the following equations:

Equation 6

$$\alpha = \frac{(\text{Mean Absolute Difference})^2}{(\text{Root Mean Squared Difference})^2}$$

Equation 7

$$\theta = \frac{(\text{Root Mean Squared Difference})^2}{\text{Mean Absolute Difference}}$$

The value of the absolute difference at the desired percentile level can be determined using online calculators such as https://homepage.divms.uiowa.edu/~mbognar/applets/gamma.html, or using statistical textbooks. For example, using the 85th percentile and the estimated Gamma distribution parameters, the analyst estimates the value of the absolute difference that 85% of sites would be expected not to exceed or, conversely, the value that 15% of sites may exceed.

*Modified $R^2$.* Fridstrom et al. (1995) introduced a modified $R^2$ value. This GOF measure subtracts the normal amount of random variation that would be expected if the SPF were 100 percent accurate. Even with a perfect SPF, some variation in observed crash counts would be observed due to the random nature of crashes. As a result, the amount of systematic variation explained by the SPF is measured. Larger values indicate a better fit to the data when comparing two or more competing SPFs. Values greater than 1.0 indicate that the SPF is over-fit and some of the expected random variation is incorrectly explained as systematic variation.

Equation 8

$$R^2 = \frac{\sum_i \left( y_i - \bar{y} \right)^2 - \sum_i \hat{\mu}_i^2}{\sum_i \left( y_i - \bar{y} \right)^2 - \sum_i \hat{y}_i}$$

where

- $y_i$ = observed counts;
- $\hat{y}_i$ = predicted values from the SPF;
- $\bar{y}$ = sample average; and
- $\hat{\mu}_i = y_i - \hat{y}_i$.

*Mean absolute deviation (MAD).* The mean absolute deviation is a measure of the average value of the absolute difference between observed and predicted crashes.

Equation 9

$$MAD = \frac{\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|}{n}$$

where

- $\hat{y}_i$ = predicted values from the SPF;
- $y_i$ = observed counts; and
- $n$ = validation data sample size.

*Overdispersion parameter.* The dispersion parameter, $f(k)$, in the negative binomial distribution is reported from the variance equation expressed as follows:

Equation 10

$$Var\{m\} = E\{m\} + f(k) \times E\{m\}^2$$

or

Equation 11

$$f(k) = \frac{Var\{m\} - E\{m\}}{E\{m\}^2}$$

where

- $f(k)$ = estimate of the dispersion parameter;
- $Var\{m\}$ = estimated variance of mean crash rate; and
- $E\{m\}$ = estimated mean crash rate.

The estimated variance increases as dispersion increases, and consequently the standard errors of estimates are inflated. As a result, all else being equal, an SPF with lower dispersion parameter estimates (i.e., smaller values of $f(k)$) is preferred to an SPF with more dispersion. Note that $f(k)$ can be specified as a constant or as a function of site characteristics. When $f(k)$ is a constant or a constant per length, it may be used to easily compare multiple CPMs.

*Coefficient of variation of a calibration factor.* The CV of the calibration factor is the standard deviation of the calibration factor divided by the estimate of the calibration factor, as shown in the following equation.

Equation 12

$$CV = \frac{\sqrt{V(C)}}{C}$$

where

- $CV$ = coefficient of variation of the calibration factor;
- $C$ = estimate of the calibration factor; and
- $V(C)$ = variance of the calibration factor, which can be calculated as follows:

Equation 13

$$V(C) = \frac{\sum_i \left( y_i + k \, y_i^2 \right)}{\left( \sum_i \hat{y}_i \right)^2}$$

where

- $y_i$ = observed counts;
- $\hat{y}_i$ = uncalibrated predicted values from the SPF; and
- $k$ = dispersion parameter (recalibrated, as shown above).

*Measures related to cumulative residual plots.* Cumulative residual plots (CURE plots) provide a means of assessing the GOF of a model (Hauer and Bamfo, 1997). Unlike most of the other GOF statistics, which look at overall model fit, the CURE plot is primarily aimed at assessing the adequacy of the functional form of a specific independent variable (conditional on other variables being in the model). CURE plots have become more popular recently, especially with the development of the following tool by FHWA: The Calibrator: An SPF Calibration and Assessment Tool: User Guide (Lyon et al., 2016). Here is a brief overview of CURE plots based on the discussion in Srinivasan and Bauer (2013), Hauer (2004), and Hauer and Bamfo (1997).
Hauer (2004) recommends the use of CURE plots to obtain further insight into whether the selected functional form is reasonable. Following is a discussion of CURE plots and how they can be used. Suppose the goal is to develop a CURE plot to determine whether the functional form used for AADT is appropriate. The first step is to create a data file that includes, for each observation (i.e., segment or intersection), the AADT value and the residual from the SPF (the residual is the difference between the observed number of crashes and the number of crashes predicted by the SPF). This file is then sorted in increasing order of AADT, and the cumulative residuals are computed for each observation. The plot of the cumulative residual versus AADT is called a CURE plot. Figure 1 is an example of a CURE plot from Hauer and Bamfo (1997).
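The steps just described (compute residuals, sort by AADT, accumulate) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cure_points` and the three-site sample data are hypothetical and do not come from the cited sources.

```python
def cure_points(aadt, observed, predicted):
    """Return (AADT, cumulative residual) pairs sorted by increasing AADT.

    The residual for a site is observed crashes minus SPF-predicted crashes.
    """
    # Pair each site's AADT with its residual, then sort by AADT.
    rows = sorted(
        ((a, y - p) for a, y, p in zip(aadt, observed, predicted)),
        key=lambda row: row[0],
    )
    points = []
    running_total = 0.0
    for a, residual in rows:
        running_total += residual  # cumulative residual up to this AADT
        points.append((a, running_total))
    return points


# Hypothetical data for three sites (not from the report).
sample_aadt = [5000, 1200, 3000]
sample_observed = [4, 1, 2]
sample_predicted = [3.5, 1.5, 2.2]

sample_points = cure_points(sample_aadt, sample_observed, sample_predicted)
for a, cum in sample_points:
    print(a, round(cum, 3))
```

Plotting the second element of each pair against the first gives the CURE plot for AADT; the same function can be reused for any other explanatory variable by passing that variable in place of AADT.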

Figure 1. Example CURE Plot (Source: Hauer and Bamfo, 1997)

The data in the CURE plot are expected to oscillate about 0. If the cumulative residuals are consistently drifting upward within a particular range of AADT, then it would imply that there were more observed than predicted crashes by the SPF. On the other hand, if the cumulative residuals are drifting downward within a particular range of AADT, then it would imply that there were fewer observed than predicted crashes by the SPF. Hauer and Bamfo (1997) also derived confidence limits for the plot (±2σ) beyond which the plot should go only rarely. Figure 2 from Hauer and Bamfo (1997) shows the CURE plot from Figure 1 but with its confidence limits. This is an example of an acceptable CURE plot where the plot stays well within the confidence limits.
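The text above does not give the formula for the confidence limits. The sketch below assumes the form commonly attributed to Hauer and Bamfo (1997), in which the variance of the cumulative residual at the n-th sorted site is estimated from the running sum of squared residuals S(n) as S(n)(1 - S(n)/S(N)), with N the total number of sites, and the limits are plus or minus two times its square root. That formula, the function names, and the sample data are assumptions for illustration and should be checked against the original paper before use.

```python
import math


def cure_with_limits(aadt, observed, predicted):
    """Return (AADT, cumulative residual, limit) tuples sorted by AADT.

    `limit` is the assumed 2-sigma half-width: 2 * sqrt(S(n) * (1 - S(n)/S(N))),
    where S(n) is the running sum of squared residuals.
    """
    rows = sorted(zip(aadt, observed, predicted), key=lambda row: row[0])
    residuals = [y - p for _, y, p in rows]
    total_sq = sum(r * r for r in residuals)  # S(N)

    points = []
    cum = 0.0
    cum_sq = 0.0
    for (a, _, _), r in zip(rows, residuals):
        cum += r
        cum_sq += r * r  # S(n)
        if total_sq > 0:
            variance = cum_sq * (1.0 - cum_sq / total_sq)
        else:
            variance = 0.0
        # Guard against tiny negative values caused by rounding.
        limit = 2.0 * math.sqrt(max(variance, 0.0))
        points.append((a, cum, limit))
    return points


def share_outside(points):
    """Fraction of points whose cumulative residual lies outside the limits."""
    return sum(1 for _, cum, limit in points if abs(cum) > limit) / len(points)


# Hypothetical data for three sites (not from the report).
limit_points = cure_with_limits(
    [5000, 1200, 3000], [4, 1, 2], [3.5, 1.5, 2.2]
)
print(limit_points)
print(share_outside(limit_points))
```

Under this assumed form the limits collapse to zero at the last site, so the final point counts as outside the limits whenever the residuals do not sum exactly to zero; an analyst would typically judge the plot over the interior of the AADT range.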

Figure 2. CURE Plot with Confidence Limits (Source: Hauer and Bamfo, 1997)

In the context of CURE plots, it is important to recognize that the plot reflects not only the functional form of the particular explanatory variable, but also whether other relevant explanatory factors have been included in the model in an appropriate form. Further discussion of this issue can be found in Srinivasan and Bauer (2013).

In addition to these measures, other GOFs have been used. For example, the Scaled Deviance and the Pearson chi-square are two traditional GOF statistics calculated in GLMs (Wood, 2002). In addition, many pseudo $R^2$ statistics have been used by statisticians and other safety researchers. Examples include the pseudo $R^2$ based on the log-likelihood ratio, weighted (variance stabilizing) residuals, Freeman-Tukey transformation residuals, and the overdispersion parameter of the SPF. Further discussion of pseudo $R^2$ used in the highway safety field can be found in Fridstrom et al. (1995), Wood (2002), and Miaou (1996).

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are examples of measures that provide an assessment of the relative quality of specific models for a given dataset. AIC and BIC penalize models based on the number of parameters (coefficient estimates). In other words, they deal with the trade-off between the GOF of the model and the complexity of the model. Therefore, AIC and BIC are statistics typically used as model selection criteria rather than for GOF assessment. Further discussion of AIC and BIC can be found in Burnham and Anderson (2004).

More research is still needed to better understand the limitations of the different GOF measures and the implications of using one GOF instead of another. More discussion about this topic in the context of fitting and updating CPMs can be found in Connors et al. (2013).
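As a rough illustration of the formula-based measures in this section, the Python sketch below evaluates the modified $R^2$ (Equation 8), the MAD (Equation 9), the CV of the calibration factor (Equations 12 and 13), and AIC and BIC. Several things here are assumptions rather than content from the source: the function names and sample numbers are hypothetical; the calibration factor C is taken to be total observed crashes divided by total uncalibrated predicted crashes, which this section does not state; and AIC = 2p - 2 ln L and BIC = p ln n - 2 ln L are the standard textbook definitions, with p the number of parameters, n the sample size, and ln L the maximized log-likelihood.

```python
import math


def modified_r2(observed, predicted):
    """Equation 8: systematic variation explained, net of random variation."""
    mean_obs = sum(observed) / len(observed)
    total = sum((y - mean_obs) ** 2 for y in observed)
    residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # The sum of predicted values stands in for the expected random variation.
    return (total - residual) / (total - sum(predicted))


def mean_absolute_deviation(observed, predicted):
    """Equation 9: average absolute difference between predicted and observed."""
    return sum(abs(p - y) for y, p in zip(observed, predicted)) / len(observed)


def calibration_cv(observed, predicted, k):
    """Equations 12 and 13, with C assumed to be sum(observed) / sum(predicted)."""
    c = sum(observed) / sum(predicted)  # assumed definition of C
    var_c = sum(y + k * y ** 2 for y in observed) / sum(predicted) ** 2
    return math.sqrt(var_c) / c


def aic(log_likelihood, n_params):
    """Standard AIC; lower values are preferred among competing models."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC; penalizes parameters more heavily as n_obs grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical validation data for six sites (not from the report).
obs = [2, 0, 5, 3, 1, 7]
pred = [1.5, 0.8, 4.2, 3.1, 1.6, 5.8]

print("modified R2:", modified_r2(obs, pred))
print("MAD:", mean_absolute_deviation(obs, pred))
print("CV of calibration factor:", calibration_cv(obs, pred, k=0.5))
print("AIC:", aic(-100.0, 3), "BIC:", bic(-100.0, 3, 50))
```

With so few hypothetical sites the modified $R^2$ comes out above 1.0, which the text above interprets as a sign of over-fitting; real validation samples are much larger.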