National Academies Press: OpenBook
Suggested Citation:"7 Combining Multiple Data Sources to Measure Crime." National Academies of Sciences, Engineering, and Medicine. 2023. Toward a 21st Century National Data Infrastructure: Enhancing Survey Programs by Using Multiple Data Sources. Washington, DC: The National Academies Press. doi: 10.17226/26804.

7

Combining Multiple Data Sources to Measure Crime

The United States has two major collections of national statistics about crime. The first is the Uniform Crime Reporting (UCR) Program administered by the Federal Bureau of Investigation (FBI), which compiles data from law enforcement agencies. The second is the National Crime Victimization Survey (NCVS), an annual sample survey administered by the Bureau of Justice Statistics (BJS) that asks persons aged 12 and older in a randomly selected set of households about their experiences with crime. James and Council (2008); two National Academies of Sciences, Engineering, and Medicine reports (NASEM, 2016a, 2018); Lohr (2019); and Morgan and Thompson (2022) provided overviews of these two data collections.

Table 7-1 displays the types of crime included in the UCR and the NCVS. Because UCR statistics are compiled from law enforcement agency submissions, they include only crimes that are reported to the police and thus undercount the total numbers of crimes against residents. The NCVS asks persons about their victimization experiences, and thus has information about criminal victimizations that are not reported to the police as well as those that are reported. Because the NCVS is a household survey, though, it does not measure crimes against businesses and organizations (which are measured in the UCR if known to the police); it also does not measure crimes against persons living in institutions (such as nursing homes or prisons), persons experiencing homelessness, and children under age 12. Furthermore, NCVS respondents may forget or fail to mention some of the victimizations they experienced. And some crimes, such as corporate or environmental crime, are not measured in either data source.


TABLE 7-1 Crimes Included in the Uniform Crime Reporting (UCR) Program and the National Crime Victimization Survey (NCVS)

In UCR, represented in NCVS
  Crimes against noninstitutionalized U.S. residents aged 12+ that are reported to police and are measured by both data sources (see note a)

In UCR, not represented in NCVS
  Crime types measured in UCR but not measured in NCVS (e.g., homicide)
  Crimes reported to police and measured in UCR against:
    • Businesses and organizations
    • People out of scope for NCVS (e.g., children aged 0–11, people in institutions, people experiencing homelessness, non-U.S. residents)
    • NCVS respondents who do not report the crime on the survey

Not in UCR, represented in NCVS
  Crimes against noninstitutionalized U.S. residents aged 12+ that are measured in NCVS and not reported to police (or not measured in UCR)

Not in UCR, not represented in NCVS
  Crime types measured in neither UCR nor NCVS (e.g., fraud against government agencies, environmental crimes)
  Unrecognized crimes (e.g., romance scams in which victims and law enforcement are unaware a crime has been committed)
  Crimes not reported to police against:
    • Businesses and organizations
    • People out of scope for NCVS
    • NCVS respondents who do not report the crime on the survey

SOURCE: Panel generated.

NOTE: This table gives the classification under the idealized UCR in which all law enforcement agencies submit data to the FBI.

a Hanson (2021) and https://ucr.fbi.gov/nibrs-in-brief listed the 52 Group A Offenses and 10 Group B Offenses that are measured by NIBRS. The major categories for the Group A Offenses are animal cruelty, arson, assault offenses, bribery, burglary, counterfeiting/forgery, destruction/vandalism of property, drug/narcotic offenses, embezzlement, extortion/blackmail, fraud offenses, gambling offenses, homicide offenses, human trafficking offenses, kidnapping/abduction, larceny/theft offenses, motor vehicle theft, pornography/obscene material, prostitution offenses, robbery, sex offenses, stolen property offenses, and weapon law violations. The NCVS measures rape and sexual assault, robbery, aggravated and simple assault, burglary, theft, motor vehicle theft, and pocket-picking/purse-snatching. In addition, NCVS supplements have measured fraud, identity theft, and school crime for various years. A new NCVS questionnaire is expected to be phased in during 2024 (Truman & Brotsos, 2022).
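
The two-way classification in Table 7-1 can be read as a small decision rule. The sketch below is only an illustration of that reading, under the table's idealized assumption that every law enforcement agency submits data; the `Crime` record and its field names are hypothetical and do not correspond to any UCR or NCVS data element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crime:
    """Hypothetical record; field names are illustrative only."""
    ucr_crime_type: bool        # the crime type is one the UCR measures
    known_to_police: bool       # the crime was reported to police
    ncvs_crime_type: bool       # the crime type is one the NCVS measures
    victim_in_ncvs_scope: bool  # noninstitutionalized U.S. resident aged 12+
    mentioned_on_survey: bool   # the victim would report it on the survey


def table_7_1_cell(crime: Crime) -> tuple[str, str]:
    """Return the (row, column) of Table 7-1 that a crime falls into."""
    in_ucr = crime.ucr_crime_type and crime.known_to_police
    in_ncvs = (crime.ncvs_crime_type
               and crime.victim_in_ncvs_scope
               and crime.mentioned_on_survey)
    row = "In UCR" if in_ucr else "Not in UCR"
    column = "Represented in NCVS" if in_ncvs else "Not represented in NCVS"
    return row, column


# A homicide known to police: measured in the UCR but not in the NCVS.
homicide = Crime(True, True, False, True, False)
# A robbery of an adult household resident that was never reported to police.
unreported_robbery = Crime(True, False, True, True, True)

print(table_7_1_cell(homicide))
print(table_7_1_cell(unreported_robbery))
```

The point of the sketch is that a crime drops out of a source for one of several distinct reasons (crime type, reporting, population scope, or survey recall), which is why the two sources cannot simply be added together.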


There is therefore great potential for using multiple data sources to enhance statistics about crime. The NCVS, as a household probability survey, can be blended with other sources using methods such as data linkage and small area estimation. While some challenges are similar to challenges in the areas of income and health statistics (for example, coverage of only the noninstitutionalized population), others are unique to the NCVS because of the relative rarity of crime and the sensitive nature of the information collected.

The UCR Program presents a distinct set of data-combination challenges. It is, in essence, a cooperation between states and the federal government of the same type as the National Vital Statistics System (see Chapter 4). Individual law enforcement agencies submit data on crimes within their jurisdictions to state UCR programs, which, after data processing, forward them to the FBI.1 The UCR is intended to be a census of incidents known to the more than 18,000 law enforcement agencies in the United States. It thus has the potential to produce detailed information about crime for small geographic and demographic subpopulations, but challenges include missing data (some agencies do not submit data or submit data for only part of the year), ensuring the quality of the information collected and reported, and aligning the information with data from other sources.

The National Academies provided a comprehensive review of and a vision for the future of crime statistics, with an emphasis on crime classification and measurement (NASEM, 2016a, 2018). Box 7-1 reproduces some conclusions and recommendations from those reports. This chapter examines developments that have occurred since those National Academies reports and, in particular, the potential of combining data sources for measuring crime, as discussed in the workshop session Measuring Crime in the 21st Century.

Sections 7.1 and 7.2 describe the UCR Program and the NCVS, respectively, and identify challenges that might be addressed through use of multiple data sources. Section 7.3 outlines other national data collections about crime, and Section 7.4 explores the potential for obtaining more timely crime statistics directly from police department databases and websites. Sections 7.5 and 7.6 describe some initiatives for combining data sources to study crime, and Section 7.7 discusses possible future directions for using multiple sources to improve the quality and equity of data about crime.

___________________

1 Some law enforcement agencies submit their data directly to the FBI instead of through state programs.


7.1 THE UNIFORM CRIME REPORTING PROGRAM

The UCR Program combines data voluntarily submitted by states and law enforcement agencies and has thus relied on multiple data sources since its inception in 1930. From 1930 to 2020, UCR statistics were based on data collected in Summary Reporting System (SRS) format. Law enforcement agencies reporting to the SRS provided monthly counts of “Part I offenses” (homicide, rape, robbery, aggravated assault, burglary, larceny/theft, and motor vehicle theft) occurring in their jurisdictions and the number cleared by arrest.2 In the 1960s, the FBI expanded the UCR data collection to encompass more detailed information about homicides, including age, sex, and race of the victim and offenders, circumstances of the crime, weapons used, and relationship between victim and offender. The Supplementary Homicide Report data were incident based: instead of aggregate counts, details were collected separately for each homicide incident. For other crimes, however, the SRS collected only summary statistics.

___________________

2 See FBI (2013, p. 110) for the “Return A” form used to collect data for the SRS. In addition to crime counts, the form also collected monthly breakdowns of these statistics by characteristics such as weapons used. The original seven Part I offenses were defined by the Uniform Crime Reporting Committee, which established the form of the UCR (International Association of Chiefs of Police, 1929). Arson was added as a Part I offense in 1979, and two human trafficking offenses were added in 2013. Volumes of Crime in the United States through 2020, however, reported statistics only for the seven original Part I crimes (the only major modification of the original definitions occurred when the definition of rape was revised in 2013). For Part II offenses such as simple (non-aggravated) assaults, fraud, vandalism, and drug abuse violations, only arrest data were collected.


In 2016, the FBI announced that the SRS would be retired on January 1, 2021, and that all UCR data submissions from 2021 onward would be through the National Incident-Based Reporting System (NIBRS). NIBRS began in the late 1980s with the goal of obtaining more detailed information about crime, and it extends incident-based data collection to all of the 52 types of offenses measured. Through 2020, the FBI encouraged NIBRS submission but allowed law enforcement agencies and state UCR programs to report UCR data in either SRS or NIBRS format; NIBRS data were converted to SRS format to compute national statistics for the amount of Part I crime. Beginning in 2021, SRS data were no longer accepted.

Table 7-2 outlines the differences between the SRS and NIBRS data collections and describes the types of detailed information measured in NIBRS. The additional variables measured in NIBRS allow its crime information for demographic groups to be combined or contrasted with information from other sources, enabling “the analysis of data in proper geographic, demographic, sociological, and economic context” as called for in Conclusion 2.1 of a National Academies 2018 report on Modernizing Crime Statistics (NASEM, 2018; see Box 7-1). Jarvis (2015) and Hanson (2021) described other advantages of NIBRS data relative to SRS data.
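
One difference listed in Table 7-2 is how offenses within an incident are counted: SRS statistics included only the most serious offense for each incident, whereas NIBRS counts up to 10 offenses per incident. The sketch below contrasts the two counting rules on made-up incidents. It is a simplification, not the FBI's procedure: the seriousness ranking is assumed to follow the order in which the Part I offenses are listed in this chapter, the deduplication of offense types within an incident is an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Assumed ranking, most serious first (order of the Part I list in the text).
HIERARCHY = ["homicide", "rape", "robbery", "aggravated assault",
             "burglary", "larceny/theft", "motor vehicle theft"]
RANK = {offense: position for position, offense in enumerate(HIERARCHY)}
MAX_NIBRS_OFFENSES = 10  # per Table 7-2


def srs_style_counts(incidents):
    """Count only the most serious Part I offense in each incident."""
    counts = Counter()
    for offenses in incidents:
        part_one = [o for o in offenses if o in RANK]
        if part_one:
            counts[min(part_one, key=RANK.__getitem__)] += 1
    return counts


def nibrs_style_counts(incidents):
    """Count every distinct offense type in each incident, up to the cap."""
    counts = Counter()
    for offenses in incidents:
        distinct = list(dict.fromkeys(offenses))  # drop repeats, keep order
        counts.update(distinct[:MAX_NIBRS_OFFENSES])
    return counts


# Made-up incidents, each a list of offense types recorded together.
incidents = [
    ["robbery", "aggravated assault"],
    ["burglary", "larceny/theft"],
    ["motor vehicle theft"],
]

print(srs_style_counts(incidents))
print(nibrs_style_counts(incidents))
```

Under the first rule the aggravated assault and the larceny/theft disappear from the totals; under the second they are retained, which is one reason counts from the two formats are not directly comparable without conversion.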

Lauritsen (2022a), Smith (2022), Martinez (2022), and Veitenheimer (2022) emphasized the advance represented by a national dataset of police-reported crime containing information beyond mere crime counts. The additional details about incidents allow tabulations by victim and offender demographics, relationship between victim and offender, and other characteristics described in Table 7-2. Veitenheimer (2022) commented that the transition to NIBRS allows more focus “on the contextual information and the characteristics of certain crimes like homicide or robbery or burglary or fraud or drug cases or sexual assaults, to look at things like victim and offender demographics, who’s committing crimes against who, what are the circumstances behind some of the aggravated assaults that have occurred, what’s the makeup of drug seizures that have happened or are happening for drug crime, what sorts of weapons or how often have weapons been involved in the commission of crimes.”

As an example of the potential for exploring contextual information and characteristics of crime, Smith et al. (2018) highlighted how NIBRS data could be used to better understand sexual violence. Martin (2021) created an interactive report on sexual assault statistics for 15 states (those certified to report all of their 2019 crime data in NIBRS format): users can click on a state to view statistics about the percentage of violent victimizations that involved a sexual assault; incident characteristics such as victim-offender relationships, demographics of victims and offenders, and weapons used; and rates of sexual assault by location type, time of day, and victim age, race, ethnicity, and sex. None of these statistics could have been calculated from the SRS data format used through 2020.

TABLE 7-2 Uniform Crime Reports Estimates under the Summary Reporting System and the National Incident-Based Reporting System

Crimes reported
  • Summary Reporting System (2020): Numbers and rates for homicide, rape, robbery, aggravated assault, burglary, larceny/theft, and motor vehicle theft. Statistics included only the most serious offense for each incident.
  • National Incident-Based Reporting System (2021): Estimated numbers and rates for 52 types of crime. Up to 10 offenses are counted for each incident.

Law enforcement agency coverage
  • Summary Reporting System (2020): 15,875 of the 18,623 agencies submitted data for at least three months of the year.
  • National Incident-Based Reporting System (2021): 11,333 of the 18,806 agencies submitted data for at least three months of the year.

Population coverage
  • Summary Reporting System (2020): 97 percent of U.S. population lived in an area served by at least one reporting agency.
  • National Incident-Based Reporting System (2021): 65 percent of U.S. population lived in an area served by at least one reporting agency.

Geographic detail
  • Summary Reporting System (2020): Estimates reported for all states and metropolitan statistical areas, plus tabulations within states by type of community (metropolitan statistical area, other cities, rural) and counties.
  • National Incident-Based Reporting System (2021): Estimates for some states, metropolitan areas, and types of agencies suppressed because of insufficient data.

Incident detail
  • Summary Reporting System (2020): Incident-level details on victim and offender demographics and relationships, weapon used, and crime circumstances collected for homicides; counts alone for other crimes.
  • National Incident-Based Reporting System (2021): Includes date; time; location type (e.g., restaurant, home, cyberspace); age, race, ethnicity, sex of victims and offenders; relationships between victims and offenders (e.g., spouse, sibling, neighbor, employer, stranger); injuries; property loss; weapons; alcohol or drug involvement; bias motivation (if offense was recorded as being motivated by bias against race, religion, disability, ethnicity, or sexual orientation); clearance and arrest information.

Estimation procedure
  • Summary Reporting System (2020): Data reviewed for quality and outlier detection. Imputation of crime counts for agencies with fewer than 12 months of data.
  • National Incident-Based Reporting System (2021): National crime statistics estimated using statistical models. Details of the procedure had not yet been published as of October 2022.

Standard errors
  • Summary Reporting System (2020): Not reported because of high population coverage.
  • National Incident-Based Reporting System (2021): Estimates accompanied by confidence intervals generated through estimation procedure.

SOURCE: Panel generated with information from Addington (2019); U.S. Bureau of Justice Statistics (2021a); FBI (2021, 2022b); Barnett-Ryan and Berzofsky (2022); Berzofsky et al. (2022); and National Archive of Criminal Justice Data (2022).
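
The agency and population coverage figures reported in Table 7-2 can be put side by side with a few lines of arithmetic. The sketch below uses only the counts and percentages given in the table; the helper name `percent` is introduced here for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Agencies submitting at least three months of data (Table 7-2).
agency_share_2020 = percent(15_875, 18_623)  # SRS format, 2020
agency_share_2021 = percent(11_333, 18_806)  # NIBRS format, 2021

# Population coverage as reported in Table 7-2 (percent of U.S. population).
population_share_2020 = 97
population_share_2021 = 65

print(f"2020 SRS:   {agency_share_2020:.0f}% of agencies, "
      f"{population_share_2020}% of population")
print(f"2021 NIBRS: {agency_share_2021:.0f}% of agencies, "
      f"{population_share_2021}% of population")
```

The share of agencies and the share of population differ in each year because agencies serve populations of very different sizes, so which agencies participate matters as much as how many.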

The NIBRS data present a tremendous opportunity for enhancing understanding about crime. But they also present new challenges for calculating and interpreting estimates of crime numbers and rates. The panel identified four main challenges:

  1. Missing Data. In 2020, the last year in which SRS-format data were accepted, UCR estimates of crime were based on data contributed by 15,875 of the 18,623 law enforcement agencies in the country (85%). The agencies submitting at least three months of data served areas representing about 97 percent of the U.S. population—close to full coverage.3

    For the 2021 UCR estimates, the first to be computed entirely from NIBRS data, agency participation and population coverage were much lower. The 2021 UCR estimates were based on data submitted by 11,333 of 18,806 law enforcement agencies (60%), serving areas that represent about 65 percent of the U.S. population (FBI, 2022b, p. 19; Berzofsky et al., 2022, p. 4). In some states, all law enforcement agencies submitted 2021 NIBRS data; in other states, including California, Pennsylvania, and Florida, fewer than 3 percent of agencies submitted NIBRS data. Only 62 of the 87 agencies serving populations of 250,000 or more participated in the 2021 NIBRS; nonparticipating agencies included the New York City and Los Angeles Police Departments (FBI, 2022a; BJS, 2022b).

    Thus, unlike the SRS data used for the UCR in 2020 and previous years, NIBRS has large amounts of missing data. The agencies submitting NIBRS data in 2021 were essentially a convenience sample from the population of law enforcement agencies.4 The FBI estimated national crime statistics for 2021 using data from the participating agencies (FBI, 2022b). Barnett-Ryan and Berzofsky (2022) and Berzofsky et al. (2022) gave nontechnical summaries of the estimation procedures used for 2021, which attempt to compensate for nonparticipating agencies’ missing data through statistical modeling, weighting, and imputation.

  2. Uncertainty Estimates for National and State Crime Statistics. One major change for 2021 UCR statistics is the addition of confidence intervals for the estimates. Through 2020, the FBI reported counts alone for the UCR, with no standard errors or other measures of uncertainty; Berzofsky et al. (2022, p. 6) stated that confidence intervals for SRS data were unnecessary because of the high coverage rate of the SRS.5 For 2020, for example, the FBI reported 21,570 homicides and 921,505 aggravated assaults with no measures of uncertainty.6 In 2021, however, because of the high number of nonreporting agencies for NIBRS, national estimates were accompanied by confidence intervals and state-level estimates were produced only for states with high NIBRS participation. The FBI estimated that there were 22,900 homicides in 2021, with 95 percent confidence interval [21,300, 24,600] (FBI, 2022b, p. 3).

    Because the agencies participating in NIBRS were not from a probability sample, the validity of these estimates and confidence intervals relies on how well the statistical model accounts for missing data. The panel could not evaluate the quality of the 2021 UCR estimates or the NIBRS estimation procedures because technical documentation, with details of the modeling process, had not yet been published as of October 2022. Piquero et al. (2022) stated that technical documentation will be released at a later date.

___________________

3 Source: FBI Crime Data Explorer, https://cde.ucr.cjis.gov and Barnett-Ryan and Berzofsky (2022). Only agencies with at least three months of submitted data were used for estimating crime statistics. Note that statistics in the Crime Data Explorer are revised as new data come in and may differ slightly from statistics in this report, which were retrieved between April and October 2022. The coverage statistics for the SRS include agencies that reported data for only part of the year; their data for missing months were imputed. NASEM (2016a, p. 47) commented that because of these partial reporters, the actual coverage of the SRS has been lower than claimed.

4 Note that, in 2013, the FBI and BJS attempted to obtain a representative sample of agencies submitting data to NIBRS (U.S. Bureau of Justice Statistics, 2021a). They selected a probability sample of 400 law enforcement agencies from the set of agencies that had submitted SRS data in 2011. The goal was to expedite the sampled agencies’ transition to NIBRS; if all 400 sampled agencies submitted NIBRS data, then the FBI would be able to calculate unbiased estimates of crime: the agencies that were already submitting NIBRS data in 2011 would represent themselves, and the probability sample of 400 agencies would represent the rest of the population. As of August 2021, however, only 210 of the 400 agencies in the probability sample were certified for the NIBRS program (https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/NCS-X_Sample_Agencies.pdf). Because of the high and nonrandom nonresponse, the 210 agencies do not form a representative sample of the agencies that were reporting data in SRS format in 2011.

5 With almost all of the population served by an agency that reported SRS data for at least part of the year, the missing data would have little effect on estimates. If confidence intervals had been produced for SRS data, accounting for the imputation used for agencies reporting fewer than 12 months of data, they would likely have been narrow.

6 Table 1, Crime in the United States, 2020, https://cde.ucr.cjis.gov (data reported on September 30, 2021).

  3. Estimating Year-to-Year Changes in Crime. The switch from SRS to NIBRS created a discontinuity in the time series for crime statistics from the UCR. The method used to estimate year-to-year changes in crime from 1930 to 2020, comparing annual crime totals from the SRS data, could not be used to estimate the change in crime from 2020 to 2021 because the SRS was retired (Rosenfeld, 2022).

    Changes in crime from 2020 to 2021 were estimated by applying the NIBRS estimation method to the agencies that submitted NIBRS data in 2020 (FBI, 2022b). But 2020 had more missing NIBRS data than 2021 and, consequently, estimates from both years have low precision.7 The FBI (2022b, p. 1) stated that most changes in crime between 2020 and 2021 were not statistically significantly different from zero, but “that the main contributor to that finding is the large amount of variation—both random and systematic—that is measured in the 2020 data due to low coverage of participating agencies.” The estimates need higher precision to detect changes in crime.

  4. Evaluating Measurement Quality. With the increased amount of information collected in NIBRS comes increased potential for missing data and measurement error on each item collected. Lynch (2018, p. 449) commented on the contrast in studies of data quality for survey and administrative data: “This contrast is apparent in the discussion of crime trends in the NCVS and the UCR where the survey had extensive discussions of sampling and measurement error, but almost nothing was said about the UCR except perhaps for missing data.” Characteristics such as category of offense, race, ethnicity, relationship, circumstances, and possible bias motivation for the crime are provided by the law enforcement agency (in contrast to surveys, in which data elements are self-reported), and more research is needed on the uniformity and accuracy of these measures.
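
To see why wide confidence intervals make year-to-year changes hard to detect, the homicide estimates quoted in this section can be compared with a rough normal-approximation check: 22,000 with 95 percent interval [21,000, 23,000] for 2020 (from NIBRS data) and 22,900 with interval [21,300, 24,600] for 2021. This is a back-of-the-envelope sketch and not the FBI's method, whose technical details were unpublished as of October 2022. It assumes the intervals are approximately symmetric normal intervals and that the two years' estimates are independent; the second assumption is unlikely to hold exactly, because many of the same agencies and the same model contribute to both years.

```python
import math

Z_95 = 1.96  # normal critical value for a 95 percent interval


def se_from_ci(lower: float, upper: float) -> float:
    """Back out an approximate standard error from a 95 percent interval."""
    return (upper - lower) / (2 * Z_95)


# Homicide estimates and intervals as quoted in the text (FBI, 2022b, p. 3).
estimate_2020, interval_2020 = 22_000, (21_000, 23_000)
estimate_2021, interval_2021 = 22_900, (21_300, 24_600)

se_2020 = se_from_ci(*interval_2020)
se_2021 = se_from_ci(*interval_2021)

# Assumption: independent estimates, so the variances add.
se_difference = math.hypot(se_2020, se_2021)
z = (estimate_2021 - estimate_2020) / se_difference
significant = abs(z) > Z_95

print(f"change = {estimate_2021 - estimate_2020}, z = {z:.2f}, "
      f"significant at 5% level: {significant}")
```

Under these assumptions the estimated increase of 900 homicides is less than one standard error of the difference, which is consistent with the FBI's statement that most changes between 2020 and 2021 were not statistically significantly different from zero.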

The panel anticipates that some of these challenges will be resolved as more law enforcement agencies submit NIBRS data. The 2021 crime estimates for states with high NIBRS coverage, such as South Dakota and Oregon, have narrow confidence intervals because little estimation is needed: almost all law enforcement agencies in those states submitted data. It will be important to continue monitoring NIBRS coverage in the coming years.

___________________

7 In 2020, only 9,993 law enforcement agencies, covering 53 percent of the population, submitted data in NIBRS format (https://cde.ucr.cjis.gov). The FBI (2022b, p. 3) estimated from the 2020 NIBRS data that there were 22,000 homicides in 2020, with 95 percent confidence interval [21,000, 23,000].

In addition, alternative data sources could be used to address some of the challenges in producing accurate crime estimates. Possible paths include using data obtained directly from police departments to impute crime statistics for nonparticipating law enforcement agencies or to study measurement quality (see Sections 7.4 and 7.6).

CONCLUSION 7-1: The National Incident-Based Reporting System (NIBRS) provides details about each crime incident that were not available in the previous Summary Reporting System of the Uniform Crime Reports. NIBRS represents an important step in the production of detailed and accurate crime statistics. But the transition to NIBRS is still under way and variations in measurement and data reporting across jurisdictions need further study.

7.2 NATIONAL CRIME VICTIMIZATION SURVEY

The NCVS began in 1972 as an effort to measure crimes that were not reported to the police, and to learn about the details of crimes from victims’ perspectives.8 When the NCVS began, the UCR was collecting only counts of offenses known to the police, without details on characteristics of victims and offenders (except for homicide). The NCVS was designed to meet four primary objectives:

  1. To develop detailed information about the victims and consequences of crime;
  2. To estimate the numbers and types of crimes not reported to the police;
  3. To provide uniform measures of selected types of crimes; and
  4. To permit comparisons over time and types of areas (BJS, 2021b, p. 4).

The NCVS was launched in part to provide an independent measure of the crime statistics in the UCR, and the set of crimes measured by NCVS parallels, but does not exactly coincide with, those collected in the SRS of the UCR (NASEM, 2016a, p. 51). However, NCVS definitions of crimes differ from the definitions in both the SRS and NIBRS, and measurement of characteristics of victims, offenders, and incidents also differs between the sources (see Table 7-1).

___________________

8 The name “National Crime Victimization Survey” was adopted in 1993. Before that, it was called the “National Crime Survey.” In this report, the acronym NCVS is used to refer to the entire data collection.

The NCVS asks respondents about all victimization incidents in the measured categories that they have experienced, whether reported to the police or not. Follow-up questions ask details about victimizations such as relationship to offender, location of the incident, injuries, financial losses, and whether the victimization was reported to police. From the outset, NCVS data have shown that the UCR Program fails to capture substantial amounts of crime: in 2019 and 2020, only about 40 percent of violent crimes and one-third of property crimes were reported to the police (Morgan & Thompson, 2021).9

In recent years, annual estimates from the NCVS have been based on about 240,000 interviews of persons aged 12 and older from a probability sample of households.10 From its inception through the early 1990s, the NCVS regularly achieved household response rates exceeding 95 percent. In the first two decades of the 21st century, however, response rates have dropped (see Figure 2-1). In 2020, about two-thirds of households eligible for the survey completed interviews. There was additional nonresponse because only 83 percent of the persons within responding households agreed to participate in an interview, giving an overall person-level response rate of 56 percent (Morgan & Thompson, 2021; Peterson & Will, 2021). Moreover, population subgroups have differing within-household response rates, with lower rates for persons under age 25 (the age group most likely to be victimized by violent crime), for males, and for Hispanic persons and those of a race other than Black or White, raising questions about possible nonresponse bias, particularly for groups underrepresented among the respondents.
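
The overall person-level response rate quoted above is the product of the household-level and within-household rates. The short sketch below reproduces that arithmetic. The text gives the household rate only as "about two-thirds"; the value 0.67 used here is an assumption chosen to be consistent with the reported 56 percent.

```python
# Assumption: "about two-thirds" of eligible households is taken as 0.67.
household_response_rate = 0.67
# Share of persons within responding households who were interviewed.
within_household_response_rate = 0.83

overall_person_response_rate = (household_response_rate
                                * within_household_response_rate)

print(f"overall person-level response rate: {overall_person_response_rate:.0%}")
```

Because the two rates multiply, a decline at either stage lowers the overall rate, and subgroups with lower within-household rates end up further underrepresented among respondents.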

The NCVS sample is designed to give precise estimates of victimization for the nation as a whole. However, the sample size is not large enough to produce reliable estimates for all subpopulations of interest, and often several years of data must be accumulated to compute estimates of violent and property crime for regions or states, as in Lauritsen (2022b). In 2016, the sample size was increased to allow production of state-level estimates for the most populous states using three years of aggregated data (BJS, 2022a).

___________________

9 Violent crimes include rape and sexual assault, robbery, aggravated assault, and simple (non-aggravated) assault. Property crimes include burglary, motor vehicle theft, and theft.

10 https://bjs.ojp.gov/programs/ncvs

State-level estimates have also been produced using small area models similar to those described for the Small Area Income and Poverty Estimates program (see Box 2-2). These models predicted a state’s NCVS crime counts from state-level crime counts from the UCR Program, along with information from the American Community Survey and the decennial census (Fay, 2021; Liao, Zimmer, & Berzofsky, 2021). Small area estimates represent one promising arena for combining data sources to obtain more detailed information about crime (see Section 7.5 for other potential methods for combining statistics at the area level to enhance the value of crime statistics).
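
The general idea behind such small area models can be illustrated with a composite (shrinkage) estimator of the Fay-Herriot type, which blends a state's direct survey estimate with a model-based prediction. This is a generic sketch and not the specific models of Fay (2021) or Liao, Zimmer, and Berzofsky (2021), whose details are not given here. The function name, the variance inputs, and all numbers are hypothetical; in practice the model prediction would come from a regression on auxiliary data such as UCR counts and American Community Survey variables, and the variances would be estimated.

```python
def composite_estimate(direct: float, direct_variance: float,
                       model_prediction: float, model_variance: float):
    """Blend a direct survey estimate with a model prediction.

    The weight on the direct estimate grows as its sampling variance
    shrinks relative to the model (between-area) variance.
    """
    weight_on_direct = model_variance / (model_variance + direct_variance)
    blended = (weight_on_direct * direct
               + (1.0 - weight_on_direct) * model_prediction)
    return blended, weight_on_direct


# Hypothetical victimization rates per 1,000 persons for one state.
direct_rate = 25.0          # noisy direct NCVS estimate (small sample)
direct_rate_variance = 12.0
predicted_rate = 20.0       # prediction from auxiliary data
model_rate_variance = 4.0

estimate, weight = composite_estimate(direct_rate, direct_rate_variance,
                                      predicted_rate, model_rate_variance)
print(f"weight on direct estimate = {weight:.2f}, "
      f"composite rate = {estimate:.2f}")
```

With these made-up inputs the direct estimate receives a weight of 0.25, so the composite rate is pulled most of the way toward the model prediction; a state with a large sample, and hence a small sampling variance, would instead stay close to its direct estimate.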

7.3 OTHER NATIONAL DATA SOURCES ABOUT CRIME

The UCR and NCVS are valuable sources of data, but both have limitations in population and crime coverage (see Table 7-1). Estimates from each data source are typically published in September or October of the following year, too late for timely study of the impacts of events such as the COVID-19 pandemic or changes in laws. This section and Section 7.4 discuss other data sources that might be used, either singly or in combination with the NCVS and UCR, to enhance knowledge about crime.

National Vital Statistics System

Information about homicide is also available through the National Vital Statistics System (NVSS; see Section 4.3 and Regoeczi & Banks, 2014). Definitions of homicide differ slightly from those in the UCR, but homicide rates measured through the NVSS have closely tracked those from the UCR over time. The NVSS data allow calculation of disaggregated statistics by the victim’s state of residence, age, race, ethnicity, sex, marital status, educational attainment, cause of death, and injuries sustained. As with the UCR and the NCVS, there is a lag in publishing mortality statistics (although the data-modernization system under way is expected to speed production of statistics). Unlike the 2021 NIBRS data, the NVSS has nearly full coverage of deaths.

Other Surveys about Crime

The NCVS is the only national survey that collects data on a wide range of crimes, but other surveys ask about specific types of crime. The National Intimate Partner and Sexual Violence Survey, for example, asks about past-year and lifetime experiences with sexual violence and about the health consequences of that violence (Black et al., 2011). Individual localities also conduct their own surveys about crime and perceptions of safety.

Crime measurement in surveys is sensitive to the questions asked, how the survey is administered (by an interviewer or self-administered; in person or by telephone, mail, or internet; and who else is present when the respondent answers the survey questions), and nonresponse (Cook et al., 2011; Catalano, 2016). That sensitivity shows up in differences in crime estimates between the NCVS and other surveys, and can make it challenging to directly combine or compare estimates. For example, in 2011, the estimated number of rape and sexual assault victimizations from the NCVS was 244,190; the estimated number of rape victims from the National Intimate Partner and Sexual Violence Survey was 1,929,000, nearly eight times larger (U.S. Government Accountability Office, 2016, p. 25).11

Data Collected by Regulatory Agencies

Several government agencies, including the U.S. Postal Inspection Service, Federal Trade Commission, Securities and Exchange Commission, and Environmental Protection Agency, collect data about specific types of crime as part of their regulatory missions. The Federal Trade Commission (2022), for example, publishes annual national and state statistics on fraud and identity theft reports it has received. Like the UCR, these data collections include only crimes that come to the attention of the agencies, which may be a small fraction of the total crimes committed.

Data from Crowdsourcing and Webscraping

Chapter 2 describes The Guardian’s database of killings by police, assembled from reader reports and by webscraping of news stories. Other data sources include the Global Terrorism Database, a database of terrorist incidents from around the world from 1970 onward, and the Gun Violence Archive.12 Lauritsen (2022a) commented that “crowdsourced data can be useful for these types of incidents because most terrorists seek publicity, and because many shootings become known to media.” Crowdsourced and webscraped data typically require validation from external data sources but may be available earlier than data from the UCR and NCVS.

___________________

11 Krebs (2014) ascribed a large part of the difference to the questions asked about sexual assault. The National Intimate Partner and Sexual Violence Survey asked nine (for women) and 11 (for men) behaviorally specific questions describing acts that are considered to be rape, and detailed specific examples of nonconsent. The NCVS asked two general screening questions about being forced or coerced to engage in unwanted sexual activity. Lohr (2019) discussed other potential reasons for the differences in the two sets of survey estimates, including survey context, response rate, and mode of data collection. The redesigned NCVS questionnaire, to be phased in during 2024, contains revised, behaviorally specific questions about rape and sexual assault (Truman & Brotsos, 2022).

12 The Global Terrorism Database (https://start.umd.edu/gtd) contains information about the date and location of the incident, weapons used, the number of casualties, and the identity of the perpetrator (when known). The Gun Violence Archive (https://www.gunviolencearchive.org) was established in Fall 2013 with the “goal to provide a database of incidents of gun violence and gun crime. To that end we utilize automated queries, manual research through over 7,500 sources from local and state police, media, data aggregates, government and other sources daily. Each incident is verified by both initial researchers and secondary validation processes.”

7.4 POLICE DEPARTMENT DATA

Individual law enforcement agencies are another potential source for data on crimes around the country. Most large police departments post crime statistics on their websites; some update the statistics daily. Websites of the New York City, Los Angeles, and Chicago Police Departments, for example, provide up-to-date maps and statistics on crimes in city neighborhoods.

Noting that many police departments post crime data online, Planty et al. (2018) explored the possibility of using data scraped from police department websites to supplement UCR data. They observed a number of challenges in doing so, however—primarily, a lack of uniformity in how statistics are compiled and presented. Police departments may use crime definitions for their websites that are different from those used by the UCR, with various data formats and frequencies of reporting. Police departments also have various practices for updating information. For example, a crime might be recorded as an aggravated assault on a police department website, but if the victim later dies from the injuries, UCR protocols call for the crime to be classified as a homicide. Data on a police department’s website might not be updated after the initial posting.

Despite the lack of uniformity, data from comprehensive police department websites have the advantages of timeliness and granularity, which allow for real-time analyses of crime trends. Although the UCR Program releases some updates during the year, final statistics about crime in a particular year are typically not available until September of the following year. Websites and databases may also have more precise information about locations and characteristics of crimes. But not all police departments report data online, and the set of police departments with comprehensive websites is not representative of the nation as a whole, especially given the resources required to keep such data up to date and accurate. While such sources do not provide national coverage of crimes, they may contain sufficient temporal and geographic granularity to provide richer data for specific jurisdictions.

The time lag for producing statistics reduces the usefulness of UCR data for studying effects of crime-prevention programs or external events. Many researchers were concerned about the effect of the COVID-19 pandemic on crime rates. In the absence of timely UCR data, they used crime data published online from selected cities to compare estimates of homicides and certain other violent crimes within those cities before and during the pandemic (e.g., see Ashby, 2020; Boman & Gallupe, 2020; Kim & Phillips, 2021; Rosenfeld & Lopez, 2022; and Schleimer et al., 2022). As the authors acknowledged, however, these datasets are not nationally representative and crime definitions and measurements (particularly for crimes such as intimate partner violence) vary across cities.

Other researchers have created databases of offenses from publicly available crime data. Ashby (2019) described the Crime Open Database, containing 16 million offenses from 10 U.S. cities over an 11-year period. Data were obtained from open-access crime databases of each city and converted to consistent formats for geolocation. Offense lists from each city were manually mapped to NIBRS categories.
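Mapping each city's offense list to a common set of categories amounts to maintaining a crosswalk table. The sketch below illustrates the idea only: the city names, local offense labels, and the mapping itself are hypothetical and are not taken from the Crime Open Database. Labels with no mapping are returned as None so they can be flagged for manual review.

```python
# Hypothetical crosswalk: (city, local offense label) -> NIBRS-style category.
# Cities, labels, and mappings are illustrative assumptions only.
CROSSWALK = {
    ("City A", "AGG ASSAULT - KNIFE"): "Aggravated Assault",
    ("City A", "AUTO THEFT"): "Motor Vehicle Theft",
    ("City B", "Assault with a deadly weapon"): "Aggravated Assault",
    ("City B", "Stolen vehicle"): "Motor Vehicle Theft",
}


def harmonize(records):
    """Attach a common category to each record; unmapped labels get None."""
    harmonized_records = []
    for rec in records:
        category = CROSSWALK.get((rec["city"], rec["local_offense"]))
        harmonized_records.append({**rec, "category": category})
    return harmonized_records


records = [
    {"city": "City A", "local_offense": "AUTO THEFT"},
    {"city": "City B", "local_offense": "Stolen vehicle"},
    {"city": "City B", "local_offense": "Loitering"},  # not in the crosswalk
]
harmonized = harmonize(records)
needs_review = [r for r in harmonized if r["category"] is None]
```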

Although data from police departments do not necessarily use the same crime categories and protocols as NIBRS, they may be useful for improving the accuracy of NIBRS estimates. Berzofsky et al. (2022) did not mention using crime data external to the NIBRS system in the 2021 estimation methods, but including data obtained from nonreporting law enforcement agencies’ websites (when available, and particularly for larger agencies) in an imputation model may be helpful for improving accuracy of national NIBRS statistics.

7.5 COMBINING STATISTICS COMPUTED FROM MULTIPLE DATA SOURCES

Social science researchers have linked statistics calculated from UCR program data or from local police departments to area-level statistics calculated from the census or ACS to investigate factors associated with higher crime rates. For example, Stucky, Payton, and Ottensmann (2016) linked geocoded UCR data from the Indianapolis police department with publicly available tract-level income statistics from the ACS, finding that lower levels of income, and higher within-tract income inequality, were associated with higher UCR violent and property crime rates. Martinez (2022) discussed the wealth of information available from local police departments and medical examiner offices for studying crime, which includes narratives that provide context for many of the crimes. Martinez (2015) linked data in homicide files from police investigative units and medical examiners’ offices in five cities (Chicago, El Paso, Houston, Miami, and San Diego) with publicly available tract-level information from the decennial census to study homicide in Latino and immigrant communities.
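Area-level linkage of this kind typically aggregates geocoded incidents to a geographic unit and joins the counts to published statistics for that unit. The following is a minimal sketch under stated assumptions: the tract identifiers, populations, and incomes are invented for illustration, and the field names are hypothetical rather than those used in any of the studies cited above.

```python
from collections import Counter


def tract_crime_rates(incidents, tract_stats):
    """Count incidents per tract and join them to tract-level statistics."""
    counts = Counter(inc["tract"] for inc in incidents)
    rows = []
    for tract, stats in sorted(tract_stats.items()):
        rows.append({
            "tract": tract,
            "median_income": stats["median_income"],
            # Tracts with no incidents get a count of zero from the Counter.
            "crimes_per_1000": 1000 * counts[tract] / stats["population"],
        })
    return rows


# Hypothetical geocoded incidents and tract-level statistics.
incidents = [{"tract": "T1"}, {"tract": "T1"}, {"tract": "T2"}]
tract_stats = {
    "T1": {"population": 4000, "median_income": 38000},
    "T2": {"population": 5000, "median_income": 72000},
}
rows = tract_crime_rates(incidents, tract_stats)
```

No individual records are linked here; only counts and published area statistics are combined, which is why this approach raises fewer confidentiality concerns than record linkage.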

The increased sample size for the NCVS, and the additional contextual and demographic information for NIBRS, provide new opportunities for combining subpopulation statistics from these sources with other data. For example, the NCVS small area estimation models (see Section 7.3) used crime counts from the pre-NIBRS-conversion UCR Program and state-level statistics from other administrative and survey data sources to predict the amount of crime at the state level. The new information collected in NIBRS could be used to improve predictions from these models and possibly allow small area estimates to be calculated for smaller geographic areas and for demographic subpopulations.
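One common form of area-level small area model blends each state's direct survey estimate with a regression prediction based on auxiliary data, giving more weight to the survey estimate when its sampling variance is small. The sketch below illustrates that general idea and is not the model used in the cited NCVS work: it assumes a single predictor (a UCR-based rate), fits the regression by ordinary least squares, and treats the model variance as known. All numbers are invented.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def composite_estimates(direct, sampling_var, ucr_rate, model_var):
    """Shrink each direct estimate toward its regression prediction."""
    intercept, slope = fit_line(ucr_rate, direct)
    estimates = []
    for d, v, x in zip(direct, sampling_var, ucr_rate):
        synthetic = intercept + slope * x
        # Weight on the direct estimate: near 1 when sampling variance is small.
        gamma = model_var / (model_var + v)
        estimates.append(gamma * d + (1 - gamma) * synthetic)
    return estimates


# Hypothetical state-level inputs (rates per 1,000 persons).
direct = [21.0, 35.0, 18.0, 27.0]      # direct survey estimates
sampling_var = [4.0, 25.0, 9.0, 1.0]   # their sampling variances
ucr_rate = [3.1, 4.0, 2.6, 3.8]        # auxiliary police-reported rates
blended = composite_estimates(direct, sampling_var, ucr_rate, model_var=4.0)
```

Each blended value lies between the state's direct estimate and its regression prediction. A production model would estimate the model variance from the data and would usually include several predictors.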

Past comparisons of UCR and NCVS data have involved national statistics. For example, Morgan and Thompson (2021) compared the rate of crime reported to the police in the UCR and the NCVS for rape and sexual assault, robbery, aggravated assault, burglary, and motor vehicle theft. This set of crimes was measured in both systems with similar, although not identical, definitions. Demographic subpopulations could not be compared because the UCR SRS Program collected only count data; smaller geographic areas could not be compared because the NCVS sample size limited production of these estimates. When estimates for the 22 most populous states are published from the NCVS (see Section 7.2), however, NCVS state-level estimates of crimes reported to the police can be compared with state-level estimates from NIBRS. The contextual information collected by NIBRS will also allow comparison of the two sources for demographic subgroups.

There is also potential for combining NIBRS statistics calculated for small geographic areas (where there is complete reporting) with data from other sources. Fouch and Martin (2022) outlined plans for a “NIBRS Data Dashboard” that will provide context for crime by linking area-level statistics about crime with statistics from sources such as the U.S. Census Bureau’s Community Resilience Estimates.13 This Dashboard would allow researchers to study relationships between crime rates estimated from NIBRS (disaggregated by victim or offender characteristics, weapon use, and other characteristics if desired) and county-level information on characteristics such as health insurance coverage, poverty, and demographics.

Data from the NCVS and NIBRS (or other law enforcement data) could also be combined to obtain larger sample sizes of crime victims. Early discussions about the NCVS explored the idea of using a dual-frame survey (Turner, 1983), in which the NCVS sample of households would be supplemented by a probability sample of persons taken from police records. Although a dual-frame approach was not adopted for the original NCVS, it may be time to explore the idea anew, in light of the availability of detailed records from NIBRS. Chromy and Wilson (2013) discussed the potential of using multiple-frame surveys to obtain larger samples of sexual assault victims. These could also be used to explore measurement differences among data sources.
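In a dual-frame design, units that could have been sampled from either frame must be down-weighted so that they are not counted twice. A standard way to do this is a Hartley-type estimator, which multiplies the weights of overlap units by theta in one sample and by 1 - theta in the other. The sketch below is illustrative only: the weights are invented, theta is fixed at 0.5 rather than chosen to minimize variance, and it assumes that overlap membership (for example, whether a household respondent's victimization was reported to the police) can be determined for every sampled unit.

```python
def hartley_total(sample_a, sample_b, theta=0.5):
    """Estimate a population total from two overlapping frames.

    Each sample is a list of (weight, in_overlap, y) tuples, where in_overlap
    is True if the unit could also have been selected from the other frame.
    """
    total = 0.0
    for weight, in_overlap, y in sample_a:
        total += weight * y * (theta if in_overlap else 1.0)
    for weight, in_overlap, y in sample_b:
        total += weight * y * ((1.0 - theta) if in_overlap else 1.0)
    return total


# Hypothetical household-frame sample: y = 1 marks a victim.
household_sample = [
    (1000, False, 1),  # victimization not reported to police (household frame only)
    (1000, True, 1),   # victimization reported to police (also in police frame)
]
# Hypothetical police-records sample.
police_sample = [
    (500, True, 1),    # victim lives in a household covered by the survey frame
    (500, True, 1),
    (200, False, 1),   # victim outside the household frame (e.g., in an institution)
]
estimated_victims = hartley_total(household_sample, police_sample, theta=0.5)
```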

___________________

13 https://www.census.gov/programs-surveys/community-resilience-estimates.html

There are several challenges in linking crime data at the area level. Data sources may use different definitions or measurements of crime. Classification errors (i.e., what constitutes a crime and what type of crime it is) may affect data sources differently. Resolving such differences and aligning definitions may help improve the quality of crime statistics.

A second challenge in linking NIBRS or police department data at the area level involves geographic alignment of areas covered by law enforcement agencies. Many areas are served by multiple law enforcement agencies. A crime occurring on a university campus, for example, may be investigated by one or more agencies: the city police department, the state police department, the county sheriff, or the university police department. The UCR Program has protocols for avoiding duplication when two or more agencies are involved in the investigation of the same offense, but duplication may be an issue if data are obtained directly from law enforcement agencies. In addition, crime locations may be recorded differently. NIBRS and police department data count a crime incident in the state and jurisdiction where it occurred; the NCVS and NVSS count it at the victim’s residence. Discrepancies may arise for crime location when data are combined at small geographic levels.
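When records are obtained directly from several agencies serving the same area, one simple screening step is to group reports that share a date, location, and offense type and treat each group as a single incident. The sketch below illustrates that step only. It is not the UCR Program's protocol, the field names and records are hypothetical, and exact matching on these fields would miss duplicates with inconsistent addresses or offense codes.

```python
def deduplicate(reports):
    """Collapse reports of the same incident filed by different agencies."""
    merged = {}
    for rep in reports:
        # Normalize the location text so trivial formatting differences match.
        key = (rep["date"], rep["location"].strip().lower(), rep["offense"])
        incident = merged.setdefault(key, {
            "date": rep["date"],
            "location": rep["location"],
            "offense": rep["offense"],
            "agencies": [],
        })
        if rep["agency"] not in incident["agencies"]:
            incident["agencies"].append(rep["agency"])
    return list(merged.values())


# Hypothetical reports: the first two describe the same campus incident.
reports = [
    {"date": "2021-03-04", "location": "100 Campus Dr", "offense": "Robbery",
     "agency": "City PD"},
    {"date": "2021-03-04", "location": "100 campus dr ", "offense": "Robbery",
     "agency": "University PD"},
    {"date": "2021-03-05", "location": "22 Main St", "offense": "Burglary",
     "agency": "City PD"},
]
incidents = deduplicate(reports)
```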

Studying and combining statistics at the subpopulation level would allow more insights into crime and measurement of crime without linking individual records. Record linkage might be explored with data sources containing sufficient identifying information, but because of the sensitive nature of criminal victimization, it is important to prioritize confidentiality and consent issues (see Boxes 3-4 and 3-5).

7.6 LINKING INDIVIDUAL RECORDS ACROSS DATA SOURCES

There has been less linkage of survey data records with administrative records for crime than there has been for income and health statistics. There have, however, been initiatives in which records from various administrative data sources are linked to provide a more comprehensive picture of some types of crime, to study the accuracy of crime data, or to provide detailed information to law enforcement agencies. This section provides a few examples.

Linkage to Add Variables About Crime Incidents, Victims, or Offenders

The National Violent Death Reporting System, a state-based surveillance system administered by the U.S. Centers for Disease Control and Prevention, links information about persons who died by suicide or homicide from death certificates, coroner or medical examiner reports, and law enforcement reports. Some states include information from additional sources such as NIBRS reports, state-level Child Fatality Review team data, hospital data, and court records. Each data source contains different information about the decedent and the circumstances of the crime, and the linked data present a more comprehensive picture of homicide victims, with more than 600 data elements, than can be compiled from any single source (CDC, 2022).

Crosby, Mercy, and Houry (2016) outlined research made possible by the National Violent Death Reporting System that could not have been conducted using only a single source.14 The system allows researchers to identify violent deaths that occur in the same event, thereby enabling study of topics such as characteristics of homicides followed by a suicide.

The individual record linkage in the National Violent Death Reporting System is feasible because there are multiple data sources on violent deaths and records contain identifying information that can be used for linkage. For other crimes, however, information available from other sources may be more limited. Researchers may be able to link police records of aggravated assaults with hospital data, for example, but there may be little additional information for reports of fraud.

Datasets can be linked on several dimensions. Issues for linking crime data are similar to those for linking data about health, in which units vary across data sources (e.g., persons, doctors, hospitals, diagnoses, health care claims). The National Violent Death Reporting System links data related to each violent death. NIBRS and police department data could potentially be linked by location, incident, victim, or offender. The NCVS has some capacity for longitudinal linkage of persons or households through its panel survey design; survey participants could potentially be linked with other data sources such as the National Death Index.15

The focus of this chapter is on using multiple data sources to measure crime, but it is important to note that many researchers have linked data sources to study arrest and prosecution, correctional populations, and recidivism. For example, the Criminal Justice Administrative Records System links records across the criminal justice system to create a longitudinal dataset that follows individuals from arrest through discharge (Finlay, Mueller-Smith, & Papp, 2022). Other studies have explored linking BJS administrative datasets about persons in correctional institutions with other data sources (e.g., see Carson, 2015; Goerge & Wiegand, 2019; and Fernandez et al., 2022).

___________________

14 https://www.cdc.gov/violenceprevention/datasources/nvdrs/index.html lists recent publications using the linked data.

15 The NCVS conducts seven interviews, at six-month intervals, at sampled addresses. But the residents at those addresses can change during the course of the interview series.

Linkage to Study Crime Measurement or Law Enforcement Procedures

Data sources measure crime in different ways, and more research is needed on their measurement properties. Record linkage can be used to compare measurement of crime concepts across data sources, and thereby suggest improvements to measurement methods. Early NCVS research on the feasibility of measuring crime through a survey examined how accurately survey participants recalled details about victimization incidents by comparing responses to the survey with linked police reports for the incident (Lehnen & Skogan, 1981).

A more recent example of data linkage to study measurement is from Pattavina, Hirschel, and Scearbo (2013), who requested paper copies of original local police department incident reports for a sample of incidents in NIBRS involving intimate partner violence. They compared the results “obtained from coding information directly from the jurisdiction’s police reports with those from the same incidents that were submitted electronically to the FBI NIBRS data program” (p. 27). They found that while gender, location, offense, and injury variables were similar in the two sets of records, there was a large discrepancy in the substance use variable—NIBRS records reported substance use in 12 percent of the incidents, while the independent reviewers coded substance use in 26 percent of the incidents.

Wadsworth and Roberts (2008) linked data from the Supplementary Homicide Reports of the UCR with police department records, to study patterns of missing items and evaluate the accuracy of imputations for items missing from the Supplementary Homicide Reports but present in the police data. A similar approach could be used to study accuracy of data elements in NIBRS, perhaps for a probability sample of law enforcement agencies.

Record linkage can also provide information that can be used to inform law enforcement agency procedures. Veitenheimer (2022) described a project to inventory unsubmitted sexual assault kits in Wisconsin. The investigators developed “linkages between those kits that were inventoried and the sexual assault incidents that were reported or supposed to be reported” in NIBRS, and used that information to identify ways the state Department of Justice could “educate law enforcement agencies on […] reporting sexual assaults.”

Some individual police departments link multiple data sources, sometimes in conjunction with artificial intelligence algorithms, to allocate law enforcement resources or predict where crime is likely to occur. These predictive policing (sometimes called “data-driven policing”) programs use a variety of datasets that can include the police department’s internal datasets on crime complaints and arrests; city and state agency data about foreclosures, vacant buildings, building code violations, and transit ridership; U.S. Census Bureau data about neighborhood characteristics; information gathered from online sources; data from automated license plate readers and surveillance cameras; and data purchased from brokers. In predictive policing, datasets are not combined to measure the amount of crime, but rather to develop strategies to prevent or detect it. Brayne (2017) and Ferguson (2017) described predictive-policing methods as well as equity concerns that can arise from algorithmic biases of the type described in Box 3-1. The National Academies noted that predictive policing information “has already been used by police to determine on-street activity, and may ultimately prove useful in refined statistical collection as well” (NASEM, 2016a, p. 124).

7.7 IMPROVING THE QUALITY OF CRIME DATA

The previous sections give examples of data-combination activities that have been used or are anticipated for the near future. This section discusses potential longer-term projects in which combining data sources could enhance quality of data about crime.

Improve Population and Crime Coverage

Lauritsen (2022a) argued that any depiction of crime in the United States is incomplete without reference to all crime types. She stated that the UCR and NCVS both focus on “street crimes, which, we have learned from a century of criminological research, is disproportionately found in poor areas and sociologically disadvantaged communities,” but “neglect of measuring many crime types beyond those available in the UCR and NCVS has produced an incomplete and biased picture of who commits offenses and who experiences the greatest harms from violations of the law.” The National Academies proposed an alternative crime classification that encompasses not only the violent and property crimes measured in the UCR and NCVS, but also acts involving fraud, deception, and other types of crime (NASEM, 2016a, Section 5.2; see Box 7-1).

Even for the crimes within the scope of the UCR and NCVS, both data-collection programs miss some crimes and some parts of the population. The UCR Program, of course, captures only crimes that are known to the police and forwarded to the FBI. The SRS did not allow researchers to study crime (other than homicide) for subpopulations because it did not collect information about the circumstances of the crimes. NIBRS data have more information on demographics and circumstances, but data from many law enforcement agencies are missing from the FBI statistics.

The NCVS excludes persons living in institutions such as nursing homes and prisons, as well as persons experiencing homelessness and children under age 12. Other subpopulations may be underrepresented in the survey because of nonresponse (see Section 7.2).

Using NIBRS and the NCVS together provides a fuller picture of crime than either source alone, with the NCVS providing information on crimes not reported to the police and NIBRS providing information on crimes against businesses and people who are out of scope for the NCVS (and potentially providing insight into possible nonresponse bias in the NCVS). But, as shown in Table 7-1, some crimes are missing from both data collections—for example, crimes against children aged 0–11 that are not reported to the police. Addington and Lauritsen (2021) described other surveys and administrative datasets that have information about intimate partner violence and violence against children but emphasized that all data sources are incomplete.16

Enable Production of Disaggregated Statistics

The Office for Victims of Crime (2013) emphasized the importance of obtaining more information about:

…the incidence and prevalence of crime victimization in historically underserved populations, as well as the barriers they face in asserting their rights as victims and gaining access to services. These populations include persons with disabilities, boys and young men of color, adults and juveniles in detention settings, youth and women who are trafficked, LGBTQ victims, undocumented immigrants, Americans who are victimized while living in foreign countries, and American Indian/Alaska Native peoples (p. 3).

Crimes against some of these underserved populations cannot currently be studied with NIBRS data because the characteristics defining the subpopulations are not measured. For example, one of the data elements in NIBRS concerns bias motivation for the incident (FBI, 2021). But, for crimes without a known bias motivation, NIBRS does not collect information on the sexual orientation or disability status of the victim, so NIBRS data by themselves cannot give estimates of crimes against the LGBTQIA+ population or against persons with disabilities.

However, the NCVS can provide estimates of victimizations for persons in those groups, because it began asking all respondents about sexual orientation, gender identity, and disability status in 2016. Harrell (2021) found that, in 2017–2019, the rate of violent victimization against persons with disabilities was about four times the rate against persons without disabilities. Examining data from 2017 through 2020, Truman and Morgan (2022, p. 1) found that “rates of violent victimization were significantly higher for persons aged 16 or older who self-identified as lesbian, gay, or bisexual than for those who identified as straight” and that the rate of violent victimization against persons who self-identified as transgender was 2.5 times higher than the rate against persons who self-identified as cisgender. These differences in victimization rates could not be studied with NCVS data before 2016.

___________________

16 The National Data Archive on Child Abuse and Neglect (https://www.ndacan.acf.hhs.gov/index.cfm) collects microdata from various sources that concern violence against children.

It may be possible to study victimization in other historically underrepresented or unidentified groups by linking data with other sources. For example, Nixon et al. (2017) linked records from an Australian registry of persons with intellectual disabilities with a statewide police database, to study criminal charges and victimizations against persons in the registry.

Even when attributes are collected, however, they may be missing or subject to measurement error. NIBRS collects data on race and ethnicity of victims and offenders, when available, but these are input by law enforcement personnel and thus may differ from the race or ethnicity that would be self-reported. Linking NIBRS data with other sources, as done by Arias, Heron, and Hakes (2016) for death certificate information (see Section 3.5), could provide information about the accuracy of race and ethnicity information, as well as other characteristics, in NIBRS data.

Improve Cooperation for Data Collection

Most of the examples of combining data sources in Chapters 5 and 6 focus on linking survey data with administrative records. For crime, however, one of the two major data sources is itself a blending of data contributed by states and individual law enforcement agencies. In that respect, the UCR Program resembles the NVSS (see Section 4.3). While the NVSS is also voluntary and in early years only a few states participated, the system now collects data in standardized form from every state.

The UCR Program, however, has historically lacked the level of personnel or financial resources that enabled the NVSS to achieve nearly complete population coverage, though funds were available to help law enforcement agencies convert to NIBRS. Despite its limited resources, the UCR Program managed to attain a high level of cooperation for the SRS used through 2020. But converting police department data systems to collect and report the more detailed information required by NIBRS is expensive and the program requires local personnel to have expertise in the data-collection and reporting protocols (Smith, 2017). Barnett-Ryan and Swanson (2017) identified lack of funding for state programs as a major contributor to the variability in data quality, and they recommended more research to assess the effects of state quality-control programs on the quality of NIBRS submissions.

As stated in Conclusion 7-1, the UCR Program is still in transition, with incomplete data reporting and variation in measurement methods across jurisdictions. Short-term priorities include improving coverage and consistency of measurement for NIBRS, as well as continuing to develop and refine statistical methods that produce accurate statistics, with valid measures of uncertainty, for the population of law enforcement agencies covered by NIBRS. Sections 7.4 and 7.6 mention that alternative data sources could contribute to these endeavors, through improving imputation models for missing data and through providing external sources of information for evaluating measurement accuracy.

Longer-term activities, however, which may include a more fundamental consideration of some of the alternative data sources described in Sections 7.3 and 7.4, will require a different approach. The National Academies’ reports on Modernizing Crime Statistics concluded that continued improvement in crime statistics will require “enhancements to and expansions of the current data collections, as well as new data collection systems” (NASEM, 2018, p. 6) and that there is “currently no entity responsible for reporting on the full range of crimes” (NASEM, 2018, p. 10). These reports recommended that the U.S. Office of Management and Budget establish a structure for the governance of “the complete U.S. crime statistics enterprise” (see Box 7-1). Such a structure could include crimes measured by regulatory agencies, such as information on fraud and identity theft collected by the Federal Trade Commission.

Acquiring and using data from other sources will require cooperation from data providers: “Data sharing is incentivized when all data holders enjoy tangible benefits valuable to their missions, and when societal benefits are proportionate to possible costs and risks” (NASEM, 2023, p. 6). NIBRS requires law enforcement agencies to submit data in NIBRS format. This requirement has the advantage of standardizing data collection and data elements, but it also imposes costs on law enforcement agencies and states.

An alternative model for data sharing, particularly if previously excluded types of crime and new data sources are to be included, is to “shift some burden of data standardization from respondents to the state and federal levels” (NASEM, 2018, Conclusion 3.1; see Box 7-1). As an example, in a panel discussion, Smith (2022) commented on the potential for using data directly from law enforcement agencies to obtain more timely national statistics:

We could be looking experimentally at what it means to bring in crime incident data directly from the source, not as a bypass to how official statistics are captured by the FBI through the NIBRS system [but to enhance] the information that we get, collecting it in a much more timely way for a smaller subset of agencies and then being able to expand that picture with these national collections that are already in place. Some of that direct connection to crime incident data could involve narrative where possible, capitalizing on the [artificial intelligence] and [machine learning] tools that we have available to us now, to really try to understand more specifically what is the connection between what police see, what we see in the victimization data in the NCVS, and some of these other sources of information. There is a lot that technology can do for us.

The COVID-19 pandemic underlined the urgent need for nationally representative and timely data about crime: UCR and NCVS statistics for 2020, published in September 2021, could not help local jurisdictions decide how to deal with changes in crime patterns caused by lockdowns and changed activity patterns. Some researchers filled the void by assembling their own datasets from conveniently available online data (see Section 7.4), but these datasets were not nationally representative.

As suggested by Smith (2022), more timely data could be assembled for crimes known to the police by selecting a probability sample of law enforcement agencies to provide real-time crime incident data. The burden of providing such data could be substantially reduced by developing procedures that could take data in the format supplied by each agency and convert it to the format needed for the statistics. In this model, the sampled law enforcement agency would provide data that it already collects, along with documentation (perhaps developed jointly with BJS) about the data elements. BJS would then map the agency’s data onto standardized crime, demographic, and circumstance categories. This would reduce the burden on individual data providers while providing timely national statistics about crime. Technology might similarly be able to help speed measurement of crimes not reported to the police by, for example, collecting real-time reports of incidents from a probability sample of people who were provided with smartphones for that purpose.
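
The conversion-and-estimation model described above can be sketched in a few lines. The sketch assumes hypothetical agency-specific offense codes, a hypothetical crosswalk from those codes to standardized categories, and invented selection probabilities for the sampled agencies; none of these come from the report, BJS, or the FBI. Each agency's incidents are mapped to standard categories, and national totals are estimated by weighting each sampled agency's counts by the inverse of its selection probability (a Horvitz-Thompson estimator). Codes with no crosswalk entry are set aside for review rather than guessed.

```python
from collections import defaultdict

# Hypothetical per-agency crosswalks from local offense codes to standard
# categories (the documentation an agency might develop jointly with BJS).
CROSSWALKS = {
    "agency_a": {"BURG-RES": "burglary", "ROB1": "robbery", "AGG": "aggravated assault"},
    "agency_b": {"459": "burglary", "211": "robbery", "245": "aggravated assault"},
}

# Hypothetical probability sample: each agency's selection probability and
# its incidents, in the format the agency already uses.
SAMPLE = {
    "agency_a": {"selection_prob": 0.5, "incidents": ["BURG-RES", "ROB1", "BURG-RES", "XYZ"]},
    "agency_b": {"selection_prob": 0.1, "incidents": ["459", "245", "211", "211"]},
}


def standardize(agency, codes):
    """Map an agency's local codes to standard categories; collect unmapped codes."""
    crosswalk = CROSSWALKS[agency]
    mapped, unmapped = [], []
    for code in codes:
        if code in crosswalk:
            mapped.append(crosswalk[code])
        else:
            unmapped.append(code)
    return mapped, unmapped


def estimate_totals(sample):
    """Horvitz-Thompson estimate of category totals from sampled agencies."""
    totals = defaultdict(float)
    needs_review = {}
    for agency, info in sample.items():
        weight = 1.0 / info["selection_prob"]  # inverse-probability weight
        mapped, unmapped = standardize(agency, info["incidents"])
        for category in mapped:
            totals[category] += weight
        if unmapped:
            needs_review[agency] = unmapped
    return dict(totals), needs_review


totals, needs_review = estimate_totals(SAMPLE)
for category, estimate in sorted(totals.items()):
    print(f"{category}: {estimate:.0f}")
print("Codes needing review:", needs_review)
```

The design choice illustrated here is that the crosswalk lives with the statistical agency rather than with the data provider, so the agency submits what it already has. A production system would also need variance estimation and adjustments for nonresponse, which the sketch omits.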

The panel holds that the area of crime statistics could benefit greatly from increased use of multiple data sources to improve coverage of crimes, improve coverage of populations and businesses affected by crime, and allow research about the differential impact of crime on subpopulations. Profiting from these data sources, however, will require investment in data infrastructure, personnel, and statistical methods for working with the data sources, as well as a structure for coordinating data collection across agencies.

CONCLUSION 7-2: Improving crime statistics will require coordination of the National Crime Victimization Survey and Uniform Crime Reporting Program with new data sources that can provide timely and detailed information about crimes, including those measured in the current classification systems and those that are currently unmeasured. This will entail increased investment in research on directly using data collected by police departments and on developing new data resources.
