Massive Data Sets: Proceedings of a Workshop (1996)

Chapter: Management Issues in the Analysis of Large-Scale Crime Data Sets

Suggested Citation:"Management Issues in the Analysis of Large-Scale Crime Data Sets." National Research Council. 1996. Massive Data Sets: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/5505.

Management Issues in the Analysis of Large-Scale Crime Data Sets

Charles R. Kindermann

Marshall M. DeBerry, Jr.

Bureau of Justice Statistics, U.S. Department of Justice

1 The Information Glut

The Bureau of Justice Statistics (BJS), a component agency in the Department of Justice, has the responsibility for collecting, analyzing, publishing, and disseminating information on crime, criminal offenders, victims of crime, and the operation of justice systems at all levels of government. Two very large data sets, the National Incident-Based Reporting System (NIBRS) and the National Crime Victimization Survey (NCVS), are part of the analytic activities of the Bureau. A brief overview of the two programs is presented below.

2 NIBRS

NIBRS, which will eventually replace the traditional Uniform Crime Reporting (UCR) Program [1] as the source of official FBI counts of crimes reported to law enforcement agencies, is designed to go far beyond the summary-based UCR in terms of information about crime. The UCR, as a summary-based reporting program, counts incidents and arrests, with some expanded data on incidents of murder and nonnegligent manslaughter.

In incidents where more than one offense occurs, the traditional UCR counts only the most serious of the offenses. NIBRS includes information about each of the different offenses (up to a maximum of ten) that may occur within a single incident. As a result, the NIBRS data can be used to study how often and under what circumstances certain offenses, such as burglary and rape, occur together.
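The difference between the two counting rules can be sketched in a few lines of Python. This is a minimal illustration, not an official coding scheme: the seriousness ranking, the offense labels, and the function names are all assumptions made for the example. Only the ten-offense limit comes from the text.

```python
from collections import Counter
from itertools import combinations

# Illustrative seriousness ranking (lower = more serious). This ordering is an
# assumption for the sketch, not an official coding table.
SERIOUSNESS = {
    "murder": 0, "rape": 1, "robbery": 2, "aggravated assault": 3,
    "burglary": 4, "larceny/theft": 5, "motor vehicle theft": 6,
}

MAX_OFFENSES = 10  # from the text: up to ten offenses per incident


def summary_counts(incidents):
    """Summary-style count: each incident counted once, under its most serious offense."""
    return Counter(
        min(offenses, key=SERIOUSNESS.__getitem__)
        for offenses in incidents if offenses
    )


def cooccurrence_counts(incidents):
    """Incident-based count: every unordered pair of distinct offenses in an incident."""
    pairs = Counter()
    for offenses in incidents:
        kept = sorted(set(offenses[:MAX_OFFENSES]))
        pairs.update(combinations(kept, 2))
    return pairs


# Hypothetical incidents, each a list of offense labels.
incidents = [
    ["burglary", "rape"],
    ["burglary"],
    ["robbery", "aggravated assault"],
    ["rape", "burglary", "larceny/theft"],
]

print(summary_counts(incidents))
print(cooccurrence_counts(incidents)[("burglary", "rape")])
```

Under the summary rule the two burglary-and-rape incidents appear only as rapes; the incident-based count keeps the pairing, which is what makes the co-occurrence question answerable.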

The ability to link information about many aspects of a crime to the crime incident marks the most important difference between NIBRS and the traditional UCR. These various aspects of the crime incident are represented in NIBRS by a series of more than fifty data elements. The NIBRS data elements are categorized into six segments: administrative, offenses, property, victim, offender, and arrestee. NIBRS enables analysts to study how these data elements relate to each other for each type of offense.

[1] The Uniform Crime Reporting (UCR) Program is a nationwide, cooperative statistical effort of approximately 16,000 city, county, and state law enforcement agencies voluntarily reporting data on crimes brought to their attention. The Federal Bureau of Investigation (FBI) has administered this program since 1930.

3 NCVS

The Bureau of Justice Statistics also sponsors and analyzes the National Crime Victimization Survey (NCVS), an ongoing national household survey that was begun in 1972 to collect data on personal and household victimization experiences. All persons 12 years of age and older are interviewed in approximately 50,000 households every six months throughout the Nation. There are approximately 650 variables on the NCVS data file, covering the type of crime committed, the time and place of occurrence, and whether or not the crime was reported to law enforcement authorities. The average size of the data file for all crimes reported for a particular calendar year is 120 megabytes.

The NCVS utilizes a hierarchical file structure for its data records. In the NCVS there are four types of records: a household link record, followed by the household, person, and incident records. The household record contains information about the household as reported by the respondent and characteristics of the surrounding area as computed by the Bureau of the Census. The person record contains information about each household member 12 years of age and older as reported by that person or a proxy, with one record for each qualifying individual. Finally, the incident record contains information drawn from the incident report, completed for each household or person incident mentioned during the interview. The NCVS is a somewhat smaller data set than NIBRS, but may be considered analytically more complex because (1) there is more information available for each incident and (2) it is a panel design, i.e., the persons in each housing unit are interviewed every six months for a period of three years, thereby allowing for a limited degree of longitudinal comparison of households over time.
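The nesting described above can be pictured with a small Python sketch. It is only an illustration: the field names, the `(record_type, payload)` stream format, and the way incidents are matched to persons are assumptions for the example, not the actual NCVS record layout. The contents of the household link record are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Incident:
    crime_type: str
    reported_to_police: bool
    person_id: Optional[int] = None  # None marks a household-level incident


@dataclass
class Person:
    person_id: int
    age: int
    incidents: list = field(default_factory=list)


@dataclass
class Household:
    household_id: str
    persons: list = field(default_factory=list)
    incidents: list = field(default_factory=list)  # household-level incidents


def build_hierarchy(records):
    """Nest an ordered stream of (record_type, payload) tuples into households."""
    households = []
    for rec_type, payload in records:
        if rec_type == "link":
            continue  # link record contents are not modeled in this sketch
        if rec_type == "household":
            households.append(Household(**payload))
        elif rec_type == "person":
            households[-1].persons.append(Person(**payload))
        elif rec_type == "incident":
            incident = Incident(**payload)
            hh = households[-1]
            owner = hh if incident.person_id is None else next(
                p for p in hh.persons if p.person_id == incident.person_id
            )
            owner.incidents.append(incident)
        else:
            raise ValueError(f"unknown record type: {rec_type!r}")
    return households


# Hypothetical input stream for two households.
records = [
    ("link", {}),
    ("household", {"household_id": "H1"}),
    ("person", {"person_id": 1, "age": 34}),
    ("person", {"person_id": 2, "age": 15}),
    ("incident", {"crime_type": "assault", "reported_to_police": True, "person_id": 1}),
    ("incident", {"crime_type": "burglary", "reported_to_police": False}),
    ("link", {}),
    ("household", {"household_id": "H2"}),
    ("person", {"person_id": 1, "age": 51}),
]

households = build_hierarchy(records)
print(len(households), len(households[0].persons))
```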

4 Data Utilization

An example of how those interested in the study of crime can tap the potentially rich source of new information represented by NIBRS is seen in the current Supplementary Homicide Reports data published annually by the FBI in its Crime in the United States series. Crosstabulations of various incident-based data elements are presented, including the age, sex, and race of victims and offenders, the types of weapon(s) used, the relationship of the victim to the offender, and the circumstances surrounding the incident (for example, whether the murder resulted from a robbery, rape, or argument). The NIBRS data will offer a variable set similar in scope.

Currently, portions of eight states are reporting NIBRS data to the FBI. In 1991, three small states reported 500,000 crime incidents that required approximately one gigabyte of storage. If current NIBRS storage demands were extrapolated to full nationwide participation, 40 gigabytes of storage would be needed each year.

Although full nationwide participation in NIBRS is not a realistic short-term expectation, it is realistic to expect that a fourth of the U.S. could be represented in NIBRS within the next several years. The corresponding volume of data, 10 gigabytes each year, could still be problematic for storage and analysis.
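The arithmetic behind these figures can be laid out as a short Python calculation. The per-incident size and the implied nationwide incident count are derived here from the numbers quoted above, assuming storage per incident stays constant; neither figure is stated in the text, and the decimal gigabyte is an assumption.

```python
GB = 10**9  # decimal gigabyte; an assumption, the text does not specify

# Figures quoted in the text.
incidents_1991 = 500_000
storage_1991_bytes = 1 * GB
full_participation_bytes = 40 * GB

# Derived under the assumption of constant storage per incident.
bytes_per_incident = storage_1991_bytes / incidents_1991
implied_national_incidents = full_participation_bytes / bytes_per_incident
quarter_participation_bytes = full_participation_bytes / 4

print(f"{bytes_per_incident:.0f} bytes per incident")
print(f"{implied_national_incidents:,.0f} incidents per year at full participation")
print(f"{quarter_participation_bytes / GB:.0f} GB per year at one-fourth participation")
```

On these assumptions each incident occupies about 2,000 bytes, and the 40-gigabyte projection corresponds to roughly 20 million incidents a year.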

Certain strategies may be chosen to reduce the size of the corresponding NIBRS data files. For example, most users of NIBRS data may not need or desire a data file that contains all twenty-two types of Group A offenses, which include crimes such as sports tampering, impersonation, and gambling equipment violations. If a user is interested in a much smaller file, only the more common offenses, such as aggravated assault, motor vehicle theft, burglary, or larceny/theft, could be included in the data set. Another area in which data reduction can be achieved is in the actual NIBRS record layout. Although the multiple-record format may aid law enforcement agencies in the inputting of the data, it can create difficulties in analyzing the files. For example, in the current NIBRS format, each record, regardless of type, begins with 300 bytes reserved for originating agency identifier (ORI) information. Currently, nearly a third of each ORI header is filler space reserved for future use. Moreover, the records for the different incident types have been padded with filler so as to be stored as fixed-length records instead of variable-length records. The space wasted by multiple ORI headers and filler can be eliminated by restructuring the current file into a format better suited to current statistical software packages.
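A rough Python sketch of this kind of restructuring follows. Only the 300-byte ORI header length comes from the text; the total record length, the space filler byte, and the helper names are hypothetical, and a real converter would follow the published field layout rather than strip trailing bytes.

```python
HEADER_LEN = 300   # from the text: bytes reserved for ORI information
RECORD_LEN = 400   # hypothetical fixed record length for this sketch
PAD = b" "         # hypothetical filler byte


def make_record(ori, body):
    """Build a padded fixed-length record (used only to create demo input)."""
    return ori.ljust(HEADER_LEN, PAD) + body.ljust(RECORD_LEN - HEADER_LEN, PAD)


def compact(records):
    """Store each ORI header once and strip trailing filler from record bodies."""
    by_agency = {}
    for rec in records:
        if len(rec) != RECORD_LEN:
            raise ValueError("unexpected record length")
        header = rec[:HEADER_LEN].rstrip(PAD)
        body = rec[HEADER_LEN:].rstrip(PAD)
        by_agency.setdefault(header, []).append(body)
    return by_agency


def compacted_size(by_agency):
    """Total bytes of headers plus bodies after compaction."""
    return sum(
        len(header) + sum(len(body) for body in bodies)
        for header, bodies in by_agency.items()
    )


# Hypothetical records from two agencies.
records = [
    make_record(b"AGENCY0001", b"OFFENSE:BURGLARY"),
    make_record(b"AGENCY0001", b"VICTIM:AGE=34"),
    make_record(b"AGENCY0002", b"OFFENSE:LARCENY"),
]

compacted = compact(records)
print(len(records) * RECORD_LEN, "bytes before;", compacted_size(compacted), "bytes after")
```

Grouping by header means the agency information is stored once per agency rather than once per record, which is the main source of savings in this sketch.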

Even with the restructuring of the current record formats, the annual collection of NIBRS data will still result in a large volume of data to be organized, stored, and analyzed. One strategy BJS is considering is to sample the NIBRS data in order to better manage the volume of data expected. Since the NIBRS program can be viewed as a potentially complete enumeration of incidents obtained by law enforcement agencies, simple random sampling could be employed, thereby avoiding the complications of developing a complex sample design strategy and facilitating the use of "off the shelf" statistical software packages.
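One way to draw a simple random sample from a file too large to hold in memory is reservoir sampling (Algorithm R), which makes a single pass over the records. The sketch below is a standard textbook technique offered as an illustration; the text does not say which sampling procedure BJS would use, and the function and variable names are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return a simple random sample of k items (without replacement) in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform over 0..i, both ends inclusive
            if j < k:
                sample[j] = item     # keep the new item with probability k / (i + 1)
    return sample


# Hypothetical stream of incident identifiers standing in for a large file.
incident_ids = range(500_000)
subset = reservoir_sample(incident_ids, k=5_000, seed=1996)

# Each sampled incident represents N / n incidents in the full file.
expansion_weight = 500_000 / len(subset)
print(len(subset), expansion_weight)
```

Because every incident has the same selection probability, a single expansion weight applies to the whole sample, and standard software that assumes simple random sampling can be used directly.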

Using the sample design of the NCVS, BJS has produced a 100 megabyte longitudinal file of household units that covers a period of four and one half years. This file contains information on both interviewed and noninterviewed households in a selected portion of the sample over the seven interviews. The NCVS longitudinal file can facilitate the examination of patterns of victimization over time, the response of the police to victimizations, the effect of life events on the likelihood of victimization, and the long-term effects of criminal victimization on victims and the criminal justice system. However, current analysis of this particular data file has been hampered by issues relating to the sample design and the use of popular statistical software packages. Since the NCVS utilizes a complex sample design, standard statistical techniques that assume a simple random sample cannot be utilized. Although there are software packages that can deal with complex sample designs, the NCVS data are collected by the Bureau of the Census under Title 13 of the U.S. Code. As a result, selected information that would identify primary sampling units and clusters is suppressed to preserve confidentiality. Researchers, therefore, cannot compute variances and standard errors for their analyses of this particular data set. BJS is currently working with the Bureau of the Census to include modified sample information on future public use tapes so that the appropriate sample variances can be computed.
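To see why the suppressed identifiers matter, consider a common textbook variance estimator for a weighted total under a stratified, clustered design (the "ultimate cluster" approximation). The sketch below is illustrative only: it is not the method used by BJS or the Bureau of the Census, and the row format and names are assumptions. Its point is that the computation cannot start without stratum and primary sampling unit (PSU) identifiers.

```python
from collections import defaultdict


def ultimate_cluster_variance(rows):
    """Estimate a weighted total and its variance from (stratum, psu, weight, y) rows.

    Within each stratum, the variance contribution is
    n_h / (n_h - 1) * sum over PSUs of (PSU total - mean PSU total) ** 2.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in rows:
        psu_totals[stratum][psu] += weight * y

    total = 0.0
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("need at least two PSUs per stratum")
        t = list(psus.values())
        mean_t = sum(t) / n_h
        total += sum(t)
        variance += n_h / (n_h - 1) * sum((x - mean_t) ** 2 for x in t)
    return total, variance


# Hypothetical respondent rows: (stratum, psu, weight, victimized indicator).
rows = [
    ("A", 1, 10, 1), ("A", 1, 10, 0),
    ("A", 2, 10, 1), ("A", 2, 10, 1),
    ("B", 1, 5, 1), ("B", 2, 5, 0),
]

total, variance = ultimate_cluster_variance(rows)
print(total, variance, variance ** 0.5)
```

With the first two fields of each row removed, as on a file where that information is suppressed, neither the PSU totals nor the within-stratum spread can be formed.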

Most of the current statistical software packages are geared to processing data on a case-by-case basis. The NCVS longitudinal file is structured in a nested hierarchical manner. When trying to examine events over a selected time period, it becomes difficult to rearrange the data in a way that will facilitate understanding the time or longitudinal aspects of the data. For example, the concept of what constitutes a "case record" depends on the perspective of the current question. Is a case all households that remain in sample over all seven interviews, or is it those households that are replaced at every interview period? Moving the appropriate incident data from the lower levels of the nested file to the upper level of the household can complicate obtaining a "true" count of the number of households experiencing a victimization event, since many statistical software packages duplicate information at the upper level of the file structure down to the lower level.
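The counting problem can be seen in a small Python example. The flattened rows below are hypothetical and stand in for what a package produces when household information is copied down to every incident row; the column layout and function names are assumptions for the illustration.

```python
# Hypothetical flattened file: household data repeated on every lower-level row.
# Columns: (household_id, person_id, incident_id or None)
flat_rows = [
    ("H1", 1, "I1"),
    ("H1", 1, "I2"),
    ("H1", 2, "I3"),
    ("H2", 1, None),   # household with no reported incident
    ("H3", 1, "I4"),
]


def naive_victimized_count(rows):
    """Counts rows with an incident, so a household is counted once per incident."""
    return sum(1 for _, _, incident in rows if incident is not None)


def victimized_household_count(rows):
    """Counts distinct households that have at least one incident."""
    return len({hh for hh, _, incident in rows if incident is not None})


print(naive_victimized_count(flat_rows))      # row-level count
print(victimized_household_count(flat_rows))  # household-level count
```

Here the row-level count is 4 while only 2 households experienced a victimization, so the unit of analysis has to be fixed before the file is flattened or the result deduplicated afterward.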

5 Future Issues

Local law enforcement agencies will be participating in NIBRS on a voluntary basis. NIBRS data collection and aggregation at the agency level will be far more labor- and resource-intensive than the current UCR system. What are the implications for coverage and data accuracy?

Criminal justice data have a short shelf life, because detection of current trends is important for planning and interdiction effectiveness. Can new methods be found to process massive data files and produce information in a time frame that is useful to the criminal justice community? Numerous offenses such as sports tampering are not of great national interest. A subset of the full NIBRS file based on scientific sampling procedures could facilitate many types of analyses.

How easy is it to integrate change into such a data system, as evaluations of NIBRS identify new information needs that it will be required to address? Does the sheer volume of data and reporting agencies make this need any more difficult to meet than for smaller ongoing data collections? As data storage technology continues to evolve, it is important to weigh both cost and future compatibility needs, particularly with regard to distributing the data to law enforcement agencies and the public. BJS will continue to monitor these technological changes so that we will be able to use such advances to enhance our analytic capabilities with these large-scale data sets.
