500 Fifth Street, NW
Washington, DC 20001
Phone (202) 334-2934
Fax (202) 334-2003
www.TRB.org

October 14, 2013

Mr. Victor M. Mendez
Administrator
Federal Highway Administration
U.S. Department of Transportation
1200 New Jersey Avenue, SE
Washington, DC 20590

Mr. David L. Strickland
Administrator
National Highway Traffic Safety Administration
U.S. Department of Transportation
1200 New Jersey Avenue, SE
Washington, DC 20590

Mr. Bud Wright
Executive Director
American Association of State Highway and Transportation Officials
444 North Capitol Street, NW
Suite 225
Washington, DC 20001

SUBJECT: Second Report from the Committee on the Long-Term Stewardship of Safety Data from the Second Strategic Highway Research Program

Dear Mr. Mendez, Mr. Strickland, and Mr. Wright:

In response to a request from the U.S. Department of Transportation (US DOT), the National Research Council (NRC) formed a committee to examine the long-term stewardship requirements for the second Strategic Highway Research Program (SHRP 2) driving-safety data and to provide advice on strategies to meet those requirements. (See Appendix A for the committee's statement of task and Appendix B for biosketches of the committee members.) The committee issued its first letter report on May 3, 2013.¹ As discussed in that report, SHRP 2,² including the collection of Naturalistic Driving Study (NDS) data³ and the development of the Roadway Information Database (RID),⁴ is scheduled to end in March 2015, and decisions are being made about the disposition of the completed driving-safety data set. The committee indicated in its first letter report that there has been far too little experience with using driving-safety data at the scale and complexity of the SHRP 2 data. As a result, there are many uncertainties about the availability and use of the data set that make firm decisions about long-term institutional and financial arrangements concerning the data ill-advised at this time.
The committee therefore recommended a phased approach to the long-term administration of the driving-safety data. The first phase (referred to as Phase 1) would be a period of experimentation with the administration of the driving-safety data and its actual use for research purposes. As explained in the committee's first letter report, Phase 1 would be expected to last about 5 years to allow sufficient time to develop and test strategies for making the data set available, facilitate its productive use, evaluate efforts to ensure confidentiality, and identify long-term and sustainable funding strategies for subsequent phases.

¹ The committee's first letter report is available at http://www.trb.org/SafetyHumanFactors/Blurbs/168924.aspx
² http://www.trb.org/StrategicHighwayResearchProgram2SHRP2/General.aspx.
³ http://forums.shrp2nds.us
⁴ http://www.ctre.iastate.edu/shrp2-s04a/

At the committee's most recent meeting, held July 31 through August 1, 2013, the Federal Highway Administration's (FHWA's) Associate Administrator for Safety indicated that US DOT agreed with the recommendations in the committee's first letter report and that the agency intends to pursue Phase 1 after SHRP 2 ends.⁵ US DOT is seeking to execute an agreement with the National Academy of Sciences (NAS) to establish the Phase 1 governance board (or oversight committee) and to administer Phase 1 activities. The governance board (expected to be composed of stakeholders and other individuals with relevant expertise, as listed in the committee's first report) would be responsible during Phase 1 for making critical policy decisions about overall operation and for setting policies for data access and use, information privacy and confidentiality, security, pricing, types of product offerings, and performance evaluation. The governance board would also oversee Phase 1 pilot studies of operating strategies and design a plan for the transition to subsequent phases of long-term productive use of the driving-safety data. The committee noted that the existing SHRP 2 oversight committee could serve as a model for how the board might function. Like the SHRP 2 oversight committee, the governance board could ask the National Research Council's Transportation Research Board (TRB) to appoint ad hoc technical committees to advise the board on specific topics.
The committee recommended that the governance board be convened as early as possible in 2014, while SHRP 2 is still in operation. The board could then prepare a plan for the transition to Phase 1 and for obtaining key empirical information about the operation and uses of the combined NDS and RID data set. In addition, convening the board before this committee is disbanded would avoid a gap in the capacity to provide advice for planning Phase 1.

In this report, the committee provides a set of principles intended to maximize the use of the data and to ensure that their use is appropriate (e.g., that privacy is protected) and sustained over the long term. The committee also provides recommendations concerning priority issues for the governance board to consider and specific activities for obtaining key empirical information in Phase 1. This report builds on the committee's first report and its discussion of Phase 1 planning, as well as on accounts of recent experiences of NDS data users presented at the committee's most recent meeting (see Appendix C). This report has been reviewed in draft form in accordance with procedures approved by the NRC Report Review Committee (see Appendix D for a list of reviewers).

⁵ Presentation to the committee by Tony Furst, Associate Administrator for Safety, FHWA, US DOT, July 31, 2013.
PRINCIPLES

Principle 1: Facilitate the use of the data by a variety of researchers. The greatest benefits will be derived from the driving-safety data if they are used by the widest possible array of researchers. Minimizing barriers to data access and making the data easier to use will promote their widespread use, which is expected to yield safety benefits, including crashes avoided and lives saved. The committee expects that the volume of useful results coming from analysis of the data set will be proportional to the number and variety of users who have access to the data. Researchers are likely to come from a wide variety of organizations, including federal and state agencies, original equipment manufacturers (OEMs) (e.g., motor vehicle manufacturers), academic and nonacademic research institutes, and nongovernmental organizations. These researchers can therefore be expected to have a wide range of analytic capabilities, financial resources, and data needs.

Potential impediments to broad use of the data set include the cost of accessing it, researchers' lack of experience with the data, and the limited availability of analytic tools. Researchers who have not worked with data sets as large and complex as this one will need expert assistance to get the most out of it. A wide variety of researchers would be able to work with the data if the data are readily accessible and available at moderate cost in formats that do not require excessive training in data handling. The accessibility and availability of many forms of data (e.g., raw data; cleaned and coded data; refined data; and derived, specialized data sets) need to be considered in planning for management and dissemination of the data. Access issues, particularly confidentiality requirements, will vary for different types of data (e.g., data with and without personally identifying information, or PII).
Reduced data sets, with PII removed or transformed, could be designed to meet the needs of some users. Such data sets could be made available through the Internet or other mechanisms that do not require high levels of security or computing power. However, some researchers will need access to data containing PII. Because access to data with PII raises additional privacy and confidentiality concerns, a higher level of security will be necessary to protect such access. Making some of the complex data products available to advanced users at reduced cost and developing a library of software tools would foster the growth of a user base. It will be valuable to encourage third-party development of data tools through incentives such as prize competitions (possibly sponsored by private entities). Data management, analysis, and visualization tools are evolving rapidly, and significant changes in capabilities and potential applications can be expected over the useful life of the driving-safety data set. In Phase 1 and beyond, to maximize the value of the data set, it will be important for the governance board to adopt a flexible approach to data policies and management that ensures the effective use of new methods and tools.
Principle 2: Offer multiple access opportunities. Multiple access opportunities will promote wider use of the driving-safety data. At present, one operator provides a single point of access to the NDS data for researchers; the RID data are being developed separately. The committee considered two potentially beneficial approaches to facilitating wider use of the driving-safety data during Phase 1:

1. Multiple operators, each with a copy of the entire driving-safety data set, and
2. A single (prime) operator holding the sole copy of the data set, along with multiple, geographically dispersed operators at other facilities that would have remote access to, but not possession of, the data set itself.

Either option would have several potential benefits compared with a single operator and a single point of data access:

• Increased capacity to provide efficient and economical data access and technical support to researchers,
• Diversification of services to meet the varied needs and circumstances of researchers,
• Reduced travel burden on researchers who need face-to-face access to expert guidance or to the PII data, and
• Encouragement of innovation in support services, access, and delivery modalities.

The second option appears more practicable than the first for Phase 1 because it would avoid the costs of establishing an appropriate infrastructure to house and operate the data at multiple sites and of transferring copies of the entire large and complex data set to multiple operators. The committee notes that the Phase 1 experience may clarify the benefits of creating multiple full or partial copies of the data set in several locations.

Researchers will need to obtain Institutional Review Board (IRB) approval before they access the data. Some local IRBs may not be familiar with data of this type, so it will be more difficult, and take longer, for them to review proposed uses of the driving-safety data.
Also, foreign researchers, operating under different legal requirements, may use the data either independently or in collaboration with domestic researchers. To facilitate and promote consistency in IRB reviews, it may be desirable to train, or at least brief, the diverse IRBs expected to review projects that use the data. A forthcoming report from the NRC study on Proposed Revisions to the Common Rule for the Protection of Human Subjects in Research in the Behavioral and Social Sciences may provide insights into how users could move through the IRB process more expeditiously.⁶

⁶ In its forthcoming report, that study committee will address proposed changes to the current regulations for protecting research participants under Title 45, Part 46, of the Code of Federal Regulations ("the Common Rule"). The report is expected to be released in December 2013 with recommendations concerning the changes proposed by the HHS Office for Human Research Protections (OHRP). One of OHRP's proposed changes pertains to streamlining IRB review of multisite studies under a central IRB to increase efficiency. A related workshop report is available at http://www.nap.edu/catalog.php?record_id=18383.
Principle 3: Protect privacy and ensure data integrity. Protecting privacy and ensuring data integrity are vital for all aspects of data use. Procedures will need to be established for protecting privacy (which includes ensuring confidentiality) and ensuring the integrity of the data while providing researchers with remote access to the data.⁷ Because the NDS data contain a large amount of PII, a number of privacy issues need to be addressed in Phase 1, particularly the risk of violating the commitment, made to volunteers in the consent agreements, to keep individual driver data confidential. It will be important to guard against the risks of disclosure that may arise when driving-safety data without PII are linked to roadway data or data from other sources, and against the application of widely used re-identification techniques. In addition, during Phase 1, protocols will need to be developed and evaluated to protect the integrity of the driving-safety data against unauthorized modification.⁸ The application and effectiveness of data-management strategies to ensure the continuing integrity and security of the data will be important aspects of performance in operating the driving-safety data system as researchers are provided appropriate access to the data.

Principle 4: Document early strategies and use best practices. Near-term strategies, decisions, and approaches concerning the driving-safety data can have long-term implications. The driving-safety data will likely be managed, distributed, and analyzed through different physical infrastructures over the long span of expected data use. To help ensure stability and extensibility, the evolving data structures need to be clearly documented. Application of best data-management practices⁹ by the operator will allow future data users to know exactly how the reduced and specialized data sets were developed.
Continuity planning for phases beyond Phase 1 will be essential to the long-term sustainability of the database. Important Phase 1 activities will include:

• Designing a transition plan to subsequent phases,
• Evaluating performance of the management and dissemination processes,
• Developing a process for appointing a governance board after Phase 1, and
• Considering relationships between the new board and the data owner in future phases.

⁷ Security and Privacy Controls for Federal Information Systems and Organizations, NIST Special Publication 800-53, Revision 4, Joint Task Force Transformation Initiative, Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology. Available at http://dx.doi.org/10.6028/NIST.SP.800-53r4.
⁸ The Glossary of Key Information Security Terms, National Institute of Standards and Technology Computer Security Division Interagency or Internal Report 7298, Revision 2 (May 2013, R. Kissel, ed.), defines "data integrity" as "the property that data has not been altered in an unauthorized manner. Data integrity covers data in storage, during processing, and while in transit" (Source: SP 800) and as "the property that data has not been changed, destroyed, or lost in an unauthorized or accidental manner" (Source: CNSSI 4009).
⁹ See, for example, http://www.oracle.com/technetwork/articles/entarch/oea-best-practices-data-gov-400760.pdf
Principle 5: Enhance researcher collaboration. Frequent interactions among independent researchers and the governance board regarding use of the driving-safety data will promote collaboration, foster synergies, improve learning, and contribute to plans for future phases of safety-data implementation. The governance board will have the opportunity to facilitate the continual exchange of information among diverse researchers, including information about how the data are being used in the American Association of State Highway and Transportation Officials' (AASHTO's) "concept to countermeasure" program, the National Highway Traffic Safety Administration's (NHTSA's) speeding study, and the Toyota Collaborative Safety Research Center's creation of a driver distraction database.

FHWA is currently conducting a feasibility study of its plan to establish a data enclave¹⁰ at the Turner-Fairbank Highway Research Center that will support the use of highway-safety data, including data from SHRP 2.¹¹ If this plan moves ahead, it will give FHWA an opportunity to share what it learns, as well as its approaches, for accessing the data and ensuring the confidentiality of drivers involved in the SHRP 2 study. The committee looks forward to hearing more details about FHWA's project at its next meeting. There may be important interactions between that project and the Phase 1 planning for multiple access points to the data.

Principle 6: Use a sustainable financial model. Such a model is needed to ensure long-term availability of the data. Preliminary estimates of the cost of making the safety data available to researchers (including housing, managing, providing access, and ensuring confidentiality) were provided to the committee.¹² It will be important to develop refined cost estimates based on the actual costs of producing data products and providing researcher access during Phase 1.
It will also be important to assess users' willingness and ability to pay for data access and support services, recognizing that willingness to pay may change over time. The Phase 1 experience with marketing and pricing data from this unusual data set will provide a factual basis for setting fees. Obtaining such experience is essential because recovery of operating costs must be balanced against ease of data access. The committee is not aware of any large database made available to the broad research community that has been able to recover all costs directly from individual end users over multiple years. As the committee indicated in its first report, developing long-term, sustainable funding will require consideration of private sources of funding, including public-private partnerships, in addition to identifying strategies for marketing the data to potential users.¹³

¹⁰ A data enclave is a secure environment through which confidential data can be accessed remotely.
¹¹ Presentation to the committee by Tony Furst, Associate Administrator for Safety, FHWA, US DOT, July 31, 2013.
¹² In a presentation to the committee on July 31, 2013, Jon Hankey, Senior Associate Director, Virginia Tech Transportation Institute, said, "There are a number of fixed costs that require a minimal amount of resources; even with cost share and user fees, this will be $4–5 million per year." The Volpe National Transportation Systems Center of the US DOT Research and Innovative Technology Administration, Cambridge, Massachusetts, estimated annual ongoing infrastructure costs to be more than $3 million (see page 60 of Volpe's January 2013 draft report Options for Long-Term Stewardship and Ownership of the SHRP 2 Safety Data).
¹³ For additional discussion, see Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information, by the Blue Ribbon Task Force on Sustainable Preservation and Access. http://brtf.sdsc.edu/
RECOMMENDATIONS

US DOT is seeking to execute an agreement with NAS to establish the governance board and to administer Phase 1 activities. In addition, agreements will need to be established with one or more operators. In making the following recommendations, the committee does not presuppose the terms of any agreements that will be established for Phase 1. Nevertheless, the committee recommends the following priority actions by the governance board:

1. Pursue the option of using one prime operator that maintains the entire data set (containing NDS and RID data) and other operators that provide multiple access points to the data set. To meet the range of data-access, support-service, and price needs of diverse users, operators of different types and in different locations should be selected. The resulting experience should inform plans for managing the database beyond Phase 1.
2. To the extent that multiple operators are to be involved in Phase 1, ensure that the agreement with the prime operator (which will hold a copy of the entire data set in Phase 1) specifically permits and facilitates multiple access points maintained by other operators, with which the prime operator will actively cooperate.
3. Explore possibilities for reducing potential IRB-related research delays, including offering consistent IRB training for cases in which approvals from multiple IRBs may be required and developing an information package describing the driving-safety data that researchers can use in their IRB submissions.
4. Develop a compendium of privacy-protection and data-integrity approaches and assess their effectiveness in the context of the NDS and RID data sets as well as the planned linkages between them. Consider the applicability of those approaches to the use of the data by foreign researchers, either individually or in collaboration with domestic researchers.
5. Develop operating policies for data use and qualification criteria for users.
6. Develop approaches for determining ownership of and access to derivative data products and the distribution of those products among data-access points.
7. Explore approaches for building capacity in the research community both for using the safety data and for providing user support services and access tools. For example, workshops and training courses could focus on particular types of data or applications of the data, and "train-the-trainer" programs could be used to develop knowledgeable people who can provide various types of one-on-one support, as well as one-to-many training, at multiple locations.
8. Foster an interactive and collaborative user community. The governance board should plan to serve as a clearinghouse by facilitating the continual exchange of information among diverse researchers who are using the driving-safety data in Phase 1, and it should consider how this function can become more effective after Phase 1.
9. To facilitate ease of use by researchers and enable data to be replicated or moved, use best practices in managing large and secure data sets. Establish requirements for detailed and accurate documentation of data sets, database structures, and other data files. During Phase 1, encourage the development of a diverse portfolio of software capabilities and tools for data manipulation and analysis through incentives such as development grants and competitions.
10. Define the process for developing a business plan for long-term, sustainable funding, including exploring potential funding sources in addition to US DOT, such as other federal agencies that might contribute as public-sector funding partners (e.g., the National Institutes of Health, the Centers for Disease Control and Prevention, and the National Science Foundation) and potential private sources (e.g., original equipment manufacturers, the insurance industry, and public interest groups).

CONCLUSION

The committee expects that Phase 1 will provide important experience with uses of the data that will inform the development of long-term policies and management strategies for data use, protection, costs, and pricing. Given current uncertainties and resource limitations, it may not be feasible to fully answer all priority questions about these policy and management issues during Phase 1 according to a predetermined schedule. Furthermore, analysis methods, software, and hardware for data coding, analysis, dissemination, and security are evolving rapidly, so both capabilities and user needs may be quite different 5 years from now. It will therefore be important for the governance board to take an adaptive and iterative approach to experimentation and decision making. Management and decision making will need to be flexible and opportunistic as outcomes from previous actions and other events become better understood. By the end of Phase 1, the governance board will need to have assessed the efficacy of this model for data delivery, security, and funding.
Key assessment considerations will include the following:

• User perspectives on accessing the driving-safety data and their responses to access pricing,
• Effectiveness of measures to preserve privacy and data integrity,
• Costs, and cost-recovery or cost-sharing approaches, per unit of service provided,
• Research products (e.g., derivative data sets and publications) and their use and impacts, and
• Performance-evaluation metrics for the data-delivery process.

Sincerely,

Joseph L. Schofer
Chair, Committee on the Long-Term Stewardship of Safety Data from the Second Strategic Highway Research Program

Attachments