National Academies Press: OpenBook

Designing the Archive for SHRP 2 Reliability and Reliability-Related Data (2014)

Chapter: Chapter 3 - Preparatory Analysis

Suggested Citation:"Chapter 3 - Preparatory Analysis." Transportation Research Board. 2014. Designing the Archive for SHRP 2 Reliability and Reliability-Related Data. Washington, DC: The National Academies Press. doi: 10.17226/22281.


Preparatory Analysis

The project team performed a thorough preparatory analysis to become more familiar with the state of the practice in data archiving and to understand the content management technologies available for L13A. This task included a review of the L13 report (Section 3.1) as well as past work on data archiving systems. This chapter presents the outcomes of the team’s reviews of existing archived data user services (Section 3.2), online archive systems (Section 3.3), and commercially available archiving technologies (Section 3.4). The major objectives of this analysis were to help the team

1. Come up with a preliminary system design; and
2. Identify an existing commercial off-the-shelf (COTS) content management system on which the Archive system could be built (Section 3.5.3).

3.1 Review of the L13 Report

3.1.1 Summary

The SHRP 2 L13 project, Requirements and Feasibility of a System for Archiving and Disseminating Data from SHRP 2 Reliability and Related Studies, was completed by Iteris, Inc. between September 2008 and March 2010. The final report is available at http://onlinepubs.trb.org/onlinepubs/shrp2/SHRP2_S2-L13-RW-1.pdf (Tao et al. 2011). The L13 (prototype) project report set out to identify the best way of meeting the three main goals of the Reliability Archive:

• Preserving the SHRP 2 digital assets for up to 50 years;
• Providing open access to transportation practitioners; and
• Establishing a framework that can be used in other projects or for collaboration purposes.

Using those criteria, the SHRP 2 L13 research team focused on a version of an “active” archive system that could serve as a repository capable of managing files and metadata from different content sources. The aim was to preserve a diverse but related collection of digital artifacts and to make them accessible to practitioners and subsequent generations of researchers.
The L13 research team proposed that the conceptual design pattern for the archival system follow that of a digital library or museum. The research team assessed the technical, economic, and business aspects of the proposed archiving and dissemination system through interviews with key stakeholders and a literature review of available and emerging technologies that might be applicable to the Archive. Based on this foundational work, the research team developed a vision for the Reliability Archive system that contained key high-level goals. These goals provided guiding principles for the development of a conceptual design and a detailed set of requirements for the Reliability Archive.

Starting from a conceptual design based on their vision of a digital museum, the L13 authors created detailed system requirements and computed estimated life-cycle costs for three alternatives:

• An in-house File Transfer Protocol (FTP) web cluster;
• An in-house relational database; and
• A commercial cloud-based system.

Of the three alternatives, the research team found that the commercial cloud-based system exhibited the lowest initial costs, the lowest recurring costs, the highest flexibility, and the best user accessibility. Given this finding, the team recommended a cloud storage system, which uses a pay-as-you-go, web-based access model. The research team found that the in-house alternatives require significant up-front equipment purchases and installation, which may be time-consuming and subject to bureaucratic delays.

3.1.2 Findings

3.1.2.1 SHRP 2 Management Perspective

One of the primary objectives of the Reliability Archive was to allow users to find and validate the research results from relevant SHRP 2 projects and to refine and build on those results in the future. Another primary objective was to preserve research project data. In other words, there was agreement that the research conclusions need to be archived along with the data.

3.1.2.2 Project Contractor Perspective

The research team interviewed contractors of active Reliability projects and relevant capacity projects to help identify the data used and produced by these projects that would need to be archived.

3.1.2.3 Literature Research

As part of the L13 project, the research team conducted a literature review. A survey of the literature in the public domain revealed that the ability to archive digital resources, and the effectiveness of doing so, has grown considerably with the explosive growth of digital information.

The L13 report specifically discussed the Reference Model for an Open Archival Information System (OAIS), which has been adopted by the International Organization for Standardization. The OAIS model defines the major entities and functions of a digital repository. OAIS is a conceptual framework and does not prescribe any specific implementation at any level. The OAIS paradigm has three general parts:

• Data ingestion: accepting digital objects into an archive with metadata in Metadata Encoding and Transmission Standard (METS) format;
• Data archive and management: storing, managing the storage hierarchy, updating administrative data and metadata, and maintaining software and hardware; and
• Data access: locating, applying access controls, and generating responses.

3.1.2.4 Role and Importance of Metadata

The L13 report recognized METS as a suitable metadata standard for the Archive system.
METS is an Extensible Markup Language (XML) schema that provides a mechanism for recording the various relationships that occur between pieces of content, and between the content and the metadata, that make up a digital object. METS was specifically designed to act as an OAIS information package. Packaging the metadata with the digital object ensures that the object is self-documenting.

3.1.2.5 Conceptual Design for the Archival System

The research team’s observations led to the conclusion that the proposed archival system could not be thought of as a database whose structure is known up front. The team proposed that the conceptual design for the archival system follow that of a digital library.

The L13 report noted that the project teams would create initial Submission Information Packages (SIPs) for conveyance to the archival system. Planning and preparation for the eventual submission of SIPs to the Archive toward the end of each project would need to begin early in each research project. This preparation includes selecting the most preservation-friendly file formats and creating descriptive metadata. All aspects of copyright, privacy, and proprietary rights would need to be documented.

Once the necessary preaccessioning work has been performed, the six core archive functions of ingestion, data management, archival storage, access, administration, and preservation planning would be performed according to the OAIS model.

3.1.3 System Requirements Proposed by L13 Project

The research team noted that consumers are expected to be a worldwide community of transportation practitioners who would use the information directly, as well as researchers who would validate and build on the information base. In addition, the team expected consumers to interact with the archival system through a web-based portal.
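To make the METS packaging described in Section 3.1.2.4 more concrete, the following Python sketch builds a minimal METS document for one hypothetical submission: a Dublin Core title in a descriptive metadata section (dmdSec), a file inventory (fileSec), and the structural map (structMap) that ties the files to the object. It uses only the standard library’s xml.etree.ElementTree. The object identifier, title, and file paths are illustrative assumptions, and a real SIP would carry far richer descriptive, administrative, and rights metadata.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use conventional prefixes when the document is serialized.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def mets(tag: str) -> str:
    """Return a tag name qualified with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id: str, title: str,
               files: list[tuple[str, str, str]]) -> ET.Element:
    """Build a minimal METS document.

    files holds (file_id, mime_type, href) tuples for the content files.
    """
    root = ET.Element(mets("mets"), {"OBJID": object_id})

    # Descriptive metadata: one Dublin Core title wrapped inline.
    dmd = ET.SubElement(root, mets("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, mets("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, mets("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File inventory: the location of each content file.
    file_sec = ET.SubElement(root, mets("fileSec"))
    group = ET.SubElement(file_sec, mets("fileGrp"), {"USE": "original"})
    for file_id, mime_type, href in files:
        f = ET.SubElement(group, mets("file"), {"ID": file_id, "MIMETYPE": mime_type})
        ET.SubElement(f, mets("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # Structural map (the only section METS requires): links the object to its files.
    struct_map = ET.SubElement(root, mets("structMap"))
    div = ET.SubElement(struct_map, mets("div"), {"LABEL": title, "DMDID": "DMD1"})
    for file_id, _, _ in files:
        ET.SubElement(div, mets("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    package = build_mets(
        "DEMO-0001",  # hypothetical object identifier
        "Example project final report",
        [("F1", "application/pdf", "files/final_report.pdf"),
         ("F2", "text/csv", "files/travel_times.csv")],
    )
    ET.indent(package)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(package, encoding="unicode"))
```

Because the descriptive metadata and the file locations travel together in one XML document, the package describes itself, which is the property the L13 report valued in METS.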
According to the L13 report, all SHRP 2 Reliability projects would produce a range of document-centric files, such as reports and presentations, in various formats. Thus, there would be a need for a document management system. The L13 final report discouraged the use of COTS Enterprise Content Management (ECM) packages for various reasons.

The L13 report concluded that the Archive should be preserved for up to 50 years, even though the report calculated maintenance costs only until 2035.

3.1.4 User Interfaces

The L13 report determined that the user interface (UI) would be based on four general user types:

• Managers of transportation agencies;
• Technical staff of transportation agencies;
• Nontransportation professionals; and
• Researchers and analysts.

Managers of transportation agencies are interested in business processes, strategies, institutional structures, and performance measures. They need to quickly find the conclusions of each project, executive summaries, and presentations. The technical staff of transportation agencies are interested in various Reliability products, such as data sets, tools, and reports. They need to quickly find the end products of the projects, which may be organized and grouped by categories such as planning, design, and operations. Nontransportation professionals with some relationship to transportation, such as law enforcement, are interested in end products related to operational strategies, incident management, and travel time reliability improvement. They need to quickly find project conclusions, results, and operational strategies. Researchers and analysts are interested in understanding transportation, conducting studies, and developing their own methods and technologies. Their focus is the interface to individual projects.

The L13 report proposed a UI consisting of the following pages:

• Home page;
• Navigation of Reliability research projects;
• Direct project lists;
• Reliability themes;
• Data set organization, including data set name, collection method, related projects, location, format, size, derived data, and research results;
• Grouping of research products by the three general categories of planning, design, and operations; and
• Search, both simple and advanced, and navigation of project-level data and results.

3.1.5 Data Integrity and Quality

Data integrity and quality control were determined to be crucial for a successful archive. The L13 report identified three logical points of data quality control:

• Within individual Reliability projects;
• Through Reliability Project L13A (previously Reliability Project L16), which assists in preparing data for submission to the Archive; and
• Through active enforcement of the preservation policy within the archival system.
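The metadata quality screen that the L13 report recommended (a check that project metadata is complete and well formed before it is archived to the metadata repository) could start as simply as the Python sketch below. The list of mandatory elements and the rating scale are hypothetical placeholders, since the report left the detailed metadata guidelines to be developed.

```python
from dataclasses import dataclass, field

# Hypothetical mandatory metadata elements; the real list would come from
# the detailed metadata guidelines that the L13 report called for.
MANDATORY_ELEMENTS = (
    "project_id", "title", "description", "file_format", "rights", "quality_rating",
)
# Hypothetical quality-rating scale.
ALLOWED_RATINGS = {"A", "B", "C", "D"}


@dataclass
class ScreenResult:
    missing: list[str] = field(default_factory=list)
    invalid: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.missing and not self.invalid


def screen_metadata(record: dict[str, str]) -> ScreenResult:
    """Screen one project metadata record before it enters the metadata repository."""
    result = ScreenResult()
    for name in MANDATORY_ELEMENTS:
        # Absent keys and blank values both count as incomplete metadata.
        if not str(record.get(name, "")).strip():
            result.missing.append(name)
    rating = str(record.get("quality_rating", "")).strip()
    if rating and rating not in ALLOWED_RATINGS:
        result.invalid.append("quality_rating")
    return result


if __name__ == "__main__":
    submission = {
        "project_id": "DEMO-01",  # hypothetical project
        "title": "Example travel time data set",
        "description": "",
        "file_format": "CSV",
        "rights": "open",
        "quality_rating": "E",
    }
    result = screen_metadata(submission)
    print(result.passed, result.missing, result.invalid)
    # → False ['description'] ['quality_rating']
```

Returning the specific missing and invalid elements, rather than a bare pass/fail, gives the submitting project team something concrete to fix before resubmission.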
When a Reliability project is ready to deliver its data for archiving, the project team would be expected to submit the data (and metadata) along with the project’s quality control standards, methods, and assessment. The L13 report suggested that the Reliability Project L13A team would be responsible for reviewing the data quality assessment and would either confirm or modify the quality rating. The quality rating would be a metadata attribute included in the metadata prepared and collected by individual projects, known as Preservation Description Information.

The research team identified two types of quality issues with project metadata. The first is that each project would most likely use and collect different metadata elements. The second is that some metadata may be inaccurate or incomplete. Detailed metadata guidelines would need to be developed to define the mandatory metadata and the specifications for data quality. The team suggested setting up a quality control screen to assess the project metadata. Once the project metadata passes the quality screen, it would be archived to the metadata repository in the L13A Archive.

3.1.6 Data Rights

The researchers found that, generally, there had to be few or no restrictions on the data derived from the Reliability projects. The raw data typically came from the contractor’s existing data sets, state DOTs, other transportation agencies, or the private sector. The report proposed that access to the data be protected with usage stipulations.

3.1.7 Institutional Framework and Governance

The research team stressed that, as with any large archive of information, a proven and reliable institutional framework is necessary to provide long-term stewardship of the Archive.
The L13 research team documented some best practices of national systems and referred to the SHRP 2 implementation report (Committee for the Strategic Highway Research Program 2, 2009) for recommendations. One of the key recommendations of the SHRP 2 implementation report was to designate a principal implementation agent responsible for leading and supporting SHRP 2 implementation. The report also recommended establishing a similar role for the Archive. To support the principal implementation agent, the L13 research team recommended that a stakeholder advisory group provide strategic guidance and technical advice on the long-term stewardship and use of the Archive.

3.1.8 Technical Issues

The research team explored specific technical issues that were cited in the L13 Reliability Project request for proposal (RFP):

• Data normalization and denormalization;
• Online analytical processing (OLAP) and user-defined functions;
• Service-oriented architectures (SOA); and
• Virtualization.

3.1.8.1 Normalization and Denormalization

Normalization and denormalization are used to organize data either for efficient storage and relationships (normalization) or for quicker queries at the expense of duplicating data (denormalization). The research team determined that data normalization and denormalization have no application in the proposed Archive in terms of the postresearch part of the process of preparing data for preservation.

3.1.8.2 Online Analytical Processing and User-Defined Functions

The Archive’s purpose is to serve the transportation community by preserving transportation project information and facilitating lookup, presentation, and downloading of such information. Therefore, the L13 research team’s position was that it is not within the scope of the archival system to perform analysis on the stored data, or to perform other open-ended or dynamic user-defined functions on the data.

3.1.8.3 Service-Oriented Architecture

Service-oriented architecture involves web-based services through which a system exposes its functionality. The report mentioned that SOA and web services could be used to deliver mashups and could also be expected to play other roles in the Reliability Archive.

3.1.8.4 Virtualization

Virtualization uses software to abstract a hardware environment. The virtualization software runs on a host operating system, allowing one or more guest operating systems to run on the same hardware platform. This application of virtualization was expected to play a role in the deployment of the Reliability Archive, particularly in hosting the application software that manages the repository or the software that provides user access to it. For storage, virtualization is used to abstract logical storage from physical storage.
The research team found it likely that some form of storage virtualization would be used in the actual deployment of the proposed archival system.

3.1.9 Establishing Solution Alternatives

The research team mapped system requirements against potential solution building blocks and concluded that these requirements fell roughly into three blocks of functionality, connected via some kind of workflow, as shown in Figure 3.1.

Figure 3.1. Functional blocks of proposed archival system.

The L13 research team identified and discussed two critical issues that would influence the selection of potential alternatives:

• The relative importance of certain system functionality over time; and
• The estimated total data volume to be preserved in the Archive.

3.1.10 Solution Components and Implementation Approaches

In developing solutions, the L13 research team considered a wide range of potential technology choices: COTS technology, standardized versus proprietary hardware, open source software (OSS), in-house developed software, hosting, and storage and software as a service (SaaS).

The L13 research team concluded that in-house software development should be considered only as a last resort and only for limited functionality for which the need is short term. Based on the L13 report, community-supported OSS should also be considered only under similar circumstances, because it generally requires developing significant in-house expertise to implement and support it. COTS software seemed to be the most attractive option for the application and infrastructure software portion of the system, eliminating the burden and issues that arise with self-support of either in-house developed software or community-supported OSS. The research team recommended that cloud storage be considered because the cost of acquiring and managing storage is likely the single largest cost over the system’s lifetime.

The visioning and filtering process that the L13 research team went through led to the conceptual solution framework shown in Figure 3.2.

Figure 3.2. Solution framework.

Using this framework, the research team proposed a number of alternative system solutions, which are described next.

3.1.10.1 Alternative 1

This alternative is a bare-minimum solution that stores data in a file system; its implementation is straightforward, but its capabilities are very limited. Its components are listed below and shown in Figure 3.3.

1. Research teams have password-protected access to a specific directory in which they build their project file tree.
2. A web cluster consists of an FTP server for uploading data and a Hypertext Transfer Protocol (HTTP) server for providing access to the data.
3. Archival storage is provided by self-hosted network-attached storage, with 16 TB of disk per network-attached storage unit.
4. Institution staff use the Archivist Toolkit to catalog the files deposited into the storage.
5. User access to the Archive is provided through directory browsing in the fashion of Windows Explorer.

Figure 3.3. Alternative 1 concept.

The L13 report concluded that this alternative was unattractive and could be considered only as a last resort.

3.1.10.2 Alternative 2

This alternative is based on digital object repository management software designed for libraries, museums, and archives. Known content management systems are listed at http://en.wikipedia.org/wiki/List_of_content_management_systems. The components of this alternative are listed next and shown in Figure 3.4.

1. Research teams submit content into the repository via a web interface that provides all the necessary forms and enforces access restrictions.
2. The review stage involves automatic, semiautomatic, and manual workflows resulting in editing, deleting, and approving the content before its ingestion into the repository.
3. The proposed Relational Database Management System (RDBMS) is Oracle. The idea is that the runtime database holding the content can be built automatically from the METS-formatted metadata. The web, application, and database cluster consists of a number of self-hosted commodity servers.
4. Digital objects themselves are stored in self-hosted archival-class storage under a write-once, read-only policy, with object replication to ensure their security and integrity over time.
5. Researchers and practitioners access the repository through a web portal. Web publishing is automatic and driven by the repository metadata, with the look and feel customized by Extensible Stylesheet Language Transformations (XSLT) and Cascading Style Sheets (CSS). Users can navigate the repository through fixed and dynamic classification menus and paths and can perform full-text and faceted searches.

Figure 3.4. Alternative 2 concept.

3.1.10.3 Alternative 3

This alternative is almost the same as Alternative 2; the difference is that it is cloud based. Items 1, 2, and 3 are the same as in Alternative 2, and the UI is also the same, but digital objects are stored in the cloud. This alternative was the solution promoted in the L13 final report. It was justified as the minimum-cost alternative, because equipment maintenance and system administration are outsourced. The alternative is shown in Figure 3.5.

3.1.11 Life-Cycle Costs Analysis

A section of the L13 report presented the research team’s estimates of the costs of each alternative archival system, considering all the life-cycle costs that could be identified over a 25-year period. The life-cycle cost assumptions included costs associated with initial acquisition, operations, and maintenance, as well as periodic upgrades to accommodate technology advances and obsolescence.

The life-cycle costs of the three alternatives were summarized in the L13 report. Alternative 3 was the minimum-cost alternative. The report estimated the cost of Alternative 3 at $5,530,132 over 25 years. The cost of implementation was estimated at $173,425 per year, and the implementation duration was estimated to be 1½ years.

3.2 Archived Data User Services

The U.S. Department of Transportation included Archived Data User Services (ADUS) in the National Intelligent Transportation Systems (ITS) Architecture in 1999, envisioning “the unambiguous interchange and reuse of data and information throughout all functional areas” (FHWA 1998). ADUS requires that data from ITS systems be collected and archived for historical, secondary, and non–real-time uses, and that these data be made readily available to users.

This section reviews existing federal guidance on the development of ADUS systems and reviews transportation-related ADUS systems that have been developed in several states.

3.2.1 Introduction to Federal Highway Administration ADUS Guidelines

The FHWA funds and monitors many state ADUS programs. In the past 10 years, the FHWA has published a number of reports reviewing the progress of ADUS programs and summarizing their challenges across the country (U.S. DOT 2003). The 2003 report identified the major functions of ADUS systems as

• Operational data control;
• Data import and verification;
• Automatic data historical archive, to store the data permanently;
• Data warehouse distribution, to provide data to the planning, safety, operations, and research communities; and
• ITS community interface.

A complete list of ADUS programming procedures and specifications has been compiled by Iteris and is available online at http://itsarch.iteris.com/itsarch/html/user/usr71.htm.

3.2.2 FHWA ADUS Functions and Guidelines

Operational data control is extensively described in a report prepared by the Texas Transportation Institute (TTI) for the FHWA (Turner 2007). Data control, and above all the resulting data quality, is an important aspect of ADUS systems, because users will likely disregard the validity of the entire system if they encounter erroneous data points. The TTI document provides data control guidelines to ensure data quality.
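Operational data control typically begins with automated validity checks on each incoming record. The Python sketch below flags loop-detector records that fall outside plausible ranges or are internally inconsistent. The record layout and every threshold are illustrative assumptions, not values taken from the TTI guidelines; an agency would substitute its own rules.

```python
from dataclasses import dataclass

# Illustrative limits for 5-minute, single-lane loop-detector records.
# These are assumptions for the sketch, not published guideline values.
MAX_SPEED_MPH = 100.0
MAX_VOLUME_PER_5MIN = 250  # equivalent to 3,000 vehicles per hour per lane
MAX_OCCUPANCY_PCT = 100.0


@dataclass(frozen=True)
class DetectorRecord:
    """One hypothetical detector observation."""
    station: str
    volume: int           # vehicles counted in the 5-minute interval
    speed_mph: float      # average speed
    occupancy_pct: float  # percent of time the detector was occupied


def validity_flags(rec: DetectorRecord) -> list[str]:
    """Return the names of the checks a record fails; an empty list means valid."""
    flags: list[str] = []
    if not 0 <= rec.speed_mph <= MAX_SPEED_MPH:
        flags.append("speed_out_of_range")
    if not 0 <= rec.volume <= MAX_VOLUME_PER_5MIN:
        flags.append("volume_out_of_range")
    if not 0 <= rec.occupancy_pct <= MAX_OCCUPANCY_PCT:
        flags.append("occupancy_out_of_range")
    # Logical consistency: vehicles were counted, yet speed and occupancy are zero.
    if rec.volume > 0 and rec.speed_mph == 0 and rec.occupancy_pct == 0:
        flags.append("inconsistent_zero_readings")
    return flags


def split_valid(records: list[DetectorRecord]) -> tuple[list[DetectorRecord], list[DetectorRecord]]:
    """Partition records into (valid, suspect); suspect records are kept, not deleted."""
    valid: list[DetectorRecord] = []
    suspect: list[DetectorRecord] = []
    for rec in records:
        (suspect if validity_flags(rec) else valid).append(rec)
    return valid, suspect


if __name__ == "__main__":
    sample = [
        DetectorRecord("S1", 120, 62.0, 9.5),
        DetectorRecord("S2", 400, 55.0, 12.0),  # implausible volume
        DetectorRecord("S3", 30, 0.0, 0.0),     # counts with no speed or occupancy
    ]
    ok, bad = split_valid(sample)
    print(len(ok), "valid;", [(r.station, validity_flags(r)) for r in bad])
```

Flagging suspect records, rather than deleting them, keeps the raw data available for later review while preventing erroneous points from reaching end users.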
In the interest of promoting a unified approach to ADUS, the FHWA partnered with ASTM International (formerly the American Society for Testing and Materials) to devise national ADUS standards (ASTM 2011). The ASTM report focuses on the technical considerations of implementing an ADUS system, which it refers to as an Archived Data Management System (ADMS). ASTM developed 10 guiding principles, grouped on the basis of whether the focus is on (a) acquiring data, (b) managing the ADMS, or (c) retrieving data and serving information. Table 3.1 is an adaptation of these principles.

Figure 3.5. Alternative 3 concept.

An April 1998 report to the FHWA’s Office of Highway Policy Information is largely dedicated to ADUS’s “institutional issues for implementation” (Margiotta 1998). Among these institutional issues, privacy concerns, liability, and training and outreach are the most relevant to the SHRP 2 L13 project. The Margiotta report describes ways to address these issues.

Table 3.1. Guiding Principles for ADMS Development
Acquiring Data
• Get archived data from other centers.
• Integrate selected other transportation data, including roadside data collection.
Managing the ADMS (ADUS)
• Manage the archive to account for data quality.
• Provide security for the ADMS.
• Specify and maintain metadata to support the ADMS.
• Manage the interfaces of the archive data administrator.
• Interact with other archives and monitor other standards.
Retrieving Data and Information
• Process user requests for data.
• Support analysis of the archived data.
• Prepare data for government reporting systems.
Source: Adapted from ASTM (2011).

3.2.2.1 ADUS Transportation Research Board 2007 Workshop
The FHWA organized a review of the institutional issues described above at the 2007 Transportation Research Board (TRB) annual meeting. Several presentations on ADUS implementation, and the lessons learned from those implementations, are described in Bertini (2007). The workshop included a discussion of issues with the use of ADUS systems and of possible solutions. Table 3.2 provides a starting point for understanding the needs of transportation professionals by matching those needs to current practice and to equivalent solutions available from ADUS systems. The table was compiled by Margiotta and published in Margiotta (1998).

Table 3.2. Needs of ADUS Stakeholders
(For each stakeholder group, application, and method or function, the table gives the collection and use of current data and the collection and use of ITS-generated data.)

Metropolitan planning organization (MPO) and state transportation planners
• Application: Congestion management systems. Method or function: Congestion monitoring.
  Current data: Travel times collected by “floating cars,” usually only a few runs (small samples) on selected routes. Speeds and travel times synthesized with analytic methods (e.g., Highway Capacity Manual, simulation) using limited traffic data (short counts). Effect of incidents missed completely with synthetic methods and minimally covered by floating cars.
  ITS-generated data: Roadway surveillance data (e.g., loop detectors) provide continuous volume counts and speeds. Variability can be directly assessed. Probe vehicles provide the same travel times as floating cars but greatly increase sample size and areawide coverage. The effect of incidents is embedded in surveillance data, and Incident Management Systems provide details on incident conditions.
• Application: Long-range plan development. Method or function: Travel demand forecasting (TDF) models.
  Current data: Short-duration traffic counts used for model validation. Origin–destination (O-D) patterns from infrequent travel surveys used to calibrate trip distribution. Link speeds based on speed limits or functional class. Link capacities usually based on functional class.
  ITS-generated data: Roadway surveillance data provide continuous volume counts, truck percentages, and speeds. Probe vehicles can be used to estimate O-D patterns without the need for a survey. Emerging TDF models [e.g., the Transportation Analysis and Simulation System (TRANSIMS)] will require detailed network data (e.g., signal timing) that can be collected automatically via ITS. Other TDF formulations that account for variability in travel conditions can be calibrated against the continuous volume and speed data.
• Application: Corridor analysis. Method or function: Traffic simulation models.
  Current data: Short-duration traffic counts and turning movements used as model inputs. Other input data needed to run the models (e.g., signal timing) collected through special efforts. Very little performance data (e.g., incidents, speeds, delay) available for model calibration.
  ITS-generated data: Most input data can be collected automatically, and models can be directly calibrated to actual conditions.

Traffic management operators
• Application: ITS technology. Method or function: Program and technology evaluations.
  Current data: Extremely limited; special data collection efforts required.
  ITS-generated data: Data from ITS provide the ability to evaluate the effectiveness of both ITS and non-ITS programs. For example, data from an incident management system can be used to determine changes in verification, response, and clearance times due to new technologies or institutional arrangements. Freeway surveillance data can be used to evaluate the effectiveness of ramp meters or high-occupancy vehicle restrictions.
• Method or function: Predetermined control strategies.
  Current data: Short-duration traffic counts and floating car travel time runs. Usually only a limited set of predetermined control plans is developed, mostly because of the lack of data.
  ITS-generated data: Continuous roadway surveillance data make it possible to develop any number of predetermined control strategies.
• Method or function: Predictive traffic flow algorithms.
  Current data: Extremely limited.
  ITS-generated data: Analysis of historical data forms the basis of predictive algorithms that answer questions such as “What will traffic conditions be in the next 15 min?” (Bayesian approach).

Transit operators
• Application: Operations planning. Method or function: Routing and scheduling.
  Current data: Manual travel demand and ridership surveys; special studies.
  ITS-generated data: Electronic fare payment systems and automatic passenger counters allow boardings to be collected continuously. Computer-aided dispatch systems allow O-D patterns to be tracked. Automatic vehicle identification (AVI) on buses allows monitoring of schedule adherence and permits schedules to be set accurately without field review.

Air quality analysts
• Application: Conformity determinations. Method or function: Analysis with the MOBILE model.
  Current data: Areawide speed data taken from TDF models. Vehicle miles traveled (VMT) and vehicle classifications derived from short counts.
  ITS-generated data: Roadway surveillance provides actual speeds, volumes, and truck mix by time of day. Modal emission models will require these data in even greater detail, and ITS is the only practical source.

MPO/state freight and intermodal planners
• Application: Port and intermodal facilities planning. Method or function: Freight demand models.
  Current data: Data collected through rare special surveys or implied from national data (e.g., Commodity Flow Survey).
  ITS-generated data: Electronic credentialing and AVI allow tracking of truck travel patterns, sometimes including cargo. Improved tracking of congestion through roadway surveillance data leads to improved assessments of intermodal access.

Safety planners and administrators
• Application: Safety management systems. Method or function: Areawide safety monitoring; studies of highway and vehicle safety relationships.
  Current data: Exposure (typically VMT) derived from short-duration traffic and vehicle classification counts; traffic conditions under which crashes occurred must be inferred. Police investigations, the basis for most crash data sets, are performed manually.
  ITS-generated data: Roadway surveillance data provide continuous volume counts, truck percentages, and speeds, leading to improved exposure estimation and measurement of the actual traffic conditions for crash studies. ITS technologies also offer the possibility of automating field collection of crash data by police officers [e.g., Global Positioning System (GPS) for location].

Maintenance personnel
• Application: Pavement and bridge management. Method or function: Historical and forecasted loadings.
  Current data: Volumes, vehicle classifications, and vehicle weights derived from short-duration counts (limited number of continuously operating sites).
  ITS-generated data: Roadway surveillance data provide continuous volume counts, vehicle classifications, and vehicle weights, making more accurate loading data and growth forecasts available.

Commercial vehicle enforcement personnel
• Application: Enforcement of commercial vehicle regulations. Method or function: Hazardous material inspections and emergency response.
  Current data: Extremely limited.
  ITS-generated data: Electronic credentialing and AVI allow tracking of hazardous material flows, allowing better deployment of inspection and response personnel.

Emergency management services (local police, fire, and emergency medical)
• Application: Incident management. Method or function: Emergency response.
  Current data: Extremely limited.
  ITS-generated data: Electronic credentialing and AVI allow tracking of truck flows and high-incident locations, allowing better deployment of response personnel.

Transportation researchers
• Application: Model development. Method or function: Travel behavior models.
  Current data: Mostly rely on infrequent and costly surveys: stated preference and some travel diary efforts (revealed preference).
  ITS-generated data: Traveler response to system conditions can be measured through system detectors, probe vehicles, or monitoring of in-vehicle and personal device use. Travel diaries can be embedded in these technologies as well.
• Method or function: Traffic flow models.
  Current data: Detailed traffic data for model development must be collected through special efforts.
  ITS-generated data: Roadway surveillance data provide continuous volume counts, densities, truck percentages, and speeds at very small time increments. GPS-instrumented vehicles can provide second-by-second performance characteristics for microscopic model development and validation.

Private-sector users
• Application: Truck routing and dispatching. Method or function: Congestion monitoring.
  Current data: Current information on real-time or near-real-time congestion is extremely limited.
  ITS-generated data: Roadway surveillance data and probe vehicles can identify existing congestion and can be used to show historical patterns of congestion by time of day. Incident location and status can be directly relayed.
• Application: Information service providers. Method or function: Trip planning.
  Current data: Information on historical congestion patterns is extremely limited.
  ITS-generated data: This information could be used in developing pretrip route and mode choices, either alone or in combination with real-time data.
Source: Adapted from Margiotta (1998).

3.2.2.2 Summary of FHWA ADUS Guidelines
In summary, the FHWA has stressed the importance of addressing both the technical and the institutional aspects of an ADUS system. The technical considerations have been widely studied and documented as a result of partnerships with TTI, ASTM, Iteris, and others. Institutional concerns, however, are not as well understood. For this reason, the FHWA has recently sponsored workshops, seminars, and research devoted exclusively to tailoring ADUS systems to transportation planners and engineers and promoting the systems to them.

3.2.3 Existing ADUS Systems
This section presents a review of existing ADUS systems in the United States and other countries. The purpose of the literature review was to guide the development of the Archive. Because of the prolonged development and data procurement period of the L13A Archive, the current versions of the ADUS systems below may differ significantly from the descriptions given here. Nevertheless, the literature review captures the features and concepts that were considered for the SHRP 2 L13A Archive.

3.2.3.1 PeMS, California
The California Department of Transportation (Caltrans) Performance Measurement System (PeMS) was established in the early 2000s with the help of the University of California, Berkeley’s Partners for Advanced Transportation Technology (PATH). The system was set up to process 30-s loop detector data from freeways across the entire California network. When PeMS was set up, it processed 2 GB of data per day (Choe et al. 2002). The data are published in real time through a web interface and stored for historical analysis. Traffic volume, speed, and occupancy data for freeways are archived in PeMS. Travel time data for some freeways are collected through electronic toll-tag readers.
Data can be accessed by selecting the entire length of a freeway or a section of it. More recently, the state has begun adding arterial roads to PeMS. PeMS develops performance management information from fairly rudimentary raw data (detector volumes and occupancies). From these volumes and occupancies, PeMS produces travel time estimates, time-space diagrams, count curves, and other graphic tools that can be used to understand and improve freeway operations.
The combination of input data (volumes) and performance data (such as speed or VMT) enables the creation of contour and across-space plots that help locate bottlenecks. This can be done by comparing the occupancy and count curves of two nearby detectors: whenever a bottleneck forms, occupancy spikes and starts a wave of increased occupancy that moves upstream to other detectors. PeMS contains algorithms to automatically identify, classify, and report bottlenecks in the graphical user interface (GUI), as shown in Figure 3.6. Other potential uses of PeMS include level-of-service characterization, incident impact analysis, and any application that requires high-resolution speed data. Furthermore, researchers throughout California have used PeMS to calibrate simulation models and test new traffic flow theories.
The strength of PeMS lies in its ability to combine multiple data sources into an easy-to-use interface that produces useful visualizations of the data. Some of the larger data sources are
• Loop detectors;
• Census detector stations;
• Weigh-in-motion stations;
• Toll tags;
• Bluetooth sensors;
• Incident logs from the California Highway Patrol; and
• Transit schedules.
More detail on these sources can be found in Petty and Barkley (2011).
PeMS data are easily accessible. The only requirement for setting up a user account is to state why the data are needed. Users need to apply for an account only once, at http://pems.dot.ca.gov/.
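The occupancy comparison described above can be illustrated with a short Python sketch. It is not the PeMS algorithm, whose details are not given here; it only flags a candidate bottleneck between two adjacent stations when upstream occupancy stays high while downstream occupancy stays low for several consecutive intervals. The `DetectorStation` class, the `find_bottlenecks` function, and the thresholds (25% and 15% occupancy, three intervals) are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DetectorStation:
    station_id: str
    postmile: float  # assumed to increase in the direction of travel
    occupancy: List[float] = field(default_factory=list)  # fraction of time occupied, one value per interval


def find_bottlenecks(
    stations: List[DetectorStation],
    high_occ: float = 0.25,   # hypothetical "queued" threshold upstream
    low_occ: float = 0.15,    # hypothetical "free-flowing" threshold downstream
    min_intervals: int = 3,   # hypothetical persistence requirement
) -> List[Tuple[str, str, int]]:
    """Return (upstream_id, downstream_id, first_interval_index) for each activation."""
    ordered = sorted(stations, key=lambda s: s.postmile)  # upstream to downstream
    activations = []
    for up, down in zip(ordered, ordered[1:]):
        run = 0
        for t, (o_up, o_down) in enumerate(zip(up.occupancy, down.occupancy)):
            if o_up >= high_occ and o_down <= low_occ:
                run += 1
                # Report once, when the condition first reaches the persistence threshold.
                if run == min_intervals:
                    activations.append((up.station_id, down.station_id, t - min_intervals + 1))
            else:
                run = 0  # condition broken; start counting again
    return activations


stations = [
    DetectorStation("S2", 1.5, [0.08, 0.09, 0.10, 0.09, 0.08]),
    DetectorStation("S1", 1.0, [0.10, 0.30, 0.32, 0.35, 0.12]),
]
print(find_bottlenecks(stations))  # → [('S1', 'S2', 1)]
```

Requiring the pattern to persist over several intervals keeps a single noisy sample from being reported as a bottleneck; a production system would also have to handle missing samples and tune thresholds by location.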
3.2.3.2 PORTAL, Portland
The Portland State University (PSU) ITS laboratory archives Oregon Department of Transportation (ODOT) freeway inductive loop detector data systematically. The data are streamed to a server at PSU and archived in a relational database management system (RDBMS). This system is known as the Portland Transportation Archive Listing (PORTAL). It has been in operation since July 2004, streaming data from the ODOT Traffic Monitoring Operations Center to PSU (Bertini et al. 2005). PORTAL focuses mainly on freeway data. One of its design goals has been to adhere to the national ITS architecture. PORTAL includes a detailed metadata repository and maintains metaschema for all data entering the system, including information generated in the field at the controller and in the traffic management center.
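PORTAL’s metadata repository is described only at a high level. As an illustrative sketch of why such metadata matter, the Python code below keeps one metadata record per detector and checks that each incoming reading refers to a known detector and falls on that detector’s configured reporting interval. The class names, fields, and example values are hypothetical and do not reflect PORTAL’s actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DetectorMetadata:
    """Hypothetical metadata record describing one detector."""
    detector_id: str
    highway: str
    milepost: float
    lane: int
    interval_s: int  # reporting interval configured at the field controller
    source: str      # where the record originates, e.g., "field controller" or "TMC"


@dataclass
class Reading:
    detector_id: str
    timestamp_s: int
    volume: int
    occupancy: float


def validate(readings: List[Reading], registry: Dict[str, DetectorMetadata]) -> List[str]:
    """Return a list of problems; an empty list means every reading is described by metadata."""
    problems = []
    for r in readings:
        meta = registry.get(r.detector_id)
        if meta is None:
            problems.append(f"{r.detector_id}: no metadata record")
        elif r.timestamp_s % meta.interval_s != 0:
            problems.append(
                f"{r.detector_id}: timestamp {r.timestamp_s} not on the {meta.interval_s}-s grid"
            )
    return problems


registry = {"D100": DetectorMetadata("D100", "I-5", 300.2, 1, 20, "field controller")}
readings = [Reading("D100", 40, 3, 0.05), Reading("D100", 45, 2, 0.04), Reading("D999", 40, 1, 0.01)]
for problem in validate(readings, registry):
    print(problem)
```

Keying every reading to a metadata record means that a detector moved, renumbered, or reconfigured in the field shows up as a validation problem rather than as silently misattributed data.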

The PORTAL system covers the Portland-Vancouver metropolitan region. The current system (as of the time the literature review was conducted) archives a wide variety of transportation-related data, including freeway loop detector data from the Portland-Vancouver metropolitan region, weather data, incident data, transit data, and freight data. Information on available data can be obtained from the PORTAL website (http://demo.portal.its.pdx.edu/Portal/index.php/systems).
The system is very flexible and provides various user-configurable options, including the following:
• Systems. PORTAL provides a color-coded speed display of the Portland-Vancouver system. The user can choose dates and peak periods.
• Highways. This option displays volume and speed data for freeways. Users can choose any freeway within the system coverage area.
• Station. This option lets the user view the different counting stations within the coverage area. Users can choose a specific detector station to obtain speed, travel time, number of lanes, and milepost information.
• Arterial. Volume and speed information can be obtained by selecting date and time ranges. These data are available at 5-min, 15-min, 1-h, monthly, and yearly resolutions.
• Bluetooth. Travel time data are available at some selected locations. Users can select the time and date of the data and the start and end stations of the road segments.
• Transit. An interactive map displays several attributes in the PORTAL coverage area, including transit service areas, transit stops, routes, and boarding frequency.
• Downloads. Speed, volume, and occupancy data can be downloaded from within the user interface by selecting start and end dates.
• FHWA data. The data coverage includes freeway, transit, and arterial data for the I-205 corridor in Portland, Oregon. The selected corridor is approximately 10 mi long. The data set contains freeway loop detector data, weather data, incident data, arterial counts, signal phasing data, limited Bluetooth travel time data, and bus and light rail data.
• Data quality. Information on detector health is provided, including offline detectors, communication errors, damaged detectors, and configuration errors.
Figure 3.6. PeMS bottleneck identification.
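The detector health categories listed for PORTAL suggest simple screening rules. The Python sketch below assigns one label to a detector from a day of occupancy samples; the rules and the 20% missing-data threshold are hypothetical, chosen only to show how such checks can be expressed, and do not reproduce the diagnostics of PORTAL or any other system. Configuration errors generally require comparing data against metadata and are not covered here.

```python
from typing import List, Optional


def classify_detector(
    occupancy: List[Optional[float]],  # one day of samples; None marks a missing sample
    max_missing_frac: float = 0.2,     # hypothetical tolerance for dropped samples
) -> str:
    """Return a single health label for one detector; checks run in order."""
    received = [o for o in occupancy if o is not None]
    if not received:
        return "offline"  # nothing arrived all day
    if any(o < 0.0 or o > 1.0 for o in received):
        return "implausible values"  # occupancy is a fraction, so it must lie in [0, 1]
    if len(received) > 1 and len(set(received)) == 1:
        return "stuck"  # an identical value all day suggests a damaged or stuck sensor
    missing_frac = 1 - len(received) / len(occupancy)
    if missing_frac > max_missing_frac:
        return "communication errors"
    return "ok"


print(classify_detector([None, None, None]))      # offline
print(classify_detector([0.0, 0.0, 0.0, 0.0]))    # stuck
print(classify_detector([0.1, None, None, 0.2]))  # communication errors
```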

3.2.3.3 CATT Lab, Maryland
The University of Maryland Center for Advanced Transportation Technology Laboratory (CATT Lab) builds, operates, and maintains the transportation data archive for the Washington metropolitan area and other states (University of Maryland 2012). The system is called the Regional Integrated Transportation Information System (RITIS). The data include volume, speed, incidents, weather, and system delays, which are collected by various state and local transportation agencies and transmitted to the CATT Lab’s system. RITIS then parses, fuses, and loads the data into databases for analysis, redistribution, and display in near real time. CATT archives most of the data for use in other applications, including real-time simulation, travel time estimation, traffic mapping and visualization, research, and planning. The RITIS database can be accessed at https://www.ritis.org/. Users need an account to access certain data. One of the archived incident database application interfaces is shown in Figure 3.7.
Figure 3.7. Screenshot showing data selection options in RITIS.

3.2.3.4 Center for Transportation Studies, Virginia
The ADMS Virginia project is hosted at the Smart Travel Laboratory, a joint facility of the Virginia Department of Transportation and the University of Virginia. ADMS Virginia is a development effort to archive ITS data for transportation applications. The web-based system uses historical traffic, incident, and weather data to provide traffic data in a variety of formats to its users. The website (http://adms.vdot.virginia.gov/ADMSVirginia) is integrated with Google Maps to produce graphical displays of color-coded travel patterns, as shown in Figure 3.8. Users need an account to access ADMS Virginia; an account can be requested online via e-mail at the project website.

3.2.3.5 AITVS, Virginia
Virginia Polytechnic Institute and State University’s Spatial Data Management Lab has developed the Advanced Interactive Traffic Visualization System (AITVS), which provides real-time highway monitoring capabilities through comprehensive visualization components. AITVS provides a rich set of multidimensional visual components for real-time and historical traffic data analyses (Lu et al. 2006). Its six distinct visualization components together cover the various performance metrics of a road system: time plot, date plot, highway station plot, highway station versus time plot, highway stations versus day-of-the-week plot, and time versus day-of-the-week plot (Lu et al. 2006). The speed profile, volume, and occupancy plot, shown in Figure 3.9, can be obtained by selecting pairs of stations.

3.2.3.6 Houston TranStar, Texas
The Houston TranStar consortium is a partnership of four government agencies: the Texas Department of Transportation, Harris County, the Metropolitan Transit Authority of Harris County, and the City of Houston. TranStar collects real-time data covering a total of 770 directional freeway miles. Traffic data collection in TranStar relies mostly on automatic vehicle identification (AVI) information. In addition, closed-circuit television (CCTV) cameras cover 335 freeway centerline miles. TranStar has been archiving 15-min aggregated AVI travel time and speed data since October 1993. The database also holds freeway incident data dating back to May 1996, emergency road closure data from August 2001, and construction lane closure data from May 2002. Houston TranStar provides information for multiagency operations and management of the region’s transportation system, for motorists, and for traffic management operators in Houston (Houston TranStar Consortium 2010). Real-time traffic information from the database is displayed in a map interface at the TranStar website (http://traffic.houstontranstar.org), as shown in Figure 3.10.
Archived speed data from various freeway segments can be compared over different time horizons.

3.2.3.7 TDAD, Washington State
The Washington State ADUS project, named Traffic Data Acquisition and Distribution (TDAD), was set up to provide traffic data over a wide area over extended periods of time (Dailey et al. 2002). TDAD makes its historical data available online. TDAD obtains its data from loop detectors across the state, which report volume and occupancy at 20-s intervals. TDAD depends on the state’s ITS Backbone Project to obtain the data and for operational support. The Backbone Project also serves transit and traveler information programs within the Washington State DOT (WSDOT) (Dailey 2003). To access TDAD data, individuals outside WSDOT must download a toolkit, the Self-Describing Data interface and software library. Several groups, including Iteris, Wavetronix, HERE (formerly NAVTEQ), and AT&T, have developed applications to continuously download, process, and reuse the WSDOT data. Unfortunately, according to the University of Washington’s ITS website, the funding for the data feed has not been renewed; thus, the ADUS is currently unavailable. This is an example of what can happen if adequate funding for operations and maintenance is not set aside when an ADUS system is initially designed.
Figure 3.8. Screenshot from University of Virginia, Smart Travel Lab.
Figure 3.9. Sample plots of volume, speed, and occupancy from AITVS.
Figure 3.10. Houston TranStar traffic map.
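Several of the archives reviewed here receive raw detector samples at short intervals (20 s for TDAD, 30 s for PeMS and MnDOT) and publish coarser summaries (5 min, 15 min, and 1 h in PORTAL; 5, 15, and 60 min in STEWARD). A minimal Python sketch of that roll-up, assuming each raw record is a (timestamp in seconds, vehicle count, occupancy fraction) tuple, sums volumes and averages occupancy within fixed bins. The function name and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate(
    records: Iterable[Tuple[int, int, float]],  # (timestamp_s, vehicle count, occupancy fraction)
    bin_seconds: int,
) -> Dict[int, Tuple[int, float]]:
    """Roll raw samples up into fixed bins: total volume and mean occupancy per bin."""
    volume = defaultdict(int)
    occ_total = defaultdict(float)
    samples = defaultdict(int)
    for ts, count, occ in records:
        start = ts - ts % bin_seconds  # left edge of the bin containing this sample
        volume[start] += count
        occ_total[start] += occ
        samples[start] += 1
    return {b: (volume[b], occ_total[b] / samples[b]) for b in sorted(volume)}


# Ten minutes of hypothetical 20-s samples: 2 vehicles and 10% occupancy each.
raw = [(t, 2, 0.10) for t in range(0, 600, 20)]
for label, width in [("5-min", 300), ("15-min", 900), ("1-h", 3600)]:
    print(label, {b: (v, round(o, 3)) for b, (v, o) in aggregate(raw, width).items()})
```

A simple mean of occupancy is appropriate only because every raw sample covers the same duration. Missing samples would bias the summed volumes low, so a real archive would also need to record how many samples each bin contains.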

3.2.3.8 Minnesota DOT RTMC
The Minnesota Department of Transportation (MnDOT) built the original transportation management center in 1972 to manage the freeway system in the Twin Cities metropolitan area. The primary purpose of the facility is to integrate MnDOT’s Metro District Maintenance Dispatch and MnDOT’s traffic operations with the Minnesota Department of Public Safety’s State Patrol Dispatch in a unified communications center. The Regional Transportation Management Center (RTMC) now monitors 340 mi of metro-area freeway with 4,500 loop detectors and 450 CCTV cameras (Minnesota Department of Transportation 2012). The RTMC also covers 85 electronic message signs in the region. The RTMC can be accessed at http://www.dot.state.mn.us/rtmc.
MnDOT has developed interface software that transmits loop detector counts (at a minimum interval of 30 s) and other traffic data from the field to the server at the RTMC. The data are continuously archived, and more than 6 years of data are available for download. Lane-by-lane traffic data, including volume, speed, occupancy, headway, and density, are collected from the permanent loop detectors. The data are available to the public.
MnDOT designed the system to provide data through the Internet. An online relationship was established between the data production capability of the Data Center at the University of Minnesota Duluth’s Transportation Data Research Laboratory (TDRL) and the servers at MnDOT. This concept is shown in Figure 3.11 (Kwon 2004). Data can be written to, or read from, the blackboard server by the TDRL Data Center or MnDOT servers.

3.2.3.9 STEWARD Database, Florida
The Florida statewide ITS architecture contains an archived data management subsystem known as the Statewide Transportation Engineering Warehouse for Archived Regional Data (STEWARD).
STEWARD collects and stores statewide data, including daily summaries of traffic volumes, speeds, occupancies, and travel times obtained from SunGuide Transportation Management Centers (TMCs) in Florida. The summaries are accumulated over periods of 5, 15, and 60 min. STEWARD can be accessed at http://cce-trc-cdwserv.ce.ufl.edu/steward/.
Several options are available for users to screen the data they want from STEWARD. Interactive maps of all detectors in Florida DOT Districts 1 through 7 can be displayed in STEWARD. A sample of TMC coverage data selected for download is shown in Figure 3.12.
STEWARD has been designed to appeal to TMC managers, district ITS program managers, and traffic engineers. Useful functions built into STEWARD to make it appealing to managers include the following (Courage and Lee 2008):
• Identify detector malfunctions;
• Provide calibration guidance for detectors;
• Perform data quality assessment and reliability tests;
• Provide daily performance measures for the system and statewide performance measures;
• Facilitate periodic reporting requirements; and
• Provide data for research and special studies.
The existing STEWARD database contains traffic sensor subsystem data from all TMC stations over a 24-h period. STEWARD serves as a central data warehouse for SunGuide data, and its output can be used for a variety of purposes. The separate processes involved in the operation of STEWARD are shown in Figure 3.13 (Courage and Lee 2009).

3.2.3.10 The Regiolab-Delft, the Netherlands
The Regiolab Project is a collaborative project between public agencies, research institutes, and industry partners in the Netherlands. The project involves collecting real-time traffic monitoring data from all relevant roads in the region, archiving the data, and developing services and tools that make it easier for researchers to use the data for regional analysis.
The public agencies involved in the project are the municipality of Delft, the Province of Zuid-Holland, and Rijkswaterstaat. Delft University of Technology, the TRAIL Research School, and Connekt are the research partners, and the industry partners are Vialis and Siemens.
According to the project website (http://www.regiolab-delft.nl), the archived data consist mainly of 1-min data from inductive loop detectors and variable message signs on the national highways in the province of South Holland. Traffic data are collected from detectors spaced approximately every 500 m on motorways. In addition to the loop detector data, local data from traffic control systems and cameras in the municipality of Delft are also being archived. Sample camera locations are shown in Figure 3.14.
Figure 3.11. System-level concept of data automation (MnDOT). Note: The arrow lines indicate Internet data connections and the sequence of data flow.
The data archive is stored and managed using the Drupal content management system. The traffic data are available for download to registered researchers from the Regiolab website, which provides a Matlab Toolbox (written in Matlab) as well as Structured Query Language (SQL) and other database software tools for extracting data from the archive. The regional traffic data archive can be used to analyze traffic flows during the day, estimate travel times, and predict future conditions in the network. Sample charts and visualization tools available from the archive are shown in Figure 3.15.

3.2.3.11 Traffic Data Clearinghouse, Japan
The Kuwahara Laboratory at the University of Tokyo has teamed up with the Delft University of Technology to create a traffic data clearinghouse for researchers (Traffic Data Clearinghouse 2012). Currently, two key data sets are available on the project website: data from the Tokyo Metropolitan Expressway and data from the Regiolab-Delft project. The aim is to attract more partners and researchers to share their data sets and thereby improve the quantity and quality of traffic data available for traffic modeling. The website can be accessed at http://trafficdata.iis.u-tokyo.ac.jp/index.php. A map of Regiolab in the Delft region from the site is shown in Figure 3.16.

3.2.3.12 Traffic England, England
Traffic England provides live traffic information about the motorways and major all-purpose roads in England. The service is provided by the National Traffic Operations Center of the Highways Agency. Traffic data (volume, speed, and travel time) are collected from the motorways and major highways using sensors and readers (i.e., inductive loops and automatic license plate recognition cameras). The information is updated continuously.
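Traffic England’s automatic license plate recognition cameras, Houston TranStar’s AVI readers, and PORTAL’s Bluetooth sensors all derive travel times by observing the same vehicle at two points. The Python sketch below shows that matching step under simple assumptions: each reader produces (vehicle ID, timestamp) pairs, the first upstream read of an ID is used, matches longer than a hypothetical one-hour cap are discarded, and results are summarized as a median per 15-min departure interval (the aggregation level TranStar archives). Function names and parameters are illustrative, not any agency’s interface.

```python
from statistics import median
from typing import Dict, List, Tuple


def match_travel_times(
    upstream: List[Tuple[str, float]],    # (vehicle ID, read time in seconds)
    downstream: List[Tuple[str, float]],
    max_tt: float = 3600.0,               # hypothetical cap on plausible travel time
) -> List[Tuple[float, float]]:
    """Pair reads of the same ID at two readers; return (departure time, travel time)."""
    first_up: Dict[str, float] = {}
    for vid, t in sorted(upstream, key=lambda r: r[1]):
        first_up.setdefault(vid, t)  # keep the earliest upstream read of each ID
    pairs = []
    for vid, t in downstream:
        t0 = first_up.get(vid)
        if t0 is not None and 0 < t - t0 <= max_tt:  # drop unmatched reads and implausible gaps
            pairs.append((t0, t - t0))
    return pairs


def median_by_interval(pairs: List[Tuple[float, float]], bin_seconds: int = 900) -> Dict[int, float]:
    """Median travel time per departure interval (15 min by default)."""
    bins: Dict[int, List[float]] = {}
    for t0, tt in pairs:
        bins.setdefault(int(t0 // bin_seconds) * bin_seconds, []).append(tt)
    return {b: median(v) for b, v in sorted(bins.items())}


up = [("ABC123", 0.0), ("XYZ789", 60.0), ("LMN456", 1000.0)]
down = [("ABC123", 300.0), ("XYZ789", 420.0), ("QRS000", 500.0), ("LMN456", 1240.0)]
print(median_by_interval(match_travel_times(up, down)))  # → {0: 330.0, 900: 240.0}
```

A median is used because a few vehicles that stop between readers can inflate a mean. Using only the first upstream read is a simplification that would mismatch vehicles making repeat trips within the capped window.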
Traffic England updates real-time traffic information by displaying speeds and delays, roadway closures, major disruptions, incidents and congestion, adverse weather, and roadside message signs. The purpose of this service is to help the motoring public make informed decisions about their journeys. Sample real-time information from the Traffic England website (http://www.trafficengland.com/) is shown in Figure 3.17.
Figure 3.12. STEWARD Florida database.
Figure 3.13. STEWARD overview.
Figure 3.14. Map of camera locations from Regiolab’s website.
Figure 3.15. Sample chart and contour graph from Regiolab-Delft project website.
Figure 3.16. Map of Regiolab-Delft.
Figure 3.17. Traffic information map of road network, Traffic England.

3.2.3.13 Land Transport Authority, Singapore
The Land Transport Authority (LTA) of Singapore developed a system that provides real-time traffic information, including accidents, vehicle breakdowns, traffic signal status, current electronic road pricing rates, and work zones (Figure 3.18). The system can be accessed at http://interactivemap.onemotoring.com.sg/mapapp/index.html. LTA provides real-time traffic updates by displaying speeds, accidents, breakdowns, roadwork, other incidents, and traffic signals that are down. The purpose of this service is to optimize road network efficiency and improve road safety for the benefit of all road users. LTA has deployed various ITS components as part of its advanced traffic management systems. The collected traffic data are aggregated, integrated, and disseminated at the ITS Center control room for traffic monitoring and incident management.

3.3 Online Archiving Systems
The L13A team reviewed transportation-related content management systems and existing online archiving systems. This section summarizes the results of the review.

3.3.1 Archived Data Levels
To understand the context of the services other data archives provide, the L13A team looked into the five categories of information introduced by NASA’s Committee on Data Management, Archiving, and Computing (CODMAC) Archived Data Type Ontology, a well-established standard for handling archive data. Table 3.3 summarizes the archive data levels suggested by CODMAC.

3.3.2 Document Management Systems and Content Management Systems
Document management systems (DMS) and content management systems (CMS) provide much of the technological foundation for organizing, storing, controlling, and distributing data and results in a controlled environment. Both types of systems usually provide storage, version control, and distribution of electronic documents. CMSs typically provide more functionality, including publishing and editing of content. Both often include a centralized interface or portal through which all site content can be accessed.
DMS and CMS form a solid foundation for handling documents and web content. However, handling data may require additional technologies. Data sets may include millions of individual records related in multiple ways, and one user’s data needs may be vastly different from another’s. Storing data in a way that supports individual user requirements implies that the data are organized, catalogued, and stored so that each user can retrieve exactly the data he or she requires. These are database or data warehousing functions.

3.3.3 Transportation-Related Document Management Systems
The project team reviewed numerous examples of transportation-focused document management systems, such as
1. The National Transportation Library and TRB’s TRID, including the Transportation Research Information Services database (http://www.trb.org/InformationServices/InformationServices.aspx) and the Organisation for Economic Co-operation and Development’s Joint Transport Research Centre’s International Transport Research Documentation database (http://www.internationaltransportforum.org/jtrc/itrd/); and
2. The National Transit Database (http://www.ntdprogram.gov/ntdprogram/).
Figure 3.18. Traffic information map of road network in Singapore.

Table 3.3. Archived Data Levels
• Level 0. Raw data, including raw traffic data such as volumes and speeds. Example formats: raw digital data and imagery.
• Level 1. Georeferenced data, such as speed associated with a specific route and direction. Example formats: individual records, processed images.
• Level 2. Derived variables at the same resolution and location as the Level 1 source data from which the variables are derived. Example formats: individual records, processed images.
• Level 3. Variables mapped on space-time grid scales. Example formats: imagery depicting the changes of variables in time and/or space.
• Level 4. Model output or results from analyses of lower-level data (i.e., variables derived from multiple measurements). Example formats: model output files.
• Level 5. Reports and presentations using lower-level data. Example formats: abstracts, scientific papers, and presentations, typically in PDF, Word, or PPT format.

Early examples of transportation document and data archives were relatively simple websites providing access to documents and data, such as the University of California, Berkeley, Freeway Service Patrol (FSP) project data archive.
Many of the reviewed transportation archive systems focus primarily on Level 5 information: they provide information about transportation projects and access to reports and documentation, but not raw data (FSP is a notable exception, providing Level 0 data). By contrast, weather and social science archives focus more on providing raw data in a form that researchers can use.

3.3.4 Comparison of Existing Online Archives
Existing online data archives were surveyed for their relevance to the L13 project. Table 3.4 lists a variety of climate, weather, social science, and transportation-related data archives. The nontransportation data archives provide, to varying degrees, the types of services envisioned under the L13 project; the transportation archives are included for their domain relevance.
Among the archives listed in Table 3.4, the Research Data Exchange (RDE) is very similar in scope to the L13A data archive. The RDE adds real-time data distribution and some additional capabilities for managing data environments but is otherwise similar. At the time of this writing, the RDE was being developed by FHWA. Lessons learned from the RDE project were not yet available because it was in the early stages of development; however, it was known that the RDE would use a content management system such as Alfresco or Nuxeo and would include database and/or data warehousing functionality as required, depending on the characteristics of the data sets provided by the Connected Vehicle program.

Some data archives allow users to view data online using visualization tools. This is most relevant for data that can be organized geographically and overlaid on a map. Such visualization can enable rudimentary analysis and help users determine whether a data set may be of value. One large-scale example is the visualization provided through the Earth Observing System Data and Information System (EOSDIS), which can be accessed at https://earthdata.nasa.gov/.

EOSDIS is several orders of magnitude larger than the L13 project envisages, but apart from archive size and distribution rate it is remarkably similar to the L13 project in many ways. It includes collaborative information, project descriptions, data organized as individual files, and visualization of some of the data without download.

Similar visualization can be applied to traffic data, because such data are naturally organized geographically. Many transportation management systems use some kind of visualization to make traffic data easier to follow; a few, such as PeMS, maintain historical data online to permit visual analysis and trending.
3.3.4.1 Commercially Available Archiving Technologies

Technologies reviewed to help implement the L13A Archive, including content management, web services, and file distribution tools, are summarized in Table 3.5, sorted roughly in order of priority. Before starting the development phase (Phase 3), the L13A team assessed the feasibility of the listed technologies. The objective of this assessment was to identify the best archiving or content management technology that

• Would provide the core functionality of the Archive; and
• Could be customized for delivery of special features like visualization.

In Table 3.5, the appropriateness value reflects the project team's assessment of how likely each system was to be used in the Archive.

Table 3.4. Sample of Existing Online Data Archives Focused on Research
Archive | Domain | Size | Increase | Data Levels | Real-Time/Near-Real-Time? | Visualization? | Collaboration? | Search? | Notes
National Environmental Satellite, Data, and Information Service (NESDIS), http://lwf.ncdc.noaa.gov/oa/climate/climatedata.html | Climate and weather | 300 TB (digital) | 80 TB/year | 1–4 | Some data are available NRT. NRT varies from minutes to weeks, depending on the data. | Maps with configurable layers | No | Queries entire site content | Privately hosted data centers, including digital and nondigital media
Clarus System, http://www.its.dot.gov/clarus/ | Weather | 400 GB | 80 GB/year | 0–1 | Hourly files | Map interface linking to data and quality flags; no visualization | No | No |
Earth Observing System Data and Information System (EOSDIS), http://earthdata.nasa.gov | Climate | 4.8 PB | 600 TB/year | 0–4 | Many data feeds available in NRT (minutes, hours) | Varies by research team; map interfaces and layer visualization | Projects, standards, and working groups | Queries entire site content; separate facilities for searching archives | Privately hosted, distributed data archival and distribution facilities
Data.gov, http://www.data.gov/ | Public data across a wide variety of domains | 50 GB | 20 GB/year | 0–5 | None | Depends on the data set, but much of the data are viewable in a visualization tool | Yes, forums, blogs, various RSS feeds | Yes, across the entire site or subsections | Uses Socrata. Size based on current storage of roughly 250,000 data sets, each set averaging 200 KB in size. Rate of increase based on establishment in 2009.
Simple Online Data Archive for Population Studies (SodaPop), http://sodapop.pop.psu.edu | Social Sciences | >500 GB | a | 0–4 | None | None | a | Queries entire site content; separate facilities for searching archives |
UCLA Social Science Data Archive, http://www.sscnet.ucla.edu/issr/da/ | Social Sciences | >500 GB | a | 0–5 | None | None | News posting, integration with Twitter and Facebook | Search for data only | Heavily hyperlinked between multiple universities
U.S. Census Bureau, http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml | Social Sciences | >250 GB | a | 4–5 | None | Many data sets can be displayed on a map. | Feedback only | Very detailed and powerful search engines; global site search as well as detailed data search | Endeca (Oracle)–powered search
Bureau of Transportation Statistics, http://www.bts.gov/ | Transportation | a | a | 3–5 | None | Some data sets have predrawn visual summaries. | None | Global site search |
Next Generation Simulation Community, http://ngsim-community.org/ | Transportation | 70 GB | a | 0–5 | None | None | User information and forums | Global site search |
PORTAL ITS data archive, http://portal.its.pdx.edu | Transportation | >60 GB | ~10 GB/year | 0–4 | Current traffic data are real time, all available through visualization. No external feeds. | Extensive map and performance measure-based plots | News, Facebook integration | Neither global nor data search. All data are accessed through a variety of intuitive interfaces. |
National Transportation Library (NTL), http://ntl.bts.gov/ | Transportation | a | a | 5 | None | None | Interaction with librarian only | Search documents |
Caltrans Performance Measurement System (PeMS), http://pems.dot.ca.gov/ | Transportation | 11 TB | 1 TB/year | 0–4 | Real-time data are included in the archive but not distributed. | Map-based | Map-based presentation of traditional traffic measures and incidents | Global site search |
Connected Vehicle Research Data Exchange (RDE), https://www.its-rde.net/home | Transportation | 2 TB | Projected 500 GB/year | 0–4 | As available from external providers, will distribute real-time feeds | None | Forums, feedback to operators | Global site, real time, and archive data by metadata | Planning to use Alfresco or Nuxeo technologies; prototype uses Drupal. Ongoing project.
Note: a = Undetermined.
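The Data.gov size and growth figures in Table 3.4 are derived rather than measured directly. A short calculation, assuming decimal units (1 GB = 1,000,000 KB), shows how the 50-GB estimate follows from the note and what averaging period the 20-GB/year rate implies:

```python
# Values taken from the Data.gov row of Table 3.4.
DATA_SETS = 250_000      # approximate number of data sets
AVG_SIZE_KB = 200        # average size of one data set, in KB
GROWTH_GB_PER_YEAR = 20  # reported rate of increase

# Assumption: decimal units, so 1 GB = 1,000,000 KB.
KB_PER_GB = 1_000_000

total_gb = DATA_SETS * AVG_SIZE_KB / KB_PER_GB
years_of_growth = total_gb / GROWTH_GB_PER_YEAR

print(f"Estimated size: {total_gb:.0f} GB")                   # Estimated size: 50 GB
print(f"Implied years of growth: {years_of_growth:.1f}")      # Implied years of growth: 2.5
```

Because the note bases the rate on Data.gov's establishment in 2009, the 20-GB/year figure appears to be the 50-GB total averaged over roughly 2.5 years.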

Table 3.5. Data Archival Technologies
Tool | Application | Appropriateness (scale of 1 to 10, 10 being highest) | Notes
WordPress | Content management | 10 | WordPress provides a flexible environment for the developers to easily modify the UI.
Alfresco | Enterprise Content Management (ECM) | 8 | Alfresco and Nuxeo are considered affordable. Drupal is capable but smaller scale. A detailed analysis of these tools should be performed to select one.
Nuxeo | Enterprise Content Management (ECM) | 8 | See Alfresco.
DSpace | Data archive management | 8 | Capabilities of DSpace are close to those of Alfresco. Alfresco allows for content management functionality and thus flexible processing of the uploaded content, which is important for special treatment of data sets that are to be visualized.
Socrata | Service | 6 | Socrata provides a full range of capabilities but is not focused on archiving large data sets.
OpenKM | Document management | 5 | OpenKM provides document management, not content management, but could be used with additional work.
Drupal | Content management | 7 | Drupal and CKAN would have to be used together.
CKAN | Portal | 7 | See Drupal.
Cyn.in | Content management | 5 | Cyn.in would need additional work to manage metadata.
S4PA | File management | 3 | S4PA would require a web portal, version management, and other work; however, it is fast and simple.
OpenDocMan | Document management | 2 | OpenDocMan is not likely to be used in the Archive.
KnowledgeTree | Document management | 1 | KnowledgeTree is not likely to be used in the Archive.
Fedora-Commons | Data repository | 1 | Fedora-Commons is not likely to be used in the Archive.
EPrints | Electronic publishing | 1 | EPrints is not likely to be used in the Archive.
Nesstar | Data cataloging system | 6 | Nesstar is a system for data publishing and visualization. Nesstar does not have built-in collaboration. Evaluators did not identify how to integrate third-party tools.
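Table 3.5 is ordered only roughly by priority: Drupal and CKAN (7) are listed after OpenKM (5), and Nesstar (6) comes last. As an illustration, a short sketch (the `Candidate` type, the `ranked` helper, and its `minimum` cutoff are hypothetical) re-sorts the table rows strictly by appropriateness score while keeping ties in table order:

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    tool: str
    application: str
    appropriateness: int  # 1 (lowest) to 10 (highest), per Table 3.5


# Rows of Table 3.5, in the order they appear in the table.
TABLE_3_5 = [
    Candidate("WordPress", "Content management", 10),
    Candidate("Alfresco", "Enterprise Content Management (ECM)", 8),
    Candidate("Nuxeo", "Enterprise Content Management (ECM)", 8),
    Candidate("DSpace", "Data archive management", 8),
    Candidate("Socrata", "Service", 6),
    Candidate("OpenKM", "Document management", 5),
    Candidate("Drupal", "Content management", 7),
    Candidate("CKAN", "Portal", 7),
    Candidate("Cyn.in", "Content management", 5),
    Candidate("S4PA", "File management", 3),
    Candidate("OpenDocMan", "Document management", 2),
    Candidate("KnowledgeTree", "Document management", 1),
    Candidate("Fedora-Commons", "Data repository", 1),
    Candidate("EPrints", "Electronic publishing", 1),
    Candidate("Nesstar", "Data cataloging system", 6),
]


def ranked(candidates, minimum=1):
    """Return candidates scoring at least `minimum`, highest score first.

    sorted() is stable even with reverse=True, so equal scores keep
    their original table order.
    """
    eligible = [c for c in candidates if c.appropriateness >= minimum]
    return sorted(eligible, key=lambda c: c.appropriateness, reverse=True)


for c in ranked(TABLE_3_5, minimum=7):
    print(f"{c.appropriateness:>2}  {c.tool}")
```

The cutoff of 7 is arbitrary; the report does not define a threshold, and the candidates finally compared in Section 3.5.4 (WordPress, Socrata, and Alfresco) were not selected strictly by score.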
3.4 Commercially available archiving technologies

3.4.1 Overview of Applicable Supporting Technologies

The team reviewed each of the technologies in Table 3.5. Each provides some combination of content management, document management, and web portal functionality.

3.4.1.1 WordPress

WordPress is an open source blogging and content management platform based on PHP and MySQL that runs on a web hosting service. It is widely used and has a web template system that facilitates building the UI. For more information on WordPress, see Section 3.5.4, Section 7.2, and the website http://www.wordpress.org.

3.4.1.2 Alfresco

Alfresco is a free ECM system written in Java. It is distributed in two formats:

1. Alfresco Community Edition, licensed as open source under the Lesser General Public License; and
2. Alfresco Enterprise Edition, commercially licensed open source.

Alfresco's design is geared toward a high degree of modularity and scalable performance. While the system is free to obtain, an annual subscription is needed for certified patches, maintenance releases, and technical support. Separately, there were some challenges in customizing the front end. Alfresco has effective content management functionality, which allows for the flexible processing of uploaded content that is important for special treatment of data sets that are to be visualized. (See http://www.alfresco.com.)

3.4.1.3 Nuxeo

Nuxeo is a free ECM system written in Java that includes document management, social collaboration, case management, and digital asset management capabilities. Nuxeo is similar in scope, scale, and cost to Alfresco and was considered a viable alternative. As with Alfresco, there were some challenges in customizing the front end. (See http://www.nuxeo.com.)

3.4.1.4 Socrata

Socrata is a cloud-based data publication and collaboration service. Socrata is not a component used to build a service; rather, it is the service itself. Socrata includes web-based management, publication, measurement, and some visualization tools. While Socrata does include a free version, the L13A project required functionality that was only available in the paid versions, including custom metadata. Pricing plans at the time of writing put L13A beyond the most expensive tier based on the amount of storage required (Socrata's top tier offered only 2 TB). Using Socrata might still be practical but may require discussion with the service's sales staff. (See http://www.socrata.com.)

3.4.1.5 OpenKM

OpenKM is a free Java-based DMS providing a web interface for managing files. It is distributed under the GNU General Public License (GPL) v.2. OpenKM could be used to support L13A but would require additional development work beyond an Alfresco- or Nuxeo-based solution. (See www.openkm.com.)

3.4.1.6 Drupal

Drupal is an open source content management system. It provides database cataloging and storing of data sets, web front-end development, and an application programming interface (API). It is distributed under GNU GPL v.3. It is less extensive than Alfresco and Nuxeo but includes many of the features needed for L13A. It is a viable alternative, particularly if paired with a data portal such as CKAN (see below). (See http://www.drupal.org.)

3.4.1.7 CKAN

CKAN is an open source data portal system. It provides database cataloging and storing of data sets, web front-end development, and an API.
It is distributed under the GNU Affero GPL v.3. CKAN could be a viable alternative for L13A if paired with a content management system such as Drupal. (See http://www.ckan.org.)

3.4.1.8 Cyn.in

Cynapse's digital asset management solution is a module of the Cyn.in ECM offering, which enables it to leverage a number of inherent features already provided as part of the wider platform. Based on the project team's brief investigation of the promotional literature, this system lacks support for embedded metadata. However, workflow and transcoding facilities as well as desktop clients are available. Cyn.in is written in Python and Zope and uses the Plone open source framework. It is distributed under GPL v.3. (See http://www.cynapse.com.)

3.4.1.9 S4PA

The Simple, Scalable, Script-Based Science Product Archive (S4PA) is a data archive and distribution system distributed under the National Aeronautics and Space Administration (NASA) open source agreement. It includes a data acquisition module suitable for real-time ingestion and a data distribution module that provides data files to users. Data are managed in a tightly organized UNIX file structure, and both storage and distribution are file-based. The S4PA kernel includes subscription services. Data distribution and acquisition use FTP or sFTP (Secure FTP). S4PA does not include its own web-based front end or any collaboration tools. NASA uses an online visualization tool called Giovanni (http://disc.sci.gsfc.nasa.gov/giovanni/overview/index.html) to allow researchers to visualize and examine aspects of data without having to download entire data sets.

Use of S4PA would require the development of a data portal front end or integration with another tool such as CKAN, the feasibility of which was not clear. (See http://disc.sci.gsfc.nasa.gov/additional/techlab/s4pa.)

3.4.1.10 OpenDocMan

OpenDocMan is a free, open source web-based PHP DMS distributed under GPL.
It is not a CMS: it only allows users to upload files with limited metadata description; tag them; maintain revision history; classify documents by category, department, or author; and search by category, department, or author. OpenDocMan runs with PHP 5, MySQL 5, and the Apache HTTP server. The system has some simple user management. The team decided OpenDocMan did not have sufficient capabilities for L13A. (See http://www.opendocman.com.)

3.4.1.11 KnowledgeTree

KnowledgeTree provides a cloud-based service for document management and workflow. Its representational state transfer (REST) and Simple Object Access Protocol (SOAP) APIs allow integration into third-party websites. This solution did not have sufficient capabilities for L13A because it was not highly customizable and did not handle user metadata. (See http://www.knowledgetree.com.)

3.4.1.12 Fedora-Commons

Fedora defines a set of abstractions for expressing digital objects, asserting relationships among digital objects, and linking services to digital objects. The Fedora Repository Project implements the Fedora abstractions in an open source software system under the Apache license. Fedora provides a core repository service (exposed as web-based services with well-defined APIs). Fedora is not an out-of-the-box product that can be installed and run as an application. It is a repository framework that requires extensive software development before even simple examples can run. Fedora lacks a UI; a third-party tool such as DSpace would have to be integrated to provide a collaboration engine, such as user forums, and community/user group management. (See http://www.fedora-commons.org.)

3.4.1.13 EPrints

EPrints is open source software, distributed under GPL v.3 and Lesser General Public License (LGPL) v.3, for building open access repositories that provide a UI as well as a repository engine. Although EPrints allows metadata and UI customization, its focus is on publishing collections of online journals; thus, it is mostly suitable for document-type content. EPrints does not provide a collaboration engine and does not have detailed instructions about integration with third-party tools. (See http://www.eprints.org.)

3.4.1.14 Nesstar

Nesstar is a free software system designed for online publication and dissemination of data and metadata. The system also includes data analysis and visualization tools, including maps. Survey data, multidimensional tables, and text documents are all supported, and the system software allows users to search, browse, and visualize the data online.
Nesstar has limitations in UI customization: all Nesstar catalogs on the web look the same. The deployment of Nesstar requires three products: (1) Publisher, a tool for uploading the data and preparing it for publication; (2) Server, a data repository; and (3) WebView, a UI that allows searching, browsing, and visualization. Nesstar does not have built-in collaboration. Evaluators could not determine how to integrate third-party tools and therefore how to provide data upload capability for collaborating users. (See http://www.nesstar.com.)

3.4.1.15 DSpace

DSpace is open source repository software, distributed under a BSD license, for storing digital content. It manages digital files of any format. DSpace allows customization of the metadata as well as the user interface. The software is continually expanded and improved by a community of developers. Its capabilities are close to those of Alfresco, although DSpace focuses on the approval of content rather than wider workflow customization. (See http://www.dspace.org.)

3.5 Summary of the preparatory analysis

As part of the project, the team reviewed the L13 report, existing ADUS systems, and past work on data archiving systems. The major goal of the preparatory analysis was to select an appropriate core CMS engine with which to build the Archive system. This section summarizes the outcomes of the review effort and discusses the factors that led the project team to choose WordPress as the core CMS engine.

3.5.1 L13 Report Review

The L13 final report mostly described the system requirements and proposed a web-based solution using cloud-computing services and COTS software (Alternative 3). It also estimated the cost of this solution at $5,530,132 over 25 years. According to the L13 report, the cost of implementation would be $173,425 per year. The report did not provide any system design other than the high-level concept shown in Figure 3.5.
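The L13 report's 25-year total and its annual implementation cost do not reconcile by simple division. The check below uses only those two quoted figures and assumes the annual cost applies in each of the 25 years; the residual it computes is an inference (possibly one-time costs) and is not itemized in the L13 report as summarized here:

```python
# Figures quoted from the L13 final report (Alternative 3).
TOTAL_25_YEAR_COST = 5_530_132  # dollars over 25 years
STATED_ANNUAL_COST = 173_425    # dollars per year
YEARS = 25

# Average cost per year if the total were spread evenly.
average_annual = TOTAL_25_YEAR_COST / YEARS

# Part of the total not explained by 25 years at the stated annual cost
# (assumption: the annual figure applies in every year).
unexplained = TOTAL_25_YEAR_COST - STATED_ANNUAL_COST * YEARS

print(f"Average per year: ${average_annual:,.0f}")             # Average per year: $221,205
print(f"Not covered by 25 x annual cost: ${unexplained:,}")    # Not covered by 25 x annual cost: $1,194,507
```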
In addition, the L13 report did not specify any particular technology for a CMS, although it recommended COTS over open source and in-house development.

The project team generally agreed with the analysis performed by the L13 researchers except for their deemphasis on high-level data visualization and their 70-TB storage requirement. The L13A team concluded that including a high-level visualization tool would provide both a flexible way for users to view objects and a standardized way of visualizing and aggregating objects. The project team's preliminary assessment of SHRP 2 Reliability artifacts indicated that the proposed 70-TB storage requirement was excessive. Additional details on the L13 report can be found in Section 3.1.

3.5.2 Archived Data User Service Analysis

3.5.2.1 Federal ADUS Guidelines

The L13A Archive has more diverse and unstructured content than a typical ADUS archive. However, there were still lessons that could be drawn from the review of the ADUS guidelines:

• Ensuring that institutional issues like privacy concerns, liability, and confidentiality of privately collected data were taken care of in the data provided by SHRP 2 project teams; and
• Incorporating training and outreach in the project. The key to successful outreach will be to show that ADUS systems help perform common tasks faster, more easily, and more accurately.

3.5.2.2 Summary of ADUS Systems

With the exception of the Washington State TDAD database, most of the ADUS systems reviewed have been successful in engaging users even beyond the transportation community. A key element in engaging users has been the incorporation of analysis tools and map-based displays (which were included in the L13A Archive).

The L13A team noted that the University of Maryland was able to use an iterative user engagement process, as proposed for SHRP 2 L13A, in the development of its ADUS. This process helped the university develop a final product that met the needs of the target audience. The project team also learned that all the state ADUS systems included quality measures to ensure a high level of data accuracy and integrity. An overview of existing ADUS systems can be found in Section 3.2.3.

3.5.3 Online Archiving Systems Analysis

The data archives surveyed have a number of features in common that appeared successful and pertained to the L13A Archive:

1. Comprehensive site search that allows the user to query across all site content aside from data archives. This makes it easy to find information about how to use the site and to collaborate with other users.
2. Data archive search by any and all available metadata. This is one of the primary tools that users can use to identify data that may be of interest to them.
3. Data visualization to help users grasp the potential value and applicability of data sets. Many of the archives identified here lack visualization.
Although these archives serve large communities and provide much information, their lack of visualization is a barrier to use, making initial investigation of the archives more difficult. It is conceivable, though not certain, that the lack of visualization similarly obscures the data for the archives' intended users. By contrast, the EOSDIS systems integrate visualization with search functionality, which provides convenient data preview and engages the user. If practical and affordable, inclusion of some visualization is desirable.
4. Provision of system performance characteristics, so that contributors can see how their data are being used and thus quantify the benefits of sharing their data.
5. Collaboration tools with feedback mechanisms, so that researchers can provide information about their use of data sets to other researchers. Constructive criticism can yield more useful data in the future, foster additional collaboration, and encourage use of the Archive.
6. Feedback on archived artifacts. This feature is similar to the previous point but includes a notion of quality to encourage or discourage (as appropriate) use of data. Without an understanding of data quality, it is hard to determine with confidence how seriously any research based on the data should be taken.
7. Following best practices in clean and simple web design. Some of the studied data archives have been around for a long time and apply varying degrees of complexity and artistic standards to their designs.

3.5.4 Commercially Available Archiving Technologies Analysis

The project team considered the following technologies as potential candidates for implementing the L13A Archive:

1. WordPress,
2. Socrata, and
3. Alfresco.

The team ruled out Socrata after a cost analysis. The team then analyzed the WordPress and Alfresco systems by building small prototypes, testing the functionalities and features provided by each platform to check which better fit the Archive's needs.
Features that the development team looked into included the flexibility of each platform for front-end and back-end customization, complexity of content management standards, XML content modeling, required learning curve, ability to access the database directly, risk and cost, and extensibility. In the end, the team decided to use WordPress.
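Item 2 in the Section 3.5.3 list, data archive search by any and all available metadata, was identified as one of the primary discovery tools. A minimal in-memory sketch of that behavior, assuming each artifact carries a flat dictionary of metadata fields (the `ArchiveRecord` class, its field names, the `search` helper, and the sample records are all hypothetical, not the Archive's actual schema), combines exact-match filters with a free-text match:

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveRecord:
    """Hypothetical catalog entry; the field names are illustrative only."""
    title: str
    metadata: dict[str, str] = field(default_factory=dict)


def search(records, text=None, **filters):
    """Return records matching every metadata filter exactly and, if `text`
    is given, containing it (case-insensitively) in the title or any
    metadata value."""
    results = []
    for rec in records:
        # Exact-match filters on arbitrary metadata fields.
        if any(rec.metadata.get(key) != value for key, value in filters.items()):
            continue
        # Optional free-text match across the title and all metadata values.
        if text is not None:
            haystack = [rec.title, *rec.metadata.values()]
            if not any(text.lower() in s.lower() for s in haystack):
                continue
        results.append(rec)
    return results


# Hypothetical sample records.
records = [
    ArchiveRecord("Freeway travel times", {"data_level": "1", "format": "CSV"}),
    ArchiveRecord("Incident durations", {"data_level": "2", "format": "CSV"}),
    ArchiveRecord("Final project report", {"data_level": "5", "format": "PDF"}),
]

print([r.title for r in search(records, format="CSV")])
print([r.title for r in search(records, text="report")])
```

A production archive would more likely delegate this to the content management system's search index than scan records in memory; the sketch only shows how arbitrary metadata fields can serve as filters.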

TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-L13A-RW-1: Designing the Archive for SHRP 2 Reliability and Reliability-Related Data explores the development, testing, and deployment of the SHRP 2 Reliability Archive system. This archive is a repository that stores the data and information from SHRP 2 Reliability and Reliability-related projects.

This project also produced a document that outlines the high-level architecture of the SHRP 2 Archive system.
