
Currently Skimming:


Pages 73-105

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 73...
... This chapter contains a variety of tools in support of the Roadmap. They are as follows.
From page 74...
... 74 Guidebook for Managing Data from Emerging Technologies for Transportation • Data modeling and design -- Analysis, design, building, testing, and maintenance of data. • Data storage and operations -- Structured physical data assets storage deployment and management.
From page 75...
... Supporting Tools 75 In addition, the following four focus areas were included to expand the scope of data management to consider the full life cycle of big data (i.e., create, store, use, and share): • Data collection -- Acquiring new data, either directly or through partnerships, in such a way that the value, completeness, and usability of the data are maximized without compromising privacy or security.
From page 76...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Considering the data collected by the agency, how relevant are the data to the current agency needs? Low: Data collected are not relevant to current agency needs.
From page 77...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Has the agency referenced any existing data models or frameworks when designing its data architecture and processes? Low: No data models or frameworks referenced.
From page 78...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: How well are the agency's data organized? Low: Data are organized haphazardly.
From page 79...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: How much of the data collected by the agency (and that could be useful/relevant to the organization's needs) are stored?
From page 80...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Does the agency maintain a documented disaster recovery plan? Low: No disaster recovery plan.
From page 81...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: How is sensitive information/PII within the data stored? Low: Sensitive information/PII is stored in plain text.
From page 82...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Is data quality monitored? Low: Data quality is unknown.
From page 83...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Does the agency have full ownership of and unrestricted access to the data that it obtains from third parties? Low: In most cases, the third party owns the data and severely restricts access and use.
From page 84...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Are data that the agency uses in a format that allows for easy integration into new systems? Low: Most data cannot be integrated into new systems without significant effort.
From page 85...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Do stakeholders feel that the organization is getting its money's worth out of the data? Low: Few stakeholders recognize the value of the data; data are seldom used to meet real business needs.
From page 86...
... Columns: Question | Benchmarks (Low, Moderate, High) | Score. Question: Do the data need to be moved to a separate system for analysis? Low: Full migration to a separate system is necessary to perform any analysis.
From page 87...
... Columns: Question | Scoring (Low, Moderate, High) | Score. Question: Does the agency perform or oversee the development of customized data products? Low: No customized data products are developed; out-of-the-box solutions are used exclusively.
From page 88...
... Table 16. Focus area: document and content management.
From page 89...
... Footnote 6: Reference data are data that define a set of permissible values to be used by other fields. Master data represent objects and all associated information about those objects that are relevant to the organization.
From page 90...
... Footnote 7: Metadata are data about data and are found in a metadata catalog, where users or programs can locate information about the data such as how large a file is, what format that file is in, when the file was last modified, what data types are stored within each column of a table, or whether a numeric value represents hours or minutes. Table 18.
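The kinds of metadata listed in this footnote (file size, format, last-modified time, column names, and units) can be gathered into a catalog entry fairly mechanically. The following is a minimal Python sketch, not a method from the guidebook: the function name `describe_csv`, the dictionary keys, and the sample file `travel_times.csv` are all hypothetical, and units are assumed to be supplied by a person because they cannot be read from the file itself.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_csv(path, units=None):
    """Build one catalog entry (a plain dict) describing a CSV file."""
    path = Path(path)
    stat = path.stat()
    # Read only the header row to learn the column names.
    with path.open(newline="") as handle:
        header = next(csv.reader(handle), [])
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "format": path.suffix.lstrip(".").lower(),
        "size_bytes": stat.st_size,
        "last_modified": modified.isoformat(),
        "columns": header,
        # Units are documentation supplied by a person, e.g. hours vs. minutes.
        "units": dict(units or {}),
    }


# Demonstration on a throwaway file (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "travel_times.csv"
    sample.write_text("segment_id,duration\nA1,12\n")
    entry = describe_csv(sample, units={"duration": "minutes"})

print(entry)
```

Recording the unit next to the column name is what lets a later user tell whether `duration` holds hours or minutes without opening the data.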
From page 91...
... Tally columns: # of Low Scores | # of Medium Scores | # of High Scores. Columns: Question | Scoring (Low, Moderate, High) | Score. Question: How open are the data sets within the agency? Low: Data are unavailable to all but a few users (e.g., IT).
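The tally columns above (number of Low, Medium, and High scores) can be filled in automatically once each question in a focus area has been rated. A minimal Python sketch follows, assuming ratings are recorded as the strings "low", "moderate", and "high" (with "moderate" counted in the Medium column); the function name `tally_scores` and the sample ratings are hypothetical and do not come from the guidebook.

```python
from collections import Counter

# Rating labels follow the Low / Moderate / High columns of the scoring tables.
LEVELS = ("low", "moderate", "high")


def tally_scores(ratings):
    """Count how many questions in one focus area received each rating."""
    normalized = [r.strip().lower() for r in ratings]
    unknown = sorted(set(normalized) - set(LEVELS))
    if unknown:
        raise ValueError(f"unrecognized ratings: {unknown}")
    counts = Counter(normalized)
    # Always report all three levels, including those with zero questions.
    return {level: counts.get(level, 0) for level in LEVELS}


# Hypothetical ratings for the questions in a single focus area.
example_ratings = ["Low", "moderate", "low", "High", "moderate", "low"]
print(tally_scores(example_ratings))
```

Rejecting unrecognized labels rather than silently ignoring them keeps a typo in one rating from quietly shrinking the totals.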
From page 92...
... Below is a list of recommendations to consider when developing a modern data governance approach, based on The Next Generation of Data Governance by Dave Wells. Each recommendation has been assigned to one of several aspects of data governance to consider during development (Wells 2017)
From page 93...
... – Focus on policies for privacy-intensive, security-sensitive, and compliance-sensitive data. This will direct governance efforts to where they will have the most impact.
From page 94...
... Figure 14. Big Data Governance Framework (Kim and Cho 2018)
From page 95...
... expert statistician or data analyst are now performed directly by a variety of end users using visual and code-less tools requiring less technical expertise. To accommodate this move toward a distributed use of data, a distributed form of data governance has been adopted by many organizations.
From page 96...
... Data Name: Live Traffic Feed
Data Location: Z:/DataLake/LiveFeeds/Traffic_XML/
Data Description: XML data pulled from roadside sensors every 10 seconds
Data Sensitivity: No sensitive information or PII
Data Governance Roles (columns: Name of Role | Description of Role | Personnel Filling Role)
Data Owner | Exercises administrative control over the data. Concerned with risk management and determining appropriate access to data.
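A form like this one can also be kept in machine-readable form so that unfilled governance roles are easy to spot. The following Python sketch is one possible representation, not the guidebook's: the class names `CatalogEntry` and `GovernanceRole`, the field names, and the `unfilled_roles` helper are hypothetical. The values come from the example form, and the personnel field is left empty because the excerpt does not show who fills the role.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class GovernanceRole:
    name: str
    description: str
    personnel: Optional[str] = None  # not shown in the excerpt, so left empty


@dataclass
class CatalogEntry:
    data_name: str
    data_location: str
    data_description: str
    data_sensitivity: str
    roles: List[GovernanceRole] = field(default_factory=list)

    def unfilled_roles(self):
        """Return the names of roles that have no personnel assigned."""
        return [role.name for role in self.roles if not role.personnel]


entry = CatalogEntry(
    data_name="Live Traffic Feed",
    data_location="Z:/DataLake/LiveFeeds/Traffic_XML/",
    data_description="XML data pulled from roadside sensors every 10 seconds",
    data_sensitivity="No sensitive information or PII",
    roles=[
        GovernanceRole(
            name="Data Owner",
            description=(
                "Exercises administrative control over the data. Concerned with "
                "risk management and determining appropriate access to data."
            ),
        )
    ],
)

# Serialize for storage in a catalog, then list roles that still need an owner.
print(json.dumps(asdict(entry), indent=2))
print("Roles still needing personnel:", entry.unfilled_roles())
```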
From page 97...
... Table 22. Information cataloging form.
From page 98...
... Data Source | Description | Ownership | Format | Size | Cost | Security Level | Granularity | Restrictions | Update Frequency | Projects | Last Reviewed
Waze Incidents | Traffic speeds based on global positioning systems probe data | Internal | XML | 2.1 TB total | $70,000 /year | Proprietary | Predefined roadway segments | Cannot share without permission | 1 minute | Work Zones, Signal Timing | 03/12/2019
Snowplow AVL | Probe data from snowplows | Internal | REST API | 4 TB total | $4 /truck | No PII | 0.01 mile point | None | 1 minute | DOTPJ, Work Zones | 01/15/2019
CoCoRahs | Certified crowdsourced weather reports | CoCoRahs Network | XML | 380 MB total | Free | No PII | Interpolated from number of reports | None | 24 hours | SNIC, Possibly DOTPJ | 04/03/2019
Incident Reports | Individual incident reports collected from participating local agencies | Internal | CSV | 500 MB total | $15 /month | Sensitive | 1 row = 1 incident | None | Monthly batch upload | A-110, possible use in A123 | 02/22/2019
Table 23. Data source assessment example.
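An assessment table of this kind becomes easier to query once it is held as structured records. The following is a minimal Python sketch, assuming only a subset of the columns is needed; the class `DataSource`, the helper names, and the cutoff date used in the example are hypothetical, while the row values are copied from the example table.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DataSource:
    name: str
    data_format: str
    security_level: str
    restrictions: str
    last_reviewed: date


def parse_review_date(text):
    """Parse the MM/DD/YYYY dates used in the Last Reviewed column."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Rows copied from the example table (subset of columns).
SOURCES = [
    DataSource("Waze Incidents", "XML", "Proprietary",
               "Cannot share without permission", parse_review_date("03/12/2019")),
    DataSource("Snowplow AVL", "REST API", "No PII",
               "None", parse_review_date("01/15/2019")),
    DataSource("CoCoRahs", "XML", "No PII",
               "None", parse_review_date("04/03/2019")),
    DataSource("Incident Reports", "CSV", "Sensitive",
               "None", parse_review_date("02/22/2019")),
]


def reviewed_before(sources, cutoff):
    """Names of sources last reviewed before the cutoff, oldest first."""
    stale = [s for s in sources if s.last_reviewed < cutoff]
    return [s.name for s in sorted(stale, key=lambda s: s.last_reviewed)]


def restricted(sources):
    """Names of sources that carry any sharing restriction."""
    return [s.name for s in sources if s.restrictions != "None"]


# Hypothetical cutoff: flag anything not reviewed since March 1, 2019.
print(reviewed_before(SOURCES, date(2019, 3, 1)))
print(restricted(SOURCES))
```

Storing the review date as a real date rather than a string is what makes the "due for review" check a simple comparison.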
From page 99...
... • Size -- How much capacity is required to store the data? This can be represented in terms of total storage used and/or how much additional storage is required per month, depending on the nature of the data source.
From page 100...
... diverse, and fast-changing data sets. As these data sets differ greatly from traditional data sets in terms of their volume, variety, and velocity, they require new and powerful ways of dealing with the data.
From page 101...
... business unit, outliers from the same data set may be of interest to another, and only a few fields from that data set may be of interest to another. The data lake allows each business unit to use the same raw data independently of the others and shape them to the specific needs of its applications, business intelligence tools, and/or static reports.
From page 102...
... transportation agencies will need to concern themselves with until they have built a very mature set of big data management approaches. It will be more effective for agencies to focus first on using the guidance in this document to collect large amounts of data that are properly cleaned, stored, enriched, analyzed, and visualized before diving into deep learning.
From page 103...
... potential cost savings from adopting new approaches. This same team may also review available data sets or recent big data-enabled achievements from their closest peers to see if they could benefit from pursuing new data products.
From page 104...
... can provide decision-makers with more detailed, intricate, and timely outputs on which to base their decisions, which simply cannot be offered with the siloed nature of transportation agency data today.
From page 105...
... directly benefit the agency sharing the data. That said, there are situations where cost sharing may be appropriate.




This material may be derived from roughly machine-read images, and so is provided only to facilitate research.