
Currently Skimming: Pages 53-72

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 53...
... This section of the guidebook presents a modern big data management life cycle and framework. The life cycle defines the four major components of managing data throughout their entire life cycle.
From page 54...
... 54 Guidebook for Managing Data from Emerging Technologies for Transportation data set from a third-party provider. Correctly identifying the most appropriate data to acquire is one of the most vital first steps in building a practice to manage data from emerging technologies, as these data form the foundation for all future projects, tools, and analyses.
From page 55...
... Modern Big Data Management Life Cycle and Framework 55 • Collect all data as they are generated, raw, and unaggregated. Do not discard data during collection.
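As a rough illustration of collecting data raw and unaggregated, the sketch below appends every incoming message verbatim to an append-only log, adding only receipt metadata. The function name `append_raw`, the feed name, and the sample messages are hypothetical, and the in-memory buffer stands in for whatever file or object store an agency actually uses.

```python
import io
import json
from datetime import datetime, timezone


def append_raw(sink, payload: str, source: str) -> None:
    """Append one message exactly as received, plus minimal receipt metadata."""
    envelope = {
        "received_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "raw": payload,  # stored verbatim: no parsing, filtering, or aggregation
    }
    sink.write(json.dumps(envelope) + "\n")


# In-memory stand-in for a file or object-store writer (assumption).
sink = io.StringIO()

# Hypothetical feed: malformed and incomplete messages are kept, not discarded.
messages = [
    '{"veh":"17","spd":61.2}',
    "not-even-valid-json",
    '{"veh":"17","spd":null}',
]
for message in messages:
    append_raw(sink, message, source="hypothetical-probe-feed")

# Reading the log back returns every message, in arrival order.
stored = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Keeping malformed records at collection time defers any cleaning decision to the analysis stage, where it can be revisited without data loss.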
From page 56...
... • Do not collect data with only selected users in mind. – Open and share data as a whole at no more than a reasonable reproduction cost to allow authorized users to re-use, re-distribute, and intermix with other data sets.
From page 57...
... – Design and seek agreement on lineage metadata to be added to the third-party data products. – Develop and seek agreement on the quality metrics to be used to assess the data.
From page 58...
... Few transportation agencies have developed to the point that they have metadata catalogs, database diagrams, or comprehensive data quality monitoring in place. Following are best practices within the big data industry for storing and managing big data.
From page 59...
... • Folder structures, data sets, and access policies are managed to accommodate end users' needs while maintaining the security and quality of the data. While traditional data architecture allows data access and use to be controlled at the record level in terms of reading and writing data, creating temporary tables, and executing specific queries, modern data architecture, except for a few specialized solutions, only controls data at the file and folder level.
From page 60...
... • Do not adopt commercial solutions that restrict the system's scalability and responsivity and its ability to keep data open. • Follow a distributed architecture to allow data processes to be developed, used, maintained, and discarded without affecting other processes on the system.
From page 61...
... • Learn how to encrypt or obfuscate data (data masking), using techniques such as hashing and encryption, to anonymize personal information, or hire third parties to perform and maintain the encryption and take responsibility for the security of the shared data.
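The hashing approach to data masking mentioned above can be sketched as follows. This is a minimal example, not the guidebook's own method: it assumes a keyed hash (HMAC-SHA256 from the Python standard library) so that identifiers cannot be recovered by simply hashing guessed values without the key. The function name `mask_identifier`, the field names, and the sample records are hypothetical. Note that keyed hashing yields stable pseudonyms rather than full anonymization, because the remaining fields may still allow re-identification.

```python
import hashlib
import hmac
import secrets


def mask_identifier(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of a personal identifier."""
    # Normalize first so trivially different spellings map to the same pseudonym.
    normalized = value.strip().upper()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Secret key: in practice it would be stored and rotated securely (assumption).
key = secrets.token_bytes(32)

# Hypothetical records containing a license plate as the personal identifier.
records = [
    {"plate": "ABC1234", "speed_mph": 62},
    {"plate": "abc1234 ", "speed_mph": 58},
    {"plate": "XYZ9876", "speed_mph": 71},
]

# Replace the plate with its pseudonym before the data set is shared.
masked = [
    {"vehicle_id": mask_identifier(r["plate"], key), "speed_mph": r["speed_mph"]}
    for r in records
]
```

The same vehicle keeps the same pseudonym across records, so trips can still be linked for analysis while the plate itself is never shared.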
From page 62...
... Traditional data analysis relies heavily on the traditional data system architecture and its approach of shaping stored data to fit predetermined analyses. Traditional data systems are optimized for a specific data model, which converts raw data to structured data, removing the fuzziness and outliers and rigidly organizing it using predetermined relationships between each data element.
From page 63...
... Following are recommendations for analyzing and managing big data within the use component of the data management life cycle, based on best practices within the big data industry. Recommendations for Managing Data Within the Use Life-Cycle Component The following are recommendations for analyzing and managing data within the use life-cycle component.
From page 64...
... requirements and are not limited by a predetermined consensus on what resources should be available for analysis but left to business areas to determine which analytic tools best satisfy their analytic needs with their means. As such, do not dictate which tools data users should use to build their data analysis pipelines; let each data user define which tools are best suited for its analyses based on its data, resources, and knowledge.
From page 65...
... are often approximated. As such, their results are susceptible to variations and disruptions not commonly seen in the traditional data analysis approach; therefore, results are carefully reviewed and monitored.
From page 66...
... • Do not impose analytics solutions and resource limits on the analysts upon design. In traditional data systems, stability and order are often maintained by tightly restricting the type of software or languages that can be used to develop data analyses and specifying or allocating a maximum amount of resources or priority with which the data analyses can be run.
From page 67...
... from large combined data sets in ways that were impossible previously, requiring additional caution and care when preparing data sets for public use. When these modern challenges are managed effectively, however, sharing data analyses and receiving validation of their conclusions from external sources could provide valuable benefits to transportation agencies and the public users they serve.
From page 68...
... protocols such as the Open Database Connectivity (ODBC) or the Java Database Connectivity (JDBC)
From page 69...
... Overall, modern data system architecture favors an open approach to data sharing with the understanding that a few sets of eyes will not suffice to extract value and intelligence from large and complex data sets. Rather, gathering inputs and insights from many eyes, including other agency divisions, universities, and even the public, can help agencies understand and successfully derive value from the data.
From page 70...
... sharing geospatial data using a single vendor and its proprietary file format and interfaces. Agencies should instead focus on using non-proprietary file formats and APIs to share their data both internally and externally.
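One way to picture a non-proprietary format for geospatial sharing is GeoJSON, an open standard (RFC 7946) that any JSON parser can read. GeoJSON is an illustrative choice here, not one named in the excerpt. The sketch below uses only the Python standard library; the function name `to_geojson`, the station records, and their coordinates are hypothetical.

```python
import json


def to_geojson(stations):
    """Convert simple station records into a GeoJSON FeatureCollection."""
    features = []
    for s in stations:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [s["lon"], s["lat"]]},
            "properties": {"station_id": s["station_id"], "name": s["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical traffic count stations.
stations = [
    {"station_id": "S-001", "name": "Hypothetical Count Station 1",
     "lat": 38.90, "lon": -77.04},
    {"station_id": "S-002", "name": "Hypothetical Count Station 2",
     "lat": 38.95, "lon": -77.10},
]

# Plain text that can be written to a file or returned by an API.
geojson_text = json.dumps(to_geojson(stations), indent=2)
```

Because the output is plain JSON, internal and external users can consume it with whichever open-source or commercial tool they prefer, without a vendor-specific reader.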
From page 71...
... usernames and passwords were stolen from the website LinkedIn and decrypted in a matter of days before being offered for sale online.4 Given how quickly the effectiveness of encryption algorithms is changing, these algorithms need to be carefully chosen. Among the algorithms recommended by the National Institute of Standards and Technology, some (e.g., 3DES)
From page 72...
... • Identify and track external users allowed to access the data. While traditional data systems intend to control access to the data upfront by tightly controlling it, such an approach is less likely to be successful across the large and complex data sets, distributed processing, and extensive sharing of modern data systems.

This material may be derived from roughly machine-read images, and so is provided only to facilitate research.