Application of Big Data Approaches for Traffic Incident Management (2023)

Chapter 4: TIM Big Data Use Cases


This chapter presents and discusses the data pipelines that were developed to demonstrate the feasibility and value of selected TIM big data use cases:

• Use Case 1: Improving Incident Detection and Verification to Expedite Response.
• Use Case 2: Real-Time TIM Timeline and Performance Measures.
• Use Case 3: Understanding Secondary Crashes.
• Use Case 4: Exploratory Analysis of Third-Party CV Data for Crash Detection.

Each use case and its data pipeline are described in the form of a case study. Each of the case studies provides the following information:

• Overview of the use case
• Datasets leveraged
• Description of the data pipeline
• Data analysis/products
• Lessons learned and recommendations for implementation

The following list provides information on diagrams and terminology used throughout the case studies.

• Data pipeline. A data pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. In each of the case studies, the data pipeline is described using two different yet equally important processes: the pipeline technical blueprint and the pipeline workflow description.
  – Pipeline technical blueprint. The pipeline technical blueprint is shown visually with an N2 diagram, which illustrates the logical data flow through the pipeline. The blueprint also displays the functions performed and the items passed between the functions (i.e., the inputs and outputs of each of the functions). An example N2 diagram is shown in Figure 19. It is important to recognize that each step in the diagram is not necessarily equal in its time or complexity. Within each step of the pipeline, one or more intermediate functional actions may be required.

Figure 19. Example N2 diagram.

  – Pipeline workflow description. The pipeline workflow description presents a more practical example of the workflow steps, how these steps are performed, and in which systems these steps occur.
• Cloud environment. This is the larger environment in which many primary functions are performed, such as storage, scheduler, and others. This environment is responsible for storing and executing the various functions to collect, analyze, enrich, and curate the data. The cloud environment is also used as the storage area for archival and query-ready data.
• Cloud function. Cloud function is a general term encompassing the use of cloud environment tools, functions, and code designed to leverage the resources and efficiencies of the cloud without restrictions in data size, hardware availability, or processing capabilities. Cloud functions can include simple data processing, such as the Collect and Split functions in the data pipeline for Use Case 1, or they may access robust cloud data tools and storage, such as the geofence streaming database or crash document database used in the data pipeline for Use Case 2.

  – Cloud scheduler. This refers to the automated processes that continually execute at a set interval to collect data or execute queries to refresh the display interface.
  – Cache. A cache is temporary storage of data or metrics that is held in memory to be used in calculations or for analysis rather than stored into permanent tables. The use of caches enables data to be immediately ready for analysis without requiring a query of the complete dataset. For example, a cache provides an optimized approach to compare and perform deduplication of the incoming data without requiring continual queries of the entire stored dataset. Not using a cache for this purpose would introduce latency and increase the required resources.
  – Deduplicate. The Deduplicate function compares incoming events against a local cache, or temporary storage, of events to determine if the incoming event is new or part of an ongoing event. For ongoing events, the record in the cache is updated to avoid duplication, with only new events added on a per dataset and per record level.
• Internal data sources. Data from internal sources have already undergone some level of data storage, enhancement, or curation. Internal datasets require little alteration and, in many cases, may serve as the base network against which the incoming data sources are mapped or enriched.
• Incoming external data sources. Data can be ingested into the data environment from sources external to the transportation agency, known as incoming external data sources.

• Document database. Also known as a document-oriented database or NoSQL (not only Structured Query Language) document store, a document database is a non-relational database that stores data as structured documents.
• Display and dashboard environment. This environment is a cloud-hosted platform in which data, services, and display mechanisms provide visualizations of results back to the user in a more user-friendly interface or dashboard. These cloud-hosted platforms may include data, services, analytical tools, or other display mechanisms.

4.1 Use Case 1: Improving Incident Detection and Verification to Expedite Response

4.1.1 Overview

This use case sought to address the issue of timely and accurate detection of traffic incidents on a statewide basis. Traditionally, states have relied on information from their ITS devices (e.g., cameras, fixed detectors), state police partners (e.g., 911 calls, dispatch), and SSP operators in the field to detect traffic incidents. While these means are effective, particularly in urban areas, they can lack timeliness, detail, and spatial coverage as follows:

• ITS devices are typically concentrated in urban areas, with limited deployments in rural areas.
• Not all incidents generate 911 calls, and 911 calls for incidents that do generate them can be delayed.
• Incident notification/identification can lag minutes behind when incidents occur, especially outside of urban areas.
• Manual verification of incidents by traffic management staff can increase the time needed to issue a response.

An approach that leverages crowdsourced big data offers an opportunity to more quickly and accurately identify when and where incidents occur across an entire state or region. Crowdsourcing is the practice of addressing a need or problem by enlisting the services of many people via technologies. With cell phones and other technologies, data are sourced whenever and wherever people travel. These big data can assist transportation agencies by filling geographic gaps and improving information timeliness and accuracy, particularly when integrated with agency and other third-party data.

4.1.2 Datasets

The datasets used to develop the pipeline for this use case include crowdsourced data feeds from a navigation app provider and the ARNOLD road network data from Massachusetts, Minnesota, and Utah. (Chapter 3 includes a more complete description and assessment of these datasets.)

4.1.3 Description of Data Pipeline

The N2 diagram for the Use Case 1 data pipeline is shown in Figure 20 and contains the following functions and steps.

1. The Collect function—an automated function to collect or pull data—captures free navigation app data at a regular frequency, scheduled based on the data refresh rate or availability, and passes it to a Split function. The Collect function for this use case is continuously conducted every two minutes to retrieve the most recently available data.

Deduplication. Incoming navigation app alerts include only a current timestamp and do not indicate if the event associated with the alert was previously reported. The Remove Duplicates function retains a limited cache of previous alert records to compare against the incoming data. This allows ongoing traffic events to be updated, not duplicated, with only new events being added.
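The deduplication cache described in the note above can be illustrated with a minimal Python sketch. The plain dictionary, the time-to-live value, and the alert field names below are assumptions for illustration only; as noted in Step 3 of the pipeline, the actual pipeline used a bloom filter to rapidly compare large volumes of records.

```python
import time

CACHE_TTL_SECONDS = 4 * 60 * 60  # assumed retention window for ongoing events

class AlertCache:
    """Illustrative deduplication cache: alert_id -> last time the alert was seen."""

    def __init__(self, ttl=CACHE_TTL_SECONDS):
        self.ttl = ttl
        self._seen = {}  # alert_id -> {"last_seen": float, "record": dict}

    def _expire(self, now):
        stale = [k for k, v in self._seen.items() if now - v["last_seen"] > self.ttl]
        for k in stale:
            del self._seen[k]

    def deduplicate(self, alerts):
        """Return only alerts not already in the cache; refresh ongoing ones."""
        now = time.time()
        self._expire(now)
        new_alerts = []
        for alert in alerts:
            alert_id = alert["id"]  # assumed unique identifier per alert
            if alert_id in self._seen:
                # Ongoing event: update the cached record, do not emit it again.
                self._seen[alert_id]["last_seen"] = now
                self._seen[alert_id]["record"].update(alert)
            else:
                self._seen[alert_id] = {"last_seen": now, "record": dict(alert)}
                new_alerts.append(alert)
        return new_alerts
```

In this sketch, each two-minute Collect/Split cycle would pass its list of single-record alerts through deduplicate(), and only the returned new alerts would continue down the pipeline to the Snap to Road Network function.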

Figure 20. N2 diagram for Use Case 1.

2. The Split function receives the collected navigation app data and splits their contents into single records before passing them to the Remove Duplicates function. This Split function allows for incoming records to be collected in a table and then treated individually as a single record of information throughout the remainder of the pipeline workflow. This approach allows highly granular control and analysis of each navigation app traffic alert.
3. The Remove Duplicates function inspects each newly received alert against an index of the previously received alerts to determine if the record already exists. This pipeline makes use of a bloom filter, designed to rapidly compare large data for duplication, to perform this function. This comparison is valuable, as ongoing events from the navigation app data are repeated in each data push without the use of an "ongoing" tag to directly track events previously provided. As such, a process is needed to track events that have already registered, filter them out or update records to indicate that an event is ongoing, and then pass only new events that have not been snapped to the network to the Snap to Road Network function.
4. The Snap to Road Network function receives new navigation app traffic alerts and matches their locations to the ARNOLD-based route ID and milepost. It also matches their locations to the uniquely encoded geographic coordinates via a corresponding level 10 geohash value. (A geohash is a unique identifier of a specific region on the Earth. The size of the geohash polygon depends on the geohash level, which ranges from 1 to 12; level 1, measured in kilometers, is the least precise, and level 12, measured in millimeters, is the most precise.) (Khadka & Singh, 2020). This approach accelerates spatial searches and mapping. The function then passes events to the Enrich function.

5. Once snapped to the roadway, the Enrich function receives the navigation app traffic alerts and uses their ARNOLD location references to add relevant metadata (e.g., roadway information, agency ownership information). The Enrich function is a spatial analysis tool designed to identify spatial relationships between different datasets and places where the data do not share common attributes that would traditionally be used to join the data. The Enrich function then stores the output into a cloud-based archive for retention.
6. Step 6 stores the output from the Enrich function for continued analysis in the real-time data pipeline.
7. In Step 7, this second Store function sends navigation app traffic alerts to the Query function, which positions the event data into a data store in a "ready" state for query and analysis.
8. In Step 8, queries received are performed on the data, which can include a variety of query functions and processes, and the results are presented to the Display function. For example, Step 8 would include the base view of a dashboard that is displayed for a user.
9. The Display function in Step 9 is the process of visualizing the data in the dashboard or, in the case of API calls, as responses that are ready for display in the dashboard. This step in the technical workflow includes a loop in which the Display function request is sent by dashboard users back to the Query function. For example, Step 9 could be a specific query generated by the user through interaction with the dashboard.
10. The loop is completed in Step 10 when the Query function sends the request for data back to the second (real-time) Store function to perform the specific user-generated request from the dashboard in Step 9. Steps 7 and 8 are then repeated to return the results to the dashboard.

There are additional considerations for efficiency and repeatability that can be implemented in the Steps 7–10 loop for queries that are performed across various use cases. For example, there may be opportunities to create additional, commonly requested datasets (e.g., monthly, yearly, or weekly averages) that could be made available for query rather than requiring each query to recalculate these repeated requests. Custodians of similar pipelines are encouraged to review workflows across use cases and technical blueprints on a semi-regular basis to identify these common or frequent queries.

This use case focused on developing a data pipeline that automatically pushed real-time incoming navigation app data directly to the user interface/dashboard without requiring a query by the user. The data pipeline is shown in Figure 21, which contains the various components of the example system configuration used in the development of this use case.

The workflow of the data pipeline in Figure 21 occurs within a cloud environment and consists of the following steps, which provide the complete data pipeline to collect, snap, enrich, and present the query-ready data to the end user via a dashboard.

1. Trigger cloud function every minute. The cloud scheduler routine executes to retrieve data from sources. This routine is executed from within the cloud environment as part of the system and is responsible for beginning the update process.
2. Get third-party data. This data-retrieval function receives the most recent navigation app data file from the external source. This process executes the external-data retrieval requirements, which may require authentication procedures to ensure that only those with adequate licensing and rights are accessing the data. This process may require coordination with IT staff to retrieve the data in a secure format, including encrypting the data package or providing system authentication and access from the cloud environment. All other subsequent steps performed within the cloud environment should be able to be completed by the business analyst performing the analysis steps.

3. Deduplicate data. The incoming data file is reviewed by the system to identify new, updated, and expired events. This step adds updates to existing events and entirely new events to a stored cache of data. This cache functions as part of the process step to aid in the deduplication of the navigation app traffic alerts, not as permanent storage.
4. Snap and enrich data. Incoming new data undergo snapping and enrichment with internally available data (e.g., road network) to add data attributes and enrich the dataset to be ready for query from the cloud environment. The enrichment process in this step takes the newly published navigation app traffic alert and uses the location attributes to snap to available ARNOLD points (road network). The process identifies up to four of the closest potential points on the roadway where the incident may be located. These locations are then compared using a formula that scores each of the points based on how far it is from the navigation app traffic alert location and how far the selected roadway's heading is from the heading of the navigation app traffic alert. The process then selects the top matching score and assigns the corresponding ARNOLD metadata to enrich the navigation app traffic alert. (A sketch of this candidate scoring follows Figure 21.) Once calculated, the elapsed time of each alert from the previous alert is determined and pushed down the pipeline for storage or querying in the cloud environment.
5. Ready data display and query. This step calculates the elapsed time from the previous alert and sends an update to a GIS platform for real-time consumption and to cloud storage for archiving. At this step, the data have undergone the necessary changes and enrichment and can be considered in a "ready state" for delivery to the GIS platform environment to host, display, and query.
6. Host data. The navigation app traffic alert updates are directly added to a hosted feature layer for up to 12 hours within the GIS platform environment. The data stored in this environment are pushed to the GIS platform cloud, with access and permissions handled by the connection to that environment. These data are served via an API that allows systems to exchange information securely (i.e., a RESTful API), which allows the data to be queried or used in other dashboards that have a similar requirement of near-real-time data.
7. Display data. The near-real-time event data and updates are visualized in a user dashboard to present the data as a continually refreshing data source. This can be used to view events as they are reported and made available through the system. This dashboard is hosted and made available through the GIS platform to take advantage of the hosted data and to allow for the most efficient method of continually displaying real-time data as it is delivered to the GIS platform environment.

Figure 21. Data pipeline for Use Case 1.
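The candidate scoring in Step 4 is described only qualitatively in the report. The sketch below shows one way such a score could be computed; the four-candidate limit comes from the text, while the haversine distance, the relative weighting of distance versus heading difference, and the field names are assumptions for illustration rather than the project's actual formula.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def heading_diff_deg(h1, h2):
    """Smallest absolute difference between two compass headings (0-360 degrees)."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def snap_alert(alert, candidates, w_dist=1.0, w_heading=0.5):
    """
    Pick the best of up to four candidate ARNOLD points for an alert.
    alert:      {"lat", "lon", "heading"}                       (assumed field names)
    candidates: [{"lat", "lon", "heading", "route_id", "milepost"}, ...]
    Lower score is better; the weights are illustrative, not the project's values.
    """
    best, best_score = None, float("inf")
    for cand in candidates[:4]:  # up to four closest potential points
        dist = haversine_m(alert["lat"], alert["lon"], cand["lat"], cand["lon"])
        hdiff = heading_diff_deg(alert["heading"], cand["heading"])
        score = w_dist * dist + w_heading * hdiff
        if score < best_score:
            best, best_score = cand, score
    # Enrich the alert with the winning candidate's ARNOLD metadata.
    return {**alert, **{k: best[k] for k in ("route_id", "milepost")}} if best else alert
```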

4.1.4 Data Products

As part of this use case, a prototype dashboard of real-time incident detection and verification was developed with a series of capabilities designed to allow users to navigate the incident data. The dashboard includes the following features and functionalities:

• Map interface with click and zoom capabilities, including pop-ups of incident details and metadata;
• Ongoing counts of crash and non-crash events;
• Constantly refreshing list of navigation app traffic alerts, ordered by duration, that includes elapsed time of event, location, and number of thumbs-up responses;
• Filtering capabilities to sub-query third-party alerts; and
• Graph illustrating density of incidents across the road network.

The use of free navigation app data in this use case provides a consistent data source used across the country; this allows the pipeline to be expanded to include additional states where navigation app data can be made available. Screenshots of the dashboard are shown below for Massachusetts (see Figure 22), Minnesota (see Figure 23), and Utah (see Figure 24).

The dashboard automatically updates, and it was designed to provide operators tasked with monitoring the system with an easily viewable interface on which incidents are readily visible. The ability to quickly observe related metadata, such as the number of "thumbs-up" responses, provides additional context to the incidents and can assist with field verification. This is particularly valuable on roadways with fewer sensors and cameras.

Figure 22. Screenshot dashboard displaying Massachusetts.

Figure 23. Screenshot dashboard displaying Minnesota.

Figure 24. Screenshot dashboard displaying Utah.

Standardization of the data in this use case could be expanded to provide a single dashboard of the national feed of navigation app traffic alerts. This could allow partnering agencies to readily observe neighboring situations that could impact the system they monitor, particularly in the northeastern states in which interstate travel frequently occurs.

In addition to the GIS platform dashboard developed in this use case, additional analysis-specific dashboards could be deployed using the same output data. An example of this is shown in Figure 25. A dashboard was developed for MassDOT as part of the FHWA Every Day Counts Round 5 Crowdsourcing for Operations initiative (Fitzpatrick, 2021). This dashboard depicts the number, location, and rough impact of open navigation app traffic alerts. While the data pipeline was slightly dissimilar, more complicated, and more expensive to run than the pipeline for Use Case 1, this example illustrates that different tools can be used to display the same data for diverse purposes or with varying metrics, depending on the needs of the agency and users.

Figure 25. Alternative data pipeline and dashboard display.

4.1.5 Lessons Learned and Recommendations

During the development of Use Case 1, there were several lessons learned associated with the data used, interacting with the cloud environment, and the general data pipeline. This subsection details these lessons learned and any associated recommendations.

Dashboards have improved in recent years, and users now have access to more tools and resources than ever before. As new use cases are developed and implemented, regular reviews should be conducted to improve the overall use of the dashboard/data products or the underlying data pipeline. Technologies and tools are consistently being improved and upgraded to better handle and analyze data and make data available for query.

These innovations in speed, and the level of precision available from mobile phones or CVs, will continue to improve the precision of location coordinates, which could allow this and similar data pipelines to be refined.

In this use case, the end goal was to show the free navigation app data with the highest level of accuracy on the roadway network to inform real-time operations; this specific end goal drove the process. This meant that the dashboard needed to be simple so users could find relevant information, which in this case was the time and location of a crash and the number of field validations the crash received. This helped verify the incident and quickly identify what had happened and where.

Many of these datasets are asynchronous in nature. The volume of incoming data can fluctuate greatly (e.g., overnight, off peak, peak period). Cloud environments are designed to provide flexibility to grow or shrink as these fluctuations occur. In the initial stages of planning a data pipeline, it could be helpful to review the incoming datasets to identify data volume fluctuation windows and to plan for the resource allocation thresholds that may be needed.

Cost alternatives for data pipelines should be reviewed to help determine the most appropriate implementation while balancing cost, efficiency, and responsiveness. Cloud environments provide many options for how data are stored, processed, and made available based on the needs of the use case and the level of resources required. Within this use case, two different pipelines were outlined and built based on the incoming data and needs. These pipelines follow the same general methodology, but with a variation in the environments used. By redesigning the pipeline (see Figure 25) to directly push the completed records into the GIS platform, rather than having to store all the data made available in a ready-to-use state (which is resource-intensive), a significant reduction in cost was recognized.

4.2 Use Case 2: Real-Time TIM Timeline and Performance Measures

4.2.1 Overview

This use case sought to leverage a big data approach to support the collection and analysis of data to generate TIM-focused performance measurements as they occur over the timeline of a traffic incident. Several key points in an incident timeline (T0–T7) are used to track incident response activities. Some of these points, and the duration between them, represent national performance measures that are collected by many states in one way or another. The TIM timeline, shown in Figure 26, contains eight timestamps representing an incident from the time it occurs until the time any disruption in traffic flow caused by the incident ends and traffic conditions return to normal (FHWA, 2023).

Figure 26. TIM timeline. (Source: FHWA, 2023.)
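Durations between points on this timeline yield the national measures discussed next, roadway clearance time (RCT) and incident clearance time (ICT). As a minimal illustration, assuming the customary FHWA definitions (RCT measured from first recordable awareness, T1, to all lanes open, T5; ICT from T1 to last responder departure, T6), these measures could be derived from a timestamped crash record as sketched below; the field names and values are hypothetical.

```python
from datetime import datetime

# Hypothetical crash record keyed by TIM timeline point; values are ISO-8601 strings.
crash = {
    "T1": "2022-06-05T14:02:00",  # first recordable awareness
    "T4": "2022-06-05T14:18:00",  # response arrives on scene
    "T5": "2022-06-05T14:47:00",  # all lanes open
    "T6": "2022-06-05T15:05:00",  # last responder departs
}

def duration_minutes(timestamps, start_key, end_key):
    """Duration between two timeline points in minutes, or None if either is missing."""
    start, end = timestamps.get(start_key), timestamps.get(end_key)
    if start is None or end is None:
        return None
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60.0

rct = duration_minutes(crash, "T1", "T5")  # roadway clearance time (assumed T5 - T1)
ict = duration_minutes(crash, "T1", "T6")  # incident clearance time (assumed T6 - T1)
print(f"RCT: {rct:.0f} min, ICT: {ict:.0f} min")  # RCT: 45 min, ICT: 63 min
```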

TIM programs encounter data-related challenges when attempting to populate many of the key points on the TIM timeline, including those associated with the national TIM performance measures recommended by FHWA: RCT and ICT. The data-related challenges include a lack of internal DOT data for all incidents; lack of access to data that resides with other responder agencies (e.g., law enforcement); and insufficient tools to capture, integrate, and display the timelines and performance measures in a way that can provide proactive information regarding incident management.

In states where it is a more customary practice to collect data points on the TIM timeline, the data are collected from one source, often at the TMC using ATMS software. Some states have added RCT and ICT to their statewide traffic crash report form. While these data can provide a historical context of performance, they may not be able to provide actionable or timely information to improve TIM performance.

An approach that automatically collects, integrates, and stores various sources of streaming data in real time offers the opportunity to improve the timeliness and accuracy of TIM performance data for analysis by transportation agencies. Real-time data sources that can support TIM performance measurement include crowdsourced data, CAD data, and probe vehicle speed data. This use case explores the utility of several datasets to support TIM performance data analysis.

4.2.2 Datasets

The datasets and APIs used to develop the pipeline for this use case include free navigation app crowdsourced data feeds, CAD data, and ARNOLD road network data from California and Minnesota; ATMS-CAD data from Minnesota; and a third-party weather API. While a wide variety of events are captured in the navigation app traffic "alert" data, this use case focuses only on the minor crash alerts, major crash alerts, and alerts associated with traffic congestion or "jams."

The ATMS-CAD data from MnDOT was available only once every 24 hours, which is a limitation for this use case because it directly impacted the ability to perform real-time data integration. As such, only the real-time media CAD data feed from CHP was used in the pipeline to demonstrate a real-time data integration proof of concept. (Chapter 3 contains a more complete description and assessment of these datasets.)

4.2.3 Description of Data Pipeline

Testing this use case required a detailed data pipeline. The objectives of this data pipeline were a) to create a robust, historical crash dataset that could be used for TIM performance analysis and improvement, and b) to provide crash updates for real-time TIM monitoring and management. To successfully populate the TIM timestamps, the pipeline needed to handle multiple overlapping loop functions and queries to identify and match the incoming crashes at different intervals and work in concert to construct the associated timelines. The functions executed and populated data as a crash evolved, up until the crash was closed. The data pipeline for this use case needed to accurately track the crash timeline and identify where in the process each TIM timestamp occurred. Once complete, the archived enriched data can provide a complete timeline, which can then be used to generate TIM performance measures.

The N2 diagram for the data pipeline for this use case is shown in Figure 27; its functions and steps are detailed in the following list.
Steps 1, 2, 3, and 4 in the pipeline all go through the same workflow (described in the sub-bullets under Step 4) but are applied to different datasets.

1. Collect, Split, Deduplicate, and Snap: ATMS/CAD data.
2. Collect, Split, Deduplicate, and Snap: navigation app crash alert data.

Figure 27. N2 diagram for Use Case 2.

3. Collect, Split, Deduplicate, and Snap: speed data (from navigation app jams data).
4. Collect, Split, Deduplicate, and Snap: third-party weather data.
  – Collect. The Collect function obtains the data from the source database or storage mechanism.
  – Split. The Split function takes the collected data package and splits each packet into individual records or rows that can be processed and used in the pipeline.
  – Deduplicate. The Deduplicate function compares incoming events against a local cache, or temporary storage, of events to determine if the incoming event is new or part of an ongoing event. For ongoing events, the record in the cache is updated to avoid duplication, with only new events added on a per dataset and per record level.
  – Snap. The Snap function reviews the returned individual records, matches each record to the proper roadway section, and snaps the records to the respective ARNOLD reference points (route ID and milepost).
5. Store. The Store function receives the data from Steps 1–4 and then stores the data both in the in-memory cache that is used to search and match and in the permanent storage database for future use, as needed.
6. Search and Match. The Search and Match function queries the data in the in-memory cache for events that occurred within the same space/time window of the initial event timestamp and matches like records. The process is heuristic and iterative, utilizing a recursive loop to process the data to find connections and relationships as data are made available. As records are matched, the Search and Match function sends the matched event data to the Standardize function. The time required to match new data to an existing event varies depending on the event. Matching is often successful within an hour, and the process should rarely exceed 24 hours.

7. Standardize. The Standardize function receives matched event data and combines these records into a single event record in which the data and attributes are standardized. This standardization takes the events and timestamps across the three data sources and updates them to a consistent timestamp-based record. For example, T1, the "first recordable awareness" of an incident, can be identified from any of the three datasets; regardless of where it originated, it is stored consistently according to the standardization.
8. Store. This Store function receives the standardized event records, places them into a document database that is ready to be updated as each of the TIM timestamps is identified, and makes them available to the Query function for request.
9. Query. The Query function receives the standardized event data based on either a routinely executed data query or some type of user-driven query. For example, within a TMC, a dashboard may automatically run a query every minute to keep the data refreshed for operators to visualize. In addition, an operator may execute a query to obtain more information about a specific ongoing crash, including all available timestamps, to determine the next action.
10. Display. The Display function receives the standardized event data from the Query function and renders the data for visualization. This function also monitors and sends additional requests based on user selection back to the Query function.
11. Query. The Query function receives requests from the Display function (Step 10), transforms them into a query, then sends this request back to the Store function (Step 12) for processing.
12. Store. As part of the final recursive step, queries retrieve information from the data store. This is primarily intended to serve as a read-only query within this use case. However, there have been examples in practice where these queries update or change records (e.g., allowing operators to manually update timestamps based on observed values). This may occur during such steps as relating events or closing an event; quality control, where managers review the timestamps and attributes to ensure all fields were completed; or directly inserting missing timestamps that did not match properly.

The data pipeline workflow is illustrated in Figure 28, which contains the various components of the example system configuration used in the development of this use case. The focus of NCHRP Project 03-138 was on developing big data pipelines and not necessarily associated data products, which could vary based on agency needs. As such, the data pipeline for Use Case 2 developed for this project is shown on the left side of Figure 28, and potential data products, in the form of a real-time analysis/data product and a historical analysis/data product, flowing from the pipeline are shown on the right side of Figure 28.

The workflow of the data pipeline consists of the following 12 steps, as depicted in Figure 28.

1. The cloud scheduler executes a trigger event every minute, which kicks off Steps 2, 3, and 4 to collect navigation app jams/speed alerts, CAD events, and navigation app crash alerts, respectively.
2. The cloud function collects and identifies new, updated, and expired navigation app jam alerts.
3. The cloud function collects and identifies new, updated, and expired CAD events.
4. The cloud function collects and identifies new, updated, and expired navigation app crash alerts.
5. The cloud function collects weather events that correspond to the crash locations.
6. As navigation app crash alerts are received, both new events and updates to ongoing events are saved into a crash document database. These records are stored for further refinement and enhancement in Steps 7, 8, and 9.

Data Collection and Preparation. Steps 2, 3, 4, and 5 in the data pipeline workflow are executed individually and vary based on the different data sources. The data from each of the sources undergo transformations to conflate them to the ARNOLD road network. These steps include Collect, Split, Snap, Deduplicate, and other data changes described in the N2 diagram (see Figure 27).

7. In parallel with Step 6, the received navigation app jam alerts, CAD events, and navigation app crash alerts are sent to a roaming geofence streaming database. The purpose of this database is not storage; instead, this database provides the environment for each event to be continually reviewed and to "search and match" events that occur within proximity of each other and may be related.
8. This cloud function continually executes a roaming geofence trigger as data are shared with the database. This function involves comparing the data in the cache to the incoming navigation app crash alerts. This comparison determines if a relationship exists and updates the appropriate timestamps of the existing crash stored in the crash document database, or it adds the record as a new crash if no existing crash is found. (A sketch of this space/time matching follows Figure 28.)
  – In either scenario, these data are subject to change or adjustment as the previous steps complete additional enhancement on the crash records. The data need to be deduplicated, and the different data sources need to be constantly evaluated to pull the timestamps associated with a crash. During the development of this use case, the team observed that different data sources do not always overlap for an event. For example, a traffic incident may have ended (i.e., no more navigation app alert updates) while the associated navigation app jam alerts are still being received by the system. This requires that related events be reviewed beyond the reporting of the navigation app crash alerts to determine the final timestamp (i.e., return to normal traffic flow in T7).

Crash Record Distinction. Use Case 2 follows a crash throughout the life of the event, so an individual crash record cannot be considered "completed" or "final" at any stage in the process because those records may be subject to update and enhancement as part of the process. These records may be presented to the user of the dashboard at any stage in the refinement process, and various timestamps may change as the data are confirmed or made available.

Figure 28. Data pipeline and potential data products for Use Case 2.
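Steps 7 and 8 hinge on matching events that fall within the same space/time window. A minimal sketch of that idea follows; the coarse grid-cell key (a simple stand-in for the level 10 geohash used elsewhere in the project), the 500 m and 30-minute window sizes, and the field names are illustrative assumptions rather than the project's actual geofence logic.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed matching window; the real pipeline's geofence size and time window may differ.
TIME_WINDOW = timedelta(minutes=30)
CELL_SIZE_DEG = 0.005  # roughly 500 m grid cells, standing in for a geohash prefix

def cell_key(lat, lon):
    """Bucket a coordinate into a coarse grid cell used as a spatial index key."""
    return (round(lat / CELL_SIZE_DEG), round(lon / CELL_SIZE_DEG))

class EventMatcher:
    def __init__(self):
        # cache: cell key -> list of recent events (e.g., CAD events and jam alerts)
        self.cache = defaultdict(list)

    def add(self, event):
        self.cache[cell_key(event["lat"], event["lon"])].append(event)

    def match(self, crash_alert):
        """Return cached events in the same or neighboring cells within the time window."""
        t = datetime.fromisoformat(crash_alert["time"])
        ck_lat, ck_lon = cell_key(crash_alert["lat"], crash_alert["lon"])
        matches = []
        for dlat in (-1, 0, 1):            # neighboring cells catch events near cell edges
            for dlon in (-1, 0, 1):
                for event in self.cache.get((ck_lat + dlat, ck_lon + dlon), []):
                    if abs(datetime.fromisoformat(event["time"]) - t) <= TIME_WINDOW:
                        matches.append(event)
        return matches
```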

9. As crash data are updated by the processes in Steps 7 and 8, that information is continuously pushed to a cloud function, which sends updates to the cloud database for long-term flat storage for historical analysis and makes the data ready for further display and analysis.
  – At this stage in the pipeline, the data are ready for presentation to the end user for either historical or real-time analysis. The type of storage varies based on the intended use of the data. Long-term storage happens in the cloud in a near-raw format for retention. A historical database saves the records for analysis for a defined period (e.g., one year). Finally, a real-time dataset is maintained for 24 hours to maximize query speed and efficiency for operations management needs.
10. As data are processed, they could be automatically saved to a crash database. As described previously, this environment may have different terms of storage or data curation than real-time analysis or long-term cloud storage. For historical analysis, data would remain available for some period (e.g., one year), and analysis would typically be driven by user interaction (either direct or programmatic) via queries. This historical database would be designed to remain in a ready state to process queries faster than is possible with long-term cloud storage. For context, queries into the historical data would be more resource-intensive, yet they would remain capable of returning results in seconds, as opposed to the minutes required to query records in long-term storage. This type of storage is more optimal for direct interaction and analysis.
11. Real-time analysis and support: As data are made available to the cloud function, new and updated records could be automatically pushed to an updated database for rapid and efficient use. Analysis and display tools may include real-time direct queries against these data.
12. The available data could be refreshed and pushed for use or display (e.g., alerts, dashboards) via tools, such as online GIS platforms, that support data visualization in a spatial environment or dashboard.

4.2.4 Data Products

The data pipeline for this use case creates a new, integrated dataset that continually expands as new crashes are added. Figure 29 is an example of a partial crash document created at the end of the data pipeline. (These crash documents make up the growing document database.) As previously described, the crash document contains information from multiple data sources that has been integrated and standardized. The crash document contains the crash document ID, type of crash, location (latitude/longitude), detailed weather data, and whichever timestamps could be populated across the data sources. This continuously growing dataset could be used for many purposes and to create a variety of different outputs (e.g., a real-time display of the TIM timeline for ongoing crashes, monthly or quarterly TIM performance reports). The growing historical dataset could also position a TMC to leverage advanced algorithms, or even machine learning, to detect patterns that could provide additional insights.

Tables 16, 17, and 18 show breakdowns of the crashes (from crash data) that were successfully matched with navigation app alerts and CAD events (as described in Step 6 of the N2 diagram in Figure 27 and in Steps 7–8 of the data pipeline workflow diagram in Figure 28), as well as the timestamps that were identified within the data sources.
The data were generated between June 1 and June 13, 2022, when the pipeline was stable. During this period, there were 14,381 crashes identified in the CHP media CAD feed. Table 16 shows that 526 crashes were successfully matched with a navigation app crash alert (T0) and with a navigation app jam alert (T7). Table 17 shows that 291 crashes were successfully matched with both a navigation app crash alert and a CHP CAD event; this table also shows a breakdown of the timestamps that were successfully identified in the CAD data (T1–T6).

Figure 29. Crash document created at the end of the data pipeline (Minnesota).

Table 16. Crashes matched with both navigation app crash and jam alerts.

TIM Timestamps Identified    Count
T0, T7                       526
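As an illustration of the structure described above, a partial crash document might look like the following; the field names and values are hypothetical and are not taken from the Minnesota example shown in Figure 29.

```python
# Hypothetical partial crash document (field names and values are illustrative assumptions).
crash_document = {
    "crash_document_id": "mn-2022-000123",
    "crash_type": "major_crash",
    "location": {
        "latitude": 44.9537,
        "longitude": -93.0900,
        "route_id": "I-94",          # ARNOLD route reference added by the Snap/Enrich steps
        "milepost": 241.3,
    },
    "weather": {
        "condition": "rain",
        "temperature_f": 58,
        "visibility_mi": 4.0,
    },
    "timestamps": {                   # populated as each TIM timeline point is identified
        "T0": "2022-06-05T13:58:00",  # from a navigation app crash alert
        "T1": "2022-06-05T14:02:00",  # from a CAD event
        "T3": "2022-06-05T14:06:00",
        "T4": "2022-06-05T14:18:00",
        "T7": "2022-06-05T15:40:00",  # from navigation app jam alerts clearing
    },
    "sources": ["nav_app_alert", "chp_cad", "nav_app_jam", "weather_api"],
}
```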

Finally, Table 18 shows that only 20 crashes were successfully matched to all three data sources—navigation app crash alert (T0), navigation app jam alert (T7), and CHP CAD (T1–T6)—and shows a breakdown of the timestamps that were identified in the data (T0–T4, T7).

These three tables illustrate that while there were challenges in matching data across the data sources, and in identifying the timestamps within the data sources, the process/pipeline did work and was successful for some crashes. Challenges in matching more of the CAD crashes to the third-party data stem from several issues. First, the sample size of the navigation app data is still relatively small due to the crowdsourced nature of the data; as such, only some of the CAD crashes could be linked to the navigation app crash alerts. Second, there is not always a navigation app jam alert associated with a navigation app crash alert or a CAD event, as not all crashes result in disruptions to traffic flow. Third, the CAD data did not always contain the TIM timestamps; in fact, none of the matches included T5 (RCT). Finally, there were challenges in processing the data. At one point, the pipeline was unstable to the point of crashing due to memory errors. Nonetheless, the pipeline—when stable—was able to integrate the CAD and navigation app data in real time and identify some of the TIM timestamps within both data sources.

Table 17. Crashes matched with both navigation app crash alert and CAD.

TIM Timestamps Identified       Count
T0                              60
T0, T1                          54
T0, T1, T2                      5
T0, T1, T2, T3                  15
T0, T1, T3                      18
T0, T1, T2, T3, T4              30
T0, T1, T3, T4                  27
T0, T1, T2, T4                  6
T0, T1, T4                      17
T0, T1, T2, T3, T4, T6          39
T0, T1, T2, T6                  4
T0, T1, T2, T4, T6              9
T0, T1, T6                      2
T0, T1, T2, T3, T6              5
Total                           291

Table 18. Crashes matched with navigation app crash and jam alerts and CAD.

TIM Timestamps Identified       Count
T0, T7                          3
T0, T1, T2, T3, T4, T7          5
T0, T1, T3, T7                  4
T0, T1, T2, T3, T7              1
T0, T1, T3, T4, T7              3
T0, T1, T7                      4
Total                           20
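Breakdowns like Tables 16 through 18 can be produced directly from the crash documents by counting which timeline points each document contains. The following is a small sketch, assuming the hypothetical document structure shown earlier; it is not the project's reporting code.

```python
from collections import Counter

TIM_ORDER = ["T0", "T1", "T2", "T3", "T4", "T5", "T6", "T7"]

def timestamp_combination(doc):
    """Return a label such as 'T0, T1, T3' listing the timeline points present in a crash document."""
    present = [t for t in TIM_ORDER if doc.get("timestamps", {}).get(t)]
    return ", ".join(present)

def breakdown(crash_documents):
    """Count crash documents by the combination of TIM timestamps identified."""
    counts = Counter(timestamp_combination(doc) for doc in crash_documents)
    return counts, sum(counts.values())

# Example: in practice, docs would be read from the crash document database.
docs = [
    {"timestamps": {"T0": "2022-06-05T13:58:00", "T7": "2022-06-05T15:40:00"}},
    {"timestamps": {"T0": "2022-06-06T08:10:00", "T1": "2022-06-06T08:13:00"}},
]
counts, total = breakdown(docs)
for combo, n in counts.items():
    print(f"{combo}: {n}")
print(f"Total: {total}")
```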

4.2.5 Lessons Learned

During the development of Use Case 2, there were several lessons learned associated with the data used, interacting with the cloud environment, and the general data pipeline. These lessons learned, and any associated recommendations, are detailed in the following list.

• The need to perform geofencing of disparate datasets can be a significant cost factor in the cloud. Spatially aligning data is required to join data sources that do not share a common data element (e.g., unique identifier). The data needed for conducting spatial searches need to be evaluated to confirm that the spatial accuracy is sufficient for the analysis. Answering the following questions can assist in determining the relevance of the data and the size of the geofence that may be required:
  – What is the spatial accuracy of the data (i.e., recorded geolocation versus actual geolocation of the event)?
    ◾ The lower the spatial accuracy, the larger the geofence or geohash that will be required, which can increase errors as well as cloud costs.
  – For the datasets being compared, what is the temporal accuracy (i.e., when were CAD events and navigation app crash alerts reported compared to each other and compared to the actual time of occurrence)?
    ◾ The lower the temporal accuracy, the larger the timeframe (and the more data) needed for analysis, which can increase errors as well as cloud costs.
  – Can the geofence be refined?
    ◾ Understanding how to design the geofence or geohash area can significantly reduce the time and resources needed for the analysis. A main obstacle encountered in this use case was the ability to rapidly search and match each event with corresponding data. The search area should be kept as small as possible to limit the number of potential matches that must be compared. A more focused search is recommended over the use of a wider bounding box to minimize the resources needed.
• As previously noted, some of the timestamps were not available until the very end of the crash; therefore, they could not be reported until after the crash had been closed. Defining the specific methodology that will be used to determine each timestamp can maximize the chances that a crash is not inaccurately reported.

4.3 Use Case 3: Understanding Secondary Crashes

4.3.1 Overview

Use Case 3 for TIM big data sought to better understand why, when, and where secondary crashes occur. Historically, secondary-crash analysis has been limited because of challenges in identifying secondary crashes among crash data. To identify secondary crashes, researchers have used a variety of approaches, including queuing theory, speed contour maps, shockwave theory, and spatial and temporal analysis. The primary drawback to these approaches is that secondary crashes can be underestimated when they do not occur in queues or because of shockwaves, or they can be overestimated when time or space buffers are too large. While these approaches provide a starting point for identifying secondary crashes, in general, there has been a lack of ground truth data to validate their accuracy. Historically, only a few states have collected data on secondary crashes to support analyses and validation.

In 2017, the fifth edition of the MMUCC included a new secondary-crash data element and attribute (NHTSA, 2017). Soon after, in 2017–2018, the FHWA's Every Day Counts Round 4 (EDC-4) included an innovation on "Using Data to Improve TIM" (FHWA, 2019a).
As a result of that effort, several participating states added the MMUCC secondary-crash data element to their statewide traffic crash report forms.

The addition of a data element to capture secondary crashes provides an opportunity for analysis that previously could not be conducted. As such, the approach for this use case was to combine secondary-crash data from multiple states; enrich the data with traffic, roadway, and weather data; and apply big data techniques to uncover relationships and trends that had not been systematically identified with previous approaches.

4.3.2 Datasets

As shown in Table 19, the datasets available to develop the pipeline for this use case included state crash report data, the ARNOLD road network data from 10 states—Arizona, Florida, Illinois, Maine, Nevada, Ohio, Tennessee, Utah, Wisconsin, and Wyoming—and a third-party weather API. The team had access to just under 52,000 crashes flagged as "secondary" on the crash reports from these states (see Figure 30). For a more complete description and assessment of these datasets, refer to Chapter 3.

Table 19. Data used for Use Case 3.

State        Statewide Crash Data    Crashes Provided    Crashes Marked as Secondary    Percentage Secondary    Roadway Data (ARNOLD)    Weather API
Arizona      Jan 2018–Dec 2020       824,867             16,093                         1.95%                   √                        √
Florida      Nov 2017–Dec 2019       653,140             1,264                          0.19%                   √                        √
Illinois     Jan 2019–Dec 2020       560,053             18,110                         3.23%                   √                        √
Maine        Sept 2018–Dec 2020      63,896              63                             0.10%                   √                        √
Nevada       Sept 2018–Dec 2020      162,688             3,030                          1.86%                   √                        √
Ohio         Jun 2018–Sept 2020      765,415             2,158                          0.28%                   √                        √
Tennessee    Nov 2014–Dec 2020       1,647,315           7,425                          0.45%                   √                        √
Utah         Jan 2017–Dec 2019       189,524             182                            0.10%                   √                        √
Wisconsin    Jan 2017–Dec 2020       539,445             3,062                          0.57%                   √                        √
Wyoming      Jan 2017–Dec 2020       54,701              469                            0.86%                   √                        √
Total                                5,461,044           51,856                         0.95%*
*Percentage of all crashes in the provided state crash data that were secondary crashes.

4.3.3 Description of Data Pipeline

The N2 diagram for the Use Case 3 data pipeline is shown in Figure 31 and contains the following functions and steps:

1. The Collect function captures the crash data (from statewide crash databases) and passes them to the Store function.
2. The Collect function captures weather data from the weather API and passes them to the Store function.
3. The Collect function captures roadway data from ARNOLD and passes them to the Store function.
4. The Store function receives the referenced data from Steps 1, 2, and 3 and stores each dataset. Once stored, these data are used by the Search and Match function as requested.
5. The Search and Match function queries for crashes that occurred within the same space and time window and matches records that are alike. The process is heuristic and iterative, occurring in a loop that continually processes the data to identify relationships between the data as they are made available. The Search and Match function then sends the matched crashes to the Standardize function.

Figure 30. Crashes flagged as secondary for 10 states.

Figure 31. N2 diagram for Use Case 3.

6. The Standardize function receives the matched crashes to provide consistency, then passes the standardized data to the Merge function.
7. The Merge function combines the matched crashes into single event records and then passes them to the Clean function.
8. The Clean function receives the merged data and removes records containing erroneous and missing data, then passes them on to the Bin function.
9. The Bin function reviews the features of each cleaned data record and creates a custom series of buckets or bins, then assigns each cleaned data record to a bin/bucket to reduce the effect of minor observation errors. It then passes the binned data record to the Cluster Analysis function.
10. The Cluster Analysis function receives the binned crash records and performs a segmentation analysis to group them by similarities, so that records in the same group are more like each other than those in other groups. Once the analysis is performed, the Cluster Analysis function labels each record with a cluster group number/name and outputs the grouped crash data.

Figure 32 shows the data pipeline and the various components of the system configuration used in this use case.

Figure 32. Data pipeline for Use Case 3 (state crash data, roadway/ARNOLD data, and third-party weather data feeding cloud storage, data enrichment and processing, and data analysis).

The workflow of this data pipeline differs from those of the other use cases presented in this document. Because the historical crash data used in this use case vary by year and across states, only a limited number of steps in the pipeline could be fully automated. Instead, some of the steps were more manual in nature (e.g., data download, data review and standardization across states). However, in the big data approach, it is not uncommon to spend most of a project’s resources preparing data for subsequent analyses.

The merged secondary-crash database contained a total of 51,856 crashes flagged as secondary crashes. These secondary-crash data underwent a step-by-step process designed to review the location coordinates to verify that they could be matched to the ARNOLD roadway data, to review the presence/absence of other crash attributes (necessary for subsequent analyses), and to determine whether they were in proximity to primary crashes (to verify the secondary-crash designation in the data). The following bullets outline the steps taken and the findings associated with each step:
• Through an analysis of the attributes associated with these secondary crashes, the team found that two-thirds contained “none” or “unknown” attributes, including spatial/location attributes such as latitude or longitude.
• The team verified the latitudes and longitudes for 50,392 (97.1 percent) of the secondary crashes. (The remaining 2.9 percent of secondary crashes were deemed to have erroneous spatial attributes.)
• The team employed a spatial-temporal analysis to identify at least one primary crash candidate for each of the secondary crashes. Based on a sensitivity analysis of various times and distances

from the secondary crashes, the team identified crashes that occurred within two hours (prior to) and within 2 kilometers (in either direction) of the secondary crashes (a minimal code sketch of this matching step appears after Figure 33). A primary crash candidate could not be identified for 31.2 percent of the secondary crashes using these spatial/temporal criteria. Therefore, these crashes were removed from subsequent steps because they could not be easily verified as secondary crashes, reducing the number of secondary crashes to 34,693.
• The team checked the timestamps of the remaining secondary crashes to verify 1) that a timestamp was present and 2) that the timestamps were properly formatted and without error. Ten percent of the secondary crashes were removed in this step due to missing or incorrect timestamps, leaving a total of 31,105 crashes.
• The team compared the route IDs of the secondary crashes and the identified primary crash candidates to verify that each pair occurred on the same route. Only 14,195 of the identified primary crash candidates occurred on the same route as the secondary crashes. While it is possible for a secondary crash to occur on a different road (e.g., cross street) or facility (e.g., on-ramp), most secondary crashes occur on the same route as, and proximate to, the primary crash.
• The dataset was then further reduced to remove any secondary crashes with a timestamp earlier than the primary crash, which removed 2.86 percent of crashes, leaving 13,788.

Unlike the crash data received from the other states, the crash data from Wisconsin already had the primary–secondary crash relationship identified within the data as an attribute of both the primary and secondary crash (i.e., by linking the two with an identifier). As such, the team added these 1,660 secondary crashes from Wisconsin to the dataset, bringing the dataset of verified secondary crashes to 15,448 across the 10 states. These crashes are shown in Figure 33.

Source: © 2021 Mapbox; © OpenStreetMap.
Figure 33. Locations of verified secondary crashes.
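As referenced above, the following is a minimal sketch of the spatial-temporal candidate search described in this subsection; it is not the project's actual pipeline code. The column names (crash_id, timestamp, lat, lon), the use of pandas, and the haversine distance formula are assumptions for illustration. The route-ID comparison and timestamp-order check would follow as separate steps, as outlined above.

```python
import math
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # Earth's mean radius, in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def primary_candidates(secondary: pd.Series, crashes: pd.DataFrame) -> pd.DataFrame:
    """Return candidate primary crashes within 2 hours before and 2 km of a crash
    flagged as secondary (assumed columns: crash_id, timestamp, lat, lon)."""
    window = crashes[
        (crashes["crash_id"] != secondary["crash_id"])
        & (crashes["timestamp"] >= secondary["timestamp"] - pd.Timedelta(hours=2))
        & (crashes["timestamp"] <= secondary["timestamp"])
    ].copy()
    window["dist_m"] = window.apply(
        lambda r: haversine_m(secondary["lat"], secondary["lon"], r["lat"], r["lon"]),
        axis=1,
    )
    return window[window["dist_m"] <= 2_000]  # within 2 kilometers in either direction
```

In practice, prefiltering candidates by time window before computing pairwise distances keeps the comparison workload tractable on statewide datasets.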

4.3.4 Data Products

Unlike the other use cases in this report, Use Case 3 tested methodological approaches for data analyses rather than producing a singular dashboard product or actionable outcome. This use case includes the steps taken to enrich secondary-crash data, to verify secondary crashes by identifying associated primary crashes, and to analyze the verified secondary crashes. This dataset of verified secondary crashes can be used to produce a variety of visual dashboards or serve as an input dataset for further research and analysis of secondary crashes and their relationships with other available data.

The available crash data from the 10 states in this use case provided a dataset that includes differences based on population, geographic location, regional representation, roadway characteristics, and weather impacts. As noted previously, the initial analysis steps revolved around molding the data from different states into a consistent data layer, with attributes that would allow crashes from different states to be analyzed together. The identification and review of similar fields were conducted manually. Through this manual process, some fields were concatenated (i.e., linked together in a series), updated, or delimited to produce a common data schema (data recoding) that could be merged into a single output containing reported secondary crashes. After merging this dataset, the full set contained nearly 52,000 crashes that were marked as “secondary.” These crash records then underwent analysis to verify them as secondary crashes, to understand the effects of weather, and to evaluate a variety of crash factors. As described previously, after verifying the nature of each secondary crash and its relationship to, and distance from, the primary crash, the number of remaining secondary crashes was reduced to less than 16,000.

4.3.4.1 Cluster Analysis for Crash Factors

The team performed a cluster analysis on the reduced/verified dataset to identify similar secondary crashes based on the available crash attributes in the data, including those from the crash reports, ARNOLD roadway data, and third-party weather data. The cluster analysis was based on a k-prototypes clustering algorithm, a hybrid clustering algorithm that can process both categorical and numerical data. To run this cluster analysis, the team had to fill in all missing numeric data with “-1” and all missing categorical data with “unknown” (a minimal code sketch of this step appears after the discussion of the general trends below).

The results of the cluster analysis for each data element/attribute used in the analysis show two general trends. First, for data attributes with many “unknown” values, these unknown values influenced the formation of the clusters. An example is shown in Figure 34, which presents the cluster analysis results for weather conditions. The figure illustrates that, while each of the three clusters contains secondary crashes for all weather attributes except wind, Cluster 1 contains most of the secondary crashes with unknown values for weather. Cluster 1 also contains more of the secondary crashes that occurred in snowy and foggy conditions. Cluster 3 contains most of the secondary crashes that occurred in clear (both day and night) conditions and partly cloudy (day) conditions. Meanwhile, Cluster 2 contains more of the secondary crashes that occurred in cloudy and rainy conditions.
The second general trend observed from the results of the cluster analysis is that, for some of the data elements used in the analysis, the results show secondary crashes in all three clusters across all attributes. In addition, the distribution of the secondary crashes across the data attributes within each cluster generally follows the overall distribution of the data. An example of this is shown in Figure 35. This figure shows the cluster analysis results across the injury severity categories (KABCO). Each of the clusters contains secondary crashes for all severity categories, and these distributions generally follow that of the overall (unclustered) data.
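As referenced above, the following is a minimal sketch of the preprocessing and k-prototypes clustering step described in this subsection. It assumes the open-source kmodes Python package, a hypothetical input file, and hypothetical column names; the actual feature set, cluster count, and tuning used by the team are not reproduced here.

```python
import pandas as pd
from kmodes.kprototypes import KPrototypes  # assumes the open-source kmodes package

# Hypothetical input: the verified secondary-crash table with mixed attribute types
# (file name and column names are assumptions for illustration).
df = pd.read_csv("verified_secondary_crashes.csv")

numeric_cols = ["aadt", "speed_limit_mph"]
categorical_cols = ["weather", "light_condition", "functional_class", "manner_of_collision"]

# Fill missing values as described above: -1 for numeric, "unknown" for categorical.
df[numeric_cols] = df[numeric_cols].fillna(-1)
df[categorical_cols] = df[categorical_cols].fillna("unknown")

X = df[numeric_cols + categorical_cols].to_numpy()
cat_idx = list(range(len(numeric_cols), len(numeric_cols) + len(categorical_cols)))

# k-prototypes treats numeric features k-means-style and categorical features k-modes-style.
kproto = KPrototypes(n_clusters=3, init="Cao", random_state=42)
df["cluster"] = kproto.fit_predict(X, categorical=cat_idx)

# Tabulate an attribute across the resulting clusters (cf. the weather and KABCO figures).
print(pd.crosstab(df["cluster"], df["weather"]))
```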

Figure 34. Cluster results for weather conditions (number of secondary crashes by cluster for clear day, clear night, partly cloudy day, partly cloudy night, wind, cloudy, rain, snow, fog, and null conditions).

Figure 35. Cluster results for injury severity (number of secondary crashes by cluster for K - fatal injury, A - suspected serious injury, B - suspected minor injury, C - possible injury, O - no apparent injury, and null).

Examination of the cluster analysis results across all data elements/attributes did not show any prominent clustering of the secondary crashes that would lead to the classification of secondary-crash types. However, the team made the following observations for the three clusters.
• Cluster 1: “Unknown but Unique”—Cluster 1 is primarily characterized by secondary crashes with unknown data attributes. However, this cluster also contains slightly more secondary crashes that occurred on a Monday; in February; and in snowy, icy, and foggy conditions (as well as most of the secondary crashes with unknown weather conditions). Cluster 1

contains a higher proportion of secondary crashes that occurred on lower-classification roadways (i.e., other principal arterials, minor arterials, major collectors). Cluster 1 contains a slightly lower proportion of secondary crashes that occurred in November and on Fridays, the month and day of the week with a plurality of the secondary crashes overall.
• Cluster 2: “Mixed, Wet”—Cluster 2 is a mix between the characteristics of Clusters 1 and 3. This cluster contains some of the secondary crashes with unknown values (e.g., most of the secondary crashes with unknown urban/rural location) and contains many of the most typical types of secondary crashes seen in the data (e.g., occurring on interstate highways). Cluster 2 is set apart from Clusters 1 and 3 in that it contains proportionally more of the secondary crashes that occurred in cloudy, wet, rainy, and dark/not-lighted conditions. Cluster 2 also contains a slightly higher proportion of secondary crashes that occurred on Sundays.
• Cluster 3: “Expected”—Cluster 3 primarily comprises the most typical secondary crashes present in the data, following the expected distributions observed in the findings from the descriptive statistics. For example, Cluster 3 includes crashes that occurred in urban areas, on major highways, and in clear/partly cloudy and dry conditions. Secondary crashes in Cluster 3 contain few unknown data attributes. Cluster 3 contains a slightly higher proportion of the secondary crashes that occurred in March.

4.3.4.2 Descriptive Statistics

There was a large discrepancy between the “original” secondary-crash dataset (as received from the states) and the “verified” secondary-crash dataset (resulting from the spatial-temporal analysis via the data pipeline). Only about 30 percent of the crashes identified as secondary by law enforcement officers on the crash reports could be validated through the spatial-temporal analysis. While some of the crashes identified as secondary in the original dataset may have occurred because of a prior non-crash incident (and thus were not verified by the presence of a primary crash), such secondary crashes occur less frequently than secondary crashes that occur because of a prior crash. Therefore, the analysis raises a concern about the veracity of secondary-crash data collected by law enforcement officers via crash reports.

To check for similarities and differences between the original and verified secondary-crash datasets, and to highlight larger numbers of secondary crashes with specific attributes, the team ran descriptive statistics on both datasets for multiple data elements of interest. As crash data elements and attributes vary across the states, the team developed a simplified list of data elements/attributes and mapped each state’s unique way of coding to the simplified lists (e.g., any kind of rain was categorized as “rain”). The team then compared the counts/percentages of secondary crashes for each data element/attribute for the original and the verified datasets, both with and without the unknown attributes. The results show that there is a difference in the percentage of secondary crashes across the attribute categories for some data elements. An example is shown in Figure 36. When examining the type or manner of secondary crashes, the data show that most secondary crashes were rear-end collisions (i.e., “Front to Rear”).
In this case, without the unknown data attributes, the verified dataset shows a higher percentage of “Front to Rear” crashes (78.19 percent) than the original dataset (60.62 percent). This difference is partially due to crashes categorized as “Angle” (original: 22.03 percent; verified: 6.77 percent) or “Front to Front” (original: 3.11 percent; verified: 1.68 percent) being removed from the original dataset because a potential associated primary crash was not identified in the spatial-temporal analysis. These results also show better alignment between the verified dataset and what would be expected for secondary crashes based on observations in the field (i.e., more likely to be rear-end than angle or front-to-front crashes), somewhat validating the results of the spatial-temporal analysis and pointing to potential quality issues with the data collection.
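The attribute comparison described above amounts to tabulating each simplified data element for the original and verified datasets, with and without the unknown category. The following is a minimal pandas sketch of that comparison; the file names, the manner_of_collision column, and the "Other/Unknown" label are assumptions for illustration.

```python
import pandas as pd

# Hypothetical inputs: all crashes flagged as secondary on the crash reports, and the
# subset verified through the spatial-temporal analysis (file names are assumptions).
original_df = pd.read_csv("original_secondary_crashes.csv")
verified_df = pd.read_csv("verified_secondary_crashes.csv")

def attribute_share(df: pd.DataFrame, col: str, drop_unknown: bool) -> pd.Series:
    """Percentage of crashes in each category of `col`, optionally excluding unknowns."""
    values = df[col]
    if drop_unknown:
        values = values[values != "Other/Unknown"]
    return (values.value_counts(normalize=True) * 100).round(2)

# Compare the simplified manner-of-collision element, echoing the Figure 36 comparison.
comparison = pd.DataFrame({
    "Original (with unknown)": attribute_share(original_df, "manner_of_collision", False),
    "Verified (with unknown)": attribute_share(verified_df, "manner_of_collision", False),
    "Original (without unknown)": attribute_share(original_df, "manner_of_collision", True),
    "Verified (without unknown)": attribute_share(verified_df, "manner_of_collision", True),
}).fillna(0)
print(comparison)
```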

Figure 36. Comparison of manner of secondary crashes between original and verified datasets, with and without unknown data attributes (categories: angle; front to front; front to rear; rear to side; sideswipe, opposite direction; sideswipe, same direction; other/unknown).

Figure 37. Comparison of circumstances of secondary crashes between original and verified datasets, with and without unknown data attributes (categories: road hazard; roadway/traffic condition; driver - disregard traffic control; driver - skills; driver - violation; driver - condition; driver - distraction; vehicle’s condition; other/unknown).

Similarly, a comparison of secondary-crash circumstances between the original and verified datasets, with and without unknown data attributes, was conducted (see Figure 37). In this figure, it can be observed that there is a difference in the percentage of secondary crashes where there was a “road hazard” between the original dataset (9.8 percent) and the verified dataset (28.7 percent). Likewise, there is a difference in the percentage of secondary crashes where there was a “roadway/traffic condition” between the original dataset (8.1 percent) and the verified dataset (21.1 percent). Though there appear to be a considerable number/percentage of

the secondary crashes where “driver skills” was a circumstance, it is less of a factor in the verified dataset (43.5 percent) than in the original dataset (61.5 percent).

4.3.5 Lessons Learned

During the development of Use Case 3, there were several lessons learned, most of which were associated with the data used. These lessons learned, and any associated recommendations, are as follows:
• The discrepancies in data structure between states present a significant challenge in consolidating data. Despite the potential to manually process data, schema changes at the state level and year-to-year variations are still present. Automating the consolidation of statewide crash datasets would be challenging.
• Manual identification and coding of secondary crashes create subjective assessments, which cannot be applied uniformly. Synthetic approaches to secondary-crash identification are not more precise; therefore, segregating secondary crashes will continue to present challenges.
• While the cluster analysis did not present the expected findings, the process does suggest that an analysis conducted on a more complete dataset might return results that could not be observed with data from a single state.
• When comparing secondary crashes in the original dataset to secondary crashes in the verified dataset, the attributes associated with the latter more closely resemble secondary crashes based on field observations (e.g., rear-end crashes). Original datasets do not represent true secondary crashes, and the methods described in this section may prove useful for extracting datasets that are more likely to be secondary crashes. This process could use additional refinement as more states update how they report and code secondary crashes.

4.4 Use Case 4: Exploratory Analysis of Third-Party CV Data for Crash Detection

4.4.1 Overview

CV data can deliver critical information as traffic incidents unfold. These data are becoming more available as vehicles capable of interfacing with technology grow in number and replace older models. This presents an opportunity to leverage CV data and probe-based data to gain insights into transportation network performance and provide near-real-time monitoring capabilities to transportation agencies. Traditionally, operators in a TMC monitor the network via ITS devices (e.g., cameras and fixed-field devices), with some incidents remaining unknown until reported from the field (e.g., 911 calls, dispatch). This leads to more reactive response strategies and can result in incident response delays. An approach that leverages CV data might allow system operators to take a more proactive approach to TIM, providing near-real-time information on roadway system performance, alerts when performance metrics depart from associated targets, and detailed information about incidents as they unfold.

4.4.2 Datasets

The datasets used to develop the pipeline for this use case included one month (November 2019) of third-party CV data from Phoenix, Arizona, and the associated crash data from the Arizona Department of Public Safety. Chapter 3 contains a more complete description and assessment of these datasets.

4.4.3 Description of Data Pipeline

The N2 diagram for the Use Case 4 data pipeline is shown in Figure 38, which illustrates the various components of the system configuration. The data pipeline contains the following functions and steps.

Figure 38. N2 diagram for Use Case 4.

1. The Collect function retrieves driver events and vehicle movements from the CV data. The data are retrieved from an external storage service as they become available and are stored in their raw formats in cloud file storage. The data are then ready for the Schema-Check function, which helps with data formatting.
2. The Schema-Check function reads the latest available driver event data and vehicle movement data as JavaScript Object Notation (JSON), which is a file format for semi-structured data used by the third party to store their CV data. The function then acquires the attribute names and properties from the schema, as well as any specific, detailed attributes related to the record. This step is primarily used to check that the incoming data are structured as expected. If the data are not structured as expected, they do not proceed to the next step.
3. Once the data schema is identified, the Format function transforms the complex structure of attributes into individual columns. The driver event data and vehicle movement data are then formatted into a structured data table that is sorted by the trip’s journey ID. The Format function also converts timestamps into a consistent format.
4. The Filter function applies only to the driver event data. This function reduces the size of the driver event data by retaining only the relevant event records, based on the acceleration status of the vehicle during the journey. The driver events with acceleration types “HARD_ACCELERATION” and “HARD_BRAKE” are deemed most relevant to this use case. The Filter function helps to link potential driver events to the crash data and helps the query run more efficiently by reducing rows that do not meet the criteria.
5. The Distance-Calculate function runs queries to match the relevant driver event data and vehicle movement data to the crashes, based on the time of the crash and the distance between the location of the driver event or vehicle movement and the crash location. For each crash

(in the crash data), the Distance-Calculate function uses the latitude and longitude from the datasets to calculate the distance between the crash in the crash data and CV data that are within 10 minutes before and after the crash occurred. The Distance-Calculate function is based on the Spherical Law of Cosines formula:

distance_ft = acos(sin φ1 • sin φ2 + cos φ1 • cos φ2 • cos Δλ) • R • 3.2808399

where φ = latitude in radians, λ = longitude in radians, and R = Earth’s mean radius (6,371,000 meters). Once the distance is calculated (with R • 3.2808399 converting meters to feet), the Match function runs immediately as a second query to filter the driver events and vehicle movements that are within half a mile of each crash (a minimal sketch of this calculation and matching step follows this list).
6. The Correlation-Explore function takes the results from Step 5 and explores the possible correlations between the crash and its matched CV data. This step retains only the CV data points correlated to the crash and passes the refined results to the Display function.
7. The Display function takes the refined results from Step 6 and sends the data to an interactive dashboard for users to view the correlation within a GIS application.
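As referenced in Step 5 above, the following is a minimal sketch of the distance calculation and the half-mile/±10-minute matching logic, using the same Spherical Law of Cosines formula reported above. The column names and the pandas-based implementation are assumptions for illustration; in the project, this logic ran as queries in the cloud environment.

```python
import math
import pandas as pd

EARTH_RADIUS_M = 6_371_000   # Earth's mean radius, as in the formula above
METERS_TO_FEET = 3.2808399
HALF_MILE_FT = 2_640

def distance_ft(lat1, lon1, lat2, lon2):
    """Spherical Law of Cosines distance in feet between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    cos_central = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dl)
    central = math.acos(min(1.0, max(-1.0, cos_central)))  # clamp for floating-point safety
    return central * EARTH_RADIUS_M * METERS_TO_FEET

def match_cv_to_crash(crash: pd.Series, cv_points: pd.DataFrame) -> pd.DataFrame:
    """Keep CV records within 10 minutes and half a mile of a crash
    (assumed columns: timestamp, lat, lon)."""
    window = cv_points[
        (cv_points["timestamp"] >= crash["timestamp"] - pd.Timedelta(minutes=10))
        & (cv_points["timestamp"] <= crash["timestamp"] + pd.Timedelta(minutes=10))
    ].copy()
    window["distance_ft"] = window.apply(
        lambda r: distance_ft(crash["lat"], crash["lon"], r["lat"], r["lon"]), axis=1
    )
    return window[window["distance_ft"] <= HALF_MILE_FT]
```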

4.4.3.1 Pipeline Workflow Description

The data pipeline is shown in Figure 39. While similar to the pipeline technical blueprint, this pipeline diagram presents the steps of the workflow in a more practical example, showing how the various steps are performed and in which systems they occur. Figure 39 contains the various components of the example system configuration used in the development of this use case.

Figure 39. Data pipeline for Use Case 4 (third-party connected vehicle data).

The workflow of the data pipeline includes steps to collect, format, filter, and match the data, then explore and visualize the correlations between the crash and CV datasets via dashboards. These workflow steps are described in the following list.
1. Retrieve the third-party CV data. The data-retrieval function retrieves the available data from an external cloud storage service. This process executes the external-data retrieval requirements, which may include encryption and authentication procedures so that only those with adequate licensing and rights can access the data. For this reason, this step may require coordination with IT staff to perform the secure data retrieval. All subsequent steps performed within the cloud environment can be done by the data/business analyst performing the analysis.
2. Data format and filter. The incoming raw data files are read and loaded into a cloud environment for initial data processing. The data schema is obtained to understand the complex structure of the column attributes, as well as to help transfer the datasets to a structured data table by flattening attributes into individual columns. The timestamp attribute of the datasets is converted from UTC to the appropriate local time of the study area based on the time zone information provided. Once the timestamp conversion is completed, the CV data are ready for the next step of processing. Sometimes, additional filtering based on column attributes can help improve the efficiency and accuracy of matching to crash records. For example, the driver event dataset can be further filtered by acceleration type to keep only the driver events that are coded as either hard braking or hard accelerating in order to simplify the matching process.
3. Data matching and correlation identification. In this step, the data undergo a matching process with internally available crash data. The CV datasets are matched to individual crash events if the driver events or vehicle movement data points are within the predefined time range and distance range of the crash data. Once a set of driver events or vehicle movement data points has been matched to corresponding crashes, the process of identifying correlations can begin. These matched CV events are then evaluated more closely, which includes observing the reported time of the crash as well as the reported clearance time for the crash, analyzing speed changes surrounding the crash, and determining whether the impact of the crash on operations can be estimated.
4. Hosting data. The matched crash and CV data are added to the GIS platform environment. The data stored in this environment are pushed to the cloud, with access and permissions handled by the connection to that environment. The data are served in a RESTful capacity and can be queried or used in multiple other dashboards.
5. Display. An interactive dashboard allows users to visualize the matched driver events and crashes along with performance metrics associated with the crashes.

4.4.4 Data Products

An interactive dashboard is a suitable tool to allow end users to quickly visualize the matched CV and crash data for this use case. While the dashboard for this use case is designed to present historical correlation, a real-time dashboard could be used in the same way, with events and general traffic metrics displayed on an interactive map. As part of this use case, two prototype dashboards were developed with a series of capabilities designed to allow users to explore information associated with the matched CV and crash data: 1) CV Movement Data on Crash Detection Dashboard and 2) CV Driver Event Data on Crash Detection Dashboard.

4.4.4.1 CV Movement Data on Crash Detection Dashboard

The CV Movement Data on Crash Detection Dashboard (see Figure 40) is intended to help users visualize the disruption on the road after a crash occurred.
Users can explore changes in speed from before to after the crash. Sections 1–5, identified on the dashboard in Figure 40, display the following information.
• Section 1: Users navigate the dashboard by first selecting a crash location from the preloaded list of crashes and then selecting a period relative to the crash (i.e., before the crash, at the time

of the crash, or after the crash). Once a user selects a crash location, the dashboard displays the information associated with that location.
• Section 2: Users can view detailed crash-related information, such as the geolocation, reported time of the crash, time the crash was cleared, type of crash, and other crash characteristics.
• Section 3: The map automatically zooms to the selected crash location. The map also shows the vehicle movement data points that matched to the crash, which are defined as data points within 10 minutes before and after the crash and within half a mile of the selected crash location. The dashboard depicts each vehicle movement data point within a red-to-green color scheme based on the associated speed; red represents slower speeds, and green represents faster speeds.
• Section 4: This section provides average speeds for the associated vehicle movement data points on the road in proximity to the selected crash location (i.e., within half a mile, 10 minutes before and after).
• Section 5: A bar graph and data table (users can toggle between the two) show the vehicle speed for individual matched vehicle movement data points by timestamp, from 10 minutes before to 10 minutes after the crash.

Figure 40. CV movement data on crash detection dashboard prototype.

Using Sections 3 and 4 of the dashboard, users can understand how road operations are disrupted before and after a crash occurs (shown in Figure 41). This figure (which is zoomed into one of the crash locations) compares conditions 10 minutes before the crash and approximately 10 minutes after the crash. Section 4 of the dashboard shows a drop of 10 kilometers per hour in average speed from 10 minutes before to 10 minutes after the crash. The change in speed in the vehicle movement data can help detect a crash; in this case, multiple vehicle movement data points are red (speeds in the range of 0 to 5 kph) upstream of the crash location 10 minutes after the crash occurred (a minimal sketch of this before/after speed comparison follows Figure 41 below).

The CV Movement Data on Crash Detection Dashboard can assist agencies in establishing baseline data on how the transportation network is affected by a crash as it occurs. This dashboard

relies on known crash locations so that associated changes in system performance can be used to develop performance thresholds to identify crashes in real time.

4.4.4.2 CV Driver Event Data on Crash Detection Dashboard

The CV Driver Event Data on Crash Detection Dashboard is intended to allow users to explore the information associated with the matched CV driver event data and crashes. The dashboard displays matching driver events at the time of the crash and any hard brake or hard acceleration that drivers performed because of the crash. Sections 1–5 of the dashboard, as shown in Figure 42, display the following information.
• In Section 1, users begin by selecting a crash location and the acceleration type (hard brake or hard acceleration).
• Section 2 provides information for the selected crash.
• Section 3 displays a map that is zoomed in to the selected crash location and driver events. (The driver events are those within 10 minutes before or after the crash and within a half mile of the crash location.)
• Section 4 provides statistics on the number of matched driver events that are hard brakes or hard accelerations. It also provides timestamps of the crash and the matched driver event.
• Section 5 provides the speeds associated with the matched driver events. A data table to the right of Section 5 (toggle display with bar graph) displays the details of the individual matched driver events.

This dashboard provides users with the opportunity to further explore how CV data relate to the crash data. At this time, identifying vehicles involved in the actual crash is unlikely due to low market penetration of CVs. In the future, however, this capability is expected to become increasingly possible as market saturation of CVs increases. While identifying a specific vehicle involved in a crash is difficult at this time, using vehicles in proximity to the crash that experience hard braking or other actions presents opportunities to identify events related to the crash on the dashboard.

Figure 41. CV movement data points, 10 minutes before and after crash (average speed before crash: 56.2 kph; average speed after crash: 46.3 kph).
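As referenced above, the before/after speed comparison summarized in Figure 41 (and in Section 4 of the CV Movement Data dashboard) reduces to averaging matched vehicle movement speeds in the 10-minute windows on either side of the crash time. The following is a minimal sketch of that calculation; the column names and the flat 10 kph threshold are assumptions for illustration rather than the project's calibrated logic.

```python
import pandas as pd

def speed_drop_kph(crash_time: pd.Timestamp, matched_points: pd.DataFrame) -> dict:
    """Average speed (kph) of matched CV movement points 10 minutes before vs. after a crash.
    Assumed columns: timestamp (datetime), speed_kph (float)."""
    before = matched_points[
        (matched_points["timestamp"] >= crash_time - pd.Timedelta(minutes=10))
        & (matched_points["timestamp"] < crash_time)
    ]["speed_kph"].mean()
    after = matched_points[
        (matched_points["timestamp"] >= crash_time)
        & (matched_points["timestamp"] <= crash_time + pd.Timedelta(minutes=10))
    ]["speed_kph"].mean()
    return {"avg_before_kph": before, "avg_after_kph": after, "drop_kph": before - after}

# Example usage: a drop of roughly 10 kph, as in Figure 41, could flag a possible incident.
# result = speed_drop_kph(pd.Timestamp("2019-11-12 08:15"), matched_points)
# possible_incident = result["drop_kph"] >= 10  # assumed threshold for illustration
```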

Figure 42. CV driver event data on crash detection dashboard prototype.

4.4.5 Lessons Learned

During the development of this use case, there were several lessons learned while interacting with the data in a cloud environment and with the general data pipeline:
• Immediately after a crash, road network performance may be degraded near the crash location, and the impact can then spread upstream of the crash, in the opposite direction, and onto connecting roadways. However, detected changes in performance may lead to falsely identifying that a crash has occurred when, in fact, it has not. For example, detected changes may indicate a near-miss event rather than a crash. To address this challenge, third-party CV event data could be used in addition to other data sources, such as detector or probe speed data, to help verify that a crash has occurred. For example, if a CV event (e.g., airbag deployment, hard braking) occurs along with a vehicle’s sudden departure from expected speed (via probe data), this increases the likelihood that a crash has occurred.
• The recorded times and geolocations of crashes are not 100 percent accurate, which can make it harder or impossible to accurately match the data.
• While combining driver event data and vehicle movement data into one dataset could help to identify a relationship between the CV data and crash data, this is challenging to execute because these data purposefully lack established linkages (for privacy). Vehicle movement data contain only speeds, and speeds are available within the driver event data only if there is a hard braking or hard accelerating event.
• Many crashes could not be linked to CV data, demonstrating the lack of CV market penetration.

The prototype dashboards developed for this use case could be enhanced to provide the following functionalities for future use cases or implementations.
• Color symbology to represent raw speeds reported on the network based on timestamps. For real-time support, a filter to show only points within a 15-minute time frame could provide a view of the network.

• Color gridded symbology of average speeds reported (raw) versus expected speeds. This scenario would require either a polygon- or polyline-based geographic area, in which all contained points would be averaged to reflect an average speed over a given period. Tracking a given period would allow trends to be established based on past data for the selected geographic area versus the time observed. This would then allow values outside of an expected range to be highlighted, indicating that the network is experiencing conditions that may require action.
• Trigger threshold-based alerts from the incoming data or trends to alert operators on a dashboard (via alert messages) that conditions have met certain thresholds (a minimal sketch of such a threshold check follows this list).
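As referenced in the last bullet above, a threshold-based alert can be as simple as comparing the current average speed for a geographic cell against the range expected for that cell and time of day. The following is a minimal sketch under that assumption; the cell identifiers, expected-speed table, and tolerance value are hypothetical and would need to be calibrated from historical data.

```python
import pandas as pd

def threshold_alerts(current: pd.DataFrame, expected: pd.DataFrame,
                     tolerance_kph: float = 15.0) -> pd.DataFrame:
    """Flag cells whose observed average speed falls below the expected speed by more than
    `tolerance_kph`. Assumed columns: cell_id, hour_of_day, avg_speed_kph (current) and
    cell_id, hour_of_day, expected_speed_kph (expected)."""
    merged = current.merge(expected, on=["cell_id", "hour_of_day"], how="inner")
    merged["deviation_kph"] = merged["expected_speed_kph"] - merged["avg_speed_kph"]
    alerts = merged[merged["deviation_kph"] > tolerance_kph]
    # Each alert row could be pushed to a dashboard as an alert message for operators.
    return alerts[["cell_id", "hour_of_day", "avg_speed_kph",
                   "expected_speed_kph", "deviation_kph"]]
```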
