National Academies Press: OpenBook

Data Management and Governance Practices (2017)

Chapter: Chapter Three - Review of Literature on Data Management and Governance

Suggested Citation: "Chapter Three - Review of Literature on Data Management and Governance." National Academies of Sciences, Engineering, and Medicine. 2017. Data Management and Governance Practices. Washington, DC: The National Academies Press. doi: 10.17226/24777.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

NCHRP Report 666 (Cambridge Systematics, Inc. et al. 2010) defines data management as “the development, execution and oversight of architectures, policies, practices, and procedures to manage the information life-cycle needs of an enterprise in an effective manner as it pertains to data collection, storage, security, data inventory, analysis, quality control, reporting, and visualization.” The International Organization for Standardization (2003) offers a more concise definition of data management as “the activities of defining, creating, storing, maintaining and providing access to data and associated processes in one or more information systems.” This synthesis focused on data governance, integration, sharing, warehousing, and quality.

Data Governance

Data governance deals with ensuring that the data are managed properly. It is the establishment, execution, and enforcement of authority over the management of data assets (Cambridge Systematics, Inc. et al. 2010; Ladley 2012). The terms “data governance,” “data management,” and “data business planning” are often used interchangeably or as components of one another (Stickel and Vandervalk 2014). Ladley (2012) suggested there should be a distinction between managing data (i.e., data management) and ensuring that data are managed properly (i.e., data governance) (Figure 6).

Transportation data are maintained by various business units (often called “data management areas”) within transportation agencies. NCHRP Report 814 (Spy Pond Partners, LLC and Iteris, Inc. 2015a) uses the terms “data management area” and “data program” interchangeably, defining them as an organizational function that is responsible for scoping, collecting, managing, and delivering a particular category or form of data. NCHRP Project 08-36 (Task 100) proposed a framework and conceptual design to develop a resource to help transportation agencies assess the adequacy, direction, and management of their data programs.
Task 100 grouped transportation data into seven categories in developing the framework: travel data, system inventory data, system condition data, safety data, operational data, financial data, and customer relations data. NCHRP Report 814 (Spy Pond Partners, LLC and Iteris, Inc. 2015a) provides a guidebook for transportation agencies to implement the self-inspection process, including self-assessment case studies of data management programs at Michigan DOT and Utah DOT for specific business areas: mobility/congestion, facilities management, maintenance, project scoping, and design. That guidebook can be useful for evaluating and improving the value of data for decision making and data-management practices.

In most cases, data governance is in the early stages of implementation; thus, its long-term benefits have not been measured. However, interviews conducted as part of this study with a sample of transportation agencies indicated that key motivations and early benefits of implementing data governance include:

1. Improving accountability for producing high-quality and reliable data (sources of truth).
2. Ensuring that the data are accessible and integrated using a common linear referencing system.
3. Engaging business areas within transportation agencies in their data, rather than viewing data as strictly an information technology (IT) issue.

During the past 10 years, several state DOTs have developed data business plans that describe data governance procedures, bodies/roles, and responsibilities. A recent TRB peer exchange emphasized that data governance models should be evaluated and assessed periodically (Hall 2015). An overview of data governance practices in three agencies (Florida DOT, Minnesota DOT, and U.S.DOT) is provided next.

Florida DOT

Florida DOT (FDOT) is implementing the ROADS (reliable, organized, and accurate data sharing) initiative for enterprise information management and data governance. The goal is to improve data reliability and simplify data sharing across FDOT so the agency has readily available and accurate data to make informed decisions. In the FDOT ROADS project, data governance is the practice of managing information assets and realizing value with a set of standards, processes, and technologies executed through a well-defined governance structure to achieve business goals and objectives.

As part of the ROADS project, a list of data/information gaps was identified throughout FDOT. The ROADS project consists of key elements related to people, process, and technology developed to address and close those data/information gaps. The following sections provide an overview of each of these key elements:

People: The FDOT ROADS project includes a data governance body consisting of a data governance steering committee, enterprise data stewards, data stewards, and data custodians. The responsibilities for each group are outlined in Figure 7.

FIGURE 6 V-shaped illustration used to distinguish between data governance and data management (figure label: Data Life Cycle). Source: Adapted from Ladley (2012).

FIGURE 7 FDOT’s data governing body.
Groups: Steering Committee; Enterprise Data Stewards; Data Stewards; Data Custodians.
• Technical-focused individuals focused on the day-to-day execution of the governance rules and management activities
• Receive direction and guidance from the responsible data steward
• Business-focused individuals accountable for data integrity/quality
• Recommend operational changes needed to improve data governance
• Data governance oversight
• Provide strategic direction to the organization
• Facilitate cross-subject area/cross-business unit priorities and projects
• Lead the data steward working group meetings to better understand data/information issues and get agreement within the business functions
• Act as champions of data governance within their program/function area

Process: The FDOT data governance initiative consists of six categories of data governance processes/procedures, as follows:

• Business needs assessment: These procedures deal with collecting and documenting business requirements for each new business intelligence (BI) solution or data-related enhancement. BI technologies transform daily operational data into information that facilitates decision making. For instance, a roadway maintenance management system is a BI solution.
• Data standards update: These procedures are for updating the data and metadata standards based on ad hoc feedback from business users. Examples include adding, changing, or deleting data or metadata items.
• Data standards approval and maintenance: These procedures are for adding new standards, reviewing current ones, or deleting old and obsolete ones in response to requests submitted by business users.
• Education/Data guidance: These procedures are for providing training and guidance in response to requests submitted by business users.
• Quality monitoring: These procedures deal with establishing data quality agreements that specify the expected level of data quality, profiling data to determine base quality, cleansing data, and monitoring data entities to ensure quality agreements are kept.
• Road map: FDOT is organized by functions, and each function should have a road map that will align to the department’s high-level road map.

The data process/procedures identified are applied to the data governance components shown in Figure 8. A similar model is adopted by the Data Management Association International (Sullivan and Stickel 2015).

Technology: As part of the ROADS initiative, tools and technologies will be implemented to support the ROADS goal of leveraging and sharing data across the agency to help FDOT make better informed decisions.
These capabilities are needed to support the rollout of the ROADS data governance initiatives that are under way and are critical for FDOT to improve the quality and accessibility of data. These tools will support master data management; metadata recording and sharing; extract, transform, and load operations; and reporting efforts across the enterprise.

FIGURE 8 FDOT’s data governance component model.
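The quality-monitoring procedures listed under Process (setting a data quality agreement, profiling data to establish base quality, and monitoring against the agreement) might be sketched as follows. This is a minimal illustration under stated assumptions, not FDOT's implementation: the record fields, the `QualityAgreement` class, and the 95% completeness target are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sample records; the field names are assumptions for illustration.
RECORDS = [
    {"route_id": "SR-10", "surface_type": "asphalt", "aadt": 12000},
    {"route_id": "SR-20", "surface_type": None, "aadt": 8500},
    {"route_id": "SR-30", "surface_type": "concrete", "aadt": None},
    {"route_id": "SR-40", "surface_type": "asphalt", "aadt": 4300},
]


@dataclass
class QualityAgreement:
    """Expected completeness (share of non-missing values) for one field."""
    field_name: str
    min_completeness: float


def profile_completeness(records, field_name):
    """Profile a field: fraction of records with a non-missing value."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field_name) is not None)
    return present / len(records)


def monitor(records, agreements):
    """Return (field, measured, target, ok) for each agreement."""
    results = []
    for agreement in agreements:
        measured = profile_completeness(records, agreement.field_name)
        ok = measured >= agreement.min_completeness
        results.append((agreement.field_name, measured,
                        agreement.min_completeness, ok))
    return results


if __name__ == "__main__":
    agreements = [QualityAgreement("route_id", 0.95),
                  QualityAgreement("aadt", 0.95)]
    for name, measured, target, ok in monitor(RECORDS, agreements):
        status = "meets" if ok else "violates"
        print(f"{name}: {measured:.0%} complete, {status} target {target:.0%}")
```

Completeness is used here only because it is simple to compute; an agreement could equally specify other measures, and the cleansing step is left out of the sketch.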

Minnesota DOT

Minnesota DOT’s current data governance structure consists of nine data domains (with a steward identified for each domain) and five to 20 subject areas within each domain (with a data steward identified for each subject area). Table 2 describes these domains. Table 3 describes the subject areas within the infrastructure data domain, as an example, where a steward is identified for each data subject area.

TABLE 2 Data domains used in Minnesota DOT data governance model

Data Domain | Domain Description | No. of Subject Areas
Business stakeholder/customer | Data on the interface with external stakeholders with whom MnDOT has business or customer relationships and data about internal and external communications | 10
Financial | Data related to receiving, managing, and spending funds | 14
Human resources | Data about individual employees | 10
Infrastructure | Data on the basic facilities that make up or interface with the transportation system | 13
Planning, programming, and projects | Data that provide direction for and management of projects | 11
Recorded events | Data on time-based occurrences that take place on the transportation system or that affect the transportation system | 19
Regulatory | Data on topics that are controlled or directed by legal requirements | 20
Spatial | Data that define locations on earth or in space, including GIS, CAD, latitude/longitude, xyz coordinates, sections of roadway, or boundaries | 5
Supporting assets | Data on all items that affect or support the transportation system (e.g., building and facility, fleet, communications towers) | 12

TABLE 3 Data subject areas within Minnesota DOT infrastructure data domain

Subject Area | Description
Airport data | Data on the publicly owned system of Minnesota airports.
Bicycle data | Data on bicycle facilities within Minnesota’s transportation system, including existing/future data on state bikeways and U.S. bicycle routes, shared-use paths, protected bike lanes, bike lanes, shared lane markings and bicycle boulevards.
Bridge data | Data on the design, construction, and maintenance of bridges, including bridge condition and load ratings. Data can be contained within Pontis (a) and structure information management system (SIMS).
Drainage structure data | Data on hydraulic features such as culverts, channels, storm tunnels, retention ponds, and drains.
Interchange, intersection, and section data | Data that describe the location of roadway intersections, the location of specific portions (sections) of roadway, and the location of places where two roadways cross (intersect) designed to permit traffic to move freely from one road to another without crossing another line of traffic.
Parking facility data | Data on the ABC distributor ramps and other facilities in Minneapolis.
Rail crossing data | Data on the highway rail grade crossings and characteristics where roadways and railroad tracks intersect.
Right of way and contaminated property data | Data on the acquisition (purchase, lease) and management of real estate/property in transportation corridors or as part of the state rail bank, which is owned by or up for purchase by MnDOT.
Roadway data | Data on location, jurisdiction, classification, surface type and width, reference points, cross sections, control sections, oversize/overweight/twin trailer routes, and project history for the statewide highway system.
Safety feature data | Data on the guardrails, median barriers, railings, crash cushions, roadway lighting, rest areas, and similar hardware or facilities that are used to improve safety on the road system.
Sidewalk data | Data on pedestrian accommodations within MnDOT’s transportation system, including Americans with Disabilities Act (ADA) compliance data on sidewalks, curb walks, and pedestrian bridges.
Smooth road data | Data on the ride rating (smooth ride) of the roadways.
Traffic control device data | Data on all signs, signals, markings, and other devices used to regulate, warn, or guide traffic, placed on, over, or adjacent to state trunk highways. Data on all of the devices covered by the Manual on Uniform Traffic Control Devices.

(a) Pontis has been updated to bridge management software (BrM).
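The two-level structure described above (data domains divided into subject areas, with a steward at each level) can be represented as a simple lookup structure. The sketch below encodes the subject-area counts from Table 2 and the subject-area names from Table 3; the function names are hypothetical, and steward names are omitted because the source does not list them.

```python
# Subject-area counts per data domain, copied from Table 2.
SUBJECT_AREA_COUNTS = {
    "Business stakeholder/customer": 10,
    "Financial": 14,
    "Human resources": 10,
    "Infrastructure": 13,
    "Planning, programming, and projects": 11,
    "Recorded events": 19,
    "Regulatory": 20,
    "Spatial": 5,
    "Supporting assets": 12,
}

# Subject areas of the infrastructure domain, copied from Table 3.
INFRASTRUCTURE_SUBJECT_AREAS = [
    "Airport data", "Bicycle data", "Bridge data", "Drainage structure data",
    "Interchange, intersection, and section data", "Parking facility data",
    "Rail crossing data", "Right of way and contaminated property data",
    "Roadway data", "Safety feature data", "Sidewalk data",
    "Smooth road data", "Traffic control device data",
]


def total_subject_areas(counts):
    """Total number of subject areas across all domains."""
    return sum(counts.values())


def steward_assignments(counts):
    """One steward role per domain plus one per subject area.

    This counts roles, not people; one person could hold several roles.
    """
    return len(counts) + total_subject_areas(counts)


def domains_outside_range(counts, low=5, high=20):
    """Domains whose subject-area count falls outside the stated 5 to 20 range."""
    return [d for d, n in counts.items() if not low <= n <= high]


if __name__ == "__main__":
    print("Subject areas:", total_subject_areas(SUBJECT_AREA_COUNTS))
    print("Steward roles:", steward_assignments(SUBJECT_AREA_COUNTS))
    print("Out of range:", domains_outside_range(SUBJECT_AREA_COUNTS))
```

A check like `domains_outside_range` is one way to confirm that a recorded structure still matches its description after domains or subject areas are revised.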

U.S.DOT

In 2013, the U.S.DOT published a data business plan to help achieve two goals (Vandervalk et al. 2013):

• Improve the coordination and communication mechanism across U.S.DOT and FHWA offices involved with roadway travel mobility data.
• Improve the coordination of the data capture activities associated with sponsored research at the Intelligent Transportation Systems Joint Program Office in wirelessly connected vehicle technologies.

A key component of the U.S.DOT data business plan was a data coordination framework. The framework defines a set of data management practices (such as data governance, quality, standards, privacy, and security) and stakeholder groups that are responsible for coordinating these practices (Table 4).

TABLE 4 Stakeholders defined in U.S.DOT data business plan

Role: Mobility Data Coordination Group
Responsibility:
• Finalize data coordination framework with input from data working groups and internal community of interest
• Develop and approve U.S. Mobility Data Coordination Group charter

Role: Individual data working groups (Infrastructure/Inventory; Travel data; Climate (weather) data; Modal data; Connected vehicle data capture)
Responsibility:
• Address stakeholder needs related to respective group area
• Identify and address gaps and redundancies in respective group area
• Devise “rules of engagement” regarding collaboration and coordination
• Develop data standards and stewardship recommendations for consideration by the U.S.DOT Mobility Data Coordination Group

Role: Community of interest (internal)
Responsibility: Coordinate with the data working groups to:
• Address data gaps and overlaps
• Share current activities and best practices in data management
• Coordinate resources and cost sharing strategies to reduce redundancy in data collection, integration, and data systems
• Facilitate sharing of data with internal/external stakeholders
• Identify how current and planned data from the connected vehicle initiative can support existing roadway travel mobility data programs
• Identify how data from roadway travel mobility data programs within U.S.DOT and FHWA can support the connected vehicle initiative
• Identify existing/future data inventory and data structures/policies/governance practices that could be applicable to the Research Data Exchange

Role: Community of interest (external)
Responsibility: Not defined

Data Integration and Warehousing

The terms “data warehouse,” “data mart,” and “operational database” are related but refer to different kinds of systems. Because most readers are familiar with the term “database,” the first item of business is to compare data warehouses and data marts to operational databases. An operational database is designed to support day-to-day operations of a particular application and has limited or no analytical capabilities. In contrast, a data warehouse is a repository that integrates data originating from multiple sources and various time frames. The integrated data are organized in a unified schema and reside in a single site. A data mart is a scaled-down version of a data warehouse. Both data warehouses and data marts have data analysis and decision-support capabilities.

Figure 9 depicts common architectures for data warehouses and data marts. The bottom tier consists of operational databases that contain data on day-to-day activities and operations of the agency, such as asset inventory and condition, crash records, and traffic counts. Normally, the data in these databases are too detailed and raw to be easily used for decision making. The data warehouse integrates data originating from multiple operational databases and various time frames. The data mart is linked to a single or limited number of operational databases and has fewer data integration and analytical capabilities. For transportation agencies, data marts appear to be more common than enterprise data warehouses.

The data repository in a warehouse or mart (middle tier) is constructed through a process of data cleaning, integration, transformation, loading, and periodic refreshing (Han et al. 2012). These processes are defined as follows (Han et al. 2012):

• Data extraction: gathering data from multiple, heterogeneous, and external sources.
• Data cleaning: detection of errors in the data and rectifying them when possible.
• Data transformation: conversion of data from legacy or host format to warehouse format.
• Data loading: sorting, summarizing, consolidating, checking integrity, and organizing the data in a unified schema.
• Refreshing: propagation of updates from the data sources to the warehouse repository.

The top tier of this architecture consists of data processing and analysis tools, including:

• Information processing: The warehouse or mart processes the data by means of querying, basic statistical analysis, and presentation (e.g., tables, graphs).
• Analytical processing: The warehouse or mart processes the data by means of online analytical processing, that is, analysis techniques with functionalities such as summarization and drilling down. For instance, one can drill down on yearly weather data to obtain monthly data. Similarly, one can roll up on performance data stored for roadway sections to obtain data summarized by county, district, or state.
• Data mining: The warehouse or mart is equipped with in-depth data mining capabilities, such as data clustering, outlier detection, and prediction.

Transportation data warehouses and marts often are equipped with GIS capabilities for visualization and spatial analysis. For example, Utah DOT (UDOT) uses the ArcGIS Online platform to access and share transportation data through the agency’s open data portal (UGATE) and mapping application (UPlan) (http://uplan.maps.arcgis.com/home/index.html).
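The warehouse-construction steps listed above (extract, clean, transform, load) and the roll-up operation from section level to county level can be illustrated with a small in-memory sketch. Everything in it is hypothetical: the two source lists stand in for operational databases, and the field names, roughness values, and county labels are invented for illustration. It is not drawn from any agency's system, and the refreshing step is omitted.

```python
from collections import defaultdict

# Two hypothetical operational sources with different formats.
PAVEMENT_DB = [
    {"section": "A-1", "county": "Adams", "roughness": "95"},
    {"section": "A-2", "county": "Adams", "roughness": "105"},
    {"section": "B-1", "county": "Baker", "roughness": ""},  # missing value
]
LEGACY_DB = [
    ("B-2", "BAKER", 110.0),
    ("B-3", "BAKER", -5.0),  # invalid: roughness cannot be negative
]


def extract():
    """Gather rows from the heterogeneous sources into one raw list."""
    rows = [(r["section"], r["county"], r["roughness"]) for r in PAVEMENT_DB]
    rows.extend(LEGACY_DB)
    return rows


def clean(rows):
    """Drop rows whose roughness is missing, non-numeric, or negative."""
    kept = []
    for section, county, roughness in rows:
        try:
            value = float(roughness)
        except (TypeError, ValueError):
            continue
        if value >= 0:
            kept.append((section, county, value))
    return kept


def transform(rows):
    """Convert rows to one schema with consistent county capitalization."""
    return [{"section": s, "county": c.title(), "roughness": v}
            for s, c, v in rows]


def load(rows):
    """Organize the records in a repository keyed by section."""
    return {r["section"]: r for r in rows}


def roll_up_by_county(warehouse):
    """Roll up section-level roughness to a county-level mean."""
    groups = defaultdict(list)
    for record in warehouse.values():
        groups[record["county"]].append(record["roughness"])
    return {county: sum(v) / len(v) for county, v in groups.items()}


if __name__ == "__main__":
    warehouse = load(transform(clean(extract())))
    print(roll_up_by_county(warehouse))
```

Drilling down is the reverse lookup: starting from a county summary, one returns to the section-level records in the repository that produced it.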
UPlan contains multiple data categories, including safety and crash, roadway functional classification, access categories, maintenance stations, structure and bridge locations, planned and current construction projects, mile posts, pavement management, transit vehicles and dispersed funding, fiber-optic network, and freight planning and operations data. Other state DOTs have embraced this approach for data access and sharing (e.g., Arizona, Florida, Kansas, Idaho, Montana, Pennsylvania).

FIGURE 9 Architecture for data warehouses (left) and data marts (right). Data warehouse panel labels: Multiple Operational Databases; Extract, Clean, Transform, Load, Refresh; Data Repository; Metadata; Query, OLAP, Mining; Maps, Analysis, Reports. Data mart panel labels: Limited Operational Database; Clean, Transform, Load, Refresh; Data Repository; Metadata; Query, OLAP, Mining; Maps, Analysis, Reports.

In recent years, there has been increased interest in using cloud services to improve data management. The premise of this approach is that storing data in off-site data centers (the cloud) provides a degree of standardization and access that often is difficult to achieve in on-site data warehouses. Cloud computing resources are provided to individuals or organizations remotely through the Internet rather than directly on one’s own computer. Some of the benefits of cloud computing include (Lei et al. 2012):

• Integrated computing and storage: The cloud computing model integrates computing power and storage. Computing resources can be abstracted from agencies. This eliminates the burdens of setting up hardware and software to store collected/generated data and perform computations.
• Ease of information provision: The degree of standardization and access offered by the web-based service model facilitates data integration and information sharing within and across agencies.
• Scalable and customized computing: Cloud computing provides a flexible storage and computing environment that allows agencies to rent storage and computing power as the need for such services fluctuates.
• Performance and security: Cloud computing service providers address many of the vital performance and security issues that ensure data integrity. Agencies can focus on using data maintained in clouds for business delivery.

Data Quality

Data quality is a multidimensional concept (Wang et al. 2001; Lee et al. 2002; Batini and Scannapieco 2006). Accuracy, timeliness, consistency, and completeness are examples of these dimensions. The literature consistently organizes these quality dimensions in four categories: intrinsic, contextual, accessibility, and representational (Wang and Strong 1996; Pipino et al. 2002; Hazen et al. 2014). Intrinsic dimensions (e.g., accuracy) describe the quality of objective and native data. Contextual dimensions (e.g., relevancy) are dependent on the context in which the data are used. Representational dimensions refer to data understandability and conciseness. Accessibility refers to data sharing and security. Figure 10 shows the data quality dimensions considered in this study.

FIGURE 10 Data quality dimensions considered in this study. Figure labels: Accessibility, Consistency, Access Security, Timeliness, Completeness, Relevancy, Accuracy (center: Quality Data).
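The grouping of quality dimensions into four categories can be sketched as a small scorecard. The text above assigns accuracy to the intrinsic category and relevancy to the contextual category; the remaining assignments in this sketch follow the Wang and Strong (1996) grouping as an assumption, and the scores are invented values between 0 and 1 used only to show the aggregation.

```python
# Category for each dimension in Figure 10. Only "accuracy" and "relevancy"
# are assigned by the text; the other assignments are assumptions.
CATEGORY_OF = {
    "accuracy": "intrinsic",
    "relevancy": "contextual",
    "timeliness": "contextual",
    "completeness": "contextual",
    "consistency": "representational",
    "accessibility": "accessibility",
    "access security": "accessibility",
}


def summarize_by_category(scores):
    """Average per-dimension scores (0 to 1) within each category."""
    grouped = {}
    for dimension, score in scores.items():
        category = CATEGORY_OF[dimension]
        grouped.setdefault(category, []).append(score)
    return {category: sum(values) / len(values)
            for category, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical scores for one data program.
    scores = {
        "accuracy": 1.0,
        "relevancy": 0.75,
        "timeliness": 0.5,
        "completeness": 1.0,
        "consistency": 0.5,
        "accessibility": 0.5,
        "access security": 1.0,
    }
    for category, mean in summarize_by_category(scores).items():
        print(f"{category}: {mean:.2f}")
```

An unweighted mean is the simplest choice; an agency might instead weight dimensions by their importance to a given business area.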

Previous studies (such as NCHRP Report 666 and NCHRP Report 814) suggest that the use of structured methods and instruments for gathering feedback from data users and data managers across agencies can help improve the quality of data maintained by transportation agencies. Depending on the size of the agency, methods that can be used include surveys, focus group meetings, data program workshops, and research studies (Cambridge Systematics, Inc. et al. 2010). NCHRP Report 814 (Spy Pond Partners, LLC and Iteris, Inc. 2015a) provides a detailed self-assessment guide and tools for continuing data improvement.


TRB's National Cooperative Highway Research Program (NCHRP) Synthesis 508: Data Management and Governance Practices develops a collection of transportation agency data management practices and experiences. The report demonstrates how agencies currently access, manage, use, and share data.
