National Academies Press: OpenBook

Quality and Accuracy of Positional Data in Transportation (2003)

Chapter: Chapter 2 - Findings

Suggested Citation:"Chapter 2 - Findings." National Academies of Sciences, Engineering, and Medicine. 2003. Quality and Accuracy of Positional Data in Transportation. Washington, DC: The National Academies Press. doi: 10.17226/21953.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

CHAPTER 2 FINDINGS

2.1 SPATIAL DATA CHARACTERISTICS

2.1.1 Introduction

“Spatial data” refer to information that is referenced to a geographic location on the earth and includes the three dimensions of space, time, and theme (where-when-what) (1, 2). Spatial data include information that represents the geographic position of features as well as descriptive information about those features. Nearly all transportation data are, or can be, geographically referenced. Geographic Information Systems (GIS) provide an effective way to manage and integrate the spatial data necessary for the planning, design, construction, analysis, operation, maintenance, and administration of transportation systems and facilities. Transportation agencies use spatial data to locate or describe events on a transportation system. The spatial representation of a network can be expressed in one, two, or three dimensions. All spatial data can be characterized and defined as one of three basic feature types: points, lines, or areas, which are described as follows (1, 3):

• Points refer to data associated with a single location in space and, because of the scale of the map, are represented by symbolic points, rather than by an areal dimension. Examples of point data include wells, post boxes, and lampposts.
• Lines refer to data represented by a one-dimensional (1D) line and are described by a string of spatial coordinates. Examples of line data include roads, railways, rivers, and pipelines.
• Areas refer to data represented by a common string of spatial coordinates, homogeneous zones as defined by natural property or categories, or alternatively artificial units used for thematic representation or management purposes. Areas are also commonly referred to as polygons. Examples of area data include land-use zones, soil classification areas, administrative boundaries, and climate zones.
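The three feature types can be illustrated with a small data-structure sketch. This is only one possible representation, not a model taken from the report: the class names, the `theme` field, and the use of planar (x, y) coordinates are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: planar (x, y) coordinates in consistent units (e.g., meters).
Coord = Tuple[float, float]


@dataclass
class PointFeature:
    """A feature tied to a single location (e.g., a lamppost)."""
    theme: str
    location: Coord


@dataclass
class LineFeature:
    """A feature described by a string of coordinates (e.g., a road)."""
    theme: str
    vertices: List[Coord]

    def length(self) -> float:
        # Sum of straight-line distances between consecutive vertices.
        return sum(math.dist(a, b)
                   for a, b in zip(self.vertices, self.vertices[1:]))


@dataclass
class AreaFeature:
    """A polygon; the closing edge back to the first vertex is implied."""
    theme: str
    ring: List[Coord]

    def area(self) -> float:
        # Shoelace formula over the boundary ring.
        n = len(self.ring)
        twice_area = sum(
            self.ring[i][0] * self.ring[(i + 1) % n][1]
            - self.ring[(i + 1) % n][0] * self.ring[i][1]
            for i in range(n)
        )
        return abs(twice_area) / 2.0


lamppost = PointFeature("lamppost", (12.0, 7.5))
road = LineFeature("road", [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
zone = AreaFeature("land-use zone", [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
print(road.length(), zone.area())
```

Keeping the thematic attribute alongside the geometry mirrors the "where-what" pairing described above; a time stamp could be added in the same way to cover the "when" dimension.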
The demand for spatial data in GIS for transportation applications has grown exponentially since 1990. A lack of spatial data is no longer an issue limiting GIS-T applications. The main issues are quality, sensitivity, and long-term (cumulative error) effect of both transforming a linearly referenced one-dimensional data model to its cartographic representation and transforming two-dimensional (2D) and three-dimensional (3D) data models to linear representations relative to the network.

Every year more spatial data become available. In the past, most spatial data originated from government sources. Recently, however, more and more spatial data are being produced by the private sector, for both specific projects and resale. Some of the existing historical spatial data collected and maintained by government agencies are now considered to be of low quality and inconsistent with data available from newer technologies (4).

Unknown spatial data accuracy is becoming an increasingly common problem. Unknown data quality leads to tentative decisions, increased liability, and loss of productivity. Conversely, decisions based on data of known quality are made with greater confidence and are more easily explained and defended (5). The quality of spatial data is becoming increasingly important as large databases are created for access and exchange by many individuals. The recognition, evaluation, and resolution of errors associated with spatial data are important issues that, until quite recently, have received little attention, as the problems of error and accuracy were largely unknown (3, 5).

Users are becoming increasingly concerned about the quality and reliability of spatial data. There is a need for good quality spatial data, where the term “good quality” is defined by the data’s specific application as well as other information concerning the data quality.
Different users have different perceptions as to the importance of error and accuracy, as the value of spatial data depends on its fitness for a particular purpose. The critical measure of that fitness for use is quality (3, 5). Despite the importance of having information about data quality, information on the accuracy and reliability of spatial data is generally poor or nonexistent. Unfortunately, determining and ensuring the accuracy and integrity of spatial information is complicated (6).

Quality assurance is a basic requirement for performing an application reliably, and any application performed using spatial data should be accompanied by a detailed evaluation of the quality. This evaluation will help to determine if the data adequately represent the information needed to answer the question raised by the application. The quality of spatial data should be assessed and reported as part of each spatial data file of information. In addition, a comprehensive statement of data quality should accompany the transfer of all spatial data. As well, the quality of spatial data used in any analysis should be passed on to the consumer of that analysis. Different data types will tolerate different margins of error and accuracy depending on their specific application (3).

The concern for spatial data quality has increased in recent years because of factors such as the following (2):

• Increased data production by the private sector and nongovernment agencies, which are not governed by uniform quality standards (production of data by national agencies has long been required to conform to national accuracy standards);
• Increased use of GIS for decision support, highlighting the implications of using low-quality data, including the possibility of litigation; and
• Increased reliance on secondary data sources, because of the growth of the Internet, data translators, and data transfer standards, making poor quality data ever easier to get.

2.1.2 Linear Referencing

Transportation data are usually referenced to highway networks by using a 1D linear referencing model. With this model, objects along a network are located using a set of known points on the network and distances and directions from the known points to the objects. All linear referencing methods are based on this concept (7). Many transportation agencies use various spatial measurement techniques to describe events linearly or locate events of the “network profile” and “point profile” spatial components. This technique is commonly known as the Linear Referencing Method (LRM), defined as a “way to identify a specific location with respect to a known point.” A Location Referencing System (LRS) is “a set of office and field procedures that include a highway location reference method” (8).

Theoretical models for referencing linear objects typically use combinations of one, two, or three independent concepts that “anchor” the linear objects to reality.
These three elements are (1) an identifier, (2) a physical linear extent without a reference point, and (3) a temporal linear extent (9).

The basic structural variations (data elements) of linear referencing methods’ theoretical models are largely a function of the event measurement techniques in locating a point or linear object. The event can be identified as an offset from known points or by a series of “control” or reference points (10). Various methods for linear location referencing in transportation have come about because state DOTs need to know where objects and attributes are located on roadways. These roadways can be conveniently modeled as linear features, allowing the application of linear location referencing (11).

2.1.2.1 Linear Referencing Methods

The primary objective of any highway location referencing method is to provide a means for designating and recording the geographic position of specific locations on a highway and for using designations as a key to stored information about locations. A method’s planned application determines its most significant characteristics. Three elements common to all location referencing methods are (1) identification of a known point, (2) measurement from the known point, and (3) direction of measurement (8).

Various measuring systems as well as referencing methods are available to state DOTs. The critical difference between various linear referencing methods is their respective measurement techniques. The event can be identified as an offset from a known point or by a series of reference points. Linear location referencing methods commonly used by transportation agencies include route-mile-point, route-reference-post-offset, route-mile-post-offset, and methods based on link-node models (8, 12). Adams et al. (11) described the various LRMs as well as their advantages and disadvantages. Data collected using one of the LRMs may not be suited for applications based on another method.
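As an illustration of why a cross reference between methods is needed, the following minimal sketch converts a route-reference-post-offset location into a route-mile-point location. It is not a procedure from the report: the route name, post identifiers, mile-point values, and the sign convention for the offset are hypothetical, and a real agency would maintain the post table through field inventory.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReferencePostOffset:
    """A location given as: known point (post), measurement, direction."""
    route: str
    post: str          # identifier of the known point on the route
    offset_mi: float   # signed: positive means increasing route measure


# Hypothetical cross-reference table: (route, post) -> mile point of the post.
POST_MILE_POINTS: Dict[Tuple[str, str], float] = {
    ("SR-1", "RP10"): 10.02,
    ("SR-1", "RP11"): 11.00,
}


def to_mile_point(ref: ReferencePostOffset,
                  post_mile_points: Dict[Tuple[str, str], float]) -> float:
    """Translate a reference-post offset into a route mile point."""
    key = (ref.route, ref.post)
    if key not in post_mile_points:
        # Without the cross reference the location cannot be related
        # to data stored under the other method.
        raise KeyError(f"no mile point recorded for post {key}")
    return post_mile_points[key] + ref.offset_mi


sign_location = ReferencePostOffset("SR-1", "RP10", 0.25)
print(round(to_mile_point(sign_location, POST_MILE_POINTS), 2))
```

The lookup failure branch is the practical point of the sketch: when the table entry is missing, the record is still present but can no longer be used with data keyed by mile point.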
The inability to relate and/or cross reference information results in the effective loss of information.

These LRMs use an offset distance along a highway from a known beginning point to define the position of the item of interest. Such items include attributes of the road and features that exist as part of the road or adjacent to it. Typical attributes include speed limit, pavement type, functional class, traffic volume, number of lanes, and jurisdiction. Common features include intersections, bridges, signs, and guardrailing. Both attributes and features may be of the point or linear type. A point data item is located using a single offset distance, while a linear data item is located using a pair of offset distances (beginning and ending).

2.1.2.2 Linear Referencing System (LRS) Data Model

This section provides an overview of existing LRS data models that serve as a guide to state DOTs in developing their LRS.

The NCHRP 20-27(2) LRS Data Model. The NCHRP 20-27(2) LRS data model was developed in response to a growing awareness of the need to integrate increasing amounts of linearly referenced data used by the transportation community (10). The NCHRP 20-27(2) data model includes multiple linear location referencing methods, multiple cartographic representations, and multiple network representations. Data integration is supported through transformations among methods, networks, and cartographic representations by association with a central object, referred to as a “linear datum.”

The conceptual model for the LRS in NCHRP 20-27(2) (10) was designed to meet four basic functional requirements: (1) determination of unknown locations of items of interest in the field, (2) positioning of these items in location-referenced databases, (3) placement of these items of interest in the field at known locations, and (4) transformation of linear location references among various methods. The model was intended to be generic so as to support as many applications as possible. Therefore, a fundamental “lowest common denominator” was sought as the generic basis for multiple LRMs, because it forms the functional “heart” of a true LRS.

The NCHRP 20-27(2) LRS data model was created to facilitate sharing linearly referenced data across modes and agencies. It provides the framework to manage and transform linearly referenced data. The central notion is a linear datum that supports multiple cartographic representations (at any scale) and multiple network models (for various application areas). The datum consists of anchor points and anchor sections connecting these points. It also provides the fundamental referencing space for transformations among various LRMs, network models, and cartographic representations (10).

The Dueker-Butler GIS Data Model. The Dueker-Butler LRS data model covers a broad set of business rules for all modes of transportation and a wide range of applications (13, 14). The LRS components generally fall into four classes: (1) the geographic network, (2) cartography, (3) the transportation network, and (4) transportation topology.
The model was designed to accomplish the following goals:

• Accommodate the basic forms supported by GIS: point, line, and area (the model focuses on attributes);
• Express point and area features in terms of their relationship to linear features in transportation databases;
• Support fixed- and variable-length segmentation schema;
• Support the four functional requirements of the 20-27(2) model (10);
• Express functional requirements as business rules (data and process requirements) (13, 14, 15, 16, 17); and
• Support non-transportation features of the point and area types, and add a mechanism for expressing the location of linear transportation feature attributes using real-world (2D and 3D) coordinate systems (13).

Generalized Model. The generalized model is a simplification of the NCHRP 20-27(2) (18) conceptual data model. The lower four levels of the NCHRP 20-27(2) model are all composed of linear elements. LRMs use traversals, networks have links, the linear datum has anchor sections, and the cartographic representation has lines. If the constraint against locating events directly on a link, anchor section, or line is relaxed, the NCHRP 20-27(2) model can be compressed into a two-level model. The generalized model has the following characteristics:

• It de-couples linear element types from measurement methods so that measurement methods may be applied to multiple linear element types.
• It generalizes the concept of a linear element in order to enable links and anchor sections to be treated as traversals.
• It allows event locations to be specified against any linear element.
• It formalizes the concept of a location expression as the combination of an LRM (measurement method and linear element type), linear element instance, and distance expression.
• It formalizes the concept of distance expression.
• It enables the generalization of the translation process between locations, linear elements, or LRMs.
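The location expression named in the list above can be sketched as a small set of record types. This is an illustrative reading of the generalized model rather than its published schema: the class and field names, the element identifiers, and the two validation rules are assumptions, and distances are assumed to share the unit of the linear element's length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LRM:
    """An LRM as a pairing of measurement method and linear element type."""
    measurement_method: str      # e.g., "absolute offset from start"
    linear_element_type: str     # e.g., "route", "link", "anchor section"


@dataclass(frozen=True)
class LinearElement:
    element_id: str
    element_type: str
    length: float                # assumed to be in miles here


@dataclass(frozen=True)
class DistanceExpression:
    value: float                 # distance along the element, same unit as length


@dataclass(frozen=True)
class LocationExpression:
    """Combination of an LRM, a linear element instance, and a distance."""
    lrm: LRM
    linear_element: LinearElement
    distance: DistanceExpression

    def __post_init__(self) -> None:
        # Assumed rule: the element must be of the type the LRM is defined on.
        if self.linear_element.element_type != self.lrm.linear_element_type:
            raise ValueError("linear element type does not match the LRM")
        # Assumed rule: the location must fall on the element.
        if not 0.0 <= self.distance.value <= self.linear_element.length:
            raise ValueError("distance lies outside the linear element")


# The same measurement method applied to two linear element types.
route_lrm = LRM("absolute offset from start", "route")
anchor_lrm = LRM("absolute offset from start", "anchor section")

route = LinearElement("SR-1", "route", 42.0)
section = LinearElement("AS-17", "anchor section", 3.5)

on_route = LocationExpression(route_lrm, route, DistanceExpression(10.27))
on_section = LocationExpression(anchor_lrm, section, DistanceExpression(1.2))
print(on_route.linear_element.element_id, on_section.linear_element.element_id)
```

Defining `route_lrm` and `anchor_lrm` with the same measurement method reflects the de-coupling of measurement methods from linear element types; a translation routine between the two would need additional information (such as how the route maps onto anchor sections) that is not modeled here.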
The generalized model has been reduced to two levels by realizing similarities between each of the four lower levels of the NCHRP 20-27(2) model. According to the generalized model, the networks are not required as an intermediary between event locations and the linear datum, as long as tracing is not required or if the linear datum is complete with respect to connectivity. The Network, Linear Datum, and Cartographic Representation levels become LRMs, with linear elements that can directly support event locations. Though not mandatory, the linear datum LRM is still recommended in order to simplify the translation between multiple LRMs (18).

Scarponcini (19) suggested the introduction of a more robust location expression (LX) that provides an association of an event to a location by applying LX = (LRM, LE, DX), where LRM is the linear referencing method, LE the linear element, and DX a distance expression native to the referencing method. In this discussion, the author also mentions the possibility of extending the model to support lateral offsets as well as temporal information. This generalized approach is also suitable for the inclusion of an uncertainty component, which is discussed in the subsequent section.

2.1.3 Summary of State Practices

Information presented in this section is based on a survey of state DOTs and other spatial data users. In all, 33 state DOTs and 3 other organizations were surveyed. The response rate to the survey was just under 30 percent. However, only a few of the respondents provided useful information for the purposes of this research. The information from the survey is useful in providing a reflection of the state practices. The following subsections summarize state practices with regard to the characteristics of spatial data, LRMs, and transformation methods. A discussion of current and emerging transportation applications of spatial data by state DOTs is presented in Section 2.3 of this report.
2.1.3.1 General Characteristics of Spatial Data

Spatial data used by state DOTs and other transportation authorities such as Oak Ridge National Laboratory (ORNL) come from different sources and are used for various applications. Although some states are quite advanced in using new data collecting techniques, others rely on traditional methods. The base maps of the highway network maintained by state DOTs were digitized from 1:24,000 scale aerial photographs, U.S. Geological Survey (USGS) digital ortho quarter quads (DOQQs), or coordinate geometry (COGO). The following sections summarize information on the characteristics of spatial data maintained by various state DOTs. The databases, as well as line, point, and area data profiles, are described. The data source and the measuring methods are also identified.

Arizona. Arizona Department of Transportation (ADOT) has three main spatial data products, each containing a specific type of information. One database contains the state highway network data at a scale of 1:5,000. The data were captured with a GPS device with desired resolution in the range of 1 to 3 m. This database is used for highway inventory, planning, and safety applications. The tolerance level of these applications is 12 m. The second database contains a local streets network for the entire state, at 1:6,000 to 1:24,000 scale. The data were collected using photogrammetry and GPS. The desired resolution is 3 to 15 m with a tolerance of 15 m. This database is used for planning and safety applications. The third database contains highway markers at 1:12,000 scale. GPS and photogrammetry are used in collecting these data with desired resolutions in the range of 3 to 7 m and 12 m tolerance. This database is also used for highway inventory, planning, and safety applications. ADOT’s original data from its 1970s system were modified to meet current ESRI software and in-house business needs. ADOT staff are currently using ArcInfo coverages and ESRI measured shapefiles.

Iowa. Line data, such as highway network data, come from digital computer-aided design and drafting (CADD) files/COGO.
The original resolution was 1:100,000 Digital Line Graph (DLG) data. Information for road plans is obtained from cities and counties, local maps, and local aerial photos. The original resolution of these data was 1:24,000. Line data are updated yearly with information that varies in accuracy. Rail network information is derived from USGS digital ortho quarter quads (DOQQs). Iowa DOT is currently exploring the possibility of using GPS to capture rail network data. Currently, only a U.S. DOT Bureau of Transportation Statistics (BTS) layer is available and it is used for reference.

Point data such as crash locations are located using an Iowa DOT-developed location tool. These are located on maps in GIS software at 1:100,000 scale. The tool uses the map and allows the user to locate the crash on the map. Airport locations are placed on the map cartographically.

Polygon (area) data, such as boundary information, are obtained from various sources and are usually mapped or digitized along with roads. It is assumed that Iowa state boundary information is at 1:100,000 scale and that the city boundaries are more accurate (approximately 1:24,000). The resolution also varies. Boundaries are very difficult to capture from readily available sources (e.g., aerial photos, satellite). Most information comes from boundary descriptions that are then digitized as accurately as possible.

Maine. Examples of events stored in the Maine DOT databases are crash locations, assets (e.g., bridges [>20 ft], struts [10 to 20 ft], culverts [<10 ft]), pavement conditions (roughness, skid resistance), business signs, road signs, spray properties, traffic counts (per minute, hour, and day; by truck and car), and intersections. Events are usually measured as a distance from a node. The origins (i.e., the nodes) are considered to have no inaccuracies (i.e., they are considered error-free). Thus, any errors attached to events originate from the measurement method.
Using accurate distance measuring instruments (DMIs), which yield results that are within the given accuracy requirements, basically eliminates these errors. One exception is crash locations, because they are estimated by the responding police officer (crash locations are also stored as distances from nodes). If a study involves crash locations, the resolution interval of the study is set to 0.3 miles, thus addressing the issue of uncertainty in an indirect way.

North Dakota. In addition to its GIS base map data, the North Dakota Department of Transportation (NDDOT) maintains a database inventory of roadway features, the Roadways Information Management System (RIMS). In that system, NDDOT maintains a location inventory (using dynamic segmentation and route-ID/route-measure coding) for all guardrail, fences, lights, signs, roadway geometry, mile markers, roadway construction, pavement conditions, and numerous other themes in the highway system. These data tables are used within ArcView as event themes referenced against the state’s centerline roadway coverage, which contains the other half of the dynamic segmentation-coding scheme. The design goal for those collecting RIMS information is that the location accuracy be better than the 1/10-mile offsets used for the route measure and, using differential GPS, within a few feet for orthogonal displacement from the road centerline for all features.

Ohio. The main type of spatial data used by the Ohio Department of Transportation (ODOT) is the base network. Most of the data are derived from USGS quads and control points for field survey with DMI, at a scale of 1:24,000. The current resolution is 50 ft, while the desired resolution is 5 ft. The 1947 highway inventory was collected with mechanical DMI. Since then, stations have been converted to mileposts. The use of electronic DMI is calibrated to base and verified inventories.
In addition to the base network, ODOT maintains an inventory of ramps at the same 1:24,000 scale that is derived from DOQQs. Position of points is determined with GPS and traditional methods (spirit and digital leveling instruments). Because coordinates are used, scale is unimportant. Data collected during surveys are archived for each project. Crash location data are derived from descriptions by state and local law enforcement agencies. ODOT also maintains a bridge database that is used for condition assessment and permit vehicle routing.

The majority of data are stored at an accuracy of 52.8 ft. However, much discussion has concerned what accuracy would be appropriate for the DOT considering cost, time, and functionality. The current thinking is that 52.8 ft should satisfy these requirements. Although more accuracy is usually better, increased accuracy has a cost. The other issue is the recent sharing of data with other state agencies and/or local governments. Most of the data received from other agencies are based on 1:24,000 scale USGS quads.

ODOT recently began evaluating the use of GPS for field data collection. Initial results are encouraging. The main issue identified with the use of GPS for the update and maintenance of the LRS has been how to automate the propagation of the changes throughout the highway network as well as the updating of control point information, both attributes and location.

Pennsylvania. Pennsylvania maintains two spatial data products, each containing a specific type of information. The first, the Geographic Names Information System (GNIS), containing quad sheets and points, is derived from USGS. The second database, Global Data Technologies (GDT), contains addresses and centerlines. The nominal scale for these data products is 1:24,000. The line data profile includes highway network and waterway data. These are derived from USGS DOQQs and quad sheets. Point data, such as the location of airports, are collected with GPS by other agencies. Crash location data are estimated by state police with varying precision. Point data for vertical control are collected with traditional geodetic equipment. Polygon data include boundaries and drainage basins.
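Several of the figures quoted in the state profiles above are easier to compare once they are in the same units. The short sketch below is an illustration only: it converts linear-referencing resolutions from miles to feet and shows how much ground distance one inch on a map represents at a given scale. The observation that 52.8 ft equals 0.01 mile is plain arithmetic (5,280 ft per mile); the source does not say that this is how the ODOT figure was chosen. The function names are hypothetical.

```python
FEET_PER_MILE = 5280.0
INCHES_PER_FOOT = 12.0

def miles_to_feet(miles):
    """Convert a distance in miles to feet."""
    return miles * FEET_PER_MILE

def ground_feet_per_map_inch(scale_denominator):
    """Ground distance in feet represented by one inch on a 1:N map."""
    return scale_denominator / INCHES_PER_FOOT

# Linear-referencing resolutions mentioned in the profiles, in feet.
for miles in (0.3, 0.1, 0.01):
    print(f"{miles} mile = {miles_to_feet(miles):.1f} ft")

# Ground distance covered by one map inch at the two common source scales.
for denominator in (24_000, 100_000):
    feet = ground_feet_per_map_inch(denominator)
    print(f"1:{denominator:,} -> 1 in = {feet:,.0f} ft")
```

On a 1:24,000 map one inch covers 2,000 ft, so a 52.8 ft ground distance is only a few hundredths of an inch at map scale, which helps explain why scale is treated as a practical limit on positional accuracy for digitized data.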
Oak Ridge National Laboratory. Oak Ridge National Laboratory (ORNL) relies on spatial data collected by other agencies and maintains four spatial databases. First, the national highway planning network (NHPN) contains the highway network for the entire United States, Canada, and Mexico. This network has a nominal scale of 1:100,000. It is used for planning, routing, and infrastructure applications. Second, the Center for Transportation Analysis railroad network is a rail network for the United States, Canada, and Mexico. The nominal scale for this rail network is 1:100,000, and the network is derived from TIGER, DLG data, and Federal Railroad Administration (FRA) files. This network is also used for planning, routing, and infrastructure applications. Third, the global seaway database contains the waterway network that covers U.S. coastal and inland waters and the world’s oceans. This network is derived from the U.S. Army Corps of Engineers’ National Waterway Network (NWN). The nominal scale is 1:100,000 (U.S.) and 1:3,000,000 (world). This database is used primarily for planning purposes. Fourth, the terminals database contains intermodal facility locations at 1:100,000 scale, derived from surveys. This database covers the United States and is used for planning applications.

2.1.3.2 Linear Referencing Methods and Transformation Methods

This section discusses the LRMs, LRSs, and transformation methods used by state DOTs. There is no uniformity in the LRMs used by state DOTs in collecting and referencing spatial data for transportation applications. Most states use multiple LRMs. State DOTs tend to rely on in-house transformation methods or algorithms that come with the GIS software. Some states use GPS for collecting crash and other point data.

Arizona. The ADOT data model (ATISROADS) has the capabilities of linear referencing, dynamic segmentation, and routing. It does not have address or location geocoding capabilities.
The data model is based on an in-house mainframe system and is modified to include ESRI coverage and route systems. Positional data are not currently collected uniquely for LRS. GIS and LRS are used subsequently for display only, or display and analysis. All files relate to road centerline with offsets and lengths varying from 1 ft to 1 mi.

For the purposes of transformation, Arizona builds intersection tables for each route in a system, with all reference markers (MP) and intersecting features (i.e., roads, jurisdiction boundary, drainage, and rail) and the corresponding route measure for each marker or feature. The corresponding route measure, with plus or minus offset, is used to geocode (linear reference) point or length data along the measured route.

Iowa. The Iowa Department of Transportation (Iowa DOT) does not currently have an implemented LRS. Iowa DOT has several different LRMs that are inconsistently used in the field to measure events/features. Iowa DOT is implementing an LRS model based on NCHRP 20-27(2). This model has linear referencing, dynamic segmentation, and routing capabilities. This model does not, however, have address or location geocoding capabilities. Iowa DOT does not currently follow a specific spatial data model or standard.

Iowa DOT uses several LRMs for different applications as follows:

• Route-Mile-Point is used for Geographic Information Management System (GIMS) and inventory data for creating Highway Performance Monitoring System (HPMS) data and general road inventory. DMI are used for collecting road inventory data along the centerline with a precision of 0.01 mile. This LRM is also used for video-log inventory for right-of-way data using DMI at a resolution of 0.01 mile. It is also used for referencing pavement management.
• Route-Reference-Post-Offset is used primarily for referencing pavement management data collected with DMI and GPS at a 0.01 mile resolution.
• Stationing is used for sign inventory. Stationing is the measurement technique and the resolution is 10 feet. This information is used also to locate and monitor driveways.
• Coordinate Route using 1:100,000 map location is the LRM used for referencing crash locations for crash analysis. GPS is used for data collection with a resolution of 0.01 mile.
• Coordinate Route using GPS is another LRM used for video-logging, inventory, and pavement management data collection.

With regard to transformation, most data are referenced back to GIMS using a link to its segments. Segments vary in length and are created based on about 27 different criteria. Any time one of the criteria is met, a new segment is created. Another translation method used is a cross-reference table that links the GIMS milepoint by route and county for the primary system with the Reference Posts. The final method is to conflate data to the GIMS centerline cartography from Coordinate Route (route name and GPS or DGPS coordinates). Given that the GIMS centerline cartography is based on a point of intersection, and, in most cases, does not have multilane facilities mapped separately, some errors occur when this conflation is done. The new LRS project will use a datum concept from the NCHRP 20-27(2) model to do all transformations between the LRMs.

Maine. The Maine DOT is transforming its old version of the linearly referenced data structure (TINIS) into a future version of a linearly referenced data structure (D-Roads). Both identify a system of links and nodes. Links have a nominal maximum length of 6 miles. The numerical precision of the measured link lengths stored in the two systems was given as ±0.01 mile and ±0.001 mile, respectively.
Their approach is to try to measure all new features within the required margin of accuracy and simply add the numerical precision to transformed events (from TINIS to D-Roads). Since re-measurements of all primary links are scheduled to occur every 2 to 4 years, the accuracy requirements of event data should be met within that timeframe.

D-Roads is based on control nodes and segments between those nodes. Present and future measurements of the distances between nodes (i.e., segments) are collected via a DMI. The endpoints of the segments are defined by existing features, such as town lines, crossings, bridges, or by the nominal maximum 6-mile length. Updates of segment length are not necessarily constrained by the abovementioned update rate of a 4-year maximum. Segment lengths are seen as constants as long as no significant changes to the network itself are applied (e.g., a new bridge or road realignment).

Currently, no multi-dimensional (i.e., 2D or 3D) reference system is in use. There are plans to incorporate GPS-measured coordinates into very task-specific applications. For example, to enforce “no-spray zones,” a survey crew would first measure the location of a no-spray zone using GPS and then provide the spray crew with the acquired coordinates. In general, the spray planning (e.g., determination of amount of spray) is still based on the linear system. No plans exist to merge these two reference systems. The expectation is that both systems will continue to run in parallel.

Ohio. ODOT uses an in-house LRS standard based on county, route, and log (CRL). The LRM is coordinated with base route, with changes at county boundaries. LRM has dynamic segmentation capabilities, but no routing, address, or location geocoding capabilities. Measured accuracy in the region of 1/1000 of a mile is converted to 1/100 of a mile. The base files used as the DOT’s LRS were digitized from USGS quads with location and distance collected using a DMI.
Much of the DOT’s roadway-based data are currently collected using some type of DMI and/or related back to a set of manuals containing LRS information (straight line diagrams). Ohio does not have any unique transformation method because transformation is automatically executed in the software.

Pennsylvania. The Pennsylvania Department of Transportation (PennDOT) uses Intergraph MGE, which has capabilities for linear referencing and dynamic segmentation, but not for routing or address or location geocoding. PennDOT uses the route offset LRM for all spatial data. Traffic, highway maintenance, and project-related data are estimated from LRS with ±50 feet precision. DMI is used for other business data such as shoulder, pavement roughness, road signs inventory, guardrail, and bridge inspection. The precision of DMI is approximately 1 foot.

ORNL. ORNL uses internal (in-house), ad hoc models. These models have capabilities for linear referencing, dynamic segmentation, routing, and address or location geocoding. All standards are internally driven by applications or else provided. The process of transformation invariably involves identification of “control points” (common locations identifiable in both inventory list and network), route construction between them, and interpolation.

2.2 SPATIAL DATA QUALITY

The issue of data quality is continuing to challenge the spatial data community. Data quality is the relationship of the spatial data to the reality that it is attempting to represent (20). The value of any spatial data depends less on its cost and more on its fitness for a particular purpose. Quality of spatial data, therefore, can simply be defined as its fitness for use. This definition enables users to make a judgement for each specific application, and quality is directly based on the extent to which a data set satisfies the needs of the person judging it (6). Data that are appropriate for use with one application may not be fit for use with another. The quality of spatial data depends fully on the scale, accuracy, and extent of the data set, as well as the quality of other data sets to be used (1).

Different transportation applications require spatial data at different scales, and no one scale can support all transportation applications. The life span and multiple uses of spatial data generally require that quality be assessed repeatedly and from different perspectives depending on the type of transportation analysis. Spatial databases must be properly maintained and upgraded in order to maximize their usefulness (e.g., updated with changes in alignment, topology, and referencing systems) (7). Recent concerns over the accuracy and reliability of spatial information in GIS have raised interest in trying to understand the reliability and uncertainty of GIS information. Because of the variety and amount of linearly referenced data that need to be stored in geographic databases, it is crucial to provide reliable, efficient procedures to link all relevant data sources linearly (21).

When used in GIS analysis, a data set’s quality significantly affects confidence in the results. Unknown data quality leads to tentative decisions, increased liability, and loss of productivity (22). The primary objective of data quality standards is to help data recipients and owners evaluate the “fitness for use” of data. Definitions of “fitness for use” vary, based on environment and intended application.
Therefore, a definition of “data quality” should include a sufficiently broad set of criteria to address the full range of possible data characteristics that might affect its application. Setting data quality standards and documenting data quality require considerable forethought. The investment pays off, however, when evaluating the data for use, when sharing the data, and when attempting to communicate the benefits and limits of conclusions based on the data (23).

2.2.1 Measures of Quality

Data quality is expressed in terms of precision, accuracy, and resolution. When referencing location, it is important for the field data collector to be aware of the resolution and precision of the offset needed to report locations (e.g., 0.1, 0.01, 0.001 of a mile/kilometer), and the measurement position (e.g., along the centerline, along the shoulder lane, along the median lane). When using referenced locations for analysis, it is important for the analyst to be aware of the location resolution and precision of reference posts, points, markers, and nodes in the field (11).

Previous research (24) has identified several parameters (i.e., positional accuracy, thematic accuracy, temporal accuracy, logical consistency, completeness, data status, and lineage) as encompassing the quality aspects of geographic information. Considered together, these characteristics indicate the overall quality of a geographic database.

Accuracy. When referring to geographic data, the term “accuracy” is usually described with two components: (1) positional accuracy and (2) attribute accuracy (23). The positional accuracy of a spatial object, or a digital representation of a feature, can be defined through measures of the difference between the apparent location of the feature as recorded in a database and its true location (25).
Positional accuracy refers to the amount of offset present within a data set from the true location of the features being represented, that is, how closely the coordinate descriptions of the features compare with their actual location. This type of accuracy is typically measured directly by comparison with data known to be more accurate or by inferring the amount of error introduced from processing the data; for example, a 1:24,000 scale road network may be tested against a set of GPS-based control points. If detailed positional accuracy analyses are beyond the reach of the project being performed, the data developer should at least document the processing steps and tolerances used, and the accuracy of any source materials compiled.

Attribute accuracy refers to how well the attribute portion of the database describes the geographic features being represented, that is, how thoroughly and correctly the features in the data set are described. Before assessing attribute accuracy, it is necessary to clearly define the interpretation rules used to represent information in the database. Rigorously determining attribute accuracy requires statistical analysis. At a minimum, data developers should document steps taken to ensure the integrity of attribute data.

Resolution. Resolution, or precision, refers to the amount of detail that can be discerned in space, time, or theme. It is directly linked with accuracy and is also used to determine how useful a given database is for a particular application. Two databases with the same accuracy levels but different levels of resolution do not have the same quality.

Data Status. Data status refers to the “currentness” of the data set. When developing data, it is important to maintain records of source material and observation dates used in the compilation. It is also important to maintain records on update cycles (23).

Completeness.
Data completeness refers to the degree to which the data describe the content of the source or phenomena being mapped. Completeness refers to a lack of errors of omission in spatial data. It includes consideration of holes in the data, unclassified areas, and any compilation procedures that may have caused data to be eliminated. Data completeness can be described by listing the features included in the data and whether the data are “completed” or “in progress.” One might also consider what might have been omitted. For example, a particular attribute may have been collected for only part of an area, or perhaps paved roads but not gravel roads appear in a layer (23).

Logical Consistency. Consistency refers to the adherence of the data to a given data structure, that is, the decisions that determine what the data set contains. Logical consistency refers to the absence of apparent contradictions in spatial data. Consistency is a measure of the internal validity of the data and is assessed using information that is contained within the data, which typically includes spatial data inconsistencies such as incorrect line intersections, duplicate lines or boundaries, or gaps in lines. These are referred to as spatial or topological errors. Consistency measures the extent to which geometric problems and drafting inconsistencies exist within the data set. For example, are attribute tables formatted identically throughout the database? Are minimum feature size criteria consistently applied? Are the data topologically correct? Do features of the same type have the same descriptive data and level of detail? Are naming conventions consistent? (23)

Lineage. Lineage refers to a record of all data sources used to construct the spatial data set and all operations that have been taken to process the data. Thorough documentation for all spatial data is essential for determining quality. Information about appropriate ranges of use and scales at which the information is valid should be included with the original spatial data and any derived data sets. Lineage is concerned with historical and compilation aspects of the data, such as source of the data, content of the data, data capture specifications, geographic coverage of the data, compilation method of the data, transformation methods applied to the data, and use of any pertinent algorithms during compilation.
Knowing and documenting the original source of the data and its quality and establishing an audit trail of all transformations and changes that have been applied are essential for evaluating the overall quality of any resulting data set. The same data set that is reasonable for some applications is often not suitable for other applications where high quality is important.

Timeliness. For certain types of spatial data that are constantly changing, such as roads, the quality of the data depends directly on the timeliness of the data. The primary data quality issues are related to authenticating and validating the data and maintaining a detailed historical audit trail of updates for users of the data, so that quality can be verified and publications based on the data can be properly attributed.

2.2.2 Positional Accuracy

Accuracy is often defined generally as a measurement of exactness or correctness. In terms of spatial information, positional accuracy refers to how closely the data represent the real world. Because spatial data usually generalize the real world, it is often difficult to identify a true value. Because the true value of the data is not actually known but can only be estimated, the actual accuracy of the measured quantity is also unknown, and the accuracy of spatial data can only be estimated (26).

For points, accuracy is defined in terms of the distance between the encoded location and the “actual” location. For lines and areas, the situation is more complex because error is a mixture of positional error (error in the location of points along the line) and generalization error (error in the points selected to represent the line) (2).

Positional accuracy has two components: absolute and relative accuracy (1, 27). Absolute accuracy and relative accuracy are considered separately because although spatial data may define a very accurate shape, the shape may not be located correctly (27).
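The distinction between an accurate shape and a correctly located shape can be illustrated with a small sketch. It assumes a hypothetical square parcel whose encoded coordinates are all shifted by the same amount from their actual positions. The two helper functions are illustrative definitions chosen for this example, not measures prescribed by the cited sources: the first takes the distance between each encoded point and its actual location, and the second compares inter-point distances within each set.

```python
from itertools import combinations
from math import dist

def absolute_errors(encoded, actual):
    """Distance between each encoded point and its actual location."""
    return [dist(e, a) for e, a in zip(encoded, actual)]

def relative_errors(encoded, actual):
    """Difference in inter-point distance for every pair of points."""
    pairs = combinations(range(len(actual)), 2)
    return [abs(dist(encoded[i], encoded[j]) - dist(actual[i], actual[j]))
            for i, j in pairs]

# Hypothetical 10 m square, encoded with a uniform shift of (3 m, 4 m).
actual = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
encoded = [(x + 3.0, y + 4.0) for x, y in actual]

print(max(absolute_errors(encoded, actual)))  # every corner is displaced
print(max(relative_errors(encoded, actual)))  # the shape itself is preserved
```

In this example every corner is displaced by the same distance, so the absolute error is nonzero while the distances between corners, and hence the shape, are unchanged.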
Absolute accuracy involves the accuracy of data elements with respect to a coordinate scheme. Absolute accuracy refers to how close a location on a map or data representation is to its real location on the earth. For example, a claim of absolute accuracy might be that 95 percent of the actual locations of wells in a given area are within 50 meters of their surveyed locations.

Relative accuracy concerns the positioning of spatial features relative to one another. Relative accuracy considers how similar a shape on a map or data representation is to the shape of the object on the earth. For example, cutblock boundaries do not vary by more than 10 meters from their actual shape.

The spatial position of an arbitrary object defined within a GIS data layer has a positional error that can be described by one of the primary parameters of positional accuracy. Table 2-1 shows examples of measures and metrics associated with positional accuracy (23). Accuracy addresses concerns for data quality, error, uncertainty, scale, resolution, and precision in spatial data and affects the ways in which it can be used and interpreted (28). Accuracy is always a relative measure, because it is always measured relative to some specification (2).

Two sources of error can reduce positional accuracy (1): inherent error, which is the error present in source documents and data, and operational error, which is all introduced error.

2.2.3 Uncertainty

It has been argued that in the context of geographic data, there is a clear distinction between error and uncertainty (29). “Error” implies that some degree of knowledge has been attained about the difference between the results or observations and the truth to which they pertain.
“Uncertainty,” on the other hand, conveys that it is the lack of knowledge that is responsible for hesitancy in accepting results without caution, and often the term “error” is used when it would be more appropriate to use “uncertainty.”

The term “uncertainty” has gained recent popularity but suffers from inconsistent and ambiguous usage. Mowrer (30) provides a recent compilation of the most frequent interpretations. Geographic Information Science (31, 32, 33), a relatively new field, has emerged as a combination of several different scientific fields (e.g., computer science, geography, surveying, and photogrammetry). Each of these scientific fields has a different view of uncertainty. Some claim, for example, that there is a difference between a situation of risk and one of uncertainty. The distinction is that in a risky situation, a random event comes from a known probability distribution, whereas in an uncertain situation the probability distribution is not known.

With any GIS product there is a level of uncertainty about the nature of its quality. It is important to provide the GIS user with the necessary awareness that these problems exist. Although there is a continuing interest in improving data quality standards (24, 35), commercial GIS packages put little or no effort into calculating and communicating the inherent imperfections to the user (36). Several researchers (e.g., 37, 38, 39, 40, 41, 42) have explored different approaches to handling either a single imperfection (e.g., inaccuracy) or a conglomerate of imperfections (e.g., imprecision and inconsistency).

To improve the management of quality within GIS, it is essential to detect occurrences of imperfections and to clarify some frequently used terms. Steps in this direction have been made over the last several years. The development of a Spatial Data Transfer Standard and other national and international research efforts have been directed at understanding spatial data uncertainty (e.g., 37, 43, 44, 45).

Various approaches to the management of uncertainty have been proposed. For example, the possibility for assessing the fitness for use of spatial information as one form of uncertainty measure has been explored (46). A different approach was offered that emphasized the design of a GIS to avoid misuse of spatial information (47).
The development of an intelligent GIS also has been proposed as a possible approach to managing uncertainty (48). Another approach focused on data quality issues with regard to user interface design (49), while another (50) discussed the relationship between the advantages of high resolution and the disadvantages of the accompanying high costs in GIS.

2.2.3.1 Measures of Uncertainty

Many methods of measuring geospatial uncertainty, such as positional root mean square error (RMSE), have been adopted more or less directly from traditional statistics. Several problems arise in extending them to complex geospatial objects. For example, how does one measure the positional accuracy of a complex geographic curve like a shoreline in ways that are independent of artifacts like point sampling? Several methods have been proposed recently (25), but these have not been tested or assessed for large, realistic data sets.

To be useful, measures of uncertainty for spatial databases should satisfy certain definable criteria that mirror those underlying such traditional measures as RMSE. Goodchild et al. (25) identify the following criteria:

• Insensitivity to implementation details of the digital representation of the feature,
• Insensitivity to outliers,
• Unbiasedness, and
• Minimum variance.

The last two properties can be defined only in the presence of a stochastic model of uncertainty that allows comparison across multiple realizations. For example, if such a model were available for a digitized representation of a line, it would support simulation of a population of realizations of the line, each of which would be equally likely to be observed in reality. The measure of uncertainty would be a parameter of the model and it would be possible to analyze the performance of various procedures for estimating its value.
The RMSE satisfies this requirement for a Gaussian model of uncertainty in point position, but there is no comparable theoretical analysis of measures of uncertainty in complex spatial objects.

Although much is known about measuring uncertainty in individual measurements, the problems of uncertainty in geospatial data are exacerbated by the lack of simple lineage between independent measurements and final product. It is currently impossible, for example, to identify the independent measurements responsible for uncertainty in a single elevation value drawn from a Digital Elevation Model (DEM).

TABLE 2-1 Measures of positional accuracy (23)

Measures | Metrics | Remarks
Absolute accuracy (against reference frame) | RMSE (root mean square error); error ellipse (2D) | Describe the measurement method and error model; can be represented by a vector (2 axes and rotation angle)
Relative accuracy (positional relative to adjacent features) | Error of distance; relative error ellipse (2D) | Describe measurement method; give the random and systematic error components; describe the method (e.g., adjustment)
Accuracy of pixel position | RMSE of pixel position | Describe source of error; representation via a real variable or a vector with random and systematic components
Height accuracy | RMSE of height | Accuracy of height of a single point
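As a rough illustration of the RMSE metric listed in Table 2-1, the sketch below compares a handful of digitized points against more accurate control points, in the spirit of testing a road network against GPS-based control. The coordinates are hypothetical, planar coordinates in meters are assumed, and the tolerance check simply mirrors the style of claim "a given percentage of points lie within a given distance"; none of this reproduces a specific standard's test procedure.

```python
from math import hypot, sqrt

def point_errors(test_points, reference_points):
    """Horizontal distance between each test point and its reference point."""
    if len(test_points) != len(reference_points) or not test_points:
        raise ValueError("need two non-empty point lists of equal length")
    return [hypot(xt - xr, yt - yr)
            for (xt, yt), (xr, yr) in zip(test_points, reference_points)]

def rmse(errors):
    """Root mean square of a list of error magnitudes."""
    return sqrt(sum(e * e for e in errors) / len(errors))

def share_within(errors, tolerance):
    """Fraction of errors that do not exceed the tolerance."""
    return sum(e <= tolerance for e in errors) / len(errors)

# Hypothetical check points (meters): digitized positions vs. GPS control.
digitized = [(103.0, 204.0), (500.0, 500.0), (806.0, 108.0), (50.0, 301.0)]
control = [(100.0, 200.0), (500.0, 500.0), (800.0, 100.0), (50.0, 300.0)]

errors = point_errors(digitized, control)  # offsets of 5, 0, 10, and 1 m
print(f"RMSE: {rmse(errors):.2f} m")
print(f"Share within 5 m: {share_within(errors, 5.0):.0%}")
```

A single RMSE value summarizes the whole set of check points, so it says nothing about where in the data set the larger errors occur.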

Hunter et al. (51) analyzed this situation and showed the importance of knowing the spatial dependence among individual uncertainties in many geospatial applications.

Any measures of uncertainty must consider that uncertainty varies through space and time and is context sensitive. This has important implications for modeling. Certain existing measures have limited utility because they do not describe the spatial distribution of uncertainty (e.g., a single RMSE does not capture variation over space). Thus, an important and desirable characteristic for measures of uncertainty is that they do indeed describe the variation in uncertainty over space. There is also the need for measures to be dynamic (e.g., if they are computed and stored in the database, any updates to the database may require updating stored measures).

It is clear that there are many elements of uncertainty, each of which is ideally measured individually (e.g., positional uncertainty, topological uncertainty, attribute uncertainty, and temporal uncertainty). Given different user contexts, there may be a need for a combined measure of uncertainty that aggregates the measures of individual elements. For example, users in a rapid decision-making environment may not have the time or interest to request a view of each of the individual measures of uncertainty. They may prefer a combined measure that can inform them of the overall uncertainty of a specific piece of information. Important measurement issues thus relate to aggregating individual measures of uncertainty.

It is presumptuous to give general solutions for the primary parameters of data quality that fulfill each useable occurrence in every GIS application. Therefore, measures and metrics can be defined by the data provider, as appropriate for the individual kind of data set and the demands of the user. One might state that the demands of the user cannot be anticipated a priori except in an idealized case.
Nevertheless, only the assumption of the ideal situation, or a situation where the provider at least has an idea of the intended GIS application, allows the provider to choose the parameters to give useful information on accuracy. If all possible accuracy values have to be evaluated, the costs of information on accuracy would be too high.

2.2.4 Spatial Data Errors

All spatial data is inherently inaccurate, as it is only a conceptualization of the reality it tries to represent. The degree of uncertainty associated with spatial data is affected by various factors, which range from measurement error, to inherent variability, to instability, to conceptual ambiguity, to over-abstraction, or to simple ignorance of important model parameters (3). Errors also can be introduced by collection methods, data translation, digitizing methods, source material, generalization, symbol interpretation, specifications of aerial photography, aerotriangulation technique, ground control reliability, photogrammetric characteristics, scribing precision, resolution, and processing algorithms (5). The error associated with any one of the potential sources is often small, but together these errors can significantly affect the accuracy of the spatial data, thereby affecting the potential uses of the data (5).

The method of data collection sets limitations on the selection of the measures of uncertainty and their metrics. Both common surveying methods using tacheometer (or GPS) and aerial photography produce positional error. For the tacheometer, sources of error include orientation error, the scale derived from distance measurement error, and errors from adjustment. Attachments to surveyed points could also introduce instrument error, operator error, and other types of error to the point coordinates. In photogrammetry, one has to consider the resolution, distortion, scale (flight height), transformations (picture coordinates), and bundle block adjustment.
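The remark above that individually small errors can together affect accuracy significantly can be made concrete with a simple error budget. The sketch below is an assumption-laden illustration, not a method described in the source: it treats each error source as an independent, zero-mean random component and combines the components by root-sum-of-squares. The component names and magnitudes are hypothetical. Systematic or correlated errors would not combine this way, and the plain sum is shown as the opposite extreme.

```python
from math import sqrt

def root_sum_square(components):
    """Combine independent error components into one standard error."""
    return sqrt(sum(c * c for c in components))

# Hypothetical one-sigma contributions in meters (illustrative values only).
budget = {
    "orientation": 3.0,
    "distance measurement": 4.0,
    "adjustment": 12.0,
}

combined = root_sum_square(budget.values())  # assumes independent components
worst_case = sum(budget.values())            # assumes fully correlated components

print(f"combined (independent): {combined:.1f} m")
print(f"simple sum (correlated): {worst_case:.1f} m")
```

Under the independence assumption the largest component dominates the total, which suggests that reducing the biggest single error source is usually more effective than refining the small ones.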
Following the initial acquisition of data, a series of cartographic techniques are used to translate this acquired information into mapped information. Errors and inaccuracies introduced at the digitizing stage are largely unpredictable and random in nature. Integrating data from different sources, in different original formats (e.g., points, lines, and areas), at different original scales, and with inherent errors can yield a product of questionable accuracy (1). Common practices in map compilation (e.g., generalization, aggregation, line smoothing, and separation of features) can introduce further inaccuracies (28). Processing of data produces errors such as misuse of logic, generalization, problems of interpretation, mathematical errors, accuracy lost from low precision computations, and rasterization of vector data (28).

The method of processing data also determines the resulting error type and its metric. The use of spatial data in GIS can further reduce the quality of the data. Because most of the spatial data used by GIS is required in predefined formats, the spatial data must be modified to fit the standard. This modification or compression of the data into the acceptable format often reduces the accuracy of the information. Furthermore, once the spatial data are in the acceptable format for use by a GIS, every action that uses the data can generate additional errors and compound existing ones. The errors and inaccuracies associated with spatial data are cumulative and build up through the various processes of data manipulation and analysis (3). Error also can spread to other spatial data that incorporate the data in the GIS (6).

Errors introduced through measurement and processing can be either systematic or random. Examples of more GIS-specific errors are errors of orientation. These include errors from the transformations used while digitizing a paper map or as a result of the orientation of a GIS raster.
During the process of conversion of data into a raster map, the level of granularity changes, which is an additional error source (52, 53).

An example of error propagation modeling deals with methods for visualization of the accuracy of geometrical data. Areas are represented by their boundaries. The vertices of these polygons are treated as stochastic information. The mathematical principle is based on the probability of the location of an arbitrary point within a closed polygon. This model can be used to determine the accuracy of an area segment by overlaying two areas with a map overlay operation. The latter quality model combines variances as well as correlation and systematic errors based on proven theoretical methods.

A data quality model (DQM) is one way of integrating and presenting uncertainty information to a GIS user. The DQM is a subschema in the concept of metadata (35). It provides essential additional information to assess the decisions made with the help of a GIS. A model of the real world requires transformations of the data to reduce the information to the essential quantity. During this process, discrete data obtained from continuous reality introduces errors.

2.2.4.1 Scale, Resolution, and Discretization

As noted in Sinton (54), Chrisman (37), and others, when making measurements, resolution is imposed across the three dimensions of space, theme, and time in the form of discretization. Control is a discretization along one or more dimensions so that another dimension can be measured. The imposition of discretization results in a loss of information that contributes to the uncertainty about the variable or phenomenon being described. In terms of uncertainty, the effects of discretization are likely to be more substantial than measurement error. Work on uncertainty has tended to focus on measurement errors, and yet the effects of discretization may be more substantial. In other words, the imperfections in the measurements are less cause for concern than that which is not measured.

Several researchers discuss the effects of resolution or scale (55) in a broad variety of approaches. Watzek et al. (56), for example, focused on an empirical approach to determine the perceived scale accuracy of computer visual simulations. Bruegger (57) proposed spatial theory models for integrating datasets of different levels of resolution in GISs.
Cushnie (58) discussed the interactive effect of spatial resolution and the degree of internal variability within land-cover types on classification accuracies. Canters et al. (59) and Moody et al. (60) take a different approach that focuses on the errors introduced in land-cover proportions due to varying scale. Burrough (61) and Oliver et al. (62) investigated comparable methodologies, concentrating on the influence of variations in a continuous field. An application-specific approach (e.g., road density estimates) of scale-dependent accuracies can be found in Wade et al. (63). On a global scale, Townsend et al. (64) elaborate on the effects of resolution in conjunction with a specific application: global monitoring of land transformations. Similar effects such as aggregation and support are discussed in Heuvelink (65). Prisley et al. (66) investigated the effects that the underlying variation in the attribute variable had on the GIS-based decisions.

The influence of discretization on the quality of spatial representations has not been addressed in any systematic way. Researchers van Groenigan (67) and Burrough et al. (33) address a similar problem in a slightly different way. They are interested in optimizing the layout of a sample field. However, in their approach, the underlying variation of the attribute does not play a central role. Their approach is based on an a priori optimization, whereas this study is interested in estimating the loss of information a posteriori. Their approach is directed toward data producers, whereas this study concentrates on providing the user with helpful information on the inherent uncertainty. In general, the overall reliability of a spatial representation is less influenced by the accuracy or precision of a measurement than by the number, density, or spacing interval of the measurements. Accuracy measures are most often associated with well-defined points, which have little to say about unmeasured locations.
Discretization is an implicit measure of what is not known or what might be missing as a result of the discretization.

2.2.5 Spatial Data Standards

Quality assurance is a basic requirement for reliably performing an application, and all applications should be accompanied by a detailed evaluation of the fitness-of-use of the data used (to examine whether the data represent the information needed to answer the question raised by the application). A statement of accuracy generally includes a statistical determination of uncertainty and variation, as well as how and when the information was collected (27). Often a statement of accuracy is accompanied by the confidence level of the spatial data, which is defined as the probability that the true value of the data falls within a range of given values (26).

Standards provide for consistency among data, users, and systems. Most accuracy standards for spatial data require a standard for the horizontal component of accuracy and another standard for the vertical component of accuracy, as well as a description of the method used to evaluate the accuracy (26). The reporting standard in the horizontal component is the radius of a circle of uncertainty, such that the true or theoretical location of the point falls within that circle 95 percent of the time. The reporting standard in the vertical component is a linear uncertainty value, such that the true or theoretical location of the point falls within plus or minus that linear uncertainty value 95 percent of the time.

The method used to evaluate accuracy (e.g., statistical testing, least squares adjustment results, comparison with values of higher accuracy, repeat measurements, or estimation) should be described. Comprehensive statements of spatial data quality should accompany the use or transfer of all spatial data, because it is not feasible to remove error entirely from spatial data sets, although a reduction of error is possible.
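The 95 percent reporting convention described in this section can be sketched numerically. The example below assumes the multipliers published in the FGDC National Standard for Spatial Data Accuracy (1.7308 times the radial RMSE for the horizontal component when the x and y RMSE values are roughly equal, and 1.9600 times the RMSE for the vertical component), normally distributed errors, and no systematic bias. Whether reference (26) uses exactly these factors is an assumption here, and the function names and check-point discrepancies are hypothetical.

```python
import math

def rmse(errors):
    """Root-mean-square of discrepancies between tested and reference values."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def horizontal_accuracy_95(dx, dy):
    """Radius of the 95 percent circle, assuming RMSE_x is close to RMSE_y."""
    rmse_r = math.sqrt(rmse(dx) ** 2 + rmse(dy) ** 2)
    return 1.7308 * rmse_r

def vertical_accuracy_95(dz):
    """Plus-or-minus linear uncertainty at the 95 percent level."""
    return 1.9600 * rmse(dz)

# Hypothetical check-point discrepancies in metres (far too few for a real test).
dx = [0.8, -1.1, 0.4, -0.6, 1.0]
dy = [-0.9, 0.7, 1.2, -0.5, -0.8]
dz = [0.3, -0.2, 0.5, -0.4, 0.1]

print(f"horizontal accuracy (95%): {horizontal_accuracy_95(dx, dy):.2f} m")
print(f"vertical accuracy (95%): +/- {vertical_accuracy_95(dz):.2f} m")
```

A single pair of numbers like this summarizes the whole data set; it says nothing about how the error varies from place to place, which is the limitation of a single RMSE noted earlier in the chapter.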
The introduction and adoption of spatial data standards addresses the issue of spatial data quality, but heavy reliance on the fitness for use of the data means that most of the responsibility remains in the hands of spatial data users. An awareness of the accuracy of spatial data allows users to make a subjective statement on the quality and reliability of the information (1). Spatial data error cannot be predicted, nor can it be entirely prevented; at best, it can only be coped with (3).

2.3 APPLICATIONS OF POSITIONAL DATA

2.3.1 Introduction

Spatial data has little or no value to transportation applications without any attribute data attached to it. Each spatial data element (a line, a point, or a polygon) has a cartographic representation as well as a unique identifier to associate attribute information with that data element. In contrast, data collected by transportation agencies for their facilities may not have any cartographic representation (i.e., may not be geo-referenced). Data are collected in a network model, a theoretical framework that is applied to and depends on the functionality of different LRMs. Given that the network does not require any cartographic representation (i.e., spatial data element), and attribute data are collected independently from the cartographic representation of the transportation element (i.e., highway segment), it is important to address the issue of sensitivity of applications in transforming various LRM data to the linear datum (cartographic) representation.

Different applications require spatial data at different scales. Vonderohe et al. (68) suggested the use of four spatial database scales for DOT activities. As noted in Table 2-2, the transportation applications of GIS can be divided into three primary functional groups: planning, management, and engineering. Planning applications are usually at statewide and regional levels and do not require highly precise locational data. Spatial databases for these applications are at 1:500,000 to 1000,000 scales. Management applications often require more detailed locational data that are available at regional or district levels. The spatial databases are usually in the 1:100,000 to 1:24,000 range.
Engineering applications require a high level of spatial accuracy and these applications are restricted to project or corridor level. The preferred scales for engineering applications are 1:12,000 to 1:24,000. This grouping suggests that engineering applications are more sensitive to positional data quality than management applications.

A different way of grouping the current and emerging applications of GIS is by transportation subject area. This concept of grouping recognizes that applications within a subject area may include planning, management, and engineering functions. Moreover, grouping by the three functional classes may conflict with the sensitivity of the individual applications to spatial data quality. For example, while crash reporting may not be classified as an engineering activity, identifying crash-prone locations is sensitive to the data quality. Similarly, highway infrastructure management may be classified erroneously as a management function, when it actually involves engineering applications. Table 2-3 shows the current and emerging uses of spatial data in transportation as well as the levels of sensitivity of transportation applications to spatial data quality. These levels are based on state DOT perceptions of the sensitivity of the various applications to positional data quality.

2.3.2 Examples of Applications

Pittman et al. (69) provided an overview of the various transportation applications of spatial data and observed that GIS-T are being effectively used to do the following:

• Provide support for making quality decisions on maintaining the transportation infrastructure,
• Design efficient routes for maintenance operations and serving the riders of transit systems,
• Manage traffic and incidents, and
• Develop multi-year improvement plans that take into account existing roadway characteristics and conditions and crash record information.

O’Neill et al.
(70) identified emerging applications of positional data to include field crew scheduling, customer complaint and response, decision support system, facility management, and policy analysis.

The following are examples of specific projects that demonstrate the applications of GIS in the various subject areas identified in Table 2-3. These examples are provided to illustrate the range of current and emerging applications by state DOTs. These examples do not exhaust the full range of possible current and future applications. Some of the applications overlap two or more application or subject areas or can be classified in more than one category.

TABLE 2-2 Scales and typical applications (68)

Scale of Spatial Database    Precision of Spatial Database (ft)    Typical Activities or Applications
1:500,000                    830                                   Statewide planning
1:100,000                    170                                   District-level planning and facilities management
1:12,000 to 1:24,000         30 to 40                              Engineering
1:120 to 1:1,200             0.33 to 3                             Project-level activities

2.3.2.1 Safety

• North Carolina DOT used a GIS-based referencing system to identify locations with a high probability of truck crashes on truck corridors. The framework allowed visualization of geographic patterns of land use activities associated with frequent crash locations (71).
• Iowa DOT developed a GIS-based crash location and analysis system designed to manage crash data retrieval and analysis. The system also allows analysis of implications of crash location characteristics for emergency response services (72).

2.3.2.2 Transportation Planning, Impact Analysis, and Policy Analysis

• A GIS software application with transportation demand modeling capability is used for freight modeling to help identify highway capacity problems of the national freight transportation system. This study was conducted for the FHWA’s Office of Freight Management and Operations. The primary objective of the highway freight capacity analysis is to develop a policy tool for analyzing potential freight-related policies and examining the sufficiency of capacity of the transportation system in meeting forecast freight demand (73).
• Florida DOT’s office of system planning uses GIS-T in an ad hoc production of maps used to manage and develop the Florida interstate highway systems (69).
TABLE 2-3 Applications of positional data in transportation

Columns: Subject Area; Applications; Sensitivity (L, M, H). The sensitivity marks for each application are not recoverable from this copy.

Safety
- Crash reporting
- Black spot/crash prone location identification
- Traffic safety investigation
- Rail crossing safety analysis
- Pedestrian and bicycle safety analysis
- Incident management
- 911 emergency planning and response

Transportation Planning, Impact Analysis, Policy Analysis
- Travel demand modeling
- Multi-modal freight modeling
- Hazardous materials routing
- Traffic impact analysis

Transit and Public Transport Planning and Operations
- Transit planning
- Transit routing
- Handi-transit
- Real-time tracking and scheduling of buses

Transportation Infrastructure Management and Operations
- Location of facilities (road, highway, airport, port) inventory
- Pavement management system
- Asset management
- Operation (congestion, service)
- Corridor analysis (rail, road, highway)
- Rail/highway information system management

Transportation Design and Construction Planning
- Sources of construction materials
- Right of way
- Road closure and detour
- Construction information
- Field crew scheduling
- Maintenance and operation
  - snow plowing
  - garbage collection
  - street sweeping

ITS Applications
- Traveler Information System
- Integrated Highway Information System (IHIS)
- Integrated Traffic Monitoring System (ITMS)
- Web-based road condition reporting system
- Vehicle Navigation System
- Applications to commercial vehicle operations regulatory enforcement activities

Freight Analysis and Commercial Vehicle Operations
- Fleet management
- Vehicle tracking, guidance, dispatching, and other routing applications
- Permitting
- Freight movement

2.3.2.3 Transit and Public Transport Planning and Operations

• A prototype decision support system was developed in GIS for the Cape Cod Regional Transit Authority. The tool was designed to support operational decisions, which integrate paratransit ridership with regional and community-based fixed-route transit, and planning decisions regarding intermodal transit connections throughout the Cape Cod regions (74).
• The Delaware DOT examined the use of GIS to better understand travel demand and to identify opportunities for transit in New Castle County. GIS was a valuable tool in demonstrating the relationship between transit markets and existing transit service, providing a method to describe travel demand at a very detailed level, and suggesting the best location for park-and-ride facilities and transit centers (75).
• Research was carried out by the Orange County Transportation Authority that proved GIS to be a useful tool to project transit passengers’ mobility patterns with greater accuracy, consequently strengthening the validation database for travel demand forecasting analysis with respect to transit planning (76).

2.3.2.4 Transportation Infrastructure Management and Operations

• For a highway infrastructure management application, a road centerline base map and inventory of transportation infrastructure in Seneca County, Ohio, allowed the county to improve the maintenance of traffic signs, bridges, guardrails, and culverts (77).
• In a pavement management application, dynamic segmentation was used to project transit passengers’ mobility patterns with greater accuracy, consequently strengthening the validation database for travel demand forecasting analysis with respect to transit planning. The necessary data were collected on I-85 in South Carolina, but the study was sponsored by NCDOT (78).
• The New Jersey Turnpike Authority implemented a system integrating Automatic Traffic Surveillance and Control System (ATSCS) technology with GIS-T to improve its transportation operation activities. This example also can be classified under ITS applications.

2.3.2.5 Transportation Design and Construction Planning

• The Maryland State Highway Administration sponsored the use of a GIS model to optimize the selection of geometric designs for highways. GIS was integrated with a Highway Design Optimization Model (HDOM) to compute geographically sensitive costs to be used with an iterative optimization scheme. It was shown that the GIS model provides accurate geographical features, computes location-dependent costs, and transmits these costs to an external program. An example study was carried out for Talbot County, Maryland (79).

2.3.2.6 ITS Applications

• The NJDOT and NJ Transit sponsored a study that investigated the use of an Automatic Vehicle Location (AVL) system to monitor the locations of buses. Information from the AVL is displayed in a GIS that contains data on bus routes, bus stops, intersections, and landmarks. The system required that the positions of the features be accurately determined. The system was tested in a densely built urban area with high-rise buildings, tunnels, and overpasses. Accuracy of the results was within the 30-ft tolerance limit (80).
• Through a public-private partnership between Mobility Technologies (formerly Traffic.com), Pennsylvania DOT, and U.S. DOT, an Integrated Surveillance and Data Management Infrastructure (ISDMI) program was implemented in Pittsburgh and Philadelphia. Real-time traffic data are integrated with a GIS-based freeway management system that stakeholders can readily access. This system is expected to enhance traffic and incident management. The system also provides traveler information to road users (81).
2.3.2.7 Freight Analysis and Commercial Vehicle Operations

• Freight flow characteristics were integrated with GIS for the identification and analysis of the location of transportation facilities and freight generators, freight movement patterns, variation in truck traffic mix by configuration and body type, and truck travel time. The purpose is to examine the specific details of policy options and how these options may affect the operation, modal competition, equipment selection, and response of primary decision-making groups. The study develops a set of metrics that will allow examination of implications of possible federal truck size and weight policy (82).

2.3.3 Emerging and Future GIS-T Applications by State DOTs

Table 2-4 summarizes information from state DOTs as well as the Oak Ridge National Laboratory (ORNL) on current, emerging, and future GIS-T applications. Both current and anticipated future applications vary from state to state. However, several current applications, such as a highway inventory, pavement management, traffic studies, and crash analysis, are common to all states. Vehicle routing (e.g., bus, truck, or permit vehicles) and detour routing are common future applications identified by state DOTs.

2.3.4 Sensitivity of Applications to Positional Data Quality

Knowledge of the uncertainty associated with geographic information is critical to the effective use and credibility of GIS and GIS outputs. The key components of a research agenda for uncertainty have been identified as modeling, propagation, communication, fitness-for-use assessment, and uncertainty absorption (46). The “truth in labeling” concept is aimed at providing users with information to help assess fitness for use of data. However, the lack of actual procedures for this assessment means that, in many cases, valuable data quality statements remain under-utilized. Agumya et al. (83) discussed risk management techniques in assessing fitness for use of geographic information by translating uncertainty in the information into risk in the decision.

The sensitivity of transportation applications to positional data accuracy can be assessed either by standards-based methods or by a risk-based approach. The traditional method to assess the acceptability or fitness of use, the standards-based method, compares data uncertainty with a set of standards that defines acceptable levels of uncertainty in the data (36). This approach measures the sensitivity of the positional data for a particular application by directly comparing the quality elements of information against a set of standards or error benchmarks that represent the acceptability of the data components. Although uncertainty in spatial data is composed of several well-known elements (84), the obvious measurable ones are map scale (resolution), currency, attribute accuracy, and percentage of completeness.
However, measures of these elements are difficult to combine into a single, meaningful, composite unit (85) and require testing the sensitivity of the application to error associated with each element. A typical example would be U.S. census TIGER street centerline spatial data, which are used for urban transportation modeling applications. There is no means of separating the individual error effects of poor map scale (e.g., positional accuracy of the street segments), logical consistency (e.g., street network topology), attribute accuracy (e.g., travel time), or completeness (e.g., missing street segments) (86).

TABLE 2-4 Summary of state DOT applications

Columns: State/Organization; Current Applications; Future/Potential Applications. Entries are listed in source order.

Arizona
• Planning
• Safety analysis
• Incident detours
• Highway closures and restrictions
• Asset management
• Feature inventory
• Detour routing

Iowa
• Road inventory
• Pavement management
• Crash analysis and reporting
• Highway inventory
• Travel demand modeling
• Automated overweight/oversized truck routing
• Safety inventory
• Sign inventory
• Automated traveler information system (ATIS)
• ITS (emergency and construction routing)
• Automatic vehicle location (AVL)

Ohio
• Planning applications:
  - environmental impact studies (wetland studies)
  - historical and archeological studies
  - highway safety (location of crashes)
  - congestion management
  - level of service
  - statewide travel demand modeling
• Pavement management
• Traffic studies (impact, design)
• ITS applications
• Bus routing
• Intermodal
• Freight analysis
• Emergency evacuation
• Traffic demand modeling

Pennsylvania
• Crash analysis
• Right of way
• Pavement management
• Traffic studies
• Linear reference control
• Environmental reviews including cultural resource
• Environmental permitting
• Wetland mitigation
• Address matching

ORNL
• Planning
• Routing
• Infrastructure management
• Development of hierarchical networks to integrate functions e.g., inventory, navigation, strategic routing

A risk-based approach, in which the sensitivity of an application is measured against the adverse effect of the ultimate decision, is based on the results of the analysis. Agumya et al. (86) stated that the “risk-based approach is a technique based on risk management practices, in which a study is made of the effect that uncertainty in the data has upon the ultimate decision to be made with it. In turn the adverse consequences of making a poor decision are quantified, and it is this information which enables a user to determine whether a data set is fit for use or not.” Risk analysis has already been suggested as a plausible basis for characterizing and estimating the consequences of uncertainty in spatial data (38). In the earlier example, the risk-based approach would have determined the consequence and liability associated with this particular application, by using the TIGER street line spatial data and formulating a strategy for reducing this liability or consequence in the most cost-effective manner.

The sensitivity assessment of positional data under this approach would require addressing two fundamental questions (83):

• What are the consequences associated with the decision, in terms of risk, in using a particular set of spatial data with error in different transportation applications?
• What are the acceptable consequences of uncertainty in terms of risk?

The first question entails the partition of spatial data error for a particular dataset into its various elements, the determination of the risk a transportation analyst may incur by making the decision based on the dataset, and the extent to which this dataset influences the decisions.
If the positional accuracy of the dataset has the lesser effect on the decision, such as traffic or freight assignment using a TIGER street file, then it is reasonable to accept the risk and uncertainty associated with this particular application. However, for vehicle navigation purposes, the risk may still be too high to be acceptable.

The second question entails establishing a threshold for the risk that is considered acceptable. The acceptability of risk may vary widely among the data users and depend on the nature of the applications. The acceptability of project-level analysis or a decision is more conservative than the planning-level transportation application. For a given spatial dataset (e.g., TIGER street file), acceptability of the positional accuracy is much higher.

2.3.4.1 State Practices—Sensitivity of Applications to Positional Data Quality

This section discusses perceptions of the sensitivity of various transportation applications to positional data quality. Most respondents were unable to provide any meaningful responses to the question of how the quality of data affects or is taken into account in various transportation applications. Applications such as commercial vehicle operations or regulatory and policy analyses are less sensitive to the accuracy of positional data than highway design, construction planning, and infrastructure management applications.

Iowa. The Iowa DOT is creating an LRS, based on a datum as part of the LRS Development Project Pilot (scheduled to be completed in June 2001). A needs assessment was completed as part of that project. Part of that assessment identified user accuracy requirements. These accuracy requirements were quite diverse, even for events/features in the same database. The consensus (including cost considerations) was that the achievable accuracy was 10 meters along the roadway.
Given that location is the basis for integrating the data, the accuracy along the centerline becomes one of the most important aspects. As technology improves and becomes more economical, Iowa will no doubt increase the accuracy of the datum locations. This will be necessary so that the business data mapped against the datum will not be degraded if the business data are more accurate than the datum.

ORNL. The accuracy has to be better than the size of the objects. Roadway segments are rarely less than 40 meters, so there is little benefit for accuracy better than 20 meters. Nevertheless, 100-meter accuracy is still useable if that is all that is available. ORNL’s experience has been that other sources of ambiguity, such as unequal spacing of mileposts, dominate locational error.

2.3.4.2 State Practices—Effects of Data Quality on Decisions

The quality of positional data influences decisions relating to different applications. For planning and management applications that do not require high accuracy of positional data, a general idea of the quality of data may be sufficient to make decisions. However, for engineering applications where specificity is critical, the quality of data receives more emphasis in making decisions. In the absence of knowledge of the quality of positional data, states tend to rely on the standards to guide the assessment of the data quality. Further, applications are designed around available accuracy or quality of data.

Arizona. Accuracy of positional data is adequate for planning, statistics, and inventory. ADOT noted that one adverse effect of using spatial data that do not meet the minimum quality standards, or data with uncertain accuracy, is difficulty in coordinating with other data.

Most decisions are not currently made on readily available spatial data. Initial analysis may be performed so that more exact field surveys can be obtained. At that point, engineering-accurate surveys provide the spatial information needed to make decisions. Even in the areas of pavement management, crash analysis, and ITS, the current spatial accuracy is used mostly for a general description of the location, not as an engineering decision-making tool. As the LRS is developed and uses the location to integrate the data, more dependence will be placed on the location/linear accuracy.

In general, the USGS ortho photos (1-meter pixels) will probably meet most accuracy requirements. ADOT is getting hard measurements to confirm that the expected accuracy in a “flat” state like Iowa will be substantially better than the nominal accuracy stated by the USGS. ADOT is also acquiring higher accuracy orthos from local governments, as they become available, with 6 inches, 1 foot, and 2 feet pixels. The 11/2 feet pixel sizes will definitely meet all but the most stringent requirements. These sources vary in spatial resolution from 1:1,000 to 1:12,000.

Obviously, spatial data that fail to meet minimum accuracy standards can cause incorrect decisions to be made or require that analyses be verified using costly fieldwork. In some cases, limited accuracies will mean that the data are not useable (e.g., 15-meter panchromatic spatial images are too coarse for most transportation needs). In some instances, ADOT receives data from other state agencies with a resolution of 1:1,000,000 or less. Such data are only useful for very macro-level analysis.

Ohio. When data fail to meet minimum quality standards, it is evident during processing when the coordinates do not fit. A decision has to be made whether to use existing data or new data. That decision depends on the project. For example, in culvert replacement, vertical alignment accuracy is critical, while horizontal alignment is not so critical.
For bridges, the position of piers and elevation require higher levels of accuracy, while in boundary work accuracy is not very important, so they use the state minimum as a guide.

The standard used depends on the type of survey. National Geodetic Survey (NGS) specifications are used for certain types of surveys and second-order NGS specifications are used for the control of engineering designs, for example, center line points (1:50,000). The state minimum is 1:5,000. However, the NGS specification is not always followed.

ORNL. Applications are designed around available accuracy; the need for more accuracy seldom arises. Applications are more dependent on attribute accuracy and currency, where the scale of the objects is substantially under geographic accuracy. All applications have error rates, and more accuracy will reduce these. The biggest problems have been in facility locations on networks such as bridges and railroad grade crossings. An improvement from 100 meters to 20 meters of maximum error would reduce location-caused error rates from 10 percent to near zero.

2.4 MEASUREMENT SYSTEMS

2.4.1 Introduction

Several techniques are available to measure the positions of objects or events to be mapped to the highway network. Examples include milepost-referencing, distance measuring instruments (DMIs), surveying, aerial and satellite imagery, and GPS. Techniques such as milepost-referencing and DMI techniques measure the positions of objects along linear paths directly. In many applications, however, use of these techniques is either not possible or not practical. Examples include real-time emergency vehicle routing, automatic vehicle location (AVL), and monitoring of construction equipment. For these applications, techniques such as aerial/satellite imagery and GPS techniques are more feasible. Difficulties arise with the use of these techniques, however, because the 2D (or 3D) positional data must be mapped to a 1D linear reference.
In most cases, this data mapping is done with the help of a GIS (21).

This section summarizes the different methods used in transportation for measuring positions and for locating vehicles. The objective is to describe each system and indicate the levels of accuracy it can achieve. Section 2.4.2 covers measuring methods that deal with locating roadway features for creating maps or geographic databases. Section 2.4.3 discusses positioning methods commonly used to determine the current location of a vehicle in real time. In this case, the actual measurement is immediately used for navigation or vehicle tracking. In each case, the measuring device and measuring method are described. In addition to the descriptions in the following sections, details of the measuring and positioning methods are summarized in Table 2-5.

2.4.2 Measuring Methods

2.4.2.1 Aerial

Photogrammetry. The fundamental principle used in aerial photogrammetry is triangulation. Aerial photographs are taken with an airplane or a helicopter. By taking photographs from at least two different locations, so-called “lines of sight” can be developed from each camera to points on the object. These lines of sight (sometimes called rays because of their optical nature) are mathematically intersected to produce the 3D coordinates of the points of interest. At a minimum, one needs two different photographs to reconstruct the 3D world.

To triangulate a set of points, one must also know the camera position and aiming angles (together called the orientation) for all the pictures in the set. The orientation can be computed using ground control points or by installing survey-grade GPS in the aircraft. Aerial triangulation ties blocks of aerial photos together and simultaneously computes the orientation parameters of all photographs.
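The ray intersection described above can be illustrated with a minimal sketch. It assumes the camera positions and ray directions are already known in a common ground coordinate system (that is, the orientation has been solved), and it uses the midpoint of the shortest segment between the two rays, since measured rays rarely intersect exactly. The function name and the sample coordinates are hypothetical and are not taken from the report.

```python
def intersect_rays(c1, d1, c2, d2):
    """Approximate the 3D point seen along two lines of sight.

    c1, c2: camera positions (x, y, z); d1, d2: ray direction vectors.
    Returns the midpoint of the shortest segment joining the two rays.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]      # vector from camera 2 to camera 1
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique intersection")
    s = (b * e - c * d) / denom              # parameter along ray 1
    t = (a * e - b * d) / denom              # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Hypothetical example: two exposure stations 600 m apart at 1,000 m height,
# both observing the same ground point.
target = (200.0, 100.0, 0.0)
cam1, cam2 = (0.0, 0.0, 1000.0), (600.0, 0.0, 1000.0)
ray1 = tuple(t - c for t, c in zip(target, cam1))
ray2 = tuple(t - c for t, c in zip(target, cam2))
point = intersect_rays(cam1, ray1, cam2, ray2)
```

Production photogrammetric software solves this as a least-squares adjustment over many rays and photographs; the two-ray case is shown only to make the geometry concrete.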

Once the orientation is available, analytical stereo plotters or digital photogrammetric workstations are used to extract spatial data. The stereo plotter operator views a 3D model in his or her workstation and, using a 3D cursor, traces the lines to be added to the map (e.g., road centerlines, intersections, or contours).

Aerial photogrammetry is used to take measurements in x,y,z coordinates. High-resolution aerial photos are used for highway engineering and roadway design, while lower-resolution photos can be used for GIS base mapping (to extract road centerlines). The accuracy of measurements with aerial photogrammetry depends on the image scale at which aerial photos were collected and on the pixel resolution in the case of digital images. Accuracy of 3 to 5 inches and an image collection scale of 1 inch = 100 feet can be achieved with regular aerial photographs and high-quality stereo plotters. The productivity of this measuring method is limited by the capabilities of the stereo plotter operator. The data extraction process is mostly manual, although there is promising research for automatically extracting road centerlines and road edges. Data with high-resolution photos are used for project-level applications, while network-level applications require data with lower-resolution photos.

Ground Control Points (GCPs) are typically established using GPS to serve as location reference points; however, aerial GPS is becoming more popular and reduces the number of GCPs required.

Orthophotography. A digital orthophoto is a rasterized (scanned) aerial photograph, which is fully rectified to remove all of the distortions that occur in the original image: the pitch and roll of the aircraft, the radial distortion from the camera lens, and the image displacement from the topography. The removal of these distortions results in the imagery becoming a true scale representation of the ground. The orthophotos can be used for 2D digitizing on a computer screen.
With digital orthophotos, all of the information on the original photograph is on the rectified image and is located in its true position.

The standard DOQs (digital ortho quads) produced by the U.S. Geological Survey (USGS) are either gray-scale or color-infrared (CIR) images with a 1-meter ground resolution; they cover an area measuring 3.75 minutes longitude by 3.75 minutes latitude, or approximately 5 miles on each side. Each DOQ has between 50 and 300 meters of overedge image beyond the latitude and longitude corner crosses embedded in

TABLE 2-5 Characteristics of measuring methods

| Methods | Description | Measurements | Applications | Accuracy |
|---|---|---|---|---|
| MEASURING: Aerial | | | | |
| Photogrammetry | Stereo-plotting, digital and analog | x,y,z | Engineering, design, GIS basemapping | 3-5 inches |
| Orthophotos | On-screen digitizing | x,y | Design, GIS, direct basemapping | 1.5 ft w/ 0.5 ft resolution; 6 ft w/ 3 ft resolution |
| LIDAR | Automatic height measurement using laser | z(x,y) | Engineering, design, digital elevation models | 4 inches |
| MEASURING: Ground, Vehicle Based | | | | |
| Mobile Mapping | Global Positioning System (GPS) / Inertial Navigation System (INS) / digital stereo measuring | x,y,z | GIS asset inventory, mapping, engineering | < 1 meter |
| Video-Logging | DMI w/ GPS / single video camera | x,y of vehicle; distance (D), offset (∆O) | Inventorying, pavement condition analysis | 3-10 meters |
| Distance Measuring Instruments (DMIs) | DMI w/ data logger | D | Asset inventorying | > 1 meter (% of distance) |
| MEASURING: Ground, Surveying | | | | |
| Wheel | Operator walks w/ wheel (like DMI) and measures distances relative to stations defined in a map | Relative distance (∆D) | Crash investigation, local surveys for maintenance and planning | 2 feet + 2% of (∆d) |
| Kinematic GPS | Dual frequency carrier phase with base stations | x,y,z | Engineering design, property surveys | 1-5 inches |
| Differential GPS | Pseudo ranges w/ real time differential conditions | x,y (z) | Asset inventorying | 5-10 feet |
| Laser Ranging | Laser gun with compass and inclinometer to determine location of objects | ∆d, angle (α), (x,y,z) | Asset inventorying | 1 inch + % of (∆d) |
| Total Stations (theodolite) | Land surveying with theodolite/electronic distance measuring system (EDMs) | x,y,z | Engineering design, property surveys | 1-5 inches |
| Map Digitizing | Paper maps are placed on digitizing tablet | x,y | GIS basemapping, legacy data conversion | 5-50 feet |
| POSITIONING: Qualitative/Approximate Locating | | | | |
| Distance from landmark | Estimate distance from landmark | ∆d | Crash reports and investigation, emergency response (EMS-911), roadway maintenance crews | 100-300 feet |
| Distance from intersection | Estimate distance from intersection | ∆d | Same as above | 100-300 feet |
| Distance from milepost marker | Estimate distance from milepost marker | ∆d | Same as above | 100-300 feet |
| Address | Address number | Address number (#n) | Same as above | 100-300 feet |
| POSITIONING: Automatic Vehicle Location | | | | |
| GPS for car navigation | GPS, compass and odometer data are merged with street maps to keep track of vehicle and show its current location | x,y, α | Vehicle tracking, routing, car navigation, emergency dispatch | 10-50 feet |
| Compass | Same as above | α | Same as above | > 10 degrees |
| Odometer | Same as above | ∆d | Same as above | > 20 ft (+ 5% of distance) |

the image. All DOQs are referenced to the North American Datum of 1983 (NAD 83) and cast on the Universal Transverse Mercator (UTM) projection. The file size of a gray-scale DOQ is 40 to 45 megabytes, and a CIR DOQ can be three times this size.

Digital orthophotos are a standard product commonly used as a base map for GIS. Typical resolutions for countywide mapping projects are 2 to 3 feet. In cities, resolutions of 6 inches and 1 foot are used. From these orthos a limited number of roadway features can be extracted.

Digital orthophotos are incorporated in GIS. They function as a cartographic base for displaying, generating, and modifying associated digital planimetric data. Other applications include vegetation and timber management, routing and habitat analysis, environmental impact assessments, emergency evacuation planning, flood analysis, soil erosion assessment, facility management, and groundwater and watershed analysis. Orthos created from satellite images are sometimes used to create statewide road centerline maps.

The accuracy of orthophotos ranges from 1.5 feet with 0.5-foot pixel ground resolution to 6 feet with 3-foot ground resolution. The horizontal accuracy of a DOQ is typically around 3 meters (i.e., orthophoto error is typically three times the pixel resolution). Similar to aerial photogrammetry, GCPs are typically established using GPS and aerial GPS. Digitizing is manual, however, and no special equipment is needed, as the orthophoto can be directly used as the base map. Orthophotos are mostly used for network-level applications.

LIDAR. This system automatically measures elevations using laser technology known as Light Detection and Ranging (LIDAR). The laser system is mounted on an aircraft, along with a GPS and an Inertial Measuring Unit (IMU). The GPS derives the laser’s latitude, longitude, and height. The IMU provides information on the aircraft’s roll, pitch, and yaw.
Using these measurements, a computer can calculate the position of the laser as a function of time.

As the aircraft proceeds along the flight path, the laser oscillates back and forth perpendicular to the aircraft’s direction, while rapidly sending and receiving laser pulses that reflect off the earth’s surface. Utilizing the information on the position and attitude of the sensor, the elapsed time between laser pulse and sensor retrieval, and the speed of light constant, a large series of x, y, and z ground surface points are collected. These points are then transformed into a regular digital elevation model (DEM).

LIDAR creates three-dimensional surface points. However, because these points do not correspond to a specific feature, the horizontal component is of limited value. LIDAR is used for engineering design projects as well as DEMs along roadways.

TABLE 2-5 (Continued)

| Methods | Reference Point Locations | Data Collection Vehicle | Productivity | Level of Data |
|---|---|---|---|---|
| MEASURING: Aerial | | | | |
| Photogrammetry | Ground Control Points (GCP), aerial GPS | Airplane/helicopter | Manual post processing | Project |
| Orthophotos | GCP and aerial GPS | Airplane | Direct use as basemap | Network |
| LIDAR | Aerial GPS and inertial system | Helicopter/airplane | Real-time heights, post processing for DEM | Project |
| MEASURING: Ground, Vehicle Based | | | | |
| Mobile Mapping | GPS base stations - HARN | Van | Manual and semi-automatic post processing | Network, Project |
| Video-Logging | COARSE – Coast Guard GPS reference stations | Van | Visual image inspection | Network |
| Distance Measuring Instruments (DMIs) | Intersections (anchor points, nodes) | Van/Car | Real-time data logging in vehicle | Network |
| MEASURING: Ground, Surveying | | | | |
| Wheel | Stationing along roads | Person walking | Measurements recorded on printed map or notepad | Project |
| Kinematic GPS | HARN, first-order GPS reference stations | Person walking (tripod) | Data collector connected to receiver | Project |
| Differential GPS | COARSE – Coast Guard GPS reference stations | Person walking | Data collector connected to receiver | Project/Network |
| Laser Ranging | Local reference points, GCPs or GPS | Person walking (bipod) | Data collector connected to laser | Project/Network |
| Total Stations (theodolite) | HARN, first-order GPS reference stations | Instrument static on tripod | Data collector connected to total station | Project |
| Map Digitizing | GCPs | Person working in the office | Recorded on computer, cleanup in CAD system | Network |
| POSITIONING: Qualitative/Approximate Locating | | | | |
| Distance from landmark | Landmark | Person walking or driving in a car records data on paper | Recorded in the field in real time | Network |
| Distance from intersection | Intersection | Same as above | Same as above | Network |
| Distance from milepost marker | Milepost marker | Same as above | Same as above | Network |
| Address | Street segment, block | Same as above | Same as above | Network |
| POSITIONING: Automatic Vehicle Location | | | | |
| GPS for car navigation | GPS satellites | Sensor(s) installed in car or truck | Data are recorded and merged with map in real time to continuously show the location of the vehicle | Network |
| Compass | Magnetic north | Same as above | Same as above | Network |
| Odometer | Start of travel | Same as above | Same as above | Network |

The accuracy level of this measuring method is about 4 inches in the vertical direction. Typically, the laser is flown at altitudes of 5,000 to 8,000 feet above the ground surface. Theoretically, this can produce a horizontal accuracy of ±0.4 meters and a vertical accuracy of ±0.15 meters. Part of the accuracy equation is the accuracy of the GPS used to locate each LIDAR pulse return point. GPS is usually accurate to 5 or 7 centimeters.

Aerial GPS and inertial systems are used as the reference points of control. The data collection vehicle is either a helicopter or an airplane. LIDAR can be flown by either type of aircraft; the selection usually depends on the altitude of the flight. The system is fully automatic; an operator is not needed when processing DEMs. The large quantity of data created and the narrow swath of the LIDAR system limit its application to the project level.

2.4.2.2 Ground Vehicle-Based

Mobile Mapping. A mobile mapping van is equipped with a survey-grade, kinematic GPS receiver; an INS (Inertial Navigation System) unit; and up to five digital cameras. GPS data are used to determine the position of the van at any time, while the digital cameras capture high-resolution color images pointing forward and to the road right of way, showing a “windshield view” of roadside assets and condition. Each pair of digital cameras can be used to measure the spatial locations of roadway features.

This method of inventorying highway infrastructure and integrating the images into Infrastructure Management System (IMS) databases is considerably more efficient than traditional approaches. This system allows users to create GIS base maps and infrastructure management systems at an affordable cost and with a short turn-around time.

Measurements can be made in x,y,z coordinates; this system creates real 3D coordinates. Data collected with this method are used in GIS for various applications, including asset inventory, mapping, and engineering.
Measurements are accurate to better than 1 meter.

GPS base stations are usually set up at High Accuracy Reference Network (HARN) points or other first-order reference points. In addition, data are manually and semi-automatically post-processed. The system can be used in both network- and project-level data collection; however, it is most efficient if the roadway mileage of a project is more than 20 miles.

Video-Logging. For video-logging applications, digital or analog right-of-way images are captured in a single pass driving along a roadway. Some vendors offer multiple cameras configured to provide a 130-degree panoramic view, similar to a driver’s view. Other agencies configure the right-side camera to provide a roadside view for environmental applications. Images are typically captured at predetermined intervals, usually 100 frames per mile, which equals a spacing of 53 feet between images. Images are usually stored on videotapes; newer systems deliver digital video. A video banner describing the roadway ID, date, time, and milepost can be optionally burned to the images. The location of the vehicle is determined by distance measurement instruments (DMIs) and/or real-time, differential GPS.

This method is used to collect data on a vehicle in x,y coordinates: distance (D) and offset (∆O). Accuracy of data collection is 3 to 10 meters. Images are collected in real time; visual image inspection is used to extract asset information. The data collected with this method are used in inventorying and pavement condition analysis. Continuously Operating Reference Stations (CORS) are the reference point locations.

Distance Measuring Instruments (DMIs). DMIs are used for measuring distances and are installed in a car or van, combined with a data logger. The DMI needs to be initialized at a known reference point: an intersection or other log point. When the vehicle moves along the road, the accurate distance is recorded.
For example, the NITESTAR® distance measuring instrument can measure distances to ±1 foot over the course of 1 mile. NITESTAR® has been designed to make distance measuring easy, and it is linked to a special keyboard for data logging. NITESTAR® has internal memory to store numerous events along with the distance at which they occur.

DMIs are used to collect data for asset inventorying (e.g., paint line length, guide rail length, pole or sign spacing, cable or pipeline length, truck, bus, or postal routes, E-911 address locating, crash reconstruction, and roadway and railway lengths). The accuracy level is greater than 1 meter, based on a percentage of distance, ±1 foot per mile. Intersections (anchor points, nodes) serve as reference points.

2.4.2.3 Ground Surveying

Wheel. In ground surveying, the operator walks with a measuring wheel (like DMI) and measures distances relative to stations (visible reference points) defined on a map. Wheels range in size from 4 to 25 inches in radius. This method is used for measuring relative distance (∆D); the counter measures up to 100,000 units (feet or meters). It is used for crash investigation, local surveys for maintenance, planning at the city and county level, and telecommunications inventorying. The accuracy is around 2 feet plus 2 percent of the measured distance. Stations along roads are used as reference points. Distance measurements have to be recorded manually on a printed map or notepad. The data are used for project-level applications.

GPS. Recent significant advances in roadway mapping reflect the use of combined technologies (e.g., GPS, dead reckoning technique). GPS is increasingly being used to obtain coordinate data associated with events and to generate GIS-based vector drawings to map those events to the network.

Positional accuracy varies depending on the data collection equipment used.
GPS positional accuracy is much finer than that obtained with traditional maps (e.g., with TIGER files) and maintains tighter control for the location of linear

features and events. GPS data positional accuracy is typically expressed in 2D or 3D, for example, in terms of circular error probability (CEP) or spherical error probability (SEP). Thus, it is necessary to transform these accuracy measures into 1D measures to make them comparable with linear feature distance accuracy measures.

Many states are collecting inventory and pavement condition data using vans equipped with videos, digital cameras, computers, and GPS receivers. Several states have experimented with the use of GPS for collection of incident data. GPS technology has rapidly matured to the point where, using differential GPS, sub-meter accuracy is technologically possible. Once 3D data are collected, DOTs are left to deal with determining the relationship with the associated cartographic centerline in their GIS spatial database. NCHRP Project 15-15 is evaluating various technologies for cost-effective ways to collect data on physical attributes of highway facilities and display them in straight-line diagrams.

Many states recognize considerable practical applications for using GPS in the field, and many state experts are exploring the options daily. GPS technology will likely dramatically affect how future GIS systems are built. The availability of highly accurate, 3D measurements makes it possible to calculate locations and distances more easily than with some of the linear location referencing methods currently in use. However, in the absence of 100-percent accuracy in both the spatial database and the GPS-collected data, there is still error in relating a GPS point or linear event to its accurate location on the associated centerline representation. That is, GPS, in itself, does not solve the conflation problem. Each time data are collected along the same roadway with a GPS van, a different string of coordinates will be obtained. These coordinate strings must be related before the data can be integrated.

Kinematic GPS.
Kinematic GPS deals with dual-frequency, carrier-phase data processing. The basis of GPS is the measurement of distances to GPS satellites using the travel time of radio signals. Only carrier-phase processing can provide millimeter-level accuracy; code-phase processing using single-frequency signals can yield only meter-level accuracy. The combination of two frequencies removes the effects of the ionosphere. Under heavy foliage or when satellite signals pass through light trees, the signal strength is greatly diminished. The receivers that have better sensitivity can track signals more reliably under such adverse conditions. The accuracy, reliability, and speed of obtaining results increase with the number of satellites. Five satellites are the minimum for obtaining a reliable position.

Measurement can be made in x,y,z coordinates with an accuracy of 1 to 5 inches. Positional data collected with GPS are used for various applications, including engineering design and property surveys. The location points of reference for this device are the HARN and first-order GPS reference stations. Data are collected by a person walking with a tripod; the data collection equipment is connected to a receiver. The kinds of positional data collected with this method are used for project-level applications.

Differential GPS. Differential GPS (DGPS) is a technique used to improve positioning or navigation accuracy. It is performed by determining the positioning error at a known location and subsequently incorporating a differential correction factor (by real-time transmission of corrections or by post-processing) into the position calculations of another receiver operating in the same area and simultaneously tracking the same satellites. Differential GPS is based on processing of pseudo-ranges (distances) between receiver and satellite using a ground reference station to provide corrections of atmospheric effects on the signals.
One (fixed) receiver measures the timing errors and then provides correction information to the other (roving) receivers.

Measurement can be made in x,y,(z) coordinates (elevations are not very accurate) with accuracy of 5 to 10 feet. The location point of reference for this device is Continuously Operating Reference Stations (CORS). Data are collected by a person walking, with equipment connected to a receiver. Data collected with DGPS are used for project- and network-level applications including asset inventorying.

Laser Ranging. A common tool for inventorying assets is a laser range finder with integrated compass and inclinometer to determine locations of objects. Typically infrared, GaAs laser diodes are used for distance measurement. The generated light energy has a wavelength of approximately 900 nanometers, with a beam divergence of 3 milliradians, equal to a beam width of about 3 meters at 1000 meters. The target acquisition times range from 0.3 to 0.7 seconds. These lasers are completely eye safe, meeting FDA Class 1 specifications, which means that a person could stare directly into the laser for 3 hours without any harm to the eyesight. The radiated light power is on the order of 50 microwatts; it outputs only 5 percent of the light power of a typical TV remote control, and far less than a flashlight.

Laser range finders calculate distance by measuring the time of flight of very short pulses of infrared light. This method differs from the traditional surveying instrument method of measuring phase shifts by comparing the incoming wavelength with the phase of the reflected light. Any solid object will reflect back a certain percentage of the emitted light energy. The instrument measures the time it takes a laser pulse to travel to the target and back with a precision, crystal-controlled time base. Knowing the speed of light, the distance is calculated.
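The two calculations mentioned for laser range finders, distance from time of flight and beam width from divergence, can be sketched as follows. This is only an illustration of the arithmetic, assuming the vacuum speed of light and the small-angle approximation for divergence; the function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # vacuum value; air is slightly slower


def range_from_time_of_flight(round_trip_seconds):
    """Distance to the target in meters; the pulse travels out and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def beam_width_m(distance_m, divergence_mrad):
    """Approximate beam width at a given distance (small-angle assumption)."""
    return distance_m * divergence_mrad / 1000.0


# A 3-milliradian beam at 1,000 meters, as quoted in the text.
width = beam_width_m(1000.0, 3.0)

# Hypothetical round-trip time for a target 150 meters away.
round_trip = 2.0 * 150.0 / SPEED_OF_LIGHT_M_PER_S
distance = range_from_time_of_flight(round_trip)
```

The round-trip time for a 150-meter target is about one microsecond, which is why the text stresses a precision, crystal-controlled time base.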
To increase accuracy, the laser measures as many as 60 pulses, utilizing the average to determine the range. Using this method, measurements of ∆d, angles (α, ζ) (azimuth, inclination), and x,y,z can be computed from angle and range, if the location of the laser gun is known. The level of accuracy is 1 inch plus a percentage of ∆d. The location points of reference for this device are local reference points, GCPs or GPS. Data are collected by a person walking with a

bipod, and with equipment connected to a laser for real-time data collection in the field. Data collected with this device are used for project- and network-level applications, including asset inventorying, surveying, and construction.

Total Stations (Theodolites). Land surveying with theodolites combined with an electronic distance measuring (EDM) system is the preferred method for project-level, high-accuracy mapping of small project areas. These instruments are also called total stations. They need to be set up on tripods and leveled by the surveyor. Measurements consist of a distance to a reflector, as well as a horizontal and vertical angle. The 3D location of the object point is computed immediately and stored on a data collector. There are total stations that work without reflectors and some that automatically trace the reflector (basically reducing the total station crew to the person holding the reflector).

Measurement can be made in x, y, z coordinates with accuracy of 0.5 to 3 inches. The location points of reference for this device are HARN or first-order GPS reference stations. Data collected with theodolites are used for project-level applications, including engineering design and property surveys.

Map Digitizing. Paper maps are attached to a digitizing tablet and lines are traced with a mouse or cursor directly on top of the map. An advanced approach is based on scanning the map and digitizing the lines on the computer screen using the mouse and computer cursor. There are automated programs for digitizing specific map elements, such as contours and road centerlines. In order to convert the digitized lines into a real-world coordinate system, control points are needed.

Measurements are typically in x,y coordinates (except when contour lines are digitized). Digitized maps are used in GIS for base mapping, legacy data conversion (parcel maps, utility drawings), engineering design, and property surveys.
Accuracy of digitized maps depends on the quality and scale of the maps. It can be no better than the nominal accuracy of the original map. Typically the positional accuracy of any measurement represented on the map can be anywhere between 5 and 50 feet from its true position. Data are directly recorded on the computer. Cleanup of data in a CAD system is necessary; some automated digitizing programs are available.

2.4.3 Positioning

2.4.3.1 Qualitative/Approximate Locating

The methods described in this section are commonly used to determine the current location of a vehicle, person, or feature in real time with measurement tools that are available to the average consumer or vehicle operator.

Distance from Landmark. The current position is determined as the estimated distance from a landmark (e.g., church, easily identifiable building, or roadside object). There is no offset, however, and the side of the road (e.g., north, south) is typically known. Typical applications are crash reports and investigation, police, emergency response systems (EMS-911), and roadway maintenance. Accuracy of measurements is in the range of 100 to 300 feet. Landmarks serve as the reference points. Data are collected by a person walking or driving in a car, recording data on paper.

Distance from Intersection. The current position is determined as the estimated distance from the nearest intersection. The side of the road (e.g., north, south) is typically known. These are distance, ∆d, measurements and are used in crash reports and investigation, police, emergency response systems (EMS-911), and roadway maintenance. Accuracy is in the range of 100 to 300 feet. Reference points are usually intersections. A person walking or driving in a car records the data on paper. Data are recorded in the field in real time by reading a vehicle odometer.

Distance from Milepost Marker. The current position is determined as the estimated distance from a milepost marker along the roadway.
The side of the road (e.g., north, south) is typically known. These are distance, ∆d, measurements and are used in crash reports and investigation, police, emergency response systems (EMS-911), and roadway maintenance. Accuracy is in the range of 100 to 300 feet. The reference point is the milepost marker. A person walking or driving in a car records the data on paper. Data are recorded in the field in real time by reading a vehicle odometer.

Address. Address numbers are defined in a grid system over a city or county, or as a function of the distance along a roadway, relative to a starting point. Often address ranges provided in TIGER files are used to estimate the location. Addresses are difficult to use, as they may not appear on a building, and they may be different in postal, county, and utility databases. Address numbers (#n) are recorded by a person walking or driving in a car. Data are recorded in the field in real time by reading the odometer. Accuracy is in the range of 100 to 300 feet. The reference point is a street segment or block.

2.4.3.2 Automatic Vehicle Location

GPS for Car Navigation. A GPS, compass, and odometer are often used in an integrated system. The measurements are automatically merged with street maps to keep track of a vehicle and show its current location. A navigation system needs to know where the vehicle is on a map. Correlating the raw data from the sensors to a navigable map database enables meaningful map display of the car’s location, calculation of distances between possible destinations and turns, and route calculation. These functions are only as good as the map database on which they rely: accuracy, detail, and coverage are crucial to satisfactory performance.
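One simple way to correlate a raw 2D position with a map centerline is to project the point onto the nearest segment of a polyline, which also yields a 1D distance along the road. The sketch below assumes planar coordinates in consistent units (for example, UTM meters) and a centerline given as a list of vertices; the function name and sample data are hypothetical, and real navigation systems use more elaborate map-matching that also considers heading and travel history.

```python
import math


def locate_on_centerline(point, centerline):
    """Snap a 2D point to a polyline.

    Returns (measure, offset, snapped_point), where measure is the distance
    along the polyline from its first vertex and offset is the perpendicular
    (or end-point) distance from the point to the polyline.
    """
    px, py = point
    best = None
    measure_at_segment_start = 0.0
    for (x1, y1), (x2, y2) in zip(centerline, centerline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Fraction along the segment of the closest point, clamped to [0, 1].
        t = ((px - x1) * dx + (py - y1) * dy) / (seg_len * seg_len)
        t = max(0.0, min(1.0, t))
        sx, sy = x1 + t * dx, y1 + t * dy
        offset = math.hypot(px - sx, py - sy)
        if best is None or offset < best[1]:
            best = (measure_at_segment_start + t * seg_len, offset, (sx, sy))
        measure_at_segment_start += seg_len
    if best is None:
        raise ValueError("centerline needs at least one non-degenerate segment")
    return best


# Hypothetical L-shaped centerline and two GPS fixes near it.
road = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
fix_a = locate_on_centerline((40.0, 5.0), road)
fix_b = locate_on_centerline((103.0, 50.0), road)
```

The returned offset gives a rough indication of how far the raw position disagrees with the map, which depends on errors in both the sensor data and the map database.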

The measurements are the x,y coordinates and azimuth angle (α), giving the location and driving direction of the vehicle. This method is used in vehicle tracking, truck routing, car navigation, and emergency dispatch. Accuracy is around 10 to 50 feet, and GPS satellites are the reference points. Data are recorded and merged with a map in real time to continuously show the location of the vehicle.

Compass. A compass is an instrument that indicates direction. Two fundamental types of compass are used: the magnetic compass, which probably originated in ancient China, and the gyrocompass, a device developed at the beginning of the 20th century. In the magnetic compass, directions are obtained by means of one or more magnetic needles pointing in the general direction of the magnetic North Pole under the influence of the magnetic field of the earth. The gyrocompass, which is unaffected by the magnetism of the earth, consists of a gyroscope, with the spinning wheel on an axis confined to the horizontal plane so that its axle aligns itself with the north-south line parallel to the axis of the rotation of the earth, thereby indicating true north.

The compass is used to measure the azimuth angle (α). Accuracy of this device is better than 2 degrees. This device is used for vehicle tracking, truck routing, car navigation, and emergency dispatch. The reference point is the magnetic or true north. Data are recorded and merged with a map in real time to continuously show the location of the vehicle.

Odometer. An odometer is an instrument in automotive vehicles to indicate the total number of miles that have been traveled. The odometer generally shares housing with the vehicle’s speedometer and is driven by a cable that the two share. When the vehicle is in motion, this cable moves a series of gears in the odometer, turning a set of numbered drums that count the miles traveled.
Some odometers, called trip meters, can be manually reset to zero to measure the lengths of individual trips. This device is used to measure distance, ∆d, with accuracy greater than 20 feet plus 5 percent of distance. The data are used for vehicle tracking, truck routing, car navigation, and emergency dispatch. The reference is the start of travel. Data are recorded and merged with a map in real time to continuously show the location of the vehicle.

2.5 MODELING DATA ERROR

2.5.1 Introduction

This section describes a conceptual error model for assessing the effects of data uncertainty in measurement techniques applied to transportation phenomena and transformations between spatial referencing systems. The fundamental question of interest is the positional accuracy of a recorded position for any transportation feature or event. The error model would allow users of GIS data to be aware of the bounds on the “true” locations of transportation features and events, whether these are independently arrived at or derived from the integration of diverse data sources. To assess positional errors, there must be an understanding of how a recorded position for a transportation feature or event is determined.

A starting point for the development of the conceptual error model is a model of the transportation system. The 20-27(3) data model is a comprehensive and well-developed model of transportation phenomena and it contains many of the relations necessary to make the above determination. However, it falls short in supporting a comprehensive view and hence management strategy for positional accuracy, as some important contributing error sources are not modeled. The most critical components for developing the error model lie in the relationships of transportation features to spatial objects and spatial objects to spatial referencing systems. Before describing the error model, some relevant terms and issues are defined and discussed in the following section.
2.5.2 Review of 20-27(3) Data Model and Clarification of Terms

This section defines and clarifies terms pertinent to the development of the conceptual error model and indicates their overlap or deviation from the 20-27(3) data model. The terms of interest include transportation feature, event, physical roadway, roadway section, link, node, network, spatial reference systems, reference objects, locational reference, and anchor section.

In the 20-27(3) data model, a key object is the transportation feature. It is defined as a non-decomposable phenomenon in the transportation domain. Examples of transportation features include roads, routes, ramps, bridge abutments, culverts, maintenance management zones (e.g., spray zones and no sand/salt sections), and pavement management zones. Another object in the 20-27(3) data model for which positional accuracy issues are of concern is the event object. Events refer either to occurrences or changes of state to features on or along a roadway. Events can be traffic crashes, construction, or repair activities applied to transportation features.

In the 20-27(3) data model, both transportation features and events are associated with spatio-temporal objects. For this project, the interest is only in the spatial dimension. Transportation features and events are modeled in the 20-27(3) study as being represented by spatial objects and associated with spatial reference systems. According to the 20-27(3) data model, each transportation feature or event can have zero-to-many associated spatial objects, and each spatial object can be associated with zero-to-many topological or zero-to-many geometric objects. The topological objects serve to model the connectivity among spatial objects. Each geometric object serves to represent the position and possibly size and shape of a transportation feature at some point in time.
Each geometric object has one or more associated spatial reference systems that allow a transportation feature to be spatially positioned.

To have some distinct transportation features to refer to in developing the conceptual error model, a few of them (e.g., physical roadway and roadway section) are distinguished and their associated spatial objects (e.g., link, node, and network) discussed.

Physical roadways are the connected set of transportation features, such as highways, streets, roads, and exit ramps, that have a real-world presence. Because the physical roadway is a complex connected system, it is frequently of interest to be able to identify and refer to its sub-sections. A roadway section is a sub-unit defined as the portion of physical roadway between intersections. A roadway section is a transportation feature and a section in the 20-27(3) data model.

A physical roadway and hence roadway section can have multiple associated digital representations that will vary in spatial detail and hence positional accuracy. For most transportation applications, a centerline representation serves as the geometric representation of the physical roadway. There are two commonly available public centerline digital representations for most major roads in the United States: the USGS 1:24,000 scale DLG and the Census TIGER file roads (nominally of 1:100,000 scale heritage). Another possible geometric representation is the edge of pavement as often captured from aerial photography. This results in multiple digital spatial representations for a single physical roadway section, as indicated in Figure 2-1.

The term link refers to the digital spatial representation of a roadway section centerline. It corresponds to a spatial object in 20-27(3) and is defined as a spatial object that represents the section of roadway between intersections. A node is a spatial object that represents the road intersection. A link has one topological representation but may have multiple geometries. The geometry of a link is typically a set of (x, y, and sometimes z) coordinates for a road centerline.
The geometry of a node is an (x, y, and sometimes z) coordinate for a road intersection.

A network, a complex spatial object in 20-27(3), is defined as a set of connected links and nodes. A network may have topology and/or geometry. The topological representation captures roadway connectivity and typically indicates the bounding nodes for each link and the incident links for each node. A network is a key component and concept in linear referencing systems, which are one form of spatial referencing system.

The 20-27(3) data model identifies spatial referencing systems and it is agreed that there are multiple spatial referencing systems that differ primarily with respect to their dimensions. A referencing system for any dimension (i.e., space, theme, or time) is defined as a framework for a set of measurements where a measurement is the assignment of class or score to a phenomenon based on a set of rules. A spatial reference system defines the parameters and rules to situate a measurement in space. The essential parameters for any spatial reference system are an origin and units (the required parameters for a linear spatial referencing system). A 2D system further requires specification and orientation of two axes and possibly location and relation of the origin and axes to a geometric body. A 3D system requires specification of a geometric body and orientation of the origin and three axes with respect to this body. Figure 2-2 illustrates components of these systems.

The parameters and rules required for each dimension correspond to the datum object specified in the 20-27(3) data model. Generating a measurement in one of these systems can involve any number of different measurement methods for distances, angles, or times. Reference objects specified in the 20-27(3) data model are an important concept within reference systems.
These objects are typically measured to well-defined standards such that additional measurements can refer to these measured positions rather than to the original system parameters. For example, mile markers can be reference objects in the linear system and new measured positions can be based on the measured mile-marker locations rather than with respect to the system origin. As shown in Figure 2-3, the position of a transportation feature, represented by the triangle, can be determined by the two bounding mile markers rather than by the origin.

Figure 2-1. Representations of physical roadway: (a) one physical roadway section; (b) multiple digital representations (links and nodes).

Figure 2-2. Components of spatial reference systems: (a) linear system (origin, units = miles); (b) 2D system (origin, X and Y axes, units = feet); (c) 3D system (origin, three axes, units = degrees).

Any transportation feature or event may be associated with one or more linear or higher dimensional spatial reference datums and one or more different measuring methods (e.g., photogrammetry or GPS). A further important distinction is that they also can be associated with different orders of measurement (i.e., measured directly according to the system parameters or measured with respect to one or more reference objects). This distinction captures the situation in which a roadway inventory project uses DMI to measure both road centerlines and all the assets along the roadway at the same time. All of the transportation features in this situation would have directly measured positions, rather than positions measured through reference objects (mile markers). The 20-27(3) data model indicates that geometric objects are only linked to a spatial reference system through reference objects.

Based on these associations, every transportation feature or event has one or more locational references. Locational reference is a term not used in 20-27(3). It refers to the information stored in the database that provides the spatial location description for any transportation feature or event. Figure 2-4 illustrates locational references for 2D and linear reference systems.

As mentioned above, links may have multiple geometries and, for each geometry, there may exist one or more spatial reference systems. Multiple linear reference systems may exist for a set of links due to the passage of time. A linear reference system, for example, might have been put in place in 1980 and re-measured in 1999 using a new measurement technology, so that for some transition period, two linear measurement systems will coexist. Linear measurements will also typically coexist (on links) with one or more 2D or 3D measurement systems.
The multiple spatial reference systems attached to links may be dependent on or independent of each other. An example of an independent case is a situation where the link geometry is 1:24,000 scale DLG data in a 2D reference system but with a linear measured distance for the link captured by odometer or DMI. In this case the linear measured distance is not dependent on the 2D geometry. A dependent scenario occurs when the linear measurement is computed directly from a 2D or 3D measurement system, say by computational geometry. As an example, GPS might be used to measure coordinates (longitude, latitude, and height) for road centerlines and these measures might subsequently be used to compute a 3D distance measure for the road centerline. As an extension to the 20-27(3) data model, the dependencies among measurement systems should be accounted for, as they are pertinent to the error model. In the dependent case, the linear measured distance will be affected by the error characteristics of the 2D or 3D reference system.

Under a linear reference system, the distance measure is applied to an anchor section, which is a set of connected roadway sections or links. The measured distance of the anchor section is used to reference other transportation features on or along the roadway. In some cases, the linear distances to transportation features along the roadway are measured at the same time as the measures are applied to the anchor section. These measurements all have the same measurement characteristics and hence the same error characteristics. The anchor section and other transportation features measured simultaneously are described as having direct linear measured positions. Any transportation features subsequently referenced to the anchor section will have an indirect linear measured position and hence different error characteristics. Similar distinctions may apply in the 2D and 3D cases.
So as a refinement to the 20-27(3) data model (87), it is suggested that locational references for transportation features be categorized as follows:

• Direct linear measured position,
• Indirect linear measured position using linear reference objects,
• Indirect measured position using 2D reference objects, and
• Indirect measured position using 3D reference objects.

Figure 2-3. Reference objects. NOTE: In this case, mile markers along a road can be used to generate measures for unmeasured transportation features or events.

Figure 2-4. Distinction between locational references: (a) independent locational reference (Event ID; Location X, Y, Z); (b) dependent locational reference (Event ID; Route ID; Distance; Offset).

Positional accuracy depends on the measurement methods applied in each case and also on the accuracy of reference object measurements where these apply.

The above qualifications and identified dependencies are not explicit in 20-27(3), yet they have implications for understanding the positional error characteristics. These dependencies and their effects on positional error lay the foundation for development of the error model.

2.5.3 Sources of Error in Spatial Data

The locations of transportation features are typically collected, analyzed, operated on, transformed, and compared relative to other transportation feature locations without regard for positional accuracy or the quality aspects of the data. Positional errors arising from imperfect measurements are inherent in data. Also, certain operations on data, such as transformation among spatial referencing systems, introduce additional persistent spatial distortions. Such errors propagate through spatial analytical processes and are embedded in applications that manipulate data in various ways to produce results used in decision making (87).

The first step in developing the conceptual data error model is to identify the various sources of error. The primary sources of error associated with positional data are acquisition or measurement, processing, and presentation or visualization. Regardless of the measurement technique and referencing system, data will be observed with error. As discussed in Section 2.4 of this report, the method of data collection sets limitations on the selection of the measures and their metrics.

As described in the preceding section, every transportation feature is or can be associated with one or more spatial referencing systems. Depending on the measurement techniques used by a referencing system, each recorded location reference will have different error characteristics.
Figure 2-5 outlines the error sources associated with different spatial referencing systems. The important difference in the linear referencing system is its dependency on a path definition. The path can be the physical roadway and the measurement method may be applied to the physical roadway (e.g., using DMI). Alternatively, a path can be a digital representation of the physical roadway, in which case, the linear measurement may be computed from the digital representation. In this latter case, the level of the network’s spatial detail (i.e., topological and geometric) and the measurement technique will affect the measured distance and any subsequent locational references that employ this representation and measurement.

Figure 2-5. Outline of error sources associated with the process of assigning locational references to transportation features. A transportation feature or event to be located follows one of two branches:

• Locate in 2D or 3D reference system. Requires a 2D or 3D spatial reference system. Error in the locational reference is a function of error in the 2D or 3D measurement system:
  – 2D case: (x + e_m2, y + e_m2)
  – 3D case: (x + e_m3, y + e_m3, z + e_m3)
  where e_m2 represents error in the 2D measurement system and e_m3 represents error in the 3D measurement system.
• Locate in linear reference system. Requires a path designation (physical roadway or a network) and a linear spatial reference system. Error in the locational reference is a function of error in the linear measurement technique and error in the network:
  – d + e_mp + e_ml
  where e_mp represents error in the path and e_ml represents error in the linear measurement system.
  – Direct case with network: e_mp = e_m2 + e_r. The path error can be further subdivided into errors in the measurement of the network (e_m2 or e_m3 type errors) and errors in the representation (e_r) of the network. Representational errors include topological, geometric, and attribute inaccuracies.
  – Indirect case: e_ml = e_rm + e_rs. The linear measurement error can be further subdivided into errors in the measurement of the reference objects (e_rm) and the reference marker spacing (e_rs).
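As a rough illustration of the decomposition outlined in Figure 2-5, the sketch below combines component errors for a linear locational reference. The function names and the sample magnitudes (in feet) are hypothetical and are not taken from this report. The worst-case bound follows the additive form shown in the figure; the root-sum-square value is an alternative that additionally assumes the components are independent, which the report does not state.

```python
import math

def path_error(e_m2, e_r):
    """Path error: network measurement error plus representation error."""
    return e_m2 + e_r

def linear_measurement_error(e_rm, e_rs):
    """Indirect case: reference-object measurement error plus marker-spacing error."""
    return e_rm + e_rs

def worst_case(e_mp, e_ml):
    """Additive bound on the error of the measured distance d."""
    return e_mp + e_ml

def root_sum_square(*components):
    """Combined error under an ASSUMED independence of the components."""
    return math.sqrt(sum(c * c for c in components))

# Hypothetical component magnitudes in feet (illustration only).
e_mp = path_error(e_m2=40.0, e_r=25.0)
e_ml = linear_measurement_error(e_rm=10.0, e_rs=100.0)
bound = worst_case(e_mp, e_ml)
rss = root_sum_square(40.0, 25.0, 10.0, 100.0)
print(f"worst case: {bound:.1f} ft, root-sum-square: {rss:.1f} ft")
```

In this made-up example the marker-spacing term dominates both figures, which is consistent with the report's point that the resolution of the reference method can outweigh the measurement errors themselves.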

A transportation feature or event whose location is measured by a 2D or 3D measurement system, such as photogrammetry or GPS, is independent of the road geometry. For example, a hazardous waste spill from an overturned truck can be captured and recorded by GPS without reference to the adjacent roadway. As another example, a traffic crash may be reported as cell phone coordinates (x, y) using any of a number of cell phone locational methods.

Linearly referenced transportation features or events, because of their potential dependency on a network (i.e., digital representation of the physical roadway), are subject to inaccuracies in the network as well as characteristics of the measurement methodology. As indicated in the previous section, it is also important to distinguish a direct linear measured location from an indirect measured location. The direct linear measures typically apply to the path and physical transportation assets along the path. The accuracy of indirect measurements depends on reference objects and will be influenced by the measurement errors in the reference objects and the spacing of the reference objects as indicated in Figure 2-5.

The linear spacing of the reference objects associated with the linear reference system can substantially affect positional accuracy. The spacing between linear reference markers is a form of resolution and the coarser the spacing or resolution, the less accurate the locational reference. If, for example, a crash is reported as between Exits 49 and 50, the accuracy of the event is a function of the distance between exits (approximately ±5 miles for a 10-mile spacing between the exits). A crash referenced as just south of Mile Marker 315 can have a higher accuracy because of the finer spacing (i.e., resolution) of mile markers. The error model must consider the positional accuracy of a linear locational reference as a function of the type and resolution of the linear reference method.
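The effect of marker spacing can be made concrete with a small sketch. It assumes that an event reported only as lying between two reference objects is recorded at their midpoint, so the positional bound is half the spacing, which matches the ±5 miles quoted above for a 10-mile spacing. The function names and the mileages used for the exits are hypothetical.

```python
def spacing_bound(spacing_miles):
    """Half the spacing between the two bounding reference objects."""
    return spacing_miles / 2.0

def locate_between(lower_measure, upper_measure):
    """Return (best estimate, +/- bound) for an event known only to lie
    between two reference objects with the given route measures."""
    midpoint = (lower_measure + upper_measure) / 2.0
    return midpoint, spacing_bound(upper_measure - lower_measure)

# Exits assumed to sit 10 miles apart (hypothetical route measures).
print(locate_between(120.0, 130.0))   # (125.0, 5.0)
# Mile markers one mile apart give a tenfold tighter bound.
print(locate_between(314.0, 315.0))   # (314.5, 0.5)
```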
Where a digital representation substitutes for the physical path, the results are multiple possible topological and geometric representations of the physical roadway, some of which may be substantially less accurate than others. Characteristics of a network that affect the quality of referencing a transportation feature or event include topological completeness, geometric accuracy and detail, and attribute accuracy and consistency.

To understand the role of geometric accuracy and detail, consider the case in which GPS is used to position a road centerline. Each recorded coordinate might have centimeter-level accuracy. However, the number of coordinates collected and their ability to capture the geometry of the physical roadway will have a sizeable effect on the accuracy of the linear measured distance generated from these coordinates. This is another instance in which resolution has a significant effect on positional accuracy.

Attribute accuracy and consistency play a role given that, as noted in the previous section, a linear location reference includes a route or similar reference that must ultimately provide a link to an anchor section. Relationships need to be established among several objects across the database to make connections from route identifiers to links to an anchor section. If a name or identifier is incorrect or inconsistent somewhere in the database, misconnections will occur, often resulting in gross inaccuracies in a referenced position. The problem is most likely to occur with indirect linear measurements.

From a practical point of view, the location of objects relative to a network is of great importance. It does not matter if the road network is a meter off, as long as one can find the location of a certain feature. Networks with low positional accuracy can be used, as long as they are complete relative to the log points and intersections.
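The influence of vertex density on a computed centerline length can be illustrated with a short sketch. It uses a hypothetical curve, a quarter circle of 1,000-ft radius, whose true length is known, and samples it with error-free vertices at different densities. The helper names are illustrative only. Even with perfect coordinates, a sparse polyline cuts the curve short, so the computed distance understates the true one.

```python
import math

def polyline_length(points):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def sample_quarter_circle(radius, n_segments):
    """Error-free vertices on a quarter-circle curve (hypothetical roadway)."""
    step = (math.pi / 2.0) / n_segments
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(n_segments + 1)]

radius_ft = 1000.0
true_length = math.pi * radius_ft / 2.0   # arc length of the quarter circle

for n in (1, 2, 4, 16):
    measured = polyline_length(sample_quarter_circle(radius_ft, n))
    shortfall = true_length - measured
    print(f"{n:2d} segments: {measured:8.1f} ft (short by {shortfall:6.1f} ft)")
```

With a single segment the chord is roughly 10 percent shorter than the arc; the shortfall shrinks quickly as vertices are added.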
In addition to the sources of error involved in generating a locational reference for transportation features or events, the transformations between spatial reference systems are another source of positional error. In the transformation process, either two independent reference systems have to be combined into one new system, or one system must be transformed to the other. Both approaches raise issues of uncertainty.

Figure 2-6 is a schematic representation of the sources of error involved in this context. Figure 2-6a illustrates an example of 2D measurement error. Because the measurement is independent of the road network, the measurement may be off the roadway even though, in reality, it is on the roadway. Transforming the 2D reference to a linear reference will place the location on the roadway but with some error that is a function of the 2D measurement error plus a linear measurement error. Conceptually, the 2D-measured location moves to the closest point on the roadway. However, given the error in the measurement, there are multiple closest points represented by the normal vectors from the circular error bound to the road centerline.

Figure 2-6b illustrates error in the network representation. Given that the centerline position has error, the set of closest points extends to positions represented by the network error buffer. Figure 2-6c represents the cumulative error from these sources. Finally, Figure 2-6d illustrates the errors that might be present in the linear referencing system, showing potential bounds on the transformed linear position. The specific error value depends on the errors in each of the respective referencing systems. The effective result is that the 2D error transforms to a linear error in the linear referencing system. Figure 2-7 shows an example for a 2D error ellipse.

2.5.4 Transformation Methodology

Transformation of data provides the necessary key for the interoperability of data sets.
Many transportation agencies recognize a need to be able to translate location references between spatial referencing systems. Some agencies establish one referencing system as the primary system and derive the locations in other systems from the primary system. For example, the primary location referencing method at the Virginia Department of Transportation (VDOT) is link-node. VDOT also uses mile points derived from link lengths between nodes with known mile-point locations. Other states establish a location control mechanism that is independent of any LRM. For example, Wisconsin DOT has a Location Control Management System, which is used for conversion between different linear referencing systems (e.g., link/site and reference point) (11). The main types of transformation are defined and illustrated in the following sub-sections.

Figure 2-6. Schematic representation of positional data error: (a) 2D measurement error around a measured (x, y); (b) network error buffer between the true representation and the network representation; (c) cumulative network and measurement error; (d) linear referenced error on the linear referenced distance.

Figure 2-7. Example of 1D error generation: a 2D error ellipse transforms to a 1D distance error.

2.5.4.1 Types of Transformations

The main types of transformations for transportation applications involving linear reference systems are as follows:

• Transformation Type 1: transformation of a 2D (or 3D) location expression to a linear location expression. This might occur when new data are collected using GPS and these GPS coordinates need to be converted to a linear referenced position to integrate with legacy data already linearly referenced (Figure 2-8a).
• Transformation Type 2: transformation of a linear location expression to a 2D (or 3D) expression. This might occur when linearly referenced data need to be converted to 2D coordinates for analytical purposes such as finding all crashes within 2 miles of an intersection for all intersections in a jurisdiction (Figure 2-8b).
• Transformation Type 3: transformation of one linear location reference to another linear location reference. This might occur if transportation features referenced in a legacy linear system need to be updated to a new linear system or if more than one linear system exists within an organization and data need to be integrated across these systems (Figure 2-8c).
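For Transformation Type 1, the way a 2D error region turns into a linear error (Figures 2-6 and 2-7) can be sketched under simplifying assumptions that are not part of the report: the centerline is treated as locally straight, the network itself is taken as error-free, and the 2D error is an ellipse with known semi-axes. Under those assumptions the set of closest points on the centerline is the projection of the ellipse onto the road direction, whose half-width follows from standard geometry. The function name and sample values are hypothetical.

```python
import math

def linear_half_width(semi_major, semi_minor, angle_deg):
    """Half-width of the 1D distance error obtained by projecting a 2D
    error ellipse onto a straight centerline.

    angle_deg is the angle between the ellipse's major axis and the
    road direction. Assumes a locally straight, error-free centerline.
    """
    t = math.radians(angle_deg)
    return math.hypot(semi_major * math.cos(t), semi_minor * math.sin(t))

# Hypothetical 30 ft by 10 ft error ellipse at several orientations.
for angle in (0.0, 45.0, 90.0):
    half_width = linear_half_width(30.0, 10.0, angle)
    print(f"major axis at {angle:4.1f} deg to road: +/- {half_width:.1f} ft along route")
```

A circular error bound is the special case of equal semi-axes, for which the linear half-width equals the radius at any orientation. Adding the network error buffer of Figure 2-6b would widen this interval further.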
Currently, locational references, regardless of the type of spatial reference system, are not reported with error. In terms of an error model, for the 2D case, assume a coordinate (x, y) with error such that the expression is as follows:

(x + γ_X, y + γ_Y)

where γ_X and γ_Y are the errors in the x and y values, respectively. In the linear case, it is assumed that there is some error associated with the distance measure d. If A is the anchor point or origin for a distance measure, then the expression for the distance measure with error is as follows:

d_A + e_A

where d_A is the measured distance relative to the anchor point A and e_A is the error associated with the measurement.

Using these error expressions, errors in the three main transformation types are illustrated in Figures 2-9a through 2-9c. Several variations of these three main transformation types are possible. Illustrations of some specific transformation cases for Types 1 and 3 are considered in the next section.

Figure 2-8. Illustration of transformation types: (a) Transformation Type 1: the 2D (x, y) coordinate is known, distance d is unknown and must be determined, and the route may or may not be known; (b) Transformation Type 2: route and distance are known (e.g., Route 101, distance 78), and the 2D coordinate is unknown and must be determined; (c) Transformation Type 3: route and distance are known in Linear System 1 (e.g., Route 101, distance 78), and the route may be known but the distance is unknown in Linear System 2 (e.g., Route 29).

2.5.4.2 Transformation between GPS and LRS (Type 1)

An example of a transformation between GPS and LRS arises in a roadway inventory project conducted by a contractor for a state DOT. In this example, coordinates of transportation features are captured using GPS and transformed to UTM or State Plane. Both road centerline and assets are measured with the same system (e.g., a stereo imaging system, which bases its locations on GPS). This is an example of the direct linear measurement case and, therefore, both road centerline and roadway features are of the same accuracy (<3 ft). The state DOT wants the asset information converted to a linear reference system, so it is necessary to transform the GPS data. The contractor uses the following steps to accomplish this transformation (the steps are illustrated in Figure 2-10):
Figure 2-9. Examples of errors associated with the transformation types, with distances measured from anchor point A: (a) x + γX, y + γY transforms to d∆ + e∆′; (b) d∆ + e∆ transforms to x + γX′, y + γY′; (c) d∆ + e∆ transforms to d∆ + e∆′.

1. Compute the 3D (i.e., slope) distance of the road centerline, starting at the beginning of the road (there are certain rules defining the beginning and end mileposts, e.g., north–south).
2. Compute the mileposts of all intersections (i.e., log points and anchor points). This road centerline, together with the distance references of the log points, serves as the network.
3. Find the closest point on the road centerline of the roadway assets inventoried, then compute the milepost (i.e., distance from the start of the route) and the offset of the feature from the centerline.

The result is two measures (distance [D] and offset [O]) for each transportation asset. This method also allows the positioning of linear features if the beginning and end points of the feature were measured.

2.5.4.3 Transformation Between Two Linear Reference Systems (Type 3)

Often state DOTs have legacy data that are positioned using a form of linear referencing system. For instance, a DOT may have a road centerline network available that was digitized from geocoded aerial photos and is of questionable accuracy; however, the data are consistent with all other GIS data layers in the DOT system. In many instances, the DOT may not want to add inventory points created with GPS, because they would not overlay with the legacy data, even if they are more accurate. One approach to this problem is to transform the transportation feature or event data captured with GPS to a linear reference system (LRS-1) and then convert this system to the customer’s legacy linear reference system (LRS-2). This can be accomplished with the following steps as illustrated in Figure 2-11:

1. Compute the distance to all intersections and the end point of LRS-1 relative to the origin or starting point of the legacy road centerline LRS-2.
2. Take the transportation feature data referenced in LRS-1 and, using the anchor points (i.e., intersections) as reference objects, squeeze or stretch the distances to match the measured distances of the legacy system, LRS-2. The desired transformations can be accomplished with most standard GIS programs using dynamic segmentation routines.

The result is the new feature inventory referenced to the old centerline. This allows the user to combine new roadway features with legacy data without having to change the existing system completely.

The opposite transformation also may occur where legacy data are transformed to a newer linear referencing system. A question of interest for DOTs may be whether there is a significant accuracy difference between converting new inventory features to a legacy linear referencing system and converting features referenced in the legacy system to a new linear system.

Assuming that the same reference system was used, one approach would be to accept the data as is, without considering the consistency of topology and the differences in the precision of the measurements and the resolution of the reference methods. Another approach would be to use redundant information by comparing locations of identical events and to stretch or shrink the historical data set. In the latter case, the uncertainty information attached to (or assumed for) historical events would have to be transformed as well.

Figure 2-10. Steps in transformation of data for GPS to LRS: create the GPS centerline (CL) from the X, Y, Z GPS locations; compute the slope distance; compute the distances of nodes (intersection MPs) to give the geometry of the LRS; then, for each object in the transportation feature file, find the closest point (CP) on the centerline, compute the distance (D) as the MP of the object, and compute the offset (O) as the distance from the object to the CP, giving D, O linear referenced locations.
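The GPS-to-LRS steps summarized in Figure 2-10 can be sketched as follows. This is a minimal illustration, not the contractor's actual procedure: the function name and sample coordinates are hypothetical, the closest point is found in plan view (x, y) as an assumption, and the distance along the route is accumulated as 3D slope distance as described in step 1. The offset returned is unsigned, so the side of the road is not distinguished.

```python
import math

def locate_on_centerline(centerline, feature_xy):
    """Return (distance_along, offset) of a feature relative to a centerline.

    centerline: list of (x, y, z) vertices ordered from the route start.
    feature_xy: (x, y) position of the inventoried feature.
    """
    fx, fy = feature_xy
    best = None          # (offset, distance_along) of the closest point so far
    travelled = 0.0      # slope distance accumulated up to the current segment
    for (x1, y1, z1), (x2, y2, z2) in zip(centerline, centerline[1:]):
        dx, dy = x2 - x1, y2 - y1
        plan_sq = dx * dx + dy * dy
        slope_len = math.dist((x1, y1, z1), (x2, y2, z2))
        # Fraction along the segment of the perpendicular foot, clamped to [0, 1].
        if plan_sq == 0.0:
            t = 0.0
        else:
            t = max(0.0, min(1.0, ((fx - x1) * dx + (fy - y1) * dy) / plan_sq))
        offset = math.hypot(fx - (x1 + t * dx), fy - (y1 + t * dy))
        if best is None or offset < best[0]:
            best = (offset, travelled + t * slope_len)
        travelled += slope_len
    return best[1], best[0]

# Hypothetical centerline (feet): a climbing segment followed by a level one.
centerline = [(0.0, 0.0, 0.0), (300.0, 0.0, 40.0), (300.0, 400.0, 40.0)]
distance, offset = locate_on_centerline(centerline, (310.0, 200.0))
print(f"D = {distance:.1f} ft from route start, O = {offset:.1f} ft")
```

Because D is derived from the GPS geometry, this is the dependent case described earlier: any error in the centerline coordinates or vertex density carries through to the linear measure.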
Figure 2-11. Steps in transformation of data from LRS-1 to LRS-2: transportation features referenced in LRS-1 (features linked to CL 1) are written to Feature File 1 (D, O); road centerlines (CL) 2 provide the geometry of LRS-2 (Geometry 2 file); a dynamic segmentation routine runs Feature File 1 on Geometry 2; the File 1 features are then displayed on the map defined by CL 2.
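The "squeeze or stretch" step between anchor points can be sketched as a piecewise-linear rescaling. This is an assumed interpretation of that step, not a description of any particular GIS dynamic segmentation routine: each LRS-1 distance is interpolated between the two bounding anchor points using their measures in both systems. The function name and the sample anchor measures are hypothetical.

```python
from bisect import bisect_right

def rescale_measure(d1, anchors_lrs1, anchors_lrs2):
    """Convert a distance in LRS-1 to LRS-2 by proportional stretching
    between the bounding anchor points.

    anchors_lrs1, anchors_lrs2: measures of the SAME anchor points
    (e.g., intersections) in each system, strictly increasing.
    """
    if len(anchors_lrs1) != len(anchors_lrs2) or len(anchors_lrs1) < 2:
        raise ValueError("need matching lists of at least two anchor measures")
    if not anchors_lrs1[0] <= d1 <= anchors_lrs1[-1]:
        raise ValueError("distance lies outside the anchored section")
    # Index of the anchor at the start of the interval containing d1.
    i = min(bisect_right(anchors_lrs1, d1), len(anchors_lrs1) - 1) - 1
    fraction = (d1 - anchors_lrs1[i]) / (anchors_lrs1[i + 1] - anchors_lrs1[i])
    return anchors_lrs2[i] + fraction * (anchors_lrs2[i + 1] - anchors_lrs2[i])

# Hypothetical anchor measures (miles) for three intersections.
new_system = [0.0, 2.0, 5.0]      # LRS-1, from the GPS inventory
legacy_system = [0.0, 2.1, 5.0]   # LRS-2, from the legacy centerline

for d in (1.0, 2.0, 3.5):
    print(d, "->", round(rescale_measure(d, new_system, legacy_system), 3))
```

The same function run with the anchor lists swapped gives the opposite transformation, from the legacy system to the newer one. Any uncertainty attached to a measure would need to be scaled by the same local stretch factor, in line with the remark above about transforming uncertainty information.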

2.5.5 Model Concept

A formal approach in developing the error model considers all the components of uncertainty mentioned above, as well as transformation-specific properties. This includes recorded measurement precision, accuracy of the network, and issues of scale and resolution. Key issues in the model concern errors initiated in the measurement system followed by errors in the transformations between the different reference systems. A location and its associated uncertainties are the central objects of interest. The conceptual data-error model is designed to handle the uncertainties associated with the locations of transportation features and events present in transportation-related applications.

A location can first be influenced by the definition of a feature or event (e.g., a crash location can be seen as the location where the crash started or where involved cars stopped). Once a definition has been established, a transportation feature or event is located in a specific reference system by a particular measurement method and measurement device and the level of uncertainty depends on the reference system characteristics. In a 2D (or 3D) reference system, the positional characteristics of the error component will be dictated primarily by the precision of the measurement device. A linear reference system is particularly prone to accumulating systematic errors. Additionally, in a linear system, the resolution and accuracy used to record the network, as well as the method and device used to acquire its location, affect the quality of the data.

The goal is to isolate the location and formalize a more abstract model for the related parameters (i.e., feature or event, network, measurement system characteristics and dependencies). This approach allows a generalization of transformation procedures and, thus, builds the basis for an error model formulation that will allow development of an uncertainty expression for a location.
The three main components of the conceptual model define the input data, the desired output, and the requirements needed to achieve the desired outputs:

• Inputs (or information that exists)
  – Reference systems (1D, 2D, and 3D)
  – Event data (1D, 2D, and 3D) including networks and event data
  – Associated accuracy or uncertainty information (e.g., measurement error)
• Output (i.e., estimation of errors)
  – Estimates of errors associated with the transformation between reference systems
  – Estimates of errors in the combination of data from different dimensions (e.g., 1D network with 2D event, such as highway: pavement status with crash location)
• Requirements (or processes to use the available information to obtain the desired outputs)
  – Knowledge of all involved reference systems
  – Transformation methods between reference systems for events and their associated uncertainties
  – Means to combine uncertainties associated with different data.

These components define the structure of the data error model. This is an object-oriented concept, where the position of an event can be visualized as an object that depends on (a) an event, (b) a reference system, (c) a network, and (d) a measurement device or methodology. The focus of the error model is methods of transforming between reference systems and the associated uncertainties, as well as a means to combine uncertainties associated with different data. Figures 2-12 and 2-13 illustrate the dependencies of the error model. Figure 2-12 shows the relationships among event, measurement method and device, and reference systems. Figure 2-13 shows the transformation between two referencing systems.

A measurement device (e.g., DMI versus photogrammetry) or a measurement method (e.g., linear distancing versus 2D measurements) introduces uncertainties. The level of uncertainty also depends on the reference system itself.
For example, in a 2D reference system, the positional characteristics of the error component of the uncertainty of an event will be primarily dictated by the precision of the measurement device. A 1D reference system is particularly prone to accumulating systematic errors. Thus, the positional error component of an event depends increasingly on the systematic errors inherent in an existing linear network. The transformation methodology dictates the transformation of associated uncertainties.

The dependencies shown in Figure 2-12 can be compared with the model outlined in the 20-27(3) data model (87). The given terminology, however, varies slightly. The essential parallels are that events are directly linked to a location and that the location is directly linked to a reference method and a reference system. The addition of the direct link to the information on the measurement device as well as the network (or, to be more precise, the uncertainty of the network) is an additional requirement of the conceptual data error model. This outline, however, fulfills the purpose of enhancing the visualization of the uncertainty portion of the concept.

It should be emphasized that access to information regarding the reference system, the network, the measurement method and device, as well as the event, is essential for an error model. The means of getting this information is secondary. For the purpose of retrieving this information, the data model described in Adams et al. (87) can be used as a basis. Additional objects (e.g., an uncertainty object), however, have to be introduced.

2.5.3.1 Model Formulation

A mathematical formulation of a conceptual model that incorporates error into positional data can be written as follows:

$$L = TL + EL_1 + EL_2 + \cdots + EL_k,$$

where L is a recorded location, TL is the true location, and the EL_i are errors associated with a location from k different sources. Thus, each measured location is the sum of the true location and a number of error terms. Each of the error terms has an associated probability distribution that describes the likelihood of errors over its range of plausible values. A number of the error terms were outlined in Figure 2-5.

The data-error model, however, can be simplified by combining the errors into a single term. The resulting statistical model is written as follows:

$$L = TL + EL,$$

where EL is the overall error term at location L. The probability distribution of EL is determined by combining the probability distributions of the individual error sources, which may be correlated.

An extension to the generalized model (19) can be represented by the modified location expression:

$$LX = (LRM, LE, DX, EL),$$

where LX is a linear location expression composed of linear referencing method (LRM), linear element (LE), and distance expression (DX). The additional term EL for the overall error term of the linear location expression LX can be specified by the probability distribution around the true location. This term, however, is not measured or given as the other terms of the location expression. Calculations are required to acquire an estimate of the associated error term EL.

[Diagram boxes: Event; Network; Reference System; Linear Reference System; Measurement Method; Measurement Device; Location Reference; Location; Linear Referenced Event.]
Figure 2-12. Data error model: combination of event and network.

[Diagram boxes: Event (in reference system 1); Reference System 1; Location 1 Reference; Event (in reference system 2); Reference System 2; Location 2 Reference; Measurement Method; Measurement Device; Uncertainty.]
Figure 2-13. Data error model: transformation between referencing systems.

These models are general and can be applied to data collected using any location measurement technique and any referencing system. An appropriate model for each location measurement technique will be chosen to determine plausible probability distributions for the error components and total data error.

Transforming data from one referencing system to another will result in transformed locations that also have errors (i.e., errors propagated from the original measurements and errors introduced by the transformation process). These errors will be present, regardless of whether the user is converting from 2D or 3D data to 1D data or from a particular dimension to the same dimension. A conceptual model for the errors associated with transformation can be written as follows:

$$g(L) = g(TL) + FL,$$

where g is the transformation function and FL is the error associated with the transformed location. As in the model for the errors in observed data, FL will have a probability distribution that must be determined. The transformation function itself may have a systematic bias as a result of the transformation or the referencing system.

For simpler transformations (such as 1D to 1D), the probability distribution of FL may be determined by mathematical derivation, such as the “delta” method. However, if the transformation is more complex (e.g., across dimensions or using a map-matching algorithm), the probability distribution of FL probably will need to be obtained by numerical methods.

In formalizing the conceptual error model, an “uncertainty” object with knowledge or stored attributes of, for example, the resolution of the measurement system, active scale, and measurement error (or legacy if transformed) is added to the 20-27(3) data model (87).
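Returning to the transformation error FL, the two routes mentioned above (mathematical derivation by the delta method and numerical methods) can be compared in a small Python sketch. Everything here is illustrative: the function names, the linear example transformation, and the normal error model are assumptions, and `combined_sd` further assumes independent error sources, whereas the report notes that the sources may be correlated.

```python
import random
import statistics

def delta_method_sd(g, true_location, sd, h=1e-3):
    """First-order (delta method) standard deviation of g(L) for L = TL + EL.

    Uses sd[g(L)] ~= |g'(TL)| * sd[EL], with g' from a central difference.
    """
    derivative = (g(true_location + h) - g(true_location - h)) / (2 * h)
    return abs(derivative) * sd

def monte_carlo_sd(g, true_location, sd, draws=20000, seed=0):
    """Numerical estimate: push simulated locations through g, measure the spread."""
    rng = random.Random(seed)
    transformed = [g(rng.gauss(true_location, sd)) for _ in range(draws)]
    return statistics.stdev(transformed)

def combined_sd(*component_sds):
    """Standard deviation of EL = EL_1 + ... + EL_k for independent sources."""
    return sum(s ** 2 for s in component_sds) ** 0.5

def example_transformation(measure):
    """Hypothetical 1D-to-1D transformation: a 2 percent squeeze plus a 12 m shift."""
    return 0.98 * measure + 12.0

analytic = delta_method_sd(example_transformation, 500.0, 5.0)
simulated = monte_carlo_sd(example_transformation, 500.0, 5.0)
```

For a linear transformation the two estimates should agree closely; for transformations across dimensions or with map matching, only the simulation route carries over without further derivation.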
Possible uncertainty attributes are listed in Table 2-6. Modifications of the uncertainty object for different objects (e.g., geometric location, temporal time stamp, and network) are advisable. This would require implementing an uncertainty object, for example, for a stored spatial location, time stamp, measurement method, and network. It is optional to store one uncertainty object with all possible attributes and use it to store spatial, temporal, and network-specific uncertainties. The option chosen will be determined by the status of the currently implemented system (DOT-specific). For an existing system, it might be easier to implement a single additional object, rather than to add attributes and functions to a multitude of existing objects. Furthermore, Table 2-6 is not necessarily a complete representation of all attributes and functions. Additional attributes and functions can be added as needed.

The functions of the uncertainty object use the information stored by the attributes of the object. The functions of the object help to communicate the inherent uncertainties. For example, one can visualize the uncertainty of a spatial event by presenting the probability zone around the specified event. The transform function has several variations, one for each possible transformation with a corresponding metric that has knowledge of how the probability zone of an event (i.e., location error) has to be adjusted to reflect the performed transformation. The merge function accounts for the combination of two or more objects with properties of different reference systems. These can be events, events and a network, or networks.

2.5.6 Presentation of Positional Data Error

Error can be represented by either a description or an error map. Hansen (88) noted that data quality standards on positional accuracy emphasize the accuracy of the coordinate values in the x, y, z plane.
Error estimates with confidence intervals for these coordinate values are not explicitly described as elements, nor is the precision of the coordinate values delineated.

Agumya et al. (83) noted that the primary concern that end-users have regarding uncertainty in data is its potential effect on their decisions. The intention of assessing fitness for use is to avoid the application of data whose uncertainty may cause unacceptable results. The traditional method to assess the acceptability or fitness of use, the standards-based method, compares data uncertainty with a set of standard methods that reflect the acceptability levels of uncertainty in the data (36). With this technique, fitness for use is assessed by directly comparing the quality elements of information against a set of standards that represent the corresponding acceptable quality components. To facilitate the comparison, the standards are defined using the same elements as those used for describing data quality. These may include scale (of the source document), root mean square error (RMSE), resolution, percentage of correctly classified pixels (PCCP), currency, and percentage completeness (83).

TABLE 2-6 Uncertainty object

Object: Uncertainty
  Attributes
    Resolution
    Active Scale
    Measurement Error
    Network Accuracy
    Topological Completeness
    Lineage (of Previous Transformations)
    Probability Zone
    Temporal Uncertainty
  Functions
    Visualize
    Transform
    Merge
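One way to picture the uncertainty object of Table 2-6 is as a small class. The following Python sketch is an assumption-laden illustration, not the report's implementation: attribute types and units, the normal error model, the quadrature rule in `merge`, and the scale-based `transform` are all hypothetical choices, and the probability zone is computed on demand rather than stored.

```python
from dataclasses import dataclass, field
from statistics import NormalDist
from typing import List, Optional

@dataclass
class Uncertainty:
    """Attributes follow Table 2-6; types and units are illustrative assumptions."""
    resolution: float                 # smallest unit of the measurement system (m)
    measurement_error: float          # standard deviation of the measurement (m)
    active_scale: Optional[float] = None
    network_accuracy: Optional[float] = None
    topological_completeness: Optional[float] = None
    temporal_uncertainty: Optional[float] = None
    lineage: List[str] = field(default_factory=list)  # previous transformations

    def probability_zone(self, confidence: float = 0.95) -> float:
        """Half-width of a 1D zone around the location (normal error assumed)."""
        z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
        return z * self.measurement_error

    def visualize(self, confidence: float = 0.95) -> str:
        """Text stand-in for drawing the probability zone on a map."""
        return f"{confidence:.0%} zone: +/-{self.probability_zone(confidence):.2f} m"

    def transform(self, description: str, scale: float, added_sd: float = 0.0) -> "Uncertainty":
        """Propagate the error through a transformation and record it in the lineage."""
        new_sd = ((scale * self.measurement_error) ** 2 + added_sd ** 2) ** 0.5
        return Uncertainty(
            resolution=self.resolution * abs(scale),
            measurement_error=new_sd,
            active_scale=self.active_scale,
            network_accuracy=self.network_accuracy,
            topological_completeness=self.topological_completeness,
            temporal_uncertainty=self.temporal_uncertainty,
            lineage=self.lineage + [description],
        )

    def merge(self, other: "Uncertainty") -> "Uncertainty":
        """Combine two independent uncertainties (e.g., an event and its network)."""
        return Uncertainty(
            resolution=max(self.resolution, other.resolution),
            measurement_error=(self.measurement_error ** 2 + other.measurement_error ** 2) ** 0.5,
            lineage=self.lineage + other.lineage + ["merge"],
        )

# Made-up values: a crash located by odometer on a digitized network.
crash = Uncertainty(resolution=1.0, measurement_error=3.0)
network = Uncertainty(resolution=5.0, measurement_error=4.0)
combined = crash.merge(network)
```

Keeping all attributes on a single object mirrors the option, discussed above, of adding one object to an existing system rather than extending many existing objects.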

The three main ways of presenting uncertainty associated with positional data for two-dimensional GIS are as follows: (1) a confidence region model based on a rigorous statistical model, (2) error band models derived from the error propagation law in statistics and stochastic approaches, and (3) reliability of linear measures based on simulation and statistical techniques. Analytical and simulation techniques were used to investigate positional error. It was concluded that both techniques provided approximations of the error with identical results. The simulation technique was found to be time-consuming compared with an analytical method (89).

It is commonly assumed that a node is distributed within an ellipsoid, centered at its corresponding true location (90). Modeling positional error assumes that the error of each node is normally distributed within an error ellipsoid centered at its true location. GPS data positional accuracy is typically expressed in 2D or 3D, for example, in terms of circular error probability (CEP) or spherical error probability (SEP). Thus, it is necessary to transform these accuracy measures into 1D measures to make them comparable with linear feature distance accuracy measures. As DOTs deploy GPS in positional data collection, the need for integrating GPS data into existing LRS increases. As the GPS technology improves, accuracy increases, although the data captured are not entirely error-free. Accuracy measures of GPS readings can be shown by the probability distributions of error (9), which involves identifying the location with a band of probable variations based on the error.

The uncertainty model that has evolved can be defined as a stochastic process capable of generating a population of distorted versions of the same reality (such as a map), with each version being a sample from the same population.
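As a hedged illustration of turning a 2D GPS accuracy measure into a 1D measure, the sketch below converts a CEP into a per-axis standard deviation and then into an along-route interval. It assumes a circular bivariate normal error (equal, uncorrelated variances in both axes), for which the CEP equals σ·sqrt(2 ln 2), and a locally straight route; neither assumption comes from the report, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def sigma_from_cep(cep):
    """Per-axis standard deviation implied by a CEP (radius of the 50 percent circle).

    Valid only for a circular bivariate normal error, where
    CEP = sigma * sqrt(2 * ln 2).
    """
    return cep / math.sqrt(2.0 * math.log(2.0))

def linear_half_width(cep, confidence=0.95):
    """Half-width of the along-route (1D) interval with the given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sigma_from_cep(cep)

# Made-up receiver specification: CEP of 3 m.
along_route_95 = linear_half_width(3.0)
```

The resulting half-width is in the same units as the CEP and can be compared directly with a linear distance accuracy such as that of a DMI.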
The traditional Gaussian model, where the mean of the population estimates the true value and the standard deviation is a measure of variation in the observations, is one approach to describing error. Nevertheless, the Gaussian model is global in nature and says nothing about the processes by which error may be accumulated (29).

McGranaghan (92) discussed various techniques for displaying the uncertainty of the location of a spatial feature, the distinctness of boundaries, and the relative size of the features. Beard et al. (93) presented methods of using exploratory data analysis in a spatial context where quantitative methods are not available. These methods illustrate the reliability in the classification of features based on the size of a feature. Hansen (20) noted that defining these spatial characteristics forms a basis from which one can begin to model error. This approach includes identifying the type of error distribution and methods of estimation for a spatial characteristic of a feature. Measurement-based systems develop error estimates derived from a normal distribution of error for repeated measurements and redundant measurements, which permits correction for distortions introduced by map projections, the differences in actual elevation, and the spheroid surface to length and area measurements of survey data (94, 95). Other spatial characteristics may require the use of another error distribution.

2.5.6.1 Conceptual Approach to Presentation of Error

One of the primary objectives of this project is to develop an approach for presenting positional data error. The concept is to introduce probability zones around features (e.g., an event) to describe the uncertainty of their locations. The calculation of a probability zone is based on the measurement error as well as the resolution of the applied measurement system or the embedded reference system.
The goal of the probabilistic approach is to assign n-dimensional probability zones immediately surrounding every n-dimensional measured feature location. The size of each of these n-dimensional zones depends directly on two components: (1) the uncertainty arising from imprecise measurements expressed in imprecision measures or derived inaccuracy values (e.g., ±5 m) and (2) a user-selected probability threshold (e.g., 95 percent) that the true feature location is to be found within this probabilistic space.

The basic idea is to transform, for example, an accuracy value of ±x meters into a statistical probability that a point can be found in its neighborhood based on the standard normal distribution and the present resolution. Subsequently, each feature or event is assigned a probability space. The probability zones are confidence intervals indicating the confidence or the probability that a specific measured event is actually located within a given area. For example, a surveyed point location is known with a spatial accuracy measure of ±1 meter. Thus, one can assume, with a probability of about 68 percent, that the actual location of this point is within a circle of radius 1 meter. Assuming, however, that one would like to be 95 percent confident that the point is located within a specified area, one would have to use a circular area with a radius of 1.96 meters according to the normal distribution. Applying this principle, one can now translate error measures of ±x units into probability zones. Additionally, this allows one to overlay two such generated probability zones to gain information on the possibility (given in percentages of probability) that two locations are congruent.

Probability zones can be either binary or continuous. Both are indicators of the probability that a specific GIS feature is located within an estimated probabilistic area. These are described below.

Binary Zone.
In a binary probability zone, the GIS feature of interest is a subset of a single unit of the measurement system. In this case, the resolution of the measurement system dictates the resulting uncertainty values. The binary zone is a rather simple approach. The basic idea is to determine the probability that a sub area is selected. The term binary is assigned because no distinction is made as to what degree the sub area is selected. The possible result set is: {selected, not selected}. Furthermore, this approach results in a single value for the entire measurement unit. Hence, it is termed the binary zone. Consider the following scenario:

Assume that a feature (e.g., a parking lot measuring 10 m by 10 m) is smaller than the atomic unit of the measurement system (e.g., a 30 m by 30 m pixel) and is positioned somewhere within a specific unit (e.g., pixel x, x). If one chooses to walk to the real location of this unit (e.g., the 30 m by 30 m area) 100 times, how many times would one actually stand on the parking lot?

The result can be obtained by simply calculating the percentage of the sub area in comparison to the unit (here, 100 m² out of 900 m²). Thus, it can be derived that one would stand approximately 11 out of the 100 times on the parking lot. In other words, the percentage indicates the probability that the subset of interest is selected (e.g., ∼11 percent). It is a measure for the degree of uncertainty that one actually selects the desired sub area. Or one could state that one can select the desired sub area with a probability or with a certainty of about 11 percent.

The above example illustrates the case of 2D raster-based imagery. The measurement system also could be linear. For example, the police record a crash location based on the nearest milepost. In this case the feature extension would be about 75 yards; however, the resolution of the measurement system is based on 1-mile segments (or 1,760 yards). Thus, if one would visit the location based on the nearest milepost, the probability of standing somewhere within the 75 yards of the actual crash would be only about 4.3 percent.

This approach further assumes that the actual value of the sub area is known (e.g., one knows the size of the parking lot). This approach requires some sort of external information source or the implementation of one of the above-mentioned approaches such as discussed in Ehlschlaeger (96). An extension to the binary zone can be applied for unions of multiple atomic values.
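The binary-zone ratio can be written as a one-line calculation. The Python sketch below is illustrative only: the function name is hypothetical, and the linear example (a 50 m work zone located to the nearest kilometer) uses made-up numbers rather than values from the report.

```python
def binary_zone_probability(feature_size, unit_size):
    """Probability that a random position within the unit falls on the feature.

    Both sizes must be in the same units (lengths for a linear system,
    areas for a raster); the feature is assumed to lie inside the unit.
    """
    if feature_size <= 0 or unit_size <= 0:
        raise ValueError("sizes must be positive")
    if feature_size > unit_size:
        raise ValueError("the feature must fit inside the measurement unit")
    return feature_size / unit_size

# 2D raster case from the text: a 10 m by 10 m parking lot in a 30 m by 30 m pixel.
parking_lot = binary_zone_probability(10 * 10, 30 * 30)

# Hypothetical linear case: a 50 m work zone recorded to the nearest 1,000 m.
work_zone = binary_zone_probability(50, 1000)

# Union of several atomic units: sum their sizes to form the new unit.
union_of_pixels = binary_zone_probability(10 * 10, sum([30 * 30, 30 * 30]))
```

The last line shows the same ratio applied when the measurement unit is a union of several atomic units.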
In this case, one would assign a new atomic value equal to the sum of all previous atomic values in the union.

Continuous Zones. Continuous probability zones, on the other hand, are mostly independent of the resolution of the measurement system. In the continuous case, the decisive factor is the measurement error, or to be more precise, the resulting variance associated with a measured location. Continuous zones can be calculated for any geometric object embedded in n-dimensional space. Shi (97) provided generic derivations for the geometric objects of a point, line segment, and line. The model assumes that measured locations ($X_n$) of an n-dimensional feature are based on a normal distribution with variance ($\sigma^2$) around the true location ($\mu_n$). Furthermore, Shi (97) describes the calculation of confidence intervals based on the confidence level itself, the geometric feature (e.g., point and line), and the n-dimensionality of space. The confidence intervals are based on the $\chi^2$-distribution, where the probability that the measured point location is within the tabulated distance of the true point location can be tested.

The approach used in this project makes two adjustments to the general approach discussed by Shi (97). First, it is assumed that equality exists among the n variances associated with each of the cardinal directions, resulting in Equations 1 and 2:

$$\sigma^2 = \sigma_x^2 = \sigma_y^2 = \cdots = \sigma_n^2 \tag{1}$$

$$\text{Measured location } X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} \sim N_n\left( \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} \right) \tag{2}$$

Usually, a single accuracy value (i.e., $\sigma^2$) is provided, if at all. The requirement of explicit specifications for the variances in all cardinal directions is the ideal scenario; however, it is unlikely to be found in practical applications.

Equation 3 is a general descriptor for the spatial extent of the probability zone around an n-dimensional point location:

$$P(\text{True location} \in \text{Confidence space}) > \gamma \quad \text{with } X_i - b_{ni} \le x_i \le X_i + b_{ni}, \quad b_{ni} = \sigma \sqrt{\chi^2_{1;(n-1+\gamma)/n}} \tag{3}$$

Similarly, for a link node system we can derive Equation 4:

$$\text{Measured location } X_{nr} = \begin{pmatrix} X_{1r} \\ X_{2r} \\ \vdots \\ X_{nr} \end{pmatrix} \sim N_n\left( \begin{pmatrix} (1-r)\mu_{11} + r\mu_{21} \\ (1-r)\mu_{12} + r\mu_{22} \\ \vdots \\ (1-r)\mu_{1n} + r\mu_{2n} \end{pmatrix}, \left((1-r)^2 + r^2\right) \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} \right) \tag{4}$$

with an interval of $X_i - b_{ni} \le x_i \le X_i + b_{ni}$, where $b_{ni} = \sigma \sqrt{\left((1-r)^2 + r^2\right) \cdot \chi^2_{1;(n-1+\gamma)/n}}$.

In the second modification, several layers of probability zones are generated, rather than a single zone, allowing for a more detailed representation of the validity for the subsequent discussion on the combination of two or more features. For any GIS feature in an n-dimensional space, the probability zones can be calculated based on preset confidence levels, for example, PZ.75, for the 75 percent probability zone, to PZ.99, for the 99 percent probability zone. Each of the probability zones covers the continuous space immediately adjacent to its neighboring zones.

This principle is best explained by using an example. In the case of an n-dimensional point location, one can calculate the probability zones in the following manner: for example, PZ.75:

$$dPZ_{.75} = \pm \sigma \cdot \sqrt{\chi^2_{1;(n-0.25)/n}},$$

which indicates that the true point location lies, with a probability of 75 percent, within the zone outlined by the shape formed at a distance dPZ.75. For the 2D scenario, this would result in a square with dimensions dPZ.75 by dPZ.75 with the point feature located at the intersection of the diagonals of the square.

The distance dPZ from the point in each of the cardinal directions x . . . n and consequently the intervals for the probability zones PZ are as follows:

$$
\begin{aligned}
PZ_{.75}&: \; -dPZ_{.75} \text{ to } +dPZ_{.75}, && dPZ_{.75} = \sigma \cdot \sqrt{\chi^2_{1;(n-0.25)/n}} \\
PZ_{.80}&: \; -dPZ_{.80} \text{ to } -dPZ_{.75} \text{ and } +dPZ_{.75} \text{ to } +dPZ_{.80}, && dPZ_{.80} = \sigma \cdot \sqrt{\chi^2_{1;(n-0.2)/n}} \\
PZ_{.85}&: \; -dPZ_{.85} \text{ to } -dPZ_{.80} \text{ and } +dPZ_{.80} \text{ to } +dPZ_{.85} \\
PZ_{.90}&: \; -dPZ_{.90} \text{ to } -dPZ_{.85} \text{ and } +dPZ_{.85} \text{ to } +dPZ_{.90} \\
PZ_{.95}&: \; -dPZ_{.95} \text{ to } -dPZ_{.90} \text{ and } +dPZ_{.90} \text{ to } +dPZ_{.95} \\
PZ_{.99}&: \; -dPZ_{.99} \text{ to } -dPZ_{.95} \text{ and } +dPZ_{.95} \text{ to } +dPZ_{.99}, && dPZ_{.99} = \sigma \cdot \sqrt{\chi^2_{1;(n-0.01)/n}}
\end{aligned}
$$

Figure 2-14 depicts an example of a 1D point feature. The left side shows a single probability zone at the 75 percent confidence interval; the right side shows multiple probability zones according to the previous example (dPZ.75 to dPZ.99). As noted in Figure 2-14, the width or radius of a probability zone increases as the distance from the measured location increases, keeping in mind that the gained probability increase is constant (with the exception of the last interval where it is decreased). This makes the gain of additional confidence at higher confidence levels rather costly because the borders of the area of uncertainty increase exponentially.

[Diagram: nested zones labeled 75%, 80%, 85%, 90%, 95%, and 99% around the measured location.]
Figure 2-14. Probability zones of a one-dimensional point feature.

Subdividing the probability zones in such a way helps to describe the different stages of confidence levels. Another advantage of this procedure is the more detailed gain of confidence per unit (e.g., linear distance or square units for the 2D case) information, which is desired as outlined in the subsequent discussion on the combination of two or more features. Table 2-7 illustrates a comparison of gained confidence versus units added to the uncertainty interval for a point feature. In Table 2-7, for the 2D point feature, a 75 percent confidence interval is equal to an area of uncertainty of 1.32 σ square units. The increase from a 95 percent to a 99 percent confidence interval is more costly, i.e., a gain of 4 percent confidence costs an additional 2.79 σ square units of uncertainty area.

TABLE 2-7 Confidence versus units added to the uncertainty interval for a point feature

Probability Zone    One-Dimensional Point    Two-Dimensional Point
PZ.75               2.30 σ                   1.32 σ
PZ.80               0.26 σ                   0.32 σ
PZ.85               0.32 σ                   0.43 σ
PZ.90               0.41 σ                   0.64 σ
PZ.95               0.63 σ                   1.13 σ
PZ.99               1.23 σ                   2.79 σ

2.5.6.2 Combination of Two or More Features

This section discusses the combination of two or more features resulting in an estimate for the probability of congruency. In other words, this approach calculates the probability that, for example, two points are identical or that two lines intersect. This section introduces the principle using the example of a test of congruency for two 2D point features. Figure 2-15 illustrates the two measured point locations (P1 and P2) along with their confidence intervals. For illustrative purposes, the number of probability zones is reduced to two for each of the point locations. The two point locations have associated standard deviations of σ1 and σ2 (with σ2 < σ1), respectively. The inner probability zone is a PZ.75 and the outer one a PZ.95. For simplicity, both probability zones of Point 2 are located completely within one zone of Point 1.

[Diagram labels: P1, P2, A, B, C.]
Figure 2-15. Two point locations along with their confidence intervals.

To calculate the probability that the two points are congruent, one needs to calculate the probability that the true point locations of Point 1 and Point 2, respectively, are in Area A and Area B (see shaded areas in Figure 2-15). First, calculate the probability that the true location of P2 is within Area A or B. The probability that P2 is in Area A is $P_{2A} = 75\%$ and that it is in Area B is $P_{2B} = 95\% - 75\% = 20\%$. For Point P1, calculate the portion of Area C (there is a 20 percent probability that P1 is located in C) covered by Area A and Area B. The probability that the true location of Point P1 is within Area A can be derived as $P_{1A} = (C/A) \cdot 20\%$. Similarly, the probability that Point P1 is within Area B can be surmised as $P_{1B} = (C/B) \cdot 20\%$. Having derived the probabilities that each of the two point locations is in Area A or in Area B, the probability that both events occur can be calculated. The probability that both true point locations (i.e., P1 and P2) are positioned within Area A is $P_A(P_1 \cap P_2) = P_{1A} \cdot P_{2A}$. Similarly, for Area B, $P_B(P_1 \cap P_2) = P_{1B} \cdot P_{2B}$. The probability that Point P1 and Point P2 are congruent can be calculated as $P(P_A \cup P_B) = P_A + P_B$.

2.5.7 Prototype Data Error Model

This section describes the prototype error model and illustrates its application. The prototype model essentially represents the uncertainty object. Fundamentally, this is a metadata reporting approach, which contains uncertainty information about the event and its history. The prototype model is designed as a stand-alone product where all the required input information has to be provided by the user. For a full implementation into existing transportation databases this information can be retrieved from existing data. The challenge in developing a data error model stems from two facts: there are a multitude of data sources (e.g., photogrammetry and distance measuring instruments) and multiple reference systems in multiple dimensions.

These facts were considered in developing the prototype error model to meet the two main goals of estimating the uncertainty (1) associated with data collection, network, and referenced features; and (2) in combining different data sources (e.g., a 2D network with a 1D event, such as a digitized road location with a crash site).
The first task is to find a common descriptor for positional uncertainty inherent in the spatial data specific to transportation features.

2.5.7.1 Input Requirements

This subsection describes the input requirements for the prototype model. The dependencies of a linear or point feature, shown previously in Figure 2-12, are implemented in the prototype model and require user input for estimates of the standard deviations of each component. For example, the measurement method of recording a crash site on a highway could be by a handheld GPS or by measuring the distance to the nearest milepost via the odometer in a police car. Each method, however, has a known standard deviation, which is used to estimate the associated probability zones. In the latter case, network errors of the milepost system are also a required input for the prototype.

Figure 2-16 shows the required inputs for the uncertainty object and the relationship between the measured object and the sources of error. As noted above, the prototype is essentially an implementation of the uncertainty object itself. Instead of retrieving the necessary input (right side of Figure 2-16) from the system, the user is asked to provide these data.

To avoid crowding the presentation with too much information, each point feature and each network have an individually associated raster. For example, if the input consists of point features and one network of 15 link nodes, the uncertainties of this system are stored in two individual layers (i.e., one for the point feature and one for the network). Each intersection chosen adds raster maps.

Functions within the uncertainty object are used to calculate uncertainties and their propagation and then store individual uncertainties. Specifically, the prototype requires the following inputs:

• Type of feature (i.e., point or line).
• Spatial locations of events and link nodes (i.e., coordinates of the events, such as the start and end coordinates of a line event). In a situation where two lines intersect, for example, the coordinates of the start and end points of each line will be required.
• Estimated imprecision of relative or absolute network errors (i.e., estimates of the level of precision with which the event was measured).
• Resolution of measurement system (i.e., resolution of the measuring devices and referencing system used).
• Precision of event measurements (e.g., estimated precision in locating a crash site).
• Extent or description of the event (e.g., crash site versus business sign).

Depending on the feature or event of interest, the error value would reflect the estimated precision of the network, the resolution of the measurement system, and/or the systematic network error. For example, for a 2D line event, the error value depicts the estimated imprecision and resolution associated with the development of the line or network (or base map) from which the line was derived. In such a case, the errors associated with digitizing the map or converting aerial photographs into maps will be the input into the error program. On the other hand, for a point event, the error value reflects the precision of the measuring instrument or method. The extent of the event is just a descriptor for the event under consideration.

Figure 2-16. Detailed input requirements. (Diagram labels: Uncertainty object, which retrieves and compiles uncertainty information; Measurement error; Resolution of the reference system; Systematic network errors; Linear feature: Accident location.)

As discussed in Section 2.5.4, in order to visualize a 1D error in 2D, information on the linear distance from a known point is required. The delta method (which considers the error in the direction of the line as well as perpendicular to the line) is used. The error value in the direction of the line can be calculated using the chi-squared table, while the width of the line will be used to represent the width of the error.

To illustrate the application of the prototype error model, consider a simple example of crash location data. In this example, the prototype model was used to estimate and display the combined errors associated with recording a crash site located on a highway segment. The crash site was recorded by referencing the nearest milepost (i.e., the measurement method could be handheld GPS or measuring the distance to the nearest milepost via the odometer in a police car), and the road network was digitized from aerial photographs, which requires a transformation from one system to the other. The uncertainty of the linear feature is transformed into 2D space. In applying the model, it is assumed that each measuring method has a known standard deviation of measurement error, which is used to estimate the associated probability zones. The model was used to estimate the combined errors from line and point events.

2.5.7.2 Calculations

First, the probability zones associated with each feature are calculated and stored in separate geo-referenced Arc ASCII rasters. The zones are based on assumed uniformly distributed intervals of the χ²-distribution.
The zones represented in the prototype are 0–75, 75–85, 85–90, 90–95, and 95-90 percent significance intervals of the associated χ²-distributions. Thus, each spatial location (added by the user) has five discrete buffers around it, where the closest represents 75 percent probability that the true point location is within the buffer, and the second through the fifth each represents an additional 5 percent. Each pixel in those zones receives a proportional probability that the location is exactly in that pixel, according to the assumption of uniformity within each zone. For example, the uncertainty of a linearly referenced crash site can now be combined with that of an independently produced 2D representation of the same road. Assume that the crash site was recorded by referencing the nearest milepost and that the road network was digitized from aerial photographs. In the prototype model, the uncertainty of the linear feature is transformed into 2D space.

Second, the program combines the different uncertainty zones. Following from the example under consideration, one can now combine the uncertainty of a crash site that was linearly referenced with an independently produced 2D representation of the same road. Figure 2-17 shows a sketch of the outcome after the two features are combined. In this specific example, the 2D uncertainty zone is based on the following: (1) the width is based on the linear feature's network error, resolution, and measurement error; and (2) the height is based on the digitization error of the 2D network.

Figure 2-17. Combination of linear feature and 2D network.
Key:
• Yellow line = road (the true location of the line event)
• Yellow circle = linear referenced point event (crash site)
• Red rectangles (at the ends) = 2D uncertainty of the road centerline
• Red middle portion = extent to which one can visualize error in 2D (this is the result of transformation from 1D to 2D)
• Blue trapezoid = linear referenced error visualized in 2D

The next step is to calculate the probability that the linear event feature (which now has an associated 2D uncertainty) intersects with a 2D point location. As can be seen in Figure 2-18, it is not necessary to assume that this point location has to be related to the 2D road network.

Figure 2-18. Intersection of linear event with independent 2D event.
Key:
• Yellow circle in purple square = location of business event (e.g., sign post)
• Purple square = business event in 2D (event that is not referenced to the road)
• Intersection between the blue and pink areas = the chance that the 2D event is actually on the road (i.e., the data quality of a 2D event with a linear referenced event)

Based on the individually stored uncertainty layers, one can now calculate the intersection of these layers by intersecting the probability of each pixel in each associated raster layer. The result is stored in a new Arc ASCII raster.

The prototype error model is encapsulated in a software program called GISError, developed in the Visual Basic programming language with a graphical user interface (GUI). The GUI facilitates data input and visualization of outputs. A user guide for the application of the prototype error model is included in Appendix A of this report. The program has a feature that allows results from the error analysis to be exported to other GIS application software.

The prototype model was applied to case study data obtained from the Ohio DOT. The results of the case study are presented in the next chapter.
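
The two calculation steps described in Section 2.5.7.2 (building a zone raster for each feature, then intersecting the rasters pixel by pixel) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the GISError implementation. The function names, the grid size, the circular buffers, and the cumulative levels passed in are hypothetical choices made for this example. The radii assume equal, independent errors in x and y, so that the scaled squared distance follows a χ²-distribution with two degrees of freedom. Treating the product of two pixel probabilities as a joint probability further assumes that the errors of the two features are independent.

```python
import math


def zone_radii(sigma, levels):
    """Buffer radius for each cumulative probability level.

    Assumption for this sketch: equal, independent errors in x and y,
    so (distance / sigma)**2 is chi-squared with 2 degrees of freedom,
    whose cumulative distribution is 1 - exp(-x / 2).
    """
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - p)) for p in levels]


def zone_raster(center, sigma, levels, size, cell=1.0):
    """Square raster of per-pixel probabilities, uniform within each zone."""
    radii = zone_radii(sigma, levels)
    cx, cy = center
    zone_of = {}
    for row in range(size):
        for col in range(size):
            # Distance from the pixel centre to the recorded location.
            d = math.hypot((col + 0.5) * cell - cx, (row + 0.5) * cell - cy)
            for k, r in enumerate(radii):
                if d <= r:
                    zone_of[(row, col)] = k  # innermost zone containing the pixel
                    break
    counts = [0] * len(levels)
    for k in zone_of.values():
        counts[k] += 1
    # Probability mass per zone: the first level, then each increment.
    mass = [levels[0]] + [b - a for a, b in zip(levels, levels[1:])]
    raster = [[0.0] * size for _ in range(size)]
    for (row, col), k in zone_of.items():
        raster[row][col] = mass[k] / counts[k]
    return raster


def intersect(r1, r2):
    """Pixel-by-pixel product of two probability rasters."""
    return [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(r1, r2)]


# Illustrative cumulative levels and feature locations (not from the report).
levels = [0.75, 0.80, 0.85, 0.90, 0.95]
first = zone_raster((20.0, 20.0), 3.0, levels, size=40)
second = zone_raster((23.0, 20.0), 4.0, levels, size=40)
overlap = intersect(first, second)

total_first = sum(map(sum, first))
total_second = sum(map(sum, second))
total_overlap = sum(map(sum, overlap))
```

With these inputs each raster sums to the outermost cumulative level, and `total_overlap` is the summed per-pixel product of the two layers. On a coarse grid a thin zone may contain no pixel centres, in which case its probability mass is not represented; a smaller `cell` value reduces that effect.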

TRB’s National Cooperative Highway Research Program (NCHRP) Report 506: Quality and Accuracy of Positional Data in Transportation presents guidance for practitioners on the use of positional, or spatial, data in Geographic Information Systems for transportation applications.

