National Academies Press: OpenBook

Cell Phone Location Data for Travel Behavior Analysis (2018)

Chapter: Chapter 9 - Guidelines for Practitioners

Suggested Citation:"Chapter 9 - Guidelines for Practitioners." National Academies of Sciences, Engineering, and Medicine. 2018. Cell Phone Location Data for Travel Behavior Analysis. Washington, DC: The National Academies Press. doi: 10.17226/25189.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Chapter 9 Guidelines for Practitioners

9.1 Roadmap to the Chapter

Chapters 4 through 8 highlighted three-way comparisons between survey data, traditional models, and call detail record (CDR)-derived estimates to help bridge the gap between research and practice. This chapter summarizes key considerations about the potential uses of CDR data and provides guidelines for practitioners of planning and modeling. Specifically, the chapter focuses on the questions practitioners typically ask about data and models, to shed more light on the potential value of cell phone CDR locational data.

To extract the maximum value from cell phone data for planning and modeling purposes, practitioners can consider the following general principles:
• Be aware of the underlying assumptions that are made to process cell phone data to determine locations and to infer activities and purposes. Vendors are likely to make inferences similar to those of the method documented in this report to process CDR data and develop origin–destination (O-D) tables.
• Recognize that results from traditional surveys and models are also built on different sets of assumptions. Although transportation practitioners are more familiar with these methods, traditional survey and model results do not necessarily provide a true ground-truth baseline either.
• Expect that, as the quantity and quality of cell phone data increase and powerful machine learning algorithms are applied, vendors will improve their methods for analyzing locational data to infer travel patterns. As new products are developed, practitioners need to keep asking the typical questions about the underlying data and assumptions.
• Appreciate the uncertainty underlying both CDR estimates and traditional measures of travel patterns. Although uncertainty is not easy to quantify, practitioners should be open to using ranges of estimates for both new and traditional data sources.

The practitioner guidelines presented here are grouped into three categories:
• Administrative considerations (Section 9.2);
• Data considerations, juxtaposing CDR data with traditional data sources (Section 9.3); and
• Modeling considerations and the potential of CDR data to support different model components (Section 9.4).

Section 9.5 discusses how advancing technology, together with the availability of more and better-quality CDR data and new research on locational data, may create a new generation of enhanced CDR data products. The chapter concludes with examples from recent literature that benefit from richer and more detailed data sources and the use of advanced analytics.

9.2 Administrative Considerations

The specification, purchase, and licensing of CDR data is a complex transaction with cost and legal implications. The terms of payment need to be considered in the financial planning undertaken by public agencies. The cost of such data may also exceed the financial limits set for incidental purchases. For example, a state department of transportation (DOT) may need to disclose plans for similar data purchases in advance as part of its State Planning and Research Work Program. Similarly, a metropolitan planning organization (MPO) may need to disclose plans for purchasing CDR data in its Unified Planning Work Program. If funds, contracts, or grants from other parties are used to finance such data purchases, a pay-when-paid approach should be negotiated with the data vendor.

This section summarizes the financial, legal, schedule, technical, and communication considerations that practitioners are likely to face.

9.2.1 Financial Considerations

Agencies need to consider in their financial planning the implications of acquiring processed CDR data from third parties. Questions include the timing of the payments, the terms of the payments, and the delivery of available data to the agency. Agency staff need to develop detailed specifications for the data request. The parameters that may affect the quality and cost of the CDR data set, and that need to be defined by the agency, include the following:
• Size of the data set;
• Spatial coverage;
• Duration of the observation period;
• Desired geographic level of detail;
• Types of travel, including purposes and resident-versus-visitor markets;
• Temporal detail for time-of-day analyses;
• Time period during which the data may be used;
• Ability to refresh the data for one or more future years;
• Single-use versus multiple-use data purchase; and
• Ability to share the data with agency partners.
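The parameters listed above can be recorded in a single structured specification, so that cost and scope discussions with the vendor refer to the same items. The Python sketch below is purely illustrative: the `CdrDataRequest` class, its field names, and the `open_items` helper are hypothetical and do not represent a vendor or agency format.

```python
from dataclasses import dataclass, field


@dataclass
class CdrDataRequest:
    """Hypothetical record of the parameters an agency defines before buying CDR data.

    Field names mirror the Section 9.2.1 list; none of them is a vendor or
    agency standard.
    """
    data_set_size: str                  # e.g., number of devices or trips requested
    spatial_coverage: str               # model region plus any external area
    observation_period_days: int        # duration of the observation period
    geographic_level: str               # e.g., "TAZ", "TAD", custom districts
    travel_markets: list[str] = field(default_factory=list)  # purposes, residents vs. visitors
    time_periods: list[str] = field(default_factory=list)    # time-of-day detail
    license_term_years: int = 1         # period during which the data may be used
    refresh_years: list[int] = field(default_factory=list)   # optional future refreshes
    multiple_use: bool = False          # single-use vs. multiple-use purchase
    shareable_with_partners: bool = False

    def open_items(self) -> list[str]:
        """Return the list-valued parameters that are still empty, to raise with the vendor."""
        return [name for name in ("travel_markets", "time_periods", "refresh_years")
                if not getattr(self, name)]


request = CdrDataRequest(
    data_set_size="all observed devices",
    spatial_coverage="model region",
    observation_period_days=30,
    geographic_level="TAD",
    travel_markets=["HBW", "HBO", "NHB", "visitor"],
)
print(request.open_items())
```

Listing unspecified items explicitly (for example, whether future refreshes or partner sharing are wanted) makes it easier to see which open parameters may still affect the price.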
9.2.2 Legal Considerations

Privacy considerations and proprietary methods have introduced complications in the use and dissemination of CDR data. It is safe to assume that, once purchased or licensed for a specific purpose, CDR data may not be used for any other purpose by a public agency.1 It is also reasonable to assume that contract language will regulate how these CDR data can be shared with agency partners, including consulting firms working for a public agency. Agency staff will probably need to address the following issues in a legal document such as a data-sharing agreement:
• The potential uses of the CDR data;
• The parties by whom these CDR data may be used or with whom they may be shared; and
• The products that can be derived from the CDR data.

1 It is unlikely that the processed CDR O-D data can be purchased outright. The right to use the data will likely be licensed to public agencies for specific agreed-upon uses.

If the public agency does not have the signatory authority to enter into legal agreements, then the signatory agency, through its legal staff, will need to enter into a legally binding agreement. An example of an MPO administered by a separate legal entity is the Greater Buffalo Niagara Regional Transportation Council, which is administered by the Niagara Frontier Transportation Authority. Agency staff should also recognize, and address in a legal document, the inherent potential conflicts between the confidential nature of CDR data and the Freedom of Information Act or any other state legislation to which the public agency may be subject.

9.2.3 Schedule Considerations

Because of the financial and legal requirements discussed above, obtaining CDR data may take longer than the agency is accustomed to with traditional travel and survey data sources. As part of the agency’s financial plans, agency staff should allow for potential delays in contract negotiation and data acquisition.

Agency staff should also recognize that the negotiations are likely to affect the project schedule. Care should be taken to adjust the period of performance of contracts with third parties, including other agencies, agency partners, and consulting firms that are expected to use the CDR data for project-related analyses.

9.2.4 Technical Considerations

CDR data differ from household travel survey data and from the other data sources used to support planning analyses and model development. As a result, CDR data have different strengths and weaknesses than traditional data sources and may not conform exactly to a practitioner’s expectations or needs.
The data considerations described in Section 9.3 and the modeling and analysis considerations discussed in Section 9.4 are accompanied by checklists of the types of questions that agency staff should think about and discuss when they are considering the purchase of CDR data. Agency staff should review the agency’s data and analysis needs for planning and modeling purposes and communicate them clearly to the data vendor. Such a discussion helps the agency and the vendor compare existing sources of data and model outputs with the corresponding properties of the CDR-derived travel data. It also helps ensure that the vendor’s data product is specified on the basis of the agency’s needs and that the CDR travel data provide value by supporting the agency’s planning and modeling applications.

9.2.5 Communications Considerations

Given the complex nature of CDR data, it is also critical that agencies develop a clear and well-defined document that is available to legislative bodies, news agencies, and informed citizens and that addresses the following considerations:
• Statutory language that allows and regulates the CDR data purchase;
• Description of the types of CDR data that are being collected and the purpose(s) of collecting and using these data;
• The kinds of planning questions that these data are expected to answer;
• The steps that have been taken to anonymize data and preserve privacy;
• The spatial and temporal resolution of the data; and
• Practices related to data access, retrieval, archiving, and deletion.

Preparing this document and making it available during the preliminary stages of project development will ensure that any privacy issues are dealt with in a comprehensive and transparent manner.

9.3 Data Considerations

Chapters 6 through 8 discuss how CDR data and the O-D trip tables derived from them are similar to and different from travel survey results and model outputs. This section summarizes the key features of CDR data to highlight their strengths and weaknesses.

In addition to the 2010 CDR data used in this research effort and the 2015 vendor CDR data, the discussion in this section covers the GPS logger and Bluetooth device data collection technologies that were used in a recent Travel Model Improvement Program report (Hard et al. 2016). The discussion also addresses the features and potential of the smartphone app data collection option.

GPS logger data for personal or truck travel rely on the GPS functionality of a logger carried by a respondent as part of a household survey or embedded in a commercial vehicle as part of a freight survey. These GPS data are often combined with a follow-up telephone survey that obtains additional information and context about individual stops and activities. Enhancing the GPS data in this way creates an integrated data source that benefits from both the contextual information and the detailed location data provided by GPS technology.

Bluetooth-based data passively capture and identify the location of vehicles or devices by means of Bluetooth readers placed on corridors and at locations of interest. Little contextual or socioeconomic information is known about the user or owner of the device. These data provide value when the analyst is interested in counts of vehicles along a facility. They cannot be integrated with other sources to provide either socioeconomic or contextual information.
Smartphone apps represent a new wave of technology that offers a new option for integrating passive and active collection of locational data to infer travel. Respondents who agree to participate in the survey provide socioeconomic data and can also identify locations that they visit often. They agree to be passively monitored by the smartphone app, which traces their daily travel through the GPS tracking capability built into their phones. Respondents are asked to actively validate their travel through a prompted recall method in which they provide information about individual activities. A more sophisticated approach embeds machine learning software that infers the day’s activities and presents these inferences to respondents for verification.

Chapter 3 discussed how the various elements of travel obtained from traditional surveys are similar to and different from travel estimates obtained from the analysis of CDR data. This comparison clarified the strengths and weaknesses of CDR data relative to traditional survey data. Table 3-3, the key table from that comparison, is repeated here as Table 9-1 to highlight the contrast between traditional survey data and CDR data for variables that are critical to planning and modeling analyses.

Table 9-2 extends this comparison by highlighting how specific data properties differ across four technology options: CDR data, GPS loggers, smartphone apps, and Bluetooth devices. The entries in Table 9-2 correspond to elemental properties of data and include
• Raw versus processed data records,
• Spatial and temporal resolution of each data source,
• Level of technology used, and
• Contextual information available in each data source to
  – Differentiate between commercial and passenger travel,
  – Identify activities and travel purposes, and
  – Use socioeconomic information to expand the sample.

The data elements shown in Tables 9-1 and 9-2 can serve as a data checklist to help agency staff identify the specific features of CDR data, their ability to provide the required information, and their relative value compared with traditional surveys, GPS logger data, smartphone surveys, and Bluetooth data. The remainder of this section briefly discusses each of the 11 data properties in Table 9-2, summarizing key findings and making specific recommendations for each.

Table 9-1. Travel elements in traditional surveys and CDR data.
(Columns: Variable of Interest; Travel Data from Traditional Surveys; Travel Data Based on Cell Phone Use)

Total daily travel
  Traditional surveys: Self-reported in survey diaries. Travel may be underreported. Prompted recall offers an improvement.
  Cell phone data: Passive cell signals over days may offer more robust metrics than surveys. Unit is device-trips rather than person-trips. Quality depends on CDR data density.
Time of travel
  Traditional surveys: Self-reported in survey diaries. Times may be inaccurate and incomplete.
  Cell phone data: Accurate time stamps. Need to infer activity and link it to the time stamp versus en route travel.
Stops versus activities
  Traditional surveys: Self-reported in survey diaries. Detailed log of stops and activities. Good detail on all travel purposes.
  Cell phone data: Need to infer stops, activities, segments. Nonwork purposes are difficult to infer.
Location of activities
  Traditional surveys: Self-reported in survey diaries. Smart geocoding needed to match. Prompted recall offers an improvement.
  Cell phone data: Difficult to infer the location of activities. A challenge in mixed land use areas.
Travel purpose
  Traditional surveys: Self-reported in survey diaries. Prompted recall offers an improvement.
  Cell phone data: Home and work locations are inferred. Poor inference on nonhome and nonwork.
Joint travel
  Traditional surveys: Self-reported in survey diaries. Risk of underreporting. Prompted recall offers an improvement.
  Cell phone data: Not feasible to record or capture.
Mode of travel
  Traditional surveys: Self-reported in survey diaries. Good detail by tour and segment. Walk and bike trips may be underreported.
  Cell phone data: Not readily inferred.
Route assignment
  Traditional surveys: Not usually captured in surveys.
  Cell phone data: Depends on trace data and algorithm.
Tour generation
  Traditional surveys: Self-reported in detail in a survey.
  Cell phone data: Analysis by using heuristics and rules. Data products do not include chains. Only aggregate trips are sold.

Source: Cambridge Systematics, Inc.

Table 9-2. Properties of different locational-based data sources.
(Columns, in order: CDR Data; Personal GPS-Derived Data; Smartphone Survey; Custom Bluetooth Data. Entries within each row follow this column order; some entries span more than one column.)

CDR data in raw form
  – Raw data likely not available due to privacy concerns.
  – Raw data are available to data analysts.
Processed CDR data available to analyst
  – Processing method is not known to analyst.
  – Method can be shared with the analyst.
  – Limited data processing is possible.
Zonal size and spatial resolution
  – Low spatial accuracy. Zone size and number of zones affect pricing.
  – Spatial accuracy greater than CDR data.
  – Spatial accuracy similar to personal GPS data.
  – Data can be used to support corridor traffic analysis.
External zones and external stations
  – External travel may be obtained.
  – Depends on survey methodology and participant travel.
  – Yes, but depends on survey locations.
Trip purpose
  – Activities and purposes are inferred. Three purposes are available: HBW, HBO, and NHB.
  – Detailed trip purposes through prompted recall.
  – Not possible.
Socioeconomics
  – Not available.
  – Available.
  – Not available.
Technology
  – Advances in technology will yield more accurate data. More frequent data points. Greater spatial accuracy.
  – Standardized technology.
  – Potential to improve pulse rates versus battery life.
  – Standardized technology.
Time periods and temporal resolution
  – Depends on cell utilization and interaction with network.
  – Depends on level of interaction with network.
  – Very detailed resolution.
  – Possible to summarize data by time of day.
Commercial and passenger travel
  – Not possible to differentiate between vehicle classes.
  – Able to differentiate between vehicle classes.
  – Not possible to differentiate.
Expansion of sample
  – Expansion is driven by population and geography. No socioeconomic or market segment data. Vendor-driven methods are used.
  – Customized expansion by socioeconomics and geographic detail.
  – Expansion can be made to vehicle counts.
Path traces
  – Unreliable path traces. Infrequent transactions. Low spatial accuracy.
  – Unreliable traces for slow data transaction rate.
  – Very reliable path traces.
  – Not possible.

Source: Cambridge Systematics, Inc.

9.3.1 Raw CDR Data

This report benefited from access to the raw, disaggregate, and anonymized 2010 CDR data that were used for research purposes. However, current thinking and legal privacy considerations make it unlikely that raw cell phone CDR data will be available to practitioners in the future. This is a key consideration for practitioners who are accustomed to traditional analysis methods that allow testing and experimentation with household travel survey data. Such experimentation, a natural part of the model estimation and discovery process anchored in the behavioral paradigm, is not feasible with CDR data.

Agency staff will be able to specify their customized data requirements to CDR data vendors. However, access to processed CDR estimates instead of raw CDR data most likely will not allow them to
• Test for themselves the sensitivity of different assumptions about the sampling of cell phone devices;
• Use different criteria to select a CDR sample for estimation;
• Weight and expand the CDR sample to different control totals or account for the presence of different service providers in the marketplace;
• Infer stops and activities from CDR traces; or
• Construct CDR trip tables by purpose and by time of day in response to different assumptions.

Agency staff can evaluate the processed CDR data indirectly by discussing with the data vendor how the data reflect these specific assumptions. Table 9-2 can be used as a guide to clarify the properties of the processed and aggregated CDR data and to help identify the strengths and weaknesses of CDR data.

9.3.2 Processed CDR Data

Innovative ideas are more likely to be embraced and adopted in practice after a high level of collaboration and vetting of ideas among academia, industry, and planning agencies. Academic research to harness data from disruptive cell phone technology can be tested and verified by industry and can then be more easily adopted by agencies as a proven method and product. Although business realities may prevent vendors from sharing proprietary methods, greater transparency of the black box will increase the value of CDR data to practitioners and planning agencies. Recognition of the strengths and weaknesses of CDR data, the analysis methods used, and the underlying assumptions will increase the industry’s confidence in cell phone data products.
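Section 9.3.1 noted that, with processed data only, agencies cannot infer activities from CDR traces or weight and expand the sample themselves. To make those black-box steps concrete, the Python sketch below chains three of them: home and work inference using the heuristics described in Section 9.3.5 (home is the most frequently observed nighttime location; work is the most frequent daytime trip end), trip purpose labeling into HBW, HBO, and NHB, and a simple expansion that weights each device by its home zone’s population divided by the number of sampled devices living there. The hour windows, the rule that work must differ from home, the (hour, zone) stay records, the expansion rule, and all function names are illustrative assumptions, not the method of this report or of any vendor.

```python
from collections import Counter, defaultdict

# Assumed time windows (hours of day); the report does not specify these values.
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 6))
DAY_HOURS = set(range(8, 18))


def most_frequent_zone(stays, hours, exclude=None):
    """Most frequently observed zone among (hour, zone) stays that fall within `hours`."""
    counts = Counter(zone for hour, zone in stays if hour in hours and zone != exclude)
    return counts.most_common(1)[0][0] if counts else None


def infer_home_work(stays):
    """Home: most frequent nighttime zone; work: most frequent daytime zone other than home."""
    home = most_frequent_zone(stays, NIGHT_HOURS)
    work = most_frequent_zone(stays, DAY_HOURS, exclude=home)
    return home, work


def trip_purpose(origin, destination, home, work):
    """Label a trip HBW, HBO, or NHB from its ends; 'HH' flags home-to-home trips."""
    if origin == home and destination == home:
        return "HH"
    if home in (origin, destination):
        other = destination if origin == home else origin
        return "HBW" if other == work else "HBO"
    return "NHB"


def expand_by_home_zone(trips, homes, population):
    """Expand (device, origin, destination, purpose) trips into an O-D table by purpose.

    Each device is weighted by population[home zone] / sampled devices with that home.
    Devices without an inferred home or a control total are dropped.
    """
    devices_per_zone = Counter(homes.values())
    od = defaultdict(float)
    for device, origin, destination, purpose in trips:
        zone = homes.get(device)
        if zone not in population:
            continue
        od[(purpose, origin, destination)] += population[zone] / devices_per_zone[zone]
    return dict(od)


stays = [(22, "A"), (23, "A"), (2, "A"), (9, "B"), (10, "B"), (14, "C"), (19, "A")]
home, work = infer_home_work(stays)
print(home, work, trip_purpose("C", "A", home, work))
```

The hour windows are exactly the kind of assumption worth raising with a vendor: shifting them can change which zones become home and work anchors. Ties between equally frequent zones are broken arbitrarily here, and devices observed on only a few days or adjustments for multiple carriers are not handled.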
At the time this report was developed, only one data vendor was using CDR data to infer stops, activities, and a limited number of travel purposes before expanding the processed CDR sample to generate O-D trip tables. As discussed in Chapters 4 through 8, the vendor’s processed and aggregated 2015 trip tables are broadly comparable to the measures derived from the research team’s analysis of the raw 2010 CDR data. This suggests that CDR data vendors are most likely using methods and assumptions consistent with those of the Boston case study research.

A key drawback for practitioners is that they need to rely on processed CDR data without access to the underlying raw data or the methods used to analyze them. As a result, practitioners implicitly have to accept the methods used for processing, expanding, and interpreting the raw data. Given that vendors may not provide enough methodological detail because of proprietary considerations, practitioners need to ask specific questions about the CDR product to better understand its properties. Agency staff routinely ask data-related questions and challenge the assumptions and methods used in traditional trip-based or advanced activity-based models. Agency staff and practitioners can better understand elements of the CDR black box by asking questions about
• The spatial and temporal accuracy of the CDR data, to ensure that the data resolution is appropriate for the agency’s project needs;
• The population of cell phone users in the CDR sample and its representativeness of the population at large;
• The incidence of different market segments in the sample of cell phone users and the method used to expand the sample; and
• The methods used to detect home and work locations, stops, and activities that are then used to infer daily travel by purpose and time of day.

Agency staff will develop more confidence in a vendor’s analyses and products if they are convinced that the processed CDR data generate results that are broadly consistent with the outcomes of traditional data sources and modeling methods. These questions can be addressed directly and definitively if
• The underlying raw CDR data are available for analysis by the agency;
• The assumptions and methods used by a CDR vendor are disclosed to a greater extent than they are today; or
• Practitioners, academics, and vendors collaborate to analyze and test different assumptions by using raw CDR data to gain greater confidence in the final product and the methods used.

If such options are not realistic, agency staff and practitioners need to assess the quality of the O-D trip tables indirectly, by engaging the vendor in a discussion about the specific assumptions and methods that drive the final product. The entries in Table 9-2 can serve as a data checklist with the types of questions typically asked during traditional data design in advance of model development, validation, and application.

Raw CDR data will not be available because of privacy considerations. Vendors provide aggregate trip tables from processed CDR data.

9.3.3 Zonal Size and Spatial Resolution

CDR data offer a wealth of spatial and temporal information, but the uncertainty about stay locations and activities has practical implications for the accuracy of travel data. Location inferences are made by triangulating between cell towers, which results in varying degrees of spatial accuracy. In the case of the 2010 raw CDR data, the spatial accuracy was as coarse as 300 meters.
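Why coarse location accuracy matters more for small zones can be illustrated with simple geometry. Assume, purely for illustration, a square zone of side L and a location error that never exceeds r, taking the 300-meter figure cited for the 2010 data as r. The share of the zone’s area lying within r of its boundary, where an error of that size could place a location in a neighboring zone, is 1 - (max(L - 2r, 0) / L)^2. The Python sketch below computes this upper-bound exposure; the square-zone assumption and the function name `boundary_exposure` are hypothetical and are not part of this report’s method.

```python
def boundary_exposure(zone_side_m: float, max_error_m: float = 300.0) -> float:
    """Share of a square zone's area within `max_error_m` of its boundary.

    A location inferred anywhere in this band could, with an error of up to
    `max_error_m`, actually belong to a neighboring zone. This is an upper
    bound on exposure, not a misassignment probability.
    """
    if zone_side_m <= 0:
        raise ValueError("zone side must be positive")
    # Points farther than max_error_m from every edge form an inner square.
    inner_side = max(zone_side_m - 2 * max_error_m, 0.0)
    return 1.0 - (inner_side / zone_side_m) ** 2


for side in (500, 1000, 2000, 5000):
    print(f"{side:>5} m zone: {boundary_exposure(side):.0%} of area near a boundary")
```

Under these assumptions, every location in a 500-meter zone is exposed, about half the area of a 2-kilometer zone is, and roughly a fifth of a 5-kilometer zone is, which is consistent with the preference for aggregating small urban TAZs discussed in this section.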
Practitioners are familiar with traditional surveys in which respondents provide locations, although reporting errors may include omitting an activity or giving an inaccurate location for it. Data collection with GPS loggers or smartphones, aided by prompted recall and verification, provides full and direct reporting of locations, whereas locations in CDR data need to be inferred.

CDR spatial resolution may effectively preclude analysis at the traffic analysis zone (TAZ) level and require aggregation of existing TAZs. CDR-based trip tables at a more aggregate geographic level are likely to provide results closer to those of traditional surveys and models than are trip tables at the TAZ level. This is not an unexpected finding, given that it is also true of models developed by using travel surveys: modeled O-D trips are often aggregated to a district level to compare them with other data sources.

Agency staff need to decide on geographic coverage and the desired geographic detail, given the spatial accuracy of CDR data. Although CDR data may be purchased at a TAZ level, the data will need to be aggregated for most practical applications. An example of district aggregation is the traffic analysis district (TAD), which was developed after the 2010 Census in support of the Census Transportation Planning Products (CTPP). TADs are aggregates of selected TAZs. When TAZs are delineated for a given area, TAD boundaries follow the outermost boundaries of the TAZs they are intended to encompass, are contiguous, and do not extend into other areas.

Locations in CDR data are inferred and may not be accurate.
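Aggregating a TAZ-level trip table to districts, and then comparing it with a reference table at that level, can be sketched as follows. This minimal Python example assumes a TAZ-to-district equivalency table in which every TAZ nests in exactly one district, as TAZs do within TADs, and uses percent root mean square error (%RMSE) as one common comparison statistic; it is not necessarily the statistic used in this report. Zone IDs, function names, and trip values are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_od(od_taz, taz_to_district):
    """Sum a TAZ-level O-D table {(origin, destination): trips} into district cells.

    A KeyError flags a TAZ missing from the equivalency table, i.e., a break
    in the nesting of TAZs within districts.
    """
    od_district = defaultdict(float)
    for (origin, destination), trips in od_taz.items():
        key = (taz_to_district[origin], taz_to_district[destination])
        od_district[key] += trips
    return dict(od_district)


def percent_rmse(estimate, reference):
    """%RMSE of `estimate` against `reference` over the union of O-D cells.

    Cells missing from either table are treated as zero trips; the result is
    the RMSE expressed as a percentage of the mean reference cell value.
    """
    cells = set(estimate) | set(reference)
    if not cells:
        raise ValueError("both tables are empty")
    sq_err = sum((estimate.get(c, 0.0) - reference.get(c, 0.0)) ** 2 for c in cells)
    mean_ref = sum(reference.get(c, 0.0) for c in cells) / len(cells)
    if mean_ref == 0:
        raise ValueError("reference table has no trips")
    return 100.0 * math.sqrt(sq_err / len(cells)) / mean_ref


taz_to_district = {"101": "D1", "102": "D1", "201": "D2"}
cdr_taz = {("101", "201"): 60.0, ("102", "201"): 50.0, ("201", "101"): 90.0}
survey_district = {("D1", "D2"): 100.0, ("D2", "D1"): 100.0}

cdr_district = aggregate_od(cdr_taz, taz_to_district)
print(cdr_district, percent_rmse(cdr_district, survey_district))
```

Computing the same statistic at the TAZ and district levels gives a simple way to check whether the aggregate table tracks the reference source more closely; the sketch does not imply any acceptance threshold.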

Agencies can specify to the vendor the desired level of aggregation on the basis of CDR spatial properties, activity centers in the region, and the existence of other data sources at comparable levels of aggregation. Because vendors work with point-level data, they can provide customized aggregations of zonal data based on a model’s TAZs, Census geographic boundaries, or an agency’s custom-built layers. When zonal data are aggregated, it is recommended that a nesting structure within commonly used geographic layers be preserved for comparability with other data sources. Finally, the cost of the CDR data may depend not only on coverage but also on the number of zones and the level of geographic detail.

On balance, the research team believes that district-level zones similar to the TAD system are preferable to traditional urban area TAZs. The spatial accuracy of up to 300 meters in the 2010 research CDR data prevented accurate analysis at a typical urban TAZ level. The TAD system also provides other data sources with which the CDR data can be compared.

Agency staff evaluating cell phone–derived data need to weigh the effect of less accurate activity location data against benefits such as lower costs and larger sample sizes. Questions to discuss with the vendors include
• The cost of the CDR data purchase as a function of the model coverage, the size of individual zones, and the use of the CDR data for specific analyses;
• The spatial accuracy of the CDR data, to determine whether the model’s TAZ system or a more aggregate level needs to be used; and
• The definition of a robust district-level zone system for CDR data that is consistent with the geography of Census data and other local databases.

9.3.4 External Zones and Stations

CDR-derived O-D data are processed at the zonal level for the region under study.
Zones available as trip ends in the CDR data will typically not include external stations that are not represented as a model zone, so some minor additional processing will be needed to designate zones that correspond to current external stations. CDR data are available for a broader geographic area that includes zones both inside and outside the model region. Agencies that need estimates of internal–external traffic and of external–external traffic passing through their region can define and purchase data for zones outside the regional model boundary. For agencies interested in traffic originating or destined outside their region from both a modeling and an economic policy perspective,
• CDR data provide a valuable tool for understanding total travel by visitors whose inferred home address is outside the study region;
• CDR nonresident data can augment current methods of collecting visitor data and may be packaged as part of the regional CDR data request; and
• Both passenger and commercial vehicles will be captured in these estimates, but distinguishing the two segments will not be possible.

9.3.5 Trip Purpose
A key difference between traditional and CDR data and methods is the ability to infer activities and trip purposes. In traditional surveys, a sample of individuals and household members record their daily activities in great detail. Survey data are then analyzed to infer travel at a similar level of activity and purpose detail for the entire population in a region.

CDR trip tables are more robust at a more aggregate geography.

Capturing travel by nonresidents can be of high value.
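The internal–external segmentation discussed under 9.3.4 can be sketched as a labeling step over a purchased trip table that also includes zones outside the model boundary. This is a minimal Python illustration with hypothetical zone numbers. An external–external label only says that both trip ends lie outside the region; confirming that such a trip actually passed through the region would need path information.

```python
def classify_flow(origin, destination, internal_zones):
    """Label one O-D pair by whether each trip end lies inside the model region."""
    o_in = origin in internal_zones
    d_in = destination in internal_zones
    if o_in and d_in:
        return "internal-internal"
    if o_in:
        return "internal-external"
    if d_in:
        return "external-internal"
    # Both ends outside; only path data can confirm the trip crossed the region.
    return "external-external"


def summarize_by_flow_type(trip_table, internal_zones):
    """Total trips in each flow category for an {(origin, destination): trips} table."""
    totals = {}
    for (origin, destination), trips in trip_table.items():
        label = classify_flow(origin, destination, internal_zones)
        totals[label] = totals.get(label, 0.0) + trips
    return totals


# Hypothetical zones: 1-3 are model zones; 90 and 91 are purchased zones outside the boundary.
internal_zones = {1, 2, 3}
trip_table = {(1, 2): 500.0, (1, 90): 60.0, (91, 3): 45.0, (90, 91): 30.0}
print(summarize_by_flow_type(trip_table, internal_zones))
```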

In contrast, CDR data are limited in the detail they offer on activities and purposes. As described in Chapters 5 and 6, the analysis of CDR data relies on heuristic rules and algorithms to infer home and work locations. The most frequently observed location for a cell device during nighttime hours is assumed to be the home end, and the most frequently inferred CDR trip end during the daytime is assumed to be the place of work. However, CDR data are much weaker for nonwork travel, given that they cannot distinguish between different types of nonwork activities. CDR-derived trip purposes are defined by the land use activities at each trip end, and the three resulting trip purposes are home-based work (HBW), home-based other (HBO), and non-home-based (NHB). Analysts need to accept that CDR data do not include detailed travel purposes and that HBO and NHB trips cannot be split into a wider range of travel purposes.2

Agency staff need to decide the following when considering a CDR data purchase:
• Is a data set with three trip purposes adequate for the agency's planning needs?
– Traditional models offer a much more detailed set of trip purposes.
– Activity-based models further link trips together into tours to better represent an individual's and a household's daily travel.
• Are the relative magnitudes of CDR travel by purpose reasonable?
– The home–home matrix must be small (almost zero).
– The home–work and work–home matrices must be roughly comparable to the journey-to-work data.
– The percentage of work and nonwork travel should be roughly comparable to that of past regional surveys and models.

9.3.6 Socioeconomic Data
A key strength of CDR data is the large sample of locational data points in comparison with the small sample of traditional diary surveys, which offer more depth and detail.
A key weakness of CDR data is the lack of socioeconomic information that would provide context for the travel patterns observed for the large sample of a region's residents. By definition, this weakness limits an agency's understanding of differences in travel behavior across market segments in the region. Agency staff need to accept this key weakness of CDR data and its effects on sample expansion, market segmentation, household travel patterns, and overall resolution of the underlying data. These effects, which are discussed in other sections, include the following:
• The unit of analysis is the cell phone device instead of the individual.
• Even when two or more devices belong to the same household, their trips are not connected, and household interactions are not taken into account.
• The sample expansion methods for CDR data are simpler and are based on the population of cell phone subscribers at the home location.
• The models and travel patterns inferred from CDR data cannot reflect differences in travel behavior by market segment.

CDR trip purposes are limited and they are inferred.

Lack of socioeconomic data limits the value of CDR travel patterns.

2 Academic research that is under way aims to address this weakness, which is more difficult to overcome in urban areas with many mixed land use parcels.

9.3.7 Technology of CDR Data
As described in Chapters 3 through 5, CDR data currently include transmissions that correspond to telephone calls made or received, incoming and outgoing text messages, and access to the web. Smartphone devices now include active and passive data transmissions, such as podcast or music downloads, e-mail updates, data on maps and directions, and the use of various apps. The 2010 raw CDR research data set used in this analysis and the 2015 aggregate cell phone data set purchased from a vendor include a time stamp and location for every instance of phone use in the service network. This includes location information every time a phone call is made or received, a text message is sent or received, or data are accessed on the device. Agency staff should confirm that the CDR data used by a vendor include at least the level of active and passive transmissions (calls, texts, and Internet data access) that was used in the 2010 CDR case study. The inclusion in a CDR sample of smartphones that use the 4G-LTE spectrum will increase the frequency of device sightings and will provide analysts and machine learning algorithms with more data with which to make inferences about travel.

The quality and representativeness of CDR data also depend on the cell phone service providers from whom the CDR data are obtained, their market share in a region, and the data plans that they offer, such as unlimited data transmission. A CDR data set that uses data from multiple providers, records all types of transmissions, and uses the latest technology will yield a richer set of signals that are transmitted, recorded, and processed.

Agency staff need to be comfortable with the coverage of the cell phone data, the sample of CDR data in the region, the quality of the CDR data used, and the representativeness of the cell phone sample for travel by the region's residents. The relevant questions to discuss with vendors to address these objectives include the following:
• Who are the cell phone service providers in each region?
• What is the market share of each provider?
• Which cell phone service providers are included in the vendor's sample?
• Are there distinct market segments and parts of the region where specific cell phone service providers have a greater presence?
• Are there important differences in the socioeconomic and usage profiles of the markets served by each cell phone service provider?
• Are calls, texts, and Internet data access recorded as part of the captured transmissions?
• Do these transmissions account for both active and passive signals?
• What cell phone technologies are captured in the vendor's CDR data?
• Are the more frequent 4G-LTE technology signals used by new smartphones captured in the vendor data?

From a practitioner's perspective, the ideal CDR data product would be based on a sample from two or more cell phone service providers that together account for the majority of cell phone users and are representative of the region's population. The CDR data should account for active and passive transmissions of calls, text messages, and Internet access and should reflect the prevailing technology used on the cell phones of most subscribers. A product that meets these conditions would benefit from the higher quality and greater quantity of cell phone signals. A sample of cell phone users that is representative of the population would yield a representative and rich sample of locational data that would allow the analyst to draw better travel inferences.

Inference of trip ends depends on cell phone technologies captured.

Capturing more passive signals improves the data for travel analysis.

9.3.8 Time Periods and Temporal Resolution
The 2010 CDR data processed and presented in this case study allowed the inference of a large number of trip ends at a high level of temporal resolution. These CDR data can be grouped in different ways without any major limitation on the number of time periods or the duration of each time period.

The location and timing of the inferred CDR trip ends are processed, aggregated, and expanded by the vendor to preserve the confidentiality of individual subscribers. Although existing products may use a default time period that does not suit an agency's needs, there is no practical limitation on grouping the data using different time period definitions. Agency staff can benefit in the following ways from the scale and detail of the time-of-day information provided by CDR data:
• Flexibility in defining the time periods best suited to an agency's purposes allows time-of-day summaries at the desired level of temporal resolution.
• Although mode-specific information is not available, time-of-day profiles reflect fluctuations in the observed total demand for regional travel.
• The large sample size of CDR data allows for detailed time-of-day comparisons, as follows:
– Peak and shoulder peak period traffic during a specific weekday;
– Late night and early morning traffic patterns not captured accurately by traditional models;
– Peaking patterns by day of the week and during the weekend; and
– Changes in peaking patterns by time of year, as a result of weather or other special events, and in response to recurring or incident congestion.

9.3.9 Commercial and Passenger Travel
In recent years, there has been increased emphasis on commercial travel and freight flows in urban areas, in regional corridors, and at the state level. Sources of commercial traffic data used in freight and truck analyses include commodity flow surveys, networks and data from the Freight Analysis Framework, and GPS trace data from the American Transportation Research Institute. In the case of CDR data, there is no identifying information to classify a cell phone device as being used for personal or commercial purposes.
It is therefore not possible to classify the inferred trip end activities as serving a passenger or commercial purpose. As a result, CDR data provided as part of a vendor data set account for total cell use and reflect total travel in a region. Agency staff need to focus on alternative data sources to assess freight and truck traffic in a region. These sources of data include
• Customized traditional surveys of truck drivers, establishments, rail and truck companies, and freight forwarders;
• GPS data extracted from devices installed on trucks to track their movements; and
• Smartphone surveys that target truck drivers and are used to measure commercial trip ends and O-D truck flows.

The definition of the number and duration of time periods is flexible.

Trip ends are inferred from all sampled devices.

9.3.10 Expansion of the CDR Sample
Traditional sample weighting and expansion techniques use detailed approaches by market segment that can account for household size, vehicle ownership, number of workers in a household, and geography. The rationale is that residents' inherent propensity to travel and their travel patterns differ across these market segments, so it is important to account for these differences by developing distinct weights during sample expansion.

The expansion of CDR data is simpler by definition. Given that CDR data do not include socioeconomic data or contextual information, the analyst cannot differentiate devices across market segments. Therefore, the population of cell phone subscribers within the zone identified as a device's home location is used as a simpler measure of expected CDR use and travel activity. Practitioners recognize that this approach does not explicitly account for differences in cell phone use and travel by market segment in the CDR sample. Population-based weights are also not satisfactory for trips, such as long-distance travel, in which the home and/or the work location is outside the model's coverage. Agency staff should
• Inquire about the details of the weighting method used to expand the CDR data to verify that a simpler, population-based expansion method is acceptable for their intended use of the CDR sample;
• Ask which cell service provider data are included in the sample and what the market share of the service provider is in the region; and
• Discuss whether subscribers of this service provider represent a random sample of the population or whether this provider serves market segments with distinct socioeconomic characteristics and cell phone usage profiles.

9.3.11 Travel Times and Path Traces
Travel times along paths can be determined from CDR data that are typically discarded as part of the processing of O-D data. Travel times and detailed actual path traces along each O-D pair may be available for specific cell phone devices. However, these detailed data are not typically included in CDR data products, given that they are masked during the aggregation and expansion of the origin and destination trip end data.
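The caution that CDR-derived travel times are unreliable for short trips can be made concrete with a little arithmetic. If each inferred trip end may be off by up to about 300 meters (the accuracy cited for the 2010 research data), the straight-line distance between the two ends can be wrong by up to twice that amount. The Python sketch below assumes that worst-case bound and uses hypothetical trips to show how wide the resulting speed range becomes for a short trip compared with a long one.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two inferred locations, treating the Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_range_kmh(distance_km, minutes, location_error_km=0.3):
    """Point estimate and worst-case bounds on average speed.

    Assumption: each inferred trip end may be off by up to location_error_km,
    so the true distance lies within distance_km +/- 2 * location_error_km.
    """
    hours = minutes / 60.0
    low = max(distance_km - 2 * location_error_km, 0.0) / hours
    high = (distance_km + 2 * location_error_km) / hours
    return distance_km / hours, low, high


short_trip = speed_range_kmh(1.0, 5)    # hypothetical: 1 km in 5 minutes
long_trip = speed_range_kmh(20.0, 30)   # hypothetical: 20 km in 30 minutes
print([round(v, 1) for v in short_trip])  # [12.0, 4.8, 19.2]
print([round(v, 1) for v in long_trip])   # [40.0, 38.8, 41.2]
```

The same positional error leaves the 1 km trip anywhere between about 5 and 19 km/h, while the 20 km trip stays within a few kilometers per hour of its estimate.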
Agency staff who are interested in detailed O-D trip times and path travel times can discuss the option of obtaining such data as part of the data specification, bearing in mind that
• Travel time estimates and O-D path traces are not reliable for shorter distances, where location errors affect their accuracy; and
• GPS data provide an alternative source of data that is more likely to provide the desired level of travel time and speed detail by using a sample of vehicles in the network.

CDR expansion relies on the population of cell service subscribers at the home location.

9.4 Modeling Considerations
Regional transportation models vary in size, scope, and complexity across state DOTs and regional MPOs, which have different experiences and track records with the design, collection, and analysis of travel data for planning and modeling purposes. In this context, agencies have evaluated CDR data as a potential source of data that could support different aspects of
• Model estimation,
• Model validation,
• Model updates for intermediate years between releases of Census data or between years with a major regional survey data collection,
• Corridor studies and microsimulation analyses,
• Special generator studies,
• Assessment of visitor markets, and
• Estimates of long-distance travel markets.

Table 9-3 outlines each of these modeling options and discusses how each option can benefit from CDR data. The table also compares the value of CDR data with similar nontraditional data that can be obtained from GPS loggers, smartphone apps, and Bluetooth devices.

Table 9-3. Modeling applications supported by CDR, GPS, smartphone survey, and Bluetooth data.
• Estimation of regional models
– CDR data: No socioeconomic data. No detailed activity, purpose, mode, and tour data. Spatial resolution can vary.
– Personal GPS-derived data and smartphone survey: All data needed to develop detailed regional travel demand models are captured.
– Custom Bluetooth data: na.
• Validation of regional models
– CDR data: Aggregate validation for trip generation and trip distribution.
– Personal GPS-derived data: Aggregate validation for trip generation, trip distribution, and possibly highway assignment.
– Smartphone survey: Detailed validation for all aspects of regional travel demand models.
– Custom Bluetooth data: Validation for small corridors or locations with traffic counts.
• Model updates
– CDR data: Documentation of changes in travel patterns. Measurement of changes in total travel flows. Identification of changes in travel flows by time of day.
– Personal GPS-derived data and smartphone survey: High costs for frequent large-scale data collection. Refresh is feasible with a small sample and key travel markets.
– Custom Bluetooth data: Estimates of changes in corridor-level traffic at different points in time can be used.
• Corridor and traffic impact studies
– CDR data: Data at the corridor level. Spatial resolution may not be sufficient.
– Personal GPS-derived data: Spatial resolution possible. Important to capture trip start and end locations.
– Smartphone survey: High cost of survey data for corridor-level studies.
– Custom Bluetooth data: Estimates of traffic counts by time of day.
• Microsimulation studies
– CDR data: Precise temporal resolution is an additional concern.
– Personal GPS-derived data: GPS data are better suited than CDR data.
– Smartphone survey: High cost of survey data for microsimulation studies.
– Custom Bluetooth data: Data can be used to support traffic analysis.
• Special generator studies
– CDR data: Applicable to special generators and special events. Trip generation and trip-length estimates. Airports, universities, malls, and sports arenas.
– Personal GPS-derived data and smartphone survey: Data on socioeconomics, mode used, and trip purpose. High single-purpose survey cost.
– Custom Bluetooth data: Traffic counts at special generators can be made.
• Visitor models
– CDR data: Long-term study of aggregate movements of visitors to a region and within a region.
– Personal GPS-derived data: Study of aggregate movement of visitors to and within a region.
– Smartphone survey: Difficult to target visitors.
– Custom Bluetooth data: na.
• Long-distance models
– CDR data and personal GPS-derived data: CDR and GPS offer suitable data sources. Ability to monitor visitors and residents.
– Smartphone survey: Long-distance travel underrepresented in typical surveys. Long observation period is needed.
– Custom Bluetooth data: na.
Source: Cambridge Systematics, Inc.
Note: na = not applicable.

The detailed discussion of modeling options can help agency staff prioritize their plans to use CDR and other nontraditional data to support, augment, or replace one or more model components. The modeling components in Table 9-3 are discussed in separate sections, each of which summarizes key observations and provides practitioners with specific recommendations for individual modeling options.

9.4.1 Estimation of Regional Models
Traditional and activity-based regional models have been estimated and validated with detailed travel data from household travel surveys that also include person-level socioeconomic data. Traditional disaggregate models often account for travel patterns by market segment, while activity-based models provide more detail by accounting for tours and reflecting intrahousehold travel interactions.

As discussed in Chapters 4 through 8, outputs of regional models, summaries from household travel surveys, and inferred O-D flows from CDR data are broadly comparable, especially at an aggregate level. However, CDR data require different analysis approaches. Although they benefit from a much larger sample size and accurate temporal resolution, CDR data can be used to infer at most three trip purposes and have a lower spatial resolution for shorter trips. CDR data also do not capture the effect of socioeconomic characteristics on daily travel, given that they are anonymized for privacy reasons. As a result, CDR-based models provide considerably less context and depth than traditional and activity-based models. Agency staff need to recognize that CDR data cannot be used to estimate models at the same level of detail and resolution as traditional or activity-based models.
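The home and work rules summarized in 9.3.5 (home as the most frequently observed nighttime location, work as the most frequently inferred daytime trip end) and the resulting three-purpose split can be sketched in a few lines of Python. The hour windows, zone labels, and records below are hypothetical, and vendors' actual rules are more elaborate; the sketch only shows why no purpose finer than HBW, HBO, or NHB can come out of this logic.

```python
from collections import Counter

# Hypothetical time windows; the report does not publish exact cutoffs.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}
DAY_HOURS = set(range(9, 17))


def most_frequent_zone(records, hours):
    """Most frequent zone among (hour, zone) records whose hour is in `hours`."""
    counts = Counter(zone for hour, zone in records if hour in hours)
    return counts.most_common(1)[0][0] if counts else None


def trip_purpose(origin_zone, dest_zone, home_zone, work_zone):
    """Three-purpose label based on the activity inferred at each trip end."""
    ends = (origin_zone, dest_zone)
    if home_zone in ends and work_zone in ends and home_zone != work_zone:
        return "HBW"  # one end at home, the other at work
    if home_zone in ends:
        return "HBO"  # one end at home, the other elsewhere
    return "NHB"      # neither end at home


# Hypothetical records for one device: (hour of day, zone).
night_sightings = [(1, "A"), (2, "A"), (23, "A"), (3, "B")]
day_trip_ends = [(10, "C"), (11, "C"), (14, "C"), (13, "D")]
home = most_frequent_zone(night_sightings, NIGHT_HOURS)
work = most_frequent_zone(day_trip_ends, DAY_HOURS)
print(home, work, [trip_purpose(o, d, home, work) for o, d in [("A", "C"), ("A", "D"), ("C", "D")]])
# A C ['HBW', 'HBO', 'NHB']
```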
The weaknesses of CDR data for model estimation can be summarized as follows:
• Only work and nonwork travel purposes are inferred, in comparison with the more detailed purposes obtained in traditional and activity-based models.
• Locations of activities are inferred by using heuristic rules and are subject to error, especially for shorter trips and for activities of short duration.
• Socioeconomic data at the individual level are not available and cannot be used to estimate models that differentiate between market segments.
• Estimates of regional travel obtained from CDR data are provided at the trip level instead of the tour or activity level.
• Passenger and freight-related travel estimates cannot be distinguished, given that the unit of analysis is the cell phone device.

Exploratory research is being conducted to generate synthetic populations using marketing data sets and to assign cell phone O-D data to these populations. In addition, academic research now under way focuses on inferring more travel purposes in more detail by using a new set of heuristic rules.

9.4.2 Validation of Regional Models
Practitioners are always interested in independent sources of data that can be used to validate the outputs of travel demand models. Models are typically estimated and calibrated with detailed travel survey data. These estimates are validated by comparison with independent sources of travel flows, such as the CTPP journey-to-work data, American Community Survey data and other Census estimates, and link-level measures such as highway traffic counts, transit ridership estimates, and O-D travel times.

CDR data cannot be used to estimate models at the traditional level of detail and resolution.
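One common way to compare CDR-derived flows with model outputs is the percent root-mean-square error (%RMSE) across O-D cells, a statistic often used in travel model validation. This report does not prescribe it, so treat the following Python sketch as an assumption about how such a comparison might be run. It compares hypothetical district-level tables and treats cells missing from either table as zero trips.

```python
import math


def percent_rmse(model_flows, cdr_flows):
    """Percent RMSE of model O-D flows measured against CDR-derived flows.

    Cells missing from either {(origin, destination): trips} table count as zero.
    """
    cells = set(model_flows) | set(cdr_flows)
    errors = [model_flows.get(c, 0.0) - cdr_flows.get(c, 0.0) for c in cells]
    rmse = math.sqrt(sum(e * e for e in errors) / len(cells))
    mean_observed = sum(cdr_flows.get(c, 0.0) for c in cells) / len(cells)
    return 100.0 * rmse / mean_observed


# Hypothetical district-level flows.
model = {("D1", "D2"): 1100.0, ("D2", "D1"): 900.0, ("D1", "D1"): 2000.0}
cdr = {("D1", "D2"): 1000.0, ("D2", "D1"): 1000.0, ("D1", "D1"): 2000.0}
print(round(percent_rmse(model, cdr), 2))  # 6.12
```

Running the comparison at the district level rather than the TAZ level follows the aggregation guidance earlier in this chapter.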

CDR-derived travel and trip table estimates provide an alternative source of data for validating the individual model components of trip generation and distribution as well as estimates of travel by time of day. However, these CDR-derived estimates are not available by market segment, such as income, auto availability, or household size. CDR validation data are also available only at the trip level and do not provide the ability to connect trips into tours, nor do they provide information on the modes used. Agency staff can evaluate the degree to which processed CDR-derived estimates and O-D trip tables can provide a valuable source of validation data that can be used independently to
• Compare total trips produced and attracted, including both passenger and freight, without the ability to further differentiate by mode;
• Provide estimates of O-D flows for home-based work, home-based other, and non-home-based travel purposes combined;
• Generate trip-length distributions for the three CDR-inferred purposes that could be used to calibrate trip distribution models;
• Capture a greater level of temporal detail and generate O-D trip tables to validate models by time of day; and
• Quantify external–internal travel and through travel to support model validation in addition to the typical internal travel in a region.

9.4.3 Model Updates
Regional planning agencies do not update travel demand models often, primarily because of data availability and cost considerations. Practical constraints include the need for updated socioeconomic and Census data, revised highway and transit networks, and up-to-date traffic counts and transit ridership data. Agencies that update or revalidate their models for an intermediate year rely on their existing model structure and use updated data to fine-tune their model for new base-year conditions.
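The final step of the intermediate-year framework in this section, applying the growth measured between two CDR data sets to the base-year model results, might be implemented cell by cell as in the Python sketch below. The district-level tables are hypothetical, and the minimum-trip threshold that falls back to no growth for thinly observed cells is an assumption added to avoid unstable ratios; it is not part of the report's framework.

```python
def intermediate_year_table(base_model, cdr_base, cdr_mid, min_cdr_trips=1.0):
    """Scale base-year model O-D flows by the growth observed between two CDR data sets.

    min_cdr_trips is a hypothetical threshold: cells with fewer base-year CDR
    trips keep their base-year model value instead of an unstable ratio.
    """
    updated = {}
    for od, model_trips in base_model.items():
        base = cdr_base.get(od, 0.0)
        if base >= min_cdr_trips:
            growth = cdr_mid.get(od, 0.0) / base
        else:
            growth = 1.0  # too few CDR trips to measure change
        updated[od] = model_trips * growth
    return updated


# Hypothetical district-level tables.
base_model = {("D1", "D2"): 1000.0, ("D2", "D1"): 800.0}
cdr_base_year = {("D1", "D2"): 5000.0, ("D2", "D1"): 4000.0}
cdr_intermediate = {("D1", "D2"): 5500.0, ("D2", "D1"): 3600.0}
updated = intermediate_year_table(base_model, cdr_base_year, cdr_intermediate)
print({od: round(trips, 1) for od, trips in updated.items()})
```

This only works if both CDR data sets come from the same method and underlying source, as the framework requires; otherwise the ratio mixes real change with changes in data production.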
CDR data offer an additional source for validation of model components.

CDR data can provide information on changes in travel patterns in intermediate years.

The motivation to update regional models is more easily justified in cases of significant socioeconomic changes, the introduction of a new mode, major changes to highway or transit services, and new technologies that bring about major changes in travel behavior.

CDR data can provide agency staff with updated O-D trip matrices to support more frequent model updates that do not require a major data collection effort. The framework for using CDR data to update a model for an intermediate year would require the following steps:
• CDR data can be purchased for the original base model year and compared with model outputs to ensure that the two data sets provide broadly comparable travel behavior metrics. Comparisons may include
– Trip rates by geography,
– Temporal distribution of trips,
– Trip-length distributions, and
– Share of total trips by purpose.
• CDR data can be purchased for an intermediate year, assuming that the same method and underlying source of data are used. This intermediate-year data set can be compared with the original base-year CDR data to quantify the magnitude and reasonableness of the observed change in travel flows.
• The percentage of growth or decrease measured by the two CDR data sets can be applied to the results of the original base-year model to generate travel estimates and metrics for the intermediate year.

• In addition to the CDR data, intermediate-year traffic counts, transit ridership data, and socioeconomic data can be used to paint a complete picture of the updated base-year travel patterns.

9.4.4 Corridor, Traffic Impact, and Microsimulation Studies
CDR data can support corridor-level studies in cases of large, long, and well-defined corridors or parts of an urban area, which allow meaningful estimates of travel within, to, and from the corridor or area boundaries. The format of CDR data limits their suitability for detailed traffic assessments or microsimulation studies, which require a greater level of detail.

CDR data can be used to generate synthetic O-D tables that can be further adjusted by using matrix adjustment methods to match base-year traffic counts. This approach may be a preferred alternative to developing subarea models. Analysts can then assign these synthetic O-D tables to the network to support corridor studies. Agency staff need to be aware of the following issues that affect corridor-level studies:
• CDR data do not distinguish between passenger and commercial vehicle travel;
• The lack of socioeconomic data does not allow analysts to assign economic benefits to different market segments; and
• The spatial inaccuracies of CDR data may affect the validity of the analysis in cases in which the corridor is narrowly defined or in which there are multiple competing facilities in proximity.

With regard to supporting traffic microsimulation studies and models, the requirements for detailed geographic and time period data are even greater, and path trace information becomes even more important. In such cases, detailed local traffic counts, GPS data, and specifically focused travel surveys provide the desired level of resolution, which is higher than that provided by CDR data.
9.4.5 Special Generator and Special Event Studies
Special generators produce substantial passenger and commercial activity that is not always captured well by regional models. Typical special generators include airports, large malls, parks, ballparks or stadiums that host recurring sports events, ports with commercial activity, and universities. Special surveys are typically conducted to capture activity at these locations. Models based on these surveys are relatively straightforward and capture travel by using broad aggregate measures.

Such an aggregate modeling framework is suitable for CDR data. Data purchases can be structured to include only those trips for which the special generator is either an origin or a destination. Metrics such as trip rates, trip-length distributions, and temporal distributions can be obtained from CDR data. In addition, CDR data can be used to study other aspects of special generator travel, including
• Seasonal variation in travel;
• Before-and-after studies in cases in which the modes and the level of service to reach the special generator have changed;
• Visitor travel to the special generator, for the purpose of understanding the effect of such locations in attracting out-of-town visitors; and
• Contribution of a special generator to local travel, especially during peak hours.

CDR data can provide input data for broadly defined corridors or parts of an urban area.

Special generator and special event studies can benefit from CDR data.
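Carving special generator trips out of a regionwide trip table and summarizing them can be sketched as follows. The Python example assumes hypothetical trip records that already carry expanded trip weights, a start hour, and a trip length; the set of generator zones stands for whatever zones fall within the chosen buffer around the generator. It returns the total, a trip-length distribution in 5 km bins, an hourly distribution, and, when an independent count is supplied, a factor for scaling the CDR total to that count.

```python
from collections import defaultdict


def generator_profile(trips, generator_zones, bin_km=5.0, observed_count=None):
    """Summarize expanded CDR trips with at least one end in the generator zones."""
    selected = [t for t in trips if t["origin"] in generator_zones or t["dest"] in generator_zones]
    total = sum(t["trips"] for t in selected)
    by_length = defaultdict(float)  # lower bound of each length bin (km) -> trips
    by_hour = defaultdict(float)    # start hour -> trips
    for t in selected:
        by_length[int(t["km"] // bin_km) * bin_km] += t["trips"]
        by_hour[t["hour"]] += t["trips"]
    # Optional factor to scale CDR totals to an independent count.
    scale = observed_count / total if observed_count and total else None
    return {
        "total": total,
        "length_distribution": dict(sorted(by_length.items())),
        "hourly_distribution": dict(sorted(by_hour.items())),
        "count_scale_factor": scale,
    }


# Hypothetical expanded trip records; "AIRPORT" stands for the buffered generator zones.
trips = [
    {"origin": "AIRPORT", "dest": "D1", "hour": 7, "km": 12.0, "trips": 300.0},
    {"origin": "D2", "dest": "AIRPORT", "hour": 7, "km": 3.5, "trips": 150.0},
    {"origin": "D1", "dest": "D2", "hour": 8, "km": 6.0, "trips": 900.0},
    {"origin": "AIRPORT", "dest": "D3", "hour": 17, "km": 22.0, "trips": 50.0},
]
profile = generator_profile(trips, {"AIRPORT"}, observed_count=550.0)
print(profile)
```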

Agency staff need to specify the CDR data purchase, especially with respect to zone definition, and provide input to sample expansion:
• The zone system needs to be outlined carefully. When specifying the special generator, agency staff need to account for the spatial inaccuracy of the CDR data and choose an appropriate buffer around the generator.
• Agency staff must discuss the weighting method with the vendor. The preferred approach would be to develop weights for the entire database and then carve out the portion of the trip table needed for the special generator study. Agencies can also use independent counts to scale the CDR data further.

Special event facilities generate activity over a few days each year and include concert halls, conference centers, and stadium facilities. CDR data can be suitable for special events too, but agency staff must consider three points:
• Special events often attract nontypical crowds, and O-D patterns at the same location may vary across events. A sufficiently large sample within a broad time horizon may be needed to better capture average special event activity.
• The temporal dimension of travel is determined by the time of year when the special event is held. Again, obtaining CDR data for a broad time horizon will help mitigate any skewed travel patterns.
• Finally, some special events may attract visitors from long distances. Establishing detailed external locations is vital in capturing special event travel patterns.

9.4.6 Visitor Models
The ability to distinguish between residents and visitors is a strength of CDR data. Understanding and quantifying travel by visitors whose home address is outside the study region is valuable for urban areas from the perspective of modeling, planning, and economic policy.
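The visitor framework described in this section comes down to a filter on each device's inferred home location. The Python sketch below is a minimal illustration with hypothetical device IDs and zone names: it keeps trips with at least one end inside the study area made by devices whose inferred home is outside it. The counts are unexpanded; in practice the vendor would apply its own expansion before delivering trip tables.

```python
def visitor_trip_table(device_home, device_trips, study_zones):
    """Count trips touching the study area made by devices whose inferred home is outside it."""
    table = {}
    for device, trips in device_trips.items():
        home = device_home.get(device)
        if home is None or home in study_zones:
            continue  # unknown home, or a resident of the study area
        for origin, dest in trips:
            if origin in study_zones or dest in study_zones:
                table[(origin, dest)] = table.get((origin, dest), 0) + 1
    return table


# Hypothetical study area, inferred homes, and trips.
study_zones = {"Z1", "Z2", "Z3"}
device_home = {"dev1": "Z1", "dev2": "EXT_NORTH", "dev3": "EXT_SOUTH"}
device_trips = {
    "dev1": [("Z1", "Z2")],  # resident: excluded
    "dev2": [("Z2", "Z3"), ("Z3", "Z2")],
    "dev3": [("EXT_SOUTH", "Z1"), ("Z1", "Z2"), ("EXT_SOUTH", "EXT_WEST")],
}
print(visitor_trip_table(device_home, device_trips, study_zones))
```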
Agency staff can use CDR data to augment their current methods of collecting visitor travel data, especially if they are not interested in classifying or otherwise segmenting the visitor travel market.

Capturing visitor data has traditionally been challenging within the context of travel demand models. Agencies typically intercept visitors and conduct specialized surveys at hotels, airports, rail stations, and other ports of entry. These surveys face numerous challenges:
• Visitor surveys are typically costly and are often limited in scale, given that regions have multiple ports of entry and visitors have many places to stay, a pattern exacerbated by new services like Airbnb.
• Visitor profiles and their travel patterns tend to vary by time of year. The scheduling of conferences, festivals, and sporting events affects travel and may require conducting a survey that spans several months, resulting in an expensive and time-consuming effort.

Agency staff can use CDR data to study visitor travel patterns in a region by using the following framework:
• The analyst defines the area of interest for studying visitor movements.
• The data vendor identifies cell phone devices that travel within this area but whose home location is outside it.
• The data vendor provides trip tables for these visitors over a specified time frame that could span several months or a whole year.

Visitor travel patterns can be assessed with CDR data.

9.4.7 Long-Distance Models
Long-distance travel is infrequent and is not adequately represented in most traditional surveys, which capture resident travel within a region over a 1- to 5-day observation period. As with visitor models, CDR data can provide valuable information for modeling long-distance travel by a region's residents. Agencies typically model long-distance travel by using broad classification schemes to identify differences in total trip making, destinations, and modes used within their population. Given the lack of socioeconomic information in CDR data, only a more aggregate long-distance model without additional market segmentation is feasible.

The considerations for agency staff are similar to those for visitor models:
• The area of interest for studying long-distance movements needs to be defined.
• Criteria such as “resident travels 50 miles beyond the study region” are defined to eliminate travel to the outer reaches of the study region that may happen on a more regular basis.
• The time frame during which long-distance travel needs to be examined is defined and could span several months or a whole year.
• The data vendor identifies residents' cell phone devices that travel outside the specified geographic region during the specified time frame.

9.5 Future Research Directions
The collection and analysis of CDR locational data represent dynamic areas of data and research that can benefit transportation planning and modeling. Technological advancements, the changing patterns of cell phone use by segments of the population, and strong academic research efforts to harvest the value of locational data will continue to change the properties and potential value of CDR data. The interest in locational data is reflected in analytical methods and machine learning approaches aimed at using CDR locational data for transportation planning and various other purposes.
Recent developments include the emergence of new data collection methods that use smartphones, ongoing research to better infer travel purposes from land use data, and the fusion, or quilting together, of different data sources.

9.5.1 Dynamic Nature of Cell Phone Data

This report focused on the properties and analysis of cell phone CDR records from 2010 that were available to the research team and of 2015 data provided by a vendor. The raw nature of the 2010 CDR data allowed the research team to analyze these data in detail and compare them with traditional transportation surveys and travel demand models for the Boston area. It is therefore important to put the 2010 CDR data in the context of today’s cell phone technology and the changing use of cell phone devices.

The 2010 CDR data represent a period of less intense use of cell phones for calls, text messaging, and Internet data access than today. The increase in cell phone use over the past 7 years has increased the density of CDR signals and the amount of trace information. The availability and analysis of more and better-quality locational data during a typical day can produce more robust results about daily activities and travel.
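The effect of record density on identifying activity locations can be illustrated with a generic stay-point heuristic of the kind commonly applied to location traces. The sketch below is hypothetical and is not the processing method used for the CDR data in this report: it assumes each record is a (time in minutes, x, y) tuple in a projected coordinate system measured in kilometers, and the 0.5-km radius and 30-minute minimum dwell are illustrative thresholds. Consecutive records that stay within the radius of an anchor record for at least the minimum dwell are collapsed into one stay, that is, a candidate trip end.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (time in minutes, x in km, y in km), projected coordinates.
Record = Tuple[float, float, float]
# A detected stay: (start time, end time, centroid x, centroid y).
Stay = Tuple[float, float, float, float]


def detect_stays(records: List[Record], max_dist_km: float = 0.5,
                 min_dwell_min: float = 30.0) -> List[Stay]:
    """Collapse runs of nearby records into stays (candidate trip ends)."""
    records = sorted(records)  # order by time
    stays: List[Stay] = []
    i, n = 0, len(records)
    while i < n:
        # Grow the run while records remain within max_dist_km of the anchor record i.
        j = i + 1
        while j < n and math.dist(records[i][1:], records[j][1:]) <= max_dist_km:
            j += 1
        start, end = records[i][0], records[j - 1][0]
        if end - start >= min_dwell_min:
            run = records[i:j]
            cx = sum(r[1] for r in run) / len(run)
            cy = sum(r[2] for r in run) / len(run)
            stays.append((start, end, cx, cy))
            i = j  # continue after the stay
        else:
            i += 1  # dwell too short: slide the anchor forward
    return stays


# Illustrative synthetic trace: home, a short shopping stop, then work (all invented).
dense = ([(t, 0.0, 0.0) for t in range(0, 61, 10)]
         + [(75, 2.5, 0.0)]
         + [(t, 5.0, 0.0) for t in range(90, 131, 10)]
         + [(145, 7.5, 0.0)]
         + [(t, 10.0, 0.0) for t in range(160, 301, 20)])
# A sparser trace of the same day that keeps only five of the records.
sparse = [r for r in dense if r[0] in {0, 60, 110, 200, 300}]

print(len(detect_stays(dense)))   # 3 stays: home, shop, work
print(len(detect_stays(sparse)))  # 2 stays: the shopping stop is missed
```

With the dense trace, the short shopping stop is recovered as a stay; in the sparser trace the same stop is represented by a single record and is missed. Real CDR positions are usually approximated by tower locations with considerable spatial uncertainty, so any such thresholds would need to be tuned to the data at hand.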

The cost of cell phone service has also been decreasing over time, especially for text and data, with packages now often including unlimited data. Cell phones are becoming more of a necessity for many households, including those in lower-income market segments. Use of cell phones to access data has increased to the point that, for many users, their cell phone provides the principal means of accessing the Internet. In November 2016, the Pew Research Center reported that 77% of American adults owned a smartphone of some kind, up from 35% in the spring of 2011 (Pew Research Center 2017). The market penetration of smartphones is likely to increase further, which will increase the size of CDR databases and data records relative to the case study discussed in this report. Furthermore, as both active and passive data transmissions from smartphones increase, the available cell phone data records will increase accordingly. As the density of CDR signals and locational data increases for every device, the ability to determine trip end locations and the times of activities could also improve.

In addition, all smartphone devices include a GPS location service. When this feature is enabled, the location of the device can be determined more precisely, improving the quality of locational information and providing CDR vendors with an additional source for mining cell phone location data.

• A 2015 Pew Research Center report on cell phone use titled The Smartphone Difference notes that smartphones already help users navigate their environment with real-time directions provided by the built-in GPS system.
• As many as two-thirds of smartphone owners use their phone at least occasionally for turn-by-turn navigation while driving, with 31% saying that they do so “frequently.”
• One in four respondents uses his or her phone at least occasionally to get public transit information, with 10% doing so “frequently.”
• While Tables 9-2 and 9-3 distinguish between data obtained from cell phones and data obtained from personal GPS devices, this distinction will become increasingly blurred with GPS-enabled smartphones.
• As more GPS-enabled devices are used while traveling, CDR traces will continue to grow and will provide more data points that should help improve an analyst’s ability to determine trip ends.

The technology used to collect cell phone data records has been changing along with the technology serving cell phones. In 2010, wireless industry data transmission standards were 1G and 2G transmissions. The industry standard today is 4G/LTE service, and many providers are advertising faster, enhanced service. Although it is unlikely that raw CDRs using these technologies will become available, practitioners should ask whether a vendor’s processed CDR data product takes advantage of the latest technology. This becomes even more important if the new technology commands a large share of the cell phone marketplace.

In addition to the technological methods used, the sources of CDR records available from vendors may be changing. The wireless and cell phone provider market is extremely competitive and is changing rapidly. A CDR vendor should disclose which wireless providers supply its data so that practitioners can confirm that the processed CDRs are representative of the cell phone market in the area for which they are being obtained.

The research presented in this report indicates the potential to develop credible and useful travel information on O-D patterns from voluminous cell phone data.
Much of that information has to be inferred and is limited in nature; the limitations include a restriction to three travel purposes (linked to home, work, and “other” locations), the lack of socioeconomic information, and the lack of information on the travel modes used. However, technological changes suggest that a greater amount of CDR data with higher resolution will be available in the future. With continuing research in inferring activities and trips by purpose, the increased volume and quality of CDR data make it easier to address data gaps.

9.5.2 Status of Relevant Research in Locational Data

The growing penetration of cell phones among different age cohorts, the increased use of cell phones for Internet data access, and the improvements in the technology embedded in smartphones provide opportunities to leverage locational data for use by transportation planners, modelers, and decision makers. The following sections briefly comment on the current status of some of the most relevant research activities in locational data and highlight their potential to improve the industry’s best practices in data, planning, and modeling methods.

9.5.2.1 Quilting of Diverse Data Sources

The cell phone is a truly disruptive technology that has significantly changed the way people communicate, get information, and travel today. Cell phones are increasingly being used to hail cabs, build transit itineraries, plan travel by auto, and use bike-sharing services. The question for planners and modelers is how to leverage these data to study accessibility, improve mobility, and provide a better quality of life.

The strengths and weaknesses of new and traditional data sources suggest the need for continuing research to integrate data from diverse data sources and transportation user apps with socioeconomic and land use data to provide a complete picture of travel. Such quilting of data sources must be conducted with a data transfer protocol that is mindful of user privacy.

9.5.2.2 Smartphone Apps for Data Collection

A recent trend in survey data collection is the use of cell phones as the means for collecting data at the individual and household levels. The owner of a cell phone device is recruited and agrees that his or her travel, along with that of other household members, will be monitored over a given period.
The methods used to validate the day’s travel differ across the companies that have invested in this technology. Prompted recall methods are used to validate a specific activity or to verify a day’s travel patterns. Respondents are asked to edit the responses inferred or left blank by the software to provide a full day’s worth of travel information. Some of the methods and products rely on sophisticated machine learning algorithms that learn from respondent entries to improve the data collected (Ghorpade et al. 2015).

9.5.2.3 Machine Learning Concepts

Recent research has used CDR data and machine learning algorithms to annotate user activities and fill gaps, revealing temporal activity profiles and transitions between activities. This research focused on activity patterns; the socioeconomic, land use, and mode-specific information needed to provide a complete picture of travel behavior is still missing. Therefore, future research should evaluate and expand these machine learning techniques to determine how CDR data can provide a complete picture of travel behavior (Yin et al. 2016).

9.5.2.4 Consumer Data as Inputs to Models

An example of ongoing research that aims to bridge the gap between traditional survey data and passive cell phone data is a model developed for the Asheville region of North Carolina (Kressner et al. 2016). This research presents an approach to overcoming the limitations of passive location data. In addition to traditional National Household Travel Survey data and travel time data, this method uses consumer data that include much of the household and individual socioeconomic information used in travel demand modeling. It builds a tour-based model with passive data by using a person-based discrete event simulation framework. A comparison of assignment results and average link error with those of the trip-based model for Asheville showed this innovative approach to be promising.

9.5.2.5 Streamlined Collection of Survey Data

Localities, regions, and states currently use third-party tools to conduct travel surveys. While these tools provide information that is relevant to planners and modelers, they sometimes require installing an additional app, which may require more than one touch by the user and thus potentially affect response rates.

An alternative is the use of the built-in Google Maps location data that are available to smartphone users. While Google Maps is installed by default on Android phones, iPhone users must install Google Maps as a separate app. Given the market penetration of Google Maps, it is reasonable to expect that it would be part of an iPhone user’s app environment. Given this ubiquity and its editing features, this tool can supplement travel surveys. However, research is needed on issues of sample design, response rates, cell phone operating system biases, and the effect on people who do not use smartphones. The promise of this technology is that it reduces data collection costs, because it uses an existing app, and, more importantly, that it provides information on long-distance trips. The challenge is addressing biases resulting from lower smartphone penetration among the elderly and the poor.

9.5.2.6 Machine Learning Versus Econometric Modeling

One session at the 2017 annual meeting of the Transportation Research Board compared the benefits of traditional econometric methods with those of machine learning approaches.
This session, titled “Machine Learning Is from Venus, Econometric Modeling Is from Mars: Two Different Travel Forecasting Perspectives,” highlighted many of the themes discussed in the literature and the practitioner considerations mentioned in this report. The discussion was moderated by David Ory, and the panel members included Josephine Kressner, Joel Freedman, Alexei Pozdnoukhov, and Eric Miller.

Alexei Pozdnoukhov argued in favor of machine learning techniques. He reviewed a paper by Eric Miller presenting a tour-based mode choice model and critiqued its econometric approach. In his summary, Pozdnoukhov pointed out that the model is applied to and validated on the same data set, that it does not apply outside the data sample, and that it is probably more complicated than needed.

Eric Miller argued in favor of econometric approaches. He reviewed a paper by Alexei Pozdnoukhov that describes a way to supplement traditional household survey data with cell phone data. The approach uses input–output hidden Markov models to infer travelers’ activity patterns from their CDR records. In his summary, Miller pointed out that the model is also based on random utility, that it does not use socioeconomic attributes, and that it does not take into account household interactions. On the basis of the validation of activity durations by purpose and across space and of the assignment of trips to the network, the overall model performance seems very good.

A key point during the panel discussion was that although socioeconomic characteristics may not be essential to match counts or predict future traffic, they are critical in understanding traveler behavior and addressing policy questions on subject areas such as toll roads, transit travel, and managed lanes.
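The panel discussion mentions input–output hidden Markov models for inferring activity patterns from CDR records. As a much simpler stand-in, the sketch below decodes a plain (not input–output) hidden Markov model with the Viterbi algorithm. It is not the model discussed by the panel: the hidden states are hypothetical activity types (home, work, other), each observation is the hypothetical land-use category of the zone served by a record’s tower, and all probabilities are invented for illustration.

```python
import math
from typing import Dict, List, Tuple


def viterbi(obs: List[str], states: List[str], start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most likely hidden-state path and its log-probability.

    All probabilities must be positive here (math.log(0) raises an error).
    """
    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    # back[t][s]: predecessor state on that best path.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + math.log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace the best path backward from the most likely final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical model: activities are hidden; each record reveals only the land use
# of the zone served by its tower. All probabilities are invented for illustration.
STATES = ["home", "work", "other"]
START = {"home": 0.8, "work": 0.1, "other": 0.1}
TRANS = {
    "home": {"home": 0.7, "work": 0.2, "other": 0.1},
    "work": {"home": 0.1, "work": 0.7, "other": 0.2},
    "other": {"home": 0.3, "work": 0.3, "other": 0.4},
}
EMIT = {
    "home": {"residential": 0.8, "office": 0.1, "retail": 0.1},
    "work": {"residential": 0.1, "office": 0.8, "retail": 0.1},
    "other": {"residential": 0.2, "office": 0.2, "retail": 0.6},
}

obs = ["residential", "residential", "office", "office", "retail", "residential"]
path, log_prob = viterbi(obs, STATES, START, TRANS, EMIT)
print(path)  # ['home', 'home', 'work', 'work', 'other', 'home']
```

The decoded sequence follows the land-use signal but is smoothed by the transition probabilities, and it depends entirely on the invented parameters; in practice they would be estimated from data, for example with the Baum–Welch algorithm. An input–output HMM would additionally condition the transition and emission probabilities on observed inputs, such as time of day, which this sketch omits. Note also that, as Miller observed of the model discussed by the panel, nothing here uses socioeconomic attributes.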
9.5.2.7 The Promises of Big Data and Small Data for Travel Behavior

In a 2016 review paper, Cynthia Chen and her colleagues discussed the potential for collaboration and sharing of cross-discipline ideas between transportation researchers who focus on models of travel behavior and the computer scientists and physicists who use big data to study human mobility patterns (Chen et al. 2016). Chen et al. pointed out the “tension” between the traditional behavioral approach, which aims to formulate and represent causality in travel behavior, and the analysis of passive CDR data, which aims to identify mobility patterns. The paper cautions against the risk of ecological fallacy in CDR data, a consideration that was also present in aggregate travel demand models. The risk of ecological fallacy is that although a model mechanism can predict average and total regional travel conditions, it does not necessarily perform well at the individual level without an approach rooted in the behavioral paradigm and a conceptual framework that explains travel behavior at the level of the individual traveler.

The conclusions and cautionary notes in Chen et al. reflect the line of thinking discussed in this report. CDR locational data have tremendous potential to provide travel-related data and estimates at the regional, state, and national levels. However, that potential can be fully realized only by adopting a travel behavior perspective, building on the collective experience of the field, and opening up the black box to reveal to the community the assumptions underlying the data.

9.6 Epilogue

This guidebook for transportation practitioners concludes with a few thoughts about the properties of cell phone data and the types of behavioral modeling approaches that will help shape best practices in the future. Practitioners evaluating their policy needs, data options, and modeling tools should be guided by the following principles:

• Be aware of the underlying assumptions made in processing cell phone data to determine locations and to infer activities and purposes.
• Recognize that results from traditional surveys and models are also built on different sets of assumptions and that ground truth is difficult to establish.
• Expect that increases in the quantity of CDR data, improvements in signal and CDR data quality, and the use of machine learning algorithms will improve methods for analyzing locational data and inferring travel patterns.
• Appreciate the uncertainty underlying both CDR estimates and traditional data and measures of travel patterns.
• Use as a guide in your evaluation a conceptual framework based on the behavioral paradigm, which examines individuals’ travel behavior.

The field’s collective experience, academic research over the years, and the collaborative approach linking research to practice have helped refine the data design process, spawned new and more sophisticated methods, and increased the understanding of travel behavior and its drivers. As new data and methods are introduced, interpreting their value and uses through a behavioral framework will help improve the state of the art in this profession and community.

A transparent approach to the strengths and weaknesses of CDR and traditional data sources, along with a thoughtful approach that integrates new and old data and methods, will help refine the state of best practice. The approaches that emerge may be different from those used today, but they will continue to have a behavioral foundation and will leverage the strengths of different data sources. This will improve the field’s ability to make inferences about travel and will result in analysis methods that best harness the value of new and traditional data sources to interpret, quantify, and forecast travel behavior and mobility patterns.

TRB's National Cooperative Highway Research Program (NCHRP) Research Report 868: Cell Phone Location Data for Travel Behavior Analysis presents guidelines for transportation planners and travel modelers on how to evaluate the extent to which cell phone location data and associated products accurately depict travel. The report identifies whether and how these extensive data resources can be used to improve understanding of travel characteristics and the ability to model travel patterns and behavior more effectively. It also supports the evaluation of the strengths and weaknesses of anonymized call detail record locations from cell phone data. The report includes guidelines for transportation practitioners and agency staff with a vested interest in developing and applying new methods of capturing travel data from cell phones to enhance travel models.
