APPENDIX A

Data Types and Sharing Attributes

Transit Data Types and Attributes for Data Sharing

Public transit data comes in many forms, with a variety of potential external uses if shared.

Fare and Bank Card Data

When passengers use smart cards or bank cards for boarding, as is becoming increasingly prevalent, each transaction produces a record of a boarding or station entry and also introduces the possibility of tracking the use of a given card over time. Although this characteristic of the data raises privacy risks, it also enables transit agencies and researchers to understand more about how individual passengers use the network. Some researchers and agencies, including MBTA, Washington Metropolitan Area Transit Authority (WMATA), and New York City Transit Authority, have applied methods to infer destinations of passengers within their networks based on AFC data combined with AVL data (Barry et al. 2002; Gordon et al. 2013). This origin–destination level data is valuable to transit agencies and researchers because it typically provides a large, year-round sample of passenger demand patterns.

Smart card data can also be used to understand how the travel behavior of users changes over time, including in response to fare or service changes or weather, and at micro levels, such as a bus stop or zone (Morency et al. 2007). Smart card data also facilitates the grouping of transit users by their distinct trip sequence structure. Using TfL data for a 4-week period, researchers clustered transit users based on their activity patterns. They found that 40% of frequent transit users did not follow a conventional trip activity sequence involving one trip to work in the morning and another trip home in the evening (Goulet-Langlois et al. 2016).

Information on the passenger segments that use different stations or routes may be provided to advertisers to enable them to customize ads to different passenger types (commuters, visitors, etc.).
As such, this data is not only valuable for research purposes, but it can also be used to generate revenue through advertising.

Wi-Fi and Bluetooth Data

Although smart card and bank card data can provide stop- or station-level origin–destination information, Wi-Fi and Bluetooth connection records enable even more detailed tracking of passenger movements within stations or within a gated transit system. For example, TfL engaged in a pilot study collecting Wi-Fi signals from passengers' phones, which they used to understand passengers' route choices within the subway network (Transport for London 2017). This aspect of Wi-Fi and Bluetooth data makes it especially valuable for research and also makes it an important source for informing advertising. Transit agencies can estimate not only the number of passengers who pass through a station, but also the number of passengers who pass a specific location within a station. This has the potential to enable the transit agency to generate more advertising revenue by providing these detailed statistics to advertisers (Cheshire 2017). Like fare card transaction data, the fact that this data tracks individual devices means that there are some privacy risks associated with sharing the data in disaggregate form.

Data Sharing Guidance for Public Transit Agencies – Now and in the Future

Video Data

Many transit vehicles and stations are equipped with video cameras. From a research and planning point of view, this data can provide insight on crowding and passengers left behind at stations. This data can also be valuable for police investigations, and there are examples of transit agencies sharing this data with law enforcement agencies. Video systems vary; some, but not all systems, enable "pan-tilt" or "zoom-in" features, which allow for facial recognition. Particularly in systems that permit facial recognition, there are significant privacy risks associated with storing and sharing this data (Thomas 2018).

Transportation App and Webpage Usage Data

Some transit agencies have developed or commissioned the development of a customer-facing transportation app for trip planning, ticketing, or both. When transit agencies develop or own their apps, they can harvest data from these apps. In trip planning apps, app users' destination requests are saved. In bus or train arrival apps, the specific bus or route information requested is saved. In addition, many apps save data on users' locations while they use the app, which can be used to infer origins and destinations for trips made (Lu et al. 2015).

In a research context, this data can be used to draw insights about mode choice and alternatives based on the behavior of app users.
For example, users may search for transit directions in an app, but ultimately choose to use a different mode. This choice can be inferred from the user's app usage and smartphone location data.

These apps and the data they generate also have the potential to be used in location-based advertising and other geotargeted information. According to a recent study, most transit agency ticketing apps are location-aware, but this feature is only used to locate nearby stops/routes. Although this may be an untapped revenue source for transit agencies, initial research suggests there may be pushback to this type of advertisement (Brakewood et al. 2017).

Any discussion of transportation app data merits noting that the majority of transit agencies rely on third-party developers to provide customer-facing transportation apps to their customers. Some transit agencies have access to the data from these apps, while others do not. Different models for app development and data access are described in Section 4.3.

Even transit agencies that do not have a proprietary app have a public website that often includes a route planning tool. Although usage of web planning tools may be more limited and does not provide location information, transit agencies may nonetheless be able to generate value from their web traffic analytics. Transit agencies can draw insight from the type of information their web visitors access, and this information may also be useful for local planners and developers.

Survey Data

Transit agencies regularly conduct surveys of their users. Although some of the information collected in surveys, such as origin–destination patterns, can now be inferred from other sources, surveys continue to be a valuable source of information on things like trip purpose, trip alternatives considered, and demographic characteristics of transit customers. Surveys are also used to assess customer satisfaction and collect information about transit passengers' preferences and priorities. In short, surveys often provide information that can be neither gathered from other transit agency sources nor inferred from external data sources, such as cellphone and GPS data. As a result, this data may be valuable to researchers and others, particularly if it can be combined with other data sets.

Raw survey data can pose privacy risks, because responses can contain identifying information, such as home address and demographic characteristics of the respondent. Instead of sharing survey data openly, transit agencies typically share aggregated reports on the surveys they conduct. Of the 11 transit agencies interviewed, one publishes responses to its customer satisfaction survey, aggregated by month. Six other transit agencies publish reports that summarize survey findings.

Passenger Count Data

Some systems for monitoring passenger movements provide aggregate information on passenger counts without tracking individuals. APC systems use sensors to estimate the number of passengers that board a vehicle, and "load weight" data (data on the weight of a train or vehicle and its occupants at different points along a route) allows analysts to estimate passenger loads. Some fare collection systems also produce anonymous count data, for example, estimates of the number of people who pass through a turnstile or interact with a farebox. Fareboxes often record all boardings regardless of fare payment type, and even unpaid boardings recorded by the driver. As a result, this data can support studies of fare evasion. More broadly, it is used to analyze crowding and productivity of routes and lines.
Except in the case of very small samples (e.g., if just one person boards a bus at a stop during the period reported on), this data does not enable the identification of individuals and therefore does not elicit privacy concerns. Four transit agency interviewees reported their agencies share APC data with researchers (in three cases) or municipalities (in one case).

Incident Data

Transit agencies collect data on incidents, including details on the cause of incidents and the operational response. They also collect data on passenger injuries and claims. Sharing this data can support research that helps transit agencies improve incident response protocols or prevent incidents. In rare cases, incident data may pose privacy risks if individuals involved in the incidents are described in an identifiable way. One transit agency interviewee noted that their agency does not release incident data publicly because of the staff effort that would be required to read descriptive data fields to confirm that they could not be used to identify individual passengers. In general, the transit agency interviewees indicated that incident data is not released publicly (though some publish real-time alerts about incidents). However, one transit agency shares this data with a research partner who has analyzed incident responses and passenger disruption impacts.

Route and Schedule Data

Route and schedule data are commonly shared publicly by transit agencies. GTFS is a standardized format for this data, though some agencies use proprietary formats from scheduling software companies. Nearly all transit agencies responding to a 2015 survey provided this type of data free of charge (Schweiger 2015). This data is used in trip planning and real-time transit information applications (Antrim and Barbeau 2013; Schweiger 2015). Transit agencies also have more detailed transit system data, such as station diagrams.
This data can be useful for research on how passengers move through the network, but some transit agencies opt not to share it widely because of security concerns.
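
To illustrate how route and schedule data in GTFS form can be consumed by an application, the following is a minimal Python sketch, not a definitive implementation. It assumes a small inline excerpt shaped like a GTFS stop_times.txt file; the trip and stop identifiers are invented for illustration, and the helper names gtfs_time_to_seconds and departures_by_stop are hypothetical. The sketch lists scheduled departure times at each stop.

```python
import csv
import io

# Illustrative excerpt shaped like a GTFS stop_times.txt file.
# The trip_id and stop_id values are made up for this example.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:07:00,08:07:30,S2,2
T2,08:15:00,08:15:00,S1,1
T2,08:22:00,08:22:30,S2,2
T3,24:05:00,24:05:00,S1,1
"""


def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS string to seconds since the service day began.

    GTFS allows hour values of 24 or more for trips that run past midnight,
    so the value is not parsed as a clock time.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def departures_by_stop(stop_times_text):
    """Return {stop_id: sorted list of scheduled departures in seconds}."""
    departures = {}
    for row in csv.DictReader(io.StringIO(stop_times_text)):
        departures.setdefault(row["stop_id"], []).append(
            gtfs_time_to_seconds(row["departure_time"])
        )
    return {stop: sorted(times) for stop, times in departures.items()}


schedule = departures_by_stop(STOP_TIMES)
```

A trip planner or arrival app would typically join these records with the feed's other files (for example, trips and stops); that step is omitted here.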
Automated Vehicle Location Data

AVL data tracks the location of vehicles over time. AVL data is often a critical input to analysis that infers passenger destinations (Gordon et al. 2013). In addition, AVL data can be used to track and display transit system performance, evaluating headway variability and schedule adherence. Transit agencies use data from AVL systems to provide information to customers about the next train or bus arrival. Many transit agencies share AVL data streams publicly, and app developers use this data to fuel transit arrival apps (Schweiger 2015). In many cases, this is accompanied by real-time alert information. GTFS-RT is a standardized feed specification for this type of data, although not all transit agencies use this format for the published data (Barbeau 2018A).

Transit System and Vehicle Maintenance Data

This category of transit data may include records of failures and maintenance activities as well as maintenance facilities and maintenance costs. Transit agencies that report to the NTD report vehicle reliability statistics (defined as the average distance between major mechanical failures). Beyond this national reporting, sharing of transit vehicle maintenance data is not a prevalent topic in recent literature, and the interviewees for this study did not reveal any external sharing of public transit maintenance data.

Researchers may use transit agency maintenance data for life cycle cost assessments (Chester and Horvath 2010). Some researchers have also considered the possibility of using sensor data to predict maintenance needs (Corazza et al. 2018). External research has the potential to support internal use of maintenance data for decision support. This data informs both maintenance strategies and capital investment decisions.
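
The vehicle reliability statistic mentioned above, the average distance between major mechanical failures, is simple arithmetic, and a short Python sketch may make it concrete. The figures below are invented for illustration, distances are assumed to be in miles, and the function name mean_distance_between_failures is hypothetical rather than an NTD term.

```python
def mean_distance_between_failures(vehicle_miles, major_failures):
    """Average distance traveled per major mechanical failure.

    Returns None when no failures were recorded, because the ratio is
    undefined in that case.
    """
    if major_failures == 0:
        return None
    return vehicle_miles / major_failures


# Illustrative annual totals by mode; these numbers are made up.
fleet_totals = {
    "bus": {"vehicle_miles": 1_200_000, "major_failures": 150},
    "light_rail": {"vehicle_miles": 300_000, "major_failures": 0},
}

reliability = {
    mode: mean_distance_between_failures(
        totals["vehicle_miles"], totals["major_failures"]
    )
    for mode, totals in fleet_totals.items()
}
```

With these made-up totals, the bus fleet averages 8,000 miles between major failures, while the light rail value is left undefined because no failures were recorded.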
Staffing and Operations Data

Staffing and operations data includes crew and vehicle assignments, absenteeism data, and operational procedures. This data can support research on operational efficiency and scheduling, which may ultimately allow the transit agency to operate more efficiently. However, there was little discussion of these data types in the transit agency interviews on data sharing. In the area of crew scheduling, there was considerable research in the past, but there are now off-the-shelf solutions that transit agencies use. One transit agency interviewee indicated their agency provided operations data to a research partner who helped them pilot a new bus operations method.

Financial Data

Financial data includes transit agency spending and subsidies. The NTD collects information on transit agency spending. Sharing this data helps transit agencies maintain transparency and accountability. One transit agency interviewee indicated that their agency posts budgeted and actual expense and revenue data on a monthly basis.

Geospatial Data of Transit Facilities

Based on the interviews with several representatives from MaaS companies, such as ridesharing and micromobility companies, the need for open data on detailed (and accessible) transit station entry locations and parking facility locations is rising. For example, digital information on specific transit station entrance locations is rarely provided by transit agencies to the public. Such information could help MaaS companies provide better and smoother integration with public transit services for first- and last-mile riders.
External Data Types and Transit Agency Uses

There are a wide variety of data sources that could have relevance to transit agencies, including financial data and social media data, such as Twitter. Although these data sets may be beneficial to transit agencies (spending patterns can reveal customers' movements or trip purposes; Twitter can be mined for tweets about public transit disruptions and other events), this section describes three classes of data that most directly measure travel patterns: trace data from cellphones and other GPS-enabled devices, data from transportation apps, and other data from private mobility providers.

Cellphone, Location-Based Services, and GPS Trace Data

Cellphone connection data is collected by cellular service companies, while smartphone apps that use users' locations collect LBS data. According to Crunchbase (2019), there were more than 3,300 organizations in the LBS sector in 2019. This includes fitness, navigation, social media, and dating apps, which collect data on people's whereabouts. These data sources are aggregated by analytics companies who derive and sell speed and origin–destination insights (Cambridge Systematics, Inc., 2018). Transit agencies can use this data to understand characteristics of alternate modes, demand patterns on alternate modes, and transit access and egress behavior. Some companies and researchers use phone location and phone system data to infer a user's mode of travel. This mode-of-travel information can add value to this data for transit agencies.
Transportation Planning App Data

Transportation planning apps include navigation apps, such as Google Maps and Waze, and apps, such as Transit App and NextBus, that provide information on transit vehicle arrivals. These apps collect information including the following:

• Records for each session, including beginning and ending coordinates and time stamps
• Placemarks (stored home and work locations)
• Carshare, bikeshare, and TNC bookings (if available through the app)
• Trip planning routes, stops searched, and favorite routes

Data from these apps provides an additional layer of insight about other location data from smartphone apps. Because an analyst can identify when and where a user looks at transit information for a particular location or route and then how they behave after (whether they take transit, book an alternate mode, or do not travel at all), this app data allows transit agencies to better understand their customers' decision-making processes.

Private Mobility and MaaS Data

Transit agencies are very interested in the travel alternatives that transit passengers have, as these are major determinants of transit demand. The transit agency interviewees were interested in TNC, scooter, carshare, and bikeshare data. One transit agency interviewee indicated their agency had already used bikeshare data to understand public transit's competitiveness with other modes. However, the small user base for the bikeshare system made it difficult to draw conclusions.

Some private mobility providers share some data publicly. For example, several bikeshare systems have released data on trip history (https://www.capitalbikeshare.com/system-data and https://www.citibikenyc.com/system-data) and Uber provides Uber Movement data, which shows zone-to-zone travel times based on Uber driver data. However, most TNC companies are hesitant to share demand data publicly.
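
As a rough illustration of how publicly released bikeshare trip history could be summarized into origin–destination counts, the following Python sketch aggregates trips by station pair. It is a sketch under stated assumptions: the column names start_station and end_station and the station labels are hypothetical, and the actual files published by bikeshare systems may use different field names and layouts.

```python
import csv
import io
from collections import Counter

# Illustrative trip history; column names and station labels are
# hypothetical and may not match any published bikeshare file.
TRIPS = """start_station,end_station
A,B
A,B
B,A
A,C
"""


def od_counts(trips_text):
    """Count trips for each (origin station, destination station) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(trips_text)):
        counts[(row["start_station"], row["end_station"])] += 1
    return counts


od = od_counts(TRIPS)
busiest_pair, busiest_count = od.most_common(1)[0]
```

An analyst could compare such station-pair counts with transit ridership between the same areas, keeping in mind the caveat noted above that a small user base can make conclusions difficult to draw.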