CHAPTER 3
Research Approach: TRANSIT-data-tools as GTFS Creation and Assembly Platform
To manage the collection of GTFS feeds from carriers, validate feeds for errors that may affect their integration into the ICBA, create feeds, and train carriers to create their own GTFS, the research team used the TRANSIT-data-tools platform, developed by research team member Arcadis. This chapter provides an overview of the tool and its features as well as a discussion of the technical assistance provided to intercity carriers.
Tool Overview
TRANSIT-data-tools is an open source GTFS development tool, developed by Arcadis, that allows public transit operators to create GTFS intuitively from the schedule, location, and other data that they already possess.
The following subsections discuss the various features of the tool and their application to the ICBA effort.
Tool Features
GTFS Development and Editing
Key among all features is the platform’s GTFS editor tool. In combination with the other features discussed in the following subsections, this tool allows for easy creation and revision of GTFS feeds in a user-friendly platform designed for use by those who may not have extensive technical expertise. Because a GTFS feed is composed of a collection of upward of 20 individual .txt files, editing and creating these files directly can be unintuitive, unwieldy, and difficult to understand. Instead, the GTFS editor breaks down the information contained in these files into a smaller number of categories. These categories include the following:
- Feed Info
- Agencies
- Routes
- Stops
- Calendar
- Fares
Feed Info contains much of the basic metadata associated with a GTFS feed and is important to fill out to ensure consistency between feeds and that users can understand the validity of the feed and reach the developers in case errors are uncovered. This includes a unique ID number, the name and website of the organization/individual responsible for developing the feed, feed language, validity start and end dates, contact information for the feed developer, version numbers (if applicable), and information about defaults for later attributes.
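For readers who want to see how this metadata looks in the underlying file, the following is a minimal sketch that renders a one-row feed_info.txt. The field names follow the public GTFS reference; every value is a hypothetical placeholder, and the unique ID mentioned above is omitted because it does not appear to be one of the core feed_info.txt fields.

```python
import csv
import io

# Field names follow the public GTFS reference for feed_info.txt.
# All values are hypothetical placeholders, not data from the ICBA.
FEED_INFO = {
    "feed_publisher_name": "Example Intercity Lines",
    "feed_publisher_url": "https://www.example.com",
    "feed_lang": "en",
    "feed_start_date": "20240101",
    "feed_end_date": "20241231",
    "feed_version": "2024-01",
    "feed_contact_email": "gtfs@example.com",
}


def feed_info_csv(info: dict) -> str:
    """Render a one-row feed_info.txt as CSV text (header plus one record)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(info), lineterminator="\n")
    writer.writeheader()
    writer.writerow(info)
    return buffer.getvalue()


def validity_window_ok(info: dict) -> bool:
    """Return True when the start date is not after the end date.

    GTFS dates are YYYYMMDD strings, so string comparison is chronological.
    """
    return info["feed_start_date"] <= info["feed_end_date"]
```

Calling feed_info_csv(FEED_INFO) returns the text that would be saved as feed_info.txt, and validity_window_ok is the kind of basic consistency check a reviewer might run before accepting a feed.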
The agency section broadly corresponds to the agency.txt file within a GTFS feed and is intended to collect information about the transit operating agency or agencies represented in a feed. While most feeds will represent only a single agency, in some cases multiple transit operators will be included; in the context of the ICBA, Amtrak Thruway and its associated carriers (e.g., Maryland’s BayRunner Shuttle or Washington State’s Dungeness line) are one example. For each agency, the developer must fill in information similar to what was added to the Feed Info tab; this includes a unique ID number, a name and contact information for the agency, the primary language of the agency, a primary website, and optional websites linking to fare and branding information.
Although files like these can be edited directly as text files, creating stops, routes, and schedules this way can be much more difficult, and the interface of the TRANSIT-data-tools platform represents an improvement when conducting these tasks. Using the integrated map, users can place stops at specific locations without needing to know the specific coordinates of these locations (though best practices do still suggest that users be as precise as possible and exact coordinates may be helpful) and link placed stops with automatically generated routes. After placing routes and stops, users can add attributes such as names, descriptions, unique IDs, links to stop- or route-specific information, accessibility, and stop spacing in minutes (for routes specifically), as shown in Figure 2.
After setting up stops and routes and indicating what days they run, with exceptions able to be made to these calendars for holidays and other special events, developers can input schedules for each service, with an example included in Figure 3. Schedules can be entered in two ways. First, if stop times differ substantially between trips, each trip’s stop times can be input individually. In most cases, however, where times between stops remain fairly consistent, each trip can be populated with generalized stop times that are then offset for each specific trip; for example, if the bus leaves stop A at 9:30 a.m. and 10:30 a.m. and arrives at stop B at 10:00 a.m. and 11:00 a.m., stop B could be listed as 30 minutes after stop A, with trip offsets of 9 hours and 30 minutes and of 10 hours and 30 minutes, respectively.
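The offset approach can be illustrated with a short sketch that expands one generalized stopping pattern into the stop times for each trip. The stop names and times mirror the example above; the function and variable names are hypothetical and are not taken from the platform.

```python
from datetime import timedelta

# Generalized pattern: minutes after departure from the first stop.
PATTERN = [("A", 0), ("B", 30)]

# One offset per trip: the departure time from the first stop.
TRIP_OFFSETS = [
    timedelta(hours=9, minutes=30),
    timedelta(hours=10, minutes=30),
]


def format_gtfs_time(delta: timedelta) -> str:
    """Format a duration since midnight as HH:MM:SS.

    GTFS permits hours of 24 or more for trips that run past midnight,
    so hours are not wrapped.
    """
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def expand_trips(pattern, trip_offsets):
    """Return one list of (stop, time) pairs per trip offset."""
    return [
        [
            (stop, format_gtfs_time(offset + timedelta(minutes=minutes_after)))
            for stop, minutes_after in pattern
        ]
        for offset in trip_offsets
    ]
```

With the values above, expand_trips(PATTERN, TRIP_OFFSETS) yields stop A at 09:30:00 and stop B at 10:00:00 for the first trip, and 10:30:00 and 11:00:00 for the second, matching the example in the text.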
Additional information can be added regarding fares, but this is optional and is not required for a valid feed or to proceed in saving a feed through the platform. Because of the difference in operational models between local transit and intercity services, especially where prices may fluctuate due to demand, or where commercial sensitivities exist, this may not be applicable to many carriers and can generally be ignored.
Data Management
While the TRANSIT-data-tools platform can be used to create single feeds through the data editor, it is also a powerful tool for organizing multiple feeds and multiple versions of feeds, as discussed previously. This has been the primary use of the platform for the research team working to construct the ICBA.
After determining that the team would be collecting data from carriers in the form of GTFS feeds, a process was needed to organize the many feeds that would be collected. The Data Manager functions native to this platform allow the team to do a few key tasks:
- Upload, organize, and access feeds from multiple carriers over the internet
- Label feeds for easier searching and management
- View feed metadata
- View information from the feed
- Validate feeds for errors
- Apply transformations to entire feeds
Each of these will be discussed in greater detail.
Using TRANSIT-data-tools’ data management functions, individual pages can be created for different carriers and data can be added. When multiple carriers are present, search and filtering functions are available and allow users to find a specific carrier’s page or look for only active feeds, expired feeds, expiring feeds, or those with certain tags applied. These tags can be customized by users to represent anything needed for a specific project; for the purposes of the ICBA project, the research team created tags labeling carriers by priority level, whether a feed had been collected, whether a carrier needed to be contacted, and which feeds were obtained from aggregators, among other characteristics (see Figure 4).
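The filtering described above can be sketched as a small function over feed records. This is an illustration only, not the platform's implementation: the FeedRecord structure, the status names, and the 30-day "expiring" threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class FeedRecord:
    """Hypothetical record for one carrier's feed."""
    carrier: str
    valid_from: date
    valid_to: date
    tags: set = field(default_factory=set)


def feed_status(feed: FeedRecord, today: date, expiring_days: int = 30) -> str:
    """Classify a feed by its validity window.

    The 30-day threshold for "expiring" is an assumed value.
    """
    if today > feed.valid_to:
        return "expired"
    if today < feed.valid_from:
        return "future"
    if feed.valid_to - today <= timedelta(days=expiring_days):
        return "expiring"
    return "active"


def filter_feeds(feeds, today, status=None, tag=None):
    """Return carrier names matching an optional status and optional tag."""
    matches = []
    for feed in feeds:
        if status is not None and feed_status(feed, today) != status:
            continue
        if tag is not None and tag not in feed.tags:
            continue
        matches.append(feed.carrier)
    return matches
```

Combining a status filter with a tag filter, for example expired feeds tagged as needing contact, gives a simple worklist of carriers to follow up with.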
After accessing a carrier’s page, one can access and download the GTFS feed associated with that carrier, as well as view a collection of associated information (see Figure 5). Each page includes upload dates, validity dates, and the name of the user publishing a given version. A map can be used to show routes (if a shapes.txt file is included) and an overview of the region served by the carrier in question. Data visualizations show the number of trips or service hours by date, information about the stops served and trips made per hour by individual routes, similar information for the individual stopping patterns under each route, and timetables. Since the data manager can maintain a record of multiple versions of a carrier’s feed (discussed in greater detail in the following section), this information can be viewed for each individual version, allowing for comparisons across time; the timetables tab specifically indicates decreases and increases in headways and travel times from one version to the next.
On each page, users can also access the GTFS validator (see Figure 6). This tool automatically checks each feed for missing fields and for formatting inconsistent with the GTFS format, as well as issues within the data that may affect its usability; some examples of these issues include stops that are far from the rest of the service area (which may indicate incorrect coordinates), travel times that seem improbably fast, and stops that are not referenced by any timepoints.
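Two of the data checks named above can be sketched in a few lines. This is not the validator's actual logic, which the source does not describe; the data layout, the function names, and the 130 km/h speed threshold are assumptions chosen for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def to_seconds(hhmmss: str) -> int:
    """Convert a GTFS HH:MM:SS time to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def unreferenced_stops(stops: dict, stop_times: list) -> list:
    """Return stop IDs that never appear in any stop_times row."""
    used = {row["stop_id"] for row in stop_times}
    return sorted(set(stops) - used)


def too_fast_segments(stops: dict, trip_stop_times: list, max_kmh: float = 130.0) -> list:
    """Flag consecutive stop pairs whose implied speed exceeds max_kmh.

    `stops` maps stop_id to (lat, lon); `trip_stop_times` holds the rows of
    a single trip, already ordered by stop sequence. The threshold is an
    assumed value, not one taken from the platform.
    """
    flagged = []
    for prev, cur in zip(trip_stop_times, trip_stop_times[1:]):
        elapsed_h = (to_seconds(cur["arrival_time"]) - to_seconds(prev["departure_time"])) / 3600
        distance = haversine_km(*stops[prev["stop_id"]], *stops[cur["stop_id"]])
        if elapsed_h <= 0 or distance / elapsed_h > max_kmh:
            flagged.append((prev["stop_id"], cur["stop_id"]))
    return flagged
```

Straight-line distance understates road distance, so a check like this only catches clearly implausible timings or misplaced coordinates; it is a screening aid rather than a replacement for reviewing the schedule.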
Version Control
One of the most powerful functions of the TRANSIT-data-tools platform, in both creating and collecting GTFS, is its version control capability. With this functionality, the platform keeps track of multiple versions of the GTFS feed or feeds stored, with metadata recorded for each.
GTFS feeds can be added to TRANSIT-data-tools in three ways: (1) creating directly in the platform, (2) fetching from a hosted location online, and (3) uploading directly. The specifics of version control functions differ accordingly.
For those feeds that are created directly in the platform, there are two layers of version control: snapshots and versions. Snapshots are like progress saves in a video game; they allow developers to save their unfinished progress while creating a GTFS feed without requiring them to finish all sections or publish the feed. In addition, when saving snapshots, developers can add notes, which provide greater context regarding the work performed between snapshots. Once a feed is complete, the developer can then publish it as a new version instead of saving it as a snapshot.
When retrieving feeds from an online location, users input a URL and select from several options for how often to fetch new data (e.g., daily, weekly, and biweekly). TRANSIT-data-tools will then query the site indicated by the user on the schedule selected; if changes in the file hosted at that location are detected, the file will be saved as a new version, but no new version will be created if no changes are detected.
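The "save only when changed" behavior can be sketched as follows. The source does not say how the platform detects changes; comparing a SHA-256 digest of the fetched file against the latest stored version is an assumption used here for illustration, and the VersionStore class is hypothetical. The download step itself is left out so that the sketch stays self-contained.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a fetched feed archive."""
    return hashlib.sha256(data).hexdigest()


class VersionStore:
    """Hypothetical in-memory store of feed versions for one carrier."""

    def __init__(self):
        # Each entry is a (digest, raw bytes) pair, oldest first.
        self.versions = []

    def record_fetch(self, data: bytes) -> bool:
        """Store `data` as a new version only if it differs from the latest.

        Returns True when a new version was created, False otherwise.
        """
        new_digest = content_digest(data)
        if self.versions and self.versions[-1][0] == new_digest:
            return False
        self.versions.append((new_digest, data))
        return True
```

A scheduled job would call record_fetch with the downloaded bytes at each interval; repeated fetches of an unchanged file leave the version history untouched.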
When uploading directly, each uploaded file is saved as its own version.
The data editor can be used to make changes with any of the input methods; these changes can be saved as snapshots and new versions as required. The feed information and the GTFS validator are available for each version of a given feed.
TRANSIT-data-tools as an Open Source Resource
One important consideration in the research team’s adoption of the TRANSIT-data-tools platform was the fact that it is available as an open source resource. Per the definition of the Open Source Initiative, open source software must be freely distributed, have source code available, be able to be freely modified and used as material from which to derive further works, and not discriminate against persons or groups, among other more technical criteria (Open Source Initiative 2024). As such, the TRANSIT-data-tools platform is available for any interested party to use, to modify, and to extend with additional features, with source code and documentation publicly available on GitHub.
Technical Assistance Plan Overview
As part of the larger process of gaining cooperation from the intercity bus industry and in addition to setting up the TRANSIT-data-tools platform for the organization of GTFS feeds collected through engagement with carriers, research team members at Arcadis were also tasked with providing technical assistance and other guidance to prospective participants in the ICBA. This included creating user documentation for the TRANSIT-data-tools platform to better guide those developing GTFS, conducting GTFS development training sessions with carriers to teach them how to use the platform to create their own GTFS, and providing support to users of the platform.
User Documentation
To make TRANSIT-data-tools more accessible to those who may not have any experience with GTFS development, research team members at Arcadis have worked to develop a suite of documentation describing the various components of its use. The main body of this documentation concerns the population of various fields within a GTFS feed, as well as how users can manage their feeds with the platform.
Users are first introduced to the Data Manager component of the platform, which allows users to coordinate and keep a record of multiple GTFS feeds, feeds with multiple sources (i.e., created in data-tools, fetched from a location online, and uploaded directly), and the various versions of a given feed created as updates are made, either through the editor or as updated versions are fetched or re-uploaded. Given the complexity of GTFS feeds, being able to clearly and efficiently manage collections of multiple feeds (if necessary) and of multiple versions of feeds is important to ensuring data quality, reducing time spent manually organizing files or making changes to incorrect versions, and being able to compare services across time periods. Since multiple users can work on the same project, the documentation also includes a description of how to manage user accounts and the permissions associated with each user.
After introducing users to data management, the documentation provides an overview of the GTFS data editor’s interface, including how to save and publish versions, as well as the GTFS components under which individual GTFS tables are nested: feed information, agencies, routes, stops, calendars, and fares. Users are then introduced to these components in greater detail, with the guide documenting how to view the components, place routes and stops, and fill in attributes and timetables. Many of these pages include instructional videos demonstrating these procedures for greater clarity.
Once a GTFS feed is added, either through development directly in the data-tools platform or through import from a file or online source, TRANSIT-data-tools’ built-in validator will check the feed for common issues that may affect its integrity and usability in trip planning applications.
Even though descriptions of the issues are included within the validator and instances of a given issue are documented with line numbers for easier location and troubleshooting of problems, the guide includes an appendix describing some of the most common issues in greater detail.
Since this platform is available as an open source resource for GTFS development that can be set up separately for each carrier or group of carriers, the user documentation also includes a section of information for developers and other interested parties. Although this may not be immediately relevant for ICBA users, providing documentation on how to set up the platform for further open source development, localization, and interaction with user-created tools may prove useful in expanding the functionality and reach of TRANSIT-data-tools.
GTFS Trainings and Support
While the user documentation created by Arcadis has been written so that carriers can develop their own GTFS feeds completely independently after being added as users to the data-tools platform, the research team aimed to work directly with carriers to train them to use the platform. Each session lasted around an hour, with representatives from Arcadis and RSG guiding participants through how to use the platform to input their stops, routes, schedules, and other information. One of these sessions has been recorded for future distribution or for use in developing future training materials (see also Chapter 2 of this report).
In addition to the support offered to carriers, a User Guide for Intercity Bus Carriers, Ticketing Services, Schedulers, and Business Development Planners was developed that explains the ICBA, how to participate, and how to develop GTFS. In its discussion of the atlas project, the guide introduces readers to the ICBA background, as well as some of the benefits of developing GTFS and participating in the project. The research team hopes that, by participating in the project, carriers will gain an additional level of visibility to the public and to policymakers and planners beyond what is currently available. By developing GTFS, regardless of project participation, carriers can more easily add their routes to trip planning applications and increase the ease of integrating their services into planning analyses. Next, carriers are introduced in succession to the various GTFS components in detail, with explanations of their attributes, requirements, and best practices deemed applicable to intercity carriers, as well as deeper explorations of two GTFS development platforms: the RTAP GTFS Builder and the TRANSIT-data-tools platform.