Applying GPS Data to Understand Travel Behavior, Volume I: Background, Methods, and Tests (2014)

APPENDIX F

Using the Bundled Scripts and Code
Introduction

The scripts compiled as part of this appendix implement basic aspects of the GPS processing methods tested as part of Experiments A and B. The main goal of these implementations was to test the feasibility of various methods and to assess how well they worked. As such, these implementations do not necessarily produce readily usable results, although they certainly can be modified and extended to meet these types of applications. It should also be noted that the code and procedures provided with this report were developed as part of this project, but NCHRP makes no warranty that the code and procedures will continue to work as written, given that the software tools they depend on are periodically updated.

Prerequisites for Running the Bundled Code

The methods implemented as part of Experiment A were mostly developed using R version 3.0.1; the latest version of R for various platforms can be downloaded from www.r-project.org. The methods implemented in R are the most straightforward to use given their procedural nature (i.e., to get results, simply call the appropriate function with the correct data parameters). The fact that R supports multiple computing platforms also makes it easy to use these methods in Microsoft Windows, Mac OS X, and various Linux distributions. To facilitate the use of R, end users are encouraged to download and install RStudio, which provides an integrated environment for running R scripts, editing code, browsing data, and viewing graphical outputs. RStudio install packages for most popular computing platforms can be downloaded from www.rstudio.com.

The WEKA toolkit was also used to implement some of the procedures described; for procedures implemented using WEKA, a simple list of steps to follow is given that should allow practitioners to reproduce the results. Wherever applicable, model configuration files are provided that practitioners can use as a starting point. The latest version of WEKA can be obtained from www.cs.waikato.ac.nz/ml/weka. WEKA requires a Java Runtime, which can be installed on Microsoft Windows, Mac OS X, and various Linux distributions.

The discrete choice modeling package BIOGEME was used to estimate models for travel mode and trip purpose identification. The original Bison BIOGEME model specification files are provided along with instructions on how to use them with BIOGEME. Source code and pre-compiled binaries of Bison BIOGEME can be downloaded from biogeme.epfl.ch. The website makes pre-compiled Microsoft Windows binaries available, but BIOGEME can also be built from source on Mac OS X and most Linux distributions.

Finally, code in Java, SQL, and C++ is also referenced as part of some of the implemented methods. The Java code is invoked directly from R using the rJava package; pre-compiled .jar files are provided, so only the Java Runtime Environment (JRE) is needed. The latest version of the JRE can be downloaded from https://java.com/en/download/index.jsp. The SQL scripts provided were used with an instance of PostgreSQL version 9.1, which can be downloaded from www.postgresql.org. The C++ code implements the tool named NCHRP_GPS_Data_Reduction, which is used to prepare the input data for Experiment B; it can be compiled on the Microsoft Windows platform using free versions of Microsoft Visual Studio, which can be downloaded from http://www.microsoft.com/visualstudio/eng/downloads#d-2013-express.
Using the provided C++ source code on other platforms is possible but may require modifications, as well as the creation of make files, which are not included in this package.

Experiment A Instructions

This section covers the loading and use of the procedures implemented as part of the five Experiment A methods tested:

1. GPS point noise filtering
2. Trip end identification
3. Mode transition identification
4. Travel mode identification
5. Trip purpose identification

Information on how to use the Experiment A method implementations is organized based on the software tools used. The majority of the methods in Experiment A were implemented using R, with a smaller set done using WEKA and BIOGEME.

Methods Implemented Using R

Before the routines can be loaded, it is necessary to configure the R environment by ensuring that the following packages are installed: geosphere, rJava, ggplot2, and ggmap. This can be done by issuing the following command:

> install.packages(c("geosphere", "rJava", "ggplot2", "ggmap"))

Once these packages are installed, the method routines can be loaded into R for use. The simplest way to do this is to load the RStudio project file NchrpScripts.Rproj; this sets R's home to the home folder of the script files. Once the project file is open, the following command can be issued to initialize the R environment and preload the implemented routines:

> source('Initialize.r')

Loading GPS Point Data

The package includes the function loadData() for loading GPS point data in GeoLogger format. The GeoLogger format is a comma-separated values text file that contains the following data fields:

• Field 1: A = valid data, GPS ok; D = valid data, DGPS ok; V = first valid point after loss of signal or power
• Field 2: Latitude (dd.ddddd)
• Field 3: N = North of the Equator; S = South of the Equator
• Field 4: Longitude (ddd.dddd)
• Field 5: E = East of Greenwich; W = West of Greenwich
• Field 6: Speed in mph (s.s)
• Field 7: Time UTC (hhmmss)
• Field 8: Date UTC (ddmmyy)
• Field 9: Heading, clockwise degrees from north (000-359)
• Field 10: Altitude in feet (a.a)
• Field 11: HDOP (00.5-99.9)
• Field 12: Number of satellites (00-12)

The loadData function returns a data set that can be used as input to the point-based methods. To invoke the provided function and assign the loaded GPS data to a variable, issue the following command:

> rawPoints <- loadData('~/NchrpScripts/Data/sample_points.csv')
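For reference, the following is a minimal sketch of what a GeoLogger reader such as loadData() might do, assuming the 12-field layout listed above; the bundled implementation may differ in details such as column names and type conversions, so treat this as illustrative only:

# Hypothetical GeoLogger reader; field order follows the table
# above, and the column names are illustrative.
readGeoLogger <- function(path) {
  cols <- c("status", "lat", "ns", "long", "ew", "speedmph",
            "timeutc", "dateutc", "heading", "altitudefeet",
            "hdop", "satellites")
  gps <- read.csv(path, header = FALSE, col.names = cols,
                  stringsAsFactors = FALSE)
  # Turn the hemisphere flags into signed coordinates
  gps$lat  <- ifelse(gps$ns == "S", -gps$lat, gps$lat)
  gps$long <- ifelse(gps$ew == "W", -gps$long, gps$long)
  gps
}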

The loaded points can then be viewed using RStudio's built-in data grid browser by issuing the following command:

> View(rawPoints)

The built-in statistical functions of R can also be used to summarize and graph the data:

> summary(rawPoints)
> hist(rawPoints$speedmph)

GPS Point Noise Filtering

To run the noise filtering methods, pass in the raw GPS points loaded using the loadData() function. Each method adds a Boolean variable (noise) to the passed-in data set that is set to TRUE if the point is considered to be noise. Three noise filtering method implementations are contained in the bundled source code. The sample code below shows how to run them and quickly summarize their results:

> lwPoints <- noiseFiltering_Lawson(rawPoints)
> summary(lwPoints$noise)
   Mode   FALSE    TRUE    NA's
logical    4337     663       0
>
> safPoints <- noiseFiltering_Schuessler_Axhausen(rawPoints)
> summary(safPoints$noise)
   Mode   FALSE    TRUE    NA's
logical    4744     256       0
>
> stfPoints <- noiseFiltering_Stopher(rawPoints)
> summary(stfPoints$noise)
   Mode   FALSE    TRUE    NA's
logical    4996       4       0

The filtered-out points can also be visualized using the ggmap library. For example, the following commands will create a map using a bounding box computed from the points' coordinates and will apply different colors (or gray levels) based on their speeds:

> fgps <- subset(lwPoints, noise)
> bbox <- c(min(fgps$long), min(fgps$lat), max(fgps$long), max(fgps$lat))
> map <- qmap(bbox, zoom = 15)
> map + labs(x = "Longitude", y = "Latitude")
> map + geom_point(aes(long, lat, colour = speedmph, size = 2, alpha = 0.25),
    data = fgps) + scale_colour_gradient(low = "red", high = "green")
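Because each filter adds the same noise flag, their behavior is easy to compare. For example, using the data sets created above, the share of points flagged by each method and the point-by-point agreement between two of them can be tabulated as follows:

# Share of points each filter flags as noise
sapply(list(lawson = lwPoints$noise,
            schuessler_axhausen = safPoints$noise,
            stopher = stfPoints$noise), mean)

# Point-by-point agreement between two of the filters
table(lawson = lwPoints$noise, stopher = stfPoints$noise)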

Trip Identification

The trip identification methods implemented take as input a data set of GPS points and return a list of GPS trips with some basic attributes added. To derive trips using the two implemented methods, with the GPS points filtered by the Lawson method as input, the following commands can be used:

> saTrips <- tripIdentification_Schuessler_Axhausen(subset(lwPoints, !noise))
> wlTrips <- tripIdentification_Wolf(subset(lwPoints, !noise))

The generated trips include the following basic attributes (the sketch after this list shows how the distance and speed attributes can be recomputed):

• startindex & endindex: point indexes into the passed-in point data for each trip
• starttime & endtime: UTC date and time stamps for the start and end of each trip
• startlat & startlong: latitude and longitude coordinates for the trip start
• endlat & endlong: latitude and longitude coordinates for the trip end
• distancemeters: total distance accumulated over the trip's points, calculated using the great-circle distance formula and returned in meters
• travtimeminutes: endtime - starttime in fractional minutes
• avgspeedkph: trip's average speed in km/h
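The distance and speed attributes are straightforward to recompute from a trip's points. The sketch below assumes the point layout produced by loadData() above and uses the geosphere package's distHaversine() function, one common great-circle formula, with distances returned in meters; the bundled implementation may differ:

library(geosphere)

# Sum great-circle distances between consecutive points of a trip
tripDistanceMeters <- function(p) {
  xy <- cbind(p$long, p$lat)
  sum(distHaversine(xy[-nrow(xy), , drop = FALSE],
                    xy[-1, , drop = FALSE]))
}

# avgspeedkph: distance over travel time, converted to km/h
avgSpeedKph <- function(distancemeters, travtimeminutes) {
  (distancemeters / 1000) / (travtimeminutes / 60)
}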

Travel Mode Transition Identification

These methods break a sequence of GPS points into mode segments and are only applicable to GPS data that have been collected using on-person data loggers. Mode segments consist of the individual legs in a multimodal trip. The output of these methods is similar to that of the trip identification methods; as input, they take a data set of GPS points returned by the loadData() function. The first implemented mode transition identification method can be invoked using the following command:

> segments <- modeTransitionIdentification_Oliveira(rawPoints)

The second method (modeTransitionIdentification_TsuiShalaby_SchuesslerAxhausen) employs a Fuzzy Logic Engine written in Java by Edward Sazonov, which can be found at http://people.clarkson.edu/~esazonov/FuzzyEngine.htm. The authors modified the engine by adding a method to return several variables, one for each mode of interest. (A fuzzy engine's normal operation is to return a decimal value between 0 and 1; the modification allows an array of values to be returned, one value between 0 and 1 for each mode, such that they sum to 1.) Because the Experiment A reference data used a different set of travel modes, further modifications were made to the engine to generalize the mode handling. The modified code and compiled objects are included in the distributed package. The package rJava is used to invoke the Java .jar code directly from R, so the method can be invoked directly using the following command:

> modeSegments <- travelModeIdentification_TsuiShalaby_SchuesslerAxhausen(segments, filteredPoints)

Travel Mode Identification

The implementation of the Stopher travel mode identification method was very dependent on the available spatial data regarding the location of roadways and railroads. To implement this method within R, it was necessary to conduct extensive preprocessing of the data using a GIS; the results of this preprocessing were then saved as text files, which are directly referenced by the R code. This makes the implementation unsuitable for use outside of this project, so this method is not covered in the bundled package.

Methods Implemented Using WEKA

The WEKA machine learning tools were used to implement the machine learning-based methods, namely neural networks for travel mode identification and decision trees for trip purpose identification. The packaged model files (.model) can be opened with WEKA and run on other data sources. Note that text files produced by WEKA use UNIX line endings, which Windows Notepad will not display correctly; use WordPad or a more advanced text editor to view them.

WEKA makes a distinction between numeric values and nominal values. A numeric variable can vary over the whole real line, while a nominal variable takes one value from a fixed set. Nominal values are used where there is no "closeness" relationship and the concept of "for cases with less than a value" does not make sense. For example, the number of household members on a trip (hhmem) is a numeric value because there might be interesting relationships between having one household member and having more than one household member on the trip. On the other hand, the place type (ptype) is nominal, because there is no obvious relationship that home and work share but school does not.

Travel Mode Identification Using Neural Networks

The saved file ForWekaTrain180.model contains the final trained neural net, which can distinguish between the travel modes walk, bike, car, bus, and train. This file can be opened in WEKA and then applied to input data files containing the independent variables the net uses to estimate travel mode.

Input files for WEKA use the Attribute-Relation File Format (ARFF). These are CSV text files that include metadata covering such items as all the possible values that nominal variables can take. For more information on ARFF files, see http://www.cs.waikato.ac.nz/~ml/weka/arff.html. You can use WEKA's Explorer to prepare input ARFF files from CSVs. In this case, the following transformations were done:

1. Remove attributes 2 and 3 (startindex and endindex)
2. Convert attribute 6 (travmode) from numeric to nominal
3. Convert attribute 1 (caseid) from numeric to nominal, and then from nominal to string

The ARFF file records this as:

@relation ForWekaTrain180-weka.filters.unsupervised.attribute.Remove-R2-3-weka.filters.unsupervised.attribute.NumericToNominal-R6-weka.filters.unsupervised.attribute.NumericToNominal-R1-weka.filters.unsupervised.attribute.NominalToString-C1

As part of Experiment A, a random sample of 180 trips was chosen to train the network, and then a separate random sample of 90 trips was used to test the network. The neural net uses the following fields (travmode is the result):

• Average speed in mph
• Max speed in mph
• Standard deviation of the distance between locations in feet
• Dwell time in seconds
• Travel mode [1 = walk, 2 = bike, 3 = car, 5 = bus, 7 = train]

Once an ARFF file is ready, the neural network model can be built by training it on that ARFF. This example script uses a learning rate of 0.1 and runs for 300 epochs:

>java -cp weka.jar weka.classifiers.meta.FilteredClassifier -d ForWekaTrain180.model -t ForWekaTrain180.csv.arff -F weka.filters.unsupervised.attribute.RemoveType -W weka.classifiers.functions.MultilayerPerceptron -- -L 0.1 -N 300 > ForWekaTrain180.out.txt

To validate the model, run the following script on the test ARFF:

>java -cp weka.jar weka.classifiers.meta.FilteredClassifier -l ForWekaTrain180.model -T ForWekaTest90.csv.arff > ForWekaTest90.out.txt

The FilteredClassifier/RemoveType combination is necessary so that the input ARFF file can have a record identifier without allowing the neural net to make inferences based on that identifier. The record identifier allows the mapping of the results back to the original data, but since the neural net's training set and test set must have the same schema, the removal of the record identifier must be done within the WEKA command. RemoveType is used because the record identifier is a string, and it is the only string in the data.

Trip Purpose Identification Using Decision Trees

To apply the trip purpose decision tree, it is first necessary to prepare an input file containing all of the variables referenced in the model, as specified in Appendix D.

This process was conducted using a PostgreSQL database. A template that can be used to recreate this database's structure is included in the bundled package. To use it, first install PostgreSQL and start the service. Once you connect to the server, create a new database and load the PostGIS extension on it. More information on how to configure and use PostgreSQL and PostGIS can be found at http://www.postgresql.org/docs/manuals/ and http://postgis.net/documentation, respectively. Once the database is created, open a query window to it using a client like pgAdmin (http://www.pgadmin.org/), then load the purpose_template.sql file and execute it. This will create a blank database structure as well as a series of functions that can be called to prepare the data.

The database structure uses places to store trip information. Each place record has a reference to a location record, where the actual destination addresses and coordinates are stored. Similarly, household, person, and vehicle information should be entered into the supporting tables. Once the database is populated, the function prepare_data() can be invoked with the following command:

SELECT prepare_data();

This command will populate the table placestripimputevariablesints, which can be exported to CSVs and used as input to the trip purpose identification models.

The saved decision tree for estimating trip purpose is included in the file arc_agg_sample.model. This file can be opened in WEKA and used to classify trip purpose files. Use WEKA's Explorer to prepare an ARFF file with the input data, and then follow these steps to make the data usable by WEKA:

1. Remove columns ID, finalpersoncategory, tpurp, stpurp, distancetohome, airportdestnotflying, orig_taz, dest_taz, airport, outoftown, airportpurpose, longitude, latitude
2. Move apurp to the last column (WEKA prefers that training validation/output columns be last)
3. When preparing the validation file, remove nearschool, lu_name
4. Turn all columns into nominal values except arrhour, age, actdur, tripdistance, tottr, hhmem
5. When preparing the validation file, remove home purposes with
   a. RemoveWithValues index 99 (apurp)
   b. Nominal indexes 1,2,3,16,17 modifyheader (16 and 17 are codes 96 and 97)
6. Rename apurp to tpurp by editing the ARFF file in a text editor

This is recorded in the resulting ARFF as:

@relation 'arc_agg_sample-weka.filters.unsupervised.attribute.Remove-R1,5,15-16,21,33,49-51,53-54,111-112-weka.filters.unsupervised.attribute.Reorder-Rfirst-12,14-last,13-weka.filters.unsupervised.attribute.NumericToNominal-R2,3,12,15,16,36-V'

And for the validate file:

@relation 'arc_agg_validate-weka.filters.unsupervised.attribute.Remove-R1,5,15-16,21,33,49-51,53-54,113-114-weka.filters.unsupervised.attribute.Reorder-Rfirst-12,14-last,13-weka.filters.unsupervised.attribute.Remove-R96,100-weka.filters.unsupervised.attribute.NumericToNominal-R2,3,12,15,16,36-V-weka.filters.unsupervised.instance.RemoveWithValues-S0.0-Clast-L1,2,3,16,17-H'

The steps described here can also be done in the graphical user interface (GUI) tool, but it is simpler and faster to explain them as command-line interface (CLI) statements. First, make a .model file, which contains all the information about the tree. This can be done by issuing the following command:

>java -cp weka.jar weka.classifiers.trees.J48 -t arc_agg_sample.csv.arff -M 25 -x 10 -d arc_agg_sample.model -i > arc_agg_sample.output.txt

This command gives WEKA an ARFF file and produces a J48 decision tree with a minimum leaf of 25 instances and 10 folds of cross-validation. The output file gives a textual description of the decision tree, its success rate on the training sample, and the confusion matrix on the training sample (which is a grid comparing model predictions with actual values).
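A confusion matrix of this kind is simply a cross-tabulation of predicted versus actual labels, so once WEKA's per-record predictions have been joined back to the data they can be summarized the same way in R. A minimal illustration with made-up vectors:

# Illustrative only: cross-tabulate actual vs. predicted purposes
actual    <- c("home", "work", "work", "shop", "home")
predicted <- c("home", "work", "shop", "shop", "home")
table(actual, predicted)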

The tree can be visualized by creating a DOT file, which can be done using the following command:

>java -cp weka.jar weka.classifiers.trees.J48 -t arc_agg_sample.csv.arff -M 25 -x 10 -g > arc_agg_sample.dotty

The open-source program "dot" can then be used to create a flowchart from the .dotty file. The easiest way to get this program is to install GraphViz from www.graphviz.org. The output from the dot program will be scalable vector graphics (SVG) flowcharts, which can be opened in most modern browsers. Alternatively, WEKA's built-in visualization of trees can be used, but it does not produce as clean of an image. To obtain a SVG flowchart, issue the following command:

>dot.exe -Tsvg -Kdot arc_agg_sample.dotty > arc_agg_sample.svg

Finally, the created tree can be applied to an existing data set (arc_agg_validate.csv.arff) to generate aggregate results using the following command:

>java -cp weka.jar weka.classifiers.trees.J48 -T arc_agg_validate.csv.arff -l arc_agg_sample.model > arc_agg_validate.output.txt

The following command can be used to obtain a file with the predicted values for each input record:

>java -cp weka.jar weka.classifiers.trees.J48 -T arc_agg_validate.csv.arff -l arc_agg_sample.model -p 0 > arc_agg_validate.predictions.txt

Methods Implemented Using Bison BIOGEME

BIOGEME is very strict about its input variables; so, to save time and avoid major headaches, ensure that the input data contain only numbers (except for the header row) and that no empty values are present in the file. Bison BIOGEME uses a CLI; to obtain a CLI console in Windows, click on the Start menu, type cmd.exe, and hit Enter. Once the console window is open, navigate to the path where the BIOGEME model file is located using the console's cd command. Finally, to run Bison BIOGEME with the provided model specification file (extension .mod assumed), passing in the input file name (a tab-delimited text file), issue the following console command:

> biogeme mymodel sample.dat

Once a satisfactory model estimation is obtained, the BIOSIM tool can be used to simulate choices using the Monte Carlo method. Several output files will be created by the program as it runs. The two most important files are an HTML document with a detailed report on the model estimation results and a .res text file, which follows the same format as the input model file and contains the final estimated model. The final model .res file can be used as input along with a tab-delimited text file containing the independent variables of the model. To do this, create a copy of the file and change its extension to .mod, then open the file and update the [SampleEnum] value to the desired number of simulated outcomes.

A GUI is also available. More information on using the GUI and BIOGEME can be found on the biogeme.epfl.ch website; a helpful tutorial is also available at http://biogeme.epfl.ch/v18/tutorialv18.pdf.

Probabilistic Travel Mode Identification

To use the included discrete choice travel mode identification model, prepare an input tab-delimited text file with the appropriate speed attributes, as described in Appendix D. The file header should contain the following columns:

Caseid startindex endindex minspeedmph maxspeedmph avgspeedmph sdspeedmph minaccelmps2 maxaccelmps2 avgaccelmps2 sdaccelmps2 distancemiles travmode

It is important that all records are populated with a valid value for travmode, as shown in Appendix D; otherwise the record will be ignored by BIOGEME. Once the input file is assembled, an enumeration file can be generated by issuing the following command:

>biosim final test.dat

This command will produce an enumeration file (.enu) with the utilities for all the choices, probabilities, and simulated outcomes.

Trip Purpose Identification

To apply the trip purpose discrete choice model, it is first necessary to prepare an input file containing all of the variables referenced in the model, as specified in Appendix D. The database procedure described previously in the section named "Trip Purpose Identification Using Decision Trees" can be used for this purpose.
Once the data are prepared, BIOSIM can be used to simulate choices based on an input file. To simulate choices using the aggregate purpose model, the following command can be used:

> biosim agg_purpose input.dat

The resulting enumeration results can then be related back to the input data for analysis.
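For either BIOGEME model, the input .dat file can be produced directly from R. The sketch below simply writes a tab-delimited file and double-checks the two constraints noted above (numeric columns only, no empty values); the data frame name is a placeholder:

# "inputdata" stands in for a data frame holding the model's
# independent variables. BIOGEME expects numeric, gap-free data.
stopifnot(all(sapply(inputdata, is.numeric)), !anyNA(inputdata))
write.table(inputdata, "input.dat", sep = "\t",
            row.names = FALSE, quote = FALSE)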

Experiment B Instructions

This section provides instructions for applying the modeling process for identifying demographic characteristics of GPS sample data described in Experiment B. The process here assumes that the sample models estimated during the development of the modeling process are to be applied to sample data that have been generated through the trip imputation process.

Prerequisites

Obtain the 'NCHRP_GPS_Data_Reduction.exe' executable from the bundled files. Alternatively, the project can be compiled from source code. This can be done using free software such as Microsoft Visual C++ Express, or free open-source software such as Eclipse.

Steps

1. Generate the Trip Input File from the results of the GPS trace analysis in the format shown in the table below. There should be one record for each travel episode for each person. If no unique linkage between persons in the sample is known (i.e., no household connections), set the PERNO variable to 1 and use the SAMPN variable as the unique identifier. Ideally, multiple days of input data will be used here, but a minimum of 1 day is required. Save the file as a tab-delimited text file.

Data Requirements for Tour Identification:
• SAMPN (Integer): Unique identifier of the person (or of the household, if household-level analysis)
• PERNO (Integer): Identifier of the person in the household (if household-level analysis); otherwise set to 1
• PLANO (Integer): Identifier of the activity, unique within the SAMPN-PERNO combination
• LOCATION_TYPE (String): Required location types: 'Home', 'Work', 'School', 'Other'
• LOCATION_ID (Integer): Location identifier unique within the SAMPN-PERNO combination
• MODE (Integer 1-10): Walk = 1, Bike, Drive, Pass, Transit, Paratransit, Taxi, School bus, Carpool, Other
• TRPDUR (Integer): Trip duration in minutes
• ACTDUR (Integer): Duration of the activity at the trip end, in minutes

2. Run the NCHRP_GPS_Data_Reduction tool and enter the inputs requested. These include the filepath to the trip input file described above and the length of data collection for the trips in that file. For a one-day survey, enter "1."

3. After pressing Enter, the program will run for a short time and create three output files: trip_info.xls, tour_info.xls, and person_info.xls. The tour_info and trip_info files show the trips/tours identified by the algorithm, while the person_info.xls file contains the estimated travel/tour characteristics for the individuals in the sample. This file forms the basis for further analysis.

4. From the original input file resulting from the trip imputation process, identify the home location for each individual in the sample, including the coordinates of the home location. Using suitable GIS software, create a shapefile of the sample home locations and perform an overlay analysis with census tract shapefiles from TIGER/Line or other sources to identify the home census tract. This can be approximated without GIS by calculating straight-line distances from the home location coordinates to census tract centroids and assigning the sampled individual to the nearest tract. Add the home census tract for each individual to the person_info.xls file.

5. For each census tract in the study area, create the following variables using the Census Transportation Planning Package data:
• transituse: % of residents in the tract using transit
• road_density: length of roads in the census tract / area (miles/sq mile)
• intersection_density: intersections / area (#/sq mile)
• block_size: average block size (road density / intersection density)
• employment_density: employees per sq mile
• pop_density: population per sq mile
• housing_density: housing units per sq mile

6. Join the land use variables to the person_info.xls file using the census tract ID.

7. First, apply the four education models conditional on work status to the sample (full-time worker, part-time worker, retiree, and other; students and children are excluded). This gives the conditional log-sum values to be used in the upper-level work status model. To calculate the logsums for each work-status category, follow this procedure:
a. Rename person_info.xls to person_info.dat.
b. Select this file as the input file in the BIOGEME GUI.
c. Select the "educ_<workstatus>_sim.mod" file for the work status for which the IV is being estimated as the "model" input in the BIOGEME GUI. The .mod files are included in the report appendix and in the GitHub repository.
d. Alternatively, steps b and c can be entered at once using the BIOGEME command line.
e. Press "Simulate" in BIOGEME. This will result in the creation of a *.enu file, which contains the utility estimates for each education alternative.
f. Calculate the inclusive value (IV) to use in the work status model from the utility estimates: IV_<workstatus> = ln(Σ_i e^(V_i)), where the utility V_i is given in the *.enu file for each alternative. (A short R sketch of this calculation, and of the probability formula used later in step 9, follows this step.)
g. Append the education IV to the person_info.dat file for each sample.
h. Repeat for each work status.
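The logsum in step 7f, and the MNL probability formula used later in step 9, reduce to a few lines of R. In this sketch, U is assumed to be a numeric matrix holding the *.enu utility estimates, with one row per person and one column per alternative:

# Step 7f: inclusive value (logsum) per person
IV <- log(rowSums(exp(U)))

# Step 9: MNL probabilities P_i = e^(V_i) / sum_j e^(V_j)
P <- exp(U) / rowSums(exp(U))

# Choose a realization per person by sampling with those probabilities
choice <- apply(P, 1, function(p) sample(length(p), 1, prob = p))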

8. Next, apply the upper-level work status model using BIOGEME in the same manner as described in steps 7a–e, using "wkstat_simulation_FINAL.mod" to estimate the work status utilities.
9. Calculate the work status probabilities for each sample using the estimated utilities in the *.enu file and the MNL formula P_i = e^(V_i) / Σ_j e^(V_j), and choose a realization of the work status using simulation. Append the work status to the person_info.dat file as a new column.
10. Estimate the education status conditional on work status as follows:
a. Split the person_info.dat file into four temporary files based on work status (ignore the child and student work statuses in this step).
b. Repeat steps 7b–e to estimate the utilities for each education alternative.
c. Follow step 9, using the *.enu file from the previous step, to estimate the probabilities for each education status and choose a realization. Append it to the temporary data file.
d. Education status values have now been selected for each work-status data file; append the results back to the person_info.dat main input file. At this point, the work status and the education status conditional on the work status have been fully specified for each individual in the sample.
11. Estimate the gender of the sample using the process described in steps 7a–e with the "gender_ALL_simulation.mod" BIOGEME model file. Generate the gender for the sample using the process described in step 9 with the resulting *.enu file.
12. The person age model is an ordered logit model developed using the NLOGIT software. The probabilities for the age categories can be estimated in Excel as follows (an R sketch of the same arithmetic follows this step):
a. Add a column to the "person_info.dat" file called utility.
b. Calculate the utility by multiplying all of the estimated coefficients shown in the final report by the appropriate columns and summing the results.
c. Add five columns called cumulative probability to calculate the probability of each category.
d. Calculate the cumulative probability of each category (0–4) using the formula P_i = e^(µ_i − V) / (e^(µ_i − V) + 1), where µ_0 = 0 and µ_4 = ∞; the remaining µ-values are as shown in the "Ordinal Logit Model for Age Categories" table in Appendix E.
e. Finally, calculate the probabilities for each category by subtracting the cumulative probability of the previous category (except for category 0, where the cumulative probability equals the category probability).
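The spreadsheet arithmetic in step 12 can equally be scripted. A hedged R sketch, where V is the utility from step 12b and the three interior thresholds are placeholders to be replaced with the estimates from the Appendix E table:

ageProbabilities <- function(V, mu123 = c(1.0, 2.0, 3.0)) {  # placeholder thresholds
  mu  <- c(0, mu123, Inf)   # mu_0 = 0 and mu_4 = Inf, per step 12d
  cum <- plogis(mu - V)     # e^(mu_i - V) / (e^(mu_i - V) + 1)
  diff(c(0, cum))           # probabilities for categories 0..4
}

# Example: category probabilities for one person with utility V = 0.8
ageProbabilities(V = 0.8)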
13. Estimate the possession of a driver's license for the individual using the WEKA software.
a. Download "license_model.model" from the code repository.
b. Save a copy of the "person_info.dat" file as a CSV file (using Excel or another file converter).
c. Open WEKA (if the "Weka GUI Chooser" window appears, select "Explorer").
d. Under the "Preprocess" tab, click "Open file . . ." and select the CSV file.
e. In the lower left corner labeled "Attributes," select all variables not in the list in the table below, then click the "Remove" button. It is critical that the remaining attributes shown in the window match EXACTLY the variables shown in the table, including variable names; otherwise the simulation will fail. If no "LIC" (license) variable exists, create the column and set all values equal to "NO" in the .DAT file, and repeat steps a–c.
f. Click the "Save . . ." button at the top of the form and save as an ARFF file.
g. Navigate to the "Classify" tab, click the "Set . . ." button next to "Supplied test set," and choose the file saved in the previous step.
h. Right-click in the "Result list" area and select the "license_model.model" file previously downloaded.
i. Right-click on the model that was loaded in the "Result list" area and select "Re-evaluate model on current test set."
j. Right-click again on the model and select "Visualize Classifier Errors"; when the window appears, click "Save." The saved file will then contain all of the attributes as well as the column "predictedLIC," which contains the license prediction. This column can then be joined back to the original "person_info.dat" file.
14. The process for running the household-type joint model is nearly identical to the process for the license model. Differences in the steps are listed based on the corresponding letter.
a. Download "household_type_j48.model" from the code repository.
e. Create the "HHTYPE" column if necessary and default its values to "H_100."
h. Select "household_type_j48.model."
j. Save the file, which will contain the "predictedHHTYPE" column. Create new columns in the original "person_info.dat" file for the household size, number of vehicles, and presence of children, then populate the columns using the first, second, and third digits after the "H_" in the "predictedHHTYPE" column (a short parsing sketch follows these steps).
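The final parsing in step 14j can be done in R with substr(); a minimal sketch, assuming household-type codes of the form "H_xyz" and an illustrative data frame name:

# "persons" stands in for a data frame holding the predicted
# household-type codes, e.g., "H_231".
persons <- data.frame(predictedHHTYPE = c("H_231", "H_100"),
                      stringsAsFactors = FALSE)
persons$hhsize   <- as.integer(substr(persons$predictedHHTYPE, 3, 3))
persons$vehicles <- as.integer(substr(persons$predictedHHTYPE, 4, 4))
persons$children <- as.integer(substr(persons$predictedHHTYPE, 5, 5))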

