Read "Designing the Archive for SHRP 2 Reliability and Reliability-Related Data" at NAP.edu


Suggested Citation: "Chapter 5 - Artifact Upload." Transportation Research Board. 2014. Designing the Archive for SHRP 2 Reliability and Reliability-Related Data. Washington, DC: The National Academies Press. doi: 10.17226/22281.



Chapter 5
Artifact Upload

5.1 Types of Artifacts Archived

The Archive is designed to collect all project-related data, as described below.

5.1.1 Data Sets

• Include, for example, traffic engineering data such as travel time, flow, and occupancy (an example data set is shown in Figure 5.1).
• Must be in comma-separated values (.csv) format.
• Can be used for visualization.
• Can be queried.
• Include metadata. (While metadata is needed for all files, it is especially important for data sets. Metadata must correctly identify every column of data within the file and precisely identify the geographical location where the data were collected. Otherwise, the data are less usable.)
• Require special and general metadata.
• Require a data dictionary.

5.1.2 Non-Data Sets

• Include, for example, documents, computer code, simulation models, spreadsheets, and presentations (see Figure 5.2).
• Must be in .pdf format when they are documents.
• Require only general metadata.
• Cannot be visualized.
• Include Excel spreadsheets.
• Require a data dictionary only when they are Excel spreadsheets.

5.2 Artifact Ingestion Process

An artifact passes through several steps between the time it is collected from the researchers and the time it becomes available to Archive users. Figure 5.3 summarizes these steps.

5.2.1 Step 1. Artifact/Metadata Collection

In this step, the person responsible for uploading collects the project artifacts provided by researchers, along with their associated metadata and data dictionary (if needed). For data sets, providing a data dictionary is mandatory. A standard data dictionary template has been developed to help researchers provide the required information related to data sets.

5.2.2 Step 2. Preparation

Each artifact needs to meet some basic requirements before being uploaded into the Archive. There are two types of requirements: general and specific. The general requirements are common to all types of artifacts; for instance, file size should not exceed 1 GB.
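The artifact categories of Section 5.1 and the general size requirement can be summarized as a small pre-upload check. This is a minimal sketch, not the Archive's implementation: the function `upload_requirements`, the reading of "1 GB" as 1024**3 bytes, the rule that every .csv file is treated as a data set, and the set of extensions counted as Excel spreadsheets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import List

# Assumption: "1 GB" is interpreted as 1024**3 bytes.
MAX_BYTES = 1024 ** 3
# Hypothetical: extensions treated as Excel spreadsheets (all of them appear in
# the list of accepted file types in Section 5.6.2.2).
EXCEL_EXTENSIONS = {"xls", "xlsx", "xlsm", "xlsb"}


@dataclass
class Requirements:
    is_data_set: bool              # only data sets get special metadata and visualization
    needs_data_dictionary: bool    # data sets and Excel spreadsheets
    problems: List[str] = field(default_factory=list)


def upload_requirements(filename: str, size_bytes: int) -> Requirements:
    """Classify an artifact per Section 5.1 and apply the general size rule."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    is_data_set = ext == "csv"     # assumption: every .csv upload is treated as a data set
    needs_dictionary = is_data_set or ext in EXCEL_EXTENSIONS
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("larger than 1 GB: split into smaller files before uploading")
    return Requirements(is_data_set, needs_dictionary, problems)
```

For example, `upload_requirements("station_volumes.csv", 200 * 1024 ** 2)` reports a data set that needs a data dictionary and has no size problem, while a 2 GB PDF would be flagged for splitting.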
Specific requirements apply only to data sets and are mandated by database and visualization tool constraints. They are checked by the system during the upload process. The creator of the artifact and the person uploading it are responsible for making sure that the upload criteria are met (e.g., number of columns, column type, column name, location information, date/time formatting). The Archive rejects any artifact that fails to meet the requirements. Section 5.6 provides detailed information on the artifact preparation process.

5.2.3 Step 3. Upload

Using the upload wizard interface (Figure 5.4), PIs, creators, or administrators can upload artifacts and provide metadata. The interface guides the user through all the upload steps. In this step, the system asks the user to provide the appropriate metadata after the file is uploaded. Some of the metadata fields are mandatory (see Figure 5.5). If the user is uploading a data set or an Excel spreadsheet file, a data dictionary needs to be attached as well. The user can also define column types and modify column labels for data sets. If any of this is not completed properly, the system displays an error message and does not proceed to the next step. For more information, the user may review Chapter 6 or the online Help section.

Figure 5.1. Data set example.
Figure 5.2. Non-data set example.
Figure 5.3. Artifact ingestion process.
Figure 5.4. Upload wizard.

5.2.4 Step 4. Back-End Processing

On completion of the upload task, the administrator and the user receive an e-mail confirming the upload. The uploaded artifact then appears on the administrator's workflow page under the "Artifact to Be Processed" list. The administrator then needs to review the artifact and accept or reject the upload. Administrator credentials are required to access this page.

The artifact is then processed internally in the back end. This post-upload process is called back-end processing; during it, artifacts are prepared to support the search and visualization features. For security reasons, the administrator must approve any further back-end processing of an artifact by clicking the "Process" link. This intermediate step gives the administrator the ability to check the artifact content. The post-upload workflow consists of the following steps:

5.2.4.1 For Data Sets

Step 4.1: Validation. The back end runs a checklist on the data set to make sure that it meets certain criteria mandated by database and visualization tool constraints.
Step 4.2: Database upload. Once the data set passes the validation phase, the system uploads the fields of the data set into a database table.
Step 4.3: Database indexing. In this phase, the system indexes each column to enable the use of queries.
Step 4.4: Metadata keyword indexing. The system indexes the metadata text for keyword search.

5.2.4.2 For Non-Data Sets

Step 4.1: Keyword indexing of the content. In this step, the system indexes the text content of the artifact to support the full-text search feature.
Step 4.4: Metadata keyword indexing. The system indexes the metadata text for keyword search.

After Step 4 has been completed successfully, the artifact and its metadata appear on the Archive webpage.

5.3 Data Dictionary

A data dictionary is a companion document that describes the data stored in a data set; it serves as a user guide to the data set file. It should contain the following information:

• Data collection methodology;
• Data processing techniques that were applied;
• Column headings for the data set;
• Units of measurement for each column;
• Any other relevant information about the data in each column; and
• Acknowledgment of the people who contributed to creating the data set, such as the road authority that owns the vehicle detectors or individuals/organizations that helped process the data.

Figure 5.5. Metadata page.

Submission of a data dictionary is mandatory with any data set or Excel spreadsheet.

5.4 Metadata

The most common definition of metadata is "data about data": metadata describes the original data. In the SHRP 2 Reliability Archive, metadata provides information about the artifacts, including title, description, file size, type of artifact, how the data were collected (data sets only), and much more.

Metadata is used throughout the Archive to describe various objects, as follows:

• Overall site;
• Focus area;
• Projects;
• Users; and
• Artifacts.

5.4.1 Metadata Relating to the Overall Site

Metadata is used to describe both the structure of the Archive and the artifacts stored within it. Figure 5.6 shows the hierarchical structure of the Archive. The design of the site is flexible, and more folders can be added later under the "Focus Area" category, if needed.

Descriptive metadata, which is of critical importance, was attached to each of the site elements: site, focus area, project, artifacts, and user (collaboration). As part of the metadata scheme design, the L13A team defined element sets, lists of metadata attributes, and relationships that apply to each site element. Attributes are descriptive elements such as title, abstract, and artifact type. Relationships are links between Archive elements, such as the link between an artifact and its creator.

For each element in an element set, the project team determined the controlled vocabulary, cardinality (1:1, 1:many, many:1, or many:many), generator (system versus user), whether the element is user-editable, and whether it is mandatory or optional. Mandatory metadata must be filled in to complete the artifact submission process, while optional metadata can be left blank.
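The properties the team defined for each element (controlled vocabulary, cardinality, generator, editability, and mandatory status) map naturally onto a small record type. The sketch below is illustrative only: the `MetadataElement` class, the `missing_or_invalid` helper, and the three sample elements (loosely modeled on Table 5.5, with a made-up artifact-type vocabulary) are hypothetical, not the Archive's schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class MetadataElement:
    name: str
    mandatory: bool
    editable: bool
    multiple: bool                              # cardinality: more than one value allowed?
    generator: str                              # "user" or "system"
    vocabulary: Optional[FrozenSet[str]] = None  # controlled vocabulary, if any


def missing_or_invalid(elements: List[MetadataElement], submission: dict) -> List[str]:
    """List problems with the user-generated elements of a submission.

    System-generated elements are skipped because the Archive fills them in.
    """
    messages = []
    for el in elements:
        if el.generator != "user":
            continue
        value = submission.get(el.name)
        if isinstance(value, list):
            values = value
        else:
            values = [] if value in (None, "") else [value]
        if el.mandatory and not values:
            messages.append(f"{el.name}: mandatory")
        if not el.multiple and len(values) > 1:
            messages.append(f"{el.name}: only one value allowed")
        if el.vocabulary is not None:
            messages += [f"{el.name}: {v!r} not in controlled vocabulary"
                         for v in values if v not in el.vocabulary]
    return messages


# Hypothetical elements loosely based on Table 5.5; the vocabulary is invented.
ELEMENTS = [
    MetadataElement("Title", True, True, False, "user"),
    MetadataElement("Artifact type", True, True, False, "user",
                    frozenset({"Data set", "Report", "Presentation", "Software"})),
    MetadataElement("File size", True, False, False, "system"),
]
```

With these definitions, `missing_or_invalid(ELEMENTS, {"Title": "I-80 travel times", "Artifact type": "Dataset"})` returns `["Artifact type: 'Dataset' not in controlled vocabulary"]`, which shows how a controlled vocabulary catches a near-miss entry that free text would accept.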
Any editable metadata, whether mandatory or optional, can be updated by a creator; the administrator can later correct errors or add missing information. User-generated metadata must be completed by a user (typically the creator); system-generated metadata is generated by the Archive system, typically by scanning the submitted artifact for embedded metadata. Controlled vocabularies (e.g., a list of state names to select from) and encoding schemes (e.g., the YYYY-MM-DD format) prevent unintended metadata entry errors and help ensure that users can find artifacts with the system's search functionality.

Figure 5.6. Archive site structure.

Site, focus area, and project metadata was entered into the system by the L13A project team administrator as part of the Archive software development process. Before the Archive system went live, the SHRP 2 team reviewed the site, focus area, and project metadata and submitted comments and requested changes to the L13A team. Similarly, project metadata was reviewed by the relevant team, and comments with any requested changes were submitted to the L13A team. The L13A team responded to the comments and made final changes to the site.

5.4.2 Site Metadata

Table 5.1 provides a brief summary of the site metadata elements.

5.4.3 Focus Area Metadata

Table 5.2 provides a brief summary of the focus area metadata elements.

5.4.4 Project Metadata

Table 5.3 provides a brief summary of the project metadata elements.

5.4.5 User Metadata

User metadata is handled within each user's account profile. The only required elements of a user profile are a user ID, password, e-mail address, and display name. All other elements of a user profile are optional or set by the site administrator (e.g., site roles). Table 5.4 provides a brief summary of the user metadata elements.

Table 5.1. Site Metadata

| Element Name | Type | Mandatory? | Editable? | Multiple? | Generator | Format | Controlled Vocabulary? |
|---|---|---|---|---|---|---|---|
| Title | Attribute | Yes | Yes | No | Administrator | Text | None |
| Description | Attribute | No | Yes | No | Administrator | Text | None |
| URL | Attribute | Yes | No | No | Administrator | URL | None |
| Child focus areas | Relationship | Yes | No | Yes | Administrator | Focus area | |
| Administrator(s) | Relationship | Yes | Yes | Yes | Administrator | User | |
| Creator | Relationship | Yes | No | No | System-generated | User | |
| Viewer(s): registered user PI | Relationship | Yes | Yes | Yes | Administrator | User | |
| Viewer(s): registered user | Relationship | Yes | Yes | Yes | System-generated | User | |
| Viewer(s): guest user | Relationship | Yes | Yes | Yes | System-generated | User | |

Table 5.2. Focus Area Metadata

| Element Name | Type | Mandatory? | Editable? | Multiple? | Generator | Format | Controlled Vocabulary? |
|---|---|---|---|---|---|---|---|
| Name | Attribute | Yes | Yes | No | Administrator | Text | Yes |
| Description | Attribute | No | Yes | No | Administrator | Text | No |
| Date created | Attribute | Yes | No | No | System-generated | Date-time | Yes |
| Date modified | Attribute | Yes | No | No | System-generated | Date-time | Yes |
| Child project(s) | Relationship | Yes | Yes | Yes | Administrator | Project | |
| Administrator(s) | Relationship | Yes | Yes | Yes | Administrator | User | |
| Creator | Relationship | Yes | No | No | System-generated | User | |
| Viewer(s): registered user PI | Relationship | Yes | Yes | Yes | Administrator | User | |
| Viewer(s): registered user | Relationship | Yes | Yes | Yes | System-generated | User | |
| Viewer(s): guest user | Relationship | Yes | Yes | Yes | System-generated | User | |

Table 5.3. Project Metadata

| Element Name | Type | Mandatory? | Editable? | Multiple? | Generator | Format | Controlled Vocabulary? |
|---|---|---|---|---|---|---|---|
| Name | Attribute | Yes | Yes | No | Administrator | Text | Yes |
| Title | Attribute | Yes | Yes | No | Administrator | Text | No |
| Background | Attribute | No | Yes | No | Administrator | Text | No |
| Objectives | Attribute | No | Yes | No | Administrator | Text | No |
| Research agency | Attribute | No | Yes | No | Administrator | Text | No |
| Primary investigator | Attribute | No | Yes | No | Administrator | Text | No |
| Keywords/tags | Attribute | No | Yes | Yes | Administrator | Text | No |
| Date created | Attribute | Yes | No | No | System-generated | Date-time | Yes |
| Date modified | Attribute | Yes | No | No | System-generated | Date-time | Yes |
| Parent site | Relationship | Yes | Yes | No | Administrator | Site | |
| Parent focus area | Relationship | Yes | Yes | No | Administrator | Focus area | |
| Child artifact(s) | Relationship | Yes | No | Yes | Administrator | Artifact | |
| Administrator(s) | Relationship | Yes | Yes | Yes | Administrator | User | |
| Creator | Relationship | Yes | No | No | System-generated | User | |
| Viewer(s): registered user PI | Relationship | Yes | Yes | Yes | Administrator | User | |
| Viewer(s): registered user | Relationship | Yes | Yes | Yes | System-generated | User | |
| Viewer(s): guest user | Relationship | Yes | Yes | Yes | System-generated | User | |

Table 5.4. User Metadata

| Element Name | Type | Mandatory? | Editable? | Multiple? | Generator | Format | Controlled Vocabulary? |
|---|---|---|---|---|---|---|---|
| Username | Attribute | Yes | Yes | No | Registered user | Text | Yes |
| Password | Attribute | Yes | Yes | No | Registered user | Text | Yes |
| Display name | Attribute | Yes | Yes | No | Registered user | Text | No |
| E-mail address | Attribute | Yes | Yes | No | Registered user | E-mail | Yes |
| First name | Attribute | No | Yes (b) | No | Registered user | Text | No |
| Last name | Attribute | No | Yes (b) | No | Registered user | Text | No |
| Biographical information | Attribute | No | Yes | No | Registered user | Text | No |
| Site role | Attribute | Yes | Yes | No | Administrator | Role | Yes |
| Submitted artifact(s) | Relationship | Yes (a) | No | Yes | System-generated | Artifact | |
| User comment(s) | Relationship | Yes (a) | Yes | Yes | System-generated | Text | No |

(a) If applicable.
(b) Editable only by administrator.
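The parent and child relationships in Tables 5.1 to 5.3 describe the hierarchy of Figure 5.6: a site contains focus areas, a focus area contains projects, and a project contains artifacts. The following is a hedged sketch of that tree, not the Archive's data model; the `Node` class, the level names, and the example titles (apart from the project name L13A, which appears in this chapter) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed level order, following Figure 5.6 and the parent/child rows of
# Tables 5.1 to 5.3: each level contains only the level directly below it.
LEVELS = ["site", "focus area", "project", "artifact"]


@dataclass
class Node:
    level: str
    title: str
    # Excluded from repr/compare so the parent <-> child cycle does not recurse.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, level: str, title: str) -> "Node":
        """Create a child node, rejecting levels that skip or invert the hierarchy."""
        below = LEVELS.index(self.level) + 1
        if below >= len(LEVELS) or LEVELS[below] != level:
            raise ValueError(f"a {self.level} cannot contain a {level}")
        child = Node(level, title, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Titles from the site down to this node, following parent links."""
        titles, node = [], self
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return titles[::-1]
```

Storing both directions of the link (a "child" list and a "parent" reference) mirrors the paired Child/Parent rows in the tables, so a breadcrumb such as site, focus area, project, artifact can be read off any artifact without searching the whole tree.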
5.4.6 Artifact Metadata

Artifact metadata is fundamental to the Archive's search function and is the supporting information users rely on to determine the applicability and utility of a data set to their needs. Its quality is therefore also very important.

However, artifact metadata quality is subject to an important trade-off between producers and consumers. On the one hand, consumers (the Archive users) demand complete and accurate descriptions of each artifact. On the other hand, producers (the users submitting data) have limited time and resources to devote to gathering and submitting metadata. Asking for, or requiring, too much metadata may cause producers to rebel, either by entering poor-quality metadata or by abandoning the process altogether (specifically for user-submitted artifacts). Asking for, or requiring, too little metadata will not provide enough information to help consumers find the data they want. The list of requested metadata and the metadata submission method represent a compromise between these two competing interests.

The artifact submission UI collects a limited amount of metadata, and any remaining metadata is uploaded as a separate document (e.g., a data dictionary or user guide). In this way, the metadata burden is minimized for both submitters and administrators, while valuable information is still provided for users of the system.

Requested metadata, both mandatory and optional, depends on the artifact type. For the purposes of this Archive, the artifacts that have been, or will be, generated by Reliability projects are categorized into two types:

1. Data sets. These are structured data sets in .csv file format.
2. Everything else. Documents fall within this category.

Table 5.5 summarizes the general metadata element set, which applies to both types; Table 5.6 summarizes the additional metadata requirements for data sets.

Table 5.5. General Metadata Element Set

| Element Name | Type | Mandatory? | Editable? | Multiple? | Generator | Format | Controlled Vocabulary? |
|---|---|---|---|---|---|---|---|
| Filename | Attribute | Yes | No | No | System-generated | Text | No |
| Title | Attribute | Yes | Yes | No | Creator | Text | No |
| Abstract/description | Attribute | Yes | Yes | No | Creator | Text | No |
| ID | Attribute | No | Yes | No | System-generated | URL | No |
| Primary SHRP 2 artifact? | Attribute | Yes | Yes | No | Creator | Text | Yes |
| Focus area | Relationship | Yes | Yes | No | Creator | Focus area | Yes |
| Project | Relationship | Yes | Yes | No | Creator | Project | Yes |
| Artifact type | Attribute | Yes | Yes | No | Creator | Text | Yes |
| Artifact relation | Relationship | No | Yes | Yes | Creator | Text | No |
| Location(s) | Attribute | No | Yes | Yes (up to 10) | Creator | City, state | Yes |
| Latitude, longitude | Attribute | No | Yes | Yes (up to 10) | System-generated | Decimal degrees | Yes |
| Year range | Attribute | No | Yes | No | Creator | YYYY-YYYY | Yes |
| Date uploaded | Attribute | Yes | No | No | System-generated | Date-time | Yes |
| Date last modified | Attribute | Yes | No | No | System-generated | Date-time | Yes |
| File format | Attribute | Yes | No | No | System-generated | Text | Yes |
| File size | Attribute | Yes | No | No | System-generated | Number | No |
| Number of downloads | Attribute | Yes | No | No | System-generated | Number | No |
| Related comments | Relationship | Yes | No | No | System-generated | Comment | Yes |
| Review status | Attribute | Yes | Yes | No | System-generated | Number | Yes |
| Workflow status | Attribute | Yes | No | No | System-generated | Number | Yes |
| Validation status | Attribute | Yes | No | No | System-generated | Number | Yes |
| Indexing status | Attribute | Yes | No | No | System-generated | Number | Yes |
| Administrator(s) | Relationship | Yes | Yes | Yes | Creator | User | Yes |
| Creator | Relationship | Yes | No | No | System-generated | User | Yes |
| Viewer(s): registered user PI | Relationship | No | Yes | Yes | Creator | User | Yes |
| Viewer(s): registered user | Relationship | Yes | Yes | Yes | System-generated | User | Yes |
| Viewer(s): guest user | Relationship | Yes | Yes | Yes | System-generated | User | Yes |
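Two of the creator-supplied elements in Table 5.5 carry explicit constraints: Location(s) allows up to 10 values in "City, state" form, and Year range uses the YYYY-YYYY encoding. A minimal sketch of checking them follows; the function name is hypothetical, and accepting an en dash as well as a hyphen between the years is an assumption, since the table does not say which separator the system expects.

```python
import re
from typing import List, Optional, Sequence

# Table 5.5 encoding for "Year range": YYYY-YYYY. Accepting an en dash as well
# as a hyphen is an assumption.
YEAR_RANGE = re.compile(r"(\d{4})\s*[-\u2013]\s*(\d{4})")
MAX_LOCATIONS = 10   # Table 5.5: Location(s) allows "Yes (up to 10)" values


def check_general_metadata(year_range: Optional[str] = None,
                           locations: Sequence[str] = ()) -> List[str]:
    """Check two optional, creator-supplied elements of the general element set."""
    problems = []
    if year_range:
        m = YEAR_RANGE.fullmatch(year_range.strip())
        if m is None:
            problems.append("year range must use the YYYY-YYYY format")
        elif int(m.group(1)) > int(m.group(2)):
            problems.append("year range starts after it ends")
    if len(locations) > MAX_LOCATIONS:
        problems.append(f"at most {MAX_LOCATIONS} locations are allowed")
    for loc in locations:
        city, comma, state = loc.partition(",")   # expected form: "City, state"
        if not comma or not city.strip() or not state.strip():
            problems.append(f"location {loc!r} is not in 'City, state' form")
    return problems
```

For example, `check_general_metadata("2009-2011", ["Sacramento, CA"])` returns an empty list, while `"2011-2009"` is reported as a range that starts after it ends.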

5.5 Artifact Relationships

An important feature of the Archive is its ability to relate one artifact to another, providing the user with links to other artifacts that may be of interest. Ideally, links between artifacts would be bidirectional: if Artifact A is related to Artifact B (i.e., Artifact A's page contains a link to Artifact B), then Artifact B also contains a link back to Artifact A. The database software does not do this automatically; instead, it relies on PIs or creators to keep track of link relationships.

5.6 Preparing Artifacts for Upload

Preparation is required for every artifact that is uploaded into the Archive (see Section 5.2). This section discusses the preparation work for data sets and non-data sets.

5.6.1 Data Sets

5.6.1.1 Need for Preparation

A standardized format for data sets increases the usability of the research data by future users and maximizes the distribution and impact of the research. In addition to downloading SHRP 2 Reliability artifacts, users of data sets can visualize research data in a grid layout, as a graph, or on a map. They can create filter queries that customize the data set to their needs and quickly preview it before downloading.

Enabling these features in the Archive requires some preparation of the data sets into a standardized format. The system will accept other types of data files (including spreadsheets), but the Archive will not classify them as data sets. Visualization functionality is available only for data sets.

5.6.1.2 Data Set Preparation Checklist

Table 5.7 shows the checklist for preparing data sets.

5.6.1.2.1 Data Set Size Restrictions

Data sets should be smaller than 500 MB; however, the system will accept data sets as large as 1 GB. Data sets larger than 1 GB should be split into multiple files of less than 1 GB each and uploaded separately.

5.6.1.2.2 Data Set Format

Data set files must be in comma-delimited (.csv) format.
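The size limits above and the "Data Set Format" items of the checklist in Table 5.7 (a header row, fewer than 60 columns, the same number of fields in every row, and at least one non-null value per column) can be checked locally before uploading, using Python's standard `csv` module. This is a sketch under stated assumptions, not the Archive's validator: the function name, the UTF-8 encoding, and the reading of 1 GB as 1024**3 bytes are assumptions.

```python
import csv
import os
from typing import List, Tuple

HARD_LIMIT = 1024 ** 3           # "as large as 1 GB" (1 GB taken as 1024**3 bytes)
RECOMMENDED = 500 * 1024 ** 2    # "should be less than 500 MB"
MAX_COLUMNS = 59                 # checklist: "less than 60 columns"


def check_csv_structure(path: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the size and 'Data Set Format' checks."""
    errors, warnings = [], []
    size = os.path.getsize(path)
    if size > HARD_LIMIT:
        errors.append("larger than 1 GB: split into several files")
    elif size > RECOMMENDED:
        warnings.append("larger than the recommended 500 MB")

    # Assumption: files are UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            errors.append("the first row must contain column names")
            return errors, warnings
        n = len(header)
        if n > MAX_COLUMNS:
            errors.append(f"{n} columns; fewer than 60 are allowed")
        has_value = [False] * n
        # csv records, not physical lines: a quoted field may span lines.
        for record_no, row in enumerate(reader, start=2):
            if len(row) != n:
                errors.append(f"record {record_no}: {len(row)} field(s), expected {n}")
                continue
            for i, cell in enumerate(row):
                if cell.strip():
                    has_value[i] = True
    for name, seen in zip(header, has_value):
        if not seen:
            errors.append(f"column {name!r} has no non-null values")
    return errors, warnings
```

Using a real CSV parser rather than counting commas matters when a text field is quoted and itself contains a comma; such a row still has the expected number of fields.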
The first row should contain column names, and each row must contain the same number of fields (i.e., the same number of commas). It is recommended that the user prepare data sets using a spreadsheet program or database tool and then save or export them as .csv files (Figure 5.7). Larger files may require manipulation using a programming language.

Table 5.6. Data Set Metadata Element Set

| Element Name | Type | Mandatory? | Editable? | Multiple? | Generator | Format | Controlled Vocabulary? |
|---|---|---|---|---|---|---|---|
| Column name | Attribute | Yes | Yes | No | Creator | Text | No |
| Column index | Attribute | Yes | Yes | No | Creator | Number | Yes |
| Column type | Attribute | Yes | Yes | No | Creator | Text | Yes |
| Column label | Attribute | Yes | Yes | No | Creator | Text | No |
| Latitude 1 column | Attribute | Yes (a) | Yes | No | Creator | Column | Yes |
| Longitude 1 column | Attribute | Yes (a) | Yes | No | Creator | Column | Yes |
| Latitude 2 column | Attribute | Yes (a) | No | No | Creator | Column | Yes |
| Longitude 2 column | Attribute | Yes (a) | No | No | Creator | Column | Yes |
| Data set dictionary | | No | Yes | No | Creator | URL | No |
| Data source(s) | Attribute | No | Yes | Yes | Creator | Text | No |
| Data type(s) | Attribute | No | Yes | Yes | Creator | Text | Yes |
| Corridor(s) | Attribute | No | Yes | Yes | Creator | Text | No |
| Collection technology(ies) | Attribute | No | Yes | Yes | Creator | Text | Yes |
| Collection frequency | Attribute | No | Yes | Yes | Creator | Text | Yes |
| Days of the week | Attribute | No | Yes | No | Creator | Text | Yes |
| Holidays included? | Attribute | No | Yes | No | Creator | Text | Yes |

Note: These data are required in addition to the metadata requirements in Table 5.5.
(a) If applicable.

5.6.1.2.3 Column Headings

Headings must be between 1 and 80 characters long, must be unique, and may contain only permitted characters: a to z, A to Z, 0 to 9, dashes, spaces, and underscores. Example column headings include:

• Time;
• Date;
• Volume;
• Speed;
• Travel time;
• Station ID;
• Latitude; and
• Longitude.

The user should try to avoid column names that match SQL reserved words. If a column name is an exact match, the system will insert underscore characters; for example, "select" is changed to "_select_".

The user should place latitude and longitude columns after the second column of the data set; the first two columns should not contain latitude and longitude information.

5.6.1.2.4 Data Type

The user should be methodical about the type of data in each column. The system will process each column as one of the following data types:

• Text: a text string of any length, such as "US101";
• Number: an integer (1, 2, 3) or real number (1.1, 1.2, 1.3);
• Date and time (see below); or
• Latitude and longitude (see below).

5.6.1.2.5 Date and Time Format

Data sets that include date and/or time information must conform to one of the formats shown in Figure 5.8. "Date only" and "time only" fields are preferred.

Table 5.7. Data Set Preparation Checklist

Data Set Size
The size of the data set is (please tick):
[ ] Less than 1 GB: proceed to Step 2
[ ] Greater than 1 GB: contact the Administrator

Data Set Format
Check that the data set meets the following conditions:
[ ] CSV format (see extra information)
[ ] Has at least 1 column of data
[ ] Has fewer than 60 columns of data
[ ] Each row of data has the same number of columns (i.e., the same number of commas)
[ ] The first row contains header names
[ ] Each column has at least one non-null field

Column Headings
Check that each heading name:
[ ] Is between 1 and 80 characters long
[ ] Is unique
[ ] Contains only permitted characters: a-z, A-Z, 0-9, dash, space, and underscore

Data Type
Check that each column of data conforms to the requirements for its data type:
Text fields: [ ] Contain any character (e.g., US101)
Number fields: [ ] Contain only number characters, a minus sign, or a period (e.g., -1.1)
Date & time fields: [ ] Must be in a permitted date/time format (see extra information)
Data collection points: [ ] Must be accompanied by location coordinates
                        [ ] Latitude and longitude coordinates must be in decimal format

Figure 5.7. Examples of file formats. The figure has two panels, "Example spreadsheet file" and "Example .csv format"; the .csv panel reads:

    Date_time,Volume,Travel time,Latitude,Longitude
    4/25/2011 17:00,7,248.6,39.311977,-120.495774
    4/25/2011 17:00,57,37.5,39.311977,-120.495774
    4/25/2011 17:00,5,204,39.311977,-120.495774
    4/25/2011 17:00,71,4.1,39.329142,-120.292934
    4/25/2011 17:00,9,261.4,39.329142,-120.292934
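The "Column Headings" and "Data Type" items of the checklist, the reserved-word renaming, and the missing-value rules described in Section 5.6.1.3 can be sketched as follows. This is an illustration only: the sample reserved-word list is hypothetical (the chapter does not list the system's words), the type names passed to `clean_column` are assumptions, and Python's `strptime` accepts single-digit months and days, which matches the `4/25/2011 17:00` values in Figure 5.7 but may be more lenient than the Archive.

```python
import re
from datetime import datetime
from typing import List

HEADING = re.compile(r"[A-Za-z0-9 _-]{1,80}")   # permitted characters, 1 to 80 long
# Hypothetical sample only; the Archive's actual reserved-word list is not given.
SQL_RESERVED = {"select", "from", "where", "table", "order", "group"}
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")          # digits, a minus sign, a period
# Figure 5.8 patterns translated to strptime directives (MM -> %m, yy -> %y, ...).
DATE_FORMATS = [
    "%Y/%m/%d", "%Y-%m-%d", "%m/%d/%Y", "%m-%d-%Y", "%Y%m%d",
    "%H:%M:%S", "%H:%M",
    "%m/%d/%y %H:%M", "%m-%d-%y %H:%M", "%m/%d/%y %H:%M:%S", "%m-%d-%y %H:%M:%S",
    "%m/%d/%Y %H:%M", "%m-%d-%Y %H:%M", "%m/%d/%Y %H:%M:%S", "%m-%d-%Y %H:%M:%S",
    "%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S",
]


def check_headings(headings: List[str]) -> List[str]:
    errors = []
    if len(set(headings)) != len(headings):
        errors.append("headings must be unique")
    errors += [f"invalid heading {h!r}" for h in headings if not HEADING.fullmatch(h)]
    return errors


def safe_name(heading: str) -> str:
    """Wrap an exact reserved-word match in underscores, e.g. select -> _select_."""
    return f"_{heading}_" if heading in SQL_RESERVED else heading


def is_date_time(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def clean_column(values: List[str], col_type: str) -> list:
    """Validate a column against its declared type and apply the missing-value rules."""
    out = []
    for raw in values:
        v = raw.strip()
        if col_type == "text":
            out.append(raw)                     # a missing text value stays empty
        elif col_type in ("number", "latitude", "longitude"):
            if not v:
                out.append(0.0)                 # missing numbers/coordinates become zero
                continue
            if not NUMBER.fullmatch(v):         # e.g. 41 25 01N is rejected
                raise ValueError(f"{v!r} is not a decimal number")
            x = float(v)
            limit = {"latitude": 90.0, "longitude": 180.0}.get(col_type)
            if limit is not None and abs(x) > limit:
                raise ValueError(f"{col_type} {x} is out of range")
            out.append(x)
        elif col_type == "datetime":
            if not is_date_time(v):             # a blank or bad value fails the column
                raise ValueError(f"{v!r}: date/time error for the entire column")
            out.append(v)
        else:
            raise ValueError(f"unknown column type {col_type!r}")
    return out
```

Validating against a declared type, rather than guessing the type from the values, follows the upload wizard's approach of letting the user define each column type; it also avoids misreading a column of eight-digit integers as yyyyMMdd dates.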

5.6.1.2.6 Latitude and Longitude Format

Latitude and longitude are used to identify the geographical location of data collection points, such as detector stations. Latitude and longitude values should be in decimal format. Latitude values range from -90 to 90; north is positive and south is negative. Longitude values range from -180 to 180; east is positive and west is negative. Example: 37.8716667, -122.2716667.

5.6.1.3 Common Errors When Preparing Data Sets

The Archive has been designed to allow future users to visualize data sets as grids, as graphs, and on maps. Data should therefore be formatted using the specified conventions; otherwise, the data set may not pass validation or may not be compatible with the visualization features. Solutions to common errors are shown below.

5.6.1.3.1 Incorrect Data Type

• Number columns should contain integers or real numbers only.
• Latitude and longitude should be in decimal format. The following formats will be processed as text or cause a validation error: 41 25 01N, 41°25′01″N.
• Date and time should be in one of the specified formats (see Figure 5.8).

5.6.1.3.2 Missing Values

• Missing number or latitude/longitude values are replaced with zero.
• Missing text values are replaced with an empty string.
• A missing date/time value causes an error for the entire column.

5.6.2 Non-Data Sets

5.6.2.1 Artifact Size Restrictions

Non-data sets must be smaller than 1 GB. Non-data sets larger than 1 GB should be split into multiple artifacts of less than 1 GB each and uploaded separately.

5.6.2.2 File Format

Reports and documents should be in .pdf format. Many other file types are accepted.
The complete list (in alphabetical order) is as follows: 7z, asc, asf, asx, avi, bmp, c, cc, class, co, css, csv, divx, doc, docm, docx, dotm, dotx, exe, flv, gif, gz, gzip, h, htm, html, ics, jpe, jpeg, jpg, js, m4a, m4b, m4v, mdb, mid, midi, mka, mkv, mov, mp3, mp4, mpe, mpeg, mpg, mpp, odb, odc, odf, odg, odp, ods, odt, oga, ogg, ogv, onepkg, onetmp, onetoc, onetoc2, pdf, png, pot, potm, potx, ppam, pps, ppsm, ppsx, ppt, pptm, pptx, qt, ra, ram, rar, rtf, rtx, sldm, sldx, swf, tar, tif, tiff, tsv, txt, wav, wax, wma, wmv, wmx, wp, wpd, wri, xla, xlam, xls, xlsb, xlsm, xlsx, xlt, xltm, xltx, xlw, zip.

5.7 Supplementary Documents to Assist Principal Investigators and Creators

One of the challenges of gathering project deliverables from more than 30 different projects, conducted by a similarly large number of PIs and consultants, is providing a consistent experience for the Archive's users. For that purpose, the project team and SHRP 2 staff have drafted a series of documents to help PIs complete their task and to foster an organized and internally consistent archive.

• Data Dictionary Template. The project team produced the Data Dictionary Template (Appendix A) to help PIs get started on documenting their data set artifacts. This template helps ensure that the format and quality of the metadata are consistent across data sets, PIs, and SHRP 2 projects.
• Archive Ingestion and Visualization Guide. This document was drafted by the project team to provide PIs and Archive users with instructions for uploading artifacts and visualizing data sets. It contains valuable information regarding the Archive's limitations and the correct column formatting to be used when uploading data sets.
  The user guide, in the form of an online Help section, is available on the Archive (http://shrp2archive.org/?page_id=155).
• SHRP 2 Policy on Software Version Control for Reliability-Related Projects. SHRP 2 created this policy document to establish a version control convention for software developed by contractors within the Reliability focus area. In summary, the document requires that all software contain a descriptive label unambiguously identifying the version of the software. The labeling convention is as follows: SHRP 2_Softwarename_LXXA_Contractor_Vn.n_dd/mm/yyyy. In addition, a label that is visible and easily read on opening shall be included with software and spreadsheets.

Figure 5.8. Date and time formats.

Date only columns:
• yyyy/MM/dd
• yyyy-MM-dd
• MM/dd/yyyy
• MM-dd-yyyy
• yyyyMMdd

Time only columns:
• HH:mm:ss
• HH:mm

Date & time columns:
• MM/dd/yy HH:mm
• MM-dd-yy HH:mm
• MM/dd/yy HH:mm:ss
• MM-dd-yy HH:mm:ss
• MM/dd/yyyy HH:mm
• MM-dd-yyyy HH:mm
• MM/dd/yyyy HH:mm:ss
• MM-dd-yyyy HH:mm:ss
• yyyy-MM-dd HH:mm:ss
• yyyy/MM/dd HH:mm:ss

5.8 Quality Assurance After Uploading Artifacts

The team has developed checklists for ensuring adherence to upload requirements and other SHRP 2 guidelines, with the intent of creating an archive of high-quality, consistent, and well-documented artifacts. These checklists list the steps necessary to conduct a quality check for each type of artifact (data sets and non-data sets) after the upload process is complete.

5.8.1 Checklist for Data Sets

Data sets are the most time-consuming artifact type to upload. They are typically large files with many rows and columns and must be carefully formatted to be processed correctly by the Archive. Once a data set has been satisfactorily processed, the following checks are performed:

1. After downloading and opening the data dictionary PDF file: does the file adhere to the data dictionary template? Does it accurately describe the data set artifact?
2. If the data set contains a data collection stationing column (e.g., "Loop Count Station #"), is there an adjacent column with latitude/longitude coordinates, or a related artifact that provides the geographical location of the stations (i.e., a station configuration file)?
3. On the artifact's Data tab, does the grid content load correctly?
   While it may take a few seconds to load, it should not hang for minutes.
4. If a date and time field is available, does the data grid update readily after filtering?
5. After switching to the Graph subtab and adding an x-axis field and a y-axis field, does the graph plot the first 300 points appropriately?
6. If the artifact contains latitude and longitude information, after navigating to the Map subtab and using the controls on the Build Map pane to plot latitude and longitude, does the map plot the first 300 points appropriately?

5.8.2 Checklist for Non-Data Sets

5.8.2.1 Documents and Presentations

Microsoft Word, Adobe PDF, and Microsoft PowerPoint files are the quickest items to upload to the Archive. They do not require detailed metadata and can be uploaded in the format provided by the PI. Accordingly, the quality check for this type of artifact is quick:

1. Is the document or presentation a final deliverable? The Archive should be populated only with final versions.
2. After the artifact has been uploaded and processed, does the document or presentation open correctly when downloaded from the Archive?

5.8.2.2 Spreadsheets and Computer Code

The complexity of uploading and processing spreadsheets and computer code lies between that of documents/presentations and that of data sets. Although they are considered non-data sets, spreadsheets and computer code typically benefit from documentation in the form of metadata, a user guide, or a data dictionary file. The following checks are performed for this type of artifact:

1. Is the computer code named in accordance with the software version control policy? If the spreadsheet is a computational tool, it should also be named in accordance with this policy.
2. Does the computational spreadsheet have a readily visible label in accordance with the software version control policy? For software, does it contain a "readme" text file with the label in the main folder directory?
3. Does the computational spreadsheet have secured macros?
   If so, they must be unlocked before uploading to the Archive to avoid indexing errors.

Metadata and data dictionaries shall be included with all spreadsheets. However, the process for uploading data dictionaries for spreadsheets and computer code differs from that for data sets. Whereas a data dictionary must be attached to a data set when the data set is uploaded, the data dictionary for a spreadsheet or computer code must be uploaded as a separate non-data set artifact and related to the spreadsheet or code using the Related Artifacts tool. A user guide for a software artifact can be provided in a similar manner.
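Checks 1 and 2 above depend on the version label convention from the SHRP 2 policy, SHRP 2_Softwarename_LXXA_Contractor_Vn.n_dd/mm/yyyy. The following sketch builds and parses such labels. It reflects one possible reading of the convention, not the policy's own tooling: the function names are hypothetical, and the rules that the name and contractor contain no underscores, that LXXA means "L", two digits, and an optional letter (as in L13A), and that the version is major.minor are assumptions. It checks the label text only; the dd/mm/yyyy date contains slashes, which most file systems do not allow in file names.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional

# Assumed reading of "SHRP 2_Softwarename_LXXA_Contractor_Vn.n_dd/mm/yyyy".
LABEL = re.compile(
    r"SHRP 2_(?P<name>[^_]+)_(?P<project>L\d{2}[A-Z]?)_(?P<contractor>[^_]+)"
    r"_V(?P<version>\d+\.\d+)_(?P<date>\d{2}/\d{2}/\d{4})"
)


def parse_version_label(label: str) -> Optional[Dict[str, str]]:
    """Return the label's parts, or None if it does not follow the convention."""
    m = LABEL.fullmatch(label)
    if m is None:
        return None
    try:
        datetime.strptime(m["date"], "%d/%m/%Y")   # dd/mm/yyyy must be a real date
    except ValueError:
        return None
    return m.groupdict()


def make_version_label(name: str, project: str, contractor: str,
                       version: str, released: date) -> str:
    """Build a label and confirm it round-trips through the parser."""
    label = f"SHRP 2_{name}_{project}_{contractor}_V{version}_{released:%d/%m/%Y}"
    if parse_version_label(label) is None:
        raise ValueError(f"{label!r} does not follow the convention")
    return label
```

For example, `make_version_label("TravelTimeTool", "L13A", "ExampleCo", "1.0", date(2013, 3, 15))`, with a hypothetical tool and contractor name, yields `SHRP 2_TravelTimeTool_L13A_ExampleCo_V1.0_15/03/2013`; a label dated 31/02/2013 would be rejected because the date does not exist.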

