National Academies Press: OpenBook

The Transportation Research Thesaurus: Capabilities and Enhancements (2018)

Chapter: Chapter 3 - Assessment of the Current TRT

Suggested Citation:"Chapter 3 - Assessment of the Current TRT." National Academies of Sciences, Engineering, and Medicine. 2018. The Transportation Research Thesaurus: Capabilities and Enhancements. Washington, DC: The National Academies Press. doi: 10.17226/25087.


This chapter describes and assesses the current state of TRT content. In a well-formed thesaurus, one expects to find terms that align with good practices and are, at a minimum, well suited to indexing and searching. One also expects to find terms at various stages of approval and in active or archived status, along with a full set of thesaurus relationships defined and applied in a way that reflects their use and relevance to the subject domain. Thesaurus content also includes the metadata required to manage thesaurus terms throughout their lifecycle, including source and origins (S&Os), definitions, scope notes (SNs), unique identification numbers, notations of changes in status, and so on. In a multidisciplinary field like transportation, a well-formed thesaurus is also expected to have a well-formed, overarching structure.

TRT Content, Management, and Maintenance

Using the reconstructed thesaurus as the main data source, the research team assessed the TRT content using 46 criteria. The criteria ranged from the number of terms per facet, to the numbers and types of relationships between terms, to the treatment of various types of terms such as homographs and adjectives. Particular attention was paid to the non-standard Related Term (Hierarchical) relationship, which is unique to the TRT and is intended to support the display of the terms. The occurrence of term-related metadata such as definitions and SNs was assessed, including metadata that support the management of the lifecycle of a term from recommendation through deprecation. Through this detailed assessment, the research team also identified possible errors in the content. Finally, the research team analyzed the overall structure of the TRT, focusing on the balance of terms by level across the facets.

Thesaurus Terms

The total number of terms is broken down into enumerated terms and non-enumerated terms (see Table 3-1).
Enumerated terms have codes that place them into the browse display hierarchy. The non-enumerated terms are non-preferred terms. The number of non-preferred terms varies across facets. A common benchmark for non-preferred terms in a thesaurus used for information discovery, automated classification, and automated indexing is between 15% and 20%. This means that 15% to 20% of the non-enumerated terms are true synonyms, quasi-synonyms, lexical variants, language and spelling variants, common misspellings, initialisms, acronyms, or abbreviations. From a numbers perspective, many of the facets in the TRT appear to achieve this benchmark, while others fall short (e.g., G [Testing], N [Organizations], T [Disciplines], U [Mathematics], V [Areas and Regions], and W [Time]). This is discussed in more detail in the next section, Thesaurus Relationships.
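
As a rough check of the benchmark against the overall totals, the share of non-enumerated terms can be computed from the figures reported in Table 3-1. The short Python sketch below is illustrative only. It assumes the benchmark is read as non-preferred terms as a share of all terms, and it works at the level of the whole thesaurus, because per-facet counts are not given in this chapter. The function name is hypothetical.

```python
# Totals as reported in Table 3-1 of this chapter.
TOTAL_TERMS = 12_125
ENUMERATED_TERMS = 9_591
NON_ENUMERATED_TERMS = 2_534

# Benchmark range for non-preferred terms quoted in the text (15% to 20%).
BENCHMARK_LOW, BENCHMARK_HIGH = 0.15, 0.20


def non_preferred_share(non_preferred: int, total: int) -> float:
    """Return non-preferred terms as a fraction of all terms.

    ASSUMPTION: the benchmark is interpreted as a share of all terms.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return non_preferred / total


# The two categories in Table 3-1 add up to the reported total.
assert ENUMERATED_TERMS + NON_ENUMERATED_TERMS == TOTAL_TERMS

share = non_preferred_share(NON_ENUMERATED_TERMS, TOTAL_TERMS)
within_benchmark = BENCHMARK_LOW <= share <= BENCHMARK_HIGH
print(f"Non-enumerated share of all terms: {share:.1%}")  # prints 20.9%
print(f"Within the 15%-20% range: {within_benchmark}")
```

Under that reading, the overall share sits just above the upper end of the range. This says nothing about individual facets, which is where the assessment found the shortfalls.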

The assessment also found a high incidence of qualified terms. Parenthetical qualifiers are used to provide context and distinction for terms with a meaning that may be ambiguous to users. This is particularly prevalent in the TRT to disambiguate homographs, terms with the same spelling but different meanings. An example would be Accelerators (Concrete), Accelerators (Materials), and Accelerators (Devices). The analysis found fairly heavy use of parenthetical qualifiers across the facets, although they are most heavily concentrated in Facets P [Facilities], Q [Vehicles and Equipment], R [Materials], S [Physical Phenomena], and T [Disciplines]. The preferred practice is to use a qualifying adjective and noun to represent the concept and to link the concept using thesaurus relationships rather than use parenthetical qualifiers. For example, the more standard thesaurus record would be the following:

Concrete accelerators
  UF: Accelerators (Concrete)
  BT: Materials accelerators

Another area of concern raised by this portion of the assessment is the use of container or structural terms. Structural terms are terms that represent empty classes in the structure of a thesaurus. They are designed to function as an entry point for a further breakdown by characteristic within a hierarchy. In a standard thesaurus structure, relationships are defined to represent such classes. For example, “xxx by type” would be defined as Narrower Term Type and applied as a formal relationship between the main term and the terms that represent types. The TRT design does not allow the management team to develop these types of relationships. During the conversion of the working copy, the research team found a significant number of structural terms used across the facets, with particular concentrations in Facets P [Facilities], Q [Vehicles and Equipment], R [Materials], S [Physical Phenomena], D [Communication and Control], and U [Mathematics].
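
The "Concrete accelerators" record above can be expressed as a small data structure to show how a UF (used for) entry routes a qualified term to its preferred form. This is a minimal Python sketch under stated assumptions: the class, field, and function names are hypothetical and do not reflect the TRT's actual schema or any thesaurus management product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermRecord:
    """Hypothetical minimal thesaurus record (not the TRT schema)."""
    preferred: str
    used_for: List[str] = field(default_factory=list)  # UF: non-preferred entry terms
    broader: List[str] = field(default_factory=list)   # BT: broader terms

    def display(self) -> str:
        """Render the record in the conventional indented layout."""
        lines = [self.preferred]
        lines += [f"  UF: {term}" for term in self.used_for]
        lines += [f"  BT: {term}" for term in self.broader]
        return "\n".join(lines)


def build_entry_index(records: List[TermRecord]) -> Dict[str, str]:
    """Map every entry term, preferred or not, to its preferred term."""
    index: Dict[str, str] = {}
    for record in records:
        index[record.preferred.lower()] = record.preferred
        for entry in record.used_for:
            index[entry.lower()] = record.preferred
    return index


record = TermRecord(
    preferred="Concrete accelerators",
    used_for=["Accelerators (Concrete)"],
    broader=["Materials accelerators"],
)
index = build_entry_index([record])

print(record.display())
# A search on the qualified form resolves to the preferred term.
print(index["accelerators (concrete)"])  # Concrete accelerators
```

The point of the design is that the parenthetical form survives as an entry point for searchers while the relationships, not the qualifier, carry the meaning.
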
The research team also found instances of stand-alone adjectives and adverbs (e.g., present, daily, weekly). These instances appear to be particularly prevalent in Facet V [Areas and Regions]. The use of stand-alone adjectives as search or indexing terms in repositories other than the TRB’s will produce unintended and potentially irrelevant results and is not considered best thesaurus practice.

The specificity of terms varied by facet, with the largest number of specific terms in the facets with the most terms. This is not surprising, but it does result in an uneven, unbalanced thesaurus.

Table 3-1. Total enumerated and non-enumerated terms in the TRT.

Total Terms, All Facets | Enumerated Terms | Non-Enumerated Terms
12,125                  | 9,591            | 2,534

Thesaurus Relationships

Hierarchical relationships define links between broader and narrower terms. In a standard thesaurus, these relationships are represented as BT/NT. They may also be defined at a more granular level to represent Whole/Part, Concept/Type, or Concept/Instance relationships.

No formal hierarchical relationships are specified in the TRT. Rather, a hierarchical structure and display is automatically constructed using the alphabetical enumeration, as previously shown in Figure 1-2. In the construction of the working thesaurus, the research team reconstructed the enumeration structure into formal BT/NT relationships. Explicit hierarchical relationships are important if the TRT is to be used in any context beyond TRID and other related TRB resources that now utilize the TRT.

In thesaurus standards, associative or related term relationships link terms that are related semantically or conceptually, although not in an equivalent or hierarchical way. Associative links may include, but are not limited to: (1) sibling terms that are loosely associated and sometimes interchangeable; (2) terms that are linked by familial or derivational relationships; (3) disciplines or fields of study and the objects studied or their practitioners; (4) operations, processes, and their agents and instruments; (5) processes and their counteragents; (6) actions and their results or processes and their products; (7) actions and their targets; (8) objects or substances and their properties; (9) concepts and their causes; and (10) concepts and their units of measure.

In the master version of the TRT, instances of associative relationships were found. However, these relationships can be discovered only by viewing and capturing individual, fully built-out term records. In most semantically elaborated thesauri, one would expect to find that between 30% and 40% of the relationships defined for an individual term are associative relationships. Analysis of the working copy of the TRT showed that the number of associative relationships varied widely by facet, with some facets having no associative relationships and others having as high as 49% associative relationships. However, the analysis suggests that those terms that have a related term have only one related term. This does not achieve the benchmark of 30% to 40% of all the relationships defined, suggesting that the TRT has a low level of semantic relationing, a feature that is important for effective use in other information contexts.

In constructing the working copy of the TRT, the research team discovered that a second type of associative relationship is defined. It was characterized as Related Term (Hierarchical). This relationship is undefined in thesaurus standards. Through further investigation, the research team determined that this relationship type represents sibling terms to the term referenced.
They are the narrower terms of the referenced term’s parent. Figure 3-1 is an example of a term with Related Term (Hierarchical) relationships.

Ocean currents
  Broader Term
    Hydrologic phenomena (Jbh)
  Narrower Terms
    Sinking (Oceanography) (Jbhms)
  Related Terms (Hierarchical)
    Aggressive waters (Jbha)
    Degradation (Hydrology) (Jbhb)
    Desiccation (Jbhc)
    Drainage (Jbhd)
    Floods (Jbhf)
    Hydrologic cycle (Jbhh)
    Streamflow (Jbhi)
    River currents (Jbhl)
    Peak discharge (Jbhn)
    Ponding (Jbho)
    Runoff (Jbhp)
    Sediment discharge (Jbhr)
    Seepage (Jbhs)
    Storm surges (Jbht)
    Upwelling (Jbhu)
    Water table (Jbhv)
    Water waves (Jbhw)
    Sea level (Jbhx)
    Phenomena of frozen water (Jbhy)

Figure 3-1. Example of term with Related Term (Hierarchical) relationships.
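
The way both the BT/NT links and the Related Term (Hierarchical) list fall out of the enumeration can be illustrated with a short Python sketch. It uses a subset of the codes shown in Figure 3-1 and assumes that a term's parent code is its own code minus the last character. That rule is consistent with the figure (Jbhms under Jbhm under Jbh) but is an assumption here, not a documented TRT rule, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Codes and labels copied from Figure 3-1 (a subset of the sibling terms).
TERMS: Dict[str, str] = {
    "Jbh": "Hydrologic phenomena",
    "Jbha": "Aggressive waters",
    "Jbhf": "Floods",
    "Jbhm": "Ocean currents",
    "Jbhms": "Sinking (Oceanography)",
    "Jbhu": "Upwelling",
}


def parent_code(code: str) -> Optional[str]:
    """ASSUMPTION: the parent's code is the code minus its last character."""
    return code[:-1] if len(code) > 1 else None


def build_relationships(
    terms: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Derive explicit BT (child -> parent) and NT (parent -> children) links."""
    broader: Dict[str, str] = {}
    narrower: Dict[str, List[str]] = defaultdict(list)
    for code in sorted(terms):
        parent = parent_code(code)
        if parent in terms:  # skip terms whose parent is not in the data set
            broader[code] = parent
            narrower[parent].append(code)
    return broader, dict(narrower)


def siblings(
    code: str, broader: Dict[str, str], narrower: Dict[str, List[str]]
) -> List[str]:
    """Related Term (Hierarchical): the other narrower terms of the parent."""
    parent = broader.get(code)
    if parent is None:
        return []
    return [c for c in narrower[parent] if c != code]


broader, narrower = build_relationships(TERMS)
print("BT:", TERMS[broader["Jbhm"]])
print("NT:", [TERMS[c] for c in narrower["Jbhm"]])
print("RT (Hierarchical):", [TERMS[c] for c in siblings("Jbhm", broader, narrower)])
```

Because the sibling list is computed purely from position in the hierarchy, it carries no information about why two terms are related, which is the limitation discussed in the text.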

Related Term (Hierarchical) relationships are not formally constructed between terms in the TRT, but are automatically generated and displayed for individual terms. The relationships are neither deliberately nor semantically defined for terms that are siblings under the same parent term. For example, “Aggressive waters (Jbha)” is a child term of “Hydrologic phenomena (Jbh),” and thus a sibling term to “Ocean currents (Jbhm).” Because the relationships are not deliberately created, the nature of the relationship between the terms is undefined, presenting design challenges for using the TRT outside of TRB information repositories.

For the term in Figure 3-1, “ocean currents,” the research team notes that no true Related Term (Associative) relationships have been created. Semantic richness would be achieved if true Related Term (Associative) relationships were built for the term. For example, the relationships for the term “ocean currents” could be expanded to include NTs for “surface currents” and “deep water currents.” In addition, semantic richness would be improved if Related Term (Associative) relationships were added from “ocean currents” to terms such as “measuring currents,” “ocean current impacts,” “weather,” “ocean going vessels,” “ocean tides,” and “ocean waves.”

Finally, the team found some possible errors in the construction of relationships in the current TRT. Examples are documented in the Final Interim Report 1 and suggest that, as part of any enhancement to the TRT, a thorough review of the relationships should be performed. This review should be based on clear guidance and requires generation of a report for management that makes it easier to identify these issues.

Metadata to Manage Terms

Metadata for managing terms include definitions, SNs, and S&O fields. While definitions are included in the TRT, the practice is not consistent.
Some of the smaller facets have a higher percentage of definitions. Those facets that have a higher concentration of terms and whose terms are more granular and specific to transportation objects tend to have fewer definitions.

SNs are intended to guide indexers in the use of thesaurus terms. They generally explain the ways in which the term might be interpreted and how it should be applied. The occurrence of SNs is very sparse throughout the TRT and consistently sparse across facets. In the detailed task to develop definitions for 50 current TRT terms, it became clear to the research team that the definitions are often used as SNs and, in fact, may be duplicated in the SN field for some terms. This means that the practice of SNs is inconsistently applied as well. No S&O fields are included in the current TRT, which makes it difficult to trace the provenance of a term in the future.

Another key element in managing terms is the assignment of unique identifiers. While the files provided by the TRT management team each included a number that appeared to be a unique identifier for each term, through discussions the research team understands that the values do not serve that purpose. In most thesaurus management applications, a unique identifier is automatically generated by the system and assigned to the term. The unique identifier is a database-level control point, not a definable or changeable attribute of the thesaurus term: it cannot be changed by an end user and is never reused at the database level. The research team’s understanding is that unique identifiers are not assigned to individual terms in the TRT database. Nor are unique URLs or persistent digital identifiers (such as digital object identifiers) assigned to individual thesaurus terms so that they can be globally persistent outside the TRT system.
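
The behavior described for unique identifiers (system-assigned, not editable by users, never reused) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of the TRT or of any particular thesaurus management product: the class names, the identifier format, and the sample source note are all hypothetical.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ManagedTerm:
    term_id: str                      # system-assigned; never edited or reused
    label: str
    definition: Optional[str] = None
    scope_note: Optional[str] = None  # SN: guidance for indexers
    source: Optional[str] = None      # S&O: provenance of the term


class TermRegistry:
    """Hypothetical registry that issues identifiers from a one-way counter."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)  # only ever moves forward
        self._terms: Dict[str, ManagedTerm] = {}

    def add(self, label: str, **metadata: Optional[str]) -> ManagedTerm:
        term_id = f"T{next(self._counter):06d}"  # hypothetical ID format
        term = ManagedTerm(term_id=term_id, label=label, **metadata)
        self._terms[term_id] = term
        return term

    def relabel(self, term_id: str, new_label: str) -> ManagedTerm:
        """Change the label while keeping the identifier stable."""
        updated = replace(self._terms[term_id], label=new_label)
        self._terms[term_id] = updated
        return updated

    def remove(self, term_id: str) -> None:
        """Remove a term; its identifier is not returned to the pool."""
        del self._terms[term_id]


registry = TermRegistry()
first = registry.add("Ocean currents", source="hypothetical source note")
renamed = registry.relabel(first.term_id, "Ocean currents (Oceanography)")
registry.remove(first.term_id)
second = registry.add("Storm surges")
print(first.term_id, renamed.term_id, second.term_id)
```

Here the label can change and the term can be removed, but the identifier issued to the first term is never handed to the second, which is the property that lets external systems hold a stable reference to a term.
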
Treatment of Terms Throughout Their Lifecycle

Lifecycle management of terms is supported by the inclusion of status fields and workflows to support candidate, provisional, and deprecated terms and a history/comment log that details what has occurred throughout a term’s lifecycle. These functions are not supported in the current TRT. This, and the fact that all dates were changed to the same date when the last conversion was performed, made it impossible to determine from the TRT itself the number of additions or changes to the TRT during a given period of time.

Regarding deprecated terms, from the information provided by the TRT management team, the research team was able to observe a few examples of changes of term status. Where a term changed from a preferred to a non-preferred term (e.g., when a term shifted from enumerated to non-enumerated), the non-preferred term was still linked to old TRID records. The research team learned that deprecated terms had their status changed to uncontrolled and were removed from the enumerated hierarchy. Their previous enumeration was assigned to the new preferred term. In interviews with indexers and transportation librarians, the research team learned that uncontrolled terms retain their links to bibliographic records in TRID. Uncontrolled terms may be reported out as a list available to the TRT management team, but they are not explicitly manageable as non-preferred terms.

Overall Thesaurus Structure

A thesaurus that is composed of multiple facets should strive to achieve balanced coverage of the language of the domains it supports and serves. This issue is addressed as scope and coverage in thesaurus standards. In practice, robust scope and coverage results from close working relationships with experts in the domain. In a complex field like transportation, where there are multiple modes with specific vocabularies, interdisciplinary factors, and emerging topics, a thesaurus structure is most effectively managed by an overarching facet or category structure.
Within each facet, one would expect there to be a robust second level of subfacets within which are built out specific vocabularies of terms and their relationships. Balanced structures enable thesaurus managers to better assess whether a facet’s vocabulary is supporting the domain or area of practice.

On average across all facets of the TRT, the ratio of Top Terms to Level 2 Terms is 1:8. This would be a good ratio if it were implemented consistently across facets. However, the second level of the current version of the TRT does not follow a predictable or consistent pattern. The smallest number of Level 2 divisions is found in Facet R [Materials], and the greatest number is found in Facet M [Persons and Personal Characterizations]. As noted earlier, the variant practices observed are attributable to the strict hierarchical foundation of the TRT in contrast to the extended semantic relationing one would expect to find in a standards-compliant thesaurus. Each Level 2 represents a different browsing structure for end users. This is one of the reasons why stakeholders generally reported that they do not use the hierarchical browse structure of the TRT.

Below Level 2, the research team found additional variance in the depth of the structures across facets (see Figure 3-2). These variances do not show good thesaurus practice and result in an unbalanced structure across the TRT.

Findings

In general, the descriptive information suggests that current TRT content is strong in some areas, but there are clear weaknesses in others. The weaknesses appear to be an unintended consequence of the rigid hierarchical enumeration that is used as a backbone structure. The research team understands that there are constraints to changing this structure that are associated with its design as part of TRID. Where these constraints could be mitigated, there is an opportunity to enhance the TRT. There are important opportunities for improvement in the relationships used in the TRT, but again there are constraints imposed by the integrated TRID architecture. Similar opportunities for improvement were found for the metadata for managing terms, the management of terms through their lifecycles, and the overall structure of the thesaurus.

Access and Use

Dimension B addresses the various means of accessing and using the TRT content. Failing to consider different kinds of uses can lead to suboptimal and constrained design, which limits the products, services, growth, and potential long-term value of the TRT.

The research team identified eight criteria to assess access and use. Access and use includes explicit, direct use by people. The criteria to assess explicit use include using the TRT for TRID indexing or searching, to understand the transportation research domain, and to locate definitions for transportation-related terms. Embedded use is where the TRT is used within systems. The criteria to assess embedded use include external systems using the TRT content and a number of criteria related to the use of the TRT terms for searching and for TRID indexing. The latter are described at various levels and by facet. Customized use, the partial extraction of TRT content to create customized vocabularies, is assessed anecdotally. Finally, general TRT usability was assessed using criteria based on Usability.gov.

Explicit Use

Explicit use is defined as the direct, interactive access of the TRT by people. It includes direct manual use whether through a web interface or another access point made available to the community.
Explicit use can include manual consultation of the TRT and searching the TRT to identify index terms or to select terms for searching in TRB repositories or other commercial information sources. This type of use also includes direct use of the thesaurus to identify definitions or to extract terms or term records from the TRT.

Figure 3-2. Depth of terms below Level 2.

The TRT is often used by librarians at universities, state DOTs, transportation research centers, and private-sector organizations to identify terms to use in indexing. Catalogers and indexers most often reported searching for transportation terms as a way of validating the terms they had identified for a particular document or publication. Explicit use of the hierarchical browsing structure was noted by a few catalogers, particularly when a pre-identified term that they were searching for was not found in the TRT search results. Catalogers and indexers also reported selecting a closely related term and reviewing the list of terms associated with that term. In some cases, this provided guidance. Most stakeholders interviewed, though, did not report using the Related Term (Hierarchical) terms in the display.

The TRT is rarely used to identify terms for searching in TRB information sources. The strategy most often described for searching in TRB information sources was a trial-and-error approach: an initial term or query is searched and, depending on the results, the query is refined. This anecdotal information was validated in Google Analytics reports (discussed below). Reference librarians were aware of the TRT, but generally reported that they did not consult the TRT in preparing their search strategies in TRID or other transportation information sources. SMEs were generally unaware of the existence of the TRT and, therefore, did not report explicit use.

While SMEs were generally unaware of the TRT, when alerted to its existence, they noted its potential value and use for other information discovery tasks. One SME, who had worked in both private-sector transportation organizations and an academic environment, noted the TRT’s value as a resource for gaining a quick understanding of the state of research in different aspects of the field. Another SME, who has many high-profile review and editorial responsibilities, noted the potential value of the TRT as a visual navigation tool for transportation resources within and beyond TRB.
All SMEs noted the potential value of the TRT as an embedded tool supporting current awareness and new resource recommendations.

In general, the research team found that while the TRT is available for explicit use, it is rarely accessed. While the research team did not have sufficient user log data to determine exactly how the TRT was being accessed directly, the interviews with librarians and researchers suggested that they do not use the TRT as a resource for identifying search terms for other information resources. In fact, there was little to no awareness on the part of researchers (as represented by our SMEs) of the existence of the TRT.

Embedded Use

Embedded use is defined as the integration of the TRT into information- or business-related applications, such as a search system, an information or content management system, an automated classification or indexing system, or a workflow or decision support system. Embedded use implies that the consuming application uses and interprets the TRT without any human intervention other than the initial setup of business rules. Therefore, this type of use does not include manual addition of TRT terms to cataloging applications or authority control systems. The use of the TRT in TRB information repositories is an example of embedded use.

For some criteria in this section, the research team relied on input from stakeholder interviews. For other criteria, such as the use of the TRT embedded in TRB information sources, Google Analytics reports provided reliable data for 4 months of activity.

The two primary examples of embedded use are (1) the TRT’s integration into TRB repositories for indexing and (2) the TRT’s integration into TRB repositories for searching. Data and use patterns for these examples are presented and discussed below. Beyond these two primary examples, the stakeholder interviews and data collection efforts surfaced only two other embedded examples.
The first was a conversion of the TRT rectangular tables into an Oracle Thesaurus component to support searching in a business application at a transportation research center. The second was a manually converted integration into a metadata management application. By and large, all other examples described in the stakeholder interviews or data collection efforts fell under the customized use category.

Since the TRT’s main purpose is as an embedded tool for both TRID indexing and search, the team performed extensive analysis of the TRT terms for both searching and indexing. The use of the TRT terms alone in search queries is rare; the vast majority of the searches are keyword searches. The ratio of the TRT terms to their use in indexing was overall considered to be good, but it varied widely when calculated at the facet level. With the exception of the transportation mode terms in Facet A, all facets showed an extensive use of the top term. This may suggest underdevelopment or overdevelopment of some facets. Facet R [Materials] has the most terms and is also the most heavily used for indexing. Suggestions were made for rebalancing and improving some of the facets based on the assessments performed and discussed with the project panel as part of the TRT assessment and the detailed Facet X (Information Organization) analysis.

In terms of its use as an embedded tool for other systems, the TRT’s tight integration with TRID indexing and searching makes it difficult for the TRT to support embedded use in any other system that is not closely aligned with TRID. The limitations of structure and functionality and the lack of a fully ISO-compliant thesaurus also severely limit the embedded uses. Those uses that were identified were anecdotal and reported significant cost and difficulties in doing customization.

Customized Use

Customized use is defined as a partial extraction of the TRT based on a particular facet or selection of a particular level of terms to create a classification scheme or a customized vocabulary management system focused on a particular mode or emerging topic.
Customized use in specialized applications may include local uses requiring customization of the content, uses that are focused on a single transportation mode, or uses that extract terms from across all facets of the TRT to generate a new thesaurus product. In addition, specialized applications may include the extraction of thesaurus terms, with the construction of new relationships to describe conditions in a particular country or region. Customized use is important for supporting future semantic applications and the generation of new products, and it enhances the ability of the transportation community to use the TRT as a basis for new products and services. For these criteria, the research team relied on the input from stakeholder interviews.

Just as with embedded use outside of TRID, limited custom use was found. This is primarily because of the lack of extraction capabilities in the current TMS. Cases where there have been attempts at custom use have been small scale and have required significant manual effort.

Usability

Usability criteria focus on the explicit use of the thesaurus and the functionality that is available to stakeholders. The research team used the criteria available at Usability.gov, an authoritative source used across the industry, for guidance in assessing the TRT’s usability.

While the TRT meets the usability criteria for the functionality provided by the interface, there are significant difficulties, particularly in the use of the navigation structure to browse. The navigation structure is dictated by the enumeration codes, and the facets that one would anticipate as the most heavily browsed are also the most difficult to navigate because of the structure. Some functionality, such as the generation of KWIC (keyword in context) and KWOC (keyword out of context) listings, which are basically oriented toward print indexes, is of questionable value to stakeholders.

Findings

Improving the TRT in aspects related to access and use is difficult because these criteria are so heavily dependent on, and therefore constrained by, the TRT application. While some enhancements to the TRT interface to improve explicit use are possible, they may not be cost-effective given the lack of usage. The majority of contemporary applications that involve embedded or custom uses are impossible or very difficult in the current TRT context.

Governance Processes

Standards in the field of information science are intended to provide guidance but will always be interpreted and adapted to a specific environment. While one cannot look to the standards for specific assessment criteria for governance, the governance process for the TRT should be designed to support the transportation community.

This dimension was assessed using the following categories: (1) community engagement in TRT governance, (2) level of focus of governance, (3) use of the TRT system to support decision making, and (4) TRT change management.

The research team found that engagement in TRT governance is dependent on the role played by the individual; involvement is highest among members of the TRT Subcommittee, who are primarily librarians and information professionals who oversee and guide changes to the TRT. The information science community provides an important kind of understanding of the thesaurus and its value to information management and discovery. However, even though some information professionals are also SMEs in certain areas of transportation, there is a general lack of direct engagement on the part of transportation SMEs, and a subsequent lack of transportation expertise in the process. The research team found some references to consultation with transportation experts, but they appeared to occur when there was a need for clarification or guidance in the use of a particular term.
On a practical level, however, the research team understands that the current structure of the TRT constrains the use of expert guidance. Suppose that an SME in intelligent transportation systems or intelligent cities advises that these are fast-growing areas of focus for transportation and that they warrant full vocabulary development. The TRT Subcommittee’s effort would be substantial. The current architecture of the TRT would constrain the TRT Subcommittee from adding a new facet, a challenge not imposed by a standard thesaurus. Instead, they would have to find an appropriate placement within the existing structure or they would have to redefine and refocus an existing facet. Depending on how deeply the new “top term” is placed in the hierarchy, there may be limitations to how broadly the new field may be represented. Before it is suggested that the governance process be expanded, there have to be equivalent opportunities for the TRT Subcommittee to easily and practically act upon expanded input.

The governance process is focused on the selection, addition, and placement of terms, which may limit the development of related terms around the term’s broader concept and, therefore, the growth of the thesaurus. The TRT system provides limited support to the governance process. While there is a process for recommending terms, the feedback to recommenders is limited and may discourage further suggestions. The system lacks the functionality to track suggested terms and the history of discussion regarding the decisions that are made or to produce reports that would provide the comprehensive views of the content and structure that are needed to make strategic decisions about the content or to perform periodic evaluation/audits of the content.

Overall guidance of transportation experts on the direction of the TRT was noted as missing from the governance process. In addition, SMEs interviewed, with one exception, were unaware of the TRT. The research team could not find evidence of annual reviews or audits of facets with SMEs. Nonetheless, SMEs expressed an interest in being involved in the governance process. The challenge will be how to structure that engagement so that it has a low impact on SME time and still provides critical information to guide the TRT’s development.

This is the area where significant improvements can be made both to the governance process and to the TRT system’s support for that process. The research team acknowledges the difficulties of engaging transportation researchers in the governance process, but there are also significant benefits in terms of the content and promotion of the TRT. Another area of opportunity is the addition of reports that would support the governance process and provide more complete feedback to those involved.

Governance Tools

This analysis focused on the TRT’s alignment with thesaurus standards, the alignment of the TRT with the scope and coverage of the field of transportation, and the productivity of the TRT terms when used in TRB and non-TRB information sources. The research suggests that the TRT partially supports thesaurus standards. The primary criterion for improvement is the strength and interpretation of literary warrant. The variation may be attributable to different understandings of the structure and use of thesauri versus library subject headings. In addition, variation may be attributable to different views on whether the TRT is an extended classification scheme or a thesaurus. Regardless of the reasons for different understandings of the TRT’s warrant, these variations are reflected in the scope and coverage of the field.
TRT Alignment with International and National Thesaurus Standards

National and international thesaurus standards provide guidance on the formation of terms that may lead to more effective search queries and to indexing terms that represent the way that people in a domain think and speak. The readability and understandability of terms outside of their context in the thesaurus is an important guideline. By and large, the TRT content supports this criterion with the exception of the structural terms, the use of parenthetical qualifiers, and stand-alone adjectives and adverbs.

Where a thesaurus has developed and published style guidelines, it is important that terms align with those guidelines. The TRT governance process ensures that terms are aligned with stated guidelines. However, while the guidelines refer explicitly or implicitly to some of the style issues highlighted in thesaurus standards, there are a few minor gaps (e.g., qualitative and quantitative nouns). In addition, there was feedback from indexers and transportation librarians on the need for either clarifications or expanded guidance.

Finally, perhaps the most important characteristic of a thesaurus is its warrant: the rationale that is at the foundation of all decisions to include, exclude, or link terms. The two most prominent forms of warrant are literary and user warrant. The TRT guidelines suggest that literary warrant guides the development of the TRT. Through the research team’s interviews with the TRT Subcommittee members, literary warrant was confirmed. However, there was some uncertainty as to how literary warrant was being interpreted. In particular, does literary warrant mean use of a candidate term in TRB publications only, or is it use in the general transportation literature? Does it mean heavy use or is emerging use sufficient? The literary warrant was also developed in the 1990s. Transportation literature and the use of transportation language in that literature have changed significantly since the 1990s. Because the transportation literature is extensive, there is a need to clarify what is meant by literary warrant.

TRT Alignment with Scope and Coverage of Transportation Domain

For this assessment, the research team identified 20 resources that represent the transportation domain. They ranged from TRB Annual Meeting program areas to handbooks and peer-reviewed journals. A set of TRT terms was selected and searched against these resources, recording “exact matches,” “no matches,” and “potential relationships” (see Figure 3-3). Potential relationships are defined as cases where a term identified in a non-TRT resource could be linked to an existing TRT term. For example, “economic development” exists in the TRT, but “urban economic development” is found in a non-TRT resource. It is considered a potential relationship match since it could be added to the TRT by adding a BT/NT relationship with “economic development.” There is a place to semantically “hook” or “anchor” the new term in the existing TRT structure. No matches may be viable terms for the TRT, but would require a new structure in order to be added.

Figure 3-3. Breakdown of exact, potential relationship, and no match results across sources.

Handling of Transportation Modes

Additional analysis focused on how well the TRT deals with particular modes of transportation. Transportation modes are not currently represented by dedicated facets. Instead, the TRT aims to cover all modes of transportation in a modal-agnostic structure. Vocabularies for all modes, specialized topics, and community perspectives have been integrated into this modal-agnostic structure. This was a point of assessment raised by the SMEs supporting the project during the interviews. By far, the greatest number of terms pertains to highway and road transport, but even that mode is not represented by a dedicated facet for roads and highways or a way to bring the terms together. In fact, this facet is lightly treated as a subdivision of land transport. The other modes (air, rail, and water transport) are scattered across the TRT. An illustrative analysis is provided in the following paragraphs and Figures 3-4, 3-5, and 3-6; there may be other modal views that need to be addressed.

Air Transport Terminology Scatter

Terms related to air transport are scattered across the thesaurus without term-level relationships that would allow one to see the full scope and coverage of that mode. This structure makes it difficult to identify and assess the coverage of a specific mode. Figure 3-4 demonstrates the distribution of air transport terms across the current thesaurus structure.

Figure 3-4. Air transport terms scatter.

Rail Transport Terminology Scatter

Terms related to rail transport also are scattered across the thesaurus without term-level relationships that would allow one to see the full scope and coverage of that mode. Figure 3-5 demonstrates the distribution of rail transport terms across 11 of the 21 facets of the current thesaurus structure. The greatest concentration of rail terms is in Facet Q, which represents a vehicles and equipment perspective.

Water Transport Terminology Scatter

Terms related to water (marine) transport are also scattered across 13 of the 21 facets (Figure 3-6). About half as many terms related to water transport are included in the TRT as are provided for the other two modes. Terms for water transport are concentrated in Facets H [Safety and Security], P [Facilities], and R [Materials].
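The kind of modal scatter described above can be tallied mechanically once terms carry a mode tag. The following is a minimal sketch, not the research team's method: the sample terms, their facet codes, and their mode tags are all invented for illustration, and the function names `scatter_by_mode` and `summarize` are hypothetical.

```python
from collections import Counter

# Hypothetical sample of (term, facet code, mode) triples. The terms, the
# facet assignments, and the mode tags are invented for illustration only;
# they are not taken from the TRT.
TAGGED_TERMS = [
    ("Airports", "P", "air"),
    ("Aircraft", "Q", "air"),
    ("Air traffic control", "D", "air"),
    ("Railroad cars", "Q", "rail"),
    ("Locomotives", "Q", "rail"),
    ("Railroad stations", "P", "rail"),
    ("Ports", "P", "water"),
    ("Ships", "Q", "water"),
]


def scatter_by_mode(tagged_terms):
    """Return {mode: Counter({facet: number_of_terms})}."""
    scatter = {}
    for _term, facet, mode in tagged_terms:
        scatter.setdefault(mode, Counter())[facet] += 1
    return scatter


def summarize(scatter):
    """Return {mode: (facets_touched, total_terms, densest_facet)}."""
    summary = {}
    for mode, counts in scatter.items():
        densest_facet, _count = counts.most_common(1)[0]
        summary[mode] = (len(counts), sum(counts.values()), densest_facet)
    return summary


if __name__ == "__main__":
    for mode, row in summarize(scatter_by_mode(TAGGED_TERMS)).items():
        print(mode, row)
```

A summary of this shape (facets touched, total terms, densest facet) corresponds to the statements made in the text, such as a mode being spread across a given number of the 21 facets with its greatest concentration in one of them.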
Figure 3-5. Rail transport terms scatter.

Figure 3-6. Water transport terms scatter.

TRT Term Alignment with Other Transportation Vocabularies

Finally, the research team explored the degree to which the terminology in the TRT matches that of other vocabularies with significant transportation coverage (see Figure 3-7).

Figure 3-7. Coverage of other transportation vocabulary terms in the TRT.

A total of 218 terms from the TRT were selected and searched against The World Bank Thesaurus (2010 edition). There were 37% exact matches, 18% no matches, and 45% potential relationships.

A similar matching was performed against the IEEE Thesaurus, using 51 selected terms. The results were 13.30% exact matches, 16.06% no matches, and 70.64% potential relationships.

The final source of other controlled vocabularies was derived from an exercise in 2012 to generate terminology from a corpus of transportation documents using semantic technologies. The list, called “Semantic Technologies” for purposes of this assessment, was based on noun phrase extraction and a manual review and trimming. The list was also categorized into areas of transportation. The original source was in the tens of thousands of concepts. From that list, the research team selected 92 keywords for testing. The results were fairly evenly distributed across the three categories, with 23.91% exact matches, 33.69% no matches, and 42.39% potential relationships. The rate of no matches is higher than for other sources.

The results show that while there is some overlap with these other vocabularies, there are some gaps that can be filled. The use of this technique to identify such gaps could help to support further semantic enrichment of the TRT, utilizing these resources to identify both new terms and additional relationships. As noted earlier, there are opportunities within the no match and potential match results to identify terms that have logical links to the TRT terms and could be integrated into the TRT.

Findings

The TRT has a rich foundation of terms. However, the foundation only partially aligns with the scope and coverage of the field of transportation.
This is understandable, since the current TRT application presents constraints and challenges to thesaurus managers, consumers, and stakeholders, and the structure encourages a focus on term-level management rather than on facets or subdomains. The results of the search tests against other resources are consistent with the gaps surfaced in the Google Analytics search logs discussed earlier in this chapter under Embedded Use. The terms found in the Open Keywords search logs were similar to the keywords selected from authoritative transportation research products.

The search tests against other domain resources also highlighted the generic nature of many terms in the TRT and gaps in transportation-specific terms. Of particular note was the lack of alignment of the TRT’s overall structure with TRB subject areas as represented on the TRB website and in TRB Annual Meeting programs. The research team also notes that a simple review of the 21 facets of the TRT surfaced only two that would be recognized as transportation-focused from their names. Nineteen of the facets are generic and might be understood to be part of any other controlled vocabulary. A simple example would be Transportation Management and Organization. While the transportation context may be implicitly understood when the TRT is used in TRID, this context is lost if the TRT is used outside of TRID.

The research team also found gaps in coverage when comparing the TRT terms to other transportation thesauri and controlled vocabularies. There was a low exact match rate between the TRT terms and the transportation-focused terms drawn from three sources (the World Bank Thesaurus, the IEEE Thesaurus, and the semantically generated keywords). While the overall no match rate was high, there is opportunity in the potential matches between terms from other thesauri and terms in the TRT.
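The three-way classification used in these search tests (exact match, potential relationship, no match) can be approximated in code. The sketch below is an assumption-laden illustration, not the research team's procedure, which relied on human judgment: here a "potential relationship" is reduced to a simple heuristic (an existing thesaurus term occurs as a whole-word phrase inside the candidate), and the names `normalize`, `classify`, and `tally` are hypothetical.

```python
from collections import Counter


def normalize(term):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(term.lower().split())


def classify(candidate, thesaurus_terms):
    """Return (category, anchor) for a candidate term.

    'exact match'            : the candidate is already a thesaurus term.
    'potential relationship' : an existing term occurs as a whole-word phrase
                               inside the candidate, so the candidate could be
                               anchored to it (for example as a narrower term).
    'no match'               : there is no existing term to anchor to.
    """
    cand = normalize(candidate)
    terms = {normalize(t) for t in thesaurus_terms}
    if cand in terms:
        return ("exact match", cand)
    padded = f" {cand} "
    anchors = sorted(t for t in terms if f" {t} " in padded)
    if anchors:
        # Prefer the longest anchor: it is the most specific existing term.
        return ("potential relationship", max(anchors, key=len))
    return ("no match", None)


def tally(candidates, thesaurus_terms):
    """Return the percentage of candidates falling in each category."""
    counts = Counter(classify(c, thesaurus_terms)[0] for c in candidates)
    total = len(candidates)
    return {category: round(100 * n / total, 2) for category, n in counts.items()}
```

With a thesaurus containing “Economic development,” the candidate “urban economic development” is classified as a potential relationship anchored at “economic development,” which mirrors the BT/NT example given earlier in the chapter. A real assessment would also need to handle plurals, synonyms, and word-order variants, which this heuristic ignores.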
This portion of the assessment also addressed the productivity of the TRT terms in TRB and non-TRB information sources that are frequently used by transportation professionals and practitioners. The research team notes that the assessment criteria for the productivity of the TRT terms are partially supported because there are wide variations in results where one would expect more consistent behavior. In addition, the structural terms, parenthetical qualifiers, and other grammatically non-standard terms are not productive in searching other sources. For the TRT to have value to the full transportation community, it should be applicable to other environments and used in contexts beyond TRB.

The research team addressed the question of pre- and post-coordinate indexing practices. Pre-coordinated terms involve the combination of two or more concepts (not to be confused with simple multiple word concepts). An example of a pre-coordinated concept is “economic development in urban areas.” In contrast, an example of a multiple word term is “urban economic development.” The use of pre-coordinated terms assumes that an indexer must predefine or pre-coordinate the terms that are likely to be searched and assign them as “whole” indexing terms. Pre-coordinate indexing or subject description practices were first used in physical card catalogs where there was a single physical entry point that a user was likely to find. This practice assumes that the search system or the searcher cannot efficiently handle searching for two distinct concepts. This earlier context was also limited by what search system architects refer to as “left edge matching.” If the index design did not allow for matching second, third, or fourth words in the index entry, no match would be found. In the traditional context, pre-coordinate indexing could lead to missed matches to search queries. This risk is reduced in today’s full-text search index architectures.

With the advent of online catalogs and fielded search systems such as BRS and DIALOG, it was possible to build search system indexes that could “slice and dice” or combine indexing terms. Post-coordination of terms, which enables the search system to find specific index entries for multiple terms and retrieve all relevant entries in a combined result set, was an effective advancement for that point in time and for the context.
Using the example from above, the search system might find terms for “economic development” and for “urban areas.” The main challenge with a post-coordinated approach to indexing is the large and often irrelevant recall that results. Where post-coordinated terms are generic and not context specific, this can result in low-performance searching. The post-coordinated approach also was dependent upon (1) the searcher specifying how to combine the two different sets of results, or (2) the search system architects setting the correct default matching algorithms for results. Because there is no one correct default for all contexts, search system architects typically set the default to include all results. Depending on the matching algorithms, the most relevant results may not appear in the top results, which are most likely to be viewed by the searcher.

With the increased computing power of the 1990s and the development and continued architecture advancements of full-text search indexes, indexers are no longer constrained to either pre- or post-coordinated indexing strategies. Most full-text search systems and most bibliographic search systems now build their internal index architectures based on what are called “rolling parsed entries,” which are the equivalent of KWIC structures. This means that an index entry will be constructed for every word in an indexing term field. It is possible and efficient to try to match a multiple word search query against such a full-text index without the searcher having to specify how to query and match every search term. The question has shifted to “how are the query-processing and query-matching strategies defined?” It is now possible to manually and machine index content to reflect how people talk and think about what they are looking for in a search. Thesauri are now important tools in developing query-processing and query-matching strategies, whereas in the past they were primarily used for guiding manual indexing.
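The contrast between “left edge matching” and an index that holds an entry for every word of an indexing term can be made concrete with a small sketch. This is a simplified illustration under stated assumptions, not a description of TRID or of any particular search product: the records are invented, the word-level index is a plain inverted index standing in for “rolling parsed entries,” and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: record id -> assigned indexing terms.
RECORDS = {
    1: ["Economic development in urban areas"],  # pre-coordinated heading
    2: ["Economic development", "Urban areas"],  # two separate terms
    3: ["Urban economic development"],           # multiple word term
    4: ["Urban areas", "Pavements"],
}


def left_edge_search(records, query):
    """Match only when an index entry starts with the query string."""
    q = query.lower()
    return sorted(
        rid
        for rid, terms in records.items()
        if any(term.lower().startswith(q) for term in terms)
    )


def build_word_index(records):
    """Create one index entry for every word of every indexing term."""
    index = defaultdict(set)
    for rid, terms in records.items():
        for term in terms:
            for word in term.lower().split():
                index[word].add(rid)
    return index


def word_search(index, query):
    """Return the records that contain every query word (implicit AND)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

In this sketch, a left-edge search for “urban” misses record 1, because the pre-coordinated heading begins with “Economic,” while the word-level index finds it. The word-level search for “urban economic development” also returns record 2, where the two concepts were indexed separately, which illustrates both the convenience and the broader recall of post-coordination discussed above.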
For TRB, the future development of the TRT highlights the underlying question: what type of search systems will the TRT be used with in the future? In the past 20 years, the question has moved beyond pre- and post-coordinated indexing practices. If the TRT is to have a future, it must more closely reflect the vocabulary and language of those involved in transportation, without concern for pre- or post-coordination of terms.

A positive observation from the tests of the TRT is the opportunity to expand the scope and coverage of the TRT by leveraging “literary warrant” more extensively, through the integration of transportation-focused terms from search logs, commercial sources, and other transportation vocabularies.

The overall results of the assessment of scope and coverage suggest a shift to a proactive literary warrant strategy to support the TRT. There are important gaps in scope and coverage. Such a shift would involve a continuous and proactive review of the literature to ensure that current coverage supports the full domain. This does not have to be a labor-intensive manual process; it can be automated to increase the pipeline of terms flowing into the decision/review process. While a shift to a proactive literary warrant strategy is suggested, it is not clear that this approach could be implemented within the current TRT architecture.

Architecture

Dimension E focuses on the architecture, functionality, and protocols of the application that supports the TRT. The architecture and functionality of the application are important because they define what capabilities are available to work with, what can and cannot be done, and what products and services can be produced. The research team identified four levels of architecture that impact Dimension E. These levels are (1) the architecture of the TMS application, (2) the architecture in which the thesaurus functions, (3) the communication architecture that supports interoperability, and (4) the architecture required to support multilingual versions and uses of the thesaurus. Forty-eight of the original 134 assessment criteria pertained to Dimension E. Of these, 39 pertain to the architecture of the TMS, 5 to the context in which the TRT functions, 3 to interoperability, and 1 to multilingual use.
Sources of data for these assessment criteria were derived from the research team’s construction of the working copy of the TRT, comparisons to other thesaurus management tools and to the ISO 25964 conceptual architecture, and from extensive interviews with the TRT management team and the TRT system administrator/designer.

Dimension E is the least amenable to enhancements in the current TRT context. The current architecture only partially meets the assessment criteria. This is not surprising considering that the thesaurus architecture is also constrained by the tight coupling of the TRT application to TRID indexing and searching. The database schema revolves around this primary purpose rather than a more general thesaurus standard-oriented approach that would support greater flexibility in terms of thesaurus management functionality and easier interoperability with other applications.

There are many functions that are not currently provided within this context. The opportunity exists to review these functions, especially those that support interoperability, export, and governance reporting, to determine how the architecture might be supplemented to address more of the requirements in this dimension.

Because of the importance of Dimension E to the capabilities of the TRT, a separate task was performed to assess the TMS market. The findings for this task are reported in Chapter 5.

Overall Assessment of the TRT

Based on the detailed criteria for each of the five dimensions defined for the assessment of the TRT, the team generated an overall assessment of the TRT against the criteria. Aggregate assessments of criteria in each dimension are presented in Table 3-2. The data are presented as percentages of the total number of criteria in the dimension.

None of the dimensions produced an aggregate “fully supported/fulfilled” result. Dimension A, which focuses on thesaurus content, had the highest percentage of criteria that were fully supported.
Dimension C, which focuses on governance processes, had no criteria that were fully supported. Dimension A and Dimension B (Access and Use) had the highest percentages of criteria that were partially supported or represented variant practices. In contrast, Dimension C, Dimension D (Governance Tools), and Dimension E (Architecture) had high percentages of criteria that were not supported or had no current practices.

In general, the content of the thesaurus and its current use in TRID appear to be areas of strength. Dimension D (Governance Tools) has areas of strength and weakness. One area of potential enhancement, however, is the scope and coverage of the field of transportation. Dimension E (Architecture) is another clear area for improvement of the current TRT.

The team performed further analysis across the assessment criteria and determined that the majority of the criteria could not be readily improved from partially or not supported to fully supported within the context of the current TRT system and architecture. The cost, time, and risk involved, based on the constraints of the integration with TRID, suggest that this approach would provide limited benefits for the cost.

Table 3-2. Overall assessment of the five dimensions defined for the assessment of the TRT.

Overall Assessment              Dimension A   Dimension B   Dimension C   Dimension D   Dimension E
Fully Supported/Fulfilled       15.55%        7.69%         0.0%          12.50%        8.3%
Partially Supported/Fulfilled   48.88%        57.69%        33.4%         20.83%        16.6%
Not Supported/No Practice       35.55%        34.61%        66.6%         68.75%        75.0%
Total Criteria                  45            26            12            33            48


TRB's National Cooperative Highway Research Program (NCHRP) Research Report 874: The Transportation Research Thesaurus: Capabilities and Enhancements documents the results of a comprehensive assessment of the Transportation Research Thesaurus’s (TRT’s) capabilities and strategies for the TRT’s future development. The TRT is a structured, controlled vocabulary of terms in English, used by TRB and a variety of other organizations to support indexing, search, and retrieval of technical reports, research documents, and other transportation information. The TRT, covering all modes and aspects of transportation, has evolved over a number of years and is continuously being refined and expanded.
