Engineering Behavioral Ontologies
What would it take to realize the benefits ontologies could bring in the behavioral sciences? To supplement its close examination of key example behavioral ontologies, the committee attempted to gain an overview of how ontologies are developed and used in the field. We also explored what it takes to engineer ontologies: the socio-cognitive (human) processes, computer-based tools, and institutional and organizational structures through which ontologies operate and are sustained.
EXISTING BEHAVIORAL ONTOLOGIES
The committee commissioned a scoping review of the published literature on behavioral science ontologies.1 The review was designed to provide an understanding of how and to what extent these ontologies have been studied; the significant findings from this body of work; and any discernible trends or patterns in the behavioral science ontologies that have been studied. The committee also hoped to gain insight into the methods by which ontologies have been created in the behavioral sciences and their strengths and limitations.
The scoping review identified a wide array of applications for existing ontologies in research, clinical settings, and education. These applications and functions closely track many of the elements identified in an informal assessment, carried out by the committee, of how ontologies are used. They include not only the research functions discussed in Chapter 3, but also other important functions that support clinicians, policymakers, and individuals.
1 The paper is available at https://nap.nationalacademies.org/resource/26464/Falzon-comissionedpaper.pdf
The studies did not provide a solid picture of how many models in current use can be accurately classified as ontologies, rather than sets of concepts that have not been formally specified. The scoping review did not locate any systematic documentation of existing ontologies (broadly understood), and we note that carrying out such a systematic survey would be challenging. It is also difficult to develop a picture of the extent to which existing ontological systems are currently contributing to progress in the behavioral sciences or the gaps and barriers to their development in the behavioral sciences.
Based on this review, our detailed look at several example ontologies, and the committee’s own observations, it is clear that the behavioral sciences have not yet taken full advantage of the scientific benefits ontologies offer.
THE ONTOLOGY ENGINEERING PROCESS
The development of scientific ontologies rests on two equally important components: human, socio-cognitive practices and decisions, and computational tools.
Humans must make key decisions about the terms and relationships to be covered in an ontology. Ontology engineering is a creative and inventive process that requires substantial intellectual and social effort; the knowledge humans have, their activities, and their interactions with technology (social interactions, cognitive strategies, language, and patterns of communication) are all essential. Ontology creation, editing, dissemination, debugging, and understanding all require such practices.
The creation of an ontology requires the translation of complex, potentially ambiguous concepts into a formal specification. A first step is the identification of the key notions, as well as key features or attributes of those notions, that will be included in the ontology. Experts are needed to determine how the key concepts and features should be formally represented. The value of a particular ontology may depend partly on intended uses, which should be clearly articulated. Ontology creation should thus also involve identification of the stakeholders who may be affected by the ontology, the goals and knowledge bases that will depend on it, and how it will be used.
To be useful an ontology must be disseminated, which involves not only transmission of the formal specification but also instruction about ways to use it. Debugging an ontology requires an understanding of what should—and equally importantly, should not—be included in the ontology (i.e., what is important in the domain).
Change and Evolution
Since ontologies are formal specifications, it is easy to mistake them for relatively static structures. But ontologies are dynamic: they evolve and change significantly and sometimes rapidly in response to scientific developments and other factors. Change could be needed when scientists learn more about the world, or aspects of the world itself might be changing. Ontology evolution also occurs when scientific goals and needs shift, perhaps in response to technological advances. A third kind of ontology change is prompted by the desire to harmonize or integrate different ontologies, including the desire to bridge ontologies at different levels.
Ontology change can also be driven by factors that are external to scientific communities, such as cultural changes in conceptualizations. Social expectations, needs, values, and concepts can all change over time, and so there can be pressure—social, political, psychological—to adjust a scientific ontology so that it is more consistent with those social factors. For example, changes in cultural perceptions of homosexuality, pregnancy, and hysteria each called for changes to once-standard disease classifications.
Determining whether an ontology is—and remains—useful for the purposes it was designed for is a key to its ongoing viability. This process requires both verification and validation. Verification is the assessment of whether the ontology was built correctly, that is, whether the specification has utility for its intended purpose. Validation is determining whether the ontology correctly models the domain or real-world application for which it was intended. Essentially, verification addresses the intrinsic aspects of the ontology; validation addresses the extrinsic aspects of the ontology. Metrics such as completeness, accuracy, consistency, computational efficiency, and clarity, among others, are used in this process.
The committee identified three broad criteria for the development of an ontology—logic, validity, and usefulness—that mirror criteria used in many scientific contexts; see Box 5-1.
Since ontologies existed long before there were computers, it is important to acknowledge that computational tools are not strictly necessary for ontology engineering. However, the efficiencies they provide, not to mention the capacities they afford for working with large bodies of data, have made them essential in much of behavioral science, and likely for the development and use of behavioral ontologies. Modern scientific ontologies may contain many thousands or even many millions of terms and are correspondingly complex, so technology has become essential for managing them.
Computational tools can never stand in for the human understanding, ingenuity, and social perceptions that go into the development and use of ontologies. They do not offer ready means of helping scientific communities determine what they need or of engaging colleagues in the socio-cognitive tasks detailed above. But technological tools do play an important supportive role in facilitating ontology design and use. A comprehensive review of available computational tools was beyond the scope of the committee’s charge, but three key elements of the life-cycle of an ontology illustrate the contributions of computer technology:
- creating and editing the ontology;
- disseminating it so that researchers have awareness of and ready access to the ontology; and
- evaluating and debugging the ontology, in the sense of testing the logical implications of the ontology’s statements and folding findings back into the ontological structure.
Creation and Editing
Technological supports for human ideation and consensus-building can be useful for ontology creation. People use such tools to brainstorm ideas for what terms should be included in an ontology by suggesting competency questions, that is, questions about distinctions in the world that the ontology ought to be able to resolve (these are analogous to requirements in conventional software engineering). These questions help developers to flesh out the things that the ontology should encompass and stimulate developers to create an initial set of entities and relationships. Tools that make it easier to view a hierarchy of concepts, add to it, or add properties of those concepts are extremely useful in creating new ontologies and reviewing and editing existing ones. Advances in computing and algorithmic innovations can also allow for the processing of far more data than was previously possible and for increasingly sophisticated ways to identify classifications as alternative starting points for ontology creation. Statistical modeling algorithms can be used to automatically identify classes and nonlinear relationships among them, and to automatically organize very large datasets.
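As a rough illustration of how competency questions can act like software requirements, the sketch below stores a tiny concept hierarchy and expresses two questions as checks against it. The class, the function names, and the example concepts ("smoking cessation," "anxiety," and so on) are hypothetical and chosen only for illustration; they are not drawn from any ontology discussed in this report.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: each concept maps to its direct parent concepts."""
    parents: dict[str, set[str]] = field(default_factory=dict)

    def add_concept(self, name: str, *parent_names: str) -> None:
        """Register a concept and its is-a links, creating parents if needed."""
        self.parents.setdefault(name, set()).update(parent_names)
        for parent in parent_names:
            self.parents.setdefault(parent, set())

    def ancestors(self, name: str) -> set[str]:
        """Return every concept reachable by following is-a links upward."""
        seen: set[str] = set()
        stack = list(self.parents.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def is_a(self, child: str, ancestor: str) -> bool:
        return ancestor in self.ancestors(child)


def build_example() -> Ontology:
    """Build a small, purely illustrative hierarchy."""
    onto = Ontology()
    onto.add_concept("behavior")
    onto.add_concept("health behavior", "behavior")
    onto.add_concept("smoking cessation", "health behavior")
    onto.add_concept("emotion")
    onto.add_concept("anxiety", "emotion")
    return onto


# Competency questions written as (question, check) pairs, much like
# requirements paired with acceptance tests.
COMPETENCY_QUESTIONS = [
    ("Is smoking cessation a kind of behavior?",
     lambda o: o.is_a("smoking cessation", "behavior")),
    ("Is anxiety kept distinct from behavior?",
     lambda o: not o.is_a("anxiety", "behavior")),
]


def unanswered(onto: Ontology) -> list[str]:
    """Return the competency questions the ontology fails to satisfy."""
    return [question for question, check in COMPETENCY_QUESTIONS
            if not check(onto)]
```

Calling `unanswered(build_example())` lists any questions the draft hierarchy cannot yet resolve, which gives developers a concrete agenda for the next round of editing.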
Another automated approach that can aid ontology creation is to use natural language processing (NLP) to analyze large amounts of text or documents. NLP methods can organize a body of text-based documents into topics that can be treated as a representation of the body of knowledge associated with those documents. People working on ontology engineering hope that formal development methods or the use of design patterns, as in conventional software engineering, will greatly facilitate the early stages of ontology development. For this hope to be realized, many of these methods need to be empirically evaluated in the behavioral sciences and also built into general-purpose ontology development tools.
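The idea of mining text for candidate ontology terms can be sketched in a few lines. Full topic modeling generally relies on dedicated libraries, so the example below substitutes a much simpler technique, TF-IDF ranking with the Python standard library, to surface words that are distinctive to each document. The stopword list, function names, and scoring details are assumptions made for illustration, and any output would still need expert review before entering an ontology.

```python
import math
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "with", "for", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def candidate_terms(documents, top_n=3):
    """Rank words in each document by TF-IDF and return the top candidates."""
    tokenized = [tokenize(doc) for doc in documents]

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stability.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_n])
    return results
```

Words that appear in every document receive a score of zero, so the ranking favors vocabulary that distinguishes one body of text from another, which is a reasonable first pass at proposing terms for human consideration.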
Dissemination
Making ontologies widely available and accessible is critically important, and computational tools are particularly valuable for these purposes. Available tools facilitate such tasks as searching for ontologies that contain specific terms and visualizing them. Especially important are application programming interfaces (APIs), which allow programs to access and use information from others; they depend on shared terminology. Using an API, a third-party computer program can locate terms that may be relevant for describing a scientific problem, a dataset, or some other component of interest.
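To make the role of an API concrete, the following sketch shows how a third-party program might look up terms to describe the columns of a dataset. It uses a hypothetical in-memory stand-in for a remote term-lookup service; the `TermService` class, the `annotate` helper, and the "EX:" identifiers are invented for illustration and do not correspond to any real repository's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str                      # hypothetical identifier, e.g. "EX:0001"
    label: str
    synonyms: tuple[str, ...] = ()


class TermService:
    """In-memory stand-in for a remote ontology term-lookup service."""

    def __init__(self, terms):
        self._terms = list(terms)

    def search(self, query):
        """Return terms whose label or synonyms contain the query text."""
        needle = query.strip().lower()
        if not needle:
            return []
        hits = []
        for term in self._terms:
            names = (term.label, *term.synonyms)
            if any(needle in name.lower() for name in names):
                hits.append(term)
        return hits


def annotate(columns, service):
    """Map each dataset column name to the IDs of matching ontology terms."""
    return {col: [t.term_id for t in service.search(col)] for col in columns}


# Illustrative terms only; not taken from any published ontology.
EXAMPLE_TERMS = [
    Term("EX:0001", "anxiety", ("anxiousness",)),
    Term("EX:0002", "working memory"),
    Term("EX:0003", "test anxiety"),
]
```

A call such as `annotate(["Anxiety", "income"], TermService(EXAMPLE_TERMS))` returns candidate identifiers for the first column and an empty list for the second, signaling where the shared terminology has no coverage.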
Evaluation and Debugging
Tools and technologies have been developed to automate and facilitate evaluation. For example, the widely used ontology software library ROBOT offers a “report” function that runs a series of quality control tests over an input ontology and generates a report file based on the results, suitable for use in an automated workflow. Another emerging trend related to ontology evaluation is the increasing use of NLP to generate semantic definitions through a natural language generation task, which can parse the ontologies and generate natural language text so that humans can assess its quality.
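The kind of automated quality control described above can be approximated with a short sketch. This is not ROBOT's implementation or its rule set; it is a minimal stand-in, written under the assumption that an ontology is available as a dictionary of term records, and the rule names and severity levels are illustrative.

```python
from collections import defaultdict


def quality_report(terms):
    """Run simple quality checks over a dictionary of term records.

    `terms` maps a term ID to a dict with optional "label", "definition",
    and "parents" keys. Returns sorted (level, rule, term_id) rows.
    """
    rows = []
    ids_by_label = defaultdict(list)

    for term_id, info in terms.items():
        label = info.get("label", "").strip()
        if not label:
            rows.append(("ERROR", "missing_label", term_id))
        else:
            ids_by_label[label.lower()].append(term_id)

        if not info.get("definition", "").strip():
            rows.append(("WARN", "missing_definition", term_id))

        # Every parent a term points to should itself be declared.
        for parent in info.get("parents", []):
            if parent not in terms:
                rows.append(("ERROR", "undeclared_parent", term_id))

    # Labels are compared case-insensitively; each duplicate is reported.
    for ids in ids_by_label.values():
        if len(ids) > 1:
            rows.extend(("ERROR", "duplicate_label", term_id) for term_id in ids)

    return sorted(rows)


def to_tsv(rows):
    """Format report rows as tab-separated text for an automated workflow."""
    header = [("level", "rule", "term")]
    return "\n".join("\t".join(row) for row in header + list(rows))
```

Because the report is plain tabular text, it can be written to a file and checked by a continuous integration step, so that a change introducing new errors is flagged before the ontology is released.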
Without a doubt, developing an ontology entails a lot of hard work, community engagement, and iteration. Ontology engineering is therefore a very expensive endeavor and one that requires resources as well as specific actions and processes that are sustained. It also requires continual investment because any ontology will need to evolve as the relevant science changes. Currently, there are no clear road maps for establishing and sustaining an ontology in the behavioral sciences. Many existing, well-used scientific ontologies may not be in a secure financial position, and the situation seems to be even more precarious for ontologies in the behavioral sciences.
There are a few examples of scientific ontologies that endure as robust entities. In nearly all such cases, there is a substantial commitment of government funding (or there is a government mandate to use the ontology) that ensures the durability of these resources. There is also a need for support for the tools and practices of ontology engineering. Tools and practices developed in other contexts are likely to be valuable to behavioral scientists as they pursue ontology development, but we acknowledge that there are as yet no empirical demonstrations of how they might work in the behavioral science domain. Iterative evaluation and testing of methods applied in new contexts will need to be integrated in the broader evaluations discussed above. A wide array of changes and advancements can potentially play an important role in supporting greater reliance on ontologies in the behavioral sciences. Key gaps and shortcomings that will need to be addressed fall into three categories: discovery, capacity, and practice.
One significant need is for new information, practices, and content based on novel research and discovery. Research is needed to develop best practices for creating, disseminating, teaching, and using ontologies in the behavioral sciences. While there has been some research on best practices for ontology engineering in other domains, those techniques have not yet been widely used in the behavioral sciences. Translational research could provide important evidence about how methods in ontology engineering may need to be updated and validated for the behavioral sciences. Similarly, both foundational and translational research is needed for the development of the next generation of computational tools that can advance the capabilities and uses of ontologies. Although additional research is needed, we emphasize that the potential value of such research is not a reason to delay immediate progress in the development and use of behavioral ontologies.
Shortfalls in implementation, and the capacity for implementation, of approaches whose value has already been demonstrated also need to be addressed. When what needs to be done is clear but the resources or capacity to do it are not currently available, progress is hampered. There is a need for additional resources to increase awareness and training regarding ontologies in the behavioral sciences. In addition, full utilization of both current and future computational tools will require significant increases in computational resources including computer time, data access, and server storage. Institutions and organizations will also likely require additional resources, particularly if they play increasingly prominent roles in the development, dissemination, and use of ontologies (as suggested by the success of the NCI Thesaurus).
A final set of needs involves practices and processes that could support wider use of ontologies in the behavioral sciences for which the capacity is already in place. There are currently few explicit institutional incentives to use ontologies in the behavioral sciences, whether from journals, conferences, funding agencies, review committees, or other entities. Open data and code have become much more widespread as relevant institutions have required them. The movement toward open science depends on the existence of ontologies to enable comparisons between datasets, and recent trends in expectations for data sharing are changing this situation. Funders and publishers now often require data sharing, which requires the use of standardized metadata, which in turn requires the use of ontologies. But there have as yet been comparatively few community-level efforts to build consensus about the use of ontologies in the behavioral sciences, although there are many ways to encourage such efforts. There are no perfect or completed ontologies, regardless of domain; ontologies are always subject to revision as the scientific community learns and changes. Nonetheless, experiences in other domains have shown that consensus about an ontology is possible, though it requires concerted efforts by researchers and institutions.