Building Consensus, Identifying Needs
In meeting its charge, the committee consulted more than 150 neuroscientists and computer and information science experts who contributed advice, comments, and suggestions regarding the desirability, feasibility, and possible ways of implementing electronic and digital resources to enhance neuroscience. As outlined in the introduction to this report, the committee obtained these contributions through four mechanisms. The first was the preparation of two background papers. One considered the development, administration, architecture, use patterns, and funding of genome and other scientific databases (Vela, 1990). The other described the results of a one-day meeting of computer scientists from various disciplines who had been involved in the Defense Mapping Agency's (DMA) program to digitize cartographic data (Downs et al., 1990). The meeting was held to assess what could be learned from the DMA's experience that might apply to implementation of a National Neural Circuitry Database.
The second mechanism was the organization of four task forces composed of neuroscientists and computer and information specialists. Each group met for two days and was organized around a neuroscience topic area; the groups included 35 invited participants in addition to specific committee members. (Appendix A contains a description of the themes and a list of participants of each task force.) Third, the committee solicited input from a number of other sources. Letters describing the study and inviting opinions and suggestions were sent to the present officers and council members, as well as the past presidents, of the Society for Neuroscience. Similar descriptions with a request for input were also published in selected scientific journals (for examples, see Appendix B).
Finally, symposia and open hearings were held in Washington, D.C., San Francisco, and Chicago. Invitations to these events were sent to members of the Society for Neuroscience within roughly a 300-mile radius of each meeting location. The program for each meeting included scientific presentations by a leading neuroscientist and by a scientist from the field of genetics or molecular modeling with experience in the use of computer and information technology in their research. Also on the agenda were demonstrations of prototype brain databases and brain imaging technologies. Finally, each meeting included an open hearing component in which selected committee members reviewed some of the issues being considered in the study and subsequently opened the floor to comments and suggestions from those in attendance. (Appendix C contains lists of speakers and demonstrators.)
The input received through these mechanisms reflected a wide variety of experiences and outlooks. Among the neuroscientists involved, some were already committed to the development of computer resources for research purposes, some had no such commitment and were more neutral, and some were frankly skeptical. In addition, they held a variety of posts, ranging from journal editors to postdoctoral fellows and from those employed in large laboratories to those working in single-person operations. Individuals in charge of library resources, scientific database administration and design, and biomedical computer applications were especially valuable participants. Each subdiscipline of computer science—including database design, graphics, software development, networks, and hardware design—was represented. In addition, participants came from academic departments, government laboratories, and private industry.
Some separation of the topics covered in each of these information-gathering activities was apparent. Participants in the open hearings and those who responded to the committee's requests for opinions were concerned largely with three issues: (1) the kind of database and the kinds of data that would be useful to them in their research, (2) the possible institution of standard methods of data collection, and (3) funding of the proposed project. In addition to these matters, task forces devoted substantial time to technical issues and administration, oversight, and implementation strategies. This chapter attempts to capture the richness of the discussions that took place throughout these activities and outlines the data on which the committee's recommendations are based.
Building a Useful Resource Complex
The data that are included must be useful to neuroscientists
The complexity of neuroscience dictates that the scope of the data included in any complex of computerized resources eventually must be quite broad. Although most participants viewed the establishment of such resources as desirable, some felt that traditional archives of neuroscience information (e.g., libraries and journals) were adequate, thus obviating the need to expend scarce funds on computerized information complexes. These dissenting views were countered by evidence, presented by experts from medical libraries and scientific databases, that clearly indicated an integral role for computerized data storage and retrieval, including the linking of scientific databases, in future library services. The majority of those giving input to the committee envisioned the proposed complex of resources as necessarily containing more kinds of information than are now or could in the future be contained in library reference materials and published journals. For example, a journal article, available from the library in full-text format with data graphs and figures, rarely presents all the data that are available—data that could be included in a digital format. A graphically formatted synthesis could greatly facilitate understanding by making the information visual. In addition, archival brain material, such as that contained in the Yakovlev brain collection or the Comparative Mammalian Brain Collection, is not available at all in libraries; it could be made much more accessible through electronic networks and digital storage methods. Finally, the vision of a complex of resources for neuroscience includes the incorporation of informal data, such as preliminary results and interpretations, which would provide a far richer base of information for research than is currently available.
(Although later sections of this chapter expand upon these nontraditional kinds of information, they are mentioned here as components of the data that may be included in the proposed resource complex.)
Given the vision of resources that go well beyond what may reasonably be obtained from libraries, nearly every participant had an idea of what kinds of research data would be most useful. Although these suggestions reflected a broad range of possibilities, most participants saw two major data categories as critical. The organizing structure for information about the brain is neuroanatomy, which provides a construct for the functional expression of brain activity. Therefore, it will be necessary to conjoin anatomy and function, and this combination should guide decisions regarding the inclusion of highly specific kinds of data.
Primary among the anatomical data categories is information about the pathways that interconnect brain regions. Representing such pathways in three dimensions and in relation to standard brain atlas maps is highly desirable and can serve a number of purposes. Viewing a particular pathway allows an immediate appreciation of its complexity and may reveal the connections among specific areas of interest. Depiction of those parts of the pathway that are well documented as well as those parts that are tentative or unknown would stimulate additional research to complete the mapping of the pathway. The pathway map could also serve as an interface or entry point to archives of relevant bibliographic references.
Most important, a pathway map could function as the skeleton on which to hang information from multiple levels of the brain's hierarchy. For example, at the cellular level of the hierarchy, the structural features or morphology of different cell types are important. Because each brain region contains sets of neurons with different structures, it is often important to know which structural type actually contributes to a pathway to a distant brain region and which structural type contributes only local connections. The neurochemistry of the neurons in the pathway is also of interest at both the cellular and the systems levels. Therefore, a number of participants requested that the proposed resource complex support the ability to associate patterns of neurotransmitter distribution with structural pathways. One scientist suggested that both visual and tabular maps of the chemical systems of the brain would be of great help in his work. Such maps would include the neurotransmitters, as well as the distribution of specific receptor types and of drug binding sites in the brain.
There was consensus that any anatomical map must be related to function, but the functions requested reflected the hierarchical level at which individual scientists were working. Those studying cognitive processing and human brain functions viewed the association of brain regions with specific aspects of behavior as necessary to the usefulness of the map. Clinical neuroscientists were interested in the association of behaviors (e.g., tremor, memory dysfunctions, motor deficits) with particular brain regions. Basic researchers, studying the electrophysiological responses of individual neurons or single-ion channel responses, wanted the map to contain information about these phenomena. Enthusiasm for including many kinds of synaptic-level information was especially high among computational neuroscientists, which reflects their interest in analyzing the functional consequences of ion channel diversity, in terms of the nonuniform spatial distribution of channels and the varied response properties of individual channels. Finally, some investigators interested in the development of the nervous system saw particular benefits in including brain maps of different species at different developmental points as a way of comparing species.
Other kinds of data that were cited as central to neuroscience research were protein and gene sequence and gene mapping data relevant to the brain. Many developmental neuroscientists, through the use of simple organisms such as worms and fruit flies, are now mapping the genes that control neural development. As the genes that code for receptors and other molecules are identified, these data become useful for researchers at other levels of the neural hierarchy. Recent breakthroughs in identifying the genes responsible for such diseases as muscular dystrophy and neurofibromatosis herald a new era in defining the underlying causes of neural dysfunction. Against this backdrop, most task force participants felt a pressing need to include genetic information in any future computerized resource. Their suggestions ranged from the establishment of linkages to existing genome databases to establishment of a brain-specific gene database.
Although the examples given above do not exhaust the suggestions made to the committee, they illustrate the necessity, in the long term, of including specific kinds of data at each level of the neural hierarchy and the appropriateness of starting with the structural anatomy of the brain.
Computerized resources must include a variety of capabilities
Acceptance and full use of computerized resources require certain features or capabilities, apart from the actual data. For example, in molecular modeling, the data are the coordinates obtained from crystallographic analysis. A useful computerized resource must include the capability to transform those coordinates into three-dimensional images and, further, to move these figures to depict the binding of drugs to the molecule's active site. This capability is afforded by the design of the software and the interfaces the user manipulates to achieve the desired outcome. For neuroscience, the capabilities needed will again reflect the diversity of experimental questions that can be asked. Although it is impossible to separate completely the scope of the data to be included from the capabilities of the systems that might be designed, many participants, especially those with expertise in database development, stressed the value of defining desired capabilities early in the planning stages.
Many of the participants providing input to the committee were experienced in the development of prototype systems for research or education, and this group suggested that certain capabilities were essential. For example, the ability to browse through various kinds of data is critical to offering users multiple entry points to access
information specific to their needs. Simultaneous display of textual and graphic data or textual and numerical data was also suggested as a necessary feature. These capabilities alone, however, may not be sufficient to ensure use of the proposed resources to their full potential. Therefore, the committee and participants discussed more specific features, designed around the needs of neuroscientists from different subspecialties.
From the neuroscientists' point of view, an important systems capability was the display of data with varied levels of realism. For example, a neuron can be depicted as (1) a photomicrograph (taken through the microscope), (2) a two-dimensional (flat) line drawing, or (3) a fully reconstructed three-dimensional object. If a researcher were interested simply in identifying the cell type, the abstract, flat drawing might suffice. But if he or she were interested in assessing the method used in a particular experiment and in whether the neuron exhibited any changes indicative of injury, the actual photomicrograph might be necessary. In the ideal case, users could choose the level of abstraction appropriate to their experimental needs.
Another capability requested by neuroscientists was the ability to extract arbitrarily defined subsets of data. Using the pathway map as an example, an investigator might want to know the locations of all cells in the pathway that contained a specific neurotransmitter — acetylcholine, for example. Another might want to see all neurons in the pathway that had axons that branched. Yet another might want to know where in the pathway receptors for a specific drug were located. One participant commented on the sometimes puzzling mismatch between the locations of receptors for a certain neurochemical and the locations of cells that contained that neurotransmitter. In his view, the ability to extract receptor and transmitter information and to correlate the two maps would be quite useful. In another neuroscientist's view, the results of computational modeling were a data set well worth extracting. By displaying simultaneously the activity of multiple circuits, it is possible to visualize parallel processing in action.
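The subset-extraction capability described above can be illustrated with a minimal sketch. The table structure, field names, and neuron records here are invented for illustration only; they do not represent any actual schema proposed in the study.

```python
# Hypothetical in-memory "pathway map" records; all fields and values
# are illustrative assumptions, not part of any proposed standard.
pathway_neurons = [
    {"id": 1, "region": "septum",      "transmitter": "acetylcholine", "axon_branches": True},
    {"id": 2, "region": "hippocampus", "transmitter": "GABA",          "axon_branches": False},
    {"id": 3, "region": "septum",      "transmitter": "acetylcholine", "axon_branches": False},
]

def cells_with_transmitter(neurons, transmitter):
    """Return all cells in the pathway containing a given neurotransmitter."""
    return [n for n in neurons if n["transmitter"] == transmitter]

def cells_with_branched_axons(neurons):
    """Return all cells in the pathway whose axons branch."""
    return [n for n in neurons if n["axon_branches"]]

cholinergic = cells_with_transmitter(pathway_neurons, "acetylcholine")
print([n["id"] for n in cholinergic])  # → [1, 3]
```

Each query corresponds to one of the arbitrarily defined subsets mentioned in the text (transmitter content, axonal branching); correlating two such extractions, such as receptor and transmitter maps, would amount to comparing the region sets they return.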
Another helpful feature for clinical neuroscientists and those involved in human imaging is the ability to compare different brain images by precise overlaying or co-registration of the images. Each brain differs from every other in its exact shape and internal organization. In addition, technical factors, such as how a PET scanner is aligned or the placement of the subject's head, can affect the orientation of the images that are obtained. Schemes for minimizing or correcting for these differences, to facilitate the comparison of different types of images (PET, MR, or CT), have been developed and are being evaluated in most major imaging centers. The task force concerned with
human imaging discussed at length these strategies, which include the use of certain landmarks visible in every brain and the use of atlases that describe coordinates for most major brain regions. More complex approaches use experimentally defined algorithms to “warp” one brain image to another. Once these mechanisms are validated, their inclusion, as tools, in a complex of computerized resources would allow investigators to pool images, obtain more data, and thereby gain the maximum benefit from each experiment. Because human subjects are rare and imaging experiments costly, this kind of capability is particularly desirable. Efforts are already under way to share human images among distant centers. Known as BrainMap, the activity is being coordinated by a team of investigators at the Johns Hopkins University Medical School (Science, 1990).
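The simplest landmark-based registration scheme can be sketched numerically. This toy example aligns two coordinate frames by matching the centroids of shared anatomical landmarks, using translation only; actual registration methods also correct for rotation, scale, and nonlinear differences ("warping"). All coordinate values are invented.

```python
# Minimal sketch of landmark-based co-registration (translation only).
# Landmark coordinates are hypothetical illustrations.

def centroid(points):
    """Centroid of a list of 3-D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def register_by_landmarks(landmarks_a, landmarks_b, points_b):
    """Shift points expressed in brain B's frame so that B's landmark
    centroid coincides with brain A's."""
    ca, cb = centroid(landmarks_a), centroid(landmarks_b)
    shift = tuple(ca[i] - cb[i] for i in range(3))
    return [tuple(p[i] + shift[i] for i in range(3)) for p in points_b]

# Two landmarks visible in each brain, in each brain's own frame.
brain_a = [(0.0, 0.0, 0.0), (24.0, 0.0, 0.0)]
brain_b = [(5.0, 2.0, 1.0), (29.0, 2.0, 1.0)]

aligned = register_by_landmarks(brain_a, brain_b, [(5.0, 2.0, 1.0)])
print(aligned)  # → [(0.0, 0.0, 0.0)]
```

Atlas-based approaches work similarly in spirit: the atlas supplies a standard reference frame, and each brain's coordinates are transformed into it before images are compared or pooled.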
Whatever capabilities the proposed complex of resources might include, the consensus of participants in the committee's consultation process was that these features must be defined by the needs of the users. Many participants emphasized the need for a hands-on effort to generate a comprehensive list of desired capabilities.
Different types of databases are required
There was an overwhelming consensus among all participants that a National Neural Circuitry Database as a single entity was unworkable. Participants advocated instead a complex of different kinds of databases, combined with electronic communication facilities and other on-line research tools, that would be interlinked to provide a resource for neuroscience research, education, and clinical applications. In addition, a number of investigators, citing the international character of science, called for the establishment of links to and relationships with computerized resources outside the United States. Throughout the task force meetings and open hearings, there was substantial discussion of the kinds of databases that would be of use to neuroscience. It was the task forces, however, that gave the matter its most in-depth consideration. Consequently, the definitions that follow derive largely from task force recommendations. This emphasis on their work is probably a result of the special efforts made to include in these groups individuals with expertise in database administration and development, as well as those with interest in the concept of electronic collaboratories.
One general theme that emerged was that a complex of electronic and digital resources should include databases with varied levels of accessibility. Some of the databases should be public resources and accessible by anyone; others should be private and used only by individual investigators or small groups. Still others should be semiprivate or semipublic, for use by a possibly large, but finite, group of investigators. Within those general categories, the task forces identified several different kinds of databases that are expected to be useful for neuroscience.
Reference databases would contain references to published journal reports and review papers and would be accessible to as wide a group of users as possible. These databases might be built on the kind of information contained in more traditional databases, such as MEDLINE, but they would be organized around graphic representations of brain structures. The images would represent current consensus views of various brain systems and might also show those areas that required additional research. The committee learned from individuals working with the genome databases that attempts are now being made in that community to incorporate graphic representations into those databases. The essentially visual character of neuroscience data underscores the critical importance of images to the presentation of information about the brain.
Data banks would contain source or primary data, with references, that could be deposited by investigators coincident with publication of their research in standard scientific journals. Data banks would allow users to view the complete data set from experiments and might contribute to insights that would not be possible with standard journal formats. These kinds of databases are already in use in the chemical sciences and in the protein and gene sequencing and mapping communities (Vela, 1990).
Informal resources include bulletin boards and electronic mail for exchange of research methods, ideas, and sometimes raw data; these resources might also be used to share software packages. Informal resources are highly flexible and are often designed in response to special user needs. As discussed in the previous chapter, they are an important part of efforts in the worm community to facilitate open communication and exchange (Schatz, 1991).
National and international registries or directories were recommended by each task force and by many other participants to provide listings of different kinds of information. For example, it would be helpful to have a registry of all neuroscientists who are now developing or who have developed computerized data collection strategies, or who have devised databases for storage and retrieval of data, references, or other research information. Among the respondents to the committee's request for opinions and the open hearing participants, approximately 30 individuals were working on databases or
imaging protocols for basic and clinical neuroscience applications — and the committee had been unaware of their efforts. Another use for such a registry would be to list data that are available for sharing.
Research collaboration databases are semiprivate or semipublic databases set up by defined groups of investigators for work on specific projects. Such databases would contain raw data files, methods descriptions, and other information useful to the conduct of the project. A preliminary survey of long-distance collaborative activities in neuroscience carried out by study staff reinforced the recommendations of the task forces regarding this kind of database.¹ The survey revealed that approximately 25 percent of all papers published in one year in Brain Research and the Journal of Neuroscience were reports of collaborative work done by two or more geographically distant U.S. laboratories. Further, study staff asked 10 randomly chosen investigators from these groups if electronic communication facilities, including image transmission, would have helped their work. All responded in the affirmative.
Specialty databases can also be set up by defined groups. For example, Task Force 3 (see Appendix A) suggested that such a database would be of great use in brain studies with PET or MR imaging. If the database were composed of multiple components, including references, tools for matching or warping one brain image to another, and a registry of available data, this specialty complex could aid the transfer of information regarding human brain structure/function relationships to a wide group of experts.
In summary, this section has outlined some of the key components of a useful complex of electronic and digital resources, including a number of different kinds of databases containing information from all levels of the neural hierarchy. The capabilities afforded by the complex to browse through the data, compare images, and extract specific subsets of information would enhance the conduct of neuroscience research and provide assistance extending far beyond what is now available. But the actual implementation of this complex is a substantial undertaking, and an understanding of its challenges is essential for success.
The Challenges Ahead
Technological needs require planning and attention to the most likely advances
Although the current state of computer and information technology has reached a point that makes the proposed complex of computerized resources possible, a number of technical requirements must still be addressed. Such topics received a great deal of attention during the task force deliberations. In the three major areas of databases, networks, and imaging technologies, imaging is the most advanced, from a technological standpoint, and therefore is likely to present the fewest barriers to direct application to neuroscience. Database and network technologies, although sufficiently mature to be applied to neuroscience in a productive manner, will require more modification. To transmit complex images, for example, networks must be upgraded according to the plans formulated for the National Research and Education Network. The task forces were unanimous in their support for careful planning to implement high-bandwidth network links, with special attention to the concomitant upgrades of the local area networks that link researchers inside universities with the national networks. The task forces also emphasized that the use of optical disks or other mass storage media to transmit data physically should be encouraged to obviate complete dependence on network links.
Database management technology presents the most difficult technical challenge in the initial implementation of the proposed resource complex. Currently, the most popular and most useful database design is the relational database; yet these databases cannot meet all of the eventual needs of the neuroscience community because they do not handle image data well and cannot display image and text data simultaneously. Object-oriented database management systems can handle image data, but they may not be widely available during the 1990s. Database developers from the task forces therefore recommended starting with relational models and planning for the eventual conversion to object-oriented systems.
Certain approaches may aid this planning. For example, the Entity Relationship Model is a database design tool that is used to plan the relationships to be set up in a database. In this model, definitions of individual items and their relationships are established before the database is constructed. These definitions are useful to plan both relational and object-oriented management systems. An additional advantage to this approach is that, once completed, it provides a record of the defined relationships, which facilitates later modifications. The model currently is being used at the National Center for Biotechnology Information at the National Library of Medicine in its effort to link Genbank and the Protein Information Resource databases.
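The planning approach described above, defining entities and their relationships before construction, can be illustrated with a toy schema. The entities (regions, pathways) and column names below are invented for illustration using SQLite; they do not reflect the actual designs discussed by the task forces or the NLM effort.

```python
# Toy illustration of entity-relationship planning realized as a
# relational schema; entity and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entities defined in advance: brain regions and pathways.
    CREATE TABLE region (
        region_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE pathway (
        pathway_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Relationship, also defined in advance: a pathway connects an
    -- origin region to a target region.
    CREATE TABLE projects_to (
        pathway_id INTEGER REFERENCES pathway(pathway_id),
        origin_id  INTEGER REFERENCES region(region_id),
        target_id  INTEGER REFERENCES region(region_id)
    );
""")
conn.executemany("INSERT INTO region VALUES (?, ?)",
                 [(1, "septum"), (2, "hippocampus")])
conn.execute("INSERT INTO pathway VALUES (1, 'septohippocampal')")
conn.execute("INSERT INTO projects_to VALUES (1, 1, 2)")

row = conn.execute("""
    SELECT p.name, o.name, t.name
    FROM projects_to pt
    JOIN pathway p ON p.pathway_id = pt.pathway_id
    JOIN region  o ON o.region_id  = pt.origin_id
    JOIN region  t ON t.region_id  = pt.target_id
""").fetchone()
print(row)  # → ('septohippocampal', 'septum', 'hippocampus')
```

Because the entity and relationship definitions exist independently of this particular relational realization, the same definitions could later guide a migration to an object-oriented system, which is the advantage the task force developers cited.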
Another critical aspect of database design is user interfaces. This area is quickly emerging as a unique subspecialty of computer science, combining human factors research with interface software design and query language development. Interfaces must be easy to use, yet
powerful enough to enable the user to extract needed information. Balancing these two qualities is rarely easy, and many task force participants argued for special attention to this issue. They also recommended early involvement of user interface specialists and examination of the experiences of genome database developers in the design of useful interfaces.
Underlying both database and user interface design questions is the challenge of developing software that allows data to be accessible and usable. Indeed, hardware problems are minimal compared with the problems associated with developing software. For example, neuroscientists currently use a variety of computers that run on different operating systems and employ a wide range of software tools; as a result, communication among these systems is severely limited. The task forces made a number of recommendations regarding this problem. One was to develop platform-free software for the databases that would be independent of the operating system being used in the local computing environments and that could circumvent compatibility barriers. However, a better approach might be to design translation programs that could bridge the differences between software programs. In the experience of many participants, software development represents a major portion of the costs associated with database development. Therefore, active encouragement of so-called “shareware” development may produce savings. Shareware is software developed by individuals that subsequently is shared free of charge with others. Examples include software developed in government laboratories, such as that developed for image analysis of results from deoxyglucose experiments (see Chapter 4 ). The disadvantage of shareware is that often it is not as finely tuned or as carefully maintained as commercially developed software and requires those using it to make modifications for their own needs.
The general theme in the task force software discussions was the advantage of being open to a variety of strategies. Also noted, however, was the need to develop software tools that were directly applicable to research issues. For example, the human imaging group (Task Force 3) identified the need for improvements in the software for registering images with one another. In Task Force 4, participants were acquainted with recent advances by industry and academic research centers in designing software for fast searches of data within a database, including scientific databases. A consensus emerged through these discussions that computer scientists must work in close collaboration with neuroscientists to address the complex problems of software development.
Another area of technological challenge is the mechanism of data
storage. Numerous storage options are available, but the choice depends on neuroscientists' needs for specific kinds of data transmission. For example, transmission over networks is compatible with central storage of data in mainframe computers. Because networks currently are inadequate for image transmission, however, transportable media such as optical disks or CD-ROMs are an attractive alternative. Another advantage of transportable storage media is that they may increase accessibility to information, allowing a wider user group, including international users. One computer expert cautioned, however, that the physical organization of the data through these storage mechanisms should not be confused with the logical organization of the data themselves—such confusion would limit access to the data.
The final technical topic covered by the task forces was the development of technical standards for data exchange. The technical issues inherent in standards development are closely related to certain research issues and were of great concern to those who attended the committee's open hearings or contributed written commentary. The issue of standards is explored in the following section.
Standard formats are necessary, but they must evolve
The task forces outlined four categories of technical standards that need to be developed and examined. First, for data representation, standard data exchange formats are required for textual and numerical data and for the generation of images and graphics. Second, for algorithm representation, mechanisms for conveying new algorithms should be considered for a variety of possible applications. Third, standard user-interface packages should be considered as a way to reduce the barriers to the actual use of a given computerized resource. Finally, standard communication protocols should be expanded to provide the dynamic range of data accessibility required for research-oriented databases. Each of these categories reflects the fact that coordinated computer resources require a high level of standardization to function smoothly. Data represented in a multitude of ways, with differing algorithms and with different mechanisms for data access, are often difficult to extract. Communication is further limited by different computing environments and the ability of these environments to accommodate data in forms suitable for transmission.
How database developers approach these technical requirements can have major consequences for the usefulness of the resource. Many neuroscientists who commented on this issue expressed fears that standards might be imposed on them to which they would have difficulty adapting. But the task forces emphasized that standards should
evolve from experience and should be based on the needs of users. Their philosophy was that if a standard method or format worked well, usually people were willing to expend a little extra effort to take advantage of the resource for which the standard was designed. In addition, some suggested that establishing liaisons and joint efforts between the neuroscience community and standards development groups (e.g., the standards working groups of the Internet Activities Board) would help to increase awareness in broader user communities of the special requirements of neuroscience data.
The evolution of standards must also begin with an awareness of the specific scientific needs of the user community. One area of concern in neuroscience is that of nomenclature. Disagreements over the names of brain nuclei and subnuclei have been common since the beginning of neuroanatomy. Synonymous terms are widespread but avoidable barriers to communication. A number of participants asked for efforts to establish clear definitions of terms. Others suggested that the NLM expand the Medical Subject Headings (MeSH) in neuroscience subject areas. It might also be worthwhile to follow the lead of the genome databases, which hold regular meetings of user representatives and database managers to discuss nomenclature, as well as other issues.
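The nomenclature problem described above is, at bottom, a mapping problem: synonymous terms must resolve to a single preferred label before records from different laboratories can be compared or indexed together. A minimal sketch of such a canonical-name table follows; the entries and the function name are illustrative assumptions, not an actual neuroanatomical standard.

```python
# Hypothetical canonical-name table resolving synonymous terms for brain
# structures to a single preferred label. Entries are illustrative only.
SYNONYMS = {
    "nucleus accumbens": "nucleus accumbens",
    "accumbens nucleus": "nucleus accumbens",
    "substantia nigra": "substantia nigra",
    "nigra": "substantia nigra",
}

def canonical(term: str) -> str:
    """Return the preferred label, or the input unchanged if unknown."""
    return SYNONYMS.get(term.strip().lower(), term)

# Records annotated with different synonyms now index identically.
assert canonical("Accumbens nucleus") == canonical("nucleus accumbens")
```

In practice such a table would be maintained by the kind of user-representative meetings the genome databases hold, with unknown terms flagged for review rather than silently passed through.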
The committee heard from several neuroscience groups that are already attempting to develop standards. One of these was the imaging community, whose members are seeking effective methods for comparing images. A strategy that is being explored is the general use of an atlas of the human brain as a standard reference frame for expressing neuroanatomical coordinates and regional boundaries. Establishing such a standard might also help to define the range of variability in human brain structure, something that currently is not known. Clinical and basic neuroscientists also suggested standard annotation of experimental data. In human imaging studies aimed at defining structure/function relationships, certain baseline information is now recorded in highly individual ways, which limits its usefulness. At a minimum, such information might include age, handedness, sex, educational level, or any characteristic feature of the subject or experimental group. Basic scientists interested in comparing images (e.g., maps of receptor binding or immunocytochemical localization) asked that images be tied to precise annotative information regarding experimental conditions, calibrations, chemical methods, animal weights, and other appropriate data.
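The baseline annotation fields named above (age, handedness, sex, educational level) can be made concrete as a small validated record type. This is a minimal sketch under assumptions of our own: the field types, coding conventions, and range checks are hypothetical, not drawn from any annotation standard in the report.

```python
from dataclasses import dataclass

# A minimal sketch of standardized baseline annotation for a human imaging
# subject, using the fields named in the text; codes and checks are assumed.
@dataclass
class SubjectAnnotation:
    age: int
    sex: str              # assumed coding: "F" or "M"
    handedness: str       # assumed coding: "left", "right", "ambidextrous"
    education_years: int

    def validate(self) -> None:
        """Reject records that cannot be meaningfully compared across studies."""
        if not 0 <= self.age <= 120:
            raise ValueError("implausible age")
        if self.sex not in {"F", "M"}:
            raise ValueError("unrecognized sex code")
        if self.handedness not in {"left", "right", "ambidextrous"}:
            raise ValueError("unrecognized handedness code")
        if self.education_years < 0:
            raise ValueError("negative years of education")

subject = SubjectAnnotation(age=34, sex="F", handedness="right",
                            education_years=16)
subject.validate()  # a well-formed record passes silently
```

Recording these fields in one agreed structure, rather than "highly individual ways," is what makes pooled structure/function analyses possible.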
In summary, any discussion of standards touches on technical, scientific, and sociological issues (see National Academy of Sciences, 1989). Although standards development is necessary and cannot be
ignored, the clear majority of participants in the committee's consultative process favored allowing most standards to evolve, based on hands-on experience and careful consideration of users' needs. What is critical to keep in mind is that the design of usable standards, which do not demand large investments of individual users' time, will determine, in part, whether an electronic or digital resource will be widely used.
Technology drives sociological change
The sociology of science can be defined as the ways in which scientists work and how they interact. Different scientific fields and subspecialties may display varying sociological attributes. Yet as most of us have witnessed in our everyday lives, the incursion of technology can produce very profound changes in sociological patterns. In each of the committee's consultative activities, from task force meetings to open hearings, the question was asked: How will the incorporation of electronic and digital resources in neuroscience change neuroscience and neuroscientists? The challenge for the future is to understand the technology's possible effects and to begin to develop the policies and approaches necessary to cope with these effects.
One of the advantages of electronic communication and the establishment of informal databases will be the ability to share data with one's colleagues or with the entire community. Data sharing can greatly increase the amount of information gleaned from each experiment and thereby quicken the progress of investigation. Beyond publication of the results of an experiment or study, shared data could range from very preliminary observations to complete data sets. Yet despite the benefits that can be expected, there are also perceived risks for the investigator wishing to share data (Fienberg et al., 1985; National Academy of Sciences, 1989). One risk is the investment of time in an activity that yields no tangible benefit to the investigator. Methods for assigning credit to scientists who contribute data to databases are rudimentary at present. Because journal publications and other formal mechanisms for assigning credit for specific scientific concepts and results carry substantial weight as benchmarks of an individual's success in science, the matter of proper credit is important. Another risk involves the rights of human research subjects. The strict guidelines that protect the privacy and identity of human research subjects were developed before electronic networks and digital databases were in common use. The sharing of data (e.g., PET and MRI data) obtained from studies of human subjects will require additional
consideration of methods to ensure confidentiality. Throughout its activities, the committee heard strong arguments against any imposed policy of data sharing. Rather, identification of the risks and disincentives to sharing should be a priority, followed by formulation of policies to provide meaningful incentives and protections for investigators who share their data. Finally, formal attention should be paid throughout the development and use of computerized resources to the ethical issues involved in data sharing, including those pertaining to the privacy of human research subjects.
Task forces and other participants explicitly identified the risks they perceived from data sharing and possible strategies for lessening those risks. First, methods must be developed to ensure that proper credit is assigned to those who contribute data for sharing—especially if the sharing is with a group that extends beyond formal collaborators. The responsibilities of the investigator who shares the data, as well as those who use the data, need to be defined. Mechanisms to protect all involved parties require careful reflection and measured policy formulation. For example, journals devoted to gene mapping and protein and gene sequencing are beginning to require that investigators deposit their raw data as a condition of publication. Participants suggested that a close examination of the benefits and difficulties attached to these journal practices would be helpful to the neuroscience field. Another suggestion was to encourage university tenure committees in their decision-making processes to consider certain types of data sharing, particularly sharing of peer-reviewed data, as comparable to journal publication and teaching competence. It was clear from the discussions that neuroscientists, and much of the biomedical science community, are only beginning to grapple with the issues inherent in data sharing. Given this context, continued analysis and discussion are likely.
Another area of concern to participants was how to ensure that computerized resources would be accessible to more than just a few well-funded laboratories. The technological capabilities resident in neuroscience laboratories cover a fairly broad range. Designing resources that are useful only on highly sophisticated computer systems would effectively restrict access and would be highly undesirable. Participants also noted, however, that the recent trend of computers becoming more powerful and, at the same time, cheaper will continue unabated. In addition, computing style eventually diffuses within specific user communities. Therefore, although initial planning phases might concentrate on more sophisticated hardware, such computers would likely be generally available at lower cost only a few years down the line. Nevertheless, many participants saw the development
of mechanisms to increase access across a range of technological capabilities as an important goal.
There is understandable resistance to the integration of technology into the way people work. One senior administrator, who had seen the development of computerized molecular modeling, expressed surprise about how long it took investigators to embrace the technology and begin to benefit from it (see also National Academy of Sciences, 1989). Throughout the committee's activities, awareness of this kind of resistance brought repeated cautions that it would be unreasonable to expect everyone to view the establishment of computerized resources for neuroscience with the same enthusiasm. To deal with this reality, most participants endorsed continued discussion and communication within the neuroscience community. Some even suggested that professional societies (e.g., the Society for Neuroscience) might play a role in fostering communication about these issues.
A final sociological issue raised by the committee's activities relates to changes in the work force that result from greater use of technology. It is becoming more and more common for one person in a laboratory to be the “computer expert.” This person is called on to solve software problems and make the laboratory's computers work effectively to support the group's research. Ten years ago, these people were often undergraduate students, many of whom were not planning to pursue a biomedical science career. Increasingly, they are individuals who are trained in science but who also have experience and interest in the technology that supports that science, thus earning the title “scientist-programmers” (Anderson, 1989). As valuable as such people are, their career possibilities are relatively constrained. Jobs, promotions, and tenure traditionally are based on publication of research results, not the development of useful technologies to conduct research. A complementary dilemma faces computer scientists who are interested in biomedical science applications but find it difficult to publish such work in the computer science literature. Many participants stressed the value of scientist-programmers to the work of their laboratories and voiced the hope that reward structures could be improved for this new segment of the neuroscience work force.
The sociological implications of the increasing role of computer and information technology in research touch many who work in the neuroscience field (Denning, 1987), and consideration of these issues should be an integral part of planning for the future. A common theme among the topics covered in this section—sociological issues, the needed technical applications, and the development and acceptance of standard data formats—is the absence of experience on which beneficial policies can be founded. Mechanisms to gain such experience constituted a major topic of discussion for the task forces.
Strategies for Building a Base of Experience
Pilot projects are good starting points
Although uniformly supportive of the long-range goal to integrate computer and information technology into neuroscience research, all four task forces strongly recommended the establishment of pilot projects, so that a badly needed base of experience could be built. These views led to the concept of a two-phase effort. The specific kinds of pilot projects that were suggested differed from one group to another, but through communication among the task forces, the suggestions coalesced into a unified concept for the committee's consideration. Rather than separate entities, pilot projects should represent a coordinated “family” of efforts with certain goals in common. Motivating the suggestion of pilot projects was the belief that such an approach would allow “in-house” development, controlled by the eventual users of the tools and resources being examined. The importance of this kind of development was stressed by several task force members with backgrounds in database design and administration, as well as by the group that considered the Defense Mapping Agency's experience in developing computerized tools (Downs et al., 1990).
A consensus emerged that groups of investigators should constitute this pilot project program. Each of the groups would involve neuroscientists working on a specific topic, some or all of whom would be geographically separated. Task force members with experience developing databases emphasized the importance of a single, clear, unifying concept behind each group; they recommended that each group be organized around neuroscience topic areas but include clear technological goals and objectives. Most important, organization around neuroscience topics would ensure that the building of the experience base proceeded in concert with continued research and discovery about the brain and its functions.
The suggested goals of such a program were generated from discussions of research and technical needs, challenges, and opportunities, which were described in the first two sections of this chapter. Although each pilot project would not necessarily address each issue, the overall goals of the program as a whole would be the following:
- Develop digital data collection and storage methods for data at multiple levels of the neural hierarchy.
- Identify the kinds of data, level of resolution, and experimental information necessary to facilitate new insights and stimulate research.
- Examine and evaluate the various capabilities (e.g., browsing, graphic interfaces) that can increase use of the resources and enhance access to meaningful information.
- … databases to informal databases for research collaboration.
- … across different computing environments, for user interfaces, for network transmission of images, for data searching, and for image generation and comparison.
- … collection schemes, and to evaluate the evolution of these standards.
- … electronic means, including networks and transportable media.
- … experiences and technological developments.
The task forces considered a number of factors in relation to the neuroscience topic areas that might be chosen for the pilot project program. One was the differences among subspecialties of neuroscience in their computer “readiness”—that is, the degree to which the data in a specific field are already in digital form. For example, most of the data from human imaging studies are in digital form; a pilot group working in that area could focus its work on the goals of standards development or data sharing. In contrast, a group whose data are largely in photographic or other nondigital forms might concentrate on developing digital data collection mechanisms.
Different neuroscience subspecialties can also be separated by the “horizontal” versus “vertical” range over which they extend in the neural hierarchy. Those involved in mapping the locations of different neurotransmitters or receptors might confine the majority of their research questions horizontally to the cellular or systems levels of the hierarchy. In contrast, researchers interested in pain, sleep, or substance abuse are often interested in information that extends vertically to almost every level of the hierarchy. Because such groups may well have different data collection and management needs, including a range of neuroscience subspecialties in the pilot projects would be advantageous.
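The horizontal/vertical distinction can be stated more precisely: if the levels of the neural hierarchy are ordered, a research program's "vertical" extent is simply the number of adjacent levels it spans. The level names and the function below are illustrative assumptions, not terms defined by the task forces.

```python
# Hypothetical ordered levels of the neural hierarchy (illustrative only).
LEVELS = ["molecular", "cellular", "systems", "behavioral"]

def vertical_span(study_levels):
    """Count how many hierarchy levels a research program spans."""
    indices = [LEVELS.index(level) for level in study_levels]
    return max(indices) - min(indices) + 1

# A receptor-mapping program confined "horizontally" to nearby levels:
assert vertical_span(["cellular", "systems"]) == 2
# A pain or sleep program extending "vertically" across the hierarchy:
assert vertical_span(["molecular", "cellular", "systems", "behavioral"]) == 4
```

A pilot program spanning many levels would need correspondingly broader data collection and management machinery, which is why the task forces wanted both kinds of groups represented.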
In addition to groups that vary in readiness and research focus, task force members also suggested the inclusion of groups with different kinds of computer expertise. First, individuals with programming and computer graphics expertise are needed. (Some of these experts
might be scientist-programmers; others might be computer experts who lack a strong working knowledge of biology.) Second, the expertise of information technology and networking personnel should be tapped. Finally, database specialists should be involved. Again, each program group would not necessarily include all of these experts— choices would be based on the group's defined goals. The task forces recognized that the inclusion of computer and information science experts presented difficulties in that such arrangements are not typical of the personnel structure in most federally funded biomedical research projects. To offset these difficulties, task force members suggested that sources of technical expertise, such as the National Center for Biotechnology Information, university computer science departments, or supercomputer research centers, be explored to ensure on-site availability of computer science expertise.
Finally, some task force members felt it was important to include neuroscientists with various degrees of computer experience. Neuroscientists with very little experience can provide insights about the usefulness of prototype user interfaces and data organization schemes, as well as the feasibility of standards protocols and methods for converting data to digital forms. Further, communication among neuroscientists with such varied experience will ensure an understanding of the needs of the larger neuroscience community.
In summary, the task forces made specific suggestions regarding the critical need to begin to build a base of experience in the incorporation of computer and information technologies into neuroscience research. In addition, they made recommendations on the composition of pilot project groups and the goals for the overall program. To coordinate these groups and provide a focus for applying the experience they gain to the long-range goal, the task forces also recommended to the committee that a coordinating structure be established.
Coordination, oversight, and evaluation will be needed
A key area of consensus among the task forces was that pilot projects required coordination and that oversight and evaluation mechanisms were crucial to the eventual implementation of a complex of computerized resources for neuroscience (see also National Academy of Sciences, 1989). The experiences of the task force members involved with development of the genome databases underscored this need: each of the genome databases had developed independently, and interlinking these disparate systems was proving to be troublesome (Smith, 1990). On the other hand, an excess of central oversight and planning can isolate users from the development process, as occurred
in the modernization efforts of the Defense Mapping Agency (Downs et al., 1990). Therefore, task force members considered it essential to establish a balance.
The structures suggested for coordination of the pilot projects were varied. Two of the task forces envisioned a multidisciplinary advisory board or committee that would be responsible for coordinating the pilot project activities and, possibly, for quality or editorial control of the contents of the database. Another task force assigned the responsibility for coordination to a host institution or core facility for each project. This host institution would document and evaluate the communication and collaboration achieved through the pilot project and evaluate standard data formats and software tools. In this scheme, the final evaluation in advance of implementing the long-range goal of a national effort would be conducted by a board composed of neuroscientists and computer scientists. Finally, the fourth task force suggested that regular meetings among the different pilot projects be held to validate data, assess needs and progress, and coordinate the exchange of software, protocols, and operational methods. In addition to these meetings, the task forces suggested that a nonprofit organization, similar to the Human Genome Organization (HUGO), be established. This organization would be responsible for long-range planning and coordination among various federal funding agencies.
The task forces did not enumerate the responsibilities appropriate to these various oversight structures. Nevertheless, they made numerous suggestions about what tasks eventually could be assigned to such boards or organizations. For example, a consensus brain database requires editors to ensure the quality and accuracy of the information it contains. In addition, the efficient use of large databases normally requires some training. Therefore, a service and educational component was recommended. Another suggestion was the establishment of boards for reviewing publications and data submitted to a comprehensive brain database.
By the close of the committee's activities, there was a strong consensus that the long-range goal of building a complex of computerized resources for neuroscience was technically feasible and that the realization of this goal would greatly enhance neuroscience research. The necessity of building a base of experience from which to realize this goal was reinforced throughout the meetings and open hearings. Key aspects of this base of experience should include the involvement of neuroscientists with computer and information scientists in pilot projects, the coordination and oversight of individual pilot projects, and careful attention to the sociological and ethical issues inherent in the use of computerized resources for science.
Funding the effort is an important issue
The committee's activities took place against the backdrop of an exceptionally difficult year in biomedical research funding. It was a year in which emergency meetings were called to assess the effects of the funding crisis on the future of biomedical science in the United States, and in which many issues of Science or The Scientist contained an article about decreasing award rates for grants or the disincentives to entering science as a career (Bloom and Randolph, 1990; Culliton, 1990; National Academy of Sciences and Institute of Medicine, 1990). Understandably, this climate left its mark on the opinions of the participants in each of the consultative activities.
Nearly all participants argued that, if the establishment of a complex of electronic and digital resources for neuroscience were important enough to undertake, appropriation of additional funds for its establishment would be necessary and justified. Most participants and respondents were enthusiastic about the potential benefits to be gained from greater integration of computerized resources in neuroscience and considered the additional investment to be justified. As might be expected, the majority of those expressing this view were neuroscientists with extensive experience in using computers for their research. Yet the task force groups and open hearings included neuroscientists with a range of computer experience. As the various meetings progressed, most neuroscientists who had moderate to minimum computer experience (and were often initially skeptical about the value of computerized resources) became excited about the advantage such tools might offer for their research. Nevertheless, despite this general enthusiasm, a few participants and respondents were opposed to the initiatives being considered—for two main reasons. First, they did not see enough benefit to justify the expenditure of funds. Second, they were concerned that the initiative might cause scarce funds to be sequestered and funneled to a small group of senior scientists. Committee members carefully considered all of these views as they developed their recommendations.
Anderson, G. C. 1989. New initiatives aim to emancipate "scientist-programmers." The Scientist (Sept. 18): 23.
Bloom, F. E., and M. A. Randolph, eds. 1990. Funding Health Sciences Research: A Strategy to Restore Balance. Washington, D.C.: National Academy Press.
Culliton, B. J. 1990. Biomedical funding: The eternal crisis. Science 250: 1652-1653.
Denning, P. J. 1987. The science of computing: A new paradigm for science. American Scientist 75: 572-573.
Downs, A., B. Waxman, and C. Pechura. 1990. Technological Implications of Cartography and Remote Sensing for a National Neural Circuitry Database. Background paper prepared for the Committee on a Neural Circuitry Database, Institute of Medicine.
Fienberg, S. E., M. E. Martin, and M. L. Straf, eds. 1985. Sharing Research Data. Washington, D.C.: National Academy Press.
National Academy of Sciences. 1989. Information Technology and the Conduct of Research. Washington, D.C.: National Academy Press.
National Academy of Sciences and Institute of Medicine. 1990. Forum on Supporting Biomedical Research: Near-Term Problems and Options for Action. Summary of meeting held June 27, 1990. Washington, D.C.: National Academy Press.
Schatz, B. R. 1991. Building an Electronic Scientific Community. Pp. 739-748 in Proceedings of the 24th Annual Hawaii International Conference on Systems Sciences, IEEE Computer Society, vol. 3.
Science. 1990. What's on your mind? Check BrainMap (Briefings). Vol. 250: 1203.
Smith, T. F. 1990. The history of the genetic sequence databases. Genomics 6: 701-707.
Vela, C. 1990. Overview of U.S. Genome and Selected Scientific Databases. Background paper prepared for the Committee on a National Neural Circuitry Database, Institute of Medicine.
1. This preliminary survey was conducted by Elizabeth Meyer and Constance Pechura. For a description of the methods used, contact the Institute of Medicine, Division of Health Sciences Policy.