2

Training Neuroscientists in Basic Research, Tool and Technology Development, and Big Data

Key Highlights Discussed by Individual Participants

• Basic research is the foundation of the neuroscience enterprise (Landis).
• Exposing trainees to the interplays among basic, translational, and clinical research can help to ensure a proper balance of the three is maintained through the next generation of neuroscientists (Landis).
• Trainees need to understand the fundamental principles that underlie the tools they use in order to understand the limitations of those tools and the situations in which they can be appropriately deployed (Marder).
• As the complexity of new tools increases, novel mechanisms for teaching trainees and other scientists to use them can facilitate widespread adoption (Landis).
• Handling and analyzing large amounts of data will be a major challenge in the next era of neuroscience (Sejnowski).
• There are three major aspects to working with big data: data literacy, data management, and data sharing (Martone).

NOTE: The items in this list were addressed by individual participants and were identified and summarized for this report by the rapporteurs. This is not intended to reflect a consensus among workshop participants.

Basic research is the fuel that powers advances in neuroscience, noted Story Landis. A solid understanding of how neurons function, form neural circuits, and ultimately influence behavior underlies every effort to develop clinical treatments for neurological diseases (Koroshetz and
Landis, 2014). Several speakers and workshop participants discussed how to structure graduate programs to instill in trainees the best practices for conducting basic research. In addition, many participants highlighted the need for trainees to have a fundamental knowledge of the new tools and technology that are used to make basic research discoveries, as well as the ability to properly handle and analyze the big data that are generated from them.

THE NEED FOR INCREASED TRAINING IN BASIC RESEARCH

In her presentation, Landis called attention to the important role that basic research plays in neuroscience. Without basic science discoveries, she said, there would be nothing to translate into clinical treatments and the whole neuroscience enterprise would collapse. Yet, an analysis of NINDS's funding portfolio,1 overseen by Landis during her tenure as director, revealed that the institute's funding of basic science has decreased over the years (see Figure 2-1). In 1997, basic research accounted for 52 percent of NINDS's overall budget. By 2012 that proportion had dropped to 27 percent, while funding of clinical and translational science increased by a corresponding amount (Landis, 2014). Looking at data from 2 years, 2008 and 2011, requests for funding of so-called basic-basic research dropped by 21 percent, while disease-focused basic requests increased 23 percent, applied-translational requests increased 42 percent, and applied-clinical requests increased 38 percent (Landis, 2014). The success rate for basic science grants, however, remained unchanged over that time period (and was actually higher than that of all other categories in both 2008 and 2011).
While similar trends were not seen at the National Institute of Mental Health or the National Science Foundation (NSF), Landis and several workshop participants expressed the need for training programs to emphasize to graduate students and postdoctoral researchers the importance of basic research, its relationship to translational and clinical research, and the

1 See http://blog.ninds.nih.gov/2014/03/27/back-to-basics (accessed October 28, 2014).
FIGURE 2-1 Percentage of the competing budget spent on unsolicited, investigator-initiated grants in the four subcategories at the National Institute of Neurological Disorders and Stroke.
SOURCE: Landis, 2014.

need for balance among these three areas (Yamaner, 2014). In particular, Landis and Eve Marder, professor of biology at Brandeis University, stated that trainees need to hear the message that not everyone has to conduct translational research to get jobs and funding. Exposure to this message about the critical role of basic science could occur in core courses as well as nano-courses or seminars that use successful neurological treatments as case examples to trace through lines from basic science discoveries, to their translation into drugs, devices, or treatments, and finally to clinical testing. There is scope for specialization; however, one participant noted that training programs can become centers of excellence for basic, clinical, or translational science.
TRAINING IN TOOL AND TECHNOLOGY DEVELOPMENT

Increasingly, basic research discoveries have become dependent on the development of new tools and technologies, as well as the ability to handle, manage, and analyze the large quantities of data being collected with those tools. One participant recalled the deep reluctance that many students in the past had toward working to develop probes or assays, or otherwise push the technological aspects of neuroscience forward. Work on such projects was not highly valued, the participant noted; instead, students were more excited to use the new tools to make discoveries. While making important discoveries is still a priority, much of the current excitement in neuroscience stems from the development of tools and technologies, for example, optogenetics,2 CLARITY,3 and CRISPR.4 Many workshop participants noted that along with this excitement come a number of challenges, not only in training students how to develop powerful tools but also in training students on how best to deploy them while thinking deeply about their limits. As technologies are applied to advance discoveries in basic neuroscience, there is also a growing realization that those same or similar technologies can be used to provide therapeutic functions, noted Douglas Weber, program manager of the Biological Technologies Office at the Defense Advanced Research Projects Agency (DARPA).

Enabling Tool Development Through Transdisciplinary Collaboration

Using DARPA's Revolutionizing Prosthetics Program as an example, Weber discussed the myriad skill sets needed to develop the next generation of tools and technology. With the rapid growth and diversification of the field of neuroscience, he said, there has been a tendency for disparate groups to work in silos. He noted that groups work across different scales (from molecules to cells to networks) and study different systems (from autonomic and sensory systems to cognitive functions).

2 The use of genetically encoded light-sensitive proteins to control neural activity with flashes of light.
3 Clear, Lipid-exchanged, Acrylamide-hybridized Rigid, Imaging/immunostaining compatible, Tissue hYdrogel. A process for replacing the lipids in brain tissue with hydrogels to make the brain transparent in order to visualize neural ensembles.
4 Clustered Regularly Interspaced Short Palindromic Repeats. An RNA-guided gene-editing platform that allows scientists to engineer any part of the human genome with great precision.
Integrating information across these many scales and systems can be challenging. However, Weber expressed hope that these challenges can be overcome through programs such as the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative,5 which incorporates a strong focus on finding ways to synthesize information across these many scales to yield a more holistic understanding of how the brain works.

The goal of DARPA's program is to modernize the design and function of prosthetic hands and arms, which have lagged far behind lower limb prosthetics, he noted. Until recently, artificial hands consisted of a hook system attached to a cable wrapped around the user's shoulder and controlled using simple shoulder shrug maneuvers. This basic design had remained relatively untouched since the days of the Civil War. In thinking about its redesign, DARPA used as its inspiration the prosthetic hand given to Luke Skywalker after his hand had been severed at the wrist. That is, the program sought to build a realistic-looking articulated hand with several degrees of freedom for the wrist and each digit, all integrated into the user's nervous system and controlled directly by the brain.
Weber mentioned several skill sets that the 400-member team charged with creating the integral pieces of DARPA's revolutionary prosthetic hand needed:

• Neuroscience: neuroscientists with expertise in sensory feedback and haptics, neural motor decoding, and neural stimulation
• Materials science: materials for every physical piece of the hand, from the lifelike cosmetic covering that needs to be flexible, durable, and waterproof to the biocompatible electrodes that interface with the user's nerves, need to be carefully selected, designed, and tested
• Systems engineering
• Mechanical engineering
• Software engineering
• Wireless communications
• Signal processing
• Modeling: models for how the information that controls specific motor movements (e.g., reaching and grasping) is encoded in the patterns of neural activity represented in the brain
• Human factors

5 See http://www.whitehouse.gov/share/brain-initiative (accessed October 29, 2014).
• Data analysis
• Behavioral analysis
• Surgery
• Physical therapy
• Occupational therapy
• Human subjects research
• Manufacturing
• Project and program management

Several workshop participants discussed strategies for enabling this type of transdisciplinary collaboration at the level of graduate training programs to encourage tool development. One approach would be to develop courses with other departments that offer hands-on labs in which students examine specific topics, which might encourage collaboration among disciplines. An example of this approach is the University of Pennsylvania's course on "Brain-Computer Interfaces," in which neuroscientists work collaboratively with engineers and physical scientists on programming projects.6 A few workshop participants noted that another method for encouraging transdisciplinary approaches is the NSF Research Traineeship (NRT) grant program (formerly the IGERT [Integrative Graduate Education and Research Traineeship] program),7 which provides training funds for a group of graduate students from different departments within a university to work together on a single project. NIH can facilitate cross-discipline approaches by issuing awards similar to NRT grants and through the creation of centers of excellence, such as the Morris K. Udall Centers for Parkinson's Disease Research, which are hosted at nine universities. NINDS has created a novel type of center, called the Epilepsy Centers without Walls, which bring together dozens of scientists to work on a single aspect of epilepsy regardless of their physical location. One center is focused on the investigation of sudden death in epilepsy and includes expertise in neuroscience, genetics, anatomy, clinical research, imaging, pathology, stem cells, informatics, molecular biology, and data analytics.
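The neural-decoding and modeling skill sets in the list above can be made concrete with a toy example. The sketch below is illustrative only and is not DARPA's actual decoding pipeline: it simulates motor-cortex-like neurons with classic cosine tuning to reach direction and recovers the intended direction with a simple population-vector decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: each neuron fires most for one "preferred" reach direction,
# with its rate falling off as the cosine of the angle between the actual
# reach and that preferred direction (classic cosine tuning).
n_neurons = 200
preferred = rng.uniform(0, 2 * np.pi, n_neurons)  # preferred directions (rad)

def firing_rates(reach_angle):
    """Noisy spike counts for one reach in the given direction."""
    base, gain = 10.0, 8.0
    rates = base + gain * np.cos(reach_angle - preferred)
    return rng.poisson(rates)  # spike counts are roughly Poisson

def decode(counts):
    """Population vector: sum each neuron's preferred-direction unit
    vector, weighted by how far its count sits above the population mean."""
    w = counts - counts.mean()
    x = np.sum(w * np.cos(preferred))
    y = np.sum(w * np.sin(preferred))
    return np.arctan2(y, x) % (2 * np.pi)

true_angle = np.pi / 3
est = decode(firing_rates(true_angle))
print(f"true {np.degrees(true_angle):.0f} deg, decoded {np.degrees(est):.0f} deg")
```

With a few hundred simulated neurons the decoded angle lands within a few degrees of the true one; real prosthetic decoders face nonstationary, far noisier signals, which is why signal processing and modeling appear as distinct skill sets above.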
John Morrison, professor of neuroscience at the Icahn School of Medicine at Mount Sinai, also emphasized the importance of neuroengineering and suggested that neuroscience departments establish links with schools of engineering. He also suggested that more universities develop Ph.D. programs in neuroengineering to create more expertise in this area.

6 See Chapter 3 for further discussion about this course.
7 See Chapter 3 for further discussion about NSF Research Traineeship grants.
Demystifying Neuroscience Tools

Each new neuroscience tool and technique has its own idiosyncrasies and drawbacks, as well as unique demands related to analyzing the data it produces, said Landis. Therefore, she cautioned, it will not be enough for trainees simply to know how to use a tool; it is important for them to know the fundamentals of the tool's function(s) and shortcomings, which in turn helps them to troubleshoot problems that arise. Marder agreed, stating that trainees need to demystify all of the tools they are using and not be mere consumers. She highlighted that this is particularly true when it comes to optics and microscopy. For fluorescence microscopy, it is intuitive how the microscope works, as one manually focuses and changes the objective. But for 2-photon microscopy and other less intuitive tools, students' lack of understanding of the technology can be a detriment because they are less likely to recognize when a problem is occurring. Marder is equally concerned about whether next-generation microscopes will be too complicated for most students to learn to use proficiently. The expense of these new microscopes, which can run in the millions of dollars, means that only students enrolled in a few well-endowed programs will have the opportunity to learn to use them. Marder identified this issue as a potential major gap in training. One step Marder has taken to close the gap in understanding the fundamentals of optics is encouraging her students to take a microscope and optics lab course at Brandeis University (which is open to neuroscientists), in which students build their own microscopes.

Marder suggested a number of steps that graduate programs can take to enhance students' understanding of the tools they use. Programs can develop more tool-based lab courses, and they can also look to outside sources of training.
For example, programs can encourage students to attend courses at Cold Spring Harbor Laboratory and the Marine Biological Laboratory at Woods Hole8 that focus on teaching the fundamentals of a variety of lab tools and techniques. Programs can also fund student enrollment in mini-courses devoted to single techniques that teach trainees the practicalities and specific details of new tools and techniques.

8 Further discussion about these courses is provided later in this chapter.
Dissemination of Tools

To close the gap in student understanding of new tools, several workshop participants asserted that novel mechanisms for tool dissemination are needed. Landis suggested that plans for dissemination of any new tool could be part of the grant applications seeking funding to build the tool. While the BRAIN Initiative has no requirement for such plans in the grants it issues, the BRAIN 2025: A Scientific Vision9 report clearly values the widespread dissemination of the new tools that it is funding (NIH, 2014). Accordingly, the NIH BRAIN Initiative is funding a short course in the use of new tools and another in the analysis of large datasets.10

Some neuroscientists have taken the initiative to set up training opportunities to ensure the spread of the technology they have developed, rather than restricting access to it. Optogenetics has been successful in part because its creator, Karl Deisseroth of Stanford University, used a research supplement from NINDS to organize free 3-day workshops to train faculty and students from around the world in the required surgeries and techniques. These are held both in university settings and in course modules at Cold Spring Harbor Laboratory and the Marine Biological Laboratory at Woods Hole. Furthermore, when Deisseroth discovered that scientists were struggling to use one of his more recent technologies, CLARITY, he published a highly detailed methods paper to explain some of the more complex aspects of the technique (Tomer et al., 2014). He has also organized free 3-day workshops on CLARITY throughout the year at Stanford. As is the case for the optogenetics workshops, the CLARITY workshops have a dedicated expert in the technique who acts as education manager.

Mark Schnitzer, a scientist at Stanford University, has taken a different approach to disseminating his state-of-the-art invention. Along with several colleagues, Schnitzer founded a company called Inscopix to produce the nVista HD, a miniaturized, head-mounted microscope for visualizing large-scale neural circuit dynamics in freely behaving animals. To encourage scientists to use the device, Inscopix has set up a competitive grant program that will offer the use of one to four nVista HD microscopes as well as extensive training in their operation.

9 See http://www.braininitiative.nih.gov/2025/BRAIN2025.pdf (accessed October 29, 2014).
10 These courses are described in more detail in Chapter 4.
Another enterprising neuroscientist, Raphael Yuste of Columbia University, has recently founded the NeuroTechnology Center along with a chemist, a bioengineer, and a statistician. The goal of the center is to develop advanced optical, electrical, and computational technologies to study the nervous system. In addition, the center plans to use funds from the Kavli Foundation to offer training in these new technologies to neuroscientists at all levels.

Several workshop participants also discussed opportunities for graduate students and postdoctoral researchers to engage in intensive summer courses in the use of cutting-edge tools, including courses offered by two well-established training facilities:

• Marine Biological Laboratory Summer Courses11
  o Neurobiology
  o Neural Systems and Behavior
• Cold Spring Harbor Laboratory Summer Courses12
  o Advanced Techniques in Molecular Neuroscience
  o Imaging Structure and Function in the Nervous System

TRAINING IN BIG DATA

Until recently, the primary challenge in neuroscience has been collecting useful information about the brain, said Sejnowski. In the first half of the 20th century, neuroscientists exploited principles of physics to record electrical signals from neurons and develop optical methods to visualize anatomy and morphology. In the latter half of the century, molecular biology techniques further expanded the repertoire of data that could be collected. The next era of neuroscience will be dominated by challenges in the ability to handle, manage, and analyze all of the data that are now becoming readily available, noted Sejnowski. Not only will there be challenges in how to manage this large amount of data, but entirely new methods will be needed for integrating different data types and analyzing enormous, multidimensional datasets, he added.

Neuroscience is not the first discipline to be faced with big data issues.
For decades, physicists have had to manage large amounts of data;

11 See http://www.mbl.edu/education/summer-courses (accessed October 29, 2014).
12 See http://meetings.cshl.edu/courses.html (accessed October 29, 2014).
however, many of the datasets in physics are collected in manners that have standardized data structures and annotation. Neuroscience data collection is less standardized, and its scale and organization more closely resemble the field of genetics, which has been deluged by servers full of genetic data generated by increasingly powerful sequencing machines since the first genome was sequenced more than 20 years ago (Choudhury et al., 2014). Walter Koroshetz, acting director of NINDS, suggested that neuroscience would benefit from considering lessons learned by geneticists regarding their strategies for managing data. Maryanne Martone, co-director of the National Center for Microscopy and Imaging Research at the University of California, San Diego, discussed the critical need for training future scientists to work with big data, focusing on data literacy, data management, and data sharing.

Defining the Gaps in Handling Big Data

In discussing the big data challenges facing neuroscience trainees, Martone quoted Michael Nielsen, author of Reinventing Discovery: "An unaided human's ability to process large datasets is comparable to a dog's ability to do arithmetic, and not much more valuable" (Nielsen, 2012, pp. 112-113). She went on to discuss three highly interrelated aspects of data handling that all trainees need to be educated about: data literacy, data management, and data sharing.

Data Literacy

Martone noted that although not all neuroscientists need to be data scientists, they will be required to use platforms to share and analyze data, and they need to be able to understand the fundamentals of large datasets. She made the analogy of taking a class on auto mechanics in high school, not because she ever intended to fix her car, but because she wanted to be able to talk to the people who were going to fix the car.
Likewise, attaining a minimum level of data literacy will require some specialized training in areas such as data types, structured data, databases, metadata, query languages, and data formats. In addition, an important aspect of data literacy, said Martone, is being able to navigate the "web of data" to find the right dataset. Knowledge of Web services, application programming interfaces (APIs),13 data repositories, Web scraping,14 and online spreadsheets is helpful for identifying sources of data.

Martone pointed out that most of the data that scientists encounter are not actionable. Instead, data get locked away within journals as static figures because of the current publication process. Some journals, such as Nature Scientific Data, are already providing open access databases for all of the data presented in an article's figures and tables. Martone added that the more that trainees are taught to understand the difference between static and actionable data, the more pressure will be put on all journals to adopt similar practices. Open access to data will also enhance a culture of sharing and make the scientific enterprise more transparent, she noted. Several participants stated that both of these developments might help to address the crisis of irreproducible data that the scientific community is now beginning to face.

Another aspect of data literacy, according to Martone, pertains to knowing one's data rights. Trainees need to know what rights they have to their data when making them public. They also need to know the rules concerning the use of publicly available data. Specifically, Martone said that trainees need the skills to evaluate which datasets are relevant to their own projects and have been collected with the proper vigilance and rigor.

Data Management

For data to be useful, they need to be properly managed, noted Martone. That is, they need to be collected in an appropriate standardized format; made readily accessible and interoperable on standardized platforms; annotated; and securely stored. She added that part of the challenge of sharing data is properly annotating them so that others can understand the context in which they were collected.
Having annotation standards in place ensures that each lab that collects a certain type of data can effectively use data shared from another lab. Standards take the guesswork out of deciding what information to collect during the experiment. Many participants stated that standard data formats are also critical to sharing data. According to Brian Litt, director of the Center for Neuroengineering and Therapeutics at the University of Pennsylvania, standard data platforms, rather than a proliferation of individual databases, are helpful for groups and

13 Snippets of computer code that allow Web-based applications to share information with one another.
14 The automatic extraction of useful data from websites.
individuals to keep track of their data, share data with others, and find relevant data that others have shared. Data platforms are central Web-based hubs that can be used to integrate and validate multidimensional, heterogeneous data from multiple sources and present them in a clean, standardized manner. Data platforms can also be used to share experimental procedures, analytics programs, and models.

For high-output labs, which can produce more than a petabyte of data per year (see Box 2-1), and even for many smaller labs, backing up data has become more complicated than simply saving everything to a series of external hard drives or DVDs. Without a well-considered data management strategy in place, data are at risk of being lost, and older data can be difficult to trace. Martone noted that she has heard senior scientists lament that they feel they have lost control of their own labs because they no longer know where their data are stored.

Some funding agencies, such as NSF, have mandated data management plans to ensure that data generated via agency grants are secure and easily shared. However, because the plans are not enforced, sharing has been stymied and a significant number of labs are still at risk of potential data loss, noted Martone. A change in the overall culture regarding data management, starting with trainees, will be the only effective means of ensuring widespread sharing and preventing data loss, she added.

Martone mentioned several opportunities to improve data management. For example, some labs use electronic laboratory notebooks to keep track of their data and to maintain digital records of experiment notes. Martone also noted that while most universities do not have centralized data repositories or support networks in place, many libraries have been serving as curators of the digital assets that the labs at their universities produce.
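The petabyte-per-year figure above is easier to appreciate with some back-of-envelope arithmetic. The drive capacity and link speed below are illustrative assumptions, not figures cited at the workshop:

```python
# Rough arithmetic for a lab producing 1 petabyte of data per year.
PB = 10**15  # bytes, decimal convention
TB = 10**12

yearly_bytes = 1 * PB

# Naive backup to 4 TB external hard drives:
drives_per_year = yearly_bytes / (4 * TB)

# Copying one year of data offsite over a dedicated 1 Gbit/s link:
link_bits_per_second = 10**9
transfer_days = yearly_bytes * 8 / link_bits_per_second / 86_400

print(f"{drives_per_year:.0f} drives/year, "
      f"{transfer_days:.0f} days of continuous transfer")
# -> 250 drives/year, 93 days of continuous transfer
```

Two hundred fifty drives a year, or three months of saturating a gigabit link, is why ad hoc backup breaks down at this scale and why deliberate data management strategies become necessary.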
Data curators will be essential to the neuroscience enterprise; however, there is currently a lack of training programs and defined career paths for them. Until the field of data curation becomes more formalized and more valued by universities, many data scientists will likely occupy a status in labs similar to that of research technicians employed directly by investigators, said Martone.
BOX 2-1
How Big Are Big Data?

How much neuroscience data are currently being collected is difficult to quantify, but some high-profile projects have estimated their output:

• Jeff Lichtman at Harvard University estimates his connectomics projects can generate 1 terabyte (TB) per day (or 365 TB/year), with a 1 cc brain tissue sample containing roughly 2,000 TB of data.a
• The Human Connectome Project, which plans to collect diffusion tensor imaging and resting state functional magnetic resonance imaging from 1,200 human subjects, is expected to generate more than 30 TB of data.b
• The Kavli Foundation estimates that a single advanced brain laboratory could produce 3,000 TB of data annually, roughly as much data as the world's largest and most complex science projects currently produce.c
• Calcium imaging studies in mice produce approximately 1 gigabit per second of data; anatomical datasets will readily grow to the approximately 10 petabyte scale and beyond.d

a http://www.quantamagazine.org/20131007-our-bodies-our-data (accessed October 29, 2014).
b http://www.humanconnectome.org/documentation/Q1/data-sizes.html (accessed October 29, 2014).
c http://www.kavlifoundation.org/science-spotlights/brain-initiative-survivingdata-deluge (accessed October 29, 2014).
d http://www.braininitiative.nih.gov/2025/BRAIN2025.pdf (accessed October 29, 2014).

Data Sharing

Of all the aspects of big data, data sharing was the most frequently discussed by the workshop speakers and participants. See Box 2-2 for recommendations and key points for academic institutions noted in Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk, a report by the IOM's Committee on Strategies for Responsible Sharing of Clinical Trial Data (IOM, 2015).
Although most scientists would agree that making data public helps push science forward, at the individual level there are reservations about sharing that revolve around control, trust, and fear. Several workshop participants noted that educating trainees
about the benefits and risks of sharing data can help to alleviate these emotional concerns and facilitate a shift in the culture around sharing. Control over one's data has always been sacrosanct in science. But as Landis pointed out, data sharing is "the wave of the future," and scientists will no longer be able to take their data to their graves. Trainees, she said, need to embrace the idea of making their data public. Akil brought up a potentially common anxiety among trainees, and scientists in general: that of immediately making public the data they spend months or even years of their lives collecting, only to watch their colleagues publish the initial articles related to those data (Soranno et al., 2014).

Litt suggested two potential mechanisms for creating a system that respects the rights of data collectors while maximizing the community's access to important or hard-to-acquire data. First is the idea of using data licenses to share data in stages or layers. Perhaps data can be initially shared among collaborators or a smaller group of scientists after a set period of time, and then later shared with the whole scientific community, noted Litt. The second idea is a sharing index, or S-index, akin to the well-known impact factor or the H-index.15 The S-index, which would need support from universities, funding agencies, and publishers, could reward prolific sharing by playing a role in hiring and promotion decisions as well as in grant review.

BOX 2-2
Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk

The Institute of Medicine convened an ad hoc committee to develop guiding principles and a framework for the responsible sharing of clinical trial data. Related recommendations and key points for trainees are listed below.
• Recommendation 1: Stakeholders in clinical trials should foster a culture in which data sharing is the expected norm, and should commit to responsible strategies aimed at maximizing the benefits, minimizing the risks, and overcoming the challenges of sharing clinical trial data for all parties.
• Recommendation 2: Sponsors and investigators should share the various types of clinical trial data no later than the times specified in this (IOM, 2015) report (e.g., the full analyzable dataset with metadata no later than 18 months after study completion, with specified exceptions for trials intended to support a regulatory

15 Index used to measure the impact and scientific importance of a researcher's publications.
application, and the analytic dataset supporting publication results no later than 6 months after publication).
• Recommendation 3: Holders of clinical trial data should mitigate the risks and enhance the benefits of sharing sensitive data by implementing operational strategies that include employing data use agreements, designating an independent review panel, including members of the lay public in governance, and making access to clinical trial data transparent.

Research Institutes and Universities

• Infrastructure Support: "High-quality data curation and management are required to prepare for data sharing, so that investigators must both recognize this need and have appropriately skilled personnel available to them.... Better overall support of the clinical trials enterprise within most institutions is needed to support the kinds of data structuring and documentation that will be needed for data sharing" (p. 62).
• Incentives: "Appropriate recognition of data sharing activities in the promotion process would provide incentives for sharing data and obtaining maximal value from completed trials. Other promotion-related incentives for data sharing would exist if promotion committees took into account secondary publications by others based on clinical trial data produced and shared by their faculty" (p. 63).
• Training: "Most of the workforce that would be involved in activities related to the sharing of clinical trial data are trained in universities. Currently, there is little or no training within traditional clinical research education in the procedures and structures needed to share data. The development of such modules, either online or in classroom settings, could be instrumental in helping to move the field of data sharing forward" (p. 63).

SOURCE: IOM, 2015.
Similar to the idea of an S-index, Martone proposed the notion of separate acknowledgment in papers for those scientists who originally collected the data upon which the paper was based. Headed by Martone, FORCE11, a community of scholars, librarians, archivists, publishers, and research funders seeking to improve data sharing, is actively trying to create a mechanism to issue such data citations.16 Several participants noted that such incentives might help to reduce the anxiety among trainees and investigators, and encourage data sharing.

16 See https://www.force11.org (accessed October 29, 2014).
A few participants noted that trust was another challenge with making data public. Several workshop participants noted that, much like clinical trial data, there is a moral imperative to share data to offer the greatest return to the public (see IOM, 2015). In addition, trainees need to learn how to effectively evaluate the trustworthiness of public data, as well as engender trust in data they themselves make public. Several participants stressed that without mechanisms in place to create trust, scientists will be reluctant to devote large amounts of time to analyzing shared data, or to put their reputations at risk by publishing papers about those analyses. Fear of scrutiny and criticism are additional concerns that Martone speculated might make some scientists reluctant to share data. Scientists may be afraid that errors found in the raw data they make public could lead to embarrassment or more serious repercussions.17 One way to alleviate such fears, she offered, is for a certain level of data etiquette to develop around sharing, so that unintentional errors found in data are dealt with in a non-punitive fashion.

Setting aside the various reservations scientists have about making their data public, Koroshetz, as well as several other participants, said that annotations, or metadata, are the most expensive and time-consuming part of sharing data. Most experimental data have several pieces of metadata associated with them, including stereotaxic coordinates, cell type, stimulation parameters, and other experimental conditions. Even seemingly innocuous factors, such as the sex of the experimenter or the source of the food, have been known to significantly alter results in experiments with rodents.
According to a few participants, tagging each set of electrophysiology traces or fluorescent images with the appropriate annotations is not trivial, but this information needs to be integrated into the experiment workflow to maximize the utility of any shared data. Another challenge with metadata noted by several workshop participants is determining which parameters need to be included and which can reasonably be excluded.

Several participants agreed that not all data are worthy of being shared, particularly given the potential cost; for example, Koroshetz noted the cost of NINDS's databases for traumatic brain injuries ($2 million/year), autism ($2 million/year), and Alzheimer's disease ($1.5 million/year). Some data, such as the human subject data from the Framingham Heart Study,18 are rare, while other data will become obsolete as technology continues to improve. In addition, Litt stated that public data deemed to be more valuable are also more likely to be annotated by users until they eventually become a gold standard. Sejnowski recounted an example from astrophysics with regard to creating high-quality public data. Grants for the Hubble Space Telescope are issued in two tracks: (1) a typical R01-style study where data collection leads to individual publications; and (2) the collection of archival datasets that require significant effort to calibrate, but that are still used extensively as standards against which to compare new data. Sejnowski noted that neuroscience would benefit if NIH funded similar types of calibrated datasets.

As Landis mentioned, it is not enough to hope that trainees will pick up enough knowledge about data handling through informal means; trainers need to take an active role in structuring programs for these competencies to be developed. Trainees can be exposed to data-handling issues in lab courses, seminar series, or webinar series (see example skills in Box 2-3). According to a few workshop participants, training programs can set requirements that students write data-management plans for their projects to accompany their thesis proposals or their Ruth L. Kirschstein National Research Service Award or NSF grant applications, which many programs already require as part of students' qualifying exams. In addition, training programs can consider having an expert on data handling on staff, or sharing such a person with one or more departments, to act as a resource for students and faculty.

17 The Research Council of Norway: Norwegian Researchers Want to Share Data but Fear Jeopardizing Their Career. See http://erc.europa.eu/sites/default/files/content/pages/pdf/2.4%20Roar%20Skalin.pdf (accessed October 29, 2014). Survey data in that report show that approximately one quarter of scientists say data sharing will negatively impact their careers due to at least one of the following reasons: making data available takes away valuable time for research; lack of technical infrastructure; open access would reduce possibilities of scientific publications; concerns connected to misinterpretation of data; and/or cannot give access due to sensitivity issues. Scientists with less than 3 years' experience reported fewer concerns over sharing data.
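As a concrete illustration of the annotation workflow discussed above, the sketch below writes experimental metadata as a JSON "sidecar" file alongside a recording. This is a minimal sketch, not any participant's actual system; all field names (subject_id, stereotaxic_coords_mm, and so on) are hypothetical examples, and a real laboratory would follow a community metadata standard for its data type.

```python
import json
import tempfile
from pathlib import Path

def write_sidecar(recording_path, **metadata):
    """Write experimental metadata as a JSON 'sidecar' next to a recording.

    Field names are hypothetical; real projects would follow a
    community metadata standard for their data type.
    """
    sidecar = Path(recording_path).with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar

# Annotate a (hypothetical) electrophysiology recording with the kinds
# of parameters mentioned in the text.
recording = Path(tempfile.gettempdir()) / "session01_trace.dat"
path = write_sidecar(
    recording,
    subject_id="mouse-042",
    subject_sex="F",  # even experimenter/animal details can alter results
    cell_type="pyramidal",
    stereotaxic_coords_mm={"AP": -1.8, "ML": 1.5, "DV": -1.2},
    stimulation={"amplitude_uA": 50, "frequency_hz": 10},
)
print(path.name)  # session01_trace.json
```

Writing the sidecar at acquisition time, rather than reconstructing annotations later, is one way to integrate annotation into the experiment workflow as the participants suggest.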
BOX 2-3
Example Data Handling Skills and Knowledge Presented by Individual Participants

• Data management plans (and funding agency requirements)
• Data-sharing platforms
• Incentives for sharing (data citation, S-index)
• Evaluation of data trustworthiness
• Evaluation of data worth
• Data licenses
• Data rights
• Data standardization
• Data formats
• Data annotation
• Open-access journals
• Actionable versus static data
• Application program interfaces
• Web scraping
• Web services
• Online databases
• Cloud computing
• Data storage

NOTE: The items in this list were addressed by individual participants and were identified and summarized for this report by the rapporteurs. This is not intended to reflect a consensus among workshop participants.

18 See https://www.framinghamheartstudy.org/about-fhs/history.php (accessed October 29, 2014).

Defining the Gaps in Data Analysis

Although all neuroscientist trainees would benefit from training in best practices for data literacy, management, and sharing, a number of special skills are required to analyze large, complex datasets. Litt enumerated those skills and identified the best disciplines outside of neuroscience with which to build collaborations to address gaps in data analysis (see Box 2-4). Litt also described two projects he is involved with that strive to enhance training in data analysis among graduate students:

1. American Epilepsy Society Seizure Prediction Competition: Competitors are invited to download large datasets of intracranial electroencephalogram (EEG) recordings from dogs with epilepsy and develop algorithms to optimally predict seizure onset.
2. www.ieeg.org: Litt engineered a model browser-based data-handling platform, found at ieeg.org, that lives on Amazon's S3 cloud computing service. The platform, currently used by more than 500 people, enables sharing and annotation of computer code and EEG data from epilepsy patients. It also provides tools for large-scale analyses. Trainees at the University of Pennsylvania's Center for Neuroengineering and Therapeutics are required to use this platform, Litt noted. They learn how to version their code, share data (structuring it into a common format), and use the cloud.
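A common first step in seizure-prediction pipelines like those described above is extracting frequency-domain features from short EEG windows. The sketch below computes mean power in the standard clinical EEG bands using NumPy's FFT. It is illustrative only, not the competition's or ieeg.org's actual code; the band boundaries are conventional clinical ranges, and the test signal is synthetic.

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean power of a 1-D signal within a (lo, hi) frequency band in Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    return power[mask].mean()

def eeg_features(window, fs=256):
    """Feature vector: mean power in standard clinical EEG bands."""
    bands = {"delta": (0.5, 4), "theta": (4, 8),
             "alpha": (8, 13), "beta": (13, 30)}
    return {name: band_power(window, fs, b) for name, b in bands.items()}

# Synthetic example: 2 seconds of a 10 Hz (alpha-band) oscillation
# plus a small amount of noise, sampled at 256 Hz.
fs = 256
t = np.arange(fs * 2) / fs
rng = np.random.default_rng(0)
window = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
feats = eeg_features(window, fs)
print(max(feats, key=feats.get))  # alpha
```

In a full pipeline, feature vectors like this one, computed per channel and per window, would feed a classifier trained to distinguish preictal (pre-seizure) from interictal segments.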
BOX 2-4
Data Analysis: Relevant Skills for Neuroscientists and Disciplines to Build Collaborations

Data Analysis Skills Relevant to Neuroscientists
• Matlab [a]
• R [b]
• Python [c]
• Apache [d]
• Hadoop [e]
• Visualization software
• Multivariate statistical analysis
• Competence in cloud computing (storage, retrieval, and distributed processing)
• Versioning of computer code script files
• Digital signal processing (aliasing, Nyquist, analog-to-digital transforms, filtering)
• Feature extraction (time, frequency, wavelet, chaotic)
• Data classifiers (supervised and unsupervised)
• Regression
• K-nearest neighbor algorithm [f]
• Support vector machines
• Data clustering
• Data basics (storage, databasing, integration, search, provenance)

Disciplines with Which to Collaborate on Data Analyses
• Computer science
• Machine learning
• Engineering
• Signal processing
• Materials science
• Nanotechnology

[a] http://www.mathworks.com/products/matlab (accessed October 29, 2014).
[b] http://www.r-project.org (accessed October 29, 2014).
[c] https://www.python.org (accessed October 29, 2014).
[d] http://www.apache.org (accessed October 29, 2014).
[e] http://hadoop.apache.org (accessed October 29, 2014).
[f] http://www.statsoft.com/textbook/k-nearest-neighbors (accessed October 29, 2014).

SOURCE: Brian Litt presentation, University of Pennsylvania, October 28, 2014.
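Several of the classifiers listed in Box 2-4 are simple enough to demonstrate in a few lines. Below is a minimal k-nearest neighbor classifier in pure Python (Euclidean distance, majority vote). It is a teaching sketch, not production code; the two-feature dataset and the "interictal"/"preictal" labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two invented features (e.g., two spectral-power values)
# for EEG windows labeled interictal (between seizures) vs. preictal.
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),   # interictal cluster
           (0.90, 0.80), (0.80, 0.95), (0.85, 0.70)]   # preictal cluster
train_y = ["interictal"] * 3 + ["preictal"] * 3

print(knn_predict(train_X, train_y, (0.12, 0.18)))  # interictal
print(knn_predict(train_X, train_y, (0.88, 0.82)))  # preictal
```

In practice, trainees would use library implementations with proper cross-validation and feature scaling, but working through the algorithm once by hand clarifies what those library calls are doing.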
Institute Example: Allen Institute for Brain Science

Jane Roskams, executive director of strategy and alliances at the Allen Institute for Brain Science (AIBS), described the goal of AIBS as making neuroscience tools, data, and knowledge readily and freely available to the scientific community. AIBS employs multidisciplinary teams with experts in neuroscience, cell biology, modeling, data analysis, theory, engineering, and genetics. Over the past 10 years, AIBS has collected more than 30 brain atlases (mouse, non-human primate, and human) and other large neuroscience-related databases. These atlases and databases, which contain more than three terabytes of combined data, are freely available to the public via an online portal. AIBS offers numerous opportunities for collaboration and training related to data management and analysis through traditional classroom training sessions, summer workshops, hackathons, and online webinars.19

19 See http://alleninstitute.org/news-events/events-training (accessed October 29, 2014).