3 Automated Research Workflows in Action
Pages 37-62



From page 37...
... In some cases, the implementation of ARWs is fairly advanced, whereas in others only certain components or aspects of ARWs -- such as advanced computation, use of workflow management systems and notebooks, laboratory automation, and use of AI as a workflow component as well as in directing the "outer loop" of the research process -- are currently being utilized. The purpose of examining these specific areas of research was not to develop a comprehensive census of projects and initiatives, and these examples do not represent a complete picture of relevant work in general or in the disciplines that are represented.
From page 38...
... TABLE 3-1 Continued

(row continued from page 37)
  Opportunities: ... broader use of cyberinfrastructure approaches, such as simulation-based inference, exists
  Challenges and barriers:
    • ... data outputs needs to be rethought
    • Reuse of workflows needs to be encouraged
    • Incentives and culture around data sharing vary by subfield

Materials science
  General use case characterization: Approaches are appearing that integrate robotic laboratory instruments, rapid characterization, and AI.
  Opportunities: Opportunities exist to link workflows and implement closed-loop systems.
  Challenges and barriers:
    • Ability of humans to make unexpected observations needs to be preserved
    • Data sharing and access are inadequate
    • Shortage of researchers who can bridge the gap between experimentation and software development exists
    • Lack of a supportive culture within the community translates into weak incentives

Biology
  General use case characterization: Biomedical research has been overturning reductionist paradigms as they have lost predictive power, paving the way for empirical data to guide discovery.
  Opportunities: Potential for drug discovery approaches using automated experiments and AI is growing.
  Challenges and barriers:
    • Shortage of experts who can bridge the gap between disciplinary and lab automation expertise exists
    • Capability for real-time data sharing is needed

Biochemistry
  General use case characterization: Tighter coupling between data science and chemical synthesis can accelerate optimization in drug discovery.
  Opportunities: Automation of high-throughput synthesis and screening can accelerate the design–make–test cycle.
  Challenges and barriers:
    • Nonproprietary data are insufficiently characterized, so labs making advances in this area have to generate their own
    • Cultural barriers exist in the field (e.g., chemical synthesis has been an artisan process)
From page 39...
... TABLE 3-1 Continued

Epidemiology
  General use case characterization: COVID-19 experience has catalyzed initiatives to implement workflows and improve treatments.
  Opportunities: New data resources could be used to better understand disease interactions.
  Challenges and barriers:
    • Data that are not generated through a randomized controlled trial might be biased
    • Confirming the quality and provenance of clinical data is difficult

Climate science
  General use case characterization: Improving climate models partly depends on the ability to understand and simulate small-scale processes.
  Opportunities: ARWs hold the promise of rapidly improving the accuracy of model predictions.
  Challenges and barriers:
    • Climate sciences and weather data have traditionally been open, but wider use of commercial data obtained through restrictive licenses is a threat

Wildfire detection
  General use case characterization: Interdisciplinary field combines modeling, remote sensing, data science, and other fields.
  Opportunities: New tools and capabilities using AI can support decision making by public- and private-sector users.
  Challenges and barriers:
    • Development of an integrated environment is needed to pull data from various resource monitoring tools and convert them into predictive intelligence

Digital humanities
  General use case characterization: Increasing use of computational tools and large data sets is expanding the sorts of research questions that can be addressed.
  Opportunities: New digital data and AI tools can be used to analyze big data (e.g., Latin texts).
  Challenges and barriers:
    • Human in the loop is needed to provide training material for AI
    • Gap exists between traditional research and new tools needed by next-generation scholars

Social and behavioral sciences
  General use case characterization: Access to large amounts of high-quality data is available at relatively low cost, transforming a number of fields.
  Opportunities: Real-time access to data and analysis can deliver actionable information.
  Challenges and barriers:
    • Growing size and dispersed nature of data sets and need to ensure integrity of personal information pose challenges
    • Improving metadata is necessary for robust findings and reuse of data
From page 40...
... target selection, but also to close the loop between data acquisition and selecting the next target that is optimally informative given the observational constraints and scientific objectives. Astronomical surveys, such as the Large Synoptic Survey Telescope, Palomar Transient Factory, Catalina Real-Time Transient Survey, and Zwicky Transient Facility, have demonstrated the effectiveness of machine learning (ML)
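The "closing the loop" described above is essentially an active learning cycle: a model scores candidate targets by how informative one more observation would be, the most informative target that satisfies the observational constraints is scheduled, and the model is updated with the new data. The sketch below illustrates that cycle in Python; the candidate catalog, the entropy-based information_gain score, the observable constraint check, and the crude update step are illustrative assumptions, not any survey's actual pipeline.

```python
import math
import random

# Hypothetical catalog of candidate targets: each has a sky altitude and the
# model's current probability that it is a transient of interest.
random.seed(0)
candidates = [
    {"target_id": i, "altitude_deg": random.uniform(10, 80),
     "p_transient": random.random()}
    for i in range(200)
]

def information_gain(p):
    """Expected information gain of one more observation, approximated here by
    the entropy of the current class probability (largest when p is near 0.5)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def observable(target, min_altitude_deg=30.0):
    """Stand-in for observational constraints (visibility, weather, slew time)."""
    return target["altitude_deg"] >= min_altitude_deg

def select_next_target(candidates):
    """Pick the observable candidate whose observation is most informative."""
    visible = [t for t in candidates if observable(t)]
    return max(visible, key=lambda t: information_gain(t["p_transient"]))

# A few passes of the outer loop: choose a target, acquire data, update the model.
for step in range(3):
    target = select_next_target(candidates)
    print(f"step {step}: observe target {target['target_id']} "
          f"(p={target['p_transient']:.2f})")
    # A real workflow would ingest the new observation and retrain or update the
    # classifier here; we simply mark the target as resolved and continue.
    target["p_transient"] = round(target["p_transient"])
```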
From page 41...
... . With regard to using automated workflows and data science tools, Szalay (2020)
From page 42...
... Funding agencies could become proactive rather than reacting with a 10-year time delay. Although more are needed, he pointed to an increasing number of trusted, archival data repositories that have developed succession plans for stewardship of their data beyond the life of the specific repository and in some cases are certified using the CoreTrustSeal.2

Particle Physics

Modern particle physics involves large collaborations of up to 10,000 researchers structured around very expensive instruments such as the CERN Large Hadron Collider (LHC)
From page 43...
... However, there are barriers to progress in advancing active learning approaches that apply workflows to physics discovery. For example, sharing data can power these new approaches, but attitudes about data sharing vary widely across subfields (Nature Physics, 2019)
From page 44...
... Source data in materials research generally are not shared. This makes it difficult for the community to develop larger-scale shared data resources, as opposed to individual labs relying mainly on the data that they generate themselves.
From page 45...
... The Clean Energy Materials Innovation Challenge Expert Workshop in 2017 highlighted the need to develop materials discovery acceleration platforms that integrate automated robotic machinery with rapid characterization and AI (Aspuru-Guzik and Persson, 2018)
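Such an acceleration platform is typically described as a closed loop: an AI component proposes the next candidate composition or synthesis condition, robotic instruments make and rapidly characterize the sample, and the measurement is fed back before the next proposal. The toy sketch below illustrates that loop on a one-dimensional composition parameter; measure_property, propose_next, the parameter grid, and the simple explore/exploit rule are invented for illustration and do not describe any particular platform.

```python
import random

random.seed(1)

def measure_property(x):
    """Stand-in for robotic synthesis plus rapid characterization of a sample
    with composition parameter x (e.g., a dopant fraction). In a real platform
    this is the slow, expensive laboratory step."""
    true_optimum = 0.62
    return 1.0 - (x - true_optimum) ** 2 + random.gauss(0.0, 0.02)

def propose_next(history, grid):
    """AI step: propose the next composition to try. Usually refine near the
    best result so far; occasionally sample an untried region."""
    if not history or random.random() < 0.2:
        return random.choice(grid)          # explore
    best_x, _ = max(history, key=lambda h: h[1])
    near = [x for x in grid if abs(x - best_x) <= 0.1]
    return random.choice(near)              # exploit near the current best

grid = [i / 100 for i in range(101)]        # candidate compositions 0.00..1.00
history = []                                # (composition, measured property)

for cycle in range(20):                     # closed design-make-test-learn loop
    x = propose_next(history, grid)
    y = measure_property(x)
    history.append((x, y))

best = max(history, key=lambda h: h[1])
print(f"best composition after 20 cycles: x={best[0]:.2f}, property={best[1]:.3f}")
```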
From page 46...
... Although several federal government initiatives, such as the Materials Genome Initiative in 2011 and the Materials Science and Engineering Data Challenge in 2015, have been implemented to encourage the use of publicly available data to model or discover new material properties, additional open data policies are needed to accelerate discovery. There is a need to develop multidisciplinary international teams of scientists and engineers with expertise in chemistry, materials science, advanced computing, robotics, AI, and other relevant disciplines (Aspuru-Guzik and Persson, 2018; Persson, 2020a)
From page 47...
... Real-time primary data sharing from individual labs, and the infrastructure to do so, are also needed. National research resources that execute particular types of automated experiments on request -- analogous to what exists in astronomy and physics -- are also needed.
From page 48...
... . A tighter coupling between chemical synthesis and data science would accelerate this process.
From page 49...
... In this respect, it is important to recognize that the appearance of COVID-19 coincides with the rise of cheap ubiquitous sensors and the big data revolution, such that actions are increasingly
From page 50...
... For instance, in the context of clinical care, the gold standard by which care routines, interventions, and treatments are assessed is the randomized clinical trial, which reduces sample bias. Yet the data currently being collected in the clinical domain with respect to COVID-19 are not the outcome of a randomized clinical trial and are thus subject to certain biases.
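The concern can be made concrete with a small simulation: when patients enter the clinical record because they are severely ill (or because they were treated), a naive comparison of treated and untreated outcomes recovers a different number than randomization would. The recovery scores, effect size, and selection rule below are invented solely to illustrate the mechanism, not to describe any actual COVID-19 data set.

```python
import random

random.seed(2)

TRUE_EFFECT = 0.10   # the treatment truly improves the recovery score by 0.10

def simulate_patient(treated):
    severity = random.random()                      # 0 = mild, 1 = severe
    recovery = 0.8 - 0.5 * severity + (TRUE_EFFECT if treated else 0.0)
    return severity, recovery

population = [(t, *simulate_patient(t)) for t in [True, False] * 5000]

def estimated_effect(rows):
    """Mean recovery of treated minus mean recovery of untreated patients."""
    treated = [r for t, _, r in rows if t]
    control = [r for t, _, r in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Randomized-trial analogue: every enrolled patient is observed.
print(f"randomized estimate:       {estimated_effect(population):.3f}")

# Clinical-records analogue: treated patients are almost always in the record
# system, but untreated patients appear mainly when they are severely ill.
records = [(t, s, r) for t, s, r in population
           if random.random() < (0.9 if t else s)]
print(f"clinical-records estimate: {estimated_effect(records):.3f}")
print(f"true effect:               {TRUE_EFFECT:.3f}")
```

Because the untreated patients who make it into the records are disproportionately severe, the clinical-records estimate overstates the true effect even though every individual record is accurate.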
From page 51...
... Rather, it provides an illustration of what can go wrong when a system is hastily erected. As we work toward the development of automated methods for pandemic detection and subsequent mitigation, it will be critical to ensure that the data have clear provenance, that the workflows and analytics conducted within them are verifiable, and that plans for data sharing, access, and accountability for abuse and misuse of the data, as well as findings from such workflows, are established from the outset.
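One lightweight way to provide the verifiability described above is to emit a provenance record with every workflow step: content hashes of the inputs, the parameters, the code version, and a timestamp, so that downstream users can check exactly what produced a given result. The sketch below shows a minimal, hypothetical record format; the file names, the provenance_record helper, and the JSON layout are assumptions, not an existing standard.

```python
import hashlib
import json
from datetime import datetime, timezone

def file_digest(path):
    """SHA-256 of a file's contents, pinning exactly which data went in."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def provenance_record(step_name, input_paths, parameters, code_version):
    """Minimal provenance entry written alongside a workflow step's output."""
    return {
        "step": step_name,
        "inputs": {path: file_digest(path) for path in input_paths},
        "parameters": parameters,
        "code_version": code_version,
        "run_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Create a tiny placeholder input so the example runs end to end; in a
    # real workflow this would be the raw surveillance or clinical extract.
    with open("raw_case_reports.csv", "w") as f:
        f.write("date,cases\n2020-04-01,12\n2020-04-02,15\n")

    record = provenance_record(
        step_name="aggregate_case_counts",          # hypothetical workflow step
        input_paths=["raw_case_reports.csv"],
        parameters={"window_days": 7},
        code_version="aggregate.py@1.3.0",          # hypothetical version tag
    )
    with open("aggregate_case_counts.provenance.json", "w") as f:
        json.dump(record, f, indent=2)
    print(json.dumps(record, indent=2))
```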
From page 52...
... The acceleration in the rate of improvement of climate models has the potential to lead to a qualitative leap in the accuracy of climate projections. Unlike in other fields, climate sciences and weather forecasting data have been open, accessible, and widely shared worldwide, going back to global data-sharing frameworks developed in the 1950s.
From page 53...
... Wildfire Detection

There is an urgent need for better modeling of a range of environmental hazards, including societal impacts, and innovative approaches combining advanced computing, remote sensing, data science, and the social sciences hold the promise of mitigating hazards and improving responses. In pursuing these advances, "there are still challenges and opportunities in integration of the scientific discoveries and data-driven methods for detecting hazards with the advances in technology and computing in a way that provides and enables different modalities of sensing and computing" (Altintas, 2019)
From page 54...
... The WIFIRE project is used for data-driven knowledge and decision support by a wide range of public- and private-sector users for scientific, municipal, and educational purposes. The integrating factor in WIFIRE is the use of scientific workflow engines as a part of the cyberinfrastructure to bring together steps involving AI techniques on data from networked observations (e.g., heterogeneous satellite data and real-time remote sensor data) and computational techniques in signal processing, visualization, fire simulation, and data assimilation
From page 55...
... within the WIFIRE cyberinfrastructure. WIFIRE's dynamic data-driven fire modeling workflows depend on continuous adjustment of fire modeling ensembles using observations on fire perimeters generated from imagery captured by a variety of data sources including ground-based cameras, satellites, and aircraft.
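The continuous ensemble adjustment described here can be viewed as a simple assimilation step: each ensemble member forecasts the fire front, members are scored against the perimeter actually observed from cameras, satellites, or aircraft, and poorly matching members are down-weighted or resampled before the next forecast. The sketch below demonstrates the idea with a one-dimensional spread-rate stand-in; the spread model, Gaussian weighting, and resampling rule are illustrative assumptions, not WIFIRE's actual algorithms.

```python
import math
import random

random.seed(3)

TRUE_SPREAD_RATE = 1.7    # the "real" fire spread rate (km/h) to be recovered

def forecast_front(spread_rate, hours):
    """Toy forecast: distance the fire front has advanced after `hours`."""
    return spread_rate * hours

def observe_front(hours):
    """Front position derived from imagery, with measurement noise."""
    return TRUE_SPREAD_RATE * hours + random.gauss(0.0, 0.2)

# Ensemble of candidate spread rates (the uncertain model parameter).
ensemble = [random.uniform(0.5, 3.0) for _ in range(200)]

for hours in (1, 2, 3, 4):
    observed = observe_front(hours)

    # Weight each member by how well its forecast matches the observation.
    weights = [math.exp(-0.5 * ((forecast_front(rate, hours) - observed) / 0.2) ** 2)
               for rate in ensemble]

    # Resample members in proportion to their weights, then jitter slightly
    # so the ensemble does not collapse onto a single value.
    ensemble = random.choices(ensemble, weights=weights, k=len(ensemble))
    ensemble = [rate + random.gauss(0.0, 0.02) for rate in ensemble]

    mean_rate = sum(ensemble) / len(ensemble)
    print(f"after {hours} h: estimated spread rate = {mean_rate:.2f} km/h")
```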
From page 56...
... DIGITAL HUMANITIES

Digital humanities make use of computational tools to conduct textual search, visual analytics, data mining, statistics, and natural language processing (Biemann et al., 2014)
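As a minimal illustration of the computational text analysis listed above, the snippet below tokenizes a short Latin sentence and counts word frequencies; the passage and the crude regular-expression tokenizer stand in for the far larger corpora and NLP pipelines used in actual digital humanities projects.

```python
import re
from collections import Counter

# A short, well-known Latin sentence (the opening of Caesar's De Bello Gallico),
# used here purely as placeholder text.
text = (
    "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, "
    "aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur."
)

# Crude tokenization: lowercase the text and keep alphabetic runs only.
tokens = re.findall(r"[a-z]+", text.lower())

frequencies = Counter(tokens)
for word, count in frequencies.most_common(5):
    print(f"{word}: {count}")
```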
From page 57...
... It now comprises 10 professional societies worldwide, including the Association for Computers and the Humanities, based in the United States, and the European Association for Digital Humanities, among others. Its mission is to promote and support "digital research and teaching across all arts and humanities disciplines, acting as a community-based advisory force, and supporting excellence in research, publication, collaboration, and training." ADHO members publish peer-reviewed journals (such as DSH: Digital Scholarship in the Humanities)
From page 58...
... At the same time, researchers in the social and behavioral sciences face some of the same barriers to the advance of ARWs as those faced in other domains, as well as some distinct issues. Traditionally, researchers in the social and behavioral sciences have mainly worked with small or medium-size data sets, such as survey data collected by researchers themselves, or data generated by government statistical agencies on a scale allowing their downloading and analysis by the researchers themselves using standard statistical software (Turner and Lambert, 2014)
From page 59...
... . Data sits at the core of what federal agencies, and state and local agencies, are asked to do." The following examples illustrate how new data resources and advanced analytics are being applied in the social and behavioral sciences.
From page 60...
... As in other disciplines, the amount of data becoming available to social and behavioral scientists presents challenges related to the size of the data sets, as well as ensuring the integrity of identifiable personal information, reproducibility of results, and archiving. According to Lane (2020)
From page 61...
... The social and behavioral sciences have several existing practices and institutions that can help facilitate the development and implementation of ARWs. For example, there are established organizations charged with data stewardship and related training such as the Inter-university Consortium for Political and Social Research and the National Opinion Research Center.

