3 Automated Research Workflows in Action
Pages 57-96

From page 57...
... In some cases, the implementation of ARWs is fairly advanced, whereas in others only certain components or aspects of ARWs -- such as advanced computation, use of workflow management systems and notebooks, laboratory automation, and use of AI as a workflow component as well as in directing the "outer loop" of the research process -- are currently in use. The purpose of examining these specific areas of research was not to develop a comprehensive census of projects and initiatives, and these examples do not provide a complete picture of relevant work, either in general or within the disciplines represented.
From page 58...
... (first table row continued from the preceding page; recoverable fragments: cyberinfrastructure is creating the potential for expanding ARWs; stewardship of existing data; new data infrastructure and services are needed to provide access to FAIR data)

Particle physics
  Status: Established history of using big data; sophisticated cyberinfrastructure exists.
  Opportunities: New opportunities exist for artificial intelligence (AI) approaches, such as simulation-based inference.
  Barriers:
    • Planning and construction of large facilities needs to be rethought to enable broader use of data outputs
    • Reuse of workflows needs to be encouraged
    • Incentives and culture around data sharing vary by subfield

Materials science
  Status: Approaches are appearing that integrate robotic laboratory instruments, rapid characterization, and AI.
  Opportunities: Opportunities exist to link workflows and implement closed-loop systems.
  Barriers:
    • Ability of humans to make unexpected observations needs to be preserved
    • Data sharing and access are inadequate
    • Shortage of researchers who can bridge the gap between experimentation and software development exists
    • Lack of a supportive culture within the community translates into weak incentives
From page 59...
... Biology
  Status: Biomedical research has been overturning reductionist paradigms as they have lost predictive power, paving the way for empirical data to guide discovery.
  Opportunities: Potential for drug discovery approaches using automated experiments and AI is growing.
  Barriers:
    • Shortage of experts who can bridge the gap between disciplinary and lab automation expertise exists
    • Capability for real-time data sharing is needed

Biochemistry
  Status: Tighter coupling between data science and chemical synthesis can accelerate optimization in drug discovery.
  Opportunities: Automation of high-throughput synthesis and screening can accelerate the design–make–test cycle.
  Barriers:
    • Nonproprietary data are insufficiently characterized, so labs making advances in this area have to generate their own
    • Cultural barriers exist in the field (e.g., chemical synthesis has been an artisan process)

Epidemiology
  Status: COVID-19 experience has catalyzed initiatives to implement workflows.
  Opportunities: New data resources could be used to better understand disease interactions and improve treatments.
  Barriers:
    • Data that are not generated through a randomized controlled trial might be biased
    • Confirming the quality and provenance of clinical data is difficult

Climate science
  Status: Improving climate models partly depends on the ability to understand and simulate small-scale processes.
  Opportunities: ARWs hold the promise of rapidly improving the accuracy of model predictions.
  Barriers:
    • Climate sciences and weather data have traditionally been open, but wider use of commercial data obtained through restrictive licenses is a threat

Wildfire detection
  Status: Interdisciplinary field combines modeling, remote sensing, data science, and other fields.
  Opportunities: New tools and capabilities using AI can support decision making by public- and private-sector users.
  Barriers:
    • Development of an integrated environment is needed to pull data from various resource monitoring tools and convert them into predictive intelligence
From page 60...
... Astronomical surveys, such as the Large Synoptic Survey Telescope, Palomar Transient Factory, Catalina Real-Time Transient Survey, and Zwicky Transient Facility, have demonstrated the effectiveness of machine learning (ML) for extracting knowledge from astronomical data sets and streams (Juric et al., 2019)
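As a rough illustration of the kind of ML step such surveys embed in their alert pipelines, the sketch below trains a classifier to separate real transient detections from image artifacts. The feature names, synthetic training data, and thresholds are invented for illustration; they are not drawn from any of the surveys named above.

```python
# Minimal sketch: classify survey alerts as real transients versus artifacts.
# The per-detection features and synthetic training data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 2000

# Hypothetical features: flux change, source elongation, distance to nearest
# known object, and image-subtraction residual.
real = rng.normal([5.0, 1.1, 0.5, 0.2], [1.5, 0.1, 0.4, 0.1], size=(n, 4))
bogus = rng.normal([1.0, 1.6, 2.0, 0.8], [1.0, 0.3, 1.0, 0.3], size=(n, 4))
X = np.vstack([real, bogus])
y = np.array([1] * n + [0] * n)          # 1 = real transient, 0 = artifact

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

In a production alert stream a step like this runs on every nightly detection and feeds downstream follow-up decisions; here synthetic Gaussians stand in for features extracted from difference images.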
From page 61...
... There is a need for more relevant data (rather than simply larger amounts of data) and for dramatically improved experimental design when using AI to conduct large-scale scientific experiments.
From page 62...
... SOURCE: Szalay, 2020. There is also a need for active curated services to enable scalable data access and analysis.
From page 63...
... stated that it would be useful to consider active learning for experimental design from planning to execution, the use of AI in analyses with explainable inference, and automated workflows for rapid follow-up of transients. Barriers to progress include the challenge of ensuring steady, long-term support for preserving irreplaceable data.
From page 64...
... A technical solution enabling collaborative statistical modeling was developed, and teams were able to combine their data to estimate the probability of generating the actual experimental results given their preexisting theoretical assumptions. This approach allowed for rapid confirmation of the existence of the Higgs boson and publication of results.
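The statistical idea described here, computing the probability of the observed results given prior theoretical assumptions, can be illustrated with a toy likelihood combination across teams. This sketch is a simplified stand-in, not the actual statistical procedure used in the Higgs analyses, and all counts and yields are invented.

```python
# Toy sketch: each team reports an observed event count plus expected
# background and signal yields; a joint Poisson likelihood compares the
# background-only hypothesis with signal + background. Numbers are invented.
from math import lgamma, log

def poisson_loglike(observed, expected):
    """Log-probability of `observed` counts under a Poisson mean `expected`."""
    return observed * log(expected) - expected - lgamma(observed + 1)

# (observed count, expected background, expected signal) for each team
teams = [(28, 20.0, 6.0), (35, 25.0, 8.0), (17, 12.0, 4.0)]

ll_bkg = sum(poisson_loglike(n, b) for n, b, s in teams)
ll_sig = sum(poisson_loglike(n, b + s) for n, b, s in teams)

# Positive values favor the signal + background hypothesis.
print(f"combined log-likelihood ratio = {ll_sig - ll_bkg:.2f}")
```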
From page 65...
... However, there are barriers to progress in advancing active learning approaches that apply workflows to physics discovery. For example, sharing data can power these new ...
From page 66...
... This will catalyze a transition from an Edisonian approach to scientific discovery to an era of inverse design, where the desired property drives the rapid exploration, with the aid of advanced computing and AI, of materials design space and the synthesis of targeted materials.
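A minimal sketch of what an inverse-design loop can look like is shown below: a surrogate model repeatedly proposes the candidate composition whose predicted property is closest to a target, that candidate is "measured," and the surrogate is refit. The property function, the one-dimensional composition space, and the surrogate choice are illustrative assumptions, not a description of any specific materials platform.

```python
# Minimal inverse-design loop: search a 1-D "composition" space for a material
# whose simulated property matches a target value. The property function is a
# stand-in for an expensive simulation or an automated synthesis + measurement.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def measure_property(x):
    """Stand-in for a costly property evaluation."""
    return np.sin(3.0 * x) + 0.5 * x

target = 0.8
candidates = np.linspace(0.0, 2.0, 201).reshape(-1, 1)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(3, 1))          # a few initial measurements
y = measure_property(X).ravel()

for _ in range(10):
    # Small jitter (alpha) keeps the fit stable if a point is revisited.
    surrogate = GaussianProcessRegressor(alpha=1e-6).fit(X, y)
    pred = surrogate.predict(candidates)
    x_next = candidates[np.argmin(np.abs(pred - target))]   # exploit the surrogate
    X = np.vstack([X, [x_next]])
    y = np.append(y, measure_property(x_next))

best = X[np.argmin(np.abs(y - target))][0]
print(f"composition {best:.3f} gives property {measure_property(best):.3f} (target {target})")
```

A real platform would typically add an exploration term (for example, the surrogate's predictive uncertainty) rather than exploiting the surrogate's mean prediction alone.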
From page 67...
... Source data in materials research generally are not shared. This makes it difficult for the community to develop larger-scale shared data resources, as opposed to individual labs relying ...
From page 68...
... Another factor that limits data sharing and reuse is that many instruments used in materials research, such as electron microscopes, generate data only in the proprietary formats unique to each manufacturer. Regarding human resource needs, there is a gap between the scientific questions being asked and the questions that the data can answer.
From page 69...
... The Clean Energy Materials Innovation Challenge Expert Workshop in 2017 highlighted the need to develop materials discovery acceleration platforms that integrate automated robotic machinery with rapid characterization and AI (Aspuru-Guzik and Persson, 2018)
From page 70...
... , closed-loop experimental systems relying on automation and AI can advance experimental biomedical research. The last several decades have overturned reductionist paradigms in biology, challenging the belief that the functioning of living organisms can best be understood by breaking them down into systems and components that operate according to fixed rules.
From page 71...
... Given the inherent complexity, the absence of fixed rules, and the inability to characterize and measure the effect of every biological change on every variable, complete understanding through human reasoning alone is not feasible. Thus a new approach is needed in which empirical data are gathered through automated experiments selected by AI.
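A minimal sketch of such a loop, under stated assumptions, follows: a simple classifier picks the untested condition it is least certain about, a simulated "robot" runs that experiment, and the model is retrained on the result. The run_experiment function, the two-dimensional condition space, and the uncertainty rule are hypothetical placeholders for a real automated laboratory.

```python
# Minimal closed-loop sketch: an AI model chooses the next experiment by
# uncertainty, a simulated robot runs it, and the model is retrained.
import numpy as np
from sklearn.linear_model import LogisticRegression

def run_experiment(condition):
    """Stand-in for an automated experiment: returns growth (1) or no growth (0)."""
    return int(condition[0] + 0.5 * condition[1] > 1.0)

rng = np.random.default_rng(0)
pool = rng.uniform(0.0, 2.0, size=(200, 2))      # candidate experimental conditions
untested = np.ones(len(pool), dtype=bool)

# Seed with two hand-picked conditions so both outcomes are represented.
X_done = np.array([[0.1, 0.1], [1.9, 1.9]])
y_done = np.array([run_experiment(c) for c in X_done])

for _ in range(20):
    model = LogisticRegression().fit(X_done, y_done)
    probs = model.predict_proba(pool)[:, 1]
    score = np.abs(probs - 0.5)                  # small = model is uncertain
    score[~untested] = np.inf                    # never repeat an experiment
    idx = int(np.argmin(score))
    untested[idx] = False
    X_done = np.vstack([X_done, pool[idx]])
    y_done = np.append(y_done, run_experiment(pool[idx]))

truth = np.array([run_experiment(c) for c in pool])
print(f"ran {len(y_done)} experiments; accuracy over all conditions: {model.score(pool, truth):.2f}")
```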
From page 72...
... Real-time sharing of primary data from individual labs, and the infrastructure to support it, are also needed. National research resources that execute particular types of automated experiments on request -- analogous to what exists in astronomy and physics -- are needed as well.
From page 73...
... Biochemistry

Chemical synthesis is a two-century-old empirical science and is the bottleneck in the optimization of drug discovery, a process that typically takes years (Cernak, 2020)
From page 74...
... Manufacturing, in particular, is a critical bottleneck in this cycle. Researchers are working to accelerate chemical synthesis through automated reaction equipment and in-line/in situ analysis tools.
From page 75...
... HEALTH AND ENVIRONMENT

Epidemiology

The COVID-19 pandemic has touched all aspects of society. It has also created opportunities to assess the capabilities of modern scientific workflows and to innovate ... [Footnote 6: See https://ccas.nd.edu/.]
From page 76...
... Combining modern workflow systems with large amounts of personal data drawn from a wide variety of domains is expected to yield breakthroughs in public health policy and assessment, rapid refinement of guidelines for clinical care, repurposing of known drugs for treatment, and crafting of novel vaccines. Many examples of rapidly organized scientific endeavors associated with COVID-19 have emerged.
From page 77...
... For instance, in the context of clinical care, the gold standard by which care routines, interventions, and treatments are assessed is the randomized clinical trial, which reduces sample bias. Yet the data currently being collected in the clinical domain with respect to COVID-19 are not the outcome of a randomized clinical trial and are thus subject to certain biases.
From page 78...
... As we work toward the development of automated methods for pandemic detection and subsequent mitigation, it will be critical to ensure that the data have clear provenance, that the workflows and analytics conducted within them are verifiable, and that plans for data sharing, access, and accountability for abuse and misuse of the data, as well as findings from such workflows, are established from the outset. In this respect, it is critical to continue to create common data models, promote approaches to mitigate bias in data collection and analysis, and support the infrastructure necessary to enable -- if not in real time, at least in near real time -- scientific investigation that leads to guidance on how best to mitigate public health threats.
From page 79...
... , can be combined with newer techniques from ML to accelerate learning about small-scale process models in computationally expensive climate models (Cleary et al., 2021)
From page 80...
... . More generally, automating the workflow for learning from observational data makes it possible to quantitatively pose the experimental design question about what kind of observations would be maximally informative to reduce climate model uncertainties further.
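One way to make that question concrete is sketched below: a single uncertain parameter of a small-scale process parameterization is constrained by one simulated observation, and candidate observation types are ranked by how much each would shrink the posterior spread of the parameter. The toy forward models, noise levels, and parameter range are illustrative assumptions, not taken from any actual climate model or from the cited work.

```python
# Toy sketch: rank candidate observation types by how much a single measurement
# of each type would reduce uncertainty in one parameterization parameter.
# Forward models and noise levels are invented for illustration.
import numpy as np

theta = np.linspace(0.0, 2.0, 401)               # candidate parameter values
prior = np.full_like(theta, 1.0 / theta.size)    # flat prior on the grid

forward = {                                      # what each observation would read
    "cloud_fraction": lambda t: 0.3 + 0.1 * t,
    "precipitation":  lambda t: 1.0 + 0.8 * t,
    "toa_radiation":  lambda t: 240.0 - 5.0 * t,
}
noise_sd = {"cloud_fraction": 0.05, "precipitation": 0.2, "toa_radiation": 2.0}

def posterior_sd(obs_type, true_theta=1.2):
    """Posterior spread of the parameter after one simulated observation."""
    y = forward[obs_type](true_theta)
    like = np.exp(-0.5 * ((forward[obs_type](theta) - y) / noise_sd[obs_type]) ** 2)
    post = prior * like
    post /= post.sum()
    mean = np.sum(theta * post)
    return float(np.sqrt(np.sum((theta - mean) ** 2 * post)))

for obs_type in forward:
    print(f"{obs_type:15s} posterior std of parameter: {posterior_sd(obs_type):.3f}")
```

Under these toy assumptions, the observation type yielding the smallest posterior spread is the most informative one to collect next.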
From page 81...
... The acceleration in the rate of improvement of climate models has the potential to lead to a qualitative leap in the accuracy of climate projections. Unlike in other fields, climate science and weather forecasting data have been open, accessible, and widely shared worldwide, going back to global data-sharing frameworks ...
From page 82...
... However, the openness and the benefits that accrue from it are now under threat because government agencies are beginning to purchase data from commercial providers under restrictive licenses.

Wildfire Detection

There is an urgent need for better modeling of a range of environmental hazards, including societal impacts, and innovative approaches combining advanced computing, remote sensing, data science, and the social sciences hold the promise of mitigating hazards and improving responses.
From page 83...
... FIGURE 3-4 Dynamic data-driven fire modeling workflows in WIFIRE. SOURCE: Ilkay Altintas.
From page 84...
... A typical example of these applications is the role of real-time edge processing and the use of ML and big data in wildfire behavior modeling applications within the WIFIRE cyberinfrastructure. WIFIRE's dynamic data-driven fire modeling workflows depend on continuous adjustment of fire modeling ensembles using observations on fire perimeters generated from imagery captured by a variety of data sources including ground-based cameras, satellites, and aircraft.
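The general idea of adjusting an ensemble against an observed perimeter can be sketched very simply: each ensemble member carries its own spread rate, members whose predicted fire front better matches the observed front are up-weighted, and the next forecast is the weighted combination. The one-dimensional spread model, the weighting rule, and all numbers below are illustrative stand-ins, not WIFIRE's actual algorithms or data.

```python
# Toy sketch of perimeter-based ensemble adjustment for a fire-spread forecast.
import numpy as np

rng = np.random.default_rng(0)
spread_rates = rng.uniform(0.5, 3.0, size=50)    # km/h, one per ensemble member

def fire_front(rate, hours):
    """Toy 1-D spread model: distance of the fire front from the ignition point."""
    return rate * hours

# Observation: imagery places the front near 9 km after 4 hours (about +/- 1 km).
obs_hours, obs_km, obs_sd = 4.0, 9.0, 1.0

predicted = fire_front(spread_rates, obs_hours)
weights = np.exp(-0.5 * ((predicted - obs_km) / obs_sd) ** 2)
weights /= weights.sum()

# Forecast the front two hours further out, before and after the adjustment.
print(f"unweighted 6-hour forecast:         {fire_front(spread_rates, 6.0).mean():.1f} km")
print(f"perimeter-adjusted 6-hour forecast: {np.sum(weights * fire_front(spread_rates, 6.0)):.1f} km")
```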
From page 85...
... to scale and manage the components of these application workflows homogeneously. However, these application workflows are composed of steps that require a ...
From page 86...
... DIGITAL HUMANITIES

The digital humanities make use of computational tools to conduct textual search, visual analytics, data mining, statistics, and natural language processing (Biemann et al., 2014)
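A small sketch of the kind of text-mining step such tools automate is given below: tf-idf weighting surfaces the terms that most distinguish each document in a corpus. The three-passage corpus is invented purely for illustration.

```python
# Tiny text-mining sketch: tf-idf highlights the most distinctive terms
# in each document of a small, invented corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the steamship arrived in the harbor before dawn",
    "the poem circulated in newspapers across the region",
    "newspapers reprinted the poem and the sermon widely",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus).toarray()
terms = vectorizer.get_feature_names_out()

for i, row in enumerate(tfidf):
    top_terms = terms[row.argsort()[::-1][:3]]
    print(f"document {i}: {', '.join(top_terms)}")
```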
From page 87...
... . Another relevant project is the Machine Learning for Music project, "a community of composers, musicians, and audiovisual artists, exploring the creative use of emerging Artificial Intelligence and Machine Learning technologies in music." As the field has emerged, the Alliance of Digital Humanities Organizations (ADHO)
From page 88...
... In the United States, the Office of Digital Humanities within the National Endowment for the Humanities offers grants to digital projects, many of which produce white papers for further knowledge sharing. These efforts could contribute to the integration of resources necessary to apply next-generation workflows to humanities research.
From page 89...
... Similar to the sciences and engineering, ARWs in the humanities result in a hybrid environment that integrates human feedback and contributions with ongoing automated analysis of linguistic sources. The machines can mine data, but a human in the loop must provide the training material that drives the artificial intelligence systems and corrects raw material that gets ... [Footnote 17: See https://viraltexts.org/.]
From page 90...
... Traditionally, researchers in the social and behavioral sciences have mainly worked with small or medium-size data sets, such as survey data collected by the researchers themselves or data generated by government statistical agencies on a scale that allows downloading and analysis with standard statistical software (Turner and Lambert, 2014)
From page 91...
... . Data sits at the core of what federal agencies, and state and local agencies, are asked to do." The following examples illustrate how new data resources and advanced analytics are being applied in the social and behavioral sciences.
From page 92...
... Some social and behavioral sciences researchers are explicitly applying ARWs in their work.
From page 93...
... . The great challenges of our time are human in nature -- climate change, terrorism, overuse of natural resources, the nature of work, and so on -- and these require robust social science to understand their causes and consequences.
From page 94...
... The social and behavioral sciences have several existing practices and institutions that can help facilitate the development and implementation of ARWs. For example, there are established organizations charged with data stewardship and related training such as the Inter-
From page 95...
... university Consortium for Political and Social Research and the National Opinion Research Center. Finally, the American Journal of Political Science requires that the data supporting published articles undergo an independent verification process.

