Currently Skimming:

2 Context for Automated Research Workflows
Pages 27-56

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 27...
... Pairing advances in artificial intelligence (AI), computing, and automation of laboratories and observations can also lead to a qualitative step change.
From page 28...
... The process of starting from a model and devising an observational or experimental way of generating new data is called experimental design (with an experiment understood broadly to include, for example, the collection of observational data)
From page 29...
... FIGURE 2-1 Knowledge discovery loop. NOTE: Automated research workflows can automate and close the loop of scientific discovery.
From page 30...
... To cast the discussion in modern ML terms, the closed-loop research workflow in Figure 2-1 encapsulates a form of reinforcement learning (Sutton and Barto, 1998), in which a model is used to design a manipulation or observation of an environment to generate data (experimental design)
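The loop described on this page can be sketched in a few lines of Python. This is an illustration only, not the report's method: it uses a deterministic, noise-free active-learning loop rather than full reinforcement learning, and every name in it (`design_experiment`, `update_model`, `simulated_instrument`, the hidden threshold 37) is hypothetical.

```python
from typing import Callable, List, Tuple


def design_experiment(candidates: List[int]) -> int:
    """Experimental design: choose the setting that splits the
    remaining hypotheses as evenly as possible."""
    ordered = sorted(candidates)
    return ordered[(len(ordered) - 1) // 2]


def update_model(candidates: List[int], x: int, outcome: bool) -> List[int]:
    """Learning step: keep only the hypotheses consistent with the data.
    A hypothesis t predicts a response at setting x exactly when t <= x."""
    return [t for t in candidates if (t <= x) == outcome]


def closed_loop(
    run_experiment: Callable[[int], bool],
    candidates: List[int],
    max_rounds: int = 20,
) -> Tuple[List[int], List[Tuple[int, bool]]]:
    """Repeat design -> experiment -> update until one hypothesis remains."""
    history: List[Tuple[int, bool]] = []
    for _ in range(max_rounds):
        if len(candidates) <= 1:
            break
        x = design_experiment(candidates)
        outcome = run_experiment(x)
        history.append((x, outcome))
        candidates = update_model(candidates, x, outcome)
    return candidates, history


def simulated_instrument(x: int, true_threshold: int = 37) -> bool:
    """Hypothetical stand-in for a laboratory instrument: it responds
    once the setting reaches a hidden threshold."""
    return x >= true_threshold


remaining, log = closed_loop(simulated_instrument, list(range(101)))
print(remaining, len(log))
```

Each pass through the loop uses the current model to pick the next measurement, which is the sense in which the workflow "closes" the discovery loop. A real ARW would replace the candidate list with a statistical model and the simulated instrument with laboratory or observational hardware.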
From page 31...
... AI and ML techniques deployed within ARWs not only can drive an experiment and mine the literature to suggest future experiments, but also may enhance research reliability and productivity by facilitating the reuse of workflows and improving the ability of researchers to monitor workflow execution and detect anomalies (Deelman et al., 2019)
From page 32...
... Federal financial support can stimulate related support from the private sector, nongovernmental foundations, and universities. Examples include cyberinfrastructure research and development (R&D)
From page 33...
... In the context of this report, we use the term machine learning broadly, to comprise any form of learning from data, be that Bayesian learning about parameters, parametric functions, or nonparametric functions in scientific models or learning with artificial neural networks.
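As one concrete instance of "Bayesian learning about parameters," the sketch below updates a Beta belief about a success probability from counted outcomes (the standard Beta-Binomial conjugate update). The class name `BetaBelief` and the example counts are assumptions made for illustration; the report does not prescribe any particular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown success probability, held as Beta(alpha, beta)."""

    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: observed counts are added to the prior pseudo-counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


prior = BetaBelief(1.0, 1.0)       # uniform prior (an illustrative choice)
posterior = prior.update(7, 3)     # hypothetical data: 7 successes, 3 failures
print(posterior, posterior.mean)
```

Because the update only adds counts, feeding the data in one batch or in several smaller batches gives the same posterior, which is convenient when data arrive incrementally from an automated workflow.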
From page 34...
... BUILDING AUTOMATED RESEARCH WORKFLOWS: CURRENT STATE OF THE ART
As outlined above, the confluence of several technological advancements is driving the development and implementation of ARWs. Fully realized ARWs are not common at present, and so this study examines how and where progress is being made in areas such as advanced computation, use of workflow management systems and notebooks, laboratory automation, and use of AI as a workflow component as well as in directing the "outer loop" of the research
PREPUBLICATION COPY -- Uncorrected Proofs
From page 35...
... As the nature of research problems and the cyberinfrastructure platform for exploring them have become more powerful and complex, scientific workflow engines have played a crucial role in harnessing and coordinating distributed computing and data resources. Scientific workflow engines are software tools that capture the computational analysis pipeline of a research project, providing provenance tracking and other functions that facilitate automation, reproducibility, and reusability.
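To make "capturing the pipeline with provenance tracking" concrete, here is a minimal sketch in plain Python. It does not use the API of any engine named in this chapter; `fingerprint`, `run_pipeline`, and the two example steps are hypothetical names chosen for illustration, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def fingerprint(value: Any) -> str:
    """Short content hash of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(steps: List[Step], data: Any) -> Tuple[Any, List[Dict[str, str]]]:
    """Run the steps in order and record, for each one, which input
    produced which output."""
    provenance: List[Dict[str, str]] = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, provenance


def drop_missing(values: List[Any]) -> List[float]:
    """Hypothetical cleaning step: discard missing readings."""
    return [v for v in values if v is not None]


def mean(values: List[float]) -> float:
    """Hypothetical analysis step: arithmetic mean."""
    return sum(values) / len(values)


pipeline: List[Step] = [("drop_missing", drop_missing), ("mean", mean)]
result, record = run_pipeline(pipeline, [1.0, None, 3.0])
print(result)
for entry in record:
    print(entry)
```

The output hash of one step matches the input hash of the next, so the record forms a traceable chain. Rerunning the pipeline on identical input yields an identical record, which is a simple check on reproducibility. Production engines add scheduling, retries, and distributed execution on top of this idea.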
From page 36...
... There are also several distributed computing and automation frameworks with a narrower focus that capture specific execution patterns. These may also be intrinsically part of the "workflow." Examples include tools such as Spark or Hadoop that enable a large number of data processing tasks at scale, or cloud data stores such as BigTable that can execute queries across a large distributed data set.
From page 37...
... As the next generation of scientific workflow engines expands, automation of the scientific process can lead to a step change in the rate of discovery in many fields.
From page 38...
... TABLE 2-1 Examples of Workflow Engines and Related Tools
Tool                           URL
Airflow                        https://airflow.apache.org
Bigtable                       https://cloud.google.com/bigtable
Chimera                        https://github.com/hysds/chimera
Cromwell                       http://cromwell.readthedocs.io/
Cyverse Discovery Environment  https://cyverse.org/discovery-environment
Fireworks                      https://materialsproject.github.io/fireworks
Hadoop                         https://hadoop.apache.org
Galaxy                         https://galaxyproject.org
iRODS                          https://irods.org
Jupyter                        https://jupyter.org
Kepler                         https://kepler-project.org
Nextflow                       https://www.nextflow.io
Open Science Framework         https://osf.io
Luigi                          https://luigi.readthedocs.io/en/stable/workflows.html
Parsl                          http://parsl-project.org
Pegasus                        https://pegasus.isi.edu
Snakemake                      https://snakemake.readthedocs.io/en/stable
Spark                          https://spark.apache.org
Starfish Storage               https://starfishstorage.com
Wolfram                        https://www.wolframcloud.com
From page 39...
... Making more FAIR data available allows ARWs to find the data that are relevant to a research task in question and incorporate these data into the analysis. Wider reuse encourages researchers to make more FAIR data available, creating a virtuous cycle.
From page 40...
... Properly designed ARWs support FAIR data principles since they can capture the associated metadata and provenance necessary to describe their data products in a formalized and completely traceable way.
From page 41...
... Understanding and managing the interplay between models derived from domain knowledge, ML, and how the system iteratively drives experimental design constitute a continuing task for ARW development across domains.
IMPLEMENTING AUTOMATED RESEARCH WORKFLOWS: A CHANGING SCIENTIFIC PARADIGM
Over the past two decades, scientific workflow systems have matured as powerful tools, especially for "resource allocation, task scheduling, performance optimization, and static coordination of tasks on a potentially heterogeneous set of resources" (Altintas et al., 2019)
From page 42...
... The developments that are combining to encourage greater use include the maturation of the systems, AI as a component in the workflow, better interoperability of systems and components, and the promise of open science and FAIR to raise the value of workflows broadly and evolve into ARWs. Scientific workflow engines have historically targeted applications in scalable computing where users chain together multiple steps in a complex computational process (e.g., job submission to a supercomputer, access to a database, execution of a web service)
From page 43...
... Team science requires tools for managing, capturing, and advancing team collaboration, contribution, and communication as an open process, in addition to the discovery process and its reproducibility.
From page 44...
... FIGURE 2-2 Team science workflow process.
From page 45...
... There may be hybrid roles such as "workflow system administration" that could span the tasks of a data engineer and software developer. The broader point is that multiple people are involved in the scientific discovery process across the workflow.
From page 46...
... From Exploratory Activity to Scale
Once a research team agrees on its research methods through exploration, there is often a need to scale up execution processes with more data or for larger parameter sets requiring automation and control. A big challenge in building ARWs is to sustain the linkage between the exploratory activities and the automated scalable process.
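One way to keep exploration and scaled execution linked is to have both call the same function. The sketch below assumes this approach; it is not a pattern the report specifies. `simulate` is a hypothetical analysis step that is first run once by hand and then reused unchanged in a parameter sweep driven by the standard-library `ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, Iterable, Tuple


def simulate(rate: float, steps: int) -> float:
    """Hypothetical analysis step: compound growth after `steps` iterations."""
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + rate
    return value


def sweep(
    rates: Iterable[float],
    step_counts: Iterable[int],
    max_workers: int = 4,
) -> Dict[Tuple[float, int], float]:
    """Run `simulate` over the full grid of parameter combinations."""
    grid = list(product(rates, step_counts))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the grid.
        results = list(pool.map(lambda params: simulate(*params), grid))
    return dict(zip(grid, results))


# Exploratory use: a single hand-picked run.
print(simulate(0.5, 2))

# Scaled use: the same function over a grid of parameters.
for params, value in sweep([0.0, 0.5, 1.0], [1, 2, 3]).items():
    print(params, value)
```

Because the sweep reuses the exploratory function rather than a rewritten copy, any result from the scaled run can be reproduced interactively with the same arguments. A workflow engine or cluster scheduler would take the place of the thread pool at larger scale.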
From page 47...
... However, the shift from individual workflow development to team science also creates the need for workflow systems to capture the process for validation, seamless integration, and repeatability of the team's activity. Figure 2-3 illustrates in lighter blue the system hierarchy supporting the discovery loop by which the research team interacts with the scientific workflow engine and other software tools to run ML or AI algorithms or methods in a computing infrastructure using data to learn about the model and then to design new experiments based on what is learned.
From page 48...
... POLICY AND INDUSTRY CONTEXT FOR AUTOMATED RESEARCH WORKFLOWS
Public Policy Readiness
Policy makers and funding agencies in the United States and Europe have articulated a research vision at a scale and complexity that implies robust support for the development and
From page 49...
... For example, at DOE, the Artificial Intelligence & Technology Office has been established to "transform DOE into a world-leading AI enterprise by accelerating the research, development, and adoption of AI," with ARWs as a core part of its efforts (Kusnezov, 2020)
From page 50...
... In June 2021, OSTP announced the formation of the National Artificial Intelligence Research Resource Task Force as part of implementing the NAIIA. The Task Force will provide recommendations for establishing and sustaining the NAIRR, including technical capabilities, governance, administration, and assessment, as well as
Footnote 4: 15 U.S.C.
From page 51...
... To back up policy priorities, the recently concluded European Union's research and innovation programme, Horizon 2020, provided €80 billion in funding between 2014 and 2020, including emerging technologies, e-infrastructure, and advanced computing.
From page 52...
... In January 2020, the government allocated £300 million to UKRI to fund research infrastructure. Many UK research institutes and infrastructures are also playing key positions and providing pivotal input into the EOSC and have led the initial computational development work via the Science and Technology Facilities Council, a part of UKRI (UKRI, 2017)
From page 53...
... is emerging as a national infrastructure of data and computation for accelerating science discovery.
Industrial Use of Workflows
This discussion of industrial use of workflows focuses primarily on research applications.
From page 54...
... Companies may also use proprietary workflow tools that store and manage data in nonstandard proprietary formats. Since there is little incentive for toolmakers to agree to standards among themselves, researchers may be unable to access or utilize data even if they are technically open.
From page 55...
... sector and provided ideas on how to strengthen links across different organizational cultures and constraints.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.