

From page 27...
... Progress is being made in the number and diversity of domain-specific and general data repositories that support FAIR principles and provide archival functionality for long-term access to data and related research objects. Examples can be found in the Registry of Research Data Repositories.

Progress in Domain-Relevant Artificial Intelligence and Machine Learning

Another key factor in building ARWs is the continued advance of learning algorithms for specific domains.
From page 28...
... Understanding and managing the interplay among models derived from domain knowledge, models learned through ML, and the way the system iteratively drives experimental design constitute a continuing task for ARW development across domains.

IMPLEMENTING AUTOMATED RESEARCH WORKFLOWS: A CHANGING SCIENTIFIC PARADIGM

Over the past two decades, scientific workflow systems have matured as powerful tools, especially for "resource allocation, task scheduling, performance optimization, and static coordination of tasks on a potentially heterogeneous set of resources" (Altintas et al., 2019)
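A minimal sketch of the task coordination such workflow systems provide: executing a small dependency graph in an order that respects task dependencies. The task names are illustrative and not drawn from any particular workflow engine.

```python
# Dependency-driven task coordination, the core service a workflow engine
# provides. Task names are illustrative.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks whose outputs it consumes.
dag = {
    "fetch_data":     set(),
    "clean_data":     {"fetch_data"},
    "run_simulation": {"clean_data"},
    "fit_model":      {"clean_data"},
    "compare":        {"run_simulation", "fit_model"},
}

def run_task(name: str) -> None:
    # A real engine would dispatch this to an appropriate compute resource.
    print(f"running {name}")

# Static coordination: execute tasks in a dependency-respecting order.
for task in TopologicalSorter(dag).static_order():
    run_task(task)
```

A production engine adds what this sketch omits: resource allocation, scheduling across heterogeneous machines, retries, and data movement.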
From page 29...
... . For example, the data and computational scientists might collaborate on parameter estimation, ML, or data assimilation methods so that the computational model benefits from the data analysis.
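As a hedged illustration of that parameter-estimation step, the sketch below fits the free parameters of a toy computational model to synthetic observations so the model benefits from the data analysis; the exponential model and all values are assumptions made for the example.

```python
# Least-squares parameter estimation: letting observed data constrain the
# free parameters of a computational model. The model here is a toy.
import numpy as np
from scipy.optimize import curve_fit

def model(t, amplitude, rate):
    """Toy computational model with two unknown parameters."""
    return amplitude * np.exp(-rate * t)

t_obs = np.linspace(0.0, 5.0, 50)
rng = np.random.default_rng(0)
y_obs = model(t_obs, 2.0, 0.7) + rng.normal(scale=0.05, size=t_obs.size)

params, covariance = curve_fit(model, t_obs, y_obs, p0=[1.0, 1.0])
print("estimated parameters:", params)
print("1-sigma uncertainties:", np.sqrt(np.diag(covariance)))
```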
From page 30...
... Most workflow systems require that the collaboration adopt a specific set of tools and a specific methodology for its research. That is, the workflow engines or other enabling tools may embody particular ways of conducting the work, and these need to be aligned with the practices of the human participants.
From page 31...
...

Reproducibility of the Process and Team Science

Scientific workflow engines potentially provide "a programming model for deployment of computational and data science applications on all scales of computing and provide a platform for system integration of data, modeling tools, and computing while making the applications reusable and reproducible" (Altintas, 2018)
From page 32...
...

POLICY AND INDUSTRY CONTEXT FOR AUTOMATED RESEARCH WORKFLOWS

Public Policy Readiness

Policy makers and funding agencies in the United States and Europe have articulated a research vision at a scale and complexity that implies robust support for the development and sustainability of ARWs. That is, while not explicitly singling out "support for ARWs," they point to the societal and economic benefits that AI and ML can bring about.
From page 33...
... It also recognizes the need to educate an AI-savvy scientific workforce. In June 2021, OSTP announced the formation of the National Artificial Intelligence Research Resource Task Force as part of implementing the NAIIA.
From page 34...
... . This has led to the development of the European Open Science Cloud (EOSC) as a shared infrastructure to provide access to data repositories and resources such as cloud services, high-performance computing, and data analysis tools (EOSC, 2020)
From page 35...
... . There are barriers to translating the practices and tools developed for the computational workflows that automate business processes into research applications.
From page 36...
... Companies may also use proprietary workflow tools that store and manage data in nonstandard, proprietary formats. Since there is little incentive for toolmakers to agree on standards among themselves, researchers may be unable to access or use the data even when they are nominally open.
From page 37...
... In some cases, the implementation of ARWs is fairly advanced, whereas in others only certain components or aspects of ARWs -- such as advanced computation, use of workflow management systems and notebooks, laboratory automation, and use of AI as a workflow component as well as in directing the "outer loop" of the research process -- are currently being utilized. The purpose of examining these specific areas of research was not to develop a comprehensive census of projects and initiatives, and these examples do not represent a complete picture of relevant work in general or in the disciplines that are represented.
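As a hedged sketch of AI directing the "outer loop" of the research process, the loop below lets a surrogate model propose each next experiment, runs a simulated instrument, and refits on the accumulating results. The instrument, model choice, and acquisition rule are illustrative assumptions, not methods described in the report.

```python
# A closed "outer loop": model -> proposed experiment -> measurement -> refit.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def instrument(x: float) -> float:
    """Stand-in for a real automated experiment."""
    return -(x - 0.3) ** 2 + 0.05 * np.random.default_rng().normal()

candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
X, y = [[0.0]], [instrument(0.0)]          # seed observation

model = GaussianProcessRegressor()
for _ in range(10):                        # the outer loop
    model.fit(X, y)
    mean, std = model.predict(candidates, return_std=True)
    pick = candidates[np.argmax(mean + 1.96 * std)]  # optimistic acquisition
    X.append(list(pick))
    y.append(instrument(pick[0]))

print("best condition found:", X[int(np.argmax(y))])
```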
From page 38...
... TABLE 3-1 (columns: General Use Case; Characterization; Opportunities; Challenges and Barriers)

[Row continued from previous page]
  Opportunities: ... broader use of cyberinfrastructure approaches, such as simulation-based inference
  Challenges and Barriers:
  • ... data outputs needs to be rethought
  • ... exists
  • Reuse of workflows needs to be encouraged
  • Incentives and culture around data sharing vary by subfield

Materials science
  Characterization: Approaches are appearing that integrate robotic laboratory instruments, rapid characterization, and AI
  Opportunities: Opportunities exist to link workflows and implement closed-loop systems
  Challenges and Barriers:
  • Ability of humans to make unexpected observations needs to be preserved
  • Data sharing and access are inadequate
  • Shortage of researchers who can bridge the gap between experimentation and software development exists
  • Lack of a supportive culture within the community translates into weak incentives

Biology
  Characterization: Biomedical research has been overturning reductionist paradigms as they have lost predictive power, paving the way for empirical data to guide discovery
  Opportunities: Potential for drug discovery approaches using automated experiments and AI is growing
  Challenges and Barriers:
  • Shortage of experts who can bridge the gap between disciplinary and lab automation expertise exists
  • Capability for real-time data sharing is needed

Biochemistry
  Characterization: Tighter coupling between data science and chemical synthesis can accelerate optimization in drug discovery
  Opportunities: Automation of high-throughput synthesis and screening can accelerate the design–make–test cycle
  Challenges and Barriers:
  • Nonproprietary data are insufficiently characterized, so labs making advances in this area have to generate their own
  • Cultural barriers exist in the field (e.g., chemical synthesis has been an artisan process)
From page 39...
... TABLE 3-1 Continued (columns: General Use Case; Characterization; Opportunities; Challenges and Barriers)

Epidemiology
  Characterization: COVID-19 experience has catalyzed initiatives to implement workflows
  Opportunities: New data resources could be used to better understand disease interactions and improve treatments
  Challenges and Barriers:
  • Data that are not generated through a randomized controlled trial might be biased
  • Confirming the quality and provenance of clinical data is difficult

Climate science
  Characterization: Improving climate models partly depends on the ability to understand and simulate small-scale processes
  Opportunities: ARWs hold the promise of rapidly improving the accuracy of model predictions
  Challenges and Barriers:
  • Climate sciences and weather data have traditionally been open, but wider use of commercial data obtained through restrictive licenses is a threat

Wildfire detection
  Characterization: Interdisciplinary field combines modeling, remote sensing, data science, and other fields
  Opportunities: New tools and capabilities using AI can support decision making by public- and private-sector users
  Challenges and Barriers:
  • Development of an integrated environment is needed to pull data from various resource monitoring tools and convert them into predictive intelligence

Digital humanities
  Characterization: Increasing use of computational tools and large data sets is expanding the sorts of research questions that can be addressed
  Opportunities: New digital data and AI tools can be used to analyze big data (e.g., Latin texts)
  Challenges and Barriers:
  • Human in the loop is needed to provide training material for AI
  • Gap exists between traditional research and new tools needed by next-generation scholars

Social and behavioral sciences
  Characterization: Access to large amounts of high-quality data is available at relatively low cost, transforming a number of fields
  Opportunities: Real-time access to data and analysis can deliver actionable information
  Challenges and Barriers:
  • Growing size and dispersed nature of data sets and need to ensure integrity of personal information pose challenges
  • Improving metadata is necessary for robust findings and reuse of data
From page 40...
... discussed how automated workflows in astronomical research have evolved over the last 20 years and might advance in the future. The Sloan Digital Sky Survey (SDSS)
From page 41...
... . With regard to using automated workflows and data science tools, Szalay (2020)
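As one hedged illustration of the programmatic archive access that SDSS-style surveys make possible, the sketch below assumes the community astroquery package (with its SDSS module) is installed and the remote archive is reachable; the SQL and column choices are illustrative.

```python
# Querying the SDSS archive programmatically (assumes astroquery is
# installed and the remote service is available).
from astroquery.sdss import SDSS

query = """
SELECT TOP 10 p.objID, p.ra, p.dec, p.r
FROM PhotoObj AS p
WHERE p.r BETWEEN 15.0 AND 15.1
"""
table = SDSS.query_sql(query)  # returns an astropy Table
print(table)
```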
From page 42...
... Funding agencies could become proactive, rather than reacting with a 10-year time delay. Although more are needed, he pointed to an increasing number of trusted archival data repositories that have developed succession plans for stewardship of their data beyond the life of the specific repository and in some cases are certified using the CoreTrustSeal.

Particle Physics

Modern particle physics involves large collaborations of up to 10,000 researchers structured around very expensive instruments such as the CERN Large Hadron Collider (LHC)
From page 43...
... However, there are barriers to progress in advancing active learning approaches that apply workflows to physics discovery. For example, sharing data can power these new approaches, but attitudes about data sharing vary widely across subfields (Nature Physics, 2019)
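A minimal sketch of one common form of active learning (pool-based uncertainty sampling), included to make the idea concrete; the data set, model, and query rule are illustrative assumptions rather than any specific physics workflow.

```python
# Pool-based active learning: repeatedly query labels for the examples the
# model is least certain about. Data and model choices are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_pool, y_pool = make_classification(n_samples=500, random_state=0)
# Seed with one labeled example per class.
labeled = [int(np.where(y_pool == 0)[0][0]), int(np.where(y_pool == 1)[0][0])]

clf = LogisticRegression(max_iter=1000)
for _ in range(20):
    clf.fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(proba - 0.5)     # closest to the decision boundary
    uncertainty[labeled] = -np.inf         # never re-query a labeled point
    labeled.append(int(np.argmax(uncertainty)))

print("accuracy on full pool:", clf.score(X_pool, y_pool))
```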
From page 44...
... shortage of researchers who can bridge the gap between materials research experimentalists and those who are developing the necessary software tools, and (3) inertia within the community and resistance to pursuing automated approaches to research powered by AI.
From page 45...
... The Clean Energy Materials Innovation Challenge Expert Workshop in 2017 highlighted the need to develop materials discovery acceleration platforms that integrate automated robotic machinery with rapid characterization and AI (Aspuru-Guzik and Persson, 2018)
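A minimal sketch of the integration points such a platform implies: synthesis robotics, rapid characterization, and an AI planner behind common interfaces, with a driver that closes the loop. All interface and method names here are hypothetical.

```python
# Hypothetical component interfaces for a materials acceleration platform.
from typing import Protocol

class SynthesisRobot(Protocol):
    def make(self, recipe: dict) -> str: ...        # returns a sample ID

class Characterizer(Protocol):
    def measure(self, sample_id: str) -> dict: ...  # returns property data

class Planner(Protocol):
    def next_recipe(self, history: list[dict]) -> dict: ...

def closed_loop(robot: SynthesisRobot, char: Characterizer,
                planner: Planner, budget: int) -> list[dict]:
    """Drive the make-measure-decide cycle for a fixed experiment budget."""
    history: list[dict] = []
    for _ in range(budget):
        recipe = planner.next_recipe(history)
        sample_id = robot.make(recipe)
        history.append({"recipe": recipe, **char.measure(sample_id)})
    return history
```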
From page 46...
... Although several federal government initiatives, such as the Materials Genome Initiative in 2011 and the Materials Science and Engineering Data Challenge in 2015, have been implemented to encourage the use of publicly available data to model or discover new material properties, additional open data policies are needed to accelerate discovery. There is a need to develop multidisciplinary international teams of scientists and engineers with expertise in chemistry, materials science, advanced computing, robotics, AI, and other relevant disciplines (Aspuru-Guzik and Persson, 2018; Persson, 2020a)
From page 47...
... Real-time sharing of primary data from individual labs, and the infrastructure to support it, are also needed. National research resources that execute particular types of automated experiments on request -- analogous to those in astronomy and physics -- are needed as well.
From page 48...
... Researchers are working to accelerate chemical synthesis through automated reaction equipment and in-line/in-situ analysis tools. For example, Timothy Cernak's lab at the University of Michigan is able to perform 1,500 experiments rapidly through nanoscale synthesis using robotics (Cernak, 2020)
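A minimal sketch of how such a high-throughput campaign can be laid out in software: enumerating reagent combinations into an experiment list for robotic execution. Reagent names and counts are illustrative, not the actual screen described above.

```python
# Enumerate a combinatorial screen into a queue of nanoscale reactions.
from itertools import product

amines    = [f"amine_{i}" for i in range(1, 9)]      # 8 candidate amines
acids     = [f"acid_{j}" for j in range(1, 13)]      # 12 candidate acids
catalysts = ["cat_A", "cat_B", "cat_C", "cat_D"]     # 4 catalysts

# 8 x 12 x 4 = 384 reactions, e.g., one 96-well plate per catalyst.
experiments = [
    {"amine": a, "acid": b, "catalyst": c}
    for a, b, c in product(amines, acids, catalysts)
]
print(len(experiments), "reactions queued")
```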
From page 49...
...

HEALTH AND ENVIRONMENT

Epidemiology

The COVID-19 pandemic has touched all aspects of society. It has also created opportunities to assess the capabilities of modern scientific workflows and to develop new paradigms.
From page 50...
... For instance, in the context of clinical care, the gold standard by which care routines, interventions, and treatments are assessed is the randomized clinical trial, which reduces sample bias. Yet the data currently being collected in the clinical domain with respect to COVID-19 are not the outcome of a randomized clinical trial and are thus subject to certain biases.
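A small simulation, under purely illustrative assumptions, of why this matters: when sicker patients preferentially receive a treatment, the naive observational estimate of its effect diverges from the randomized one.

```python
# Confounding by severity: observational vs. randomized effect estimates.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
severity = rng.normal(size=n)          # unobserved confounder
true_effect = 1.0

# Observational assignment: sicker patients are more likely to be treated.
treated = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * severity))
outcome = true_effect * treated - 2.0 * severity + rng.normal(size=n)
naive = outcome[treated].mean() - outcome[~treated].mean()

# Randomization breaks the link between severity and treatment.
treated_rct = rng.random(n) < 0.5
outcome_rct = true_effect * treated_rct - 2.0 * severity + rng.normal(size=n)
rct = outcome_rct[treated_rct].mean() - outcome_rct[~treated_rct].mean()

print(f"naive observational estimate: {naive:.2f}")  # biased well below 1.0
print(f"randomized estimate:          {rct:.2f}")    # close to 1.0
```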
From page 51...
... Rather, it provides an illustration of what can go wrong when a system is hastily erected. As we work toward the development of automated methods for pandemic detection and subsequent mitigation, it will be critical to ensure that the data have clear provenance, that the workflows and analytics conducted within them are verifiable, and that plans for data sharing, access, and accountability for abuse and misuse of the data, as well as findings from such workflows, are established from the outset.
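A minimal sketch of what "clear provenance" can mean in software, under illustrative conventions: each derived result is logged with content hashes of its inputs and the step and parameters that produced it, so a reviewer can re-hash files and verify the chain. The record structure is an assumption, not a particular standard.

```python
# Content-hashed provenance records for workflow outputs.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def provenance_record(output: Path, inputs: list[Path],
                      step: str, params: dict) -> dict:
    """Link an output to its inputs, the step that made it, and its settings."""
    return {
        "output": {"path": str(output), "sha256": sha256_of(output)},
        "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
        "step": step,
        "parameters": params,
    }

def append_to_log(record: dict, log: Path = Path("provenance.jsonl")) -> None:
    # One JSON record per line; re-hashing the files verifies the chain.
    with log.open("a") as f:
        f.write(json.dumps(record) + "\n")
```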
From page 52...
... The acceleration in the rate of improvement of climate models has the potential to lead to a qualitative leap in the accuracy of climate projections. Unlike in other fields, climate sciences and weather forecasting data have been open, accessible, and widely shared worldwide, going back to global data-sharing frameworks developed in the 1950s.
From page 53...
...

Wildfire Detection

There is an urgent need for better modeling of a range of environmental hazards, including their societal impacts, and innovative approaches combining advanced computing, remote sensing, data science, and the social sciences hold the promise of mitigating hazards and improving responses. In pursuing these advances, "there are still challenges and opportunities in integration of the scientific discoveries and data-driven methods for detecting hazards with the advances in technology and computing in a way that provides and enables different modalities of sensing and computing" (Altintas, 2019)
From page 54...
... The WIFIRE project is used for data-driven knowledge and decision support by a wide range of public- and private-sector users for scientific, municipal, and educational purposes. The integrating factor in WIFIRE is the use of scientific workflow engines as part of the cyberinfrastructure to bring together steps involving AI techniques on data from networked observations (e.g., heterogeneous satellite data and real-time remote sensor data) and computational techniques in signal processing, visualization, fire simulation, and data assimilation
From page 55...
... Typically, perimeter generation is performed in a big data and/or edge computing environment, while fire modeling is performed in a high-performance computing (HPC) or high-throughput computing (HTC) environment, depending on which fire modeling codes are executed (Altintas, 2020b)
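A minimal sketch of that split, with hypothetical backends: each workflow step is routed to the environment suited to its computational profile, lightweight perimeter generation to an edge/big-data cluster and compute-heavy fire modeling to an HPC/HTC scheduler.

```python
# Route workflow steps to execution environments by computational profile.
from typing import Callable

BACKENDS: dict[str, Callable[[dict], None]] = {
    "edge": lambda task: print(f"edge cluster <- {task['name']}"),
    "hpc":  lambda task: print(f"HPC scheduler <- {task['name']}"),
}

def submit(task: dict) -> None:
    backend = "hpc" if task.get("compute_heavy") else "edge"
    BACKENDS[backend](task)

submit({"name": "perimeter_generation", "compute_heavy": False})
submit({"name": "fire_spread_model", "compute_heavy": True})
```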
From page 56...
...

DIGITAL HUMANITIES

Digital humanities make use of computational tools to conduct textual search, visual analytics, data mining, statistics, and natural language processing (Biemann et al., 2014)
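A minimal sketch of the simplest such textual analysis, term frequency over a short Latin passage; real digital humanities studies run this kind of processing over large digitized corpora, and the passage here (the opening of Caesar's De Bello Gallico) is only an example.

```python
# Term-frequency analysis of a (tiny) Latin sample.
import re
from collections import Counter

latin_text = """
Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae,
aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.
"""

tokens = re.findall(r"[a-z]+", latin_text.lower())
print(Counter(tokens).most_common(5))
```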

