Suggested Citation:"References." National Academies of Sciences, Engineering, and Medicine. 2022. Automated Research Workflows For Accelerated Discovery: Closing the Knowledge Discovery Loop. Washington, DC: The National Academies Press. doi: 10.17226/26532.


REFERENCES

AGU (American Geophysical Union). 2021. Guidance for AGU authors—Jupyter Notebooks. Available at https://data.agu.org/resources/jupyter-notebooks-guidance. Accessed November 23, 2021.
Alberts, B. M. 2013. Impact factor distortions. Science 340(6134):787. doi: 10.1126/science.1240319.
Alberts, B. M., M. W. Kirschner, S. Tilghman, and H. Varmus. 2014. Rescuing US biomedical research from its systemic flaws. Proceedings of the National Academy of Sciences of the United States of America 111(16):5773–5777. doi: 10.1073/pnas.1404402111.
ALLEA. 2017. The European code of conduct for research integrity. Available at https://allea.org/code-of-conduct/. Accessed January 21, 2021.
Altintas, I., J. Block, R. de Callafon, D. Crawl, C. Cowart, A. Gupta, M. Nguyen, H. W. Braun, J. Schulze, M. Gollner, A. Trouve, and L. Smarr. 2015. Towards an integrated cyberinfrastructure for scalable data-driven monitoring, dynamic prediction and resilience of wildfires. Procedia Computer Science 51:1633–1642. doi: 10.1016/j.procs.2015.05.296.
Altintas, I., S. Purawat, D. Crawl, A. Singh, and K. Marcus. 2019. Toward a methodology and framework for workflow-driven team science. Computing in Science & Engineering 21:37–48. doi: 10.48550/arXiv.1903.01403.
Altintas, I. 2018. Evolving Role of Scientific Workflows in a Highly Networked, Collaborative and Dynamic Data-Driven World. Keynote Talk, Works 2018 Workshop, November 11, 2018. Available at http://works.cs.cf.ac.uk/2018/program.php. Accessed April 13, 2022.
Altintas, I. 2019. SC19: Next Generation Disaster Intelligence Using the Continuum of Computing and Data Technologies. Address at SC19, November 21, 2019. Available at https://wifire.ucsd.edu/node/115. Accessed April 13, 2022.
Altintas, I. 2020a. Challenges and Opportunities for Composable AI-Integrated Applications at the Digital Continuum: Keynote. 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA). doi: 10.1109/AICCSA50499.2020.9316494.

Altintas, I. 2020b. Using Dynamic Data Driven Cyberinfrastructure for Next Generation Disaster Intelligence. In F. Darema, E. Blasch, S. Ravela, and A. Aved (Eds.). Dynamic Data Driven Applications Systems: Third International Conference, DDDAS 2020, Boston, MA, USA, October 2-4, 2020, Proceedings (18-21). Handel, Switzerland: Springer Nature Switzerland. doi: 10.1007/978-3-030-61725-7.
AI/ML workflows on OpenShift. Red Hat OpenShift. Available at https://demo.openshift.com/en/latest/aiml-workflows/. Accessed June 2020.
Anthony, K. 2020. New open release allows theorists to explore LHC data in a new way. CERN Accelerating Science. Available at https://home.cern/news/news/knowledge-sharing/new-open-release-allows-theorists-explore-lhc-data-new-way. Accessed January 9, 2020.
Aspuru-Guzik, A., and K. Persson. 2018. Materials Acceleration Platform: Accelerating advanced energy materials discovery by integrating high-throughput methods and artificial intelligence. Mission Innovation: Innovation Challenge 6. Available at http://nrs.harvard.edu/urn-3:HUL.InstRepos:35164974. Accessed June 17, 2020.
Atkinson, M., S. Gesing, J. Montagnat, and I. Taylor. 2017. Scientific workflows: past, present, and future. Future Generation Computer Systems 75:216–227.
Aucamp, I. 2020. Computational data analysis workflow systems. Available at https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems. Accessed April 27, 2021.
Barber, G. 2019. Artificial intelligence faces reproducibility crisis. Wired. Available at https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/. Accessed January 12, 2021.
Barga, R., and D. Gannon. 2007. Scientific versus business workflows. In Workflows for e-science, I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields (eds.). London: Springer. doi: 10.1007/978-1-84628-757-2_2.
Bauer, P., A. Thorpe, and G. Brunet. 2015. The quiet revolution of numerical weather prediction. Nature 525:47–55. doi: 10.1038/nature14956.
Beckman, P. 2020. Supporting tools and systems. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.

Begley, C. G., and L. M. Ellis. 2012. Raise standards for preclinical cancer research. Nature 483:531–533. doi: 10.1038/483531a.
Bellissimo, J. 2019. Intelligent workflows 101: Revolutionizing the way your business works. IBM Smarter Business Review. Available at https://www.ibm.com/blogs/services/2019/04/29/intelligent-workflows-101-revolutionizing-the-way-your-business-works/. Accessed November 30, 2021.
Bergmann, U., L. Deer, and J. Langer. 2021. Reproducible data analytic workflows for economics: An introduction to Snakemake. Available at https://lachlandeer.github.io/snakemake-econ-r-tutorial/. Accessed April 14, 2021.
Bhattacharya, S. 2019. The new dawn of AI: Federated learning. Towards Data Science. January 27. Available at https://towardsdatascience.com/the-new-dawn-of-ai-federated-learning-8ccd9ed7fc3a. Accessed November 30, 2021.
Biemann, C., G. R. Crane, C. D. Fellbaum, and A. Mehler. 2014. Computational humanities—bridging the gap between computer science and digital humanities. Dagstuhl Reports 4(7):80–111. doi: 10.4230/DagRep.4.7.80.
Brennan, P., E. Green, and B. Tromberg. Concept clearance for artificial intelligence for biomedical excellence (AIBLE). Available at https://dpcpsi.nih.gov/sites/default/files/CoC_May_2020_1.05PM_CF_Concept_Clearance_AIBLE_Background_Brennan_508.pdf. Accessed April 14, 2021.
Bucci, E. 2018. Automatic detection of image manipulations in the biomedical literature. Nature: Cell Death and Disease 9:400. doi: 10.1038/s41419-018-0430-3.
Burgelman, J. 2020. Emerging policy (pre) conditions for research data management. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Caliskan, A., J. Bryson, and A. Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356:183–186. doi: 10.1126/science.aal4230.
Casadevall, A., and F. C. Fang. 2012. Reforming science: Methodological and cultural reforms. Infection and Immunity 80(3):891–896. doi: 10.1128/IAI.06183-11.

Cedeno-Mieles, V., Z. Hu, Y. Ren, X. Deng, N. Contractor, S. Ekanayake, J. Epstein, B. Goode, G. Korkmaz, C. Kuhlman, D. Machi, M. Macy, M. V. Marathe, N. Ramakrishnan, P. Saraf, and N. Self. 2020. Data analysis and modeling pipeline for controlled networked social science experiments. PLoS ONE 15(11):e0242453. doi: 10.1371/journal.pone.0242453.
Cernak, T. 2016. Synthesis in the chemical space age. Chem 1(1):6–9. doi: 10.1016/j.chempr.2016.06.002.
Cernak, T. 2020. Opportunities in automated synthesis of small molecules. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Chen, X., S. Dallmeier-Tiessen, R. Dasler, S. Feger, P. Fokianos, J. B. Gonzalez, H. Hirvonsalo, D. Kousidis, A. Lavasa, S. Mele, D. R. Rodriguez, T. Šimko, T. Smith, A. Trisovic, A. Trzcinska, I. Tsanaktsidis, M. Zimmermann, K. Cranmer, L. Heinrich, G. Watts, M. Hildreth, L. L. Iglesias, K. Lassila-Perini, and S. Neubert. 2019. Open is not enough. Nature Physics 15:113–119. doi: 10.1038/s41567-018-0342-2.
Chetty, R., and J. Friedman. 2019. A practical method to reduce privacy loss when disclosing statistics based on small samples. Journal of Privacy and Confidentiality 9(2). doi: 10.29012/jpc.716.
CSTCloud (China Science and Technology Cloud). 2020. About the CSTCloud. Available at https://www.cstcloud.net/cstcloud.htm. Accessed April 27, 2021.
Christian, T. M., S. Lafferty-Hess, W. G. Jacoby, and T. M. Carsey. 2018. Operationalizing the replication standard: A case study of the data curation and verification workflow for scholarly journals. Available at https://osf.io/preprints/socarxiv/cfdba/. Accessed March 18, 2021.
Cleary, E., A. Garbuno-Inigo, S. Lan, T. Schneider, and A. M. Stuart. 2021. Calibrate, emulate, sample. Journal of Computational Physics 424:109716. doi: 10.1016/j.jcp.2020.109716.
Cohen-Boulakia, S., K. Belhajjame, O. Collin, J. Chopard, C. Froidevaux, A. Gaignard, K. Hinsen, P. Larmande, Y. Le Bras, F. Lemoine, F. Mareuil, H. Ménager, C. Pradal, and C. Blanchet. 2017. Scientific workflows for computational reproducibility in the life

sciences: Status, challenges and opportunities. Future Generation Computer Systems 75:284–298. doi: 10.1016/j.future.2017.01.012.
COPDESS (Coalition for Publishing Data in the Earth and Space Sciences). 2021. Enabling FAIR Data Project. Available at http://www.copdess.org/enabling-fair-data-project/. Accessed November 8, 2021.
Crane, G. 2020. Philosophy at scale. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
CRS (Congressional Research Service). 2020. Federal research and development (R&D) funding: FY2021. December 17. Available at https://fas.org/sgp/crs/misc/R46341.pdf. Accessed May 15, 2021.
Cranmer, K. 2020. Accelerating physics with advanced, automated workflows. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Cranmer, K., and I. Yavin. 2010. RECAST: Extending the impact of existing analyses. Preprint. Available at https://arxiv.org/abs/1010.2506. Accessed June 19, 2020.
Cranmer, K., J. Brehmer, and G. Louppe. 2020. The frontier of simulation-based inference. Preprint. Available at https://arxiv.org/pdf/1911.01429.pdf. Accessed June 19, 2020.
Crosas, M., G. King, J. Honaker, and L. Sweeney. 2015. Automating open science for big data. Annals of the American Academy of Political and Social Science 659(1):260–273. doi: 10.1177/0002716215570847.
(CWL) Common Workflow Language user guide. 2020. Available at https://www.commonwl.org/user_guide/. Accessed June 2020.
Davidson, S. B., S. Khanna, S. Roy, and S. C. Boulakia. 2010. Privacy issues in scientific workflow provenance. Available at https://repository.upenn.edu/cis_papers/669. Accessed June 19, 2020.
de Carvalho, E. C. A., M. K. Jayanti, A. P. Batilana, A. M. O. Kozan, M. J. Rodrigues, J. Shah, M. R. Loures, S. Patil, P. Payne, and R. Pietrobon. 2010. Standardizing clinical trials workflow representation in UML for international site comparison. PLoS ONE 5(11):e13893. doi: 10.1371/journal.pone.0013893.

Deelman, E., A. Mandal, and M. Jiang. 2019. The role of machine learning in scientific workflows. International Journal of High Performance Computing Applications 33(6):1128–1139. doi: 10.1177/1094342019852127.
Deelman, E., R. Ferreira da Silva, K. Vahi, M. Rynge, R. Mayani, R. Tanaka, W. Whitcup, and M. Livny. 2020. The Pegasus workflow management system: Translational computer science in practice. Journal of Computational Science 52:101200. doi: 10.1016/j.jocs.2020.101200.
Dockstore. 2021. Dockstore: An app store for bioinformatics. Available at https://dockstore.org/. Accessed November 23, 2021.
DOE (U.S. Department of Energy). 2019. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence. Office of Scientific and Technical Information. Available at https://www.osti.gov/servlets/purl/1478744. Accessed April 4, 2021.
DORA (Declaration on Research Assessment). 2013. San Francisco declaration on research assessment. Available at https://sfdora.org/read. Accessed January 12, 2021.
Duke University. 2021. Data management plan. Duke University Office of Scientific Integrity. Available at https://dosi.duke.edu/advancing-scientific-integrity-services-and-training/accountability-research/data-management-plan. Accessed November 15, 2021.
EC (European Commission). 2016. Realising the European Open Science Cloud: First report and recommendations. Available at https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf. Accessed June 19, 2020.
EC. 2019. Cost-benefit analysis for FAIR research data. Available at https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1. Accessed June 19, 2020.
EC. 2020. Progress on open science: towards a shared research knowledge system: final report of the open science policy platform. Available at https://data.europa.eu/doi/10.2777/00139. Accessed January 12, 2021.
EC. 2021a. Artificial intelligence (AI): Artificial intelligence research, funding, policy, and related publications. Available at https://ec.europa.eu/info/research-and-innovation/research-area/industrial-research-and-innovation/key-enabling-technologies/artificial-intelligence-ai_en. Accessed January 21, 2021.
EC. 2021b. European Open Science Cloud. Available at https://ec.europa.eu/info/research-and-innovation/strategy/goals-research-and-innovation-policy/open-science/european-open-science-cloud-eosc_en. Accessed January 12, 2021.
Einav, L., and J. Levin. 2014. Economics in the age of big data. Science 346(6210). doi: 10.1126/science.1243089.
EOSC (European Open Science Cloud). 2020. SRIA (Strategic Research and Innovation Agenda) of the European Open Science Cloud (EOSC). Available at https://www.eoscsecretariat.eu/sites/default/files/eosc-sria-v09.pdf. Accessed July 19, 2021.
EU (European Union). 2018. General data protection regulation (GDPR). Available at https://gdpr-info.eu/. Accessed December 10, 2021.
European Parliament Panel for the Future of Science and Technology. 2019. How the general data protection regulation changes the rules of scientific research. Available at https://www.europarl.europa.eu/RegData/etudes/STUD/2019/634447/EPRS_STU(2019)634447_EN.pdf. Accessed June 19, 2020.
Feigenbaum, J., and D. J. Weitzner. 2018. On the incommensurability of laws and technical mechanisms: Or, what cryptography can’t do. In Security Protocols 2018. Lecture Notes in Computer Science, V. Matyáš, P. Švenda, F. Stajano, B. Christianson, and J. Anderson (eds.). Cham: Springer, pp. 266–279. Available at https://link.springer.com/chapter/10.1007/978-3-030-03251-7_31#citeas. Accessed June 19, 2020.
Ferreira da Silva, R., H. Casanova, K. Chard, T. Coleman, D. Laney, D. Ahn, S. Jha, D. Howell, S. Soiland-Reyes, I. Altintas, D. Thain, R. Filgueira, Y. Babuji, R. Badia, B. Balis, S. Caino-Lores, S. Callaghan, F. Coppens, M. Crusoe, K. De, F. Di Natale, T. M. A. Do, B. Enders, T. Fahringer, A. Fouilloux, G. Fursin, A. Gaignard, A. Ganose, D. Garijo, S. Gesing, C. Goble, A. Hassan, S. Huber, D. S. Katz, U. Leser, D. Lowe, B. Ludascher, K. Maheshwari, M. Malawski, R. Mayani, K. Mehta, A. Merzky, T. Munson, J. Ozik, L. Pottier, S. Ristov, M. Roozmeh, R. Souza, F. Suter, B. Tovar, M. Turilli, K. Vahi, A.

Vidal-Torreira, W. Whitcup, M. Wilde, A. Williams, M. Wolf, and J. Wozniak. 2021a. Workflows Community Summit: Advancing the state-of-the-art of scientific workflows management systems research and development. Technical Report. Zenodo. doi: 10.5281/zenodo.4915801.
Ferreira da Silva, R., H. Casanova, K. Chard, D. Laney, D. Ahn, S. Jha, C. Goble, L. Ramakrishnan, L. Peterson, B. Enders, D. Thain, I. Altintas, Y. Babuji, R. Badia, V. Bonazzi, T. Coleman, M. Crusoe, E. Deelman, F. Di Natale, P. Di Tommaso, T. Fahringer, R. Filgueira, G. Fursin, A. Ganose, B. Gruning, D. S. Katz, O. Kuchar, A. Kupresanin, B. Ludascher, K. Maheshwari, M. Mattoso, K. Mehta, T. Munson, J. Ozik, T. Peterka, L. Pottier, T. Randles, S. Soiland-Reyes, B. Tovar, M. Turilli, T. Uram, K. Vahi, M. Wilde, M. Wolf, and J. Wozniak. 2021b. Workflows Community Summit: Bringing the scientific workflows community together. Technical Report. Zenodo. doi: 10.5281/zenodo.4606958.
Fox, G. 2020. Status and trajectory of supporting tools and systems. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Freedman, A. 2019. Weather is turning into big business. And that could be trouble for the public. Washington Post. Available at https://www.washingtonpost.com/business/2019/11/25/weather-is-big-business-its-veering-toward-collision-with-federal-government. Accessed April 14, 2021.
Galaxy. 2021a. 30,000 users. Blog post. Available at https://galaxyproject.eu/posts/2021/04/25/30000user/. Accessed December 10, 2021.
Galaxy. 2021b. Global platform for the analysis of SARS-CoV-2 data: Genomics, cheminformatics, and proteomics. Available at https://covid19.galaxyproject.org/. Accessed November 23, 2021.
Gil, Y., D. Garijo, V. Ratnakar, R. Mayani, R. Adusumilli, H. Boyce, A. Srivastava, and P. Mallick. 2017. Towards continuous scientific data analysis and hypothesis evolution. Proceedings of the AAAI Conference on Artificial Intelligence 31(1). Available at https://ojs.aaai.org/index.php/AAAI/article/view/11157. Accessed May 21, 2021.

Gil, Y., S. A. Pierce, H. Babaie, A. Banerjee, K. Borne, G. Bust, M. Cheatham, I. Ebert-Uphoff, C. Gomes, M. Hill, J. Horel, L. Hsu, J. Kinter, C. Knoblock, D. Krum, V. Kumar, P. Lermusiaux, Y. Liu, C. North, V. Pankratius, S. Peters, B. Plale, A. Pope, S. Ravela, J. Restrepo, A. Ridley, H. Samet, S. Shekhar, K. Skinner, P. Smyth, B. Tikoff, L. Yarmey, and J. Zhang. 2019. Intelligent systems for geosciences. Communications of the ACM 62(1). doi: 10.1145/3192335.
Gillespie, T., P. J. Boczkowski, and K. A. Foot. 2014. Media technologies: Essays on communication, materiality, and society. Cambridge, MA: MIT Press.
GitHub. 2021. Existing workflow systems. Available at https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems. Accessed September 9, 2020.
Global Commission on Adaptation. 2019. Adapt now: A global call for leadership on climate resilience. Available at https://gca.org/wp-content/uploads/2019/09/GlobalCommission_Report_FINAL.pdf. Accessed August 23, 2021.
GO FAIR. 2016. FAIR Principles. Available at https://www.go-fair.org/fair-principles. Accessed June 19, 2020.
Goble, C., S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M. R. Crusoe, K. Peters, and D. Schober. 2020. FAIR computational workflows. Data Intelligence 2(1-2):108–121. doi: 10.1162/dint_a_00033.
Google. 2021. Artificial intelligence at Google: Our principles. Available at https://ai.google/principles/. Accessed November 29, 2021.
Google Cloud. 2020. Introduction to AI Platform. Available at https://cloud.google.com/ai-platform/docs/technical-overview. Accessed June 22, 2020.
Granger, B. 2020. Status and Trajectory of Supporting Tools and Systems. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Hacking, I. 1983. Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge University Press.

Hardisty, A., and P. Wittenburg. 2020. Canonical Workflow Framework for Research (CWFR), version 2: Position Paper. Working Paper. December. Center for Open Science. https://osf.io/9e3vc/.
Hattrick-Simpers, J. 2020. How robots could teach us to trust AI. NIST Taking Measure blog. Available at https://www.nist.gov/blogs/taking-measure/how-robots-could-teach-us-trust-ai. Accessed June 18, 2020.
Hein Lab. 2020. Designing and applying in situ analysis to enable chemical discovery. Available at https://groups.chem.ubc.ca/jheints1. Accessed June 18, 2020.
Hepworth, K. J., and C. Church. 2018. Racism in the machine: Visualization ethics in the digital humanities. Digital Humanities Quarterly 12(4). Available at http://www.digitalhumanities.org/dhq/vol/12/4/000408/000408.html. Accessed June 22, 2020.
Hey, T., K. Butler, S. Jackson, and J. Thiyagalingam. 2020. The Royal Society Publishing. Available at https://doi.org/10.1098/rsta.2019.0054. Accessed November 21, 2020.
Hill, A. C. 2021. COVID’s lesson for climate research: Go local. Nature 595:9. doi: 10.1038/d41586-021-01747-9.
Hill, J., G. Mulholland, K. Persson, R. Seshadri, C. Wolverton, and B. Meredig. 2016. Materials science with large-scale data and informatics: Unlocking new opportunities. MRS Bulletin 41:399–409. Available at https://perssongroup.lbl.gov/papers/hill2016-mrsbull.pdf. Accessed June 18, 2020.
Hinsen, K. 2019. Dealing with software collapse. Computing in Science & Engineering 21(3):104–108. doi: 10.1109/MCSE.2019.2900945.
Holdren, J. P. 2013. Increasing access to the results of federally funded scientific research. Memorandum to Heads of Executive Departments and Agencies. Washington, DC: Office of Science and Technology Policy.
Hunt, E. 2016. Tay, Microsoft’s AI chatbot, gets a crash course in racism from Twitter. The Guardian. Available at https://www.theguardian.com/technology/2016/mar/24/tay-microsofts-ai-chatbot-gets-a-crash-course-in-racism-from-twitter. Accessed June 22, 2020.

Hyysalo, J., M. Oivo, and P. Kuvaja. 2017. A design theory for cognitive workflow systems. International Journal of Software Engineering and Knowledge Engineering 27(1):125–151. doi: 10.1142/S0218194017500061.
IBM. 2021. Workflow. Available at https://www.ibm.com/cloud/learn/workflow. Accessed December 3, 2021.
IEEE. 2020. IEEE 2791-2020—IEEE standard for bioinformatics analyses generated by high-throughput sequencing (HTS) to facilitate communication. Available at https://standards.ieee.org/standard/2791-2020.html. Accessed November 24, 2021.
IMI (Innovative Medicines Initiative). 2021. MELLODDY project factsheet. Available at https://www.imi.europa.eu/projects-results/project-factsheets/melloddy. Accessed November 30, 2021.
Institute for Ethical AI & Machine Learning. 2021. The responsible machine learning principles. Available at https://ethical.institute/principles.html. Accessed November 29, 2021.
IPCC (Intergovernmental Panel on Climate Change). 2021. Climate change 2021: The physical science basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, V. Masson-Delmotte, P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M. I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J. B. R. Matthews, T. K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.). Cambridge University Press.
Ioannidis, J. P. A., K. W. Boyack, H. Small, A. A. Sorensen, and R. Klavans. 2014. Bibliometrics: Is your most cited work your best? Nature 514:561–562. doi: 10.1038/514561a.
Jones, S. 2021. A typology of the components of the Global Open Research Commons. Research Data Alliance. Available at https://www.rd-alliance.org/plenaries/rda-17th-plenary-meeting-edinburgh-virtual/typology-components-global-open-research. Accessed November 20, 2020.
Juric, M., E. Bellm, and L. Guy. 2019. Machine learning applications with LSST: From data processing to knowledge discovery. American Astronomical Society Meeting Abstracts No. 233. Available at https://ui.adsabs.harvard.edu/abs/2019AAS...23312601J/abstract. Accessed June 22, 2020.

Kahkoska, A. R., T. J. Abrahamsen, G. C. Alexander, T. D. Bennett, C. G. Chute, M. A. Haendel, K. R. Klein, H. Mehta, J. D. Miller, R. A. Moffitt, T. Stürmer, K. Kvist, J. B. Buse, and N3C Consortium. 2021. Association between glucagon-like peptide 1 receptor agonist and sodium–glucose cotransporter 2 inhibitor use and COVID-19 outcomes. Diabetes Care 44(7):1564–1572. doi: 10.2337/dc21-0065.
Kalnay, E. 2003. Atmospheric modeling, data assimilation and predictability. Cambridge, UK: Cambridge University Press.
Kangas, J. D., A. W. Naik, and R. F. Murphy. 2014. Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics 15(143). doi: 10.1186/1471-2105-15-143.
Kim, B., M. Wattenberg, J. Gilmer, C. J. Cai, J. Wexler, F. Viegas, and R. A. Sayres. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden. PMLR 80:2668–2677. Available at https://research.google/pubs/pub47077. Accessed June 22, 2020.
Kizilcec, R., J. Reich, M. Yeomans, C. Dann, E. Brunskill, G. Lopez, S. Turkay, J. J. Williams, and D. Tingley. 2020. Scaling up behavioral science interventions in online education. Proceedings of the National Academy of Sciences of the United States of America 117(26):14900–14905. doi: 10.1073/pnas.1921417117.
Koolen, M., S. Kumpulainen, and L. Melgar-Estrada. 2020. A workflow analysis perspective to scholarly research tasks. In 2020 Conference on Human Information Interaction and Retrieval (CHIIR ’20). March 14–18, 2020, Vancouver, BC, Canada. doi: 10.1145/3343413.3377969.
Kusnezov, D. 2020. National Academies workflow discussion. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Lane, J. 2020. Status and trajectory of supporting tools and systems. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.

Lane, J., I. Mulvany, and P. Nathan. 2020. Rich search and discovery for research datasets: Building the next generation of scholarly infrastructure. Sage Press. Available at https://wagner.nyu.edu/impact/research/publications/rich-search-and-discovery-for-research-datasets-building-next. Accessed April 14, 2021.
Lawrence, P. A. 2007. The mismeasurement of science. Current Biology 17:R583–R585. doi: 10.1016/j.cub.2007.06.014.
L’Heureux, A., K. Grolinger, H. F. Elyamany, and M. A. M. Capretz. 2017. Machine learning with big data: Challenges and approaches. IEEE Access 5:7776–7797. doi: 10.1109/ACCESS.2017.2696365.
Maier, W., S. Bray, M. van den Beek, D. Bouvier, N. Coraor, M. Miladi, B. Singh, J. Rambla De Argila, D. Baker, N. Roach, S. Gladman, F. Coppens, D. P. Martin, A. Lonie, B. Grüning, S. L. Kosakovsky Pond, and A. Nekrutenko. 2021. Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring. bioRxiv (preprint). doi: 10.1101/2021.03.25.437046.
Martinez, P. A. 2021. FAIR principles for research software (FAIR4RS principles). Available at https://rd-alliance.org/group/fair-research-software-fair4rs-wg/outcomes/fair-principles-research-software-fair4rs. Accessed July 19, 2021.
McNutt, M. 2017. Convergence in the geosciences. GeoHealth 1:2–3. doi: 10.1002/2017GH000068.
McNutt, M., M. Bradford, J. M. Drazen, B. Hanson, B. Howard, K. H. Jamieson, V. Kiermer, E. Marcus, B. K. Pope, R. Schekman, S. Swaminathan, P. J. Stang, and I. M. Verma. 2018. Proceedings of the National Academy of Sciences of the United States of America 115(11):2557–2560. doi: 10.1073/pnas.1715374115.
McPhillips, T., T. Song, T. Kolisnik, S. Aulenbach, K. Belhajjame, K. Bocinsky, Y. Cao, F. Chirigati, S. Dey, J. Freire, D. Huntzinger, C. Jones, D. Koop, P. Missier, M. Schildhauer, C. Schwalm, Y. Wei, J. Cheney, M. Bieda, and B. Ludaescher. 2015. YesWorkflow: A user-oriented, language-independent tool for recovering workflow information from scripts. arXiv:1502.02403. doi: 10.48550/arXiv.1502.02403.

McQueen, T. 2020. Materials discovery. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Mehra, M. R., S. S. Desai, F. Ruschitzka, and A. N. Patel. 2020. RETRACTED: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. The Lancet. doi: 10.1016/S0140-6736(20)31180-6.
Mello, M. M., G. Triantis, R. Stanton, E. Blumenkranz, and D. M. Studdert. 2020. Waiting for data: Barriers to executing data use agreements. Science 367(6474):150–152. doi: 10.1126/science.aaz7028.
Mons, B. 2020. Invest 5% of research funds in ensuring data are reusable. Nature 578(491). doi: 10.1038/d41586-020-00505-7.
Moundas, C., and D. Peloquin. 2020. Insight: California bill clarifies privacy law’s ambiguities for medical, research communities. Bloomberg Law. Available at https://news.bloomberglaw.com/health-law-and-business/insight-california-bill-clarifies-privacy-laws-ambiguities-for-medical-research-communities. Accessed June 22, 2020.
Murphy, R. F. 2011. An active role for machine learning in drug development. Nature Chemical Biology 7(6):327–330. doi: 10.1038/nchembio.576.
Murphy, R. F. 2020. Self-driving instruments: The need for closed loop, AI-driven biomedical research. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Naik, A. W., J. D. Kangas, D. P. Sullivan, and R. F. Murphy. 2016. Active machine learning-driven experimentation to determine compound effects on protein patterns. eLife 5. doi: 10.7554/eLife.10047.
Nangia, U., and D. S. Katz. 2017. Understanding software in research: Initial results from examining nature and a call for collaboration. In IEEE 13th International Conference on e-Science (e-Science), pp. 486–487. doi: 10.1109/eScience.2017.78.

NAS (National Academy of Sciences). 2018. The frontiers of machine learning: 2017 Raymond and Beverly Sackler U.S.-U.K. scientific forum. Washington, DC: The National Academies Press. doi: 10.17226/25021.
NAS, NAE, and IOM (National Academy of Sciences, National Academy of Engineering, and Institute of Medicine). 2009. Ensuring the integrity, accessibility, and stewardship of research data in the digital age. Washington, DC: The National Academies Press. doi: 10.17226/12615.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2012. A national strategy for advancing climate modeling. Washington, DC: The National Academies Press. doi: 10.17226/13430.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2015. Enhancing the effectiveness of team science. Washington, DC: The National Academies Press. doi: 10.17226/19007.
NASEM. 2017. Fostering integrity in research. Washington, DC: The National Academies Press. doi: 10.17226/21896.
NASEM. 2018a. Artificial intelligence and machine learning to accelerate translational research. Washington, DC: The National Academies Press.
NASEM. 2018b. Open science by design: Realizing a vision for 21st century research. Washington, DC: The National Academies Press. doi: 10.17226/25116.
NASEM. 2018c. The frontiers of machine learning: 2017 Raymond and Beverly Sackler U.S.-U.K. scientific forum. Washington, DC: The National Academies Press. doi: 10.17226/25021.
NASEM. 2019a. Frontiers of materials research: A decadal survey. Washington, DC: The National Academies Press. doi: 10.17226/25244.
NASEM. 2019b. Reproducibility and replicability in science. Washington, DC: The National Academies Press. doi: 10.17226/25303.
NASEM. 2020a. Long-term use of biomedical research. Washington, DC: The National Academies Press. doi: 10.17226/25653.
NASEM. 2020b. Neuroscience data in the cloud. Washington, DC: The National Academies Press. doi: 10.17226/25653.

NASEM. 2021. Developing a toolkit for fostering open science practices: Proceedings of a workshop. Washington, DC: The National Academies Press. doi: 10.17226/26308.
Nathan, P. 2020. The future of AI in rich context. Chapter 12 in Rich search and discovery for research datasets: Building the next generation of scholarly infrastructure, J. Lane, I. Mulvany, and P. Nathan (eds). Sage Press.
Nature. 2017. Integrity starts with the health of research groups. 545:5–6. Available at https://www.nature.com/news/integrity-starts-with-the-health-of-research-groups-1.21921. Accessed January 13, 2021.
Nature. 2021. Reporting standards and availability of data, materials, code and protocols. Available at https://www.nature.com/nature-research/editorial-policies/reporting-standards. Accessed February 13, 2021.
Nature Physics. 2019. A problem shared is a problem halved (editorial). 15(107). doi: 10.1038/s41567-019-0434-7.
NCATS (National Center for Advancing Translational Sciences). 2021. About the National COVID Cohort Collaborative. National Institutes of Health. Available at https://ncats.nih.gov/n3c/about. Accessed January 13, 2021.
NIH (National Institutes of Health). 2018. NIH Data Commons Pilot Phase Consortium. Available at https://commonfund.nih.gov/commons/awardees. Accessed August 21, 2021.
NIH. 2020. Artificial Intelligence for Biomedical Excellence (AIBLE). Available at https://dpcpsi.nih.gov/sites/default/files/CoC_May_2020_1.05PM_Concept_Clearance_AIBLE_Brennan_508.pdf. Accessed April 4, 2020.
NIH. 2021. Common Fund Homepage. Available at https://commonfund.nih.gov. Accessed April 14, 2021.
NISO (National Information Standards Organization). 2021. Reproducibility badging and definitions. Available at https://www.niso.org/publications/rp-31-2021-badging. Accessed November 20, 2020.
NIST (National Institute of Standards and Technology). 2021. Research Data Framework (RDaF). Available at https://www.nist.gov/programs-projects/research-data-framework-rdaf. Accessed June 3, 2020.

NITRD (Networking and Information Technology Research and Development Program). 2020. Pioneering the future advanced computing ecosystem: A strategic plan. Available at https://www.nitrd.gov/pubs/Future-Advanced-Computing-Ecosystem-Strategic-Plan-Nov-2020.pdf. Accessed January 18, 2021.
NSF (National Science Foundation). 2007. Cyberinfrastructure vision for 21st century discovery. Arlington, VA: National Science Foundation. Available at https://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf. Accessed June 22, 2020.
NSF. 2020a. NSF big idea: Growing convergence research. Available at https://www.nsf.gov/od/oia/convergence/additional-resources/GCR-Powerpoint-Webinar-Jan-2020.pdf. Accessed November 15, 2020.
NSF. 2020b. NSF’s ten big ideas. Available at https://www.nsf.gov/news/special_reports/big_ideas. Accessed June 22, 2020.
NSF. 2020c. Workshop on Smart Cyberinfrastructure. Available at http://smartci.sci.utah.edu/. Accessed February 20, 2021.
NSTC (National Science & Technology Council). 2019. National strategic computing initiative update: Pioneering the future of computing. Available at https://www.whitehouse.gov/wp-content/uploads/2019/11/National-Strategic-Computing-Initiative-Update-2019.pdf. Accessed June 22, 2020.
Nugent, R. 2020. Automating data science: Think about the human-machine interface. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
O’Brien, D. T., D. Offenhuber, J. Baldwin-Philippi, M. Sands, and E. Gordon. 2017. Uncharted territoriality in coproduction: The motivations for 311 reporting. Journal of Public Administration Research & Theory 27:320–335. doi: 10.1093/jopart/muw046.
OECD (Organisation for Economic Co-operation and Development). 2020. Building digital workforce capacity and skills for data-intensive science. OECD Science, Technology, and Innovation Policy Papers 90. Available at https://www.oecd-ilibrary.org/science-and-technology/building-digital-workforce-capacity-and-skills-for-data-intensive-science_e08aa3bb-en. Accessed July 19, 2021.

O’Hara, A. 2020. Model data use agreements: A practical guide. In Handbook on using administrative data for research and evidence-based policy, S. Cole, I. Dhaliwal, A. Sautmann, and L. Vilhuber (eds). Abdul Latif Jameel Poverty Action Lab. Available at https://admindatahandbook.mit.edu/book/v1.0-rc3/dua.html. Accessed December 12, 2020.
OSTP. 2020. Public responses received for request for information 85 FR 3085: Draft desirable characteristics of repositories for managing and sharing data resulting from federally funded research. Available at https://www.whitehouse.gov/wp-content/uploads/2017/11/Desirable-Characteristics-RFC-Comments.pdf. Accessed November 17, 2021.
OSTP. 2021. The Biden Administration launches the National Artificial Intelligence Research Resource Task Force. Press release. Available at https://www.whitehouse.gov/ostp/news-updates/2021/06/10/the-biden-administration-launches-the-national-artificial-intelligence-research-resource-task-force/. Accessed March 3, 2022.
ODSC Community (Open Data Science Conference Community). 2021. Building a Robust Data Pipeline with the “dAG Stack”: dbt, Airflow, and Great Expectations. Blog post. March 1. Available at https://opendatascience.com/building-a-robust-data-pipeline-with-the-dag-stack-dbt-airflow-and-great-expectations/. Accessed April 19, 2022.
Owens, B. 2011. Reliability of ‘new drug target’ claims called into question. Nature News Blog. Available at http://blogs.nature.com/news/2011/09/reliability_of_new_drug_target.html. Accessed November 14, 2020.
Perkel, J. M. 2019. Workflow systems turn raw data into scientific knowledge. Nature 573:149–150. doi: 10.1038/d41586-019-02619-z.
Persson, K. 2020a. Conversation with Shreyas Cholia and Tapio Schneider, May 27, 2020.
Persson, K. 2020b. Making a material world better, faster now: Q&A with materials project director Kristin Persson. Available at https://eta.lbl.gov/news/article/making-material-world-better-faster. Accessed June 17, 2020.
Pew Research Center. 2019. Climate change still seen as the top global threat, but cyberattacks a rising concern. Findings of Pew Research Center survey. Available at https://www.pewresearch.org/global/2019/02/10/climate-change-still-seen-as-the-top-global-threat-but-cyberattacks-a-rising-concern. Accessed June 22, 2020.
Pfeiffer, J. K., and T. S. Dermody. 2021. Are too many scientists studying covid? Knowable Magazine. Available at https://knowablemagazine.org/article/health-disease/2021/are-too-many-scientists-studying-covid. Accessed November 14, 2020.
Plale, B. 2020. AI and workflows for accelerated science. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Plemmons, D. K., E. N. Baranski, K. Harp, D. D. Lo, C. K. Soderberg, T. M. Errington, B. A. Nosek, and K. M. Esterling. 2020. A randomized trial of a lab-embedded discourse intervention to improve research ethics. Proceedings of the National Academy of Sciences of the United States of America 117(3):1389–1394. doi: 10.1073/pnas.1917848117.
Project Jupyter. 2020. Estimate of public Jupyter Notebooks on GitHub. Available at https://nbviewer.jupyter.org/github/parente/nbestimate/blob/master/estimate.ipynb. Accessed June 3, 2020.
Queralt-Rosinach, N., R. Kaliyaperumal, C. Bernabé, Q. Long, S. A. Joosten, H. Jan van der Wijk, E. L. A. Flikkenschild, K. Burger, A. Jacobsen, B. Mons, M. Roos, BEAT-COVID Group, and COVID-19 LUMC Group. 2021. Applying the FAIR principles to data in a hospital: Challenges and opportunities in a pandemic. medRxiv 2021.08.13.21262023 (preprint). doi: 10.1101/2021.08.13.21262023.
Rasp, S., M. S. Pritchard, and P. Gentine. 2018. Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences of the United States of America 115(39):9684–9689. doi: 10.1073/pnas.1810286115.
Red Hat OpenShift. 2020. AI/ML Workflows on OpenShift. Demonstration Video. Available at https://demo.openshift.com/en/latest/aiml-workflows/. Accessed April 19, 2022.
RDA (Research Data Alliance). 2020. RDA COVID-19 guidelines and recommendations (draft versions). Available at https://doi.org/10.15497/RDA00046. Accessed September 7, 2021.
RDA. 2021a. Defining FAIR for machine learning (ML). Available at https://www.rd-alliance.org/defining-fair-machine-learning-ml. Accessed August 23, 2021.

RDA. 2021b. FAIR Principles for Research Software (FAIR4RS Principles). Available at https://doi.org/10.15497/RDA00065. Accessed August 23, 2021.
RDA-CODATA Legal Interoperability Interest Group. 2016. Legal interoperability of research data: Principles and implementation guidelines. Zenodo. Available at http://doi.org/10.5281/zenodo.162241.
Reinsel, D., J. Gantz, and J. Rydning. 2018. The digitization of the world: From edge to core. IDC White Paper. Available at https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf. Accessed June 22, 2020.
Reiter, T., P. T. Brooks, L. Irber, S. E. K. Joslin, C. M. Reid, C. Scott, C. T. Brown, and N. T. Pierce-Ward. 2021. Streamlining data-intensive biology with workflow systems. Gigascience 10(1):giaa140. doi: 10.1093/gigascience/giaa140.
Retraction Watch. 2021. Retracted coronavirus (COVID-19) papers. Available at https://retractionwatch.com/retracted-coronavirus-covid-19-papers/. Accessed June 3, 2020.
Rothstein, M. A., and S. A. Tovino. 2019. Privacy risks of interoperable electronic health records: Segmentation of sensitive information will help. Journal of Law, Medicine & Ethics 47(4):771–777. doi: 10.1177/1073110519897791.
Royal Society and Alan Turing Institute. 2019. The AI revolution in scientific research. Discussion paper. Available at https://royalsociety.org/-/media/policy/projects/ai-and-society/AI-revolution-in-science.pdf. Accessed December 7, 2021.
Rubin, R. 2020. NIH launches platform to serve as depository for COVID-19 medical data. Journal of the American Medical Association 324(4):326. doi: 10.1001/jama.2020.12646.
Rudin, C., and J. Radin. 2019. Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition. Harvard Data Science Review 1(2). doi: 10.1162/99608f92.5a8a3a3d.
Russell, A. 2020. Remarks at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.

Sansone, S. A., P. McQuilton, P. Rocca-Serra, A. Gonzalez-Beltran, M. Izzo, A. L. Lister, and M. Thurston. 2019. FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology 37:358–367. doi: 10.1038/s41587-019-0080-8.
Schneider, T., S. Lan, A. Stuart, and J. Teixeira. 2017a. Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophysical Research Letters 44:12396–12417. doi: 10.1002/2017GL076101.
Schneider, T., J. Teixeira, C. S. Bretherton, F. Brient, K. G. Pressel, C. Schär, and A. P. Siebesma. 2017b. Climate goals and computing the future of clouds. Nature Climate Change 7:3–5. doi: 10.1038/nclimate3190.
Scholtens, S., M. Jetten, J. Böhmer, C. Staiger, I. Slouwerhof, M. van der Geest, and C. W. G. van Gelder. 2019. Final report: Towards FAIR data steward as profession for the lifesciences. Report of a ZonMw funded collaborative approach built on existing expertise. Zenodo. doi: 10.5281/zenodo.3471707.
Service, R. F. 2019. AIs direct search for materials breakthroughs. Science 366(6471):1295–1296. doi: 10.1126/science.366.6471.1295.
Sharafeldin, N., B. Bates, Q. Song, V. Madhira, Y. Yan, S. Dong, E. Lee, N. Kuhrt, Y. R. Shao, F. Liu, T. Bergquist, J. Guinney, J. Su, and U. Topaloglu. 2021. Outcomes of COVID-19 in patients with cancer: Report from the National COVID Cohort Collaborative (N3C). Journal of Clinical Oncology 39(20):2232–2246. doi: 10.1200/JCO.21.01074.
Siebesma, A. P., C. S. Bretherton, A. Brown, A. Chlond, J. Cuxart, P. G. Duynkerke, and D. E. Stevens. 2003. A large eddy simulation intercomparison study of shallow cumulus convection. Journal of the Atmospheric Sciences 60(10):1201–1219. doi: 10.1175/1520-0469(2003)60<1201:ALESIS>2.0.CO;2.
Smithies, J., C. Westling, A-M. Sichani, P. Mellen, and A. Ciula. 2019. Managing 100 Digital Humanities Projects: Digital scholarship and archiving in King’s Digital Lab. Digital Humanities Quarterly 13(1). Available at http://www.digitalhumanities.org/dhq/vol/13/1/000411/000411.html. Accessed April 19, 2022.
Somnath, S., C. R. Smith, N. Laanait, R. K. Vasudevan, A. Levlev, A. Belianinov, A. R. Lubini, M. Shankar, S. V. Kalinin, and S. Jesse. 2019. USID and pycroscopy—Open frameworks for storing and analyzing spectroscopic and imaging data. arXiv. doi: 10.48550/arXiv.1903.09515.
Sonntag, M., D. Karastoyanova, and E. Deelman. 2010. Bridging the gap between business and scientific workflows: Humans in the loop of scientific workflows. In Proceedings of the 6th IEEE (Institute of Electrical and Electronics Engineers) International Conference on e-Science. Available at https://ieeexplore.ieee.org/document/5693919?arnumber=5693919&tag=1. Accessed June 22, 2020.
Stall, S., L. Yarmey, J. Cutcher-Gershenfeld, B. Hanson, K. Lehnert, B. Nosek, M. Parsons, E. Robinson, and L. Wyborn. 2019. Make scientific data FAIR. Nature 570:27–29. doi: 10.1038/d41586-019-01720-7.
Stephan, P. 2012. Research efficiency: Perverse incentives. Nature 484:29–31. doi: 10.1038/484029a.
Stephens, T. 2020. Powerful new AI technique detects and classifies galaxies in astronomy image data. ScienceDaily. Available at https://phys.org/news/2020-05-powerful-ai-technique-galaxies-astronomy.html. Accessed May 21, 2020.
Stevens, B., C.-H. Moeng, A. S. Ackerman, C. S. Bretherton, A. Chlond, S. de Roode, and P. Zhu. 2005. Evaluation of large-eddy simulations via observations of nocturnal marine stratocumulus. Monthly Weather Review 133:1443–1462. doi: 10.1175/MWR2930.1.
Stodden, V. 2020. Cyberinfrastructure shapes scientific outcomes in crucial and largely unrecognized ways. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Stoyanovich, J., B. Howe, and H. V. Jagadish. 2020a. Responsible data management. Proceedings of the VLDB Endowment 13(12):3474–3488. doi: 10.14778/3415478.3415570. Available at http://www.vldb.org/pvldb/vol13/p3474-stoyanovich.pdf.
Stoyanovich, J., J. J. Van Bavel, and T. West. 2020b. The imperative of interpretable machines. Nature Machine Intelligence 2:197–199. doi: 10.1038/s42256-020-0171-8.
Strassler, M., and J. Thaler. 2019. Slow and steady. Nature Physics 15(725). doi: 10.1038/s41567-019-0628-z.

Sun, Q., Y. Liu, W. Tian, Y. Guo, and B. Li. 2019. UMDISW: A universal multi-scientific workflow framework for the whole life cycle of scientific data. In Lecture notes in computer science, Vol. 11459. Available at https://link.springer.com/chapter/10.1007/978-3-030-32813-9_14. Accessed February 16, 2021.
Sutton, R., and A. Barto. 2018. Reinforcement learning: An introduction, 2nd ed. Cambridge, MA: MIT Press.
Szalay, A. S. 2017. From SkyServer to SciServer. Annals of the American Academy of Political and Social Science 675(1):202–220. doi: 10.1177/0002716217745816.
Szalay, A. 2019. The era of surveys and the fifth paradigm of science. Available at https://ui.adsabs.harvard.edu/abs/2019AAS...23340001S/abstract. Accessed May 19, 2020.
Szalay, A. 2020. Scalable data aggregation for science. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Taylor, I. J., E. Deelman, D. B. Gannon, and M. Shields (eds.). 2007. Workflows for e-Science: Scientific workflows for grids. Springer.
Teitelbaum, M. S. 2008. Structural disequilibria in biomedical research. Science 321(5859):644–645. doi: 10.1126/science.1160272.
Turner, K., and P. Lambert. 2014. Workflows for quantitative data analysis in the social sciences. Los Angeles: Sage. Available at https://core.ac.uk/download/pdf/20323257.pdf. Accessed April 14, 2021.
UKRI (UK Research and Innovation). 2016. Concordat on open research data. Available at https://www.ukri.org/wp-content/uploads/2020/10/UKRI-020920-ConcordatonOpenResearchData.pdf. Accessed April 14, 2021.
UKRI. 2017. UK leads on new 8.5m European scheme to improve access to research data. Available at https://webarchive.nationalarchives.gov.uk/20200923140750/https://stfc.ukri.org/news/uk-leads-on-new-european-scheme. Accessed April 14, 2021.

UKRI. 2020. The UK’s research and innovation infrastructure opportunities to grow our capability. Available at https://www.ukri.org/wp-content/uploads/2020/10/UKRI-201020-UKinfrastructure-opportunities-to-grow-our-capacity-FINAL.pdf. Accessed April 14, 2021.
Van der Aalst, W. M. P., and K. van Hee. 2002. Workflow management: Models, methods, and systems. Cambridge, MA: MIT Press.
Vidal, R. 2020. Opportunities for accelerating discovery: Mathematical and algorithmic issues. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Wasserstein, R. L., and N. A. Lazar. 2016. The ASA statement on p-values: Context, process, and purpose. American Statistician 70(2):129–133. doi: 10.1080/00031305.2016.1154108.
Wiebels, K., and D. Moreau. 2021. Leveraging containers for reproducible psychological research. Advances in Methods and Practices in Psychological Science 4(2). doi: 10.1177/25152459211017853.
WCRI (World Conferences on Research Integrity). 2019. Hong Kong principles. Available at https://www.wcrif.org/guidance/hong-kong-principles. Accessed February 8, 2021.
Waibel, G. 2018. Letter to the Community: CDL and Dryad partnership. Available at https://cdlib.org/cdlinfo/2018/05/30/letter-to-the-community-cdl-and-dryad-partnership/. Accessed November 9, 2021.
Wang, L. L., K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Eide, K. Funk, R. M. Kinney, Z. Liu, W. C. Merrill, P. Mooney, D. A. Murdick, D. Rishi, J. Sheehan, Z. Shen, B. Stilson, A. D. Wade, K. Wang, C. Wilhelm, B. Xie, D. A. Raymond, D. S. Weld, O. Etzioni, and S. Kohlmeier. 2020. CORD-19: The COVID-19 Open Research Dataset. arXiv. Available at https://arxiv.org/pdf/2004.10706.pdf.
Weitzner, D. J. 2020. Challenges of policy-aware data processing. Presentation at the Workshop on Opportunities for Accelerating Scientific Discovery: Realizing the Potential of Advanced and Automated Workflows, March 16–17.
Wilkinson, M., M. Dumontier, I. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. Bonino da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. G. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. C. 't Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons. 2016. The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3:160018. doi: 10.1038/sdata.2016.18.
Wood, A., M. Altman, A. Bembenek, M. Bun, M. Gaboardi, J. Honaker, K. Nissim, D. R. O’Brien, T. Steinke, and S. Vadhan. 2018. Differential privacy: A primer for a non-technical audience. Vanderbilt Journal of Entertainment & Technology Law 21(1):209–275. Available at https://salil.seas.harvard.edu/publications/differential-privacy-primer-non-technical-audience. Accessed March 5, 2021.
Woodie, A. 2022. Big growth forecasted for big data. Datanami. January 11. Available at https://www.datanami.com/2022/01/11/big-growth-forecasted-for-big-data/. Accessed April 20, 2022.
Wu, C., G. Wang, S. Hu, Y. Liu, H. Mi, Y. Zhou, Y-K. Guo, and T. Song. 2020. A data driven methodology for social science research with left-behind children as a case study. PLoS ONE 15(11):e0242483. doi: 10.1371/journal.pone.0242483.
Yarkoni, T., D. Eckles, J. A. J. Heathers, M. C. Levenstein, P. E. Smaldino, and J. Lane. 2021. Enhancing and accelerating social science via automation: Challenges and opportunities. Harvard Data Science Review. doi: 10.1162/99608f92.df2262f5.
Yuval, J., and P. A. O’Gorman. 2020. Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions. Nature Communications 11(1):3295. doi: 10.1038/s41467-020-17142-3.

Automated Research Workflows For Accelerated Discovery: Closing the Knowledge Discovery Loop

The needs and demands placed on science to address a range of urgent problems are growing. The world is faced with complex, interrelated challenges in which the way forward lies hidden or dispersed across disciplines and organizations. For centuries, scientific research has progressed through iteration of a workflow built on experimentation or observation and analysis of the resulting data. While computers and automation technologies have played a central role in research workflows for decades to acquire, process, and analyze data, these same computing and automation technologies can now also control the acquisition of data, for example, through the design of new experiments or decision making about new observations.

The term automated research workflow (ARW) describes scientific research processes that are emerging across a variety of disciplines and fields. ARWs integrate computation, laboratory automation, and tools from artificial intelligence in the performance of tasks that make up the research process, such as designing experiments, observations, and simulations; collecting and analyzing data; and learning from the results to inform further experiments, observations, and simulations. The common goal of researchers implementing ARWs is to accelerate scientific knowledge generation, potentially by orders of magnitude, while achieving greater control and reproducibility in the scientific process.

Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop examines current efforts to develop advanced and automated workflows to accelerate research progress, including wider use of artificial intelligence. This report identifies research needs and priorities in the use of advanced and automated workflows for scientific research. Automated Research Workflows for Accelerated Discovery is intended to create awareness, momentum, and synergies to realize the potential of ARWs in scholarly discovery.
