4 Data Curation
Pages 29-40



From page 29...
... The Accelerated Metallurgy Project
Only 61 of the 87 known metals are commonly available commercially, and identifying a promising new alloy takes years of trial and error, plus more years to develop it into a commercial product. There is a huge number of potentially useful alloys (roughly 32,000 potential ternary alloy systems alone), yet because the exploration process is so daunting, 90 percent of them have never been explored.
From page 30...
... Each iteration expanded the knowledge base within the project, enabling generation of promising alloy compositions. This accumulated knowledge was then pooled into the new Virtual Alloy Library, created and hosted by Granta.
From page 31...
... Q&A
June Lau, NIST, asked how often data standards were changed, and if that required updating data models. Goddin replied that Granta develops standards for particular domains, such as metals or composites, which incorporate established best practice from across multiple leading manufacturers in each domain, although a certain degree of customization always remains, owing to individual data legacies.
From page 32...
... Similarly, NASA's Extragalactic Database houses and links observational galaxy data to facilitate combining and sharing data to yield new insights. Major telescopes, such as the Chandra or the Hubble, also have major data repositories associated with them.
From page 33...
... To facilitate data sharing, Peek suggested it would be useful to find common conceptual grids through which materials data can be exchanged. In addition, he said it is important to use data models to enhance data discovery and physical understanding, to recognize the roles that culture and personalities play in the process of defining common modes, and to create and implement useful standards through an iterative deploy-and-revise process.
From page 34...
... He argued that using AI within the MGI framework will create a new materials science paradigm, dramatically decreasing the costs and time needed for new materials discovery. Green focused on three specific areas that are fueling MGI: advancements in materials data, AI-driven materials science, and the High-Throughput Experimental Materials Collaboratory (HTE-MC).
From page 35...
... Compared to astronomy, Green said that materials science has many more variables that must be taken into account, different communities that have to agree on data standards, and more types of data that have to be standardized to create the much-needed paradigm shift that will enhance materials science data sharing.
The Value of AI
AI-driven materials science presents the opportunity to create a fully robotic system for discovery and validation of new materials, Green said.
From page 36...
... Green also pointed to the dramatic acceleration that occurred with genomic sequencing, suggesting that a similar acceleration could be achievable in materials discovery, although creating the tools to make that possible will require significant investment.
PANEL DISCUSSION ON DATA CURATION
Susan Sinnott, Pennsylvania State University, introduced the speakers for the workshop's second panel discussion: B.S.
From page 37...
... Neural networks within data management platforms, especially those that include image synthesis and generation tools, hold promise for enhancing materials science research and discovery, although Manjunath cautioned that neural networks can still be fooled by malicious actors.
Ichiro Takeuchi, University of Maryland
Takeuchi spoke about the evolution of data challenges facing high-throughput and combinatorial materials science.
From page 38...
... OPEN DISCUSSION
Following the opening remarks, panelists and participants discussed several challenges to applying data analytics and ML in materials science, ideas for improvements to current data sharing practices, and ways to ensure data accuracy.
Challenges to Applying Data Analytics and ML
Hull kicked off the discussion by asking each panelist to name the single biggest challenge to successfully applying data analytics and ML to new materials discovery.
From page 39...
... Another challenge, pointed out by Brian Storey, Olin College and Toyota Research Institute, is that, unlike genomic data, materials data sets are isolated for different applications, such as batteries, semiconductors, or glasses, which adds to the complexity. Takeuchi agreed that this is an inherent difficulty in materials science, and added that there are financial incentives, such as patents and profits, to avoid sharing data.
From page 40...
... Green noted that it is difficult to validate data without metadata, that curating data is prohibitively expensive, and that users should determine for themselves whether the data are sound. One participant added that comparing first-principles calculations can help ensure data reliability, and another stated that the metadata and data documentation are especially important for validation and interoperability of materials data.

