Materials Data Analytics
Slides from an overview talk I gave to the NIST-CHiMaD Workshop on Materials Informatics and Industry (https://sites.northwestern.edu/chimadmaterialsinformatics/). My goal was to illustrate why machine learning is increasingly important to materials design, and to discuss how to use (and, potentially, misuse) machine learning.
This talk is an overview of work I've been involved in on designing metallic glasses with machine learning. I cover how we approached designing glasses for sputtering and injection molding, and the lessons we learned from our successes and failures.
This talk focused on how machine learning and high-throughput, automated science complement each other. The slides discuss some of my recent collaborations in this area and conclude with an overview of the roadblocks preventing machine learning from being used more broadly in materials science.
This is a talk I gave at a workshop at Aalto University on the use of machine learning in materials design. It covers work from my PhD on the development of general-purpose approaches for machine learning on materials data, how these methods were used to design metallic glasses, and more recent work on building data infrastructure to support new machine learning efforts.
This presentation is a combination of several short talks I have given on the lessons I have learned about the practical aspects of doing data-driven materials science. It describes tools to support each step of a materials informatics project, how best to use them to make your work communicable and reproducible, and an outline of how to publish the software and data that support a paper.
While high-throughput density functional theory (DFT) has become a prevalent tool for materials discovery, it is limited by its relatively large computational cost. In this paper, we explore using DFT data from high-throughput calculations to create faster, surrogate models with machine learning (ML) that can be used to guide new searches. Our method works by using decision tree models to map DFT-calculated formation enthalpies to a set of attributes consisting of two distinct types: (i) composition-dependent attributes of elemental properties (as have been used in previous ML models of DFT formation energies), combined with (ii) attributes derived from the Voronoi tessellation of the compound's crystal structure. The ML models created using this method have half the cross-validation error of, and similar training and evaluation speeds to, models created with the Coulomb matrix and partial radial distribution function methods. For a dataset of 435 000 formation energies taken from the Open Quantum Materials Database (OQMD), our model achieves a mean absolute error of 80 meV/atom in cross validation, which is lower than the approximate error between DFT-computed and experimentally measured formation enthalpies and below 15% of the mean absolute deviation of the training set. We also demonstrate that our method can accurately estimate the formation energy of materials outside of the training set and be used to identify materials with especially large formation enthalpies. We propose that our models can be used to accelerate the discovery of new materials by identifying the most promising materials to study with DFT at little additional computational cost.
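The composition-dependent half of the attribute set can be sketched in a few lines: compute weighted statistics of elemental properties for each composition, then fit a tree ensemble to formation enthalpies. This is a minimal, hypothetical illustration only — the property values, compositions, and energies below are invented, and it omits the paper's Voronoi-tessellation structural attributes entirely.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy elemental property table (hypothetical values):
# (electronegativity, atomic radius in angstroms)
ELEM_PROPS = {
    "Al": (1.61, 1.43),
    "Ni": (1.91, 1.24),
    "Zr": (1.33, 1.60),
}

def composition_attributes(comp):
    """Fraction-weighted mean and mean absolute deviation of elemental properties."""
    fracs = np.array(list(comp.values()), dtype=float)
    fracs /= fracs.sum()
    props = np.array([ELEM_PROPS[el] for el in comp])
    mean = fracs @ props
    mad = fracs @ np.abs(props - mean)  # spread of properties about the mean
    return np.concatenate([mean, mad])

# Toy training data: compositions with made-up formation enthalpies (eV/atom)
compositions = [{"Al": 1, "Ni": 1}, {"Al": 3, "Ni": 1},
                {"Ni": 1, "Zr": 1}, {"Al": 2, "Zr": 1}]
energies = [-0.60, -0.40, -0.50, -0.45]

X = np.array([composition_attributes(c) for c in compositions])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, energies)

# Query the surrogate for an unseen composition
pred = model.predict([composition_attributes({"Al": 1, "Zr": 1})])
```

Because tree ensembles average training targets, the surrogate's predictions stay within the range of the (here fictitious) training energies; in practice the model would be trained on hundreds of thousands of OQMD entries.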
With more than a hundred elements in the periodic table, a large number of potential new materials exist to address the technological and societal challenges we face today; however, without some guidance, searching through this vast combinatorial space is frustratingly slow and expensive, especially for materials strongly influenced by processing. We train a machine learning (ML) model on previously reported observations and parameters from physicochemical theories, and make it synthesis method–dependent, to guide high-throughput (HiTp) experiments to find a new system of metallic glasses in the Co-V-Zr ternary. Experimental observations are in good agreement with the predictions of the model, but there are quantitative discrepancies in the precise compositions predicted. We use these discrepancies to retrain the ML model. The refined model has significantly improved accuracy not only for the Co-V-Zr system but also across all other available validation data. We then use the refined model to guide the discovery of metallic glasses in two additional previously unreported ternaries. Although our approach of iterative use of ML and HiTp experiments has guided us to rapid discovery of three new glass-forming systems, it has also provided us with a quantitatively accurate, synthesis method–sensitive predictor for metallic glasses that improves performance with use and thus promises to greatly accelerate discovery of many new metallic glasses. We believe that this discovery paradigm is applicable to a wider range of materials and should prove equally powerful for other materials and properties that are synthesis path–dependent and that current physicochemical theories find challenging to predict.
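The iterative ML-plus-experiment loop described above can be sketched schematically: train a classifier on prior observations, propose candidates, "measure" them, and retrain on the union of old and new data. Everything here is a stand-in — the two-dimensional feature space, the toy glass-forming rule, and the data are invented for illustration; the real loop used HiTp sputtering experiments and composition-based features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def run_experiments(X):
    """Stand-in for a high-throughput experiment (hypothetical rule)."""
    return (X.sum(axis=1) > 1.0).astype(int)  # toy glass-forming criterion

# Initial training set: "previously reported observations" (synthetic here)
X_train = rng.random((50, 2))
y_train = run_experiments(X_train)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

for iteration in range(3):
    # Propose candidate compositions, measure them, retrain on the union.
    # Discrepancies between predictions and measurements enter the
    # training set and refine the next model.
    X_new = rng.random((20, 2))
    y_new = run_experiments(X_new)
    X_train = np.vstack([X_train, X_new])
    y_train = np.concatenate([y_train, y_new])
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Check the refined model against held-out "validation data"
X_test = rng.random((200, 2))
acc = model.score(X_test, run_experiments(X_test))
```

The key design point is that the model is never frozen: each experimental batch, including the cases it got wrong, is folded back into training, which is why accuracy improves with use.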
In recent years, there has been a large effort in the materials science community to employ materials informatics to accelerate materials discovery or to develop new understanding of materials behavior. Materials informatics methods utilize machine learning techniques to extract new knowledge or predictive models from existing materials data. In this review, we discuss major advances in the intersection between data science and atom-scale calculations with a particular focus on studies of solid-state, inorganic materials. The examples discussed in this review cover methods for accelerating the calculation of computationally expensive properties, identifying promising regions for materials discovery based on existing data, and extracting chemical intuition automatically from datasets. We also identify key issues in this field, such as the limited distribution of software necessary to utilize these techniques, and opportunities for areas of research that would help lead to the wider adoption of materials informatics in the atomistic calculations community.
A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework that can be applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, and a novel method for partitioning the data set into groups of similar materials in order to boost the predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.
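The partitioning idea — split the data set into groups of similar materials and fit a separate model to each — can be sketched as follows. This is a hypothetical toy, not the paper's implementation: the partition rule, features, and the two different "physics" per group are all invented to show why per-group models can beat a single global one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data where two groups of "materials" follow different trends
X = rng.random((100, 3))
group = (X[:, 0] > 0.5).astype(int)                       # toy partition rule
y = np.where(group == 0, 2.0 * X[:, 1], -1.0 * X[:, 2])   # different trend per group

# Fit one model per partition
models = {}
for g in np.unique(group):
    mask = group == g
    models[g] = LinearRegression().fit(X[mask], y[mask])

def predict(x):
    """Route a sample to its partition's model, then predict."""
    g = int(x[0] > 0.5)
    return models[g].predict(x.reshape(1, -1))[0]
```

Because each partition's target is exactly linear in the features here, each per-group regressor recovers its trend, whereas a single linear model over the pooled data could not fit both trends at once.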