Article

Unsupervised machine learning to classify crystal structures according to their structural distortion: A case study on Li-argyrodite solid-state electrolytes

Abstract

High-throughput approaches in computational materials discovery often yield a combinatorial explosion that makes the exhaustive rendering of complete structural and chemical spaces impractical. A common bottleneck when screening new compounds with archetypal crystal structures is the lack of fast and reliable decision-making schemes to quantitatively classify the computed candidates as inliers or outliers (overly distorted structures). Machine learning-aided workflows can solve this problem and make geometrical optimization procedures more efficient. However, suitable combinations of geometrical descriptors and unsupervised models that can accurately differentiate between systems with subtle structural changes are still lacking. Here, taking the compositional screening of cubic Li-argyrodite solid electrolytes as a case study, we tackle this problem head on. We find that Steinhardt order parameters are very accurate descriptors of the cubic argyrodite structure with which to train a range of common unsupervised outlier detection models. Most importantly, the approach enables us to automatically classify crystal structures with uncertainty control. The resulting models can then be used to screen computed structures with respect to a user-defined error threshold and discard overly distorted structures during geometrical optimization procedures. Implemented as a decision node in computer-aided materials discovery workflows, this approach can be employed to perform autonomous high-throughput screening and make the use of computational and data storage resources more efficient.
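
The abstract does not spell out how the decision node is implemented, so the following is only a minimal sketch of the general idea: score each computed structure against a set of accepted reference structures and reject it when the score exceeds a threshold derived from a user-defined error rate. The scoring rule (a root-mean-square z-score), the function names and the (q4, q6) values are all assumptions made for illustration; they are not the authors' models or data.

```python
import math
import statistics

# Hypothetical (q4, q6) descriptors of accepted reference structures (illustrative values only).
reference = [
    (0.190, 0.575), (0.192, 0.573), (0.188, 0.577), (0.191, 0.574),
    (0.189, 0.576), (0.193, 0.572), (0.187, 0.578), (0.190, 0.574),
]

def fit_reference(descriptors):
    """Per-feature mean and standard deviation of the reference descriptors."""
    columns = list(zip(*descriptors))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return means, stds

def distortion_score(x, means, stds):
    """Root-mean-square z-score of one descriptor vector (assumed scoring rule)."""
    return math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds)) / len(x))

def fit_threshold(descriptors, means, stds, error_rate):
    """Score that at most a fraction `error_rate` of the reference set exceeds."""
    scores = sorted(distortion_score(x, means, stds) for x in descriptors)
    index = math.ceil((1.0 - error_rate) * len(scores)) - 1
    return scores[max(0, min(index, len(scores) - 1))]

def is_outlier(x, means, stds, threshold):
    """Decision node: True means the structure is discarded as too distorted."""
    return distortion_score(x, means, stds) > threshold

means, stds = fit_reference(reference)
threshold = fit_threshold(reference, means, stds, error_rate=0.1)
print(is_outlier((0.190, 0.575), means, stds, threshold))  # close to the reference set
print(is_outlier((0.150, 0.520), means, stds, threshold))  # strongly distorted candidate
```

In a workflow, such a check could run after each geometry optimization step so that strongly distorted candidates are dropped before further resources are spent on them.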

Article
Full-text available
BATTERY 2030+ targets the development of a chemistry-neutral platform for accelerating the development of new sustainable high-performance batteries. Here, a description is given of how the AI-assisted toolkits and methodologies developed in BATTERY 2030+ can be transferred and applied to representative examples of future battery chemistries, materials, and concepts. This perspective highlights some of the main scientific and technological challenges facing emerging low-technology readiness level (TRL) battery chemistries and concepts, and specifically how the AI-assisted toolkit developed within BIG-MAP and other BATTERY 2030+ projects can be applied to resolve these. The methodological perspectives and challenges in areas like predictive long time- and length-scale simulations of multi-species systems, dynamic processes at battery interfaces, deep learned multi-scaling and explainable AI, as well as AI-assisted materials characterization, self-driving labs, closed-loop optimization, and AI for advanced sensing and self-healing are introduced. A description is given of how tools and modules can be transferred and applied to a select set of emerging low-TRL battery chemistries and concepts covering multivalent anodes, metal-sulfur/oxygen systems, non-crystalline, nano-structured and disordered systems, organic battery materials, and bulk vs. interface-limited batteries.
Article
Full-text available
This is a critical review of artificial intelligence/machine learning (AI/ML) methods applied to battery research. It aims at providing a comprehensive, authoritative, and critical, yet easily understandable, review of general interest to the battery community. It addresses the concepts, approaches, tools, outcomes, and challenges of using AI/ML as an accelerator for the design and optimization of the next generation of batteries—a current hot topic. It intends to create both accessibility of these tools to the chemistry and electrochemical energy sciences communities and completeness in terms of the different battery R&D aspects covered.
Article
Full-text available
Over the last two decades, the field of computational science has seen a dramatic shift towards incorporating high-throughput computation and big-data analysis as fundamental pillars of the scientific discovery process. This has necessitated the development of tools and techniques to deal with the generation, storage and processing of large amounts of data. In this work we present an in-depth look at the workflow engine powering AiiDA, a widely adopted, highly flexible and database-backed informatics infrastructure with an emphasis on data reproducibility. We detail many of the design choices that were made which were informed by several important goals: the ability to scale from running on individual laptops up to high-performance supercomputers, managing jobs with runtimes spanning from fractions of a second to weeks and scaling up to thousands of jobs concurrently, and all this while maximising robustness. In short, AiiDA aims to be a Swiss army knife for high-throughput computational science. As well as the architecture, we outline important API design choices made to give workflow writers a great deal of liberty whilst guiding them towards writing robust and modular workflows, ultimately enabling them to encode their scientific knowledge to the benefit of the wider scientific community.
Article
Full-text available
Innovations in batteries can require years of experimentation for design and optimization. We report an autonomous approach to the optimization of a battery electrolyte that uses machine learning coupled to a robotic test-stand to perform hundreds of sequential experiments. We search for mixtures of salts in aqueous electrolytes with high electrochemical stability using Bayesian optimization. In 40 hours of experimentation testing for 140 electrolyte formulas, we converge on a non-intuitive optimal electrolyte. The optimum is a mixed-anion sodium electrolyte that is more stable than a benchmark electrolyte, despite lower salt content, contrary to the known design principle. The precision and repeatability of the robotic test-stand distinguishes formulations that human-guided design may have missed. Our result demonstrates the possibility of integrating robotics with machine learning to discover novel battery materials. We provide a dataset characterizing 251 aqueous electrolytes containing LiNO3, LiClO4, Li2SO4, NaNO3, NaClO4, and Na2SO4 that includes conductivities, pHs, and electrochemical responses on platinum.
Article
Full-text available
In the recent past, optoelectronic semiconductors have attracted significant research attention both experimentally and theoretically toward large‐scale applications in energy conversion, lighting, imaging, detection, and so on. With advances in computing power and the rapid development of computational algorithms, the scientific community has turned to materials simulation to explore the hidden potential of thousands of potentially unknown materials within short timeframes, whereas real experiments might take a long time. Within this context, high‐throughput (HT) computational materials screening has emerged as a useful tool to accelerate materials discovery, especially in the field of optoelectronic semiconductors. One of the important consequences is the construction of a number of material databases containing a wide range of functional materials with their diverse physical properties and applications. Herein, we review the recent progress on HT computational screening of optoelectronic semiconductors, with a focus on photovoltaic solar absorbers, photoelectrochemical cells, semiconductor light‐emitting diodes, and transparent conducting materials. We also summarize the general workflow of HT computational screening, released workhorse models, and existing material databases. Finally, we offer perspectives for future research with the hope that this study could inspire new ideas for computational‐driven optoelectronic semiconductor discovery in the HT routine.
Article
Full-text available
Selecting proper descriptors or features is one of the central problems in exploring structure–activity relationships of materials using machine learning models. The current feature selection algorithms usually require tedious hyperparameter tuning and do not actively consider the prior knowledge of domain experts about the features. Here, this work proposes a data‐driven multi‐layer feature selection method incorporating domain expert knowledge named DML‐FSdek, which is automated, with users entering training data without manual tuning of the hyperparameters. The domain expert knowledge is quantified by means of weighted scoring and integrated into the selection process to eliminate the risk of crucial features being removed. The test studies on ten material properties datasets demonstrate the potential of the approach to automatically search for a reduced feature set with lower root mean square errors than those for the initial feature set. Essentially, the most relevant material features, the number of which is much smaller than that in the original feature set, are automatically selected to establish a closer and more accurate structure–activity relationship for the materials of interest. As a result, the method represents the targeted properties of materials with a smaller and more interpretable set of features while ensuring equal or better prediction accuracy.
Article
Full-text available
The bond orientational order parameters originally introduced by Steinhardt et al. (Phys. Rev. B 28, 784 (1983)) are a common tool for local structure characterization in soft matter studies. Recently, Mickel et al. (J. Chem. Phys. 138, 044501 (2013)) highlighted problems of the bond orientational order parameters due to the ambiguity of the underlying neighbourhood definition. Here we show how difficult it is to distinguish common structures like FCC- and BCC-based structures with the suggested neighbourhood definitions when noise is introduced. We propose a simple improvement to the neighbourhood definition that results in robust and continuous bond orientational order parameters with which we can accurately distinguish crystal structures even when noise is present.
Article
Full-text available
Identifying local structure in molecular simulations is of utmost importance. The most common existing approach to identify local structure is to calculate some geometrical quantity referred to as an order parameter. In simple cases order parameters are physically intuitive and trivial to develop (e.g., ion-pair distance), however in most cases, order parameter development becomes a much more difficult endeavor (e.g., crystal structure identification). Using ideas from computer vision, we adapt a specific type of neural network called a PointNet to identify local structural environments in molecular simulations. A primary challenge in applying machine learning techniques to simulation is selecting the appropriate input features. This challenge is system-specific and requires significant human input and intuition. In contrast, our approach is a generic framework that requires no system-specific feature engineering and operates on the raw output of the simulations, i.e., atomic positions. We demonstrate the method on crystal structure identification in Lennard-Jones (four different phases), water (eight different phases), and mesophase (six different phases) systems. The method achieves as high as 99.5% accuracy in crystal structure identification. The method is applicable to heterogeneous nucleation and it can even predict the crystal phases of atoms near external interfaces. We demonstrate the versatility of our approach by using our method to identify surface hydrophobicity based solely upon positions and orientations of surrounding water molecules. Our results suggest the approach will be broadly applicable to many types of local structure in simulations.
Article
Full-text available
PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, maintainability checks, interactive examples and parallelization are emphasized as core components in the toolbox's development. PyOD is compatible with both Python 2 and 3 and can be installed through Python Package Index (PyPI) or https://github.com/yzhao062/pyod.
Article
Full-text available
Understanding and controlling the complex and dynamic processes at battery interfaces holds the key to developing more durable and ultra high performance secondary batteries. Interfacial processes like dendrite and Solid Electrolyte Interphase (SEI) formation span numerous time- and length scales, and despite decades of research, their formation, composition, structure and function still pose a conundrum. Consequently, “inverse design” of high-performance interfaces and interphases like the SEI remains an elusive dream. Here, we present a perspective and possible blueprint for a future battery research strategy to reach this ambitious goal. Semi-supervised generative deep learning models trained on all sources of available data, i.e., extensive multi-fidelity datasets from multi-scale computer simulations and databases, operando characterization from large-scale research facilities, high-throughput synthesis and laboratory testing, need to work closely together to unlock this dream. We show how understanding and tracking different types of uncertainties in the experimental and simulation methods, as well as the machine learning framework for the generative model, is crucial for controlling and improving the fidelity in the predictive design of battery interfaces and interphases. We argue that simultaneous utilization of data from multiple domains, including data from failed experiments, will play a critical role in accelerating the development of reliable generative models to enable accelerated discovery and inverse design of durable ultra high performance batteries based on novel materials, structures and designs.
Article
Full-text available
Predicting crystal structure has always been a challenging problem for physical sciences. Recently, computational methods have been built to predict crystal structure with success but have been limited in scope and computational time. In this paper, we review computational methods such as density functional theory and machine learning methods used to predict crystal structure. We also explored the breadth versus accuracy of building a model to predict across any crystal structure using machine learning. We extracted 24,913 unique chemical formulae existing between 290 K and 310 K from the Pearson Crystal Database. Of these 24,913 formulae, there exists 10,711 unique crystal structures referred to as entry prototypes. Common entries might have hundreds of chemical compositions while the vast majority of entry prototypes are represented by fewer than ten unique compositions. To include all data in our predictions, entry prototypes that lacked a minimum number of representatives were relabeled as ‘Other’. By selecting the minimum numbers to be 150, 100, 70, 40, 20, and 10, we explored how limiting class sizes affected performance. Using each minimum number to reorganize the data, we looked at the classification performance metrics: accuracy, precision, and recall. Accuracy ranged from 97±2% to 85±2%, average precision ranged from 86±2% to 79±2%, while average recall ranged from 73±2% to 54±2% for minimum-class representatives from 150 to 10, respectively.
Article
Full-text available
An UltraScale System (USS) joins parallel and distributed computing systems that will be two to three orders of magnitude larger than today's infrastructure regarding scale, performance, the number of components and their complexity. For such systems to become a reality, however, advances must be made in HPC, large-scale distributed systems, and big data solutions, also tackling challenges such as improving the energy efficiency of the IT infrastructure. Monitoring the power consumed by underlying IT resources is essential for optimising the manner in which IT resources are used and hence improving the sustainability of such systems. Nevertheless, monitoring the energy consumed by USS is a challenging endeavour as the system can comprise thousands of heterogeneous server resources spanning multiple data centres. Moreover, the amount of monitoring data, its gathering, and processing, should never become a bottleneck nor profoundly impact the energy efficiency of the overall system. This work surveys the state of the art on energy monitoring of large-scale systems and methodologies for monitoring the power consumed by large systems and discusses some of the challenges to be addressed towards monitoring and improving the energy efficiency of USS. Next, we present efforts made on designing monitoring solutions. Finally, we discuss potential gaps in existing solutions when tackling emerging large-scale monitoring scenarios and present some directions for future research on the topic.
Article
Full-text available
It is of great importance to develop inorganic solid electrolytes with high ionic conductivity, thus enabling solid state Li-ion batteries to address the notorious safety issue of the current technology caused by the use of highly flammable liquid organic electrolytes. On the basis of systematic first principles modelling, we have formulated new inorganic electrolytes with ultra-low activation energies for long-distance diffusion of Li+ ions, through alloying in the cubic argyrodite Li6PA5X chalcogenides (chalcogen A; halogen X). We find that the long-distance transportation of Li+ is dictated by inter-octahedral diffusion, as the activation energy for Li+ to migrate over a Li6A octahedron is minimal. The inter-octahedral diffusion barrier for Li+ is largely dependent on the interaction with chalcogen anions in the compound. Radical reduction of the diffusion barrier for Li+ ions can be realized through isovalent substitution of S using elements of lower electronegativity, together with smaller halogen ions on the X site.
Article
Full-text available
The screening of novel materials with good performance and the modelling of quantitative structure-activity relationships (QSARs), among other issues, are hot topics in the field of materials science. Traditional experiments and computational modelling often consume tremendous time and resources and are limited by their experimental conditions and theoretical foundations. Thus, it is imperative to develop a new method of accelerating the discovery and design process for novel materials. Recently, materials discovery and design using machine learning have been receiving increasing attention and have achieved great improvements in both time efficiency and prediction accuracy. In this review, we first outline the typical mode of and basic procedures for applying machine learning in materials science, and we classify and compare the main algorithms. Then, the current research status is reviewed with regard to applications of machine learning in material property prediction, in new materials discovery and for other purposes. Finally, we discuss problems related to machine learning in materials science, propose possible solutions, and forecast potential directions of future research. By directly combining computational studies with experiments, we hope to provide insight into the parameters that affect the properties of materials, thereby enabling more efficient and target-oriented research on materials discovery and design.
Article
Full-text available
The limited number of known low-band-gap photoelectrocatalytic materials poses a significant challenge for the generation of chemical fuels from sunlight. Using high-throughput ab initio theory with experiments in an integrated workflow, we find eight ternary vanadate oxide photoanodes in the target band-gap range (1.2–2.8 eV). Detailed analysis of these vanadate compounds reveals the key role of VO_4 structural motifs and electronic band-edge character in efficient photoanodes, initiating a genome for such materials and paving the way for a broadly applicable high-throughput-discovery and materials-by-design feedback loop. Considerably expanding the number of known photoelectrocatalysts for water oxidation, our study establishes ternary metal vanadates as a prolific class of photoanode materials for generation of chemical fuels from sunlight and demonstrates our high-throughput theory–experiment pipeline as a prolific approach to materials discovery.
Conference Paper
Full-text available
In data mining, anomaly detection aims to identify the data samples that do not conform to an expected behavior. Anomaly detection has successfully been applied to many real world applications such as fraud detection for credit cards and intrusion detection in security. However, there is very little research on using anomaly detection techniques to detect cheating in online games. In this paper, we present an empirical study of anomaly detection in online games. Four unsupervised anomaly detection techniques were used to detect abnormal players. A method for evaluating the performance of these detection techniques was introduced and analysed. The experiments were conducted on one artificial dataset and two real online games at VNG company. The results show the good capability of the detection techniques used in this paper in detecting abnormal players in online games.
Article
Full-text available
The modelling of materials properties and processes from first principles is becoming sufficiently accurate as to facilitate the design and testing of new systems in silico. Computational materials science is both valuable and increasingly necessary for developing novel functional materials and composites that meet the requirements of next-generation technology. A range of simulation techniques are being developed and applied to problems related to materials for energy generation, storage and conversion including solar cells, nuclear reactors, batteries, fuel cells, and catalytic systems. Such techniques may combine crystal-structure prediction (global optimisation), data mining (materials informatics) and high-throughput screening with elements of machine learning. We explore the development process associated with computational materials design, from setting the requirements and descriptors to the development and testing of new materials. As a case study, we critically review progress in the fields of thermoelectrics and photovoltaics, including the simulation of lattice thermal conductivity and the search for Pb-free hybrid halide perovskites. Finally, a number of universal chemical-design principles are advanced.
Article
Full-text available
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.
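
As a hedged illustration of the family of methods this study benchmarks, the sketch below implements one of the simplest k-nearest-neighbour outlier scores: the distance from each point to its k-th nearest neighbour. The function name and the toy data are assumptions for illustration only and are not taken from the study.

```python
import math

def knn_outlier_scores(points, k):
    """Distance from each point to its k-th nearest neighbour; larger means more outlying."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Toy data: a tight cluster plus one isolated point (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = max(range(len(points)), key=lambda i: scores[i])
print(points[most_outlying])
```

The brute-force neighbour search is quadratic in the number of points, which is acceptable for a sketch but would normally be replaced by a spatial index for larger datasets.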
Article
Full-text available
In the search for new functional materials, quantum mechanics is an exciting starting point. The fundamental laws that govern the behaviour of electrons have the possibility, at the other end of the scale, to predict the performance of a material for a targeted application. In some cases, this is achievable using density functional theory (DFT). In this Review, we highlight DFT studies predicting energy-related materials that were subsequently confirmed experimentally. The attributes and limitations of DFT for the computational design of materials for lithium-ion batteries, hydrogen production and storage materials, superconductors, photovoltaics and thermoelectric materials are discussed. In the future, we expect that the accuracy of DFT-based methods will continue to improve and that growth in computing power will enable millions of materials to be virtually screened for specific applications. Thus, these examples represent a first glimpse of what may become a routine and integral step in materials discovery.
Article
Full-text available
Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, making it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time-complexity and a small memory-requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
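
The isolation idea can be sketched in a few dozen lines. The code below is a simplified, standard-library-only illustration under stated assumptions (random axis-parallel splits, a subsample of 64 points per tree, 100 trees, toy Gaussian data); it is not the reference implementation, and all function names are hypothetical.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus a correction for unsplit points in that leaf."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def isolation_scores(points, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1); short average paths give scores closer to 1."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(points, size), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in points:
        mean_path = sum(path_length(x, tree) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_path / c_factor(size)))
    return scores

# Toy data: a Gaussian cloud, one central point and one distant point (illustrative only).
data_rng = random.Random(1)
data = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(200)]
data.append((0.0, 0.0))   # central point, expected to need many splits
data.append((8.0, 8.0))   # distant point, expected to be isolated quickly
scores = isolation_scores(data)
print(scores[-1] > scores[-2])
```

Because each tree only sees a small subsample, the cost per tree does not grow with the dataset size, which is the subsampling advantage described in the abstract.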
Article
Full-text available
A centroidal Voronoi tessellation is a Voronoi tessellation whose generating points are the centroids (centers of mass) of the corresponding Voronoi regions. We give some applications of such tessellations to problems in image compression, quadrature, finite difference methods, distribution of resources, cellular biology, statistics, and the territorial behavior of animals. We discuss methods for computing these tessellations, provide some analyses concerning both the tessellations and the methods for their determination, and, finally, present the results of some numerical experiments.
Article
Full-text available
Bond-orientational order in molecular-dynamics simulations of supercooled liquids and in models of metallic glasses is studied. Quadratic and third-order invariants formed from bond spherical harmonics allow quantitative measures of cluster symmetries in these systems. A state with short-range translational order, but extended correlations in the orientations of particle clusters, starts to develop about 10% below the equilibrium melting temperature in a supercooled Lennard-Jones liquid. The order is predominantly icosahedral, although there is also a cubic component which we attribute to the periodic boundary conditions. Results are obtained for liquids cooled in an icosahedral pair potential as well. Only a modest amount of orientational order appears in a relaxed Finney dense-random-packing model. In contrast, we find essentially perfect icosahedral bond correlations in alternative "amorphon" cluster models of glass structure.
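
As a worked illustration of the quadratic invariants, the sketch below computes the standard order parameter q_l = sqrt(4π/(2l+1) Σ_m |⟨Y_lm⟩|²) for a single atom. It relies on the spherical-harmonic addition theorem, which reduces the sum over m to Legendre polynomials of the angles between pairs of bonds, so no spherical-harmonics library is needed. The function names are hypothetical, and the neighbour list is assumed to be given, since the choice of neighbourhood definition is a separate problem.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) evaluated with the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def steinhardt_q(bonds, l):
    """q_l for one atom, given the vectors pointing to its neighbours.

    Uses q_l^2 = (1/N^2) * sum_{i,j} P_l(cos gamma_ij), where gamma_ij is the
    angle between bonds i and j (addition theorem for spherical harmonics).
    """
    units = []
    for bond in bonds:
        norm = math.sqrt(sum(c * c for c in bond))
        units.append(tuple(c / norm for c in bond))
    total = 0.0
    for u in units:
        for v in units:
            cos_gamma = sum(a * b for a, b in zip(u, v))
            cos_gamma = max(-1.0, min(1.0, cos_gamma))  # guard against rounding
            total += legendre(l, cos_gamma)
    return math.sqrt(max(total, 0.0)) / len(units)

# Six bonds of a perfect octahedral (simple cubic) environment.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
q4 = steinhardt_q(octahedron, 4)
q6 = steinhardt_q(octahedron, 6)
print(round(q4, 4), round(q6, 4))
```

For the ideal octahedron the double sum can be done by hand: each bond contributes P_l(1) + P_l(-1) + 4 P_l(0), which gives q4 = sqrt(7/12) and q6 = sqrt(1/8). Distorting the bond directions moves these values away from the ideal ones, which is what makes them usable as distortion descriptors.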
Article
Identifying descriptors linked to Li⁺ conduction enables rational design of solid state electrolytes (SSEs) for advanced lithium ion batteries, but it is hindered by the diverse and confounding descriptors. To address this, by integrating global and local effects of the Li⁺ conduction environment, we develop a generic method of hierarchically encoding crystal structure (HECS) and inferring causality to identify descriptors for Li⁺ conduction in SSEs. Taking the cubic Li-argyrodites as an example, 32 HECS-descriptors are constructed, encompassing composition, structure, conduction pathway, ion distribution, and special ions derived from the unit cell information. Partial correlation analysis reveals that the smaller anion size plays a significant role in achieving lower activation energy, which results from the competing effects between the lattice space and bottleneck size controlled by framework site disorder. Moreover, promising candidates are suggested, among which Li6-xPS5-xCl1+x (e.g., Li5.5PS4.5Cl1.5 with a room-temperature ionic conductivity of 9.4 mS cm⁻¹ and an activation energy of 0.29 eV) have been experimentally evaluated as excellent candidates for practical SSEs, while the rest are novel compositions awaiting validation. Our work establishes a rational correlation between the HECS-descriptors and Li⁺ conduction, and the proposed approach can be extended to other types of SSE materials.
Article
The analysis of the structural formation of colloidal systems using machine learning techniques has recently attracted much attention. In many of these studies, local bond-order parameters (LBOPs) were employed as descriptors, where such LBOPs are suitable mainly for the detection of crystal structures. On the other hand, image-based convolutional neural networks (CNNs) are quite effective in detecting not only crystals but also random structures, and the author demonstrated their efficiency in a previous paper. However, in supervised learning, it is difficult to obtain a correct result when there is an unexpected new phase that was unknown when training the CNN. In this paper, we propose a hybrid scheme that consists of supervised and unsupervised learning techniques, employing two different approaches: image-based CNN and generalized LBOP (gLBOP). The proposed method was applied to two-dimensional colloidal systems, and its efficiency was demonstrated.
Article
Identifying molecular structures of water and ice helps reveal the chemical nature of liquid and solid water. Real-space geometrical information on molecular systems can be precisely obtained from molecular simulations, but classifying the resulting structure is a non-trivial task. Order parameters are ordinarily introduced to effectively distinguish different structures. Many order parameters have been developed for various kinds of structures, such as body-centered cubic, face-centered cubic, hexagonal close-packed, and liquid. Order parameters for water have also been suggested but need further study. There has been no thorough investigation of the classification capability of many existing order parameters. In this work, we investigate the capability of 493 order parameters to classify the three structures of ice: Ih, Ic, and liquid. A total of 159 767 496 combinations of the order parameters are also considered. The investigation is automatically and systematically performed by machine learning. We find the best set of two bond-orientational order parameters, Q4 and Q8, to distinguish the three structures with high accuracy and robustness. A set of three order parameters is also suggested for better accuracy.
Article
I present a strategy for unsupervised manifold learning on local atomic environments in molecular simulations based on simple rotation- and permutation-invariant three-body features. These features are highly descriptive, generalize to multiple chemical species, and are human-interpretable. The low-dimensional embeddings of each atomic environment can be used to understand and quantify messy crystal structures such as those near interfaces and defects or well-ordered crystal lattices such as in bulk materials without modification. The same method can also yield collective variables describing collections of particles such as for an entire simulation domain. I demonstrate the method on colloidal crystallization, ice crystals, and binary mesophases to illustrate its broad applicability. In each case, the learned latent space yields insights into the details of the observed microstructures. For ices and mesophases, supervised classifiers are trained based on the learned manifolds and directly compared against a recent neural-network-based approach. Notably, while this method provides comparable classification performance, it can also be deployed on even a handful of observed environments without labels or a priori knowledge. Thus, the current approach provides an incredibly versatile strategy to characterize and classify local atomic environments, and may unlock insights in a wide variety of molecular simulation contexts.
Article
Rational design of solid-state electrolytes (SSEs) with high ionic conductivity and low activation energy (Ea) is vital for all-solid-state batteries. Machine learning (ML) techniques have recently been successful in predicting Li⁺ conduction properties in SSEs with various descriptors and in accelerating the development of SSEs. In this work, we extend the previous efforts and introduce a framework for ML prediction of Ea in SSEs with hierarchically encoding crystal structure-based (HECS) descriptors. Taking cubic Li-argyrodites as an example, an Ea prediction model developed by partial least squares (PLS) analysis achieves coefficient of determination (R²) and root-mean-square error (RMSE) values of 0.887 and 0.02 eV for the training dataset and 0.820 and 0.02 eV for the test dataset, respectively, demonstrating the predictive power of HECS descriptors. The variable importance in projection (VIP) scores demonstrate the combined effects of the global and local Li⁺ conduction environments, especially the anion size and the resultant structural changes associated with anion site disorder. The developed Ea prediction model directs us to optimize and design new Li-argyrodites with lower Ea, such as Li6–xPS5–xCl1+x (<0.322 eV), Li6+xPS5+xBr1–x (<0.273 eV), Li6+xPS5+xBr0.25I0.75–x (<0.352 eV), Li6+(5–n)yP1–yNyS5I (<0.420 eV), Li6+(5–n)yAs1–yNyS5I (<0.371 eV), Li6+(5–n)yAs1–yNySe5I (<0.450 eV), by broadening bottleneck size, invoking site disorder and activating concerted Li⁺ conduction. This analysis shows great potential for promoting the rational design of advanced SSEs, and the same approach can be applied to other types of materials.
Article
Machine learning plays an important role in accelerating the discovery and design process for novel electrochemical energy storage materials. This review aims to provide the state-of-the-art and prospects of machine learning for the design of rechargeable battery materials. After illustrating the key concepts of machine learning and basic procedures for applying machine learning in rechargeable battery materials science, we focus on how to obtain the most important features from the specific physical, chemical and/or other properties of material by using wrapper feature selection method, embedded feature selection method, and the combination of these two methods. And then, the applications of machine learning in rechargeable battery materials design and discovery are reviewed, including the property prediction for liquid electrolytes, solid electrolytes, electrode materials, and the discovery of novel rechargeable battery materials through component prediction and structure prediction. More importantly, we discuss the key challenges related to machine learning in rechargeable battery materials science, including the contradiction between high dimension and small sample, the conflict between the complexity and accuracy of machine learning models, and the inconsistency between learning results and domain expert knowledge. In response to these challenges, we propose possible countermeasures and forecast potential directions of future research. This review is expected to shed light on machine learning in rechargeable battery materials design and property optimization.
Although there is a large and growing literature that tackles the unsupervised outlier detection problem, the unsupervised evaluation of outlier detection results is still virtually untouched in the literature. The so-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or just compare (in relative terms) the solutions provided by different algorithms or by different parameterizations of a given algorithm in the absence of labeled data. However, in contrast to unsupervised cluster analysis, where indexes for internal evaluation and validation of clustering solutions have been conceived and shown to be very useful, in the outlier detection domain, this problem has been notably overlooked. Here we discuss this problem and provide a solution for the internal evaluation of outlier detection results. Specifically, we describe an index called Internal, Relative Evaluation of Outlier Solutions (IREOS) that can evaluate and compare different candidate outlier detection solutions. Initially, the index is designed to evaluate binary solutions only, referred to as top-n outlier detection results. We then extend IREOS to the general case of non-binary solutions, consisting of outlier detection scorings. We also statistically adjust IREOS for chance and extensively evaluate it in several experiments involving different collections of synthetic and real datasets.
Article
The freud Python package is a library for analyzing simulation data. Written with modern simulation and data analysis workflows in mind, freud provides a Python interface to fast, parallelized C++ routines that run efficiently on laptops, workstations, and supercomputing clusters. The package provides the core tools for finding particle neighbors in periodic systems, and offers a uniform API to a wide variety of methods implemented using these tools. As such, freud users can access standard methods such as the radial distribution function as well as newer, more specialized methods such as the potential of mean force and torque and local crystal environment analysis with equal ease. Rather than providing its own trajectory data structure, freud operates either directly on NumPy arrays or on trajectory data structures provided by other Python packages. This design allows freud to transparently interface with many trajectory file formats by leveraging the file parsing abilities of other trajectory management tools. By remaining agnostic to its data source, freud is suitable for analyzing any particle simulation, regardless of the original data representation or simulation method. When used for on-the-fly analysis in conjunction with scriptable simulation software such as HOOMD-blue, freud enables smart simulations that adapt to the current state of the system, allowing users to study phenomena such as nucleation and growth.
Program summary
Program Title: freud
Program Files doi: http://dx.doi.org/10.17632/v7wmv9xcct.1
Licensing provisions: BSD 3-Clause
Programming language: Python, C++
Nature of problem: Simulations of coarse-grained, nano-scale, and colloidal particle systems typically require analyses specialized to a particular system. Certain more standardized techniques, including correlation functions, order parameters, and clustering, are computationally intensive tasks that must be carefully implemented to scale to the larger systems common in modern simulations.
Solution method: freud performs a wide variety of particle system analyses, offering a Python API that interfaces with many other tools in computational molecular sciences via NumPy array inputs and outputs. The algorithms in freud leverage parallelized C++ to scale to large systems and enable real-time analysis. The library’s broad set of features encodes few assumptions compared to other analysis packages, enabling analysis of a broader class of data ranging from biomolecular simulations to colloidal experiments.
Additional comments including restrictions and unusual features:
1. freud provides very fast, parallel implementations of standard analysis methods like RDFs and correlation functions.
2. freud includes the reference implementation for the potential of mean force and torque (PMFT).
3. freud provides various novel methods for characterizing particle environments, including the calculation of descriptors useful for machine learning.
The source code is hosted on GitHub (https://github.com/glotzerlab/freud), and documentation is available online (https://freud.readthedocs.io/). The package may be installed via pip install freud-analysis or conda install -c conda-forge freud.
Article
We introduce a simple, fast, and easy to implement unsupervised learning algorithm for detecting different local environments on a single-particle level in colloidal systems. In this algorithm, we use a vector of standard bond-orientational order parameters to describe the local environment of each particle. We then use a neural-network-based autoencoder combined with Gaussian mixture models in order to autonomously group together similar environments. We test the performance of the method on snapshots of a wide variety of colloidal systems obtained via computer simulations, ranging from simple isotropically interacting systems to binary mixtures, and even anisotropic hard cubes. Additionally, we look at a variety of common self-assembled situations such as fluid-crystal and crystal-crystal coexistences, grain boundaries, and nucleation. In all cases, we are able to identify the relevant local environments to a similar precision as “standard,” manually tuned, and system-specific, order parameters. In addition to classifying such environments, we also use the trained autoencoder in order to determine the most relevant bond orientational order parameters in the systems analyzed.
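The bond-orientational order parameters mentioned above can be illustrated with a small, dependency-free sketch. It assumes the standard Steinhardt definition and uses the spherical-harmonic addition theorem, which reduces q_l for one particle to an average of Legendre polynomials over all pairs of neighbor bond directions: q_l = sqrt((1/N²) Σ_ij P_l(cos γ_ij)). The function names and the six-neighbor simple-cubic test environment are illustrative choices, not taken from the cited work, and neighbor finding (cutoffs, periodic boundaries) is deliberately left out.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x), evaluated with the Bonnet recurrence."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def steinhardt_q(l, bonds):
    """Rotationally invariant q_l of one particle from its neighbor bond vectors.

    Uses q_l^2 = (1/N^2) * sum_{i,j} P_l(cos gamma_ij), where gamma_ij is the
    angle between bonds i and j (the i == j terms are included).
    """
    units = []
    for bond in bonds:
        norm = math.sqrt(sum(c * c for c in bond))
        units.append(tuple(c / norm for c in bond))
    total = 0.0
    for u in units:
        for v in units:
            cos_gamma = sum(a * b for a, b in zip(u, v))
            total += legendre(l, cos_gamma)
    # Guard against tiny negative values caused by floating-point round-off.
    return math.sqrt(max(total, 0.0)) / len(units)


# Hypothetical test environment: the six nearest neighbors of a simple cubic site.
SC_BONDS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

print(round(steinhardt_q(4, SC_BONDS), 4))  # 0.7638
print(round(steinhardt_q(6, SC_BONDS), 4))  # 0.3536
```

A descriptor vector for one particle would then be, for example, `[steinhardt_q(l, bonds) for l in (4, 6, 8)]`. For production use on periodic systems, a dedicated package such as freud is the more appropriate tool.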
Article
Achieving the 2016 Paris Agreement goal of limiting global warming to below 2 °C and securing a sustainable energy future requires materials innovations in renewable energy technologies. While the window of opportunity is closing, meeting these goals necessitates deploying new research concepts and strategies to accelerate materials discovery. Recent advancements in machine learning have offered the science and engineering community a flexible and rapid prediction framework, making a tremendous impact. Here we summarize the recent progress in machine learning approaches for developing renewable energy materials. We summarize applications of machine learning methods for the theoretical approaches in the key renewable energy technologies including catalysis, battery, solar cell, and crystal discovery. We also analyze notable applications resulting in significant discovery and discuss critical gaps to further accelerate materials discovery.
Article
Progress in the discovery of new materials has been accelerated by the development of reliable quantum-mechanical approaches to crystal structure prediction. The properties of a material depend very sensitively on its structure; therefore, structure prediction is the key to computational materials discovery. Structure prediction was considered to be a formidable problem, but the development of new computational tools has allowed the structures of many new and increasingly complex materials to be anticipated. These widely applicable methods, based on global optimization and relying on little or no empirical knowledge, have been used to study crystalline structures, point defects, surfaces and interfaces. In this Review, we discuss structure prediction methods, examining their potential for the study of different materials systems, and present examples of computationally driven discoveries of new materials — including superhard materials, superconductors and organic materials — that will enable new technologies. Advances in first-principle structure predictions also lead to a better understanding of physical and chemical phenomena in materials.
Article
In this paper we introduce OpenEnsembles, a Python toolkit for performing and analyzing ensemble clustering. Ensemble clustering is the process of creating many clustering solutions for a given dataset and utilizing the relationships observed across the ensemble to identify final solutions, which are more robust, stable or better than the individual solutions within the ensemble. The OpenEnsembles library provides a unified interface for applying transformations to data, clustering data, visualizing individual clustering solutions, visualizing and finishing the ensemble, and calculating validation metrics for a clustering solution for any given partitioning of the data. We have documented examples of using OpenEnsembles to create, analyze, and visualize a number of different types of ensemble approaches on toy and example datasets. OpenEnsembles is released under the GNU General Public License version 3, can be installed via Conda or the Python Package Index (pip), and is available at https://github.com/NaegleLab/OpenEnsembles. © 2018 Tom Ronan, Shawn Anastasio, Zhijie Qi, Pedro Henrique S. Vieira Tavares, Roman Sloutsky, and Kristen M. Naegle.
Article
The discovery and development of novel materials in the field of energy are essential to accelerate the transition to a low-carbon economy. Bringing recent technological innovations in automation, robotics and computer science together with current approaches in chemistry, materials synthesis and characterization will act as a catalyst for revolutionizing traditional research and development in both industry and academia. This Perspective provides a vision for an integrated artificial intelligence approach towards autonomous materials discovery, which, in our opinion, will emerge within the next 5 to 10 years. The approach we discuss requires the integration of the following tools, which have already seen substantial development to date: high-throughput virtual screening, automated synthesis planning, automated laboratories and machine learning algorithms. In addition to reducing the time to deployment of new materials by an order of magnitude, this integrated approach is expected to lower the cost associated with the initial discovery. Thus, the price of the final products (for example, solar panels, batteries and electric vehicles) will also decrease. This in turn will enable industries and governments to meet more ambitious targets in terms of reducing greenhouse gas emissions at a faster pace.
Article
The all-solid-state (ASS) lithium-ion battery has attracted great attention due to its high safety and increased energy density. One of the key components in the ASS battery (ASSB) is the solid electrolyte, which determines the performance of the ASSB. Many types of solid electrolytes have been investigated in great detail in the past years, including NASICON-type, garnet-type, perovskite-type, LISICON-type, LiPON-type, Li3N-type, sulfide-type, argyrodite-type, anti-perovskite-type and many more. This paper aims to provide a comprehensive review of some typical types of key solid electrolytes and some ASSBs, and of gaps that should be resolved.
Article
Solid-state chemists have been consistently successful in envisioning and making new compounds, often enlisting the tools of theoretical solid-state physics to explain some of the observed properties of the new materials. Here, a new style of collaboration between theory and experiment is discussed, whereby the desired functionality of the new material is declared first and theoretical calculations are then used to predict which stable and synthesizable compounds exhibit the required functionality. Subsequent iterative feedback cycles of prediction–synthesis–characterization result in improved predictions and promise not only to accelerate the discovery of new materials but also to enable the targeted design of materials with desired functionalities via such inverse design.
Article
Vertical hetero-structures made from stacked monolayers of transition metal dichalcogenides (TMDC) are promising candidates for next-generation optoelectronic and thermoelectric devices. Identification of optimal layered materials for these applications requires the calculation of several physical properties, including electronic band structure and thermal transport coefficients. However, exhaustive screening of the material structure space using ab initio calculations is currently outside the bounds of existing computational resources. Furthermore, the functional form of how the physical properties relate to the structure is unknown, making gradient-based optimization unsuitable. Here, we present a model based on the Bayesian optimization technique to optimize layered TMDC hetero-structures, performing a minimal number of structure calculations. We use the electronic band gap and thermoelectric figure of merit as representative physical properties for optimization. The electronic band structure calculations were performed within the Materials Project framework, while thermoelectric properties were computed with BoltzTraP. With high probability, the Bayesian optimization process is able to discover the optimal hetero-structure after evaluation of only ∼20% of all possible 3-layered structures. In addition, we have used a Gaussian regression model to predict not only the band gap but also the valence band maximum and conduction band minimum energies as a function of the momentum.
Article
As computers get faster, researchers, not hardware or algorithms, become the bottleneck in scientific discovery. Computational study of colloidal self-assembly is one area that is keenly affected: even after computers generate massive amounts of raw data, performing an exhaustive search to determine what (if any) ordered structures occur in a large parameter space of many simulations can be excruciating. We demonstrate how machine learning can be applied to discover interesting areas of parameter space in colloidal self-assembly. We create numerical fingerprints of structures found in self-assembly studies, inspired by bond orientational order diagrams, and use these descriptors both to find interesting regions in a phase diagram and to identify characteristic local environments in simulations in an automated manner for simple and complex crystal structures. These methods allow analysis to keep up with the data generation ability of modern high-throughput computing environments.
Article
A method is given for generating sets of special points in the Brillouin zone which provides an efficient means of integrating periodic functions of the wave vector. The integration can be over the entire Brillouin zone or over specified portions thereof. This method also has applications in spectral and density-of-state calculations. The relationships to the Chadi-Cohen and Gilat-Raubenheimer methods are indicated.
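As a rough illustration of what such a set of special points looks like, the sketch below builds a uniform grid in fractional reciprocal-lattice coordinates using the commonly quoted rule u_r = (2r − q − 1)/(2q) for r = 1, …, q along each axis. The abstract itself does not state this formula, so it is an assumption here, and the function name is hypothetical. No symmetry reduction or weighting is performed.

```python
from itertools import product


def special_point_grid(q1, q2, q3):
    """Return a q1 x q2 x q3 grid of k-points in fractional coordinates.

    Each axis uses u_r = (2r - q - 1) / (2q) for r = 1..q, which places the
    points symmetrically about the zone center (assumed grid rule).
    """
    def axis(q):
        return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]

    return list(product(axis(q1), axis(q2), axis(q3)))


# Example: a 2 x 2 x 2 grid has eight points at +/-0.25 along each axis.
for kpoint in special_point_grid(2, 2, 2):
    print(kpoint)
```

With an odd number of divisions the grid includes the zone center (u = 0), while an even number avoids it. In practice the full grid is usually reduced to symmetry-inequivalent points with weights, which this sketch omits.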
Article
This paper introduces FireWorks, a workflow software for running high-throughput calculation workflows at supercomputing centers. FireWorks has been used to complete over 50 million CPU-hours worth of computational chemistry and materials science calculations at the National Energy Research Supercomputing Center. It has been designed to serve the demanding high-throughput computing needs of these applications, with extensive support for (i) concurrent execution through job packing, (ii) failure detection and correction, (iii) provenance and reporting for long-running projects, (iv) automated duplicate detection, and (v) dynamic workflows (i.e., modifying the workflow graph during runtime). We have found that these features are highly relevant to enabling modern data-driven and high-throughput science applications, and we discuss our implementation strategy that rests on Python and NoSQL databases (MongoDB). Finally, we present performance data and limitations of our approach along with planned future work.
Article
Typically, computational screens for new materials sharply constrain the compositional search space, structural search space, or both, for the sake of tractability. To lift these constraints, we construct a machine learning model from a database of thousands of density functional theory (DFT) calculations. The resulting model can predict the thermodynamic stability of arbitrary compositions without any other input and with six orders of magnitude less computer time than DFT. We use this model to scan roughly 1.6 million candidate compositions for novel ternary compounds (AxByCz), and predict 4500 new stable materials. Our method can be readily applied to other descriptors of interest to accelerate domain-specific materials discovery.
Article
Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building an outlier ensemble, discuss the first steps taken in the literature, and identify challenges for future research.
Article
In his Bakerian Lecture, Bernal (1964) discussed those ideas of restricted irregularity which are physically realized in random packings of equal hard spheres, with particular reference to the structure of simple liquids. He stressed the need for a science of 'statistical geometry', and took the first steps himself by proposing possible ways of describing such arrays. In this paper, these and other associated ideas are briefly described, and extended by deriving an equivalent set of polyhedral subunits essentially inverse to the packing in real space. Examination of two independent high density arrays demonstrates the reproducibility of certain metrical and topological properties of these polyhedra, and their correlations over larger elements of volume. As a result, several possible 'descriptive parameters' are proposed. Although these essentially 'numerical' characteristics facilitate sensitive structural descriptions of any assembly of micro- and macroscopic subunits, we are still unable to characterize an irregular array in formal mathematical terms. Such a formulation of statistical geometry could be a powerful tool for tackling important problems in many branches of science and engineering.
Article
Using the molecular dynamics method, the crystallization process from supercooled fluid states is studied for the soft-core system with the pair potential φ(r) = ε(σ/r)¹², which has a simple property for characterizing the relaxation towards crystalline states. The Voronoi polyhedron is introduced to examine local atomic configurations from a topological point of view. Certain classes of polyhedra characterize the various phases well, i.e., fluid, and bcc and fcc solids. The final relaxed state is a bcc crystalline state when the system relaxes incompletely, and an fcc state when the system relaxes perfectly. A unified way of defining a nucleus during both crystallization processes is proposed. The growth of the nucleus is affected by the periodic boundary condition imposed on the system.
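For concreteness, the inverse-twelfth-power soft-core pair potential, φ(r) = ε(σ/r)^12, can be written as a one-line function. The sketch below also checks one consequence of the pure power-law form: scaling all distances by a factor λ rescales the energy by λ^-12. Reduced units (ε = σ = 1) and the function name are assumptions made for illustration. Whether this homogeneity is the "simple property" the abstract refers to is not stated in the text.

```python
def soft_core(r, epsilon=1.0, sigma=1.0):
    """Soft-core pair potential phi(r) = epsilon * (sigma / r)**12."""
    return epsilon * (sigma / r) ** 12


# Homogeneity check: phi(lam * r) == lam**-12 * phi(r) for any lam > 0.
r, lam = 1.3, 2.0
print(soft_core(lam * r) / soft_core(r))  # ratio equals lam**-12 up to round-off
print(lam ** -12)
```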