ARTIFICIAL INTELLIGENCE IN MATERIAL ENGINEERING
A PREPRINT
Lipichanda Goswami1, Manoj Kumar Deka1, and Mohendra Roy2,*
1Department of Computer Science and Technology, Bodoland University, BTC, Assam, India
2Department of Information and Communication Technology, Pandit Deendayal Energy University, Gandhinagar
382007, India
*Corresponding author: mohendra.roy@ieee.org
April 28, 2023
Keywords
Artificial Intelligence, Density Functional Theory, Material Engineering, Deep Learning, Graph Neural Network
Note: This article has been available in the journal Advanced Engineering Materials since 11 April 2023.
Abstract:
The role of artificial intelligence (AI) in material science and engineering (MSE) is
becoming increasingly important as AI technology advances. The development of high-performance
computing has made it possible to test deep learning (DL) models with significant parameters,
providing an opportunity to overcome the limitation of traditional computational methods, such as
density functional theory (DFT), in property prediction. Machine Learning (ML) based methods are
faster and more accurate than DFT-based methods. Furthermore, generative adversarial networks (GANs) have facilitated the generation of chemical compositions of inorganic materials without using crystal structure information. These developments have significantly impacted material engineering (ME) and research. Some of the latest developments in AI in ME are reviewed herein.
First, the development of AI in the critical areas of ME, such as in material processing, the study of
structure and material property, and measuring the performance of materials in various aspects, is
discussed. Then, the significant methods of AI and their uses in MSE, such as graph neural networks, generative models, transfer learning, etc., are discussed. The use of AI to analyze the results from
existing analytical instruments is also discussed. Finally, AI’s advantages, disadvantages, and future
in ME are discussed.
1 Introduction:
Material Science and Engineering (MSE) is mainly concerned with four characteristics of a material. These are
processing, structure, property, and performance. The key to material engineering lies in the interrelation of these
four characteristics. In short, the combinations of processing, structure, property, and performance are key in material engineering.[1] Here the structure represents the atomic arrangements of the material. Performance defines how well
the material plays its role in a particular task. Properties like hardness/softness, the density of the particles, fracture
toughness, resistivity, and thermal expansion are determined by the structure. These properties can be engineered by
adopting appropriate processing methods. Here, processing is the series of steps involved in converting a material
to some useful form by tweaking the properties of the material. Engineered materials such as metals, polymers, liquid crystals, and composites are widely used in fields such as medicine, energy, manufacturing, and biotechnology. Therefore, MSE is an emerging area applicable to a variety of materials across multiple disciplines like medical science, biotechnology, nanotechnology, drug discovery, energy storage materials, etc.
Some of the traditional techniques used in MSE are: i) density functional theory (DFT), a simulation method that uses quantum mechanical laws to find the electronic properties of atoms, molecules, and solids [2]; ii) density functional perturbation theory (DFPT), where the quantum system is studied under small perturbations, mostly
used for calculating the vibrational energies of phonons, which can further be used to find physical properties [3]; and iii) classical force-field inspired descriptors (CFID), which represent the chemistry-structure-charge data of a material. However, these traditional methods take a considerable amount of time for processing and analysis. Also, they cannot be applied to all types of structures.[4]
Through years of study and experimentation with the conventional methods of property prediction, such as the empirical trial-and-error method and density functional theory (DFT), researchers have collected huge amounts of data in the field of MSE. These big data may help in designing a data-driven approach to MSE. In this regard, artificial intelligence (AI) can play a major role. AI is an area of computer science that leads a system to learn from data and improve its performance in every subsequent iteration. The learning process starts with the observation of data to find the meaningful features needed to attain the set objectives. In the last few years, with increasing experimental and simulation-based datasets, AI and machine learning (ML) have been widely used to gain deeper insight into materials.
1.1 A brief discussion of the existing literature on AI in material engineering:
A handful of reviews have already been written on applications of AI in material engineering. For example, Kamal Choudhary et al. have discussed the available ML techniques and their libraries [5], Chi Chen et al. have discussed the ML methods that are specifically used for energy materials [6], Jonathan Schmidt et al. have discussed the ML algorithms for crystal structure prediction [7], and Valentin Stanev et al. have discussed the AI models that are used in quantum materials [8]. Daniel P. Tabor and co-authors provided a fruitful discussion on discoveries in the clean energy sector along with the state-of-the-art procedures for organic, inorganic, and nanomaterials [9]. Table 4 of the supplementary document contains the details of models that are built particularly in the fields of organic, inorganic, energy storage, drug and pharmaceutical, and biomaterials. In the same work, high-throughput virtual screening, genetic algorithms in the synthesis of catalysts, and ML algorithms in perovskite synthesis for photovoltaics are discussed. The challenges that remain in the automatic synthesis of inorganic and organic molecules, the autonomous laboratories fueled by AI models for chemical synthesis, the techniques used for automatic and rapid characterization of materials, and the use of autonomous robots in laboratories for speeding up experimentation are also covered alongside state-of-the-art works. The use of generative and discriminative neural networks in photonic devices, the types of datasets particularly used for electromagnetics, and some of the dimensionality reduction techniques are elaborately discussed by Jiaqi Jiang et al. [10]. Mohit Pandey et al. have discussed how Graphics Processing Units (GPUs), deep generative networks, and transfer learning models have significantly accelerated the field of drug discovery [11]. GPU-based systems can greatly reduce the computational cost, in comparison to CPUs, of molecular dynamics simulations. GPU-specific quantum chemistry codes such as TeraChem have been developed for simulating entire protein structures using DFT.
1.2 AI in structural, elemental, electronic, thermal, dielectric, and mechanical property prediction:
Extending the existing reviews of AI in MSE, we focus on AI and ML methods that have been developed for various material types, like organic, inorganic, energy-storage, bio, and pharmaceutical materials, to predict stoichiometric, electronic, elemental, ionic, optical, structural, and thermal properties. Recently, scanning tunneling microscope (STM) images have been used in convolutional neural networks (JARVIS-STMNet) to classify the structure of Bravais lattices. This classification is used for phase identification, information extraction from poor-resolution images, etc. [12]. ML models designed using the gradient-boosted trees (GBT) algorithm have been found to be superior to many other classification models in predicting the topological structure of materials [13]. The famous random forest algorithm is used to find the critical temperature (Tc) value of superconductivity.[14] Again, a graph neural network named the Atomistic Line Graph Neural Network (ALIGNN) is used by Kamal Choudhary and his team to predict structural and electronic properties and also quantities like the adsorption isotherm of CO2 at various pressures.[15] A
schematic of the ALIGNN is shown in Figure 1. The same model is used to learn the density of states (DOS) spectra in two different representations, a discretized representation and a low-dimensional representation of the crystalline materials, by training two models, AE-ALIGNN and D-ALIGNN, separately.[16] The derived DOS is helpful in gaining a deeper insight into the electronic properties of the materials and their relationship with the ingredient species. For electronic property prediction, a recent deep learning (DL) model, ElemNet, is used by leveraging the concept of transfer learning (see the transfer learning architecture in Figure 6). The stability of a compound (via the predicted formation energy) could be successfully determined for both a DFT-computed dataset and a smaller experimental dataset. The model is also applicable to predicting thermal, mechanical, and magnetic properties, which are expensive to measure experimentally.[17] In a modified version of the ElemNet model, the concept of cross-property deep transfer learning is incorporated for predicting electronic properties.[18] The Localized Gaussian Process Regression (L-GPR) model, which uses a smaller dataset, is applied to screen materials based on formation energy.[19] For the
same task, a crystal graph convolutional neural network (CGCNN) with a slight modification in the convolutional layer has shown improved accuracy compared to DFT-based methods. The modification is done by ignoring the difference in interaction
strength between neighboring nodes in the convolutional layer of the network. The same framework can also predict electrical and physical properties with improved accuracy.[20] Again, embedding the concept of multi-task learning in the CGCNN model, the relative stability of different materials based on their formation energy, and classification tasks like metal/non-metal based on the acquired bandgap, are carried out by Soumya Sanyal et al. [21]. Another recent model, developed by Mohammadreza Karamad et al. [22], is the Orbital Graph Convolutional Neural Network (OGCNN), which uses the orbital field matrix (OFM) descriptor, a data representation method adapted from the one-hot vector concept of natural language processing. Here, the representation of an atom is embedded with the orbital-orbital interactions of atoms and the long-range interactions in the local structure. The incorporation of the OFM descriptor in the GCNN has improved prediction accuracy for electronic properties like bandgap, Fermi energy, and formation energy as compared to state-of-the-art GCNN models. A detailed description of these models is given in Table 2 of the supplementary materials.
In [23], the existing SchNet model is extended by including an edge update network, by which the hidden state of the receiving atom governs the information interchange among the atoms. This extension improved the accuracy of formation energy prediction. The Atom2Vec model is designed for predicting the formation energy of elpasolite crystals, which are used for radiation detection.[24] The CGCNN and Materials Graph Neural Network (MEGNet) are used in predicting magnetic moments and formation energies. Though the models performed well in predicting formation energy, they showed poor performance on the magnetization data.[25] For bandgap prediction of a class of complex oxides, the double perovskite structure, a statistical learning model, kernel ridge regression (KRR), is cross-validated against a linear least squares fit (LLSF) model. These models have given more profound insights into the double perovskite structure.[26] But the KRR model is limited to a modest chemical space and to the nonmagnetic perovskites AA'BB'O6. DARWIN (Deep Adaptive Regressive Weighted Intelligent Network), designed in the work of [27], uses a graph convolutional neural network (GCNN) with some edge attributes for learning from a smaller dataset.
The model has been successfully used to predict the bandgap of UV materials and the energy used to estimate structural stability. The DeeperGATGNN is designed using a global attention graph neural network with the inclusion of residual skip connections and differentiable group normalization to allow the network to go deeper. The model showed outstanding performance in bandgap prediction when the number of hidden layers was increased above 20, whereas other state-of-the-art models tend to crash with more hidden layers. It is also free from overfitting.[28] In addition to property prediction, uncertainty evaluation of ML models, in order to determine the trustworthiness of computed data, is carried out in [29]. Here, three different approaches are used. The first is the GBDT method with a quantile loss function, defined as in equation (1).
$L(x_i^p, x_i) = \max[\,q\,(x_i - x_i^p),\; (q-1)\,(x_i - x_i^p)\,]$   (1)

where $q$ is the quantile, i.e., the value below which a given fraction of the observations fall, $x_i^p$ is the prediction, and $x_i$ is the observed outcome. In the second approach, GBDT is used as a base model for property prediction, and a Gaussian process (GP) is used as an error model for finding the prediction intervals. In the third approach, a GP is used for determining the uncertainty of the trained model. These models are used for electrical, energetic, mechanical, and optical properties, and the obtained results are compared with each other. A minimal sketch of the quantile-loss approach is given below.
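As a hedged illustration of the first approach (synthetic stand-in data, not the dataset or pipeline of [29]), a quantile-loss GBDT in scikit-learn can produce a prediction interval as follows:

```python
# Minimal sketch: gradient-boosted trees with quantile loss for
# prediction intervals (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))           # stand-in descriptor
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 500)   # stand-in property

models = {}
for q in (0.05, 0.5, 0.95):  # lower bound, median, upper bound
    m = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200)
    models[q] = m.fit(X, y)

X_new = np.array([[2.5]])
lo, med, hi = (models[q].predict(X_new)[0] for q in (0.05, 0.5, 0.95))
print(f"prediction {med:.2f}, 90% interval [{lo:.2f}, {hi:.2f}]")
```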
The recent progress of ML models has paved the way for solving a basic problem of quantum mechanics, the Schrödinger equation (SE), defined as

$H\psi = E\psi$   (2)
The kernel ridge regression (KRR) model is used for this task; it can establish a non-linear relationship between the atomization energy of a molecule and its characteristics, thereby solving the molecular SE by learning. The trained model is successful in finding new molecular systems with different geometries and compositions.[30] A minimal sketch of such a KRR property regression is shown below.
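As a hedged illustration (the random stand-in features below merely take the place of the molecular descriptors, e.g., Coulomb matrices, used in [30]), a minimal KRR regression in scikit-learn looks as follows:

```python
# Minimal sketch: kernel ridge regression mapping a molecular
# descriptor vector to atomization energy (synthetic stand-in data).
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))      # stand-in molecular descriptors
y = X[:, 0] ** 2 + 0.5 * X[:, 1]    # stand-in atomization energies

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
krr = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.05).fit(X_tr, y_tr)
print("test MAE:", np.abs(krr.predict(X_te) - y_te).mean())
```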
The descriptor concept of 'bag of words', used in natural language processing to encode the frequency of a particular word in a text, is mimicked in the work of Hansen et al. [31] by encoding the interatomic distances in the Bag of Bonds (BoB) descriptor over chemical compound space. The model performed well in the atomization energy and total energy prediction of molecules. The ALIGNN model in [32] is modified by incorporating the angular information of the atoms in the line graph to better capture the atomic structure and is applied to various electronic and molecular properties of a DFT-computed dataset.[32]
Figure 1: Two of the most widely used graph neural networks in the literature. a) Schematic of the CGCNN model: The crystal structure is converted to a crystal graph by
taking atoms as nodes and atomic bonds as edges from the unit cell. The nodes of the graph go through R convolution layers and L1 hidden layers to produce a resulting
graph that considers the local environment of each of the atoms. After the pooling layer, a vector of the whole graph is produced and sent to L2 hidden layer for further
processing. The L2 layer then produces the predicted property as output.
b) Schematic of the ALIGNN model. The graph on the left is the bond graph of a crystal structure. The nodes of the graph are analogous to the atoms, and the edges are
analogous to interatomic bonds. From this graph, another graph, L(g) (right), known as the line graph, is derived by considering the edges of the bond graph as nodes
and the interatomic bond pair or the triplet of atoms as edges. Message passing is performed between the bond graph and the line graph in the convolution layer.
Molecular properties are also predicted in [33] by the Hierarchically Interacting Particle Neural Network (HIP-NN), which uses linear regression for modeling the local hierarchical energies and the Adaptive Moment Estimation (Adam) algorithm for training the model.
For the prediction of dielectric and mechanical properties like the Born effective charge (BEC) tensor, the piezoelectric (PZ) tensor, and IR frequency, a research group in [34] has used structural descriptors, such as classical force-field inspired descriptors (CFID), and ML models based on gradient-boosted decision trees (GBDT) for both classification and regression tasks. The regression model yields good accuracy in finding the highest infrared frequency and maximum BEC, whereas the classification models are used for classifying high-PZ and dielectric materials. CFID and GBDT are also used to speed up the screening process in predicting the magnetic properties of materials.[35] A recently developed model, the deeper graph attention neural network (DeeperGATGNN), which uses a ResNet structure and differentiable group normalization in the graph attention layers, can learn the relationship between crystal structures and their vibrational energy. The accuracy of the model was suitable for rhombohedral crystals, but it reported a high MAE for cubic structures while training on mixed samples, revealing the low structural transferability of the model.[36]
1.3 AI in energy material engineering:
A handful of works on energy materials have also been carried out in recent years. Linear regression, the reduced error pruning (REP) tree, rotation forest + REP tree, and random subspace + REP tree have been explored for predicting the band gaps of solar cells and metallic glass alloys, giving remarkable accuracy. In this work, a diverse set of descriptors is generated by creating stoichiometric attributes, electronic structure attributes, elemental property statistics, and ionic compound attributes.[37] The gradient boosting decision tree (GBDT) is also used as a binary classification model to find promising solar absorber materials by classifying the data based on the spectroscopic limited maximum efficiency (SLME) for quick pre-screening of the materials.[38] For screening the high-pressure alloys that are used for hydrogen storage, ML models like REPTree, RFR, and neural networks are used. These models are trained to predict thermodynamic properties, such as hydride enthalpies and the entropy of hydrogenation of the alloys.[39] In the same work, low-energy
Ti–Mn–Fe structures are detected by predicting the structure and phase with the help of a genetic algorithm. However,
while comparing these structures with DFT and CALPHAD studies, contradictory results are found. In [40], a dataset for Li-Si battery materials is designed by finding the random structure relaxations of the material using DFT. The datasets used in the state-of-the-art models are described in Table 3 of the supplementary materials. For measuring the quantity of dendrite growth in the initiation phase of a Li metal anode, ML frameworks have been used recently.[41] In this task, different ML models are used in different phases. In the isotropic material screening phase, where the stability of electrodeposition is determined for solid electrolytes, the shear and bulk moduli are calculated using the crystal graph convolutional neural network (CGCNN); in the second phase, anisotropic material screening, AdaBoost, Lasso, and Bayesian ridge regression are used for the prediction of the C11, C12, and C44 elastic constants.
The Open Catalyst 2020 (OC20), a DFT-based catalyst dataset containing surfaces and adsorbates for renewable energy storage, is designed by L. Chanussot et al. [42]. The dataset is used to train deep learning models such as CGCNN, SchNet, and DimeNet++ for random structure relaxation, structure-to-energy-and-forces (S2EF) prediction, and relaxed-state energy prediction from an initial structure. In the field of renewable energy materials, researchers have worked to develop suitable ML models for alkaline-ion battery components like electrodes and electrolytes, photovoltaic materials, catalytic materials, etc.[43] A unified statistical model for potential energy estimation named PreFerred Potential (PFP) is designed to handle any combination of 45 elements of the periodic table and is applied to a variety of tasks, such as calculating the activation energy of battery materials without including the material in the training dataset, optimizing the crystal structure of MOFs, and finding the transition temperature of Cu-Au alloys.[44]
In thermoelectric property prediction, Laugier et al. [45] tried the CGCNN model but were not successful in bypassing DFT. In this work, a fully connected neural network (FCNN) is trained on thermoelectric power factors obtained from DFT + Boltzmann transport equation (BTE) calculations; it was found that, although the convergence time is longer for the FCNN, it showed good prediction accuracy. A modified version of CGCNN, where the local environment of each crystal is represented by vectors that depend on the composition and structure and are independent of human-designed features, is used for perovskites, elemental boron, and inorganic crystals. This model is capable of predicting the electronegativity, group number, radius, and element blocks of perovskites with remarkable accuracy. Also, complex boron configurations are successfully explored by the model. But in the local energy prediction of inorganic materials, it showed poor performance.[46] The
recent progress in ML models has shown great potential for predicting multiple properties with a single model. The Materials Graph Network (MEGNet), an upgraded version of the graph neural network, is used for predicting the internal energy at multiple temperatures, enthalpy, entropy, and Gibbs free energy with temperature within the same model.[47] The model also showed good accuracy in electrical property prediction. An important strategy included here is the unification of multiple free energy MEGNet models into a single one by integrating global state variables such as pressure, entropy, and temperature. Because of this advancement, the model is capable of predicting many important properties, like the internal energy at multiple temperatures, enthalpy, entropy, and Gibbs free energy with temperature, and can eliminate the need to design a separate model for each of the properties. Another multi-task learning model, HydraGNN, developed by M. L. Pasini et al. [48], is capable of predicting the atomic magnetic moment, mixing enthalpy, and atomic charge transfer at the same time.
The model architecture consists of two sets of layers: the first set learns the standard features, and the second set learns the features specific to each material property. In the convolutional layers of HydraGNN, a variation of the graph convolutional neural network (GCNN) known as Principal Neighbourhood Aggregation (PNA) is included, which makes
it easier for the model to distinguish two different graphs. ML is fueling the growth of the polymer and nanoparticle industries as well. In polymer laboratories, AI-powered autonomous robots have been developed that are capable of performing structure-function testing.[49] In nanoparticle synthesis, the stable noisy optimization by branch and fit (SNOBFIT) algorithm and microfluidic and robotic systems integrated with ML techniques have gained importance, whereas neural networks are used for carbon nanoparticles.[50] In [51], a CNN with a U-Net architecture is used for finding the regions of interest by performing segmentation on high-resolution transmission electron microscopy (HRTEM) images. Here, defect detection for individual nanoparticle regions is performed by a random forest (see Figure 2 for the random forest architecture).
Other significant works in numerous fields include: a feedforward neural network that is used on X-ray diffraction (XRD) patterns to solve the space group determination problem;[52] teacher-student deep neural networks (TSDNN) for finding new stable materials with negative formation energy and for synthesizability prediction, using both labelled and unlabelled data, with better accuracy than prior models;[53] the optimization of semiconductor quantum dot devices using ML algorithms;[54] supervised ML models for the interlayer energy and elastic constant prediction of heterostructures essential for solid and super-lubricant materials;[55] the Deep Potential Generator (DP-GEN) framework for the statistical, mechanical, and dynamical property prediction of Al, Mg, and Al-Mg alloys and for the modelling of the potential energy surface (PES) using molecular dynamics (MD) simulation without the inclusion of structural information;[56] 3-D chemical structure prediction using a variational autoencoder and a 3-D U-Net segmentation network with an attention mechanism;[57] and the prediction of the compounds having the highest melting temperatures using ordinary least squares regression (OLSR), partial least squares regression (PLSR), support vector regression (SVR), and Gaussian process regression (GPR) models by preparing separate predictor sets, where the first set contains physical properties and the second contains both physical and elemental properties.[58]
Figure 2: Two frequently used machine learning models. a) Schematic of the gradient-boosted decision tree regression model: From a given set of data, a base model
finds out the initial prediction and sets the first residuals. The residual is sent to a weak model known as a residual model. Each of the residual models updates its
prediction by the new residuals, and finally, a strong model, which is an ensemble of all the weak models, gives the final prediction value.
b) Schematic of a random forest regression (RFR) model. From a set of input variables, random samples are fed to a set of decision trees. The outputs generated by the trees are averaged and taken as the final result.
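As a minimal, hedged sketch of the random forest regression of Figure 2b (scikit-learn, synthetic stand-in data rather than a materials dataset):

```python
# Minimal sketch of random forest regression: each tree is trained on
# a bootstrap sample, and the forest averages the trees' predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(400, 8))          # stand-in composition features
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 0.1, 400)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(rf.predict(X[:3]))                # averaged over the 100 trees
```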
1.4 AI in predicting molecular properties:
In molecular property prediction, frameworks like MoleculeNet are designed to predict four categories of molecular properties: quantum mechanics, physical chemistry, biophysics, and physiology, by preparing separate datasets for each of these categories and creating a separate metric and splitting pattern for each of these datasets.[59] Graph convolutional networks (GCN) and graph isomorphism networks (GIN) with self-supervised learning mechanisms, used to improve classification and regression tasks, underpin the framework Molecular Contrastive Learning of Representations (MolCLR) for molecular property prediction.[60] The deep tensor neural network (DTNN)
designed for quantum chemical property prediction is scalable according to the number of atoms in a molecule. It
is capable of predicting data beyond the training set. The DTNN is successfully used in isomer energy prediction,
in the classification of molecules based on carbon ring stability, and in some peculiar electronic structure prediction
giving uniform accuracy for intermediate-size molecules.[61] The KV-PLM model is capable of learning the correlation between biomedical text and molecular structures and hence can assist drug discovery.[62] The SchNet model, designed to search chemical space and energy surfaces, was successfully used for predicting the quantum mechanical properties of C20-fullerene, which was not possible with the simulation method.[63] For predicting physicochemical properties from molecular structures, a multiplex graph neural network named the Multiplex Molecular Graph Neural Network (MMGNN) is designed and found to be efficient.[64]
Beyond the concept of property prediction for existing materials, an advanced concept in material science is the inverse design framework, wherein new materials are discovered for a given target property. This has been manifested by a deep variational autoencoder neural network with a supervised mechanism called active learning, together with a generative adversarial deep neural network, which have discovered eleven different semiconductor materials and two materials with high bandgaps [65]. In [66], a high-speed inverse design framework named Fourier-Transformed Crystal Properties (FTCP) is designed for inorganic crystals; it can predict the structure and chemistry of a material given some targeted property. The framework is then used for targeted formation energies, bandgaps, and thermoelectric power factors. Conditional GAN (CondGAN) and conditional VAE (CondVAE) models are used to implement the inverse design concept, where inorganic compositions are generated from the target property without including the crystal information.[67] Various machine learning models are also used for engineering new concrete formulas with desirable properties.[68] Here, realistic samples are generated from the conditional distribution p(y|x), so that the properties of the generated data can be controlled by changing the value of x, where x contains the information about strength, age, and environmental impact, and y represents the number of constituent materials. The reduced environmental effect and the strength of the formulas are then verified using a regression model by estimating the similarity between the generated properties and the desired properties. Some of the tools and frameworks used in implementing DL and ML algorithms in state-of-the-art works are described in Table 1.
Table 1: Tools and Frameworks

- Scikit-learn: An open-source, easy-to-use, and efficient tool for predictive data analysis. https://scikit-learn.org/stable/
- TensorFlow: A free and open-source library specifically designed for DL models. https://www.tensorflow.org/
- Theano: A Python library for DL that can perform fast numerical computations involving multi-dimensional arrays. https://pypi.org/project/Theano/
- Caffe: A deep learning framework with an expressive architecture that allows users to switch between CPU and GPU. https://caffe.berkeleyvision.org/installation.html
- MXNet: A fast and scalable DL framework that allows fast model training and supports multiple programming languages. https://mxnet.apache.org/versions/1.9.0/
- Keras: An open-source library that acts as an interface for the TensorFlow library. https://keras.io/
- PyTorch: A Python package that provides two features: tensor computation with strong GPU acceleration, and deep neural networks built on an automatic differentiation system. https://pytorch.org/
- CNTK: A unified deep learning framework that uses a directed graph of computational steps to describe a neural network. https://docs.microsoft.com/en-us/cognitive-toolkit/
- PyCaret: A machine learning framework for the automation of machine learning workflows. https://pycaret.org/
- DeepChem: A deep learning framework for drug discovery, quantum chemistry, materials science, and biology. https://deepchem.io/
- Deep Docking: A deep learning framework for molecular docking. https://github.com/zhenglz/dockingML
- MolPAL: An active learning framework for high-throughput virtual screening. https://github.com/coleygroup/molpal
- Hugging Face: A machine learning library that provides base models to build on top of TensorFlow and PyTorch. https://huggingface.co/
- GraphINVENT: A platform to generate graph-based molecules using GNNs. https://github.com/MolecularAI/GraphINVENT
- AtomAI: A deep learning framework for microscopy data. https://atom.io/packages/ide-python
- Veles: A distributed DL framework. https://github.com/Samsung/veles
2 Important methods:
The recent surge in artificial intelligence can be attributed to the availability of datasets and of computing power such as graphics processing units, tensor processing units, and other hardware accelerators. These have allowed for the development of intricate and deep architectures, which enable exhaustive investigations in fields such as material engineering. In this part, we will go over some of the AI methods used in material engineering. To begin, we will review the datasets that are available for AI-based material engineering.
Datasets are the fuel of any AI model. For AI-based material engineering, we need datasets with the desired properties and with a sufficient number of samples. Recently, the National Institute of Standards and Technology (NIST) has produced several datasets for material engineering. The JARVIS-DFT is one such dataset.[69] It contains the DFT-based material properties of around 40,000 bulk and 1,000 two-dimensional materials, including formation energies, bandgaps, elastic, piezoelectric, and dielectric constants, magnetic moments, exfoliation energies for van der Waals bonded materials, improved meta-generalized gradient approximation (meta-GGA) bandgaps, frequency-dependent dielectric functions, spin-orbit spillage, spectroscopy limited maximum efficiency (SLME), infrared (IR) intensities, electric field gradients (EFG), heterojunction classifications, and Wannier tight-binding Hamiltonians.[69] The next dataset is the Open Quantum Materials Database (OQMD), developed and maintained by the Wolverton Research Group at Northwestern University. It contains the thermodynamic and structural properties of 1,022,603 materials.[70] A detailed list of all the available datasets for AI-based material engineering is provided in Table 3 of the supplementary document.
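As a hedged sketch, the JARVIS-DFT data can be fetched with the jarvis-tools package; the dataset key "dft_3d" reflects its documented usage, while the property field queried below is an assumption that may vary between releases:

```python
# Minimal sketch: downloading the JARVIS-DFT bulk dataset with
# jarvis-tools (pip install jarvis-tools). The field name
# "formation_energy_peratom" is an assumption and may differ by release.
from jarvis.db.figshare import data

dft_3d = data("dft_3d")          # list of dicts, one per material
entry = dft_3d[0]
print(entry["jid"], entry.get("formation_energy_peratom"))
```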
2.2 Neural network:
Neural networks are modeled after the biological brain. A neural network consists of three types of layers. These are
the input, hidden, and output layers. The input layer is the data feeding layer, the output layer is the final prediction
layer and in the hidden layer(s) the feature extraction and computations are executed. Each layer in the network consists
of a set of artificial neurons, which are connected to the neurons in the adjacent layers by a set of weights. The weights
determine the strength of the connection between two neurons and are the parameters that the network learns during
training. The basic operation of a neuron is to take a weighted sum of its inputs, apply an activation function to the
result, and pass the output to the neurons in the next layer. The activation function is typically a non-linear function
that introduces non-linearity into the network, allowing it to learn complex relationships in the data. A typical neural network is shown in Figure 3. Mathematically, the forward pass can be written as $H_i = \sigma(X_i W_i + b_i)$, where $X_i$ is the input, $W_i$ is the weight matrix, $b_i$ is the bias, and $\sigma$ is the activation function, such as the sigmoid $\sigma(x) = 1/(1 + e^{-x})$.
The learning process happens by adjusting the weights and biases to achieve the desired output at the output layer. This model can be further deepened by introducing more hidden layers; the deeper the network, the more complex the phenomena that can be learned. This type of deep neural network can learn complex features, such as the properties of materials. A minimal sketch of this forward pass is given below.
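The following is a minimal NumPy sketch of the forward pass described above (illustrative shapes only, not a trained materials model):

```python
# Minimal sketch of the forward pass H = sigma(XW + b) for one hidden
# layer and a linear output layer.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 10))     # 4 samples, 10 input features
W1 = rng.normal(size=(10, 16))   # input -> hidden weights
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1))    # hidden -> output weights
b2 = np.zeros(1)

H = sigmoid(X @ W1 + b1)         # hidden activations
y_hat = H @ W2 + b2              # predicted property (e.g., an energy)
print(y_hat.shape)               # (4, 1)
```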
There are a few other types of neural networks that are extensively used in material engineering. These include convolutional neural networks (CNNs), graph neural networks (GNNs), generative models, etc.
Figure 3: Schematics of neural networks. (a) Schematic of a fully connected neural network; (b) a convolutional neural network.

A convolutional neural network (CNN) is based on the mathematical operation of convolution, which is a way of combining two functions to produce a third function that represents how one of the original functions is modified by the other.[71] In the case of a CNN, the input data (e.g., an image) is convolved with a set of learnable filters to produce a set of feature maps. The input data is typically a 3D tensor with dimensions (width, height, channels), where channels represents the number of color channels (e.g., 3 for RGB images). The filters, or kernels, are also 3D tensors with dimensions (kernel width, kernel height, input channels), where the number of input channels matches that of the input data. To apply a convolutional layer in a neural network, slide the kernel over the
input data, computing the dot product between the kernel and the local region of the input data at each position. This operation produces a single value, which is used to populate the corresponding position in the output feature map. The convolution operation can be expressed compactly as G = F * K, where F is a matrix representation of the input data, K is a matrix representation of the kernel, * denotes the convolution, and G is a matrix representation of the output feature map.[71] Zhuo Cao et al. have demonstrated the use of CNNs in predicting the properties of crystalline materials.[72] However, the performance of a CNN generally suffers if the topology is arbitrary or the orientation of the object changes. These types of limitations can be avoided by using graph neural networks.[73] A minimal sketch of the sliding-window operation is given below.
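The following is a minimal NumPy sketch of this sliding-window operation on a single-channel input (strictly speaking, the cross-correlation variant that deep learning libraries compute; no padding or stride):

```python
# Minimal sketch of G = F * K: slide the kernel K over the input F and
# take the dot product with each local region.
import numpy as np

def conv2d(F, K):
    kh, kw = K.shape
    oh, ow = F.shape[0] - kh + 1, F.shape[1] - kw + 1
    G = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            G[i, j] = np.sum(F[i:i + kh, j:j + kw] * K)
    return G

F = np.arange(25, dtype=float).reshape(5, 5)   # toy single-channel input
K = np.array([[1., 0.], [0., -1.]])            # toy learnable filter
print(conv2d(F, K).shape)                      # (4, 4) feature map
```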
2.3 Graph neural networks for efficient material engineering:
Graph Neural Network (GNN) is an artificial neural network that processes and analyzes data structured in graph form.
They have grown in popularity due to their capacity to deal with complex associations and dependencies between
entities in a graph. Graphs consist of nodes (also known as vertices) and edges that link the nodes. Each node and edge
can have connected characteristics, representing properties such as node or edge types, numerical values, or categorical
labels. GNNs manipulate a graph by iteratively gathering information from neighboring nodes and edges, using a
neural network to change and update the node representations. This process can be repeated over multiple layers, each
learning more intricate patterns and dependencies. GNNs have produced positive results in many areas, for instance,
recommendation systems, drug discovery, social network analysis, and traffic prediction. They can manage different
graphs, such as directed and undirected, bipartite, and heterogeneous graphs. A convenient way to represent molecules is in the form of a graph, where atoms are used to represent the featured nodes, and the interatomic bonds (with bond order) are used to represent the edges. Node features include properties like atomic identity, formal charge, and aromaticity.[74] These are used in a molecular fingerprint that indicates the presence or absence of a substructure in a molecule. A typical graph neural network is shown in Figure 4. A minimal sketch of such a molecular graph representation is given below.
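As a minimal, hedged sketch, a molecule such as water can be encoded as an adjacency matrix plus a node-feature matrix (real models use much richer node and edge features):

```python
# Minimal sketch: representing a molecule (water, H2O) as a graph with
# an adjacency matrix and simple node features (atomic numbers).
import numpy as np

atoms = ["O", "H", "H"]
atomic_number = {"O": 8, "H": 1}

# Adjacency: O bonded to both H atoms; symmetric, no self-loops.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# One feature per node; practical models add charge, aromaticity, etc.
X = np.array([[atomic_number[a]] for a in atoms], dtype=float)
print(A.shape, X.shape)   # (3, 3) (3, 1)
```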
In GNN-based models, the interactions between atoms are learned by embedding the atoms in a high-dimensional space and updating the embeddings by performing message passing. In recent years, many groundbreaking works have been published on GNNs.[32, 15, 49, 25, 46, 28] In [20], a crystal graph is used for property prediction. A similar architecture was applied to predict thermoelectric properties on a dataset of crystal and atomic information.[32] The atomistic line graph neural network (ALIGNN) has achieved up to 85% accuracy in solid and molecular property prediction.[15] In [32], the angular information is included explicitly by introducing a line graph. The line graph contains bond distance
and bond angle information. The model, designed with the inclusion of this structural information, is trained on crystalline material properties and on molecular properties and gives good accuracy. This model outperformed conventional descriptors such as CFID. Despite the great potential of GNNs, most state-of-the-art GNN models suffer from the issue of over-smoothing, where, with increasing depth, the model tends to make the embeddings of all the nodes similar, which makes it challenging to classify unlabeled nodes and thus makes the GNN unscalable. To address this issue, an architecture called the Deeper Graph Attention Neural Network (DeeperGATGNN) was introduced.[75] Before DeeperGATGNN, a simpler version named GATGNN was implemented. It contains two different soft attention layers; the first layer extracts the features at the local level of neighboring atoms, and the subsequent attention layer extends this neighborhood-dependent information to the global context. This architecture is then extended by introducing additive skip connections between these soft attention layers and differentiable group normalization (DGN) layers.
Figure 4: Schematic of a graph neural network (GNN). A GNN is a type of deep learning network used for node-level or edge-level predictions on graph data. The figure shows the mechanism of a GNN and its convolution process, which produces node embeddings that pass through the activation function to yield the output class.
The DGN forms different clusters of the nodes of a graph, and each cluster is normalized separately, giving different means and standard deviations for different clusters and thus keeping the representation vectors dissimilar. Another advancement is the invocation of residual skip connections, where the stacked layers learn a residual mapping, denoted as $F(x) = H(x) - x$, where $x$ is the input, $H(x)$ is the desired mapping, and $F(x)$ is the residual function the layers learn. This type of GNN is claimed to be more accurate in predicting material properties. A minimal sketch of one message-passing layer, with a note on where such a residual connection would enter, is given below.
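The following is a minimal NumPy sketch of one message-passing (graph convolution) layer: each node aggregates neighbor features through a normalized adjacency matrix, then applies a learned linear map and nonlinearity. This is a generic GCN-style layer under stated assumptions, not the actual DeeperGATGNN attention layer:

```python
# Minimal sketch of one graph-convolution (message-passing) layer.
import numpy as np

def gcn_layer(A, X, W):
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(0, A_norm @ X @ W)      # aggregate, transform, ReLU

rng = np.random.default_rng(4)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))                   # 3 nodes, 4 features each
W = rng.normal(size=(4, 8))                   # learned weights
H = gcn_layer(A, X, W)                        # updated node embeddings
# A residual skip connection, as in DeeperGATGNN, would add the layer
# input back to H (when the dimensions match): H = H + X.
print(H.shape)                                # (3, 8)
```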
2.4 Generative models:
"One of the continuing scandals of physical science is that it remains, in general, impossible to predict the structure of
even the simplest crystalline solids from a knowledge of their chemical composition." as quoted by John Maddox about
30 years ago.[
76
] Over the years, though different methods are evolved for crystal structure prediction of molecules
and solids, they are computationally expensive, and vast structure space is needed to be searched. As DL algorithms
are introduced in this field, Generative Networks(GN), a class of DL algorithms, can tackle this problem to a great
extent. GN has the capability of producing samples from a given distribution. The functionality is given as input, and
the model outputs a distribution of possible structures. This network works on joint probability distribution p(x, y),
which means they can notice both the molecular representation of a material denoted as x and its property as y. If the
conditional probability is applied, then the notation for design will be
(p(y|x))
, and by reversing the notation, we get
((p(x|y))
, which represents the inverse design of the material.[
73
] The two most commonly used generative networks
are: Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) (see figure 5).
The GAN is extended by adding an extra condition to both the generator and the discriminator by feeding an input layer Y to both networks. This extension is the conditional GAN (CGAN).[77] In an attempt to generate the chemical compositions of inorganic materials without the crystal structure information, the concept of the CGAN is used by Sawada et al. [67]. In this model, the popular feature engineering scheme Bag-of-Atoms is used with the additional inclusion of physical descriptors. In Bag-of-Atoms, the number of atoms in a chemical composition is represented in vector form. The physical descriptors are fundamental properties of atoms that do not contain crystal information. A CGAN is also used in [68] for generating new concrete formulas for building materials. In the conditional probability p(x|y), x contains the information about strength, age, and environmental impact, and y represents the number of constituent materials. A minimal sketch of a GAN training step is given below.
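As a hedged illustration (toy vectors, not the CGAN of [67] or [68]), one GAN training step in PyTorch looks as follows; a conditional GAN would additionally concatenate the condition vector to the inputs of both networks:

```python
# Minimal sketch of one GAN training step in PyTorch.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))  # generator
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 8)          # stand-in real samples
z = torch.randn(64, 16)            # random input vectors for the generator

# Discriminator step: push real samples toward label 1, fakes toward 0.
d_loss = bce(D(real), torch.ones(64, 1)) + \
         bce(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 for fakes.
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```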
2.5 Leveraging transfer learning for improved accuracy:
Needless to say, DL models are data-hungry and can give plausible accuracy only after training with millions of data points. However, experimental datasets are normally smaller than computational datasets. The concept of deep transfer learning is therefore embedded in many recently developed DL models,[17, 13, 47, 78, 79] where the knowledge of one model is transferred to another model by making use of the features learned from a huge dataset. To accomplish this, a source model is first trained using a large dataset, and then the model parameters are fine-tuned by training on the target property.
A deeper version of this concept is cross-property deep transfer learning (see Figure 6), where the source and target models are trained for different properties.[18] Properties that do not have a large dataset can be predicted using this concept. It is implemented by two methods: i) fine-tuning the source model with target data, and ii) learning the features from the target dataset and then leveraging those in building the target model. The original ElemNet DL model has
Figure 5: Three generative models. a) Schematic of the VAE model: the encoder acts as a compressor and generates the latent space by mapping the molecules to a vector space, which is then mapped back to the molecule representation by the decoder.[73]
b) Schematic of a generative adversarial network. Two different convolutional neural networks are used in the model. One is used as a generator that generates a pattern from a random input vector, and the discriminator network judges whether the data is real or generated by the model. If the data produced by the generator is labeled as fake by the discriminator, this signal is backpropagated to the generator. The generator readjusts its weights and resends the improved result to the discriminator. This process continues until it becomes difficult to differentiate between the real data and the generator's data.
c) The CVAE model designed for new concrete formulas for building materials.
gone through some significant improvements before the implementation of cross-property deep transfer learning, which results in an improved mean absolute error (MAE). The modified version of ElemNet is implemented using TensorFlow 2 (TF2) with Keras as the interface, instead of TensorFlow 1. These improvements in the library lead the model to learn chemical interactions and elemental similarities more accurately, resulting in a reduced MAE of 0.0405 eV/atom. Again, the use of Monte Carlo dropout in the training, validation, and test phases leads to different activations and outputs for the same input in each run of the model, which is a barrier to consistent feature extraction. The model improved its MAE further from 0.0405 eV/atom to 0.0373 eV/atom once this dropout was disabled. These two modifications lead to an approximately 10% improvement in MAE. A minimal sketch of transfer learning with a frozen feature extractor is shown below.
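The following is a minimal, hedged Keras sketch of this scheme: a feature extractor trained on a large source dataset is frozen, and a new head is fitted on a small target dataset. The 86-dimensional input is only an illustrative stand-in for a composition vector, not the actual ElemNet architecture:

```python
# Minimal sketch of transfer learning in Keras: pretrain, freeze the
# learned layers, and fine-tune a new head on the target property.
import numpy as np
from tensorflow import keras

base = keras.Sequential([
    keras.Input(shape=(86,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu", name="features"),
])
source = keras.Sequential([base, keras.layers.Dense(1)])
source.compile(optimizer="adam", loss="mae")
# source.fit(X_large, y_large, ...)   # pretraining on the large dataset

base.trainable = False                 # freeze the learned features
target = keras.Sequential([base, keras.layers.Dense(1)])
target.compile(optimizer="adam", loss="mae")

X_small = np.random.rand(100, 86)      # stand-in small target dataset
y_small = np.random.rand(100, 1)
target.fit(X_small, y_small, epochs=5, verbose=0)
```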
2.6 AI techniques used in traditional analytical instruments:
AI techniques are used extensively and have shown great potential in the existing traditional analytical instruments of MSE, such as X-ray diffraction (XRD), TEM, and scanning electron microscopy (SEM) image analysis. In XRD, the traditional descriptors that are used for pattern analysis and for mapping phase diagrams are found to
Figure 6: Schematic representation of transfer learning. The DL model A is trained from scratch, especially with a large dataset. The model parameters like weights
(W) and biases (b) are then saved. These saved parameters can be reused whenever the model needs to be trained with a new type of dataset or whenever a change is needed in the architecture itself. In this scenario, the saved parameters can be transferred to model B, which is retrained with the new type of dataset. This method generally converges earlier compared to training the model from scratch.
be tedious and time-consuming. Recent works have made a leap forward through the seamless integration of ML models for identifying compounds of interest and phase diagrams. In [80], a convolutional neural network (CNN) is used for the one-to-one identification of the XRD spectra of materials, processed from an experimental database by removing noise. When comparing the performance of the CNN with some other classical ML algorithms, it showed better results, with an accuracy of 96.7%. In [81], a convolutional network and a dense network are used to learn the features of the interference patterns of powder XRD, leading the models to predict the crystal symmetry, which was otherwise done conventionally using peak-finding algorithms. The convolutional network performed well on theoretical data but showed poor performance on experimental data, whereas the dense network showed higher classification accuracy for the experimental data: 82% classification accuracy was achieved by the dense network while classifying only half of the samples. At the same time, the CNN could not achieve accuracy good enough to be used for crystal symmetry prediction. In transmission electron microscopy (TEM), too, AI has shown its potential. An unsupervised ML algorithm (Auto-detect mNP) has been designed for the classification of the particle shapes of metal nanoparticles (mNPs) from TEM images. The algorithm can also determine the impurities in mNP synthesis and is able to classify long-rod and short-rod nanoparticles.[82] For the automation of the structural analysis of nanoparticles from high-resolution TEM (HRTEM) data, a two-stage framework is designed: a CNN with a U-Net architecture performs segmentation of HRTEM nanoparticle images, and a random forest classifier detects the defects of individual nanoparticle regions with an accuracy of 86%.[51] A minimal sketch of a CNN classifier for diffraction patterns is given below.
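As a hedged illustration (random stand-in spectra and labels, not the actual models of [80] or [81]), a small 1-D CNN for classifying diffraction patterns into symmetry classes can be sketched in Keras as:

```python
# Minimal sketch: a small 1-D CNN classifying XRD patterns into
# crystal-symmetry classes (synthetic stand-in data).
import numpy as np
from tensorflow import keras

n_classes = 7                                   # e.g., the 7 crystal systems
model = keras.Sequential([
    keras.Input(shape=(1024, 1)),               # intensity vs. 2-theta bins
    keras.layers.Conv1D(16, 9, activation="relu"),
    keras.layers.MaxPooling1D(4),
    keras.layers.Conv1D(32, 9, activation="relu"),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(32, 1024, 1)                 # stand-in diffraction patterns
y = np.random.randint(0, n_classes, size=32)    # stand-in symmetry labels
model.fit(X, y, epochs=2, verbose=0)
```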
2.7 Advantage and disadvantages of AI in material engineering and its future:
The use of AI in material engineering is a dynamic area of research that is evolving continuously. Using the right kind of algorithm can significantly accelerate the discovery of new materials, as ML algorithms can learn repetitive patterns, resulting in faster simulation of complex structures and chemical reactions. The challenge of building interatomic potentials is also addressed by AI-based methods, as artificial neural networks have paved the way for the construction of potential energy surfaces with efficiencies several orders of magnitude higher than traditional methods. AI algorithms can also be used for exploring the massive chemical space of a material by training a model on existing samples and then using the trained model to predict all possible combinations, which was earlier a big hurdle for simulation-based methods such as DFT. Thousands of stable configurations have been discovered by
researchers by training ML models on the existing datasets. Future researchers can work on more efficient feature-space extraction to provide new paradigms for discovering stable material configurations. ML has stepped into the area of drug discovery too. Graph neural networks and other DL models have greatly helped researchers predict the solubility of molecules and drug-target interactions for drug materials. Another critical area of research in drug discovery is the prediction of the protein structure of the targeted molecule to make a treatment successful. Here, AI researchers can grab the opportunity and think big about developing ML and DL algorithms to fuel the growth of target protein structure prediction. In energy storage materials, algorithms can be developed to fulfill multiple objectives. For example, properties like dipole orientation, atomic polarization, resonant effects, and relaxation effects must be predicted simultaneously to screen a dielectric material. These factors may also have inverted coupling relationships that need to be optimized collaboratively. For optimizing such collective properties, single-objective optimization algorithms are not enough.
Therefore, in future work, researchers may pursue the development of multi-objective optimization algorithms. Though ML has revolutionized discovery in materials science, a few challenges still need to be addressed. The lack of data (especially from experimental results) is one of the significant drawbacks. The poor explainability of AI models is another: complex generative models are generally treated as black boxes, where the chemical relationships are not firmly established and error analysis, too, is a difficult task. Therefore, the development of models that are both highly accurate and interpretable would be a definite leap forward for materials science research. Non-viability is a significant hurdle for the chemical structures generated by generative deep learning models: the highly complex chemical structures produced by generative models outside the existing chemical space may be theoretically feasible, yet synthesizing them may not be viable because of the high cost and complexity.
3 Conclusion:
In conclusion, AI will radically change the way materials are developed. It will help us find materials with desired properties efficiently, reducing the time, money, and effort needed to create a material. The capabilities of traditional analytical tools will also be improved by adopting the advantages of AI and ML. Recently applied graph neural network-based approaches and generative adversarial algorithms are helping in the rapid development of drugs. Google's AlphaFold is one such example, which helps us to determine the folding of proteins with very high accuracy and thereby find appropriate molecules for drugs. Again, various computer vision-based tools are helping us to analyze the outputs of traditional analytical instruments such as transmission electron microscopy (TEM), scanning electron microscopy (SEM), atomic force microscopy (AFM), etc. Traditional computational tools such as DFT and in silico methods are also being outperformed by AI- and ML-based algorithms, both in terms of accuracy and efficiency. All these developments have given us hope of engineering materials that can meet demands and shape the future.
Acknowledgment
Mohendra Roy acknowledges the seed grant No. ORSP/R&D/PDPU/2019/MR/RO051 of PDEU, the core research grant No. CRG/2020/000869 of the Science and Engineering Research Board (SERB), Govt. of India, and the project grant No. GUJCOST/STI/202-22/3873 of GUJCOST, Govt. of Gujarat, India.
Note: The supplementary material is available at the end of the reference section.
References
[1] Kamal Choudhary, Brian DeCost, Chi Chen, Anubhav Jain, Francesca Tavazza, Ryan Cohn, Cheol Woo Park, Alok Choudhary, Ankit Agrawal, Simon J. L. Billinge, et al. Recent advances and applications of deep learning methods in materials science. npj Computational Materials, 8(1):1-26, 2022.
[2] Seymour Michael Blinder. Introduction to quantum mechanics. Academic Press, 2020.
[3] Ze Xu. Density Functional Perturbation Theory and Adaptively Compressed Polarizability Operator. PhD thesis, UC Berkeley, 2019.
[4] Asma Nouira, Nataliya Sokolovska, and Jean-Claude Crivello. CrystalGAN: learning to discover crystallographic structures with generative adversarial networks. arXiv preprint arXiv:1810.11203, 2018.
[5]
Rama K Vasudevan, Kamal Choudhary, Apurva Mehta, Ryan Smith, Gilad Kusne, Francesca Tavazza, Lukas
Vlcek, Maxim Ziatdinov, Sergei V Kalinin, and Jason Hattrick-Simpers. Materials science in the artificial
intelligence age: high-throughput library generation, machine learning, and a pathway from correlations to the
underpinning physics. MRS communications, 9(3):821–838, 2019.
[6]
Chi Chen, Yunxing Zuo, Weike Ye, Xiangguo Li, Zhi Deng, and Shyue Ping Ong. A critical review of machine
learning of energy materials. Advanced Energy Materials, 10(8):1903242, 2020.
[7]
Jonathan Schmidt, Mário RG Marques, Silvana Botti, and Miguel AL Marques. Recent advances and applications
of machine learning in solid-state materials science. npj Computational Materials, 5(1):1–36, 2019.
[8]
Valentin Stanev, Kamal Choudhary, Aaron Gilad Kusne, Johnpierre Paglione, and Ichiro Takeuchi. Artificial
intelligence for search and discovery of quantum materials. Communications Materials, 2(1):1–11, 2021.
[9]
Daniel P Tabor, Loïc M Roch, Semion K Saikin, Christoph Kreisbeck, Dennis Sheberla, Joseph H Montoya,
Shyam Dwaraknath, Muratahan Aykol, Carlos Ortiz, Hermann Tribukait, et al. Accelerating the discovery of
materials for clean energy in the era of smart automation. Nature Reviews Materials, 3(5):5–20, 2018.
[10]
Jiaqi Jiang, Mingkun Chen, and Jonathan A Fan. Deep neural networks for the evaluation and design of photonic
devices. Nature Reviews Materials, 6(8):679–700, 2021.
[11]
Mohit Pandey, Michael Fernandez, Francesco Gentile, Olexandr Isayev, Alexander Tropsha, Abraham C Stern,
and Artem Cherkasov. The transformational role of gpu computing and deep learning in drug discovery. Nature
Machine Intelligence, 4(3):211–221, 2022.
[12]
Kamal Choudhary, Kevin F Garrity, Charles Camp, Sergei V Kalinin, Rama Vasudevan, Maxim Ziatdinov, and
Francesca Tavazza. Computational scanning tunneling microscope image database. Scientific data, 8(1):1–9,
2021.
[13]
Nikolas Claussen, B Andrei Bernevig, and Nicolas Regnault. Detection of topological materials with machine
learning. Physical Review B, 101(24):245117, 2020.
[14]
Valentin Stanev, Corey Oses, A Gilad Kusne, Efrain Rodriguez, Johnpierre Paglione, Stefano Curtarolo, and
Ichiro Takeuchi. Machine learning modeling of superconducting critical temperature. npj Computational
Materials, 4(1):1–14, 2018.
[15]
Kamal Choudhary, Taner Yildirim, Daniel W Siderius, A Gilad Kusne, Austin McDannald, and Diana L
Ortiz-Montalvo. Graph neural network predictions of metal organic framework co2 adsorption properties.
Computational Materials Science, 210:111388, 2022.
[16]
Prathik R Kaundinya, Kamal Choudhary, and Surya R Kalidindi. Prediction of the electron density of states for
crystalline compounds with atomistic line graph neural networks (alignn). JOM, 74(4):1395–1405, 2022.
[17]
Dipendra Jha, Kamal Choudhary, Francesca Tavazza, Wei-keng Liao, Alok Choudhary, Carelyn Campbell, and
Ankit Agrawal. Enhancing materials property prediction by leveraging computational and experimental data
using deep transfer learning. Nature communications, 10(1):1–12, 2019.
[18]
Vishu Gupta, Kamal Choudhary, Francesca Tavazza, Carelyn Campbell, Wei-keng Liao, Alok Choudhary, and
Ankit Agrawal. Cross-property deep transfer learning framework for enhanced predictive analytics on small
materials data. Nature communications, 12(1):1–10, 2021.
[19]
Prathik R Kaundinya, Kamal Choudhary, and Surya R Kalidindi. Machine learning approaches for feature
engineering of the crystal structure: Application to the prediction of the formation energy of cubic compounds.
Physical Review Materials, 5(6):063802, 2021.
[20]
Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable
prediction of material properties. Physical review letters, 120(14):145301, 2018.
[21]
Soumya Sanyal, Janakiraman Balachandran, Naganand Yadati, Abhishek Kumar, Padmini Rajagopalan, Suchis-
mita Sanyal, and Partha Talukdar. Mt-cgcnn: Integrating crystal graph convolutional neural network with
multitask learning for material property prediction. arXiv preprint arXiv:1811.05660, 2018.
[22]
Mohammadreza Karamad, Rishikesh Magar, Yuting Shi, Samira Siahrostami, Ian D Gates, and Amir Barati
Farimani. Orbital graph convolutional neural network for material property prediction. Physical Review Materials,
4(9):093801, 2020.
[23]
Peter Bjørn Jørgensen, Karsten Wedel Jacobsen, and Mikkel N Schmidt. Neural message passing with edge
updates for predicting properties of molecules and materials. arXiv preprint arXiv:1806.03146, 2018.
[24]
Quan Zhou, Peizhe Tang, Shenxiu Liu, Jinbo Pan, Qimin Yan, and Shou-Cheng Zhang. Learning atoms for
materials discovery. Proceedings of the National Academy of Sciences, 115(28):E6411–E6417, 2018.
[25]
Sékou-Oumar Kaba, Benjamin Groleau-Paré, Marc-Antoine Gauthier, André-Marie Tremblay, Simon Verret, and
Chloé Gauvin-Ndiaye. Prediction of large magnetic moment materials with graph neural networks and random
forests. arXiv preprint arXiv:2111.14712, 2021.
[26]
Ghanshyam Pilania, Arun Mannodi-Kanakkithodi, BP Uberuaga, Rampi Ramprasad, JE Gubernatis, and Turab
Lookman. Machine learning bandgaps of double perovskites. Scientific reports, 6(1):1–10, 2016.
[27]
Hitarth Choubisa, Petar Todorović, Joao M Pina, Darshan H Parmar, Ziliang Li, Oleksandr Voznyy, Isaac
Tamblyn, and Edward Sargent. Interpretable discovery of new semiconductors with machine learning. arXiv
preprint arXiv:2101.04383, 2021.
[28]
Sadman Sadeed Omee, Steph-Yves Louis, Nihang Fu, Lai Wei, Sourin Dey, Rongzhi Dong, Qinyang Li, and
Jianjun Hu. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns,
page 100491, 2022.
[29]
Francesca Tavazza, Brian DeCost, and Kamal Choudhary. Uncertainty prediction for machine learning models
of material properties. ACS omega, 6(48):32431–32440, 2021.
[30]
Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, and O Anatole von Lilienfeld. Fast and accurate
modeling of molecular atomization energies with machine learning. Physical review letters, 108(5):058301,
2012.
[31]
Katja Hansen, Franziska Biegler, Raghunathan Ramakrishnan, Wiktor Pronobis, O Anatole Von Lilienfeld,
Klaus-Robert Müller, and Alexandre Tkatchenko. Machine learning predictions of molecular
properties: Accurate many-body potentials and nonlocality in chemical space. The journal of physical chemistry
letters, 6(12):2326–2331, 2015.
[32]
Kamal Choudhary and Brian DeCost. Atomistic line graph neural network for improved materials property
predictions. npj Computational Materials, 7(1):185, 2021.
[33]
Nicholas Lubbers, Justin S Smith, and Kipton Barros. Hierarchical modeling of molecular energies using a deep
neural network. The Journal of chemical physics, 148(24):241715, 2018.
[34]
Kamal Choudhary, Kevin F Garrity, Vinit Sharma, Adam J Biacchi, Angela R Hight Walker, and Francesca
Tavazza. High-throughput density functional perturbation theory and machine learning predictions of infrared,
piezoelectric, and dielectric responses. NPJ Computational Materials, 6(1):1–13, 2020.
[35]
Kamal Choudhary, Kevin F Garrity, Nirmal J Ghimire, Naween Anand, and Francesca Tavazza. High-throughput
search for magnetic topological materials using spin-orbit spillage, machine learning, and experiments. Physical
Review B, 103(15):155131, 2021.
[36]
Nghia Nguyen, Steph-Yves V Louis, Lai Wei, Kamal Choudhary, Ming Hu, and Jianjun Hu. Predicting lattice
vibrational frequencies using deep graph neural networks. ACS omega, 7(30):26641–26649, 2022.
[37]
Logan Ward, Ankit Agrawal, Alok Choudhary, and Christopher Wolverton. A general-purpose machine learning
framework for predicting properties of inorganic materials. npj Computational Materials, 2:16028, 2016.
[38]
Kamal Choudhary, Marnik Bercx, Jie Jiang, Ruth Pachter, Dirk Lamoen, and Francesca Tavazza. Accelerated
discovery of efficient solar cell materials using quantum and machine-learning methods. Chemistry of Materials,
31(15):5900–5908, 2019.
[39]
Jason R Hattrick-Simpers, Kamal Choudhary, and Claudio Corgnale. A simple constrained machine learning
model for predicting high-pressure-hydrogen-compressor materials. Molecular Systems Design & Engineering,
3(3):509–517, 2018.
[40]
Ekin Dogus Cubuk, Evan J Reed, Gowoon Cheon, Kevin James McCloskey, Lusann Yang, and Michael Brenner.
Crystal structure search with random structure relaxations using graph networks. 2020.
[41] Zeeshan Ahmad, Tian Xie, Chinmay Maheshwari, Jeffrey C Grossman, and Venkatasubramanian Viswanathan.
Machine learning enabled computational screening of inorganic solid electrolytes for suppression of dendrite
formation in lithium metal anodes. ACS central science, 4(8):996–1006, 2018.
[42]
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin
Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, et al. The Open Catalyst 2020 (OC20) dataset and community
challenges. ACS Catalysis, 11(10):6059–6072, 2021.
[43]
Yun Liu, Oladapo Christopher Esan, Zhefei Pan, and Liang An. Machine learning for advanced energy materials.
Energy and AI, 3:100049, 2021.
[44]
So Takamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, Iori Kurata, Taku Watanabe,
Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano, et al. Towards universal neural network potential for material
discovery applicable to arbitrary combination of 45 elements. Nature Communications, 13(1):1–11, 2022.
[45]
Leo Laugier, Daniil Bash, Jose Recatala, Hong Kuan Ng, Savitha Ramasamy, Chuan-Sheng Foo, Vijay R
Chandrasekhar, and Kedar Hippalgaonkar. Predicting thermoelectric properties from crystal graphs and material
descriptors-first application for functional materials. arXiv preprint arXiv:1811.06219, 2018.
[46]
Tian Xie and Jeffrey C Grossman. Hierarchical visualization of materials space with graph convolutional neural
networks. The Journal of chemical physics, 149(17):174111, 2018.
[47]
Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong. Graph networks as a universal machine
learning framework for molecules and crystals. Chemistry of Materials, 31(9):3564–3572, 2019.
[48]
Massimiliano Lupo Pasini, Pei Zhang, Samuel Temple Reeve, and Jong Youl Choi. Multi-task graph neural
networks for simultaneous prediction of global and atomic properties in ferromagnetic systems. Machine
Learning: Science and Technology, 3(2):025007, 2022.
[49]
Adam J Gormley and Michael A Webb. Machine learning in combinatorial polymer chemistry. Nature Reviews
Materials, 6(8):642–644, 2021.
[50]
Huachen Tao, Tianyi Wu, Matteo Aldeghi, Tony C Wu, Alán Aspuru-Guzik, and Eugenia Kumacheva. Nanopar-
ticle synthesis assisted by machine learning. Nature Reviews Materials, 6(8):701–716, 2021.
[51]
Catherine K Groschner, Christina Choi, and Mary C Scott. Machine learning pipeline for segmentation and
defect identification from high-resolution transmission electron microscopy data. Microscopy and Microanalysis,
27(3):549–556, 2021.
[52]
Pascal Marc Vecsei, Kenny Choo, Johan Chang, and Titus Neupert. Neural network based classification of
crystal symmetries from x-ray diffraction patterns. Physical Review B, 99(24):245120, 2019.
[53]
Minchul Shin. Semi-supervised learning with a teacher-student network for generalized attribute prediction. In
Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings,
Part XI 16, pages 509–525. Springer, 2020.
[54]
Natalia Ares. Machine learning as an enabler of qubit scalability. Nature Reviews Materials, 6(10):870–871,
2021.
[55]
Marco Fronzi, Mutaz Abu Ghazaleh, Olexandr Isayev, David A Winkler, Joe Shapter, and Michael J Ford.
Impressive computational acceleration by using machine learning for 2-dimensional super-lubricant materials
discovery. arXiv preprint arXiv:1911.11559, 2019.
[56]
Linfeng Zhang, De-Ye Lin, Han Wang, Roberto Car, and E Weinan. Active learning of uniformly accurate
interatomic potentials for materials simulation. Physical Review Materials, 3(2):023804, 2019.
[57]
Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, and Yoshua Bengio.
Data-driven approach to encoding and decoding 3-d crystal structures. arXiv preprint arXiv:1909.00949, 2019.
[58]
Atsuto Seko, Tomoya Maekawa, Koji Tsuda, and Isao Tanaka. Machine learning with systematic density-
functional theory calculations: Application to melting temperatures of single-and binary-component solids.
Physical Review B, 89(5):054303, 2014.
[59]
Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl
Leswing, and Vijay Pande. Moleculenet: a benchmark for molecular machine learning. Chemical science,
9(2):513–530, 2018.
[60]
Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of
representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022.
[61]
Kristof T Schütt, Farhad Arbabzadah, Stefan Chmiela, Klaus R Müller, and Alexandre Tkatchenko. Quantum-
chemical insights from deep tensor neural networks. Nature communications, 8(1):1–8, 2017.
[62]
Zheni Zeng, Yuan Yao, Zhiyuan Liu, and Maosong Sun. A deep-learning system bridging molecule structure and
biomedical text with comprehension comparable to human professionals. Nature Communications, 13(1):1–11,
2022.
[63]
Kristof T Schütt, Huziel E Sauceda, P-J Kindermans, Alexandre Tkatchenko, and K-R Müller. Schnet–a deep
learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018.
[64]
Shuo Zhang, Yang Liu, and Lei Xie. Molecular mechanics-driven graph neural network with multiplex graph for
molecular structures. arXiv preprint arXiv:2011.07457, 2020.
[65]
Rui Xin, Edirisuriya MD Siriwardane, Yuqi Song, Yong Zhao, Steph-Yves Louis, Alireza Nasiri, and Jianjun Hu.
Active-learning-based generative design for the discovery of wide-band-gap materials. The Journal of Physical
Chemistry C, 125(29):16118–16128, 2021.
[66]
Zekun Ren, Siyu Isaac Parker Tian, Juhwan Noh, Felipe Oviedo, Guangzong Xing, Jiali Li, Qiaohao Liang,
Ruiming Zhu, Armin G Aberle, Shijing Sun, et al. An invertible crystallographic representation for general
inverse design of inorganic crystals with targeted properties. Matter, 5(1):314–335, 2022.
[67]
Yoshihide Sawada, Koji Morikawa, and Mikiya Fujii. Study of deep generative models for inorganic chemical
compositions. arXiv preprint arXiv:1910.11499, 2019.
[68]
Xiou Ge, Richard T Goodwin, Jeremy R Gregory, Randolph E Kirchain, Joana Maria, and Lav R Varshney.
Accelerated discovery of sustainable building materials. arXiv preprint arXiv:1905.08222, 2019.
[69]
Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight
Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang,
Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang,
Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David
Vanderbilt, Karin Rabe, and Francesca Tavazza. The joint automated repository for various integrated simulations
(JARVIS) for data-driven materials design. npj Computational Materials, 6(1), nov 2020.
[70]
James E. Saal, Scott Kirklin, Muratahan Aykol, Bryce Meredig, and C. Wolverton. Materials design and
discovery with high-throughput density functional theory: The open quantum materials database (OQMD). JOM,
65(11):1501–1509, September 2013.
[71]
Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi. Understanding of a convolutional neural network. In
2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, pages 1–6, 2017.
[72]
Zhuo Cao, Yabo Dan, Zheng Xiong, Chengcheng Niu, Xiang Li, Songrong Qian, and Jianjun Hu. Convolutional
neural networks for crystal material property prediction using hybrid orbital-field matrix and magpie descriptors.
Crystals, 9(4), 2019.
[73]
Benjamin Sanchez-Lengeling and Alán Aspuru-Guzik. Inverse molecular design using machine learning:
Generative models for matter engineering. Science, 361(6400):360–365, 2018.
[74]
Daniel S Wigh, Jonathan M Goodman, and Alexei A Lapkin. A review of molecular representation in the age of
machine learning. Wiley Interdisciplinary Reviews: Computational Molecular Science, page e1603, 2022.
[75]
Sadman Sadeed Omee, Steph-Yves Louis, Nihang Fu, Lai Wei, Sourin Dey, Rongzhi Dong, Qinyang Li, and
Jianjun Hu. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns,
3(5):100491, 2022.
[76]
Jonathan Schmidt, Mário RG Marques, Silvana Botti, and Miguel AL Marques. Recent advances and applications
of machine learning in solid-state materials science. npj Computational Materials, 5(1):1–36, 2019.
[77]
Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784,
2014.
[78]
Shufeng Kong, Dan Guevarra, Carla P Gomes, and John M Gregoire. Materials representation and transfer
learning for multi-property prediction. Applied Physics Reviews, 8(2):021409, 2021.
[79]
Shunya Minami, Song Liu, Stephen Wu, Kenji Fukumizu, and Ryo Yoshida. A general class of transfer learning
regression without implementation cost. In Proceedings of the AAAI Conference on Artificial Intelligence,
volume 35, pages 8992–8999, 2021.
[80]
Hong Wang, Yunchao Xie, Dawei Li, Heng Deng, Yunxin Zhao, Ming Xin, and Jian Lin. Rapid identification of
x-ray diffraction patterns based on very limited data by interpretable convolutional neural networks. Journal of
chemical information and modeling, 60(4):2004–2011, 2020.
[81]
Pascal Marc Vecsei, Kenny Choo, Johan Chang, and Titus Neupert. Neural network based classification of
crystal symmetries from x-ray diffraction patterns. Physical Review B, 99(24):245120, 2019.
[82]
Xingzhi Wang, Jie Li, Hyun Dong Ha, Jakob C Dahl, Justin C Ondry, Ivan Moreno-Hernandez, Teresa Head-
Gordon, and A Paul Alivisatos. Autodetect-mnp: An unsupervised machine learning algorithm for automated
analysis of transmission electron microscope images of metal nanoparticles. Jacs Au, 1(3):316–327, 2021.
[83]
Anthony Yu-Tung Wang, Steven K Kauwe, Ryan J Murdock, and Taylor D Sparks. Compositionally restricted
attention-based network for materials property predictions. Npj Computational Materials, 7(1):1–10, 2021.
[84]
Kishalay Das, Bidisha Samanta, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, and Niloy Ganguly.
Crysxpp: An explainable property predictor for crystalline materials. npj Computational Materials, 8(1):1–11,
2022.
[85]
Johannes Klicpera, Shankari Giri, Johannes T Margraf, and Stephan Günnemann. Fast and uncertainty-aware
directional message passing for non-equilibrium molecules. arXiv preprint arXiv:2011.14115, 2020.
[86]
Dipendra Jha, Logan Ward, Arindam Paul, Wei-keng Liao, Alok Choudhary, Chris Wolverton, and Ankit
Agrawal. Elemnet: Deep learning the chemistry of materials from only elemental composition. Scientific reports,
8(1):1–13, 2018.
[87]
Benjamin Sanchez-Lengeling, Carlos Outeiral, Gabriel L Guimaraes, and Alan Aspuru-Guzik. Optimizing
distributions over molecular space: An objective-reinforced generative adversarial network for inverse-design
chemistry (ORGANIC). 2017.
[88]
Bryce Meredig, Ankit Agrawal, Scott Kirklin, James E Saal, Jeff W Doak, Alan Thompson, Kunpeng Zhang,
Alok Choudhary, and Christopher Wolverton. Combinatorial screening for new materials in unconstrained
composition space with machine learning. Physical Review B, 89(9):094104, 2014.
[89]
Dipendra Jha, Logan Ward, Zijiang Yang, Christopher Wolverton, Ian Foster, Wei-keng Liao, Alok Choudhary,
and Ankit Agrawal. Irnet: A general purpose deep residual regression framework for materials discovery. In
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,
pages 2385–2393, 2019.
[90]
Yabo Dan, Yong Zhao, Xiang Li, Shaobo Li, Ming Hu, and Jianjun Hu. Generative adversarial networks
(gan) based efficient sampling of chemical composition space for inverse design of inorganic materials. npj
Computational Materials, 6(1):1–7, 2020.
[91]
Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing
for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
[92]
Félix Musil, Michael J Willatt, Mikhail A Langovoy, and Michele Ceriotti. Fast and accurate uncertainty
estimation in chemical machine learning. Journal of chemical theory and computation, 15(2):906–915, 2019.
[93]
Rhys EA Goodall and Alpha A Lee. Predicting materials properties without crystal structure: Deep representation
learning from stoichiometry. Nature communications, 11(1):1–9, 2020.
[94]
Bart Olsthoorn, R Matthias Geilhufe, Stanislav S Borysov, and Alexander V Balatsky. Band gap prediction for
large organic crystal structures with machine learning. Advanced Quantum Technologies, 2(7-8):1900023, 2019.
[95]
Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, John Gregoire, and Carla Gomes. End-to-end refinement
guided by pre-trained prototypical classifier. arXiv preprint arXiv:1805.08698, 2018.
[96]
Hiroshi Ohno and Yusuke Mukae. Machine learning approach for prediction and search: application to methane
storage in a metal–organic framework. The Journal of Physical Chemistry C, 120(42):23963–23968, 2016.
[97]
Yongchul G Chung, Emmanuel Haldoupis, Benjamin J Bucior, Maciej Haranczyk, Seulchan Lee, Hongda Zhang,
Konstantinos D Vogiatzis, Marija Milisavljevic, Sanliang Ling, Jeffrey S Camp, et al. Advances, updates, and
analytics for the computation-ready, experimental metal–organic framework database: Core mof 2019. Journal
of Chemical & Engineering Data, 64(12):5985–5998, 2019.
[98]
Rohit Batra, Le Song, and Rampi Ramprasad. Emerging materials intelligence ecosystems propelled by machine
learning. Nature Reviews Materials, 6(8):655–678, 2021.
[99]
James Kirkpatrick, Brendan McMorrow, David HP Turban, Alexander L Gaunt, James S Spencer, Alexan-
der GDG Matthews, Annette Obika, Louis Thiry, Meire Fortunato, David Pfau, et al. Pushing the frontiers of
density functionals by solving the fractional electron problem. Science, 374(6573):1385–1389, 2021.
[100]
Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez,
Timothy Hopper, Brian Kelley, Miriam Mathea, et al. Analyzing learned molecular representations for property
prediction. Journal of chemical information and modeling, 59(8):3370–3388, 2019.
[101]
Marwin HS Segler, Mike Preuss, and Mark P Waller. Planning chemical syntheses with deep neural networks
and symbolic ai. Nature, 555(7698):604–610, 2018.
[102]
Kevin F Garrity and Kamal Choudhary. Database of wannier tight-binding hamiltonians using high-throughput
density functional theory. Scientific data, 8(1):1–10, 2021.
[103] Adam Culka and Jan Jehlička. Identification of gemstones using portable sequentially shifted excitation raman
spectrometer and rruff online database: A proof of concept study. The European Physical Journal Plus,
134(4):130, 2019.
[104]
Yoolhee Kim, Edward Kim, Erin Antono, Bryce Meredig, and Julia Ling. Machine-learned metrics for predicting
the likelihood of success in materials discovery. npj Computational Materials, 6(1):1–9, 2020.
[105]
Bart Olsthoorn, R Matthias Geilhufe, Stanislav S Borysov, and Alexander V Balatsky. Band gap prediction for
large organic crystal structures with machine learning. Advanced Quantum Technologies, 2(7-8):1900023, 2019.
[106]
Grégoire Montavon, Matthias Rupp, Vivekanand Gobre, Alvaro Vazquez-Mayagoitia, Katja Hansen, Alexandre
Tkatchenko, Klaus-Robert Müller, and O Anatole Von Lilienfeld. Machine learning of molecular electronic
properties in chemical compound space. New Journal of Physics, 15(9):095003, 2013.
[107]
Arindam Paul, Dipendra Jha, Reda Al-Bahrani, Wei-keng Liao, Alok Choudhary, and Ankit Agrawal. Chemixnet:
Mixed dnn architectures for predicting chemical properties using multiple molecular representations. arXiv
preprint arXiv:1811.08283, 2018.
[108]
June-Mo Yang, Ju-Hee Lee, Young-Kwang Jung, So-Yeon Kim, Jeong-Hoon Kim, Seul-Gi Kim, Jeong-Hyeon
Kim, Seunghwan Seo, Dong-Am Park, Jin-Wook Lee, et al. Mixed-dimensional formamidinium bismuth iodides
featuring in-situ formed type-i band structure for convolution neural networks. Advanced Science, page 2200168,
2022.
[109]
Bin Wu, Sangwoo Han, Kang G Shin, and Wei Lu. Application of artificial neural networks in design of
lithium-ion batteries. Journal of Power Sources, 395:128–136, 2018.
[110]
Yibo Li, Jianfeng Pei, and Luhua Lai. Structure-based de novo drug design using 3d deep generative models.
Chemical science, 12(41):13664–13675, 2021.
[111]
Zhuoran Qiao, Matthew Welborn, Animashree Anandkumar, Frederick R Manby, and Thomas F Miller III.
Orbnet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. The Journal of
chemical physics, 153(12):124111, 2020.
[112]
Magnus Röding, C Fager, A Olsson, C Von Corswant, E Olsson, and Niklas Loren. Three-dimensional
reconstruction of porous polymer films from fib-sem nanotomography data using random forests. Journal of
Microscopy, 281(1):76–86, 2021.
[113]
Wenjiang Huang, Pedro Martin, and Houlong L Zhuang. Machine-learning phase prediction of high-entropy
alloys. Acta Materialia, 169:225–236, 2019.
[114]
Cheng Wen, Yan Zhang, Changxin Wang, Dezhen Xue, Yang Bai, Stoichko Antonov, Lanhong Dai, Turab
Lookman, and Yanjing Su. Machine learning assisted design of high entropy alloys with desired property. Acta
Materialia, 170:109–117, 2019.
[115]
Yan Zhang, Cheng Wen, Changxin Wang, Stoichko Antonov, Dezhen Xue, Yang Bai, and Yanjing Su. Phase
prediction in high entropy alloys with a rational selection of materials descriptors and machine learning models.
Acta Materialia, 185:528–539, 2020.
[116]
Hagar I Labouta, Nasimeh Asgarian, Kristina Rinker, and David T Cramb. Meta-analysis of nanoparticle
cytotoxicity via data-mining the literature. ACS nano, 13(2):1583–1594, 2019.
[117]
Chun-Teh Chen and Grace X Gu. Effect of constituent materials on composite performance: Exploring design
strategies via machine learning. Advanced Theory and Simulations, 2(6):1900056, 2019.
[118]
Taihao Han, Nicholas Stone-Weiss, Jie Huang, Ashutosh Goel, and Aditya Kumar. Machine learning as a tool to
design glasses with controlled dissolution for healthcare applications. Acta Biomaterialia, 107:286–298, 2020.
[119]
K Wu, N Sukumar, NA Lanzillo, C Wang, Ramamurthy “Rampi” Ramprasad, R Ma, AF Baldwin, G Sotzing, and
C Breneman. Prediction of polymer properties using infinite chain descriptors (icd) and machine learning: Toward
optimized dielectric polymeric materials. Journal of Polymer Science Part B: Polymer Physics, 54(20):2082–2091,
2016.
[120]
Jack R Smith, Vladyslav Kholodovych, Doyle Knight, William J Welsh, and Joachim Kohn. Qsar models for
the analysis of bioresponse data from combinatorial libraries of biomaterials. QSAR & Combinatorial Science,
24(1):99–113, 2005.
Supplementary Tables
Table 2: Some of the latest AI, ML, and DL models used for materials property prediction. The first column lists the model, the second the type of property being predicted, the third the details of the predicted properties, the fourth the performance of the respective models on different metrics, and the final column categorises the model based on its underlying technique.
Active Learning + Roost + MatGNN [56] (generative model + generative adversarial network), inverse design: to discover new stable materials and semiconductors. Discovered one high-bandgap material and six semiconductors in a specified range.

ALIGNN [16] (Graph Neural Network), electronic properties. MAE values:
Formation energy (MP dataset): 0.022 eV/atom
Bandgap (MP dataset): 0.218 eV
Formation energy (JARVIS-DFT dataset): 0.033 eV/atom
Bandgap (JARVIS-DFT dataset): 0.14 eV
Total energy: 0.037 eV/atom
Ehull: 0.076 eV
Bandgap (MBJ): 0.31 eV
Voigt bulk modulus: 10.40 GPa
Shear modulus: 9.48 GPa
Magnetic moment: 0.26 μB
Spectroscopic limited maximum efficiency: 4.52 (no unit)
Spillage: 0.35 (no unit)
K-point length: 9.51 Å
Plane-wave cutoff: 133.8 eV
x (OPT): 20.40 (no unit)
y (OPT): 19.99 (no unit)
z (OPT): 19.57 (no unit)
x (MBJ): 24.05 (no unit)
y (MBJ): 23.65 (no unit)
z (MBJ): 23.73 (no unit)
(DFPT: elec+ionic): 28.15 (no unit)
Max. piezoelectric strain coeff. (dij): 20.57 C/N
Max. piezoelectric stress coeff. (eij): 0.147 C/m²
Exfoliation energy: 51.42 meV/atom
Max. EFG: 19.12 × 10²¹ V/m²
Avg. electron mass (me): 0.085 electron mass units
Avg. hole mass (mh): 0.124 electron mass units
n-Seebeck: 40.92 μV/K
n-PF: 442.3 μW/(m K²)
p-Seebeck: 42.42 μV/K
p-PF: 440.26 μW/(m K²)

CGCNN [1] (GNN), electronic/elastic properties. MAE values:
Formation energy: 0.039 eV/atom
Absolute energy: 0.072 eV/atom
Bandgap: 0.388 eV
Fermi energy: 0.363 eV
Bulk moduli: 0.054 log(GPa)
Shear moduli: 0.087 log(GPa)
Poisson ratio: 0.030
Classification of metal (threshold 0.5): 0.8
Classification of semiconductor: 0.95

Conditional Variational Autoencoder (CVAE) [68] (generative model), to design concrete formulas:
Global warming potential (GWP) of the formulas: MAE 7.187, RMSE 9.374, R² 0.979
Acidification potential (AP) of the formulas: MAE 0.019, RMSE 0.04, R² 0.974
Strength predictor performance of the formulas: MAE 4.457, RMSE 0.125, R² 0.789 (all for >= 90 days)

CVAE + 3-D U-Net segmentation model [7] (generative model + CNN), to encode and decode 3-D atomic positions and species:
Segmenting the locations of molecules for a single unit cell: 99%
Reconstruction: 90%
Classification of species: 66%
Nearest atom species prediction for a repeating unit cell: 65.40%

Compositionally Restricted Attention-based Network (CrabNet) [83] (Graph Attention Network, GAN), electronic/thermal/elastic properties. MAE values:
Castelli perovskites: 0.127
Refractive index: 0.348
Shear modulus: 0.092
Bulk modulus: 0.068
Experimental band gap: 0.338
Exfoliation energy: 50.512
MP formation energy: 0.077
MP band gap: 0.263
Phonon peak: 53.341
Steels yield: 91.748
AFLOW bulk modulus: 8.692
AFLOW Debye temperature: 33.464
AFLOW shear modulus: 9.082
AFLOW thermal conductivity: 2.318
AFLOW thermal expansion: 3.85E-06
AFLOW band gap: 0.301
AFLOW energy per atom: 0.093
Bartel decomposition: 0.063
Bartel formation: 0.059
MP bulk modulus: 11.209
MP elastic anisotropy: 8.263
MP energy above convex hull: 0.089
MP magnetic moment: 2.105
MP shear modulus: 12.787
OQMD band gap: 0.049
OQMD energy per atom: 0.033
OQMD formation enthalpy: 0.031
OQMD volume per atom: 0.277

Crystal eXplainable Property Predictor (CrysXPP) [84] (GNN), crystal-state/elastic properties. MAE values:
Formation energy: 0.086 eV/atom
Band gap: 0.467 eV
Fermi energy: 0.471 eV
Bulk moduli: 0.08 log(GPa)
Shear moduli: 0.105 log(GPa)
Poisson ratio: 0.035

Deep Adaptive Regressive Weighted Intelligent Network (DARWIN) [27] (Graph Convolutional Neural Network, GCNN), electronic properties. MAE values:
Band gap (Zn-based systems): 0.9592 eV
Band gap (Cu-based systems): 0.1903 eV
Band gap (Mg-based systems): 1.4995 eV

DeeperGATGNN [28] (GNN), physico-chemical properties. Improvement over the previous best model:
Bulk materials formation energy: 34.97%
Alloy surface adsorption energy: 29.55%
Pt-cluster total energy: 14.03%
2D materials work function: 15.76%
MOF band gap: 5.34%
Bulk materials bandgap: 5.42%

DimeNet++ [85] (GNN), thermodynamic and electronic properties. MAE values:
Dipole moment (mu): 0.0297 D
Electronic polarizability (alpha): 0.0435 a₀³
HOMO: 24.6 meV
LUMO: 19.5 meV
Energy difference of HOMO and LUMO: 32.6 meV
Electronic spatial extent <R²>: 0.331 a₀²
ZPVE: 1.21 meV
Internal energy at 0 K (U0): 6.32 meV
Internal energy at 298 K (U): 6.28 meV
Enthalpy at 298 K (H): 6.53 meV
Gibbs free energy at 298 K (G): 7.56 meV
Heat capacity at 298 K (Cv): 0.0230 cal/(mol K)

Classification models [38], classification task: classify materials based on spectroscopic limited maximum efficiency (SLME). AUC values:
Decision Trees (DT): 0.67
Random Forest (RF): 0.79
K-nearest neighbor (KNN): 0.77
Multi-layer perceptron (MLP): 0.8
GBDT in scikit-learn (SK-GB): 0.84
GBDT in XGBoost (XGB): 0.84
GBDT in LightGBM (LGB): 0.87

ElemNet [86], electronic properties. Formation enthalpy: MAE 0.050 ± 0.0007 eV/atom.

Fully Connected Neural Network (FCNN) [87] (FCNN), thermoelectric properties. Thermoelectric power factor: MAE 20.70%.

Heuristic and ensemble of decision trees [88], electronic properties. Formation energy:
MAE (decision trees): 0.16 eV; R² (decision trees): 0.93
MAE (heuristic): 0.12 eV; R² (heuristic): 0.95

Hierarchically Interacting Particle Neural Network (HIP-NN) [33] (residual neural network), quantum-chemical properties. MAE (for training size = 50k):
Energy of benzene: 0.064 ± 0.002 kcal/mol
Energy of malonaldehyde: 0.094 ± 0.001 kcal/mol
Energy of salicylic acid: 0.195 ± 0.002 kcal/mol
Energy of toluene: 0.144 ± 0.004 kcal/mol

HydraGNN [48] (multi-task GCN), magnetic and electronic properties. RMSE for multi-task learning (MTL) and single-task learning (STL) on mixing enthalpy (H), charge transfer (C), and magnetic moment (M):
MTL, HCM: H = 7.54e-3 ± 8.70e-4; C = 6.77e-3 ± 3.59e-4; M = 1.04e-2 ± 4.94e-4
MTL, HC: H = 7.33e-3 ± 4.77e-4; C = 7.36e-3 ± 3.23e-4
MTL, HM: H = 6.64e-3 ± 5.08e-4; M = 1.02e-2 ± 5.23e-4
STL, H: H = 1.02e-2 ± 1.16e-3
STL, M: M = 8.77e-3 ± 3.18e-4
STL, C: C = 5.94e-3 ± 4.39e-4

IRNet [89] (residual neural network), electronic and elemental properties. MAE in eV/atom (for the 17-layer IRNet):
OQMD-C formation enthalpy: 0.054
OQMD-C bandgap: 0.051
OQMD-C energy per atom: 0.0696
OQMD-C volume per atom: 0.415
MP-C bandgap: 0.363
MP-C density: 0.348
MP-C energy-above-hull: 0.091
MP-C energy-per-atom: 0.143
MP-C total magnetization: 3.005
MP-C volume: 215.037
JARVIS-STMnet [12] (Convolutional Neural Network, CNN), classification task: classification of five lattice classes (square, hexagon, rhombus/centered-rectangle, rectangle, and parallelogram/oblique): 0.9.

MATGNN [90] (generative adversarial network), inverse design: generates hypothetical inorganic materials; 84.5% of the generated samples are chemically valid.

Message Passing Neural Network (MPNN) (enn-s2s) [91] (GNN), chemical properties (mu, alpha, HOMO, LUMO, gap, R², ZPVE, U0, U, H, G, Cv, omega): average MAE 0.68.
MPNN (enn-s2s-ens5) [91]: average MAE 0.52.

MoleculeNet [59] (different variations of GNN), quantum-mechanical, physical-chemistry, biophysical-affinity, and physiological properties. Best-model results per dataset:
QM7: MAE, DTNN: 8.75
QM7b: MAE, DTNN: 1.77
QM8: MAE, MPNN: 0.0143
QM9: MAE, DTNN: 2.35
ESOL: RMSE, MPNN: 0.58
FreeSolv: RMSE, MPNN: 1.15
Lipophilicity: RMSE, GC: 0.655
PCBA: AUC-PRC, GC: 0.136
MUV: AUC-PRC, Weave: 0.109
HIV: AUC-PRC, GC: 0.763
BACE: AUC-PRC, Weave: 0.806
PDBBind (full): RMSE, GC: 1.44
BBBP: AUC-PRC, GC: 0.690
Tox21: AUC-PRC, GC: 0.829
ToxCast: AUC-PRC, Weave: 0.742
SIDER: AUC-PRC, GC: 0.638
ClinTox: AUC-PRC, Weave: 0.832

Multiplex Molecular Graph Neural Network (MXMNet), with batch size (BS) = 128 and cutoff distance dg [64] (GNN), quantum-chemical properties (mu, alpha, HOMO, LUMO, gap, R², ZPVE, U0, U, H, G, Cv, omega): average MAE 0.92%.

MT-CGCNN [21] (GNN), electronic properties. Multi-task learning improvement over CGCNN:
(Formation energy and bandgap): 8.30%
(Formation energy and Fermi energy): 3.80%
(Bandgap and Fermi energy): 1.70%
(Formation energy, bandgap, and Fermi energy): 4.40%

Orbital Graph Convolutional Neural Network (OGCNN) [22] (GNN), electrical properties. MAE values:
Lanthanides formation energy: 0.06 eV/atom
Perovskites formation energy: 0.05 eV/atom
MP formation energy: 0.03 eV/atom
MP band gap: 0.032 eV
MP Fermi energy: 0.38 eV

OrbNet [92] (GNN), quantum-mechanical properties. Total energy and relative conformer energy for molecules: 33% improvement over the best prior model, with accuracy similar to the DFT method.

Roost [Ensemble] [93] (GAN), electronic properties. Bandgap: MAE 0.0241 eV, RMSE 0.0871 eV.

SOAP + SchNet model [94] (kernel ridge regression + CNN), electronic properties. Bandgap: MAE 0.388 eV.

UNet + proto-DenseNet [95] (CNN), to find refined patterns: 80.05%.
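For reference, the MAE, RMSE, and R² figures quoted throughout Table 2 can be reproduced from raw model predictions as follows. This is a minimal illustrative sketch; the five formation-energy values are hypothetical.

    import numpy as np

    def regression_metrics(y_true, y_pred):
        # MAE, RMSE, and R^2 as reported in Table 2; units follow the inputs.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        err = y_pred - y_true
        mae = np.abs(err).mean()
        rmse = np.sqrt((err ** 2).mean())
        r2 = 1.0 - (err ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
        return mae, rmse, r2

    # Hypothetical formation-energy predictions in eV/atom.
    mae, rmse, r2 = regression_metrics(
        [-1.20, -0.45, 0.10, -2.30, -0.75],
        [-1.15, -0.52, 0.18, -2.21, -0.70],
    )
    print(f"MAE = {mae:.3f} eV/atom, RMSE = {rmse:.3f} eV/atom, R2 = {r2:.3f}")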
Table 3: Dataset details. This table lists publicly available datasets (both experiment-based and DFT-based) in the field of materials science and engineering. Each entry gives the category, the database name, a short description, and the link to the dataset.
Quantum physical: JARVIS-DFT 3D [34]. Contains electrical, magnetic, and electro-magnetic properties of 40,000 bulk and 1,000 low-dimensional crystalline materials. https://www.nist.gov/programs-projects/jarvis-dft
Structural: Open Quantum Materials Database (OQMD) [34]. Contains thermodynamic and structural properties of 1,022,603 materials. https://oqmd.org/
Mechanical and thermal: AFLOWLIB [35]. Consists of 3,528,653 material compounds with over 733,959,824 electric, elastic, and thermal properties. http://www.aflowlib.org
Chemical: ZINC [94]. A collection of commercially available chemical compounds prepared for virtual screening. https://zinc12.docking.org/
Crystallographic: Inorganic Crystal Structure Database (ICSD) [78]. Contains 262,242 crystal structures, from elements to quaternary compounds, as of May 2022. https://icsd.products.fiz-karlsruhe.de/
Structural: NOMAD [79]. Contains ab initio electronic-structure data from DFT and other methods. https://nomad-lab.eu/index.php?page=repo-arch
Computational chemistry: ioChem-BD [4]. A digital repository for computational chemistry datasets. https://www.iochem-bd.org/
Multivariate: Materials Experiment and Analysis Database (MEAD) [78]. Contains raw data and metadata from material synthesis and characterization experiments, together with property and performance analyses of those data. https://solarfuelshub.org/materials-experiment-and-analysis-database
Multivariate: UCI Machine Learning Repository [68]. A collection of over 550 datasets. https://paperswithcode.com/dataset/uci-machine-learning-repository
Structural and surface morphology: Crystalium [9]. Contains surface properties of 145 crystals of 74 elements and grain-boundary properties of 58 crystals of 58 elements. http://crystalium.materialsvirtuallab.org/
Quantum physical: Materials Project (MP) [32]. Consists of inorganic compounds, band structures, molecules, nanoporous materials, and their properties such as magnetic moment, formation energy, and energy above hull. https://materialsproject.org
Structural: Crystallography Open Database (COD) [38]. An open-access database containing crystal structures of organic, inorganic, and metal-organic compounds and minerals. http://www.crystallography.net/cod/
Thermodynamics: SGTE Solid SUBstance (SSUB) [17]. Contains thermochemical properties for about 4,300 condensed or gaseous species. https://www.sgte.net/en/neu
Quantum chemistry: QMOF [96]. Contains quantum-chemical properties of MOFs. https://github.com/arosen93/QMOF
Metal-organic framework: Reduced_HMOF [96]. Contains 51,163 unique hypothetical MOFs with genetic information. https://mof.tech.northwestern.edu/
Metal-organic framework: CoRE MOF 2019 [97]. Contains 3D porous MOFs that are directly usable in molecular simulations or electronic-structure calculations. https://zenodo.org/record/3370144#.Yr3UBnZBzIU
Energy material: Hydrogen Storage Materials Database [39]. Contains 2,722 hydrogen storage materials with their composition and hydrogen gravimetric capacity. http://surl.li/cejcd
Structural: Cambridge Structural Database [15]. Contains one million 3D structural data of molecules. https://www.ccdc.cam.ac.uk/solutions/csd-core/components/csd/
Structural and quantum mechanical: NIMS (MatNavi) Materials Database [98]. Contains properties of polymeric, inorganic, and metallic materials and computed electronic structures of materials. https://mits.nims.go.jp/en/
Quantum chemistry: QM7 [59]. A subset of the GDB-13 dataset containing 7,165 molecules. http://quantum-machine.org/datasets/
Quantum chemistry: QM7b [59]. An extension of the QM7 dataset with 13 additional properties. http://quantum-machine.org/datasets/
Quantum chemistry: QM8 [59]. Contains electronic spectra and excited-state energies of small molecules. https://moleculenet.org/datasets-1
Quantum chemistry: QM9 [99]. Gives quantum-chemical properties of small organic molecules. http://quantum-machine.org/datasets/
Quantum chemistry: Free Solvation Database (FreeSolv) [59]. Contains experimental and calculated hydration free energies of small molecules in water. https://github.com/MobleyLab/FreeSolv
Quantum chemistry: ESOL [59]. Water solubility database of 1,128 compounds. https://integbio.jp/dbcatalog/en/record/nbdc00440
Quantum chemistry: Organic Materials Database (OMDB) [94]. Contains electronic properties of organic crystal structures. https://omdb.mathub.io/dataset
XRD: DiffraNet [80]. Contains 25,000 labeled serial crystallography diffraction images. https://arturluis.github.io/diffranet/
Molecular: PubChem [59]. Contains chemical molecules and their activities against biological assays. https://pubchem.ncbi.nlm.nih.gov/
Quantum physical: OC20 [42]. Contains 1,281,040 DFT relaxations of materials, surfaces, and adsorbates, along with their molecular dynamics, randomly perturbed structures, and electronic-structure analyses. https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md
Structural and surface morphology: Warwick Electron Microscopy Datasets [82]. Contains 19,769 experimental scanning transmission electron microscopy (STEM) images, 17,266 experimental transmission electron microscopy (TEM) images, and 98,340 simulated TEM exit wavefunctions in three different datasets. https://github.com/Jeffrey-Ede/datasets
Solid-state physics and synthesis: Text-mined dataset of inorganic materials synthesis recipes [50]. Contains 19,488 synthesis entries extracted from 53,538 solid-state synthesis paragraphs using text-mining and natural language processing approaches. https://figshare.com/articles/dataset/solid-state_dataset_2019-06-27_upd_json/9722159/3
Molecular: GDB databases [59]. Contain small organic molecules of up to 13 atoms of C, N, O, S, and Cl based on simple chemical stability and synthetic feasibility rules. https://pubs.acs.org/doi/10.1021/ci600423u
Structural and physical: Concrete Compressive Strength [59]. A multivariate dataset containing laboratory-determined concrete compressive strength (MPa) for given mixtures at particular curing times. http://surl.li/cejcc
Physiology and molecular medicine: ClinTox [59]. Includes clinical-trial toxicity (or absence of toxicity) and FDA approval status for a total of 1,491 drug compounds. https://lifesci.dgl.ai/api/data.html
Molecular: LIT-PCBA [100]. Includes 149 dose-response bioassays from the PubChem dataset, with false positives and assay artifacts removed while keeping active and inactive compounds of similar molecular properties. https://drugdesign.unistra.fr/LIT-PCBA/
Biomolecular: PDBBind database [59]. Contains 23,496 biomolecular complexes with their binding-affinity data. http://www.pdbbind.org.cn/
Reactions: Reaxys database [101]. Contains organic and organometallic reactions. https://www.reaxys.com/#/
Structural and surface morphology: JARVIS_STM [12]. Contains scanning tunneling microscope (STM) images. https://jarvis.nist.gov/login?next=/jarvisstm/
Structural: Database of Wannier tight-binding Hamiltonians (WTBH) [102]. Electronic band-structure calculations of WTBH. https://github.com/usnistgov/jarvis
Chemical: InfoChem [9]. Contains a large number of known reactions and molecules. https://www.deepmatter.io/about-us/infochem
Chemical: Citrination [45]. An experiment-based dataset containing chemical information on materials. https://citrination.com/datasets
Physical chemistry: Lipophilicity [100]. Contains chemical structures (SMILES) of 1,130 organic compounds and their n-octanol/buffer solution distribution coefficients at pH 7.4. https://deepai.org/dataset/lipophilicity
Biophysics: Maximum Unbiased Validation (MUV) [59]. A dataset selected from PubChem BioAssay that contains 17 different tasks over 90 thousand compounds. https://www.tu-braunschweig.de/en/pharmchem/forschung/baumann/translate-to-english-muv
Biophysics: HIV [59]. Contains 40,000 compounds tested for the ability to inhibit HIV replication. https://data.unicef.org/resources/dataset/hiv-aids-statistical-tables/
Physiology: The Blood-brain barrier penetration (BBBP) dataset [59]. A blood-brain barrier dataset for prediction of barrier permeability. https://github.com/theochem/B3DB
Biophysics: BACE [59]. Provides binding results for human β-secretase 1 inhibitors. https://enamine.net/compound-libraries/targeted-libraries/bace-library
Physiology: Tox21 [59]. Measures the toxicity of 8,014 compounds. https://paperswithcode.com/dataset/tox21-1
Physiology: ToxCast [59]. Contains toxicity data for a larger set of compounds than Tox21. https://tox21.gov/data-and-tools/
Physiology: SIDER [59]. Contains marketed drugs and their adverse reactions. http://sideeffects.embl.de/
Molecular dynamics: COLL Dataset [85]. Contains molecular collision configurations. https://figshare.com/articles/dataset/COLL_Dataset_v1_2/13289165/1
Crystallography and spectroscopy: RRUFF [103]. Contains Raman spectra, X-ray diffraction, and chemistry data for minerals. https://rruff.info/
Chemical: Melting Point Dataset [104]. Contains melting points of around 8,000 chemical structures. https://old.datahub.io/dataset/open-melting-point-data
Multivariate: Superconductivity Dataset [104]. Contains the chemical formula and relevant features of about 21,263 superconductors. https://archive.ics.uci.edu/ml/datasets/superconductivity+data
Thermoelectric: UCSB Thermoelectrics dataset [104]. Contains the electrical conductivity, power factor, and Seebeck coefficient of thermoelectric materials. https://citrination.com/datasets/150557/show_files
Electronic: Dataset of Strehlow and Cook [104]. Contains energy bandgaps of semiconductors and insulators. https://srd.nist.gov/jpcrdreprint/1.3253115.pdf
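Several of the resources in Table 3 are also accessible programmatically. As a minimal sketch, assuming the jarvis-tools package and its "dft_3d" snapshot name (the package, snapshot names, and record keys may change between releases), the JARVIS-DFT records in the first row of the table could be pulled as follows.

    # Minimal sketch of programmatic access to JARVIS-DFT 3D data.
    # Assumes: pip install jarvis-tools; the snapshot name "dft_3d" and
    # the record keys below follow the package's figshare conventions.
    from jarvis.db.figshare import data

    records = data("dft_3d")  # a list of dicts, one per material
    print(len(records), "materials downloaded")
    sample = records[0]
    # Typical keys include a JARVIS identifier, a chemical formula, and
    # DFT-computed properties; inspect sample.keys() for the exact schema.
    print(sample.get("jid"), sample.get("formula"),
          sample.get("optb88vdw_bandgap"))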
Table 4: Work done on five different types of materials: inorganic, organic, energy-storage, drug and pharmaceutical, and bio materials. The frameworks used in each field, along with a brief description of the work, are also given.
Inorganic materials:
[66] Fourier-Transformed Crystal Properties (FTCP): The model is used to predict the structure and chemistry of inorganic crystals for some targeted property.
[46] Modified CGCNN: Local energy prediction of inorganic materials is performed using a graph neural network.
[81] Convolutional neural network + deep dense network: AI-based prediction of the space-group determination problem for powder XRD patterns of inorganic non-magnetic materials is carried out.
[46] CGCNN: The CGCNN model is used on inorganic crystals of different compositions and structures to find elemental and local-environment similarities.
[67] Conditional GAN and conditional VAE: An inverse-design framework is used to generate inorganic compositions from a desired property, without crystal structure information.
[14] Random forest: Random forest ML algorithms are used to find the critical temperature of inorganic crystals.
[50] Artificial neural network and Gaussian processes: Used to find a relationship between reaction conditions and the resulting spectra of inorganic nanoparticles.
[90] Generative machine learning model (MatGAN): A GAN is developed to learn the chemical compositions of inorganic materials.
[89] IRNet: Residual connections are used to predict the formation enthalpy of inorganic materials.

Organic materials:
[87] ORGANIC: An objective-reinforced generative model, ORGANIC, is designed and applied to organic photovoltaic materials.
[15] ALIGNN: The ALIGNN model is applied to metal-organic framework (MOF) materials to screen the MOFs that help in the reduction of CO2.
[96] Gaussian process regression, support vector regression, neural network: Different ML approaches are explored to relate the structure of MOFs to their methane uptake.
[44] Preferred Potential (PFP): A neural network potential is developed and applied to molecular adsorption in MOFs.
[92] Gaussian process regression (GPR): An uncertainty prediction scheme for ML models is developed and tested on the chemical shielding of H NMR in organic crystals and on formation-energy prediction of small organic molecules.
[105] SOAP model with kernel ridge regression, and SchNet: SOAP-kernel and SchNet models are used for bandgap prediction of crystal structures of large organic molecules.
[106] Multi-task deep artificial neural network: An ML model is used to predict the electronic ground-state and excited-state properties of organic molecules.
[107] CheMixNet: The CheMixNet model is developed and applied to a water-solubility dataset of small organic molecules.

Energy storage materials:
[107] CheMixNet: The CheMixNet ML model is used to predict the HOMO value of organic photovoltaic cells.
[40] MEGNet: A GNN named MEGNet is designed to compute the atomic forces and stress tensor of the unit cells of battery materials.
[38] Gradient-boosting decision tree (GBDT): ML models are used to classify solar absorber materials based on spectroscopic limited maximum efficiency (SLME).
[37] Linear regression, reduced error pruning (REP) tree, rotation forest + REP tree, random subspace + REP tree: An ML model is designed for bandgap prediction of solar cell materials.
[39] REP tree, random forest regression, and neural networks: An ML framework is used to predict high-pressure alloys that can store hydrogen.
[108] Convolutional neural network: A CNN model is used to recognize mixed-dimensional (2D-0D) formamidinium bismuth iodides, which are useful in energy applications.
[42] CGCNN, SchNet, DimeNet++: Different state-of-the-art graph neural networks are applied to a catalyst dataset.
[109] Artificial neural network as classifier and calculator: Neural networks are used for lithium-ion battery design, reducing the computational time by orders of magnitude.

Drug and pharmaceutical materials:
[107] CheMixNet: The CheMixNet model is applied to compounds that have the ability to slow down HIV replication in in vitro studies.
[59] MoleculeNet: The ML framework MoleculeNet is designed and used on the properties of drug molecules.
[110] DenseNet: A generative model, DenseNet, is used for de novo drug design.
[62] KV-PLM: The model KV-PLM is designed to assist in the discovery of drugs.
[111] OrbNet: The deep learning model OrbNet is designed and applied to a drug dataset, showing prediction accuracy the same as DFT at reduced computational cost.
[112] Random forest classifier: A random forest classifier is used to segment polymer blends that are used for pharmaceutical tablets.

Bio materials:
[113] K-nearest neighbour, support vector machine (SVM), ANN: ML algorithms are used for phase selection of high-entropy alloys (HEAs) used for medical implants.
[114] Linear regression, polynomial regression, SVM with three different kernel functions, regression tree, k-nearest neighbour, and ANN with backpropagation: ML surrogate models combined with experimental methods are employed to find HEAs with a targeted property.
[115] Nine different classification models: With the help of a genetic algorithm, a suitable ML model and descriptors are selected for phase classification of HEAs.
[116] Decision tree: Decision tree classifiers are used to analyse the cytotoxicity of nanoparticles based on cell viability.
[117] Linear model, non-linear model, and convolutional neural network (CNN): ML algorithms are employed on biological composites to explore their mechanical properties.
[118] Random forest ensembled with additive regression (RF_AR): Ensemble methods are used in designing bioglass.
[119] Support vector machine: A support vector machine is used for screening dielectric polymers.
[120] ANN models: Surrogate models are used to predict surface properties such as protein adsorption and cellular response for biodegradable polymers.
Biography
Dr. Mohendra Roy is an expert in the area of artificial intelligence and is experienced in developing biomedical sensors and devices. He is currently serving as an assistant professor at the School of Technology of Pandit Deendayal Energy University (PDEU), India. He was a postdoctoral fellow at the Delta-NTU Corporate Laboratory for Cyber-Physical Systems of Nanyang Technological University, Singapore. He received his Ph.D. in electronics and information engineering from Korea University, South Korea. He did his master's degrees in bioelectronics and in physics at Tezpur Central University, India. He has published several high-quality research papers in IEEE Transactions, Biosensors and Bioelectronics, Sensors and Actuators B, and other venues.