Machine Learning Energies of 2 M Elpasolite (ABC2D6) Crystals
Felix Faber,1 Alexander Lindmaa,2 O. Anatole von Lilienfeld,3 and Rickard Armiento2
1 Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel.
2 Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping, Sweden.
3 Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Switzerland.
(Dated: September 7, 2015)
Elpasolite is the predominant quaternary crystal structure (AlNaK2F6 prototype) reported in the Inorganic Crystal Structure Database. We have developed a machine learning model to calculate density functional theory quality formation energies of all the 2 M pristine ABC2D6 elpasolite crystals that can be made up of main-group elements (up to bismuth). Our model's accuracy can be improved systematically, reaching a mean absolute error of 0.1 eV/atom for a training set consisting of 10 k crystals. Important bonding trends are revealed: fluoride is best suited to fit the coordination of the D site, which lowers the formation energy, whereas the opposite is found for carbon. The bonding contribution of elements A and B is very small on average. Low formation energies result from A and B being late elements from group (II), C being a late (I) element, and D being fluoride. Out of 2 M crystals, the three degenerate pairs CaSrCs2F6/SrCaCs2F6, CaSrRb2F6/SrCaRb2F6, and CaBaCs2F6/BaCaCs2F6 yield the lowest formation energies: −3.44, −3.41, and −3.39 eV/atom, respectively. In crystals with large negative formation energies, unusual atomic oxidation states have been discovered for Sb and Te.
Elpasolite (AlNaK2F6) is a glassy, transparent, lustrous, colorless, and soft quaternary crystal in the Fm-3m space group that can be found in the Rocky Mountains, Virginia, or the Apennines. The elpasolite crystal structure (see Fig. 1) is not uncommon; it is the most abundant prototype in the Inorganic Crystal Structure Database [1, 2]. Some elpasolites emit light when exposed to ionizing radiation, which makes them interesting material candidates for scintillator devices [3, 4]. One could use first-principles methods such as density functional theory (DFT) [5, 6] to computationally predict the existence and basic properties of every elpasolite. Unfortunately, even when considering only crystals composed of main-group elements (columns I to VIII), the sheer number of 2 M possible combinations makes DFT-based screening challenging, if not prohibitive. Recently, computationally efficient machine learning (ML) models were introduced for predicting molecular properties with the same accuracy as DFT [7, 8]. Requiring only milliseconds per prediction, they represent an attractive alternative when it comes to the combinatorial screening of
millions of crystals. While some ML model variants have already been proposed for solids [9–11], a generally applicable ML scheme with DFT accuracy for formation energies is still lacking. In this Letter we introduce a newly developed ML model, which we use to investigate the formation energies of all 2 M elpasolites made from main-group elements up to Bi. The resulting estimates are used to identify a new elemental order of descending elpasolite formation energy, crystals with peculiar atomic charges, and the 250 elpasolites with the lowest formation energies. The ML model achieves DFT accuracy or better and can be generalized to any crystalline material.
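The "2 M" figure can be reproduced by simple counting, under two assumptions that are ours rather than stated above: "all main-group elements up to Bi" means the 39 elements H–He, Li–Ne, Na–Ar, K, Ca, Ga–Kr, Rb, Sr, In–Xe, Cs, Ba, and Tl–Bi (noble gases included), and the four sites A, B, C, D carry pairwise distinct elements (a repeated element gives a ternary or simpler compound). Since the sites are distinguishable, the count is the number of ordered selections of 4 out of n elements:

```python
from math import perm

# Assumption: 39 main-group elements from H to Bi, noble gases included.
N_MAIN_GROUP = 39
# The 12-element (III-VI) subset: C, N, O, Al, Si, P, S, Ga, Ge, As, Sn, Sb.
N_SMALL_SET = 12

def n_elpasolites(n_elements):
    """ABC2D6 compositions with four pairwise distinct elements on the four sites."""
    return perm(n_elements, 4)

total = n_elpasolites(N_MAIN_GROUP)
print(total)                        # 1974024, i.e. ~2 M
print(n_elpasolites(N_SMALL_SET))   # 11880, i.e. ~12 k
print(f"{10_000 / total:.2%}")      # 0.51%, the share covered by a 10 k training set
```

Under these assumptions the counts agree with the ~2 M, ~12 k, and 0.5% figures quoted in the text.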
The ML model is based on kernel ridge regression [12], which maps the non-linear energy difference between the actual DFT energy and an inexpensive approximate baseline model into a linear feature space [13]. More specifically, we construct an ML model of the energy difference to the sum of static, atom-type-dependent atomic energy contributions $I_t$, obtained by fitting for each atom type $t$ among all main-group elements up to Bi. The energy-predicting function is a sum of weighted exponentials in the similarity $d$ between query and training crystal,
$$E(\mathbf{x}) = \sum_{t}^{N_0} I_t + \sum_{i}^{N} \alpha_i \exp\left(-\frac{d_i}{\sigma}\right),$$
where $N_0$ is the number of atoms per unit cell (10 in the case of elpasolites) and the second sum runs over all $N$ training instances. The $\alpha_i$ are the weights obtained through regression, and $\sigma$ is the global exponential width, which regulates the length scale of the problem. The similarity $d_i$ is the Manhattan distance, i.e., $d_i = \lVert \mathbf{x} - \mathbf{x}_i \rVert_1$. While
various crystal structure representations have previously been proposed [9–11, 14], we have found the following representation to yield superior performance: $\mathbf{x}$ is an $n \times 2$ tuple that encodes any stoichiometry within a given crystal prototype. For quaternary ($n = 4$) elpasolites, $x_1, \dots, x_4$ refer to the four representative sites; the atom type on each site is represented by its row (principal quantum number 2 to 6) and column (number of valence electrons, I to VIII) in the periodic table, and the sites are ordered according to the Wyckoff sequence of the crystal. As such, $\mathbf{x}$ implicitly represents the global energy minimum structure for a system restricted to this prototype, without explicitly encoding precise coordinates, lattice constants,
arXiv:1508.05315v1 [physics.chem-ph] 21 Aug 2015
FIG. 1. (a) Illustration of the elpasolite crystal (AlNaK2F6 structure). The four-tuple representation x = (x1, ..., x4) of the atomic sites is specified. (b) Frequency of elements (defined by nuclear charge Z) for the three data sets studied. (c) Mean absolute out-of-sample prediction error as a function of training set size for the three data sets studied. Inset: Error distributions and DFT vs. ML scatter plots for three training set sizes for the (I–VIII) data set. (d) Estimated mean energy contribution of each element to the formation energy of any elpasolite crystal. The color code reflects the new elemental elpasolite order. (e) Lowest 250 ML-predicted formation energies of elpasolites in ascending order from the (III–VI) (TOP) and (I–VIII) (MID and BOTTOM) data sets. Results in the TOP and MID panels correspond to ML models trained on 2000 examples; BOTTOM panel results correspond to an ML model trained on 10 k crystals. Validating DFT energies are shown alongside. (f) Distributions of the absolute lowest possible total oxidation states (LPTOS) as a function of formation energy. Formulas indicate the lowest-lying crystals.
or other (approximate) solutions to Schrödinger's equation. This representation is not restricted to the elpasolite structure; it can be used for any crystalline configuration. Below we also briefly discuss test results for small ML models applied to ternary crystals.
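To make the representation and the energy function E(x) concrete, here is a minimal NumPy sketch. It is not the authors' code; it assumes a hand-written (row, column) lookup for an illustrative subset of elements, a small ridge term λ (the text names kernel ridge regression but gives no regularization value), and user-supplied atomic energies I_t. All function names and parameter values are hypothetical. A Laplacian kernel is used because the text describes exponentials of a Manhattan distance.

```python
import numpy as np

# (row, column) = (principal quantum number, number of valence electrons).
# Illustrative subset only; a full model would list every main-group element up to Bi.
PERIODIC_POSITION = {
    "Li": (2, 1), "Na": (3, 1), "K": (4, 1), "Rb": (5, 1), "Cs": (6, 1),
    "Be": (2, 2), "Mg": (3, 2), "Ca": (4, 2), "Sr": (5, 2), "Ba": (6, 2),
    "Al": (3, 3), "Sb": (5, 5), "Te": (5, 6), "F": (2, 7), "Cl": (3, 7),
}
SITE_COUNTS = (1, 1, 2, 6)  # atoms on sites x1..x4 of an ABC2D6 cell (N0 = 10)

def represent(sites):
    """Flatten the 4 x 2 (row, column) tuple for sites x1..x4 into an 8-vector."""
    return np.array([PERIODIC_POSITION[e] for e in sites], dtype=float).ravel()

def baseline(sites, atomic_energies):
    """Sum of static atom-type energies I_t over the N0 atoms of the cell."""
    return sum(n * atomic_energies[e] for n, e in zip(SITE_COUNTS, sites))

def laplacian_kernel(A, B, sigma):
    """exp(-d / sigma), with d the Manhattan (L1) distance between rows of A and B."""
    d = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-d / sigma)

def fit_alpha(X_train, residuals, sigma, lam=1e-8):
    """Kernel ridge regression: solve (K + lam * I) alpha = residuals."""
    K = laplacian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), residuals)

def predict(X_query, X_train, alpha, sigma, baselines):
    """E(x) = baseline(x) + sum_i alpha_i * exp(-d_i / sigma)."""
    return baselines + laplacian_kernel(X_query, X_train, sigma) @ alpha
```

In training, `residuals` would hold E_DFT minus `baseline(...)` for each training crystal, and σ and λ would be tuned, for example, by cross-validation. Note that a query needs only element identities, never coordinates.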
For training and evaluation, we have generated DFT data for two data sets of elpasolites: a small one, (III–VI), made up of only 12 elements (C, N, O, Al, Si, P, S, Ga, Ge, As, Sn, and Sb), and a large one, (I–VIII), containing all main-group elements up to Bi. Since (III–VI) comprises only 12 k possible permutations, we have used DFT to obtain a complete list of its formation energies (for computational details see Ref. 15). (I–VIII) consists of 10 k structures, i.e., 0.5% of the 2 M possible crystals; it has been generated through random selection of elpasolites while ensuring an unbiased composition. To verify that the ML model is general and not restricted to elpasolites, we have also included a Materials Project [16] data set (MPD) consisting of 0.5 k ternary crystals in the ThCr2Si2 (I4/mmm) prototype, made up of 84 different atom types. The distribution of chemical elements in the data sets is shown in Fig. 1(b). Numerical results in Fig. 1(c) indicate a systematic improvement of the predictive accuracy of the ML model with increasing training set size for all three data sets. The inset shows normally distributed errors and scatter plots that systematically improve with training set size for (III–VI) and (I–VIII). The accuracy of our ML model can be compared to that of the semi-local DFT used to generate our data sets.
Lany [17] reported prediction errors for heats of formation of general chemistries with filled d shells which, assuming normally distributed errors, translate to an MAE of at least 0.19 eV/atom [18]. For transition metal oxides, the results of Hautier et al. correspond to MAEs of at least 0.055 eV/atom (0.019 eV/atom for DFT+U), but such errors are expected to increase when going beyond oxides, as in our data sets. For a training set of 10 k crystals, our ML model reaches an MAE of 0.1 eV/atom, which is roughly at the level of accuracy of semi-local DFT formation energies.
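The conversion from a reported standard-deviation (root-mean-square) error to an MAE rests on a property of the normal distribution: for zero-mean normal errors, the mean absolute deviation is sqrt(2/π) ≈ 0.80 times the standard deviation [18]. A minimal sketch of that conversion, assuming unbiased, normally distributed errors; the 0.24 eV/atom input is an illustrative value, not a number quoted from Ref. 17:

```python
import math

def mae_from_rmse(rmse):
    """MAE of zero-mean, normally distributed errors with standard deviation `rmse`.

    For a normal distribution, E|e| = sigma * sqrt(2 / pi) (Geary, Ref. 18).
    """
    return rmse * math.sqrt(2.0 / math.pi)

def rmse_from_mae(mae):
    """Inverse conversion under the same normality assumption."""
    return mae * math.sqrt(math.pi / 2.0)

print(round(mae_from_rmse(0.24), 2))  # 0.19
print(round(rmse_from_mae(0.1), 3))   # 0.125
```

Read the other way, the ML model's MAE of 0.1 eV/atom corresponds to a standard deviation of about 0.125 eV/atom if its errors are normal.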
The converged performance obtained when all crystals of the (III–VI) data set are used for training confirms that our representation captures all the information of a crystal necessary to determine its energy. While errors decay monotonically, the learning rate levels off for the (III–VI) data set as N approaches 10 k. This is due to the relaxation threshold of ±10 meV/atom employed in the DFT calculations: any inductive model will necessarily fail to go below this level, and only numerically more precise reference data would mitigate this issue. In all validation tests dealing with energy predictions for random out-of-sample crystals, the ML model performance meets the expectations set by Fig. 1(c). For example, for 100 crystals drawn at random from (III–VI) and (I–VIII), the ML models perform as expected when compared with validating DFT calculations (cf. Fig. S1 of Ref. 15).
Having established the performance of the ML model, we have subsequently used the model trained on 10 k (I–VIII) crystals to investigate the elpasolite universe. Estimated formation energies for all 2 M elpasolites are shown in Fig. 2. The formation energies are clearly dominated by the chemical identity of position 4, followed by position 3, but according to a different pattern. The chemical identities at positions 1 and 2 have the smallest and very similar influence (also illustrated in Fig. S2 of Ref. 15). Due to the effective degeneracy of positions 1 and 2, all inner matrices in Fig. 2 appear largely symmetric. Figure 1(d) shows the average contribution of each element to the formation energies estimated by the 10 k ML model. These average contributions per element are used to order the elements in Fig. 2 so as to yield the smoothest elpasolite map. Arranging the elements by nuclear charge, or by Pettifor order [19], instead results in a much more oscillatory, stripe-like map due to underlying periodicities (cf. Fig. S3 in Ref. 15).

FIG. 2. Formation energies for all 2 M elpasolites made up of main-group elements up to Bi, predicted by the 10 k ML model. The outer vertical and horizontal axes correspond to the x4 and x3 positions, respectively; the inner vertical and horizontal axes correspond to the x2 and x1 positions, respectively. The elemental sequence follows the elpasolite order of Fig. 1(d). White pixels correspond to ternary, binary, or elemental (non-elpasolite) compositions.
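One plausible way to compute per-element averages from the 2 M predictions, and the resulting ordering, is sketched below. It assumes a dictionary of predicted energies keyed by site tuples, and it defines an element's contribution simply as the mean predicted energy of the crystals that contain it. The exact definition behind Fig. 1(d) is not spelled out in the text, so treat the function names and this definition as illustrative:

```python
from collections import defaultdict
from statistics import mean

def mean_contributions(predictions, site=None):
    """Average predicted formation energy of the crystals that contain each element.

    predictions: dict mapping an (x1, x2, x3, x4) tuple of element symbols to a
    predicted formation energy in eV/atom.
    site: None pools all four sites; 0..3 restricts to one site (cf. Fig. S2).
    """
    positions = range(4) if site is None else (site,)
    energies = defaultdict(list)
    for sites, energy in predictions.items():
        # A set, so each crystal counts at most once per element.
        for element in {sites[p] for p in positions}:
            energies[element].append(energy)
    return {element: mean(values) for element, values in energies.items()}

def elpasolite_order(predictions):
    """Elements sorted from lowest (most stabilizing) to highest mean energy."""
    contributions = mean_contributions(predictions)
    return sorted(contributions, key=contributions.get)
```

Passing `site=3` gives a site-resolved average analogous to panel (d) of Fig. S2. Subtracting a global mean from every contribution would not change the resulting order.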
Figures 1(d) and S2 visualize the bonding that emerges from the geometry and bond coordination of the elpasolite crystal structure. Fluorine and carbon sit at the two ends of the global scale, with the lowest and highest formation energies, respectively. Alkali metals, alkaline earth metals, and oxygen also contribute to lowering the formation energy. On average, the formation energies of elpasolites involving halogens, alkali metals, or noble gases increase going down the periodic table. The opposite holds for all other elements except oxygen, boron, carbon, and nitrogen, which all have a noticeably higher average formation energy than any other element. A saddle point can also be observed in the middle of the periodic table, as well as two valleys along the halogen and alkaline earth groups. Site-specific resolution indicates that fluorine fits best with the bond coordination of sites 1, 2, and 4, whereas the same does not apply to the heavier halogens (not shown here; see Fig. S2 of Ref. 15).
In contrast, as the element on site 3 goes down column II of the periodic table, the formation energy is successively lowered, with Ca, Sr, and Ba lowering it more than any halogen atom. On sites 1 and 2, the formation energy generally increases the most for the heavy noble gases. On sites 3 and 4, carbon, followed by its neighbors B and N, increases the formation energy the most. The accuracy of linear single-atom energy models based on these scales, however, is not on par with that of the ML model and, perhaps more importantly, cannot be improved systematically by increasing the training set size; instead, it converges to a finite residual error.
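A linear single-atom energy model of the kind mentioned above can be sketched as ordinary least squares on site-resolved one-hot indicators, so that every (site, element) pair receives one fitted energy plus a global offset. This is our illustrative reading, not necessarily the exact baseline the authors compared against; the function names are hypothetical. Because such a model is additive by construction, it cannot represent interactions between sites, which is one way to see why its error plateaus instead of shrinking with more data:

```python
import numpy as np

def site_one_hot(crystals, elements):
    """Indicator features: one column per (site, element) pair, 4 * len(elements) in total."""
    index = {e: i for i, e in enumerate(elements)}
    X = np.zeros((len(crystals), 4 * len(elements)))
    for row, sites in enumerate(crystals):
        for site, element in enumerate(sites):
            X[row, site * len(elements) + index[element]] = 1.0
    return X

def _design(crystals, elements):
    # One-hot block plus a constant column for the global offset.
    X = site_one_hot(crystals, elements)
    return np.hstack([X, np.ones((len(crystals), 1))])

def fit_site_model(crystals, energies, elements):
    """Least-squares per-site element energies; lstsq returns the minimum-norm
    solution, which handles the collinearity between each site block and the offset."""
    coef, *_ = np.linalg.lstsq(_design(crystals, elements),
                               np.asarray(energies, dtype=float), rcond=None)
    return coef

def predict_site_model(coef, crystals, elements):
    return _design(crystals, elements) @ coef
```

On data that are exactly additive over sites, this model fits perfectly; on real formation energies, the remaining residual is the finite error referred to in the text.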
To achieve a satisfactory accuracy of 0.1 eV/atom (MAE) for elpasolites, a relatively large training set of 10 k crystals is needed. This is likely due to the sparsity of crystals at the two extremes of the formation energy spectrum, which reduces the predictive accuracy of the ML model for crystals in these regions. Nevertheless, the 10 k ML model readily identifies a larger set of low-lying elpasolites, among which the actual DFT minima can be found through subsequent DFT-based screening. This is shown in Fig. 1(e), where the 250 crystals with the lowest ML-predicted formation energies are shown in ascending order (further details on these systems are given in Table III of Ref. 15). Subsequent screening with DFT indicates that the 26th crystal, CaSrCs2F6 (out of 2 M), is the global formation energy minimum at −3.44 eV/atom, closely followed by its near-degenerate isomer SrCaCs2F6. The DFT energies of the next two degenerate pairs, CaSrRb2F6/SrCaRb2F6 and CaBaCs2F6/BaCaCs2F6, are −3.41 and −3.39 eV/atom, respectively. Overall, in the elpasolites ABC2D6 with the most favorable formation energies, A and B are late elements from group (II), C is a late element from group (I), and D is fluoride. Populating the four sites with elements from groups (II), (II), (I), and (VII), respectively, differs from the experimentally established stoichiometry AlNaK2F6. In fact, the lowest-DFT-energy crystal with a group-(III) element is CsAlRb2F6 (in 69th position) with −3.09 eV/atom (ML energy: −2.96 eV/atom; see Table III of Ref. 15).
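The two-stage workflow described above, which ranks all crystals by ML energy and then re-evaluates only a short list with DFT, can be sketched as follows. Here `dft_energy` is a hypothetical placeholder for an actual DFT calculation, and the default k = 250 simply mirrors the list size used above:

```python
import heapq

def two_stage_screen(ml_predictions, dft_energy, k=250):
    """Pick the k lowest ML-predicted crystals, then re-rank them with DFT.

    ml_predictions: dict {formula: ML formation energy in eV/atom}
    dft_energy: callable formula -> DFT formation energy (the expensive step)
    Returns (formula, ML energy, DFT energy) tuples, lowest DFT energy first.
    """
    shortlist = heapq.nsmallest(k, ml_predictions.items(), key=lambda item: item[1])
    validated = [(formula, ml, dft_energy(formula)) for formula, ml in shortlist]
    return sorted(validated, key=lambda row: row[2])
```

The design trades completeness for cost: only k DFT calculations are run, so a crystal whose ML energy is badly overestimated can be missed. A generous k relative to the ML error is what makes the shortlist likely to contain the true minima.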
We have also used our predictions to analyze atomic oxidation states in elpasolites. In particular, we have found that roughly 6 % of the crystals with formation energies below −1 eV/atom exhibit unusual atomic charges: they are low in energy even though no combination of conventional atomic charges would result in a neutral system. To identify these crystals, we have used the absolute value of the lowest possible total oxidation state (LPTOS) that could be realized using a list of typical atomic oxidation states (Table I of Ref. 15). The lowest-lying crystals have an LPTOS of 0 (formation energies from −3 to −3.44 eV/atom). However, already at about −3 eV/atom, crystals with an LPTOS of 2 or 1 start to occur. At formation energies of ∼ −1.25 eV/atom and higher, the number of crystals with non-zero LPTOS increases rapidly, with LPTOS values as high as 12. The corresponding crystal frequency distributions are shown in Fig. 1(f), along with the formulas of the lowest-lying crystals. Interestingly, the number of crystals with zero LPTOS increases monotonically with formation energy, while for non-zero LPTOS the distribution is oscillatory. To identify elements with unusual oxidation states, we report atomic charges obtained according to Bader's scheme [20–22] in Table I for the 10 lowest-lying crystals with non-zero LPTOS. For 95% of the 250 lowest-lying crystals with zero LPTOS, we found the Bader charges to be consistent (after rounding to the nearest integer) with the conventional oxidation states in Table I of Ref. 15. Not surprisingly, given its strong electronegativity, F always retains its negative oxidation state of −1 on the x4 position. For CsMgRb2F6, CsMgK2F6, and BaNaCs2F6, no unusual atomic charge is found; their non-zero LPTOS is rather due to the accumulation of relatively small charges on the six fluoride atoms. For the remaining seven crystals, however, unusual charges are found for atoms late in the periodic table that populate the energetically weakly contributing x1 or x2 sites. In particular, Bader's charge analysis indicates unusual oxidation states for Sb (0 and +1) and Te (0), suggesting that new chemistries could be explored for compounds involving these elements.

TABLE I. Calculated atomic charges for the 10 lowest formation energy crystals with non-zero LPTOS. ML and DFT energies in eV/atom. Values for elements in unusual oxidation states are marked with an asterisk (*).

Formula | LPTOS | E_ML | E_DFT | q1 | q2 | q3 | q4
MgSbBa2F6 | 1 | -2.88 | -2.70 | 1.66 | 0.42* | 1.63 | -0.89
CaTeBa2F6 | 1 | -2.90 | -2.68 | 1.58 | 0.31* | 1.67 | -0.87
TeCaBa2F6 | 1 | -2.83 | -2.68 | 0.31* | 1.59 | 1.67 | -0.87
LiSbBa2F6 | 2 | -3.06 | -2.62 | 0.89 | 1.06* | 1.62 | -0.86
CsMgRb2F6 | 1 | -2.93 | -2.61 | 0.98 | 1.67 | 0.92 | -0.75
BeSbBa2F6 | 2 | -2.88 | -2.60 | 1.68 | 0.35* | 1.62 | -0.88
CsMgK2F6 | 1 | -2.97 | -2.58 | 1.01 | 1.68 | 0.92 | -0.75
SrSbBa2F6 | 2 | -2.90 | -2.56 | 1.48 | 0.60* | 1.59 | -0.88
SrTeBa2F6 | 2 | -2.89 | -2.55 | 1.70 | 0.40* | 1.66 | -0.90
BaNaCs2F6 | 1 | -2.84 | -2.52 | 1.69 | 0.86 | 0.92 | -0.73
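The LPTOS criterion can be sketched as a brute-force search over all combinations of allowed oxidation states, weighted by the site multiplicities 1, 1, 2, 6 of ABC2D6. Two assumptions are ours: every atom on a given site shares one oxidation state, and the oxidation-state lists below are a small illustrative subset, not the study's full table (Table I of Ref. 15). The function name is hypothetical:

```python
from itertools import product

# Illustrative subset of "typical" oxidation states (an assumption for this
# sketch; the study's full list is Table I of Ref. 15).
OXIDATION_STATES = {
    "Li": (1,), "Na": (1,), "K": (1,), "Cs": (1,),
    "Ca": (2,), "Sr": (2,), "Ba": (2,),
    "Al": (3,), "Tl": (1, 3),
    "F": (-1,),
}
SITE_COUNTS = (1, 1, 2, 6)  # A, B, C2, D6

def lptos(sites, states=OXIDATION_STATES, counts=SITE_COUNTS):
    """Absolute value of the lowest possible total oxidation state of an ABC2D6 crystal."""
    candidates = product(*(states[element] for element in sites))
    return min(abs(sum(n * q for n, q in zip(counts, combo))) for combo in candidates)

print(lptos(("Al", "Na", "K", "F")))   # 3 + 1 + 2*1 - 6 = 0 (elpasolite itself)
print(lptos(("Li", "Na", "Cs", "F")))  # 1 + 1 + 2*1 - 6 = -2, so LPTOS = 2
print(lptos(("Tl", "Sr", "Ba", "F")))  # Tl(+1) gives 1, Tl(+3) gives 3, so LPTOS = 1
```

A crystal with LPTOS 0 admits a charge-neutral assignment of conventional oxidation states. A low-energy crystal with non-zero LPTOS is therefore a candidate for unusual charges, which Bader analysis can then confirm or rule out.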
In conclusion, we have developed ML models of formation energies and used them to investigate all possible elpasolites made up of main-group elements. We have presented numerical results for 2 M formation energies. The ML model depends on spatial coordinates only implicitly, through the reference data used for training. No spatial coordinates are needed for new queries, yet for a training set of 10 k crystals the model reaches an MAE of 0.1 eV/atom, comparable to DFT accuracy for solids. The results have been used to identify the most strongly bound elpasolites as well as to investigate energy and bonding trends at crystal structure sites, leading to a new “elpasolite order” of elements that is consistent with the bonding physics of the elpasolite crystal structure. The crystals with the lowest-lying formation energies have been identified, and, using Bader's charge analysis, Te and Sb have been found to exhibit unconventional oxidation states. We believe that our results hold great promise for the computational screening of polymorphs, other crystal structure symmetries, solid mixtures, phase transitions, or defects at unprecedented rate and extent. Crystal properties other than energies could also be considered.
The authors thank G. Hart, R. Ramakrishnan and
R. Ramprasad for comments. O.A.v.L. acknowledges
funding from the Swiss National Science Foundation
(No. PP00P2 138932). This material is based upon work
supported by the Air Force Office of Scientific Research,
Air Force Material Command, USAF under Award No.
FA9550-15-1-0026. R.A. acknowledges funding by the
Swedish Research Council Grant No. 621-2011-4249 and
Linnaeus Environment grant (LiLi-NFM). Calculations
have been performed at the Swedish National Infrastruc-
ture for Computing (SNIC).
[1] A. Belsky, M. Hellenbrandt, V. L. Karen, and P. Luksch,
Acta Crystallographica Section B Structural Science 58,
364 (2002).
[2] G. Bergerhoff, R. Hundt, R. Sievers, and I. D. Brown,
Journal of Chemical Information and Computer Sciences
23, 66 (1983).
[3] P. Yang, F. P. Doty, M. A. Rodriguez, M. R. Sanchez, X. Zhou, and K. S. Shah, in Symposium L: Nuclear Radiation Detection Materials 2009, MRS Online Proceedings Library, Vol. 1164 (2009).
[4] K. Biswas and M.-H. Du, Phys. Rev. B 86, 014102 (2012).
[5] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
[6] W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).
[7] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
[8] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, New Journal of Physics 15, 095003 (2013).
[9] K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K.-R. Müller, and E. K. U. Gross, Phys. Rev. B 89, 205118 (2014).
[10] B. Meredig, A. Agrawal, S. Kirklin, J. E. Saal, J. W.
Doak, A. Thompson, K. Zhang, A. Choudhary, and
C. Wolverton, Phys. Rev. B 89, 094104 (2014).
[11] F. Faber, A. Lindmaa, O. A. von Lilienfeld, and
R. Armiento, International Journal of Quantum Chem-
istry 115, 1094 (2015).
[12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer, New York, 2011).
[13] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von
Lilienfeld, Journal of Chemical Theory and Computation
11, 2087 (2015).
[14] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl,
and M. Scheffler, Phys. Rev. Lett. 114, 105503 (2015).
[15] See Supplemental Material at [URL will be inserted by
publisher] for computational details, and an extended set
of figures.
[16] A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards,
S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder,
and K. A. Persson, APL Materials 1, 011002 (2013).
[17] S. Lany, Phys. Rev. B 78, 245207 (2008).
[18] R. C. Geary, Biometrika 27, 310 (1935).
[19] D. Pettifor, Bonding and Structure of Molecules and Solids (Oxford University Press, 2002).
[20] W. Tang, E. Sanville, and G. Henkelman, Journal of
Physics: Condensed Matter 21, 084204 (2009).
[21] E. Sanville, S. D. Kenny, R. Smith, and G. Henkelman,
Journal of Computational Chemistry 28, 899 (2007).
[22] G. Henkelman, A. Arnaldsson, and H. Jónsson, Computational Materials Science 36, 354 (2006).
[23] R. Armiento et al., The High-Throughput Toolkit (httk).
[24] P. E. Blöchl, Physical Review B 50, 17953 (1994).
[25] G. Kresse and D. Joubert, Physical Review B 59, 1758 (1999).
[26] G. Kresse and J. Furthmüller, Vienna Ab Initio Simulation Package, Users Guide (The University of Vienna, Vienna, 2007).
[27] J. P. Perdew, K. Burke, and M. Ernzerhof, Physical
Review Letters 77, 3865 (1996).
[28] H. J. Monkhorst and J. D. Pack, Physical Review B 13,
5188 (1976).
[29] K. Lejaeghere, V. Van Speybroeck, G. Van Oost, and
S. Cottenier, Critical Reviews in Solid State and Materi-
als Sciences 39, 1 (2014).
Supplemental Materials
The crystals were processed using the High-Throughput Toolkit [23] to relax the structures and calculate the primitive unit cell energies using DFT as implemented in the Vienna Ab initio Simulation Package (VASP 5.2.2) with projector augmented wave (PAW) pseudopotentials [24–26] and the exchange-correlation functional of Perdew, Burke, and Ernzerhof [27]. The calculations were done by a first low-accuracy relaxation of the cell volume and internal degrees of freedom, followed by repeated restarts of VASP relaxation runs until the final energy difference was below 10 meV/atom. The calculations use a Monkhorst-Pack [28] k-point grid of at least 3×3×3 and a plane-wave energy cutoff of 600 eV. The formation energies relative to the phase-diagram end points are obtained as the difference between the unit cell energy per atom and that of the stoichiometric combination of the ground-state phases of the pure elements, calculated the same way [29].
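The formation-energy definition above amounts to $E_f = (E_\mathrm{cell} - \sum_e n_e \mu_e)/N_\mathrm{atoms}$, where $n_e$ is the number of atoms of element $e$ in the cell and $\mu_e$ is the per-atom energy of that element's ground-state phase. A minimal sketch; the function name and the numbers in the example are hypothetical, not values from the study:

```python
def formation_energy_per_atom(cell_energy, composition, elemental_energy):
    """Formation energy per atom relative to the elemental ground-state phases.

    cell_energy: total DFT energy of the unit cell (eV)
    composition: atom counts in that cell, e.g. {"Al": 1, "Na": 1, "K": 2, "F": 6}
    elemental_energy: energy per atom (eV) of each element's ground-state phase
    """
    n_atoms = sum(composition.values())
    reference = sum(n * elemental_energy[el] for el, n in composition.items())
    return (cell_energy - reference) / n_atoms

# Hypothetical numbers: a 10-atom cell at -50 eV with every elemental reference at -1 eV/atom.
print(formation_energy_per_atom(-50.0, {"Al": 1, "Na": 1, "K": 2, "F": 6},
                                {"Al": -1.0, "Na": -1.0, "K": -1.0, "F": -1.0}))  # -4.0
```

Both the cell and the elemental references must be computed with the same functional and settings, as the text requires, so that systematic errors partly cancel in the difference.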
FIG. S1. Calculated ML and DFT formation energies of 100 elpasolites drawn at random from the (III–VI) (TOP) and (I–VIII) (MID and BOTTOM) data sets. The TOP and MID ML models have been trained on 500 crystals; the ML model used for the BOTTOM panel has been trained on 10 k crystals.
FIG. S2. Site-resolved mean contribution to the elpasolite formation energy [eV/atom] for each main-group element. Panels (a), (b), (c), and (d) correspond to the respective elpasolite crystal sites x1, x2, x3, and x4 [see Fig. 1(a)].
FIG. S3. ML-predicted energies of all 2 M elpasolites. The outer vertical axis corresponds to the element on the x4 site, the outer horizontal axis to x3, the inner vertical axis to x2, and the inner horizontal axis to the x1 site. The elements are arranged according to their mean energy contribution. White lines correspond to ternary, binary, or elemental compositions for which no energy has been predicted.
FIG. S4. Average formation energy obtained by placing two given elements on two different sites of the four-tuple x. The color encodes the average formation energy of all crystals that contain the two elements.
TABLE I. Conventional oxidation states for all elements considered in this work (values taken from ...). Unconventional oxidation states found in this study are highlighted in red.
Element -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7
He X
Li X
Be X X
Ne X
Na X X
Mg X X
Al X X X
Si X X X X X X X X
Cl X X X X X X X
Ar X
Ca X X X
Element -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8
Ga X X X X X X
Ge X X X X X X X X X
As X X X X X
Se X X X X X
Br X X X X X X
Rb X X
Sr X X
In X X X X
Sn X X X X X
Cs X X
Ba X
Ti X X X X
Pb X X X X
Bi X X X X X
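LPTOS | Z1 (#1) | Z2 (#2) | Z3 (#3) | Z4 (#4) | Z5 (#5)
0 | F (183K) | Cl (38K) | O (31K) | Br (23K) | Sr (12K)
1 | F (8.8K) | Li (806) | Ba (656) | K (455) | Rb (454)
2 | F (3.6K) | He (269) | Ne (260) | Ar (249) | Ba (243)
3 | F (1.9K) | Ba (322) | He (175) | Ne (168) | Ar (168)
4 | F (345) | Ba (251) | Sr (97) | Ar (40) | He (37)
5 | Ba (414) | Sr (144) | F (63) | O (59) | Cl (38)
6 | Ba (390) | F (58) | Cl (40) | Br (39) | Sr (36)
7 | Ba (288) | F (41) | Br (35) | Cl (34) | I (25)
8 | Ba (66) | F (11) | Cl (10) | Br (9) | I (8)
9 | Ba (48) | Bi (12) | O (7) | F (5) | As (2)
10 | Ba (132) | Bi (28) | F (12) | O (10) | Pb (10)
11 | Ba (174) | Bi (28) | F (12) | Pb (18) | O (17)
12 | Ba (48) | Bi (10) | F (8) | Pb (6) | Cl (4)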
TABLE II. The first column is the lowest possible total oxi-
dation state. The rest of the columns show the occurrence of
the most common elements for each LPTOS, sorted from left
to right.
TABLE III. The 250 elpasolite crystals predicted by the 10 k ML model to be lowest in formation energy, as shown in the lower panel of Fig. 1(e). ML formation energies [eV/atom] are given together with the corresponding DFT energies [eV/atom] and the index (#) that sorts the DFT energies (0 = lowest).
Formula # ML DFT | Formula # ML DFT | Formula # ML DFT | Formula # ML DFT | Formula # ML DFT
MgCaBa2F6 70 -3.27 -3.07 | AlSrRb2F6 94 -3.04 -3.01 | CaGeBa2F6 201 -2.95 -2.72 | CaSnBa2F6 189 -2.89 -2.76 | BaBeCs2F6 82 -2.83 -3.04
AlCaBa2F6 147 -3.22 -2.85 | CaMgSr2F6 81 -3.04 -3.04 | NaAlBa2F6 152 -2.95 -2.84 | CaBaNa2F6 66 -2.89 -3.11 | TlSrBa2F6 169 -2.83 -2.8
BaMgCs2F6 30 -3.22 -3.27 | SrMgCs2F6 18 -3.04 -3.32 | AlBeBa2F6 212 -2.95 -2.69 | CaAsBa2F6 217 -2.89 -2.68 | CaInBa2F6 134 -2.83 -2.89
BaMgRb2F6 35 -3.2 -3.25 | CaMgRb2F6 10 -3.04 -3.35 | AlCaNa2F6 93 -2.95 -3.01 | BaSiCs2F6 196 -2.89 -2.74 | AlSrCa2F6 206 -2.83 -2.71
BaMgK2F6 38 -3.19 -3.23 | CaAlBa2F6 148 -3.04 -2.85 | SrMgCa2F6 122 -2.94 -2.92 | CaSbBa2F6 214 -2.89 -2.68 | CaLiSr2F6 44 -2.83 -3.21
BaCaCs2F6 5 -3.19 -3.39 | GaCaBa2F6 124 -3.03 -2.91 | CaLiBa2F6 29 -2.94 -3.28 | BaGaK2F6 197 -2.89 -2.74 | LiAlCa2F6 170 -2.83 -2.8
MgCaSr2F6 80 -3.19 -3.04 | NaMgBa2F6 50 -3.03 -3.19 | AlLiSr2F6 139 -2.93 -2.88 | CsCaBa2F6 106 -2.89 -2.99 | LiSrBa2F6 47 -2.82 -3.19
CaMgBa2F6 71 -3.17 -3.07 | BaCaSr2F6 86 -3.03 -3.03 | MgCaNa2F6 42 -2.93 -3.22 | AlKBa2F6 141 -2.89 -2.87 | InLiBa2F6 118 -2.82 -2.94
LiBeBa2F6 90 -3.17 -3.03 | MgBeBa2F6 151 -3.03 -2.84 | SrLiBa2F6 48 -2.93 -3.19 | SrBaNa2F6 75 -2.89 -3.06 | CaGaSr2F6 175 -2.82 -2.79
BaCaRb2F6 15 -3.17 -3.34 | MgBaCs2F6 31 -3.03 -3.27 | NaBeBa2F6 96 -2.93 -3.01 | MgBaCa2F6 156 -2.88 -2.83 | SrTeBa2F6 241 -2.82 -2.55
SrCaBa2F6 53 -3.17 -3.15 | CaSrRb2F6 2 -3.03 -3.41 | BLiBa2F6 223 -2.93 -2.65 | AlLiCa2F6 173 -2.88 -2.8 | SrNaBa2F6 58 -2.82 -3.14
MgCaCs2F6 17 -3.16 -3.33 | AlLiBa2F6 113 -3.02 -2.96 | BaGaRb2F6 188 -2.93 -2.76 | TlSrCs2F6 193 -2.88 -2.75 | NaGaBa2F6 135 -2.81 -2.89
MgCaRb2F6 11 -3.15 -3.35 | SrMgRb2F6 21 -3.02 -3.31 | MgBeCa2F6 209 -2.93 -2.7 | BaMgCa2F6 155 -2.88 -2.83 | CaKBa2F6 62 -2.81 -3.12
AlCaCs2F6 114 -3.15 -2.95 | MgBaRb2F6 34 -3.02 -3.25 | CsMgK2F6 238 -2.93 -2.58 | MgAlBa2F6 172 -2.88 -2.8 | LiGeBa2F6 180 -2.81 -2.78
BaAlCs2F6 108 -3.15 -2.99 | SrBaRb2F6 26 -3.02 -3.3 | GaSrBa2F6 165 -2.93 -2.81 | MgBaNa2F6 76 -2.88 -3.05 | SiSrBa2F6 243 -2.81 -2.54
BaAlK2F6 98 -3.14 -3.01 | BeMgBa2F6 183 -3.02 -2.78 | LiMgSr2F6 49 -2.93 -3.19 | SiCaSr2F6 245 -2.88 -2.54 | LiGaBa2F6 100 -2.8 -3.0
BaSrCs2F6 12 -3.13 -3.35 | SrAlK2F6 84 -3.01 -3.03 | KMgBa2F6 67 -2.93 -3.1 | CaNaBa2F6 36 -2.88 -3.23 | BeAsBa2F6 227 -2.8 -2.63
MgSrBa2F6 79 -3.11 -3.05 | LiBeSr2F6 91 -3.01 -3.02 | AlMgSr2F6 198 -2.93 -2.73 | SrSnBa2F6 213 -2.88 -2.68 | CaTeBa2F6 216 -2.8 -2.68
BeCaBa2F6 158 -3.11 -2.83 | BaMgSr2F6 119 -3.01 -2.94 | InSrCs2F6 146 -2.93 -2.86 | BCaBa2F6 247 -2.88 -2.51 | LiSiBa2F6 200 -2.8 -2.72
AlBaK2F6 97 -3.11 -3.01 | AlMgK2F6 130 -3.01 -2.89 | KAlBa2F6 140 -2.92 -2.87 | NaSrBa2F6 59 -2.88 -3.14 | LiSbBa2F6 231 -2.8 -2.62
AlSrBa2F6 184 -3.1 -2.77 | AlSrK2F6 83 -3.01 -3.03 | InSrBa2F6 171 -2.92 -2.8 | SrMgNa2F6 55 -2.87 -3.15 | AlGaBa2F6 192 -2.8 -2.75
LiMgBa2F6 41 -3.1 -3.22 | CaAlK2F6 88 -3.0 -3.03 | BaMgNa2F6 77 -2.92 -3.05 | BaCaNa2F6 65 -2.87 -3.11 | BaSiRb2F6 207 -2.8 -2.71
CaSrBa2F6 54 -3.1 -3.15 | MgSrK2F6 24 -3.0 -3.3 | LiCaSr2F6 45 -2.92 -3.21 | SrGeBa2F6 228 -2.87 -2.63 | RbCaBa2F6 73 -2.8 -3.06
CaMgK2F6 8 -3.1 -3.36 | AlBaRb2F6 101 -3.0 -3.0 | CaGaCs2F6 145 -2.92 -2.86 | SrSbBa2F6 240 -2.87 -2.56 | SrSiBa2F6 244 -2.8 -2.54
SrMgBa2F6 78 -3.1 -3.05 | BeLiSr2F6 92 -3.0 -3.02 | CaMgNa2F6 43 -2.92 -3.22 | LiPBa2F6 239 -2.87 -2.57 | GaLiBa2F6 99 -2.79 -3.0
CaSrCs2F6 0 -3.1 -3.44 | InCaBa2F6 133 -3.0 -2.89 | TlCaCs2F6 182 -2.92 -2.78 | BeSrBa2F6 163 -2.87 -2.82 | BaSiK2F6 219 -2.79 -2.67
SrMgK2F6 25 -3.1 -3.3 | LiAlBa2F6 112 -3.0 -2.96 | TlMgBa2F6 138 -2.92 -2.88 | BeLiCa2F6 104 -2.87 -3.0 | CaGeSr2F6 230 -2.79 -2.62
MgCaK2F6 9 -3.09 -3.36 | CaBaRb2F6 14 -3.0 -3.34 | CsMgRb2F6 234 -2.92 -2.61 | CaBeSr2F6 167 -2.87 -2.81 | BLiSr2F6 237 -2.79 -2.58
BaAlRb2F6 103 -3.09 -3.0 | CaGaBa2F6 125 -2.99 -2.91 | LiBBa2F6 222 -2.92 -2.65 | SnCaBa2F6 190 -2.87 -2.76 | MgSnBa2F6 195 -2.79 -2.74
MgSrCs2F6 19 -3.09 -3.32 | BeCaSr2F6 168 -2.99 -2.81 | GaMgBa2F6 131 -2.92 -2.89 | AlNaBa2F6 143 -2.87 -2.86 | MgGaSr2F6 179 -2.79 -2.79
CaMgCs2F6 16 -3.08 -3.33 | CsMgBa2F6 111 -2.99 -2.96 | NaCaSr2F6 56 -2.91 -3.14 | AlRbCs2F6 57 -2.86 -3.14 | MgNaBa2F6 51 -2.79 -3.19
BaCaK2F6 22 -3.08 -3.31 | MgBeSr2F6 181 -2.98 -2.78 | MgLiSr2F6 52 -2.91 -3.19 | AlBaSr2F6 191 -2.86 -2.76 | AlRbBa2F6 149 -2.78 -2.85
SrCaCs2F6 1 -3.08 -3.44 | BaGaCs2F6 177 -2.98 -2.79 | AlBaNa2F6 117 -2.91 -2.95 | CaAlSr2F6 205 -2.86 -2.71 | GeCaBa2F6 202 -2.78 -2.72
MgBaK2F6 39 -3.08 -3.23 | CaBeBa2F6 160 -2.98 -2.83 | LiMgCa2F6 64 -2.91 -3.12 | BaAlNa2F6 116 -2.86 -2.95 | GaNaBa2F6 136 -2.78 -2.89
BeLiBa2F6 89 -3.08 -3.03 | CaAlCs2F6 115 -2.97 -2.95 | HBeBa2F6 121 -2.91 -2.93 | MgAlK2F6 126 -2.86 -2.91 | BBeBa2F6 249 -2.78 -2.45
AlCaSr2F6 194 -3.08 -2.75 | BaSrK2F6 33 -2.97 -3.26 | TlMgCs2F6 221 -2.91 -2.66 | GaCaSr2F6 174 -2.86 -2.79 | SbCaBa2F6 215 -2.78 -2.68
MgSrRb2F6 20 -3.07 -3.31 | GaCaCs2F6 144 -2.97 -2.86 | MgGaBa2F6 132 -2.91 -2.89 | CaSiBa2F6 225 -2.86 -2.64 | MgKBa2F6 68 -2.78 -3.1
SrBaK2F6 32 -3.07 -3.26 | AlMgCa2F6 236 -2.97 -2.61 | GaBaCs2F6 176 -2.91 -2.79 | AlBeSr2F6 235 -2.86 -2.61 | MgSbBa2F6 211 -2.77 -2.7
BaSrRb2F6 27 -3.07 -3.3 | CsAlBa2F6 157 -2.96 -2.83 | AlMgNa2F6 129 -2.91 -2.9 | LiAsBa2F6 232 -2.85 -2.62 | MgAsBa2F6 204 -2.77 -2.71
SrBaCs2F6 13 -3.07 -3.35 | SrAlBa2F6 185 -2.96 -2.77 | KCaBa2F6 61 -2.9 -3.12 | PbMgK2F6 150 -2.85 -2.85 | BaSrNa2F6 74 -2.77 -3.06
AlCaK2F6 87 -3.06 -3.03 | CaSrK2F6 6 -2.96 -3.38 | NaMgSr2F6 60 -2.9 -3.13 | GaMgK2F6 187 -2.85 -2.77 | BeGeBa2F6 233 -2.77 -2.62
SrCaRb2F6 3 -3.06 -3.41 | SiCaBa2F6 226 -2.96 -2.64 | CaAlRb2F6 107 -2.9 -2.99 | MgSrCa2F6 123 -2.85 -2.92 | LiAlSr2F6 153 -2.76 -2.84
AlCaRb2F6 105 -3.06 -2.99 | MgLiBa2F6 40 -2.96 -3.22 | InCaCs2F6 137 -2.9 -2.88 | TlMgK2F6 224 -2.85 -2.65 | MgAlSr2F6 203 -2.76 -2.71
LiCaBa2F6 28 -3.06 -3.28 | SrBeBa2F6 162 -2.96 -2.82 | SrGaBa2F6 164 -2.9 -2.81 | BeMgSr2F6 210 -2.85 -2.7 | CaSbSr2F6 246 -2.76 -2.52
SrCaK2F6 7 -3.06 -3.38 | CsAlRb2F6 69 -2.96 -3.09 | AlBeCa2F6 248 -2.9 -2.5 | MgGeBa2F6 199 -2.84 -2.72 | CaSnSr2F6 220 -2.76 -2.67
AlMgBa2F6 159 -3.06 -2.83 | SiBaK2F6 218 -2.96 -2.68 | MgBaSr2F6 120 -2.9 -2.94 | CaBaSr2F6 85 -2.84 -3.03 | BeAlBa2F6 229 -2.76 -2.62
AlBaCs2F6 109 -3.05 -2.99 | GaSrCs2F6 154 -2.96 -2.84 | SrCaNa2F6 46 -2.9 -3.2 | BeCaRb2F6 63 -2.84 -3.12 | GaMgSr2F6 178 -2.76 -2.79
CaBaK2F6 23 -3.05 -3.31 | SrAlCs2F6 110 -2.96 -2.98 | SrAlRb2F6 95 -2.89 -3.01 | SrAsBa2F6 242 -2.83 -2.54 | GaKBa2F6 208 -2.75 -2.7
NaCaBa2F6 37 -3.05 -3.23 | LiBeCa2F6 102 -2.96 -3.0 | InMgBa2F6 142 -2.89 -2.87 | InCaSr2F6 186 -2.83 -2.77 | AlKSr2F6 161 -2.75 -2.82
CaBaCs2F6 4 -3.05 -3.39 | TlCaBa2F6 128 -2.95 -2.9 | BaInCs2F6 166 -2.89 -2.81 | BeCaCs2F6 72 -2.83 -3.06 | LiHBa2F6 127 -2.75 -2.9
... Previous materials informatics studies have unambiguously demonstrated that it is more difficult for a machine learning model to detect patterns and make accurate predictions when a smaller training dataset is used. [13][14][15] Transfer learning, a machine learning technique, is a potential solution. [16,17] In this approach, a neural network model is trained on a source domain with sufficient training data. The architecture and parameters of the source model are then transferred to a related target domain, and the model is fine-tuned on the new training dataset. ...
Real‐time prediction and dynamic control systems that can adapt to an unsteady environment are necessary for material fabrication processes, especially crystal growth. Recent studies have demonstrated the effectiveness of machine learning in predicting an unsteady crystal growth process, but its wider application is hindered by the large amount of training data required for sufficient accuracy. To address this problem, this study investigates the capability of transfer learning to predict geometric evolution in an unsteady silicon carbide (SiC) solution growth system from a small amount of data. The performance of the transferred models is discussed with regard to the transfer learning method, the amount of training data, and the time step length. The transfer learning strategy yields the same accuracy as training from scratch but requires only 20% of the training data. The accuracy is stably inherited through successive time steps, which demonstrates that transfer learning reduces the amount of training data required to predict evolution in an unsteady crystal growth process. Moreover, transferred models trained with relatively more data (up to 100%) further improve the accuracy inherited from the source model through multiple time steps, which broadens the application scope of transfer learning. Process optimization is crucial for obtaining high‐quality crystals of large size, but it is always time- and data-consuming. Transfer learning is employed to inherit the time‐independent features and predict the unsteady crystal growth. The amount of data required by the machine learning models can be reduced by up to 80%, which facilitates the generation of a “digital twin” of a real crystal growth experiment.
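The citation snippet and the abstract above describe transfer learning in three steps. A model is pretrained on a data-rich source domain, its learned parameters are transferred, and it is briefly retrained on a small target dataset. The sketch below illustrates only that workflow. A linear model stands in for the neural network, the data are synthetic, and the function names, learning rate and step count are assumptions chosen for illustration.

```python
import numpy as np


def fit_source(X, y):
    """Pretrain on the data-rich source domain (closed-form least squares)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fine_tune(w_init, X, y, lr=0.1, steps=20):
    """Start from the given weights and take a few gradient steps on the mean squared error."""
    w = np.array(w_init, dtype=float)  # copy, so the source weights stay untouched
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
w_source_true = np.array([1.0, -2.0, 0.5])
# Hypothetical related task: the target weights are a small shift of the source weights.
w_target_true = w_source_true + np.array([0.1, 0.0, -0.1])

X_src = rng.normal(size=(1000, 3))  # plenty of source data
y_src = X_src @ w_source_true
X_tgt = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])  # only four target samples
y_tgt = X_tgt @ w_target_true

w_transfer = fine_tune(fit_source(X_src, y_src), X_tgt, y_tgt)  # transferred starting point
w_scratch = fine_tune(np.zeros(3), X_tgt, y_tgt)                # training from scratch
err_transfer = np.linalg.norm(w_transfer - w_target_true)
err_scratch = np.linalg.norm(w_scratch - w_target_true)
print(err_transfer, err_scratch)
```

Both runs take the same number of gradient steps on the same four target samples. Any difference between `err_transfer` and `err_scratch` therefore comes only from the starting point.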
A fundamental challenge in materials science pertains to elucidating the relationship between stoichiometry, stability, structure, and property. Recent advances have shown that machine learning can be used to learn such relationships, allowing the stability and functional properties of materials to be accurately predicted. However, most of these approaches use atomic coordinates as input and are thus bottlenecked by crystal structure identification when investigating previously unidentified materials. Our approach solves this bottleneck by coarse-graining the infinite search space of atomic coordinates into a combinatorially enumerable search space. The key idea is to use Wyckoff representations, coordinate-free sets of symmetry-related positions in a crystal, as the input to a machine learning model. Our model demonstrates exceptionally high precision in finding unknown theoretically stable materials, identifying 1569 materials that lie below the known convex hull of previously calculated materials from just 5675 ab initio calculations. Our approach opens up fundamental advances in computational materials discovery.
Electrocatalysts and photocatalysts are key to a sustainable future, generating clean fuels, reducing the impact of global warming, and providing solutions to environmental pollution. Improved processes for catalyst design and a better understanding of electro/photocatalytic processes are essential for improving catalyst effectiveness. Recent advances in data science and artificial intelligence have great potential to accelerate electrocatalysis and photocatalysis research, particularly the rapid exploration of large materials chemistry spaces through machine learning. Here a comprehensive introduction to, and critical review of, machine learning techniques used in electrocatalysis and photocatalysis research are provided. Sources of electro/photocatalyst data and current approaches to representing these materials by mathematical features are described, the most commonly used machine learning methods summarized, and the quality and utility of electro/photocatalyst models evaluated. Illustrations of how machine learning models are applied to novel electro/photocatalyst discovery and used to elucidate electrocatalytic or photocatalytic reaction mechanisms are provided. The review offers a guide for materials scientists on the selection of machine learning methods for electrocatalysis and photocatalysis research. The application of machine learning to catalysis science represents a paradigm shift in the way advanced, next-generation catalysts will be designed and synthesized.
Equilibrium structures determine material properties and biochemical functions. We here propose to machine learn phase space averages, conventionally obtained by ab initio or force-field-based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio MD, our ab initio machine learning (AIML) model does not require bond topologies and, therefore, enables a general machine learning pathway to obtain ensemble properties throughout the chemical compound space. We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. The AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data and to reach competitive prediction errors (mean absolute error ∼ 0.8 kcal/mol) for out-of-sample molecules—within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns of Boltzmann averages throughout the chemical compound space at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
Two-dimensional (2D) metal-organic framework (MOF) materials with large perpendicular magnetic anisotropy energy (MAE) are important candidates for high-density magnetic storage. MAE-targeted high-throughput screening of 2D MOFs is currently limited by time-consuming electronic structure calculations. In this study, a machine learning model, the transition-metal interlink neural network (TMINN), is developed on a database of 1440 2D MOF materials to predict MAE quickly and accurately. The well-trained TMINN model successfully captures the general correlation between geometrical configurations and MAEs. We explore the MAEs of 2583 other 2D MOFs using the trained TMINN model. From these two databases, we obtain 11 unreported 2D ferromagnetic MOFs with MAEs over 35 meV/atom, which are further confirmed by high-level density functional theory calculations. These results demonstrate the good extrapolation performance of TMINN. We also propose simple design rules for obtaining 2D MOFs with large MAEs by building a Pearson correlation coefficient map between various geometrical descriptors and MAE. Our TMINN model provides a powerful tool for high-throughput screening and intentional design of 2D magnetic MOFs with large MAE.
While experiments and DFT computations have been the primary means of understanding the chemical and physical properties of crystalline materials, experiments are expensive, and DFT computations are time-consuming and show significant discrepancies with experiments. Currently, predictive modeling based on DFT computations provides a rapid method for screening materials candidates for further DFT computations and experiments. However, such models inherit the large discrepancies of their DFT-based training data. Here, we demonstrate how AI can be leveraged together with DFT to compute materials properties more accurately than DFT itself. We focus on the critical materials science task of predicting the “formation energy of a material given its structure and composition”. On an experimental hold-out test set containing 137 entries, AI predicts formation energy from materials structure and composition with a mean absolute error (MAE) of 0.064 eV/atom. Comparing this against DFT computations, we find that, for the first time, AI can significantly outperform DFT computations on the same task (discrepancies of >0.076 eV/atom).
The application of machine learning (ML) to electronic structure theory enables electronic property prediction with ab initio accuracy. However, most previous ML models predict one or several properties of intrinsic materials. The prediction of electronic band structure, which embeds all the main electronic information, has yet to be studied in depth. This is a challenging task because of the highly variable inputs and outputs: the input materials may have different sizes and compositions, and the output band structures may have varying band numbers and k-point samplings. The task becomes even more difficult for quantum-confined nanostructures, whose band structures are sensitive to the applied confinement. This paper presents an ML framework for predicting band structures of quantum-confined nanostructures from their geometries. Our framework introduces a graph convolutional network that extracts information about each atom's local environment and is applicable to materials with varying compositions and geometries. A learnable real-space Hamiltonian construction process then uses this information to predict the electronic structure at any arbitrary k-point. The theoretical foundations introduced in this process help to capture minor changes in quantum confinement and incorporate them into band structures, and they endow the framework with the ability of few-shot learning. Taking graphene nanoribbons, typical quantum-confined nanostructures, as an example, we show how the framework is constructed and demonstrate its excellent performance on band structure prediction with a tiny data set. Our framework may not only provide a rapid yet reliable method for electronic structure determination but also inspire applications of graph representations to ML in related fields.
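The "real-space Hamiltonian construction" in the abstract above relies on the standard Bloch sum H(k) = Σ_R H(R) e^{ik·R}. This sum turns a finite set of real-space blocks into a Hamiltonian at any k-point. The sketch below evaluates the sum on a one-dimensional lattice. The blocks are set by hand for a hypothetical two-site chain (intra-cell hopping v, inter-cell hopping w) rather than learned. It therefore illustrates only the construction step, not the paper's graph network.

```python
import numpy as np


def bloch_hamiltonian(blocks, k):
    """H(k) = sum_R H(R) exp(i k R) for a 1D lattice with unit spacing.

    `blocks` maps an integer lattice vector R to its real-space block H(R).
    H(k) is Hermitian only if H(-R) is the conjugate transpose of H(R).
    """
    size = next(iter(blocks.values())).shape[0]
    hk = np.zeros((size, size), dtype=complex)
    for R, block in blocks.items():
        hk += block * np.exp(1j * k * R)
    return hk


def band_structure(blocks, kpoints):
    """Eigenvalues of H(k) at each k-point, in ascending order."""
    return np.array([np.linalg.eigvalsh(bloch_hamiltonian(blocks, k)) for k in kpoints])


# Hypothetical two-site chain: intra-cell hopping v, inter-cell hopping w.
v, w = 1.0, 0.5
chain = {
    0: np.array([[0.0, v], [v, 0.0]]),
    1: np.array([[0.0, 0.0], [w, 0.0]]),
    -1: np.array([[0.0, w], [0.0, 0.0]]),  # conjugate transpose of the R = 1 block
}
ks = np.linspace(-np.pi, np.pi, 9)
bands = band_structure(chain, ks)
```

For this chain the two bands are ±√(v² + w² + 2vw cos k). They are separated by a gap of 2|v − w| at k = ±π.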
Artificial intelligence, and specifically machine learning, is nowadays used in a variety of scientific applications and cutting-edge technologies, where it has a transformative impact. Such an assembly of statistical and linear algebra methods, making use of large data sets, is becoming increasingly integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, and contribute to high-throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness of issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, and the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Most research on perovskite materials relies to a large extent on costly experiments or complex density functional theory (DFT) calculations. In contrast, machine learning (ML) combined with data mining is more effective in predicting perovskite properties. In this work, we mined data from the Materials Project database and other materials databases to construct a raw data set of ABO3-type compounds calculated by DFT. We then generated a feature set based on multi-scale descriptors, including compound properties and component element attributes. After various machine learning models were compared, the optimized support vector regression (SVR) model, particle swarm optimization-support vector regression (PSO-SVR), was used to predict the energy above the convex hull (Ehull) of ABO3-type compounds, which is the criterion for their thermodynamic stability. In addition, the descriptors that significantly influence the thermodynamic stability of ABO3-type compounds were identified, and the relationship between these descriptors and Ehull was discussed. Finally, stable and ideal ABO3 compounds were screened out as perovskite candidates.
Accelerating the discovery of advanced materials is essential for human welfare and sustainable, clean energy. In this paper, we introduce the Materials Project, a core program of the Materials Genome Initiative that uses high-throughput computing to uncover the properties of all known inorganic materials. This open dataset can be accessed through multiple channels for both interactive exploration and data mining. The Materials Project also seeks to create open-source platforms for developing robust, sophisticated materials analyses. Future efforts will enable users to perform “rapid-prototyping” of new materials in silico, and provide researchers with new avenues for cost-effective, data-driven materials design.
The new ternary intermetallic compound Sr14[Al4]2[Ge]3 was synthesized from stoichiometric ratios of the elements. The crystal structure (trigonal, space group R3̅, a = 1196.58(2), c = 4010.33(7) pm, Z = 6, R1 = 0.0574) was determined using single crystal X-ray data. The structure contains two crystallographically independent tetrahedral [Al4] anions with Al-Al distances in the range from 269.7 to 273.6 pm. Taking into account the Zintl concept and the isosteric analogy to white phosphorus, their formal charge is −8. Both of these tetrahedra are surrounded by 16 Sr cations. The three isolated Ge4− anions per formula unit (isosteric to the noble gases) are coordinated by nine Sr cations. According to the ionic description Sr14[Al4]2[Ge]3 ↦ 14 Sr2+ + 2 [Al4]8− + 3 [Ge]4−, the title compound is an electron-precise Zintl phase. This interpretation is supported by the results of a FP-LAPW band structure calculation, which show a distinct minimum of the total density of states at the Fermi level. Attempts to synthesize the analogous compounds in the systems Sr-Ga-Ge and Ca-Ga-Ge resulted in the formation of new members of the Ca11Ga7 structure type family. In the case of Ca-Al-Ge, only the stable binary border compounds Ca2Ge and CaAl2 were formed in the respective experiments.
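The electron counting behind the "electron-precise Zintl phase" statement can be checked directly from the ionic description. The valence-electron counts below (Al 3, P 5, Ge 4) are standard values added for illustration and are not taken from the abstract.

```latex
\begin{aligned}
\text{charge balance:}\quad & 14(+2) + 2(-8) + 3(-4) = 28 - 16 - 12 = 0 \\
[\mathrm{Al}_4]^{8-}:\quad & 4 \times 3 + 8 = 20 \ \text{valence electrons} = 4 \times 5 \ (\mathrm{P}_4) \\
\mathrm{Ge}^{4-}:\quad & 4 + 4 = 8 \ \text{valence electrons (an octet, as in a noble gas)}
\end{aligned}
```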
Statistical learning of materials properties or functions so far starts with a largely silent, non-challenged step: the introduction of a descriptor. However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, causality of the learned descriptor-property relation is uncertain. Thus, trustful prediction of new promising materials, identification of anomalies, and scientific advancement are doubtful. We analyze this issue and define requirements for a suited descriptor. For a classical example, the energy difference of zincblende/wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically.
Intermetallic compounds made of alkali metals and gold have intriguing electronic and structural properties that have not been extensively explored. We perform a systematic study of the phase diagram of one binary system belonging to this family, namely NaxAu1‑x, using the ab initio minima hopping structural prediction method. We discover that the most stable composition is NaAu2, in agreement with available experimental data. We also confirm the crystal structures of NaAu2 and Na2Au, that were fully characterized in experiments, and identify a candidate ground-state structure for the experimental stoichiometry NaAu. Moreover, we obtain three other stoichiometries, namely Na3Au2, Na3Au and Na5Au, that could be thermodynamically stable. We do not find any evidence for the existence of the experimentally proposed composition NaAu5. Finally, we perform phonon calculations to check the dynamical stability of all reported phases and we simulate x-ray diffraction spectra for comparison with future experimental data.
The evaluation of reaction energies between solids using density functional theory (DFT) is of practical importance in many technological fields and paramount in the study of the phase stability of known and predicted compounds. In this work, we present a comparison between reaction energies provided by experiments and computed by DFT in the generalized gradient approximation (GGA), using a Hubbard U parameter for some transition metal elements (GGA+U). We use a data set of 135 reactions involving the formation of ternary oxides from binary oxides in a broad range of chemistries and crystal structures. We find that the computational errors can be modeled by a normal distribution with a mean close to zero and a standard deviation of 24 meV/atom. The significantly smaller error compared to the more commonly reported errors in the formation energies from the elements is related to the larger cancellation of errors in energies when reactions involve chemically similar compounds. This result is of importance for phase diagram computations for which the relevant reaction energies are often not from the elements but from chemically close phases (e.g., ternary oxides versus binary oxides). In addition, we discuss the distribution of computational errors among chemistries and show that the use of a Hubbard U parameter is critical to the accuracy of reaction energies involving transition metals even when no major change in formal oxidation state is occurring.
A method is given for generating sets of special points in the Brillouin zone which provides an efficient means of integrating periodic functions of the wave vector. The integration can be over the entire Brillouin zone or over specified portions thereof. This method also has applications in spectral and density-of-state calculations. The relationships to the Chadi-Cohen and Gilat-Raubenheimer methods are indicated.
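The abstract does not spell out the construction itself. In the widely used Monkhorst-Pack scheme, each reciprocal-lattice direction with q divisions is sampled at the fractional coordinates u_r = (2r − q − 1)/(2q), for r = 1, 2, ..., q. The Python sketch below uses hypothetical helper names. It generates such a grid in fractional coordinates and applies only the time-reversal reduction k ≡ −k. A full special-point set would also use the crystal's point-group symmetry and carry integration weights, and both are omitted here.

```python
from fractions import Fraction
from itertools import product


def mp_coordinates(q):
    """Fractional coordinates u_r = (2r - q - 1) / (2q) for r = 1, ..., q."""
    return [Fraction(2 * r - q - 1, 2 * q) for r in range(1, q + 1)]


def mp_grid(q1, q2, q3):
    """All q1 x q2 x q3 k-points, in fractional coordinates of the reciprocal lattice vectors."""
    return list(product(mp_coordinates(q1), mp_coordinates(q2), mp_coordinates(q3)))


def reduce_time_reversal(points):
    """Keep one point of each (k, -k) pair (weights omitted; no point-group symmetry)."""
    kept, seen = [], set()
    for k in points:
        if tuple(-c for c in k) not in seen:
            seen.add(k)
            kept.append(k)
    return kept


grid = mp_grid(2, 2, 2)
irreducible = reduce_time_reversal(grid)
print(len(grid), len(irreducible))
```

For even q the grid avoids the Γ point: a 2 × 2 × 2 grid has 8 points, 4 after the reduction. For odd q the grid includes Γ: a 3 × 3 × 3 grid has 27 points, 14 after the reduction.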
Data-driven approaches are particularly useful for computational materials discovery and design because they can rapidly screen a very large number of materials and thus suggest lead candidates for further in-depth investigation. A central challenge of such approaches is to develop a numerical representation of the materials, often referred to as a fingerprint. Inspired by recent developments in cheminformatics, we propose a class of hierarchical motif-based topological fingerprints for materials composed of elements such as C, O, H, N, F, etc., whose coordination preferences are well understood. We show that these fingerprints, when representing either molecules or crystals, may be effectively mapped onto a variety of properties using a similarity-based learning model, and hence can be used to predict relevant properties of a material, given that its fingerprint can be defined. Two simple procedures are introduced to demonstrate that the learning model can be inverted to identify the desired fingerprints and then to reconstruct molecules that possess a set of targeted properties.
We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules have been successful using a Coulomb matrix representation of the molecule. We consider three ways to generalize such representations to periodic systems: (i) a matrix where each element is related to the Ewald sum of the electrostatic interaction between two different atoms in the unit cell repeated over the lattice; (ii) an extended Coulomb-like matrix that takes into account a number of neighboring unit cells; and (iii) an Ansatz that mimics the periodicity and the basic features of the elements in the Ewald sum matrix by using a sine function of the crystal coordinates of the atoms. The representations are compared for a Laplacian kernel with Manhattan norm, trained to reproduce formation energies using a data set of 3938 crystal structures obtained from the Materials Project. For training sets consisting of 3000 crystals, the generalization error in predicting formation energies of new structures corresponds to (i) 0.49, (ii) 0.64, and (iii) 0.37 eV/atom for the respective representations.
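The abstract compares representations using a Laplacian kernel with the Manhattan norm. Below is a minimal kernel ridge regression sketch with that kernel. Several parts are assumptions: the toy feature vectors stand in for the Ewald, extended Coulomb or sine matrix representations, and the hyperparameters σ and λ (`sigma`, `lam`) and the function names are hypothetical.

```python
import numpy as np


def laplacian_kernel(A, B, sigma):
    """k(x, x') = exp(-||x - x'||_1 / sigma), using the Manhattan (L1) distance."""
    l1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-l1 / sigma)


def fit_krr(X, y, sigma, lam):
    """Kernel ridge regression coefficients alpha = (K + lam * I)^-1 y."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def predict_krr(X_train, alpha, X_new, sigma):
    """Prediction f(x) = sum_i alpha_i * k(x, x_i)."""
    return laplacian_kernel(X_new, X_train, sigma) @ alpha


# Hypothetical toy data: three "crystals" with 2-component representations, energies in eV/atom.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
y = np.array([-1.0, -2.0, -0.5])
alpha = fit_krr(X, y, sigma=1.0, lam=1e-10)
prediction = predict_krr(X, alpha, np.array([[0.5, 0.5]]), sigma=1.0)
```

With λ close to zero, the model interpolates its training labels almost exactly; a larger λ trades that for smoothness. In practice σ and λ would be selected by cross-validation on held-out crystals.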
Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. We introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible, for significantly larger molecular sets than used for training. For thermochemical properties of up to 16k constitutional isomers of C7H10O2 we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post Hartree-Fock methods, at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated, using semi-empirical quantum chemistry and machine learning models trained on 1 and 10% of 134k organic molecules, to reproduce enthalpies of all remaining molecules at density functional theory level of accuracy.
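The composite strategy can be summarized as E_target ≈ E_baseline + f(x), where f is trained on the difference between the expensive method and the cheap one. The sketch below assumes a Gaussian-kernel ridge regression for f and uses two hypothetical one-dimensional toy "methods". Neither the kernel choice nor the class name comes from the abstract.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """k(x, x') = exp(-|x - x'|^2 / (2 sigma^2)) between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


class DeltaModel:
    """Learn the correction E_high - E_low, then add it to the cheap baseline."""

    def __init__(self, sigma=1.0, lam=1e-8):
        self.sigma, self.lam = sigma, lam

    def fit(self, X, e_low, e_high):
        self.X = X
        K = gaussian_kernel(X, X, self.sigma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), e_high - e_low)
        return self

    def predict(self, X_new, e_low_new):
        return e_low_new + gaussian_kernel(X_new, self.X, self.sigma) @ self.alpha


def cheap_method(X):
    """Hypothetical inexpensive baseline."""
    return X[:, 0] ** 2


def expensive_method(X):
    """Hypothetical reference: the baseline plus a smooth term that the baseline misses."""
    return X[:, 0] ** 2 + 0.1 * X[:, 0]


X_train = np.linspace(0.0, 2.0, 5).reshape(-1, 1)
model = DeltaModel().fit(X_train, cheap_method(X_train), expensive_method(X_train))
X_new = np.array([[1.25]])
prediction = model.predict(X_new, cheap_method(X_new))
```

The intent of this design is that f only has to capture the correction rather than the full property. The cheap baseline still supplies most of the answer for each new input.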
Typically, computational screens for new materials sharply constrain the compositional search space, structural search space, or both, for the sake of tractability. To lift these constraints, we construct a machine learning model from a database of thousands of density functional theory (DFT) calculations. The resulting model can predict the thermodynamic stability of arbitrary compositions without any other input and with six orders of magnitude less computer time than DFT. We use this model to scan roughly 1.6 million candidate compositions for novel ternary compounds (AxByCz), and predict 4500 new stable materials. Our method can be readily applied to other descriptors of interest to accelerate domain-specific materials discovery.