A Transferable Recommender Approach for Selecting the Best Density Functional
Approximations in Chemical Discovery
Chenru Duan1,2, Aditya Nandy1,2, Ralf Meyer1, Naveen Arunachalam1, and Heather J. Kulik1,2
1Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA
2Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
Abstract: Approximate density functional theory (DFT) has become indispensable owing to its
cost-accuracy trade-off in comparison to more computationally demanding but accurate
correlated wavefunction theory. To date, however, no single density functional approximation
(DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data
generated from DFT. With electron density fitting and transfer learning, we build a DFA
recommender that selects the DFA with the lowest expected error with respect to gold standard
but cost-prohibitive coupled cluster theory in a system-specific manner. We demonstrate this
recommender approach on vertical spin-splitting energy evaluation for challenging transition
metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy
(ca. 2 kcal/mol) for chemical discovery, outperforming both individual transfer learning models
and the single best functional in a set of 48 DFAs. We demonstrate the transferability of the DFA
recommender to experimentally synthesized compounds with distinct chemistry.
With rapid advancements in computing power, density functional theory (DFT) has
become an indispensable companion to experiments as well as a primary tool in virtual high
throughput screening (VHTS)1 to generate large-scale computational datasets2. Combined with
machine learning (ML), these datasets have accelerated computational chemical discovery and
revolutionized scientific discoveries3, 4 in physics, chemistry, materials sciences, and biology. All
ML models, however, are limited by the quality of the training data, imposing strict requirements
for the accuracy of the first-principles methods used in VHTS. In DFT, a density functional
approximation (DFA) that works well on certain systems can fail prominently on other systems
due to the approximations made in the exchange-correlation functional5. This DFA dependence
is particularly strong in open-shell transition metal chemistry, with many examples of
compelling functional materials (e.g., metal organic frameworks) and catalytic reactions (e.g., C–
H activation) that are dominated by static correlation6.
For simplicity, a single DFA is typically selected to screen through a large chemical
space in VHTS2. It is known that the single-DFA approach can lead to bias in the computational
data sets generated, which further bias ML models and the final candidate materials in the ML-
accelerated discovery7. Alternatively, when computational chemistry efforts are focused enough
on a small set of molecules for which experimental or accurate correlated wavefunction theory
reference data is available, different DFAs may be evaluated and then selected in a system and
property-dependent manner to obtain agreement8. Nevertheless, there is no guarantee that the
same DFA is optimal for different properties or different materials. These challenges pose
limitations on both mechanistic inquiry by computational chemists aiming to reach the “chemical
accuracy” needed to be predictive of experimental outcomes and the emergence of massive-scale
“big data” that enables deep model training in computational sciences9 as in computer
vision and natural language processing.
One way to improve the fidelity of DFT-derived data sets is to develop DFAs with
increased accuracy and generalizability. Recent advances in using ML to develop neural network
potentials10 and exchange-correlation functionals11 have shown promise in developing
transferable models. Currently, however, they have severe limitations that curtail their practical
use. These DFAs have primarily been developed for and applied on narrow sets of closed-shell
organic molecules and are still less transferable relative to conventional DFAs developed in the
theoretical chemistry community over the past few decades. More importantly, these models
only target the total electronic energy of a geometry rather than other properties of chemical
interest, such as those involving multiple electronic states (e.g., spin splitting).
Here, we demonstrate an alternate route. Instead of developing a new DFA, we leverage
transfer learning (TL) and use a “regress-then-classify” strategy to develop a recommender12 to
select the best conventional DFA for a given system and a property of interest based on features
of the electron density of the system under study. To demonstrate this approach, we recommend a
DFA that most accurately evaluates the vertical spin splitting energy of a transition metal
complex (TMC), a property highly sensitive to the choice of DFA7,20. This recommender selects
a DFA in a system-specific manner rather than based on average performance of DFAs and
captures the rank ordering of the top-performing DFAs. Crucially, our recommender has a mean
absolute error (MAE) of 2.1 kcal/mol, providing the accuracy required for the exploration of
transition metal chemical space. Because the electron density is a fundamental property of the
system, our recommender demonstrates excellent transferability, achieving similar accuracy for
recommending DFAs on unseen experimentally synthesized TMCs that contain diverse
chemistry. This recommender approach is expected to be a general framework for method
selection to increase the data quality in VHTS and ML-accelerated discovery in computational
Overview of the DFA recommender. Electron density lies at the core of Kohn-Sham (KS)-DFT
and, in principle, determines any ground state property of interest for a system. However, it is less
commonly used as a representation in ML models because of its non-local nature in KS orbitals and because
it is cumbersome to satisfy translational and rotational symmetries when it is discretized into 3D cubes. Here,
we perform density fitting on the electron density13 of a TMC obtained from the representative
B3LYP calculation. The resulting coefficients of the basis functions, which preserve the required
physical (i.e., translational, rotational, and permutation) symmetries, are used as the
representations of the TMC (Fig. 1a–1b, see Methods). To make this approach more general with
respect to chemical properties that involve multiple electronic states (here, vertical spin splitting),
we directly decompose the difference of electron density at two electronic states. These density
fitting coefficients are atom-centered, which have the advantage of being agnostic to model
architectures and can be readily combined with Behler–Parrinello, message passing, and graph
With the overarching goal of recommending a DFA in a system-specific way, one may
expect that the optimal approach is to treat the learning task as a multi-class classification
problem14 where the classes are different DFAs. However, due to the inherent similarity among
different DFAs, many DFAs perform similarly well in most cases, leading to label noise when
determining the “best” DFA. Correspondingly, this noise poses a challenge to classification
models (Supplementary Fig.1). Therefore, we take an alternative “regress-then-classify” strategy,
where we aim to recommend a low-error DFA instead of forcing ML models to recognize top-ranking
DFAs (Fig. 1, see Methods). We thus set up a transfer learning (TL) regression task to
predict the absolute difference of the vertical spin splitting energy between a DFA (f) and our
reference data from domain-based local pair natural orbital (DLPNO)-CCSD(T) theory15
(|ΔΔEH–L[f]|). In this “regress-then-classify” strategy, we first build Behler–Parrinello-type neural
networks to predict |ΔΔEH–L[f]| using our B3LYP density fitting coefficients, separately for each DFA in a pool
of pre-selected candidate DFAs (Fig. 1c). We then select the DFA that gives the smallest
absolute predicted difference (i.e., predicted |ΔΔEH–L[f]|) as the recommended DFA for the TMC
There are two approaches to evaluate the performance of the DFA recommender. The
first is the absolute error as a result of using the recommended DFA relative to the reference
DLPNO-CCSD(T) method. This measure serves as a practical metric for evaluating the accuracy
obtained by the recommender in VHTS. The second is the rank ordering of the recommended
DFA among the pool of candidate DFAs. This statistical measure quantifies how well the
recommender distinguishes top-performing candidate DFAs. Throughout our work, we will
consider both perspectives.
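The selection step and the two evaluation perspectives described above can be illustrated with a minimal sketch. It assumes that the per-DFA predicted and true absolute errors for one complex are already available as dictionaries; the function names and all numbers are hypothetical placeholders, not values or code from this work.

```python
from typing import Dict, Tuple


def recommend_dfa(predicted_abs_err: Dict[str, float]) -> str:
    """Pick the DFA with the smallest predicted absolute error (the 'classify' step)."""
    return min(predicted_abs_err, key=predicted_abs_err.get)


def evaluate(predicted_abs_err: Dict[str, float],
             true_abs_err: Dict[str, float]) -> Tuple[str, float, int]:
    """Return (recommended DFA, its true absolute error, its true rank; 1 = best)."""
    choice = recommend_dfa(predicted_abs_err)
    # Perspective 1: error actually incurred by using the recommended DFA.
    error = true_abs_err[choice]
    # Perspective 2: where the recommended DFA sits in the true error ranking.
    ranking = sorted(true_abs_err, key=true_abs_err.get)
    rank = ranking.index(choice) + 1
    return choice, error, rank


# Hypothetical errors (kcal/mol) for a single complex and three candidate DFAs.
predicted = {"B3LYP": 6.1, "PBE0": 3.4, "M06": 1.2}
actual = {"B3LYP": 7.0, "PBE0": 0.9, "M06": 1.5}
choice, error, rank = evaluate(predicted, actual)
print(choice, error, rank)
```

In this toy case the recommended DFA is not the true best one, yet it still gives a low error, which is the behavior the "regress-then-classify" strategy is designed to tolerate.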
Fig. 1 | Workflow for the DFA recommender. a, B3LYP/def2-TZVP single-point energy
calculations are performed on both the high-spin (HS) and low-spin (LS) states to obtain their
electron densities at B3LYP level. b, The difference of the electron densities between the HS and
LS states is decomposed into each atom using a density fitting procedure. c, These coefficients
are used in a Behler–Parrinello-type neural network as a TL model to predict |ΔΔEH–L[f]| for each
DFA (f) in our pool of 48 DFAs. Coefficients of different atoms that are in the same group of the
periodic table share the same local network and weights (e.g., WO for O (red), WH for H (gray)
and O and S share the same weights WO). The latent vectors of each element are lastly
concatenated and passed to a fully-connected network (WL) to predict |ΔΔEH–L[f]|. d, The 48
predicted |ΔΔEH–L[f]| are then sorted, where we recommend the DFA that yields the lowest
Performance of TL models. We first consider a set of 452 octahedral TMCs composed of 3d
mid-row transition metals and small common organic ligands in the spectrochemical series (VSS-452,
Supplementary Fig. 2, see Methods). We demonstrate the performance of our TL models for
predicting the differences in vertical spin-splitting energies obtained by a DFA and DLPNO-
CCSD(T) for a pool of 48 DFAs that cover multiple rungs of “Jacob’s ladder”. These 48 TL
models have mean absolute errors (MAEs) ranging from 2.3 kcal/mol to 3.4 kcal/mol, with a
median MAE of 2.5 kcal/mol on the 152 set-aside test TMCs in VSS-452 (Fig. 2a). These MAEs
are low considering the fact that the TL models were only trained on a small set of 300 TMCs
that contain diverse chemistry (see Methods). In fact, these MAEs are lower than the typical
experimental uncertainties for thermochemical properties of TMCs (e.g., 3 kcal/mol)16. Despite
the fact that some DFAs have very large (i.e., > 30 kcal/mol) DFA-derived MAEs, all 48 TL
models yield reasonably low MAEs, suggesting the general applicability of this TL approach
regardless of the candidate DFA considered.
Interestingly, the ranking of TL model MAEs does not have the same order as the error
ranking of the underlying DFA results relative to DLPNO-CCSD(T). For example, the DFA with
the lowest TL MAE is a double-hybrid functional DSD-PBEB95-D3BJ, which only has the fifth-
lowest MAE relative to the reference calculation (Supplementary Figs. 3 and 4). MN15, which
gives the highest MAE among 48 TL models, only ranks 17th among the DFA MAEs relative to
DLPNO-CCSD(T). For the set of 48 DFAs, the rank-order coefficient (i.e., Spearman's r)
between DFAs ranked by TL model MAE and those ranked by DFA-derived MAE is only 0.36.
This observation suggests that a TL model does not necessarily have improved performance
when the DFA-derived MAE of the baseline DFA is smaller, posing an interesting
question of how to select the best baseline method from which TL would yield the lowest errors.
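As an illustration of the rank-order comparison used here, the following sketch computes Spearman's r between two lists of MAEs. It assumes there are no tied values, in which case the closed form 1 − 6Σd²/(n(n² − 1)) applies; the function names and the four pairs of MAEs are hypothetical and do not correspond to any DFA in our pool.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation via the tie-free closed form."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical MAEs (kcal/mol): TL-model MAE vs. DFA-derived MAE for four DFAs.
tl_mae = [2.3, 2.5, 3.4, 2.8]
dfa_mae = [8.0, 6.2, 12.0, 30.0]
print(spearman_no_ties(tl_mae, dfa_mae))
```

A value well below 1, as in this toy example, indicates that a DFA with a low intrinsic error is not necessarily the one whose error is easiest to learn.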
Fig. 2 | Performance of TL models and the recommender on VSS-452 set. a, MAE of 48 TL
models for the prediction of |ΔΔEH–L| with a box indicating their median (solid line). The MAEs
for each TL model (black circle) and our recommender approach (blue star) are also shown. b,
Percentage for the absolute error with the recommender-selected DFA (gray bars), with the
cumulative percentage (blue solid line) shown according to the axis on the right. c, An example
complex Co(III)(C3H4N2)4(PH3)2, its recommended DFA (i.e., ωB97X), and the associated DFA
error (0.1 kcal/mol). Atoms are colored as follows: pink for Co, brown for P, gray for C, blue for
N, and white for H. d, Percentage likelihood of a DFA residing in the top-5 choices suggested by
ground truth (green) and our DFA recommender (blue). The DFAs are sorted in a descending
order of the predicted likelihood of the recommender. In all cases, the model performance is
evaluated on the set-aside 152 test complexes of VSS-452.
Performance of the DFA recommender. We then utilize the predicted |ΔΔEH–L[f]| from all 48
TL models to recommend a “best” DFA. We compare the vertical spin splitting energy obtained
by the recommended DFA and the DLPNO-CCSD(T) reference to evaluate the performance of
the recommender (see Methods). The recommender achieves an MAE of 2.1 kcal/mol on the set-
aside test set of VSS-452, outperforming all 48 TL models. This MAE is only 2.5 times the
theoretical lower bound (0.8 kcal/mol), which is the performance we would achieve if the DFA
that gives the lowest error is always selected (Supplementary Fig. 5). Our recommender
outperforms random DFA selection (MAE = 13.3 kcal/mol) by a factor of 6.5. More importantly,
even for an alternate strategy representative of using prior knowledge in which we pick the
single DFA with the lowest average error over the VSS-452 set, its DFA-derived MAE (i.e., 6.2
kcal/mol with DSD-BLYP-D3BJ) would be three times larger than that of our recommender.
With this recommender, we are able to select DFAs that give errors within the 3 kcal/mol
threshold required for transition metal chemical discovery in 77% of cases (Fig. 2b). For almost all
complexes (i.e., 94%), the recommender achieves an error lower than the MAE (i.e., 6.2
kcal/mol) of the best single DFA.
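The four quantities compared in this paragraph (recommender MAE, theoretical lower bound, random selection, and best single DFA) can be sketched as follows. The sketch assumes that the true and predicted absolute errors are stored as one dictionary per complex; the data, the DFA labels "A", "B", and "C", and the function names are hypothetical, and the random baseline is taken as the expected error under uniform random selection.

```python
from statistics import mean
from typing import Dict, List, Tuple

Errors = List[Dict[str, float]]  # one {DFA name: absolute error} dict per complex


def mae_recommender(pred: Errors, true: Errors) -> float:
    """MAE when each complex uses the DFA with the lowest predicted error."""
    return mean([t[min(p, key=p.get)] for p, t in zip(pred, true)])


def mae_lower_bound(true: Errors) -> float:
    """MAE if the truly best DFA were always selected."""
    return mean([min(t.values()) for t in true])


def mae_random(true: Errors) -> float:
    """Expected MAE when a DFA is drawn uniformly at random for each complex."""
    return mean([mean(list(t.values())) for t in true])


def best_single_dfa(true: Errors) -> Tuple[str, float]:
    """DFA with the lowest average error over the whole set, and that MAE."""
    maes = {f: mean([t[f] for t in true]) for f in true[0]}
    best = min(maes, key=maes.get)
    return best, maes[best]


# Hypothetical errors (kcal/mol) for three complexes and three DFAs.
true_err = [{"A": 1.0, "B": 5.0, "C": 9.0},
            {"A": 8.0, "B": 4.0, "C": 0.0},
            {"A": 6.0, "B": 3.0, "C": 12.0}]
pred_err = [{"A": 2.0, "B": 4.0, "C": 7.0},
            {"A": 6.0, "B": 5.0, "C": 1.0},
            {"A": 3.0, "B": 3.5, "C": 9.0}]

print(mae_lower_bound(true_err), mae_recommender(pred_err, true_err),
      best_single_dfa(true_err), mae_random(true_err))
```

By construction the lower bound can never exceed the recommender MAE; the other orderings depend on how well the TL models predict the per-DFA errors.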
One distinct advantage of this recommender approach is that its performance is likely to
improve systematically with increasing number of DFAs under consideration, despite using the
same training data set and TL models. For example, the recommender would achieve an MAE of
3.0 kcal/mol if we had used the smaller pool of 23 candidate DFAs introduced in our previous
work7 (Supplementary Table 1). However, as we add the remaining 25 DFAs that incorporate
alternate fractions of Hartree-Fock (HF) exchange, the MAE reduces to 2.1 kcal/mol, improving
upon the accuracy of our best TL model of DSD-PBEB95-D3BJ (2.3 kcal/mol, Supplementary
Table 2). We gain this additional accuracy of the recommender without significantly increasing
the computational cost, as there is no need for more training data with the computationally
demanding DLPNO-CCSD(T) reference.
The other distinct feature of our recommender approach is its system specificity.
Compared to the widely used strategy of selecting DFAs based on the statistically averaged
performance over a benchmark data set17, the recommender chooses the DFA only based on the
chemistry (here the electron density) of the system under consideration. As a result, our
recommender can avoid selecting a DFA that performs well on average yet particularly poorly on
the given complex. For example, DSD-BLYP-D3BJ has the lowest DFA-derived MAE for
|ΔΔEH–L[f]| (i.e., 6.2 kcal/mol) against DLPNO-CCSD(T) but gives a relatively high absolute
error of 9.2 kcal/mol on Co(III)(C3H4N2)4(PH3)2. Our recommender, instead, selects ωB97X,
which has a very small error of 0.1 kcal/mol on this compound, despite the fact that the DFA-derived
MAE of ωB97X is much higher (i.e., 16.9 kcal/mol) and only ranks 37th out of 48 DFAs (Fig. 2c
and Supplementary Fig. 3).
Next, we investigate the statistics of the recommended DFA performance to determine
when our recommender correctly selects the top-performing functional. We focus on DFAs that
are within the top-5 choices as they usually result in the accuracy required for studies in
transition metal chemistry16 (i.e., 3.0 kcal/mol) because multiple DFAs can achieve similar
accuracy for a TMC (Supplementary Figs. 5 and 6). We find that roughly two-thirds (i.e., 100 out
of 152 complexes in the set-aside test set of VSS-452) of the recommended DFAs are within the
top-5 DFAs relative to the ground truth (Supplementary Fig. 7). In less than 15% of cases, our
approach recommends a DFA that is not in the top 10 out of 48 candidate DFAs. Interestingly,
we get more favorable ranking statistics using only the 23 DFAs from prior work7, despite a
higher recommender MAE (i.e., 3.0 kcal/mol, Supplementary Table 1). With these 23 candidate
DFAs, 88% of the DFAs selected by the recommender are within the top 5 and nearly none (i.e.,
4%) fall out of the top-10 DFAs (Supplementary Fig. 8). We expect this behavior to be general
in our recommender approach: with more candidate DFAs in the pool, it is more difficult to get
favorable ranking statistics, as judged by identifying the single top performing functional, but
easier to obtain a lower MAE for practical performance because there are more DFAs to select
Lastly, we compare the statistics of the most probable DFAs that reside in the top-5 choices
obtained by the ground truth and our recommender. Out of the 48 DFAs, MN15-L with 50% HF
exchange (i.e., MN15-L:50%) appears most frequently as a top-5 DFA, with a likelihood of 43%
for the 152 set-aside test TMCs in VSS-452 (Fig. 2d). Correspondingly, our recommender
identifies the same DFA as having the highest likelihood (53%) of being among the top-5 candidate DFAs.
Moreover, for each base semi-local functional (i.e., BLYP, PBE, SCAN, M06-L, and MN15-L),
the DFA recommender successfully picks the HF exchange fraction that is most accurate for
VSS-452. As a result, our recommended DFAs maintain the rank ordering of probable top-5
DFAs compared to the ground truth, leading to a Spearman's r of 0.95. This extremely high
correspondence demonstrates that our DFA recommender is capable of identifying DFAs that are
most likely to be accurate in a given chemical space.
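The top-5 likelihood statistics can be sketched in a few lines. This is one reading of the procedure, in which the recommender's top-k list for each complex is taken from the predicted errors and the ground-truth list from the true errors; the function name, the DFA labels, and the data are hypothetical, and k is set to 2 only because the toy pool has three DFAs.

```python
from collections import Counter
from typing import Dict, List


def top_k_likelihood(errors: List[Dict[str, float]], k: int = 5) -> Dict[str, float]:
    """Fraction of complexes for which each DFA is among the k lowest-error DFAs."""
    counts: Counter = Counter()
    for per_complex in errors:
        for dfa in sorted(per_complex, key=per_complex.get)[:k]:
            counts[dfa] += 1
    n = len(errors)
    return {dfa: counts[dfa] / n for dfa in errors[0]}


# Hypothetical errors (kcal/mol) for three complexes and three DFAs.
true_err = [{"A": 1.0, "B": 5.0, "C": 9.0},
            {"A": 8.0, "B": 4.0, "C": 0.0},
            {"A": 6.0, "B": 3.0, "C": 12.0}]
pred_err = [{"A": 2.0, "B": 4.0, "C": 7.0},
            {"A": 6.0, "B": 5.0, "C": 1.0},
            {"A": 3.0, "B": 3.5, "C": 9.0}]

truth = top_k_likelihood(true_err, k=2)
recommended = top_k_likelihood(pred_err, k=2)
print(truth)
print(recommended)
```

Sorting the DFAs by these two sets of likelihoods and comparing the orderings gives the rank-order agreement reported in the text.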
Interpreting TL model and recommender predictions. We use a virtual adversarial attack18 as
an approach to uncover TL model focus when predicting |ΔΔEH–L[f]|. During the attack, we
obtain the virtual adversarial perturbation (rvadv) on the inputs that maximizes the change of TL
model output (i.e., before and after the perturbation), which represents the region of model focus.
For example, the rvadv of the B3LYP TL model on Co(II)(SH2)4(SCN–)2 mostly concentrates on
Co and S atoms, suggesting that the B3LYP TL model focuses mostly on the metal and first
coordination sphere when predicting |ΔΔEH–L| (Fig. 3a). By directly averaging over rvadv of all
TMCs in the set-aside test set of VSS-452 for a given DFA, we obtain its average TL model
focus. Interestingly, we find that all TL models have much stronger focus on the metal-local
environments compared to the untrained model, which is in agreement with our previous work7,
19 (Fig. 3b). This trend holds for all 48 DFAs despite the fact that spin-splitting energy itself can
be sensitive to DFA choice and the associated HF exchange fraction.
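To make the idea of a virtual adversarial perturbation concrete, the sketch below finds the input direction that most changes the output of a model by a single power-iteration step, with gradients estimated by central finite differences. It is an illustration under strong assumptions rather than the procedure used in this work: the "model" is a hypothetical linear function of three features, the squared output change is used as the distance measure, and the starting direction is fixed (in practice it is usually random) so that the result is reproducible.

```python
import math
from typing import Callable, List, Sequence


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical linear stand-in for a trained TL model (not the real network)."""
    weights = [3.0, 0.0, 4.0]
    return sum(w * v for w, v in zip(weights, x))


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def virtual_adversarial_perturbation(f: Callable[[Sequence[float]], float],
                                     x: Sequence[float], d0: Sequence[float],
                                     eps: float = 0.1, xi: float = 1e-3,
                                     h: float = 1e-6, n_iter: int = 1) -> List[float]:
    """Approximate the norm-eps perturbation that maximizes the squared output change."""
    y0 = f(x)

    def change(r: Sequence[float]) -> float:
        return (f([a + b for a, b in zip(x, r)]) - y0) ** 2

    d = unit(d0)
    for _ in range(n_iter):
        r = [xi * c for c in d]  # small probe along the current direction
        grad = []
        for i in range(len(x)):
            plus, minus = list(r), list(r)
            plus[i] += h
            minus[i] -= h
            grad.append((change(plus) - change(minus)) / (2.0 * h))
        d = unit(grad)  # power-iteration update of the direction
    return [eps * c for c in d]


x = [0.5, -1.0, 2.0]
r_vadv = virtual_adversarial_perturbation(toy_model, x, d0=[1.0, 1.0, 1.0])
print([round(c, 3) for c in r_vadv])
```

For this linear toy model the perturbation aligns with the weight vector, so the second feature, which the model ignores, receives no weight; summing the magnitudes of such perturbations over the features of each atom is one way to obtain a per-atom measure of model focus.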
Intuitively, one would expect the accuracy of a DFA for predicting spin splitting energy
to depend on the ligand field strength of the ligands in the TMC17, 20. This relationship, however,
is challenging to disentangle when we have 48 candidate DFAs and only 152 test complexes,
making the statistics insignificant. To tease out the trends of the recommended DFAs with
respect to the ligand field (measured by DLPNO-CCSD(T) ΔEH–L), we perform a control
experiment using only a small pool of candidate DFAs that contain the best DFA at each semi-
local functional (i.e., BLYP:50%, PBE:30%, SCAN:40%, M06-L:40%, and MN15-L:50%). For
this experiment, we are able to maintain reasonable recommender performance (i.e., MAE = 2.5
kcal/mol) with these five select DFAs and still have a relatively large ratio between the number
of test TMCs and candidate DFAs to make our statistical analysis sound. We further partition the
DLPNO-CCSD(T) ΔEH–L from -100 kcal/mol to -10 kcal/mol equally into 9 ranges, thus
ordering the complexes from those containing the weakest to the strongest ligand field ligands.
Among the five DFAs, each DFA has a ligand field strength range over which it performs the
best (Fig. 4a). For example, M06-L:40% is mostly selected for strong ligand fields (DLPNO-CCSD(T)
ΔEH–L > -20 kcal/mol), while MN15-L:50% is recommended frequently for the weakest
fields (DLPNO-CCSD(T) ΔEH–L < -70 kcal/mol). This preference of selecting different DFAs at
different ligand field strengths also explains the great accuracy of our recommender approach: at
each specific range of DLPNO-CCSD(T) ΔEH–L, the recommender successfully avoids selecting
DFAs that have a large MAE, which in turn will likely yield low errors in practical applications
of the recommender (Fig. 4b).
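The binning used in this control experiment can be sketched as follows: the reference spin-splitting energies are assigned to 9 equal ranges between -100 and -10 kcal/mol, and the recommended DFAs are counted in each range. The function names and the five example complexes are hypothetical; only the bin edges and the five DFA labels are taken from the text.

```python
from collections import Counter
from typing import List, Optional, Sequence


def bin_index(delta_e: float, lo: float = -100.0, hi: float = -10.0,
              n_bins: int = 9) -> Optional[int]:
    """Index of the equal-width bin containing delta_e, or None if out of range."""
    if not lo <= delta_e < hi:
        return None
    width = (hi - lo) / n_bins
    return int((delta_e - lo) // width)


def selection_by_bin(delta_es: Sequence[float], selected: Sequence[str],
                     n_bins: int = 9) -> List[Counter]:
    """Count how often each DFA is recommended within each ligand-field bin."""
    table: List[Counter] = [Counter() for _ in range(n_bins)]
    for delta_e, dfa in zip(delta_es, selected):
        i = bin_index(delta_e, n_bins=n_bins)
        if i is not None:
            table[i][dfa] += 1
    return table


# Hypothetical reference energies (kcal/mol) and recommended DFAs for five complexes.
delta_es = [-95.0, -72.5, -15.0, -12.0, -18.0]
selected = ["MN15-L:50%", "MN15-L:50%", "M06-L:40%", "M06-L:40%", "PBE:30%"]
table = selection_by_bin(delta_es, selected)
for i, counts in enumerate(table):
    if counts:
        print(i, counts.most_common(1)[0][0], dict(counts))
```

Normalizing each Counter by its total gives the stacked fractions of the kind plotted in Fig. 4a.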
Fig. 3 | Analysis of TL model focus using virtual adversarial attack. a, The cis
Co(II)(SH2)4(SCN)2 molecule (left) where the sphere radius of each atom is proportional to its
unsigned average of rvadv and its normalized rvadv (right) with the atom type shown at the start of
each row. Only the average rvadv over each atom type is shown, since the differences are small
within the same atom type due to the symmetry of this complex. The number of non-zero
elements differs in rvadv since the basis set size varies for different atom types. All atoms are
colored as follows: pink for Co, yellow for S, gray for C, blue for N, and white for H. b, Model
focus decomposed to metal locality for select DFAs in BLYP family (blue for BLYP, red for
B3LYP, green for BLYP:50%, and orange for B2GP-PLYP). Model focus of an untrained (i.e.,
randomly initialized) TL model is also shown as a comparison. Error bars represent the standard
deviation across different TMCs in VSS-452 set.
Fig. 4 | Recommended DFAs by ligand field strength. a, Stacked normalized histogram for the
recommender-selected DFA by DLPNO-CCSD(T) ΔEH–L with a bin width of 10 kcal/mol. b,
|ΔΔEH–L| MAE for the same DFAs (circles colored as in top legend) at different ranges of
DLPNO-CCSD(T) ΔEH–L grouped by the same set of bins in a. If a DFA is never selected in a
range, it is shown with a horizontal bar instead of a circle. In both a and b, the DFA that is most
frequently selected in a range is marked with a black solid outline. For ease of visualization, we
show the recommender results on the set-aside test set of VSS-452 with only five DFAs (blue for
PBE:30%, red for SCAN:40%, green for M06-L:40%, orange for MN15-L:50%, gray for
BLYP:50%) as candidates.
Transferability of the DFA recommender on diverse CSD complexes. A more challenging
test of the DFA recommender is its application to chemically distinct out-of-distribution
complexes. For this purpose, we construct CSD-76, a set of 76 TMCs randomly sampled from
the Cambridge Structural Database (CSD)21 that contain diverse ligand chemistry, symmetry, and
connectivity (Supplementary Fig. 9). Since all these complexes have been experimentally
synthesized and crystallized, they test the DFA recommender on a realistic task for exploring
transition metal chemical space. Without seeing any TMCs in CSD-76, the 48 TL models have
MAEs of predicting |ΔΔEH–L[f]| that range from 3.1 to 6.6 kcal/mol, with a median of 4.5
kcal/mol (Fig. 5a). These MAEs are < 2 times those on the set-aside test set of VSS-452. While
there is some accuracy degradation, the transferability improves significantly over chemical-
composition-based representations that often yield > 5 times MAE on out-of-distribution CSD
data22. We ascribe the great transferability of our TL models to the use of electron density as
inputs, which is a more fundamental property and can, in principle, dictate all ground state
properties of a system.
Using the predicted |ΔΔEH–L[f]| from all 48 TL models, the recommender has an MAE of
3.0 kcal/mol, which still slightly outperforms the best TL model (3.1 kcal/mol from M06-2X, Fig.
5a). The recommender MAE on CSD-76 is only 1.5 times that on VSS-452 and is still within the
threshold of accuracy required for transition metal chemical discovery compared to experimental
uncertainties on measuring thermodynamic properties16. Despite the diverse and unseen
chemistry present in CSD-76, the recommender still selects DFAs with < 3 kcal/mol error 60%
of the time and < 5 kcal/mol error most of the time (82%), demonstrating its great transferability
(Fig. 5b). These observations are particularly encouraging as the TL models’ MAEs are more
varied on CSD-76 compared to VSS-452 (Fig. 5a). Moreover, the TL model MAE rankings by
dataset (e.g. VSS-452 vs. CSD-76) are very different (Supplementary Figs. 4 and 10). For
example, the DSD-PBEB95-D3BJ TL model gives the lowest MAE of 2.3 kcal/mol on VSS-452
but a rather high MAE of 4.6 kcal/mol (i.e., the median of 48 TL models) on CSD-76. This
highlights the robustness of the DFA recommender over the conventional TL approach, where
the DFA recommender always gives lower MAEs regardless of the distributions and rankings of TL
models built on different DFAs.
We next compare the statistics of recommended DFAs for the out-of-distribution
CSD-76 and the set-aside test set of VSS-452. The recommender gives comparable
ranking statistics of selected DFAs on CSD-76: 62% of the recommended DFAs are in the top 5
and 86% are in the top 10 (Supplementary Fig. 12). If we insisted on using the single “best” DFA
benchmarked on the VSS-452 set (i.e., DSD-BLYP-D3BJ) for exploring CSD chemical space, we
would have a 5.90 kcal/mol MAE. Moreover, this functional choice is only the actual top-5-
performing DFA 28% of the time over the CSD-76 set. This observation, again, demonstrates the
robustness and transferability of the recommender approach over both the conventional
benchmark and TL approach on realistic chemical discovery.
Similar to the case of VSS-452, the DFA recommender is also able to identify the top-
performing DFAs and correctly predict the relative likelihood of a DFA to be accurate for CSD-
76 (Fig. 5c). For example, it successfully identifies M06 as the most probable DFA to reside in
the top-5, despite a slight overestimate of its likelihood (i.e., 64%) compared to the ground truth
of 56%. In addition, the recommender maintains the high rank ordering (Spearman's r = 0.90) of
probable top-5 DFAs relative to the ground truth on the CSD-76 set, demonstrating its great
transferability to unseen chemistry.
Due to the drastically different chemistry present in VSS-452 and CSD-76, it is no
surprise that the top-performing DFAs will vary for the two data sets (Supplementary Table 3,
Figs. 3 and 10). MN15-L:50%, the most probable (45%) DFA residing in the top-5 choices for
VSS-452, only has a 17% likelihood of being in top-5 for CSD-76. Meanwhile, M06, which is a
top-5 DFA only 30% of the time for VSS-452, becomes the most probable DFA to reside in the
top-5 with a probability of 56%. Although the recommender does not have this prior knowledge
for the two data sets, it still captures the trend well and selects M06 much more often than
MN15-L:50% for TMCs in CSD-76 (Fig. 5c). The recommender also “intelligently” down-selects
DFAs that only perform well on VSS-452 (e.g., SCAN:40%, MN15-L:40%, M06-L:40%) and
up-selects DFAs that would perform well on CSD-76 (e.g., LRC-ωPBEh, SCAN0, PBE:20%).
These observations suggest that our DFA recommender can be reliably applied to explore
diverse transition metal chemical spaces with high accuracy.
Fig. 5 | Performance of TL models and the recommender on CSD-76 set. a, Box plot for
MAE of 48 TL models for the prediction of |ΔΔEH–L| for both the set-aside test set of VSS-452
and the CSD-76 set. The median of 48 TL models (solid line) and the recommender MAE (blue
star) are also shown. For each data set, an example complex is shown: trans
Mn(II)(C4H4O4)4(H2O)2 for VSS-452 (left) and a hexadentate Fe complex (refcode: KIJNEW) for
CSD-76 (right). All atoms are colored as follows: purple for Mn, orange for Fe, yellow for S,
gray for C, blue for N, red for O, and white for H. b, Percentage for the DFA recommender to
have errors below certain thresholds (1, 3, 5, or 10 kcal/mol) for the set-aside test set of VSS-452
(red circles) and CSD-76 (gray bars). c, Percentage likelihood of a DFA residing in the top-5
choices suggested by ground truth (green) and our recommender approach (blue) for the CSD-76
set, where DFAs are sorted in a descending order of the predicted likelihood of the recommender.
Percentage likelihood obtained by the recommender on the set-aside test set of VSS-452 (red
circles) is also shown as a comparison.
DFT has become indispensable in both mechanistic studies and accelerated, automated
chemical or materials discovery. Its accuracy, however, can highly depend on the choice of DFA.
The single-DFA approach widely used in VHTS leads to bias in data acquisition, and expert
knowledge and heuristics cannot be expected to be predictive of a single best DFA across large
chemical spaces. In this work, we developed a general recommender approach to select DFAs in
a system-specific manner. Distinct from traditional classification tasks where the label is certain,
the “best” DFA can be expected to be ambiguous due to the similarities between candidate DFAs.
We devise a “regress-then-classify” strategy to select DFAs with low error instead of forcing a
model to directly classify the “best” DFA. By partitioning the electron density difference onto
each atom within a system, we build Behler–Parrinello-type neural networks for transfer learning
the differences between a DFA and the coupled cluster reference. The recommender then selects
the DFA that gives the lowest predicted difference from the reference.
We demonstrated this recommender approach on evaluating the vertical spin splitting
energy of open shell transition metal complexes. Trained only on 300 TMCs with common
monodentate ligands, our recommender achieves an accuracy of 2.1 kcal/mol, outperforming
both the conventional, single DFA and TL approach. This recommender also accurately captures
the rank ordering (Spearman’s r=0.96) of the likelihood of a DFA residing in the top-5 choices
relative to the ground truth. When directly applied on experimentally synthesized complexes
with diverse and unseen ligand chemistry and symmetry, the recommender maintains its stellar
performance despite the fact that the top-performing DFAs for the CSD complexes are
significantly different from the top-performing DFAs in the training data. The recommender still
provides the accuracy needed for transition metal chemistry exploration (i.e., MAE=3.0 kcal/mol)
and is able to select top-5 DFAs 62% of the time and capture the rank ordering (Spearman’s
r=0.90) of the likelihood of a DFA residing in the top-5 choices.
The recommender approach has two limitations in its current implementation. First, since
it uses the B3LYP electron density as inputs, this recommender is not "zero cost", and its
advantage is therefore greatest in transfer learning tasks where a property prediction with
"beyond DFT" accuracy is needed. The use of ML-predicted density, semi-empirical densities, or
guess densities (e.g., superposition of atomic potentials) can reduce the cost further. Second, a
DFA may not be universally accurate across all properties for a system. Generalization of the
current recommender may include redefining the loss function in the TL models to explicitly
encode all relevant objectives.
In its present form, this recommender approach does not introduce additional
computational cost when combined with existing DFT-based VHTS workflows that natively
output an optimized geometry and electron density of a molecule. Therefore, it can be directly
used in conjunction with traditional VHTS for improving the data quality from VHTS at no
additional cost. In addition, our recommender approach is not restricted to predicting a single
electronic energy of a molecule and thus can be generalized to more complex applications such
as catalysis. Although we demonstrate the recommender to select from a pool of conventional
DFAs, it is a general approach for method selection, including among semi-empirical theories,
ML-derived DFAs, or wavefunction theories. We expect this recommender approach to be
broadly useful in light of continuing advances in the methods available in the computational
Density fitting procedure. In Kohn–Sham (KS) DFT, it is known that the ground state energy
of any interacting system is captured by a universal functional of the electron density23. In
practice, the electron density $\rho(\mathbf{r})$ is obtained from the occupied KS orbitals
$\psi_i(\mathbf{r})$, expanded as a linear combination of the products of one-electron basis
functions $\chi_\mu(\mathbf{r})$,
$$\rho(\mathbf{r}) = \sum_i |\psi_i(\mathbf{r})|^2 = \sum_{\mu\nu} D_{\mu\nu}\, \chi_\mu(\mathbf{r})\, \chi_\nu(\mathbf{r}) \qquad (1)$$
where D is the density matrix and $\mu$, $\nu$ are indices for one-electron basis functions. The
electron density in Eq. 1, however, is not expressed in an atom-centered basis and thus cannot be
directly used as a representation in neural networks. Thus, it is common to use density-fitting (DF)
basis functions to rewrite the electron density as an expansion of atom-centered densities,
$$\rho(\mathbf{r}) \approx \sum_A \sum_Q C_Q^A\, \phi_Q^A(\mathbf{r} - \mathbf{r}_A) \qquad (2)$$
where $\phi_Q^A(\mathbf{r} - \mathbf{r}_A)$ is the Qth DF basis function for atom A13. However, $C_Q^A$
contains elements resulting from DF basis sets where the angular momentum is nonzero (L ≠ 0) and
is thus not rotationally invariant. To obtain a rotationally invariant representation, we calculated
the power spectrum of $C_Q^A$ as the norm for each angular momentum L in the DF basis set.
The resulting power spectrum satisfies rotational, translational, and permutation symmetry and
correspondingly can be used as a set of features for any neural network architecture (Fig. 1b).
These features then represent the chemical environment of atom A. For this procedure, we employ
only the density obtained from B3LYP, regardless of which functional is being studied in the TL models.
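The rotationally invariant features described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the fitted coefficients of one atom are stored as a flat list ordered shell by shell, that a shell of angular momentum L carries 2L + 1 coefficients, and that one squared norm is taken per shell. The function name `power_spectrum` and the data layout are hypothetical.

```python
from typing import List, Sequence


def power_spectrum(coeffs: Sequence[float], shell_l: Sequence[int]) -> List[float]:
    """Return one rotation-invariant value per auxiliary shell.

    coeffs  : DF coefficients of a single atom, ordered shell by shell
              (hypothetical layout; a shell with angular momentum L
              contributes 2L + 1 consecutive entries).
    shell_l : angular momentum L of each shell, in the same order.
    """
    expected = sum(2 * l + 1 for l in shell_l)
    if len(coeffs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coeffs)}")

    features = []
    start = 0
    for l in shell_l:
        width = 2 * l + 1
        block = coeffs[start:start + width]
        # Squared norm over the 2L + 1 components of this shell; a rotation
        # mixes these components orthogonally and leaves the norm unchanged.
        features.append(sum(c * c for c in block))
        start += width
    return features


# One s shell (1 coefficient) followed by one p shell (3 coefficients).
example = power_spectrum([0.5, 1.0, 2.0, 2.0], shell_l=[0, 1])
```

Permuting the three p-shell components, which mimics a 90° rotation, leaves the second feature unchanged, which is the invariance the text relies on.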
Here, we consider the vertical spin-splitting energy $\Delta E_{\mathrm{H-L}}$ as our property of interest,
which is the electronic energy difference between the high-spin (HS) and low-spin (LS) states of
open-shell TMCs. We focus on the spin-splitting energy because it identifies the quantum
mechanical ground state, which is an essential property of an open-shell system4. Because our
target property involves two distinct spin states in open-shell systems, we decomposed the
difference between the HS and LS electron densities for both the majority spin and minority spin
separately (Fig. 1a).
$$\Delta\rho^{\alpha}(\mathbf{r}) = \rho^{\alpha}_{\mathrm{HS}}(\mathbf{r}) - \rho^{\alpha}_{\mathrm{LS}}(\mathbf{r}) \approx \sum_A \sum_Q C_Q^{(A,\alpha)}\, \phi_Q^A(\mathbf{r} - \mathbf{r}_A)$$
$$\Delta\rho^{\beta}(\mathbf{r}) = \rho^{\beta}_{\mathrm{HS}}(\mathbf{r}) - \rho^{\beta}_{\mathrm{LS}}(\mathbf{r}) \approx \sum_A \sum_Q C_Q^{(A,\beta)}\, \phi_Q^A(\mathbf{r} - \mathbf{r}_A)$$
Here $\alpha$ and $\beta$ label the majority and minority spin, respectively. For an atom A, we
obtained and concatenated the power spectra of $C_Q^{(A,\alpha)}$ and $C_Q^{(A,\beta)}$ to obtain its
features using Eq. 3. We used the def2-universal-jkfit24 as our DF basis set throughout this work.
The number of DF features for different atoms can vary due to the differences in the auxiliary
basis functions used by atoms. Here, we zero-padded the DF features for all atoms to the
maximum dimension of 58 for each atom density, which is the size of the DF basis set for the
transition metal atom (i.e., Cr, Mn, Fe, or Co) in a TMC.
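The padding and concatenation step can be illustrated with a short sketch. It assumes that the power spectra of the majority-spin and minority-spin density differences are each padded to 58 entries before being joined; the helper names `zero_pad` and `atom_features` are hypothetical, and the reading of "58 for each atom density" as "58 per spin channel" is an interpretation of the text.

```python
from typing import List, Sequence

MAX_DIM = 58  # padded length per atom density, as stated in the text


def zero_pad(features: Sequence[float], size: int = MAX_DIM) -> List[float]:
    """Append zeros so that every atom has a feature vector of equal length."""
    if len(features) > size:
        raise ValueError(f"feature vector of length {len(features)} exceeds {size}")
    return list(features) + [0.0] * (size - len(features))


def atom_features(ps_majority: Sequence[float], ps_minority: Sequence[float],
                  size: int = MAX_DIM) -> List[float]:
    """Concatenate the padded majority- and minority-spin power spectra."""
    return zero_pad(ps_majority, size) + zero_pad(ps_minority, size)


# A light atom with only a few auxiliary shells (made-up numbers).
x0 = atom_features([0.25, 9.0], [0.1])
```

With these assumptions every atom is described by 2 × 58 = 116 numbers, regardless of how many auxiliary functions its element actually has.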
Behler–Parrinello-type neural networks for transfer learning. We built Behler–Parrinello-type
neural networks using the DF representation of the TMCs in this work25. These fully-connected
neural networks used the DF representation of each atom as inputs,
$$X_A^{l+1} = \sigma\left(W_g^{l} X_A^{l}\right)$$
where $X_A^l$ is the representation of atom A at layer l, $W_g^l$ is the lth-layer weights for the
network of elements in group g, and $\sigma$ is the activation function. Specifically, $X_A^0$ is the
set of concatenated DF features of atom A (see density fitting procedure). The last-layer outputs
of the network, $X_A^n$, are summed for each chemical element (e),
$$X_e^n = \sum_{A \in e} X_A^n$$
The $X_e^n$ of different elements are then concatenated and passed to a fully-connected neural
network to obtain the final output (Fig. 1c).
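The data flow of this architecture (group-shared local networks, per-element summation, concatenation) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' PyTorch model: layers have no bias term, `math.tanh` stands in for the activation function, the weights are toy identity matrices, and all names (`local_network`, `element_latents`, `concatenate`) are hypothetical. The final fully-connected readout network is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def local_network(x: Sequence[float], layers: Sequence[Matrix],
                  activation: Callable[[float], float]) -> Vector:
    """Apply x <- activation(W x) once per layer (no bias; an assumption)."""
    out = list(x)
    for w in layers:
        out = [activation(v) for v in matvec(w, out)]
    return out


def element_latents(atoms: Sequence[Tuple[str, Vector]],
                    group_of: Dict[str, str],
                    group_layers: Dict[str, List[Matrix]],
                    activation: Callable[[float], float]) -> Dict[str, Vector]:
    """Run each atom through its group's network and sum outputs per element."""
    latents: Dict[str, Vector] = {}
    for element, x0 in atoms:
        xn = local_network(x0, group_layers[group_of[element]], activation)
        if element in latents:
            latents[element] = [a + b for a, b in zip(latents[element], xn)]
        else:
            latents[element] = xn
    return latents


def concatenate(latents: Dict[str, Vector], element_order: Sequence[str],
                latent_dim: int) -> Vector:
    """Concatenate per-element latents; absent elements contribute zeros."""
    out: Vector = []
    for element in element_order:
        out.extend(latents.get(element, [0.0] * latent_dim))
    return out


# Toy complex with 2 features per atom; O and S share one network (same group).
identity = [[1.0, 0.0], [0.0, 1.0]]
group_of = {"Fe": "group 8", "O": "group 16", "S": "group 16"}
group_layers = {"group 8": [identity], "group 16": [identity]}
atoms = [("Fe", [1.0, 0.0]), ("O", [0.5, 0.5]), ("O", [0.5, 0.5]), ("S", [0.0, 1.0])]

# tanh is a stand-in activation chosen only for this illustration.
latents = element_latents(atoms, group_of, group_layers, activation=math.tanh)
readout_input = concatenate(latents, ["Fe", "O", "S", "N"], latent_dim=2)
```

Because atoms of the same element are summed before concatenation, the readout input has a fixed length and does not depend on the order in which atoms are listed.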
Our model has three main differences from the original Behler–Parrinello neural network.
First, we replace the symmetry functions that describe the local geometric environment of an
atom therein by the DF representation, which is derived from the electron density and is thus a
more transferable representation. Second, we use the same local network for chemical elements
that are in the same group of the periodic table (e.g., O and S) to promote inter-row learning26.
Lastly, we keep the latent vector $X_e^n$ for each element and use a neural network to obtain the final
output because our final target is not a single electronic energy of the ground state.
We adopted TL strategies and chose our target to be the absolute difference of vertical
spin-splitting energies between the result from each DFA (f) and a reference calculation
($|\Delta\Delta E_{\mathrm{H-L}}[f]|$). For each fully-connected neural network, we used three hidden layers
and 96 neurons per layer. The shifted softplus activation function,
$\sigma(x) = \ln(0.5 e^{x} + 0.5)$, is used throughout.
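For reference, a shifted softplus can be evaluated as in the sketch below. It assumes the commonly used definition ln(0.5·exp(x) + 0.5), that is, the softplus function shifted down by ln 2 so that it passes through the origin; whether the authors' code uses exactly this form is an assumption.

```python
import math


def shifted_softplus(x: float) -> float:
    """Evaluate ln(0.5 * exp(x) + 0.5) without overflow for large |x|.

    softplus(x) = ln(1 + exp(x)) is rewritten as max(x, 0) + log1p(exp(-|x|)),
    and ln 2 is subtracted so that the function is zero at x = 0.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) - math.log(2.0)


values = [shifted_softplus(x) for x in (-1000.0, 0.0, 1000.0)]
```

The function tends to −ln 2 for very negative inputs and to x − ln 2 for very positive inputs, so it behaves like a smooth, shifted rectifier.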
Recommender. We constructed separate TL models for each DFA (f) to predict
$|\Delta\Delta E_{\mathrm{H-L}}[f]|$ from a pre-selected pool of DFAs (F). For a given system, we recommend
the DFA, $f_{\mathrm{rec}}$, that yields the lowest predicted $|\Delta\Delta E_{\mathrm{H-L}}[f]|$,
$$f_{\mathrm{rec}} = \underset{f \in F}{\arg\min}\; |\Delta\Delta E_{\mathrm{H-L}}[f]|$$
When we evaluate the practical performance of the DFA recommender, we focus on the absolute
error introduced by using $f_{\mathrm{rec}}$ relative to the reference method (i.e.,
$|\Delta\Delta E_{\mathrm{H-L}}[f_{\mathrm{rec}}]|$) and the actual ranking of $f_{\mathrm{rec}}$ among the pool of DFAs.
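The selection rule and the two evaluation quantities can be written compactly. The sketch below is illustrative only: the functional names are real DFAs but the error values are made up, and the helpers `recommend` and `true_rank` are hypothetical.

```python
from typing import Dict


def recommend(predicted_abs_error: Dict[str, float]) -> str:
    """Return the DFA with the lowest predicted absolute error."""
    return min(predicted_abs_error, key=predicted_abs_error.get)


def true_rank(dfa: str, true_abs_error: Dict[str, float]) -> int:
    """1-based rank of `dfa` by actual absolute error (1 = most accurate)."""
    ordered = sorted(true_abs_error, key=true_abs_error.get)
    return ordered.index(dfa) + 1


# Made-up numbers in kcal/mol for a single complex and a pool of three DFAs.
predicted = {"B3LYP": 4.1, "PBE0": 2.3, "SCAN": 6.0}   # TL model outputs
actual = {"B3LYP": 1.9, "PBE0": 2.8, "SCAN": 7.5}      # errors vs. reference

f_rec = recommend(predicted)
error_incurred = actual[f_rec]
rank_of_f_rec = true_rank(f_rec, actual)
```

In this toy case the recommender picks the second-best functional, so the error incurred is the actual error of that functional rather than the (lower) predicted one.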
Data set construction. Mononuclear octahedral TMCs with Cr, Mn, Fe, and Co in oxidation
states II and III were studied in their HS and LS states: quintet and singlet for d6 Co(III)/Fe(II)
and d4 Mn(III)/Cr(II); sextet and doublet for d5 Fe(III)/Mn(II); and quartet and doublet for d3
Cr(III) and d7 Co(II). For VSS-452, we used 20 monodentate ligands from both the
spectrochemical series and common organic ligands to obtain properties of complexes with
ligand fields ranging from weak to strong (Supplementary Fig. 2). We allowed up to two unique
ligands in a TMC and did not impose any constraints on ligand symmetry. Together with eight
metal–oxidation state combinations and 20 ligands, this rule of assembling TMCs leads to a
hypothetical space of 24,480 TMCs (8×20=160 homoleptic and 8×20×19×8=24,320 heteroleptic).
We used k-medoids sampling to obtain 750 TMCs in this space as our starting data set. To test
the transferability and practical usefulness of our recommender, we collected 100 experimentally
synthesized TMCs with diverse ligand chemistry and connectivity from CSD as the starting point
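The size of the hypothetical space quoted in this paragraph can be checked with a few lines of arithmetic. The final factor of 8 in the heteroleptic count is taken directly from the text; treating it as the number of distinct two-ligand arrangements per ordered ligand pair is an interpretation, and the constant names are hypothetical.

```python
N_METAL_OX = 8       # Cr, Mn, Fe, Co, each in oxidation states II and III
N_LIGANDS = 20       # monodentate ligands in the pool
N_ARRANGEMENTS = 8   # factor quoted in the text for two-ligand complexes

homoleptic = N_METAL_OX * N_LIGANDS
heteroleptic = N_METAL_OX * N_LIGANDS * (N_LIGANDS - 1) * N_ARRANGEMENTS
total = homoleptic + heteroleptic
```

The starting data set of 750 complexes therefore samples roughly 3% of this space.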
DFT geometry optimization. Since we are interested in vertical spin splitting, only one
structure needs to be geometry optimized. In this case, we chose to optimize only the HS state.
For each HS complex, a DFT geometry optimization with the B3LYP27 global hybrid functional
was carried out using a developer version of graphical processing unit (GPU)-accelerated
electronic structure code TeraChem28. The LANL2DZ effective core potential29 basis set was
used for metals and the 6-31G* basis24 for all other atoms. In all DFT geometry optimizations,
level shifting30 of 0.25 Ha on all virtual orbitals was employed. Initial geometries were
assembled by molSimplify31 for VSS-452 and were adopted from the crystal structure of CSD for
CSD-76. These geometries were optimized using the L-BFGS algorithm in
translation-rotation-internal coordinates (TRIC)32 to the default tolerances of
4.5 × 10⁻⁴ hartree/bohr for the maximum gradient and 10⁻⁶ hartree for the energy change between
steps. Because all HS TMCs
are open-shell, the unrestricted formalism was used for all geometry optimizations.
Geometry checks were applied to eliminate optimized structures that deviated from the
expected octahedral shape following previously established metrics without modification33.
Open-shell structures were also removed from the data set following established protocols if the
expectation value of the S² operator deviated from its expected value33 of S(S + 1) by >1
After these two filtering steps, we converged 452 HS TMCs for VSS-452 and 76 HS TMCs for
CSD-76 with good octahedral geometries and electronic structures (Supplementary Table 4).
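The spin-contamination filter can be sketched as below. It assumes that the deviation is measured as an absolute difference between the computed expectation value of S² and S(S + 1), with a threshold of 1; the function names are hypothetical and the geometry checks are not reproduced here.

```python
def expected_s2(multiplicity: int) -> float:
    """S(S + 1) for a state of spin multiplicity 2S + 1."""
    s = (multiplicity - 1) / 2.0
    return s * (s + 1.0)


def is_spin_contaminated(s2_value: float, multiplicity: int, tol: float = 1.0) -> bool:
    """True if <S^2> deviates from S(S + 1) by more than `tol`."""
    return abs(s2_value - expected_s2(multiplicity)) > tol


# Quintet (S = 2): S(S + 1) = 6. The <S^2> values below are made up.
keep = not is_spin_contaminated(6.05, multiplicity=5)
drop = is_spin_contaminated(7.30, multiplicity=5)
```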
Single-point energy calculation. We followed our established protocol for the computation of
HS and LS electronic energies with multiple DFAs for the optimized TMCs using a developer
version of Psi4 1.434. In this workflow, the converged wavefunction obtained from the B3LYP
geometry optimization was used as the initial guess for the single-point energy calculations with
other DFAs, thus maximizing the correspondence of the converged electronic state among all
DFAs and also reducing the computational cost.
The 23 DFAs used in the development of the protocol7 were chosen to be evenly
distributed among the rungs of “Jacob's ladder”35 (Supplementary Table 1). Practically, it has
been observed that there is a nearly linear change of chemical properties (e.g., spin splitting)
computed with a DFA at different fractions of HF exchange20. Therefore, we sampled the HF
exchange from 10% to 50% with an interval of 10% on five selected semi-local functionals (i.e.,
BLYP, PBE, SCAN, M06-L, and MN15-L). This procedure results in 25 additional DFAs
(Supplementary Table 2). Combined with the original 23 DFAs, we have a final pool of 48 DFAs.
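The construction of the 25 additional functionals is a simple Cartesian product, sketched here. The label format (for example `BLYP_HF10`) is hypothetical and is not the naming used in the Supplementary Tables.

```python
SEMI_LOCAL = ["BLYP", "PBE", "SCAN", "M06-L", "MN15-L"]
HF_PERCENT = range(10, 51, 10)   # 10%, 20%, 30%, 40%, 50% HF exchange
N_ORIGINAL = 23                  # DFAs from the original protocol

# One modified functional per (semi-local parent, HF fraction) pair.
additional = [f"{name}_HF{pct}" for name in SEMI_LOCAL for pct in HF_PERCENT]
pool_size = N_ORIGINAL + len(additional)
```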
CCSD(T) has been treated as the “gold standard” for quantum chemistry and is
frequently used as a benchmark for DFT17. Here, we used DLPNO-CCSD(T) (with T0 perturbative
triple correction15), which is a proxy for canonical CCSD(T), as our reference method due to the
sufficient accuracy of DLPNO-CCSD(T) on TMCs and the high computational cost of canonical
CCSD(T) for a large data set15. In addition, we expect our DFA recommender approach to be
general and have similar accuracy if reference data is derived from higher-level theory (e.g.,
phaseless auxiliary field quantum Monte-Carlo36) or experiments in the future. Both DFT and
DLPNO-CCSD(T) single-point energies for all non-singlet states were calculated with an
unrestricted formalism and for singlet states with a restricted formalism. All single-point energy
calculations were performed with a balanced polarized triple-zeta basis set def2-TZVP24.
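The spin states considered for each metal and oxidation state, together with the rule for choosing a restricted or unrestricted formalism, can be collected in a small lookup. The multiplicities are those listed in the data set construction paragraph; the dictionary and function names are hypothetical.

```python
# (metal, oxidation state) -> (high-spin multiplicity, low-spin multiplicity)
HS_LS_MULTIPLICITY = {
    ("Co", 3): (5, 1), ("Fe", 2): (5, 1),   # d6: quintet / singlet
    ("Mn", 3): (5, 1), ("Cr", 2): (5, 1),   # d4: quintet / singlet
    ("Fe", 3): (6, 2), ("Mn", 2): (6, 2),   # d5: sextet / doublet
    ("Cr", 3): (4, 2), ("Co", 2): (4, 2),   # d3 and d7: quartet / doublet
}


def formalism(multiplicity: int) -> str:
    """Singlets use a restricted formalism; all other states are unrestricted."""
    return "restricted" if multiplicity == 1 else "unrestricted"


hs, ls = HS_LS_MULTIPLICITY[("Fe", 2)]
fe2_formalisms = (formalism(hs), formalism(ls))
```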
Train/test partition and model training. We randomly partitioned VSS-452, with 300 points
(66%) as the training set and 152 points (34%) as the set-aside test set. For all TL models, the
hyperparameters were selected using HyperOpt37 with 200 evaluations, with 60 points of the
training set used as the validation data. All TL models were built with PyTorch38. All models
were trained with the Adam optimizer for up to 2000 epochs, using dropout and early stopping to
avoid over-fitting. We treated CSD-76 as the out-of-distribution test set, and thus no points in
CSD-76 were used at any stage of model training.
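The partition sizes described above can be reproduced with a seeded shuffle. This is a sketch only: the seed, the function name `partition`, and the choice to carve the 60 validation points from the start of the shuffled training indices are assumptions, not details given in the text.

```python
import random
from typing import List, Tuple


def partition(n_total: int = 452, n_train: int = 300, n_val: int = 60,
              seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly split indices into fitting, validation, and set-aside test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train_all = indices[:n_train]   # 300 training points
    test = indices[n_train:]        # 152 set-aside test points
    val = train_all[:n_val]         # 60 training points held out for validation
    fit = train_all[n_val:]         # remaining 240 points used for fitting
    return fit, val, test


fit_idx, val_idx, test_idx = partition()
```

The out-of-distribution CSD-76 set is deliberately absent from this split, since none of its points enter training or validation.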
(1) Coley, C. W.; Eyke, N. S.; Jensen, K. F. Autonomous Discovery in the Chemical Sciences
Part I: Progress. Angew Chem Int Ed Engl 2020, 59 (51), 22858-22893. DOI:
(2) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter,
D.; Skinner, D.; Ceder, G.; et al. Commentary: The Materials Project: A materials genome
approach to accelerating materials innovation. APL Materials 2013, 1 (1), 011002. DOI:
(3) Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine Learning for
Molecular and Materials Science. Nature 2018, 559 (7715), 547-555. DOI: 10.1038/s41586-018-
0337-2. Carleo, G.; Cirac, I.; Cranmer, K.; Daudet, L.; Schuld, M.; Tishby, N.; Vogt-Maranto, L.;
Zdeborová, L. Machine learning and the physical sciences. Reviews of Modern Physics 2019, 91
(4), 045002. DOI: 10.1103/RevModPhys.91.045002.
(4) Nandy, A.; Duan, C.; Taylor, M. G.; Liu, F.; Steeves, A. H.; Kulik, H. J. Computational
Discovery of Transition-metal Complexes: From High-throughput Screening to Machine
Learning. Chem Rev 2021, 121 (16), 9927-10000. DOI: 10.1021/acs.chemrev.1c00347.
(5) Cohen, A. J.; Mori-Sánchez, P.; Yang, W. Challenges for Density Functional Theory.
Chemical reviews 2012, 112 (1), 289-320. Mardirossian, N.; Head-Gordon, M. Thirty years of
density functional theory in computational chemistry: an overview and extensive assessment of
200 density functionals. Molecular Physics 2017, 115 (19), 2315-2372. DOI:
(6) Janesko, B. G. Replacing Hybrid Density Functional Theory: Motivation and Recent
Advances. Chemical Society Reviews 2021. DOI: 10.1039/d0cs01074j.
(7) Duan, C.; Chen, S.; Taylor, M. G.; Liu, F.; Kulik, H. J. Machine learning to tame divergent
density functional approximations: a new path to consensus materials design principles. Chem
Sci 2021, 12 (39), 13021-13036. DOI: 10.1039/d1sc03701c.
(8) Loipersberger, M.; Cabral, D. G. A.; Chu, D. B. K.; Head-Gordon, M. Mechanistic Insights
into Co and Fe Quaterpyridine-Based CO2 Reduction Catalysts: Metal–Ligand Orbital
Interaction as the Key Driving Force for Distinct Pathways. Journal of the American Chemical
Society 2021, 143 (2), 744-763. DOI: 10.1021/jacs.0c09380.
(9) Frey, N.; Soklaski, R.; Axelrod, S.; Samsi, S.; Gomez-Bombarelli, R.; Coley, C.; Gadepalli, V. Neural Scaling of Deep Chemical Models.
ChemRxiv. Cambridge: Cambridge Open Engage 2022.
(10) Zhang, L.; Han, J.; Wang, H.; Car, R.; E, W. Deep Potential Molecular Dynamics: A
Scalable Model with the Accuracy of Quantum Mechanics. Phys Rev Lett 2018, 120 (14),
143001. DOI: 10.1103/PhysRevLett.120.143001. Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1:
an extensible neural network potential with DFT accuracy at force field computational cost.
Chem Sci 2017, 8 (4), 3192-3203. DOI: 10.1039/c6sc05720a. Batzner, S.; Musaelian, A.; Sun, L.;
Geiger, M.; Mailoa, J. P.; Kornbluth, M.; Molinari, N.; Smidt, T. E.; Kozinsky, B. E(3)-
equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat
Commun 2022, 13 (1), 2453. DOI: 10.1038/s41467-022-29939-5.
(11) Dick, S.; Fernandez-Serra, M. Machine learning accurate exchange and correlation
functionals of the electronic density. Nat Commun 2020, 11 (1), 3509. DOI: 10.1038/s41467-
020-17265-7. Kirkpatrick, J.; McMorrow, B.; Turban, D. H. P.; Gaunt, A. L.; Spencer, J. S.;
Matthews, A.; Obika, A.; Thiry, L.; Fortunato, M.; Pfau, D.; et al. Pushing the frontiers of
density functionals by solving the fractional electron problem. Science 2021, 374 (6573), 1385-
1389. DOI: 10.1126/science.abj6511. Li, L.; Hoyer, S.; Pederson, R.; Sun, R.; Cubuk, E. D.;
Riley, P.; Burke, K. Kohn-Sham Equations as Regularizer: Building Prior Knowledge into
Machine-Learned Physics. Physical Review Letters 2021, 126 (3), 036401. DOI:
10.1103/PhysRevLett.126.036401. Ma, H.; Narayanaswamy, A.; Riley, P.; Li, L. Evolving
symbolic density functionals. arXiv, https://arxiv.org/abs/2203.02540 2022. DOI:
(12) McAnanama-Brereton, S.; Waller, M. P. Rational Density Functional Selection Using Game
Theory. Journal of Chemical Information and Modeling 2018, 58 (1), 61-67. DOI:
(13) Margraf, J. T.; Reuter, K. Pure non-local machine-learned density functional theory for
electron correlation. Nat Commun 2021, 12 (1), 344. DOI: 10.1038/s41467-020-20471-y. Grisafi,
A.; Fabrizio, A.; Meyer, B.; Wilkins, D. M.; Corminboeuf, C.; Ceriotti, M. Transferable
Machine-Learning Model of the Electron Density. ACS Cent Sci 2019, 5 (1), 57-64. DOI:
(14) He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. DOI:
(15) Floser, B. M.; Guo, Y.; Riplinger, C.; Tuczek, F.; Neese, F. Detailed Pair Natural Orbital-
Based Coupled Cluster Studies of Spin Crossover Energetics. J Chem Theory Comput 2020, 16
(4), 2224-2235. DOI: 10.1021/acs.jctc.9b01109.
(16) Jiang, W.; DeYonker, N. J.; Determan, J. J.; Wilson, A. K. Toward accurate theoretical
thermochemistry of first row transition metal complexes. J Phys Chem A 2012, 116 (2), 870-885.
(17) Mardirossian, N.; Head-Gordon, M. How Accurate Are the Minnesota Density Functionals
for Noncovalent Interactions, Isomerization Energies, Thermochemistry, and Barrier Heights
Involving Molecules Composed of Main-Group Elements? Journal of Chemical Theory and
Computation 2016, 12 (9), 4303-4325. DOI: 10.1021/acs.jctc.6b00637.
(18) Miyato, T.; Maeda, S. I.; Koyama, M.; Ishii, S. Virtual Adversarial Training: A
Regularization Method for Supervised and Semi-Supervised Learning. IEEE Trans Pattern Anal
Mach Intell 2019, 41 (8), 1979-1993. DOI: 10.1109/TPAMI.2018.2858821.
(19) Janet, J. P.; Kulik, H. J. Resolving transition metal chemical space: feature selection for
machine learning and structure-property relationships. The Journal of Physical Chemistry A 2017,
121 (46), 8939-8954.
(20) Liu, F.; Yang, T. H.; Yang, J.; Xu, E.; Bajaj, A.; Kulik, H. J. Bridging the
Homogeneous-Heterogeneous Divide: Modeling Spin for Reactivity in Single Atom Catalysis.
Frontiers in Chemistry 2019, 7, 219.
(21) Groom, C. R.; Bruno, I. J.; Lightfoot, M. P.; Ward, S. C. The Cambridge Structural
Database. Acta Crystallographica Section B-Structural Science Crystal Engineering and
Materials 2016, 72, 171-179. DOI: 10.1107/S2052520616003954.
(22) Janet, J. P.; Duan, C.; Yang, T. H.; Nandy, A.; Kulik, H. J. A Quantitative Uncertainty
Metric Controls Error in Neural Network-Driven Chemical Discovery. Chemical Science 2019,
10 (34), 7913-7922. DOI: 10.1039/c9sc02298h.
(23) Hohenberg, P.; Kohn, W. Inhomogeneous Electron Gas. Phys. Rev. 1964, 136 (3B),
B864-B871. DOI: 10.1103/PhysRev.136.B864. Kohn, W.; Sham, L. J. Self-Consistent Equations
Including Exchange and Correlation Effects. Phys. Rev. 1965, 140 (4A), A1133-A1138. DOI:
(24) Pritchard, B. P.; Altarawy, D.; Didier, B.; Gibson, T. D.; Windus, T. L. New Basis Set
Exchange: An Open, Up-to-Date Resource for the Molecular Sciences Community. J Chem Inf
Model 2019, 59 (11), 4814-4820. DOI: 10.1021/acs.jcim.9b00725.
(25) Behler, J.; Parrinello, M. Generalized neural-network representation of high-dimensional
potential-energy surfaces. Phys Rev Lett 2007, 98 (14), 146401. DOI:
(26) Harper, D. R.; Nandy, A.; Arunachalam, N.; Duan, C.; Janet, J. P.; Kulik, H. J.
Representations and strategies for transferable machine learning improve model performance in
chemical discovery. J Chem Phys 2022, 156 (7), 074101. DOI: 10.1063/5.0082964.
(27) Becke, A. D. Density-Functional Thermochemistry. III. The Role of Exact Exchange.
Journal of Chemical Physics 1993, 98 (7), 5648-5652. Stephens, P. J.; Devlin, F. J.;
Chabalowski, C. F.; Frisch, M. J. Ab Initio Calculation of Vibrational Absorption and Circular
Dichroism Spectra Using Density Functional Force Fields. The Journal of Physical Chemistry
1994, 98 (45), 11623-11627.
(28) Seritan, S.; Bannwarth, C.; Fales, B. S.; Hohenstein, E. G.; Isborn, C. M.; Kokkila-
Schumacher, S. I. L.; Li, X.; Liu, F.; Luehr, N.; Snyder Jr., J. W.; et al. TeraChem: A graphical
processing unit-accelerated electronic structure package for large-scale ab initio molecular
dynamics. WIREs Computational Molecular Science 2021, 11 (2), e1494. DOI:
https://doi.org/10.1002/wcms.1494. Ufimtsev, I. S.; Martinez, T. J. Quantum Chemistry on
Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First
Principles Molecular Dynamics. J Chem Theory Comput 2009, 5 (10), 2619-2628. DOI:
(29) Hay, P. J.; Wadt, W. R. Ab initio effective core potentials for molecular calculations.
Potentials for K to Au including the outermost core orbitals. The Journal of chemical physics
1985, 82 (1), 299-310.
(30) Saunders, V. R.; Hillier, I. H. A “Level–Shifting” method for converging closed shell
Hartree–Fock wave functions. International Journal of Quantum Chemistry 1973, 7 (4), 699-705.
(31) Ioannidis, E. I.; Gani, T. Z. H.; Kulik, H. J. molSimplify: A Toolkit for Automating
Discovery in Inorganic Chemistry. J. Comput. Chem. 2016, 37, 2106-2117. DOI:
(32) Wang, L.-P.; Song, C. Geometry optimization made simple with translation and rotation
coordinates. The Journal of Chemical Physics 2016, 144 (21), 214108.
(33) Duan, C.; Janet, J. P.; Liu, F.; Nandy, A.; Kulik, H. J. Learning from Failure: Predicting
Electronic Structure Calculation Outcomes with Machine Learning Models. Journal of Chemical
Theory and Computation 2019, 15 (4), 2331-2345. DOI: 10.1021/acs.jctc.9b00057.
(34) Smith, D. G. A.; Burns, L. A.; Simmonett, A. C.; Parrish, R. M.; Schieber, M. C.; Galvelis,
R.; Kraus, P.; Kruse, H.; Di Remigio, R.; Alenaizan, A.; et al. Psi4 1.4: Open-source software for
high-throughput quantum chemistry. J Chem Phys 2020, 152 (18), 184108. DOI:
(35) Perdew, J. P.; Schmidt, K. Jacob's Ladder of Density Functional Approximations for the
Exchange-Correlation Energy. Density Functional Theory and Its Application to Materials 2001,
(36) Shee, J.; Arthur, E. J.; Zhang, S.; Reichman, D. R.; Friesner, R. A. Phaseless Auxiliary-
Field Quantum Monte Carlo on Graphical Processing Units. J Chem Theory Comput 2018, 14 (8),
4109-4121. DOI: 10.1021/acs.jctc.8b00342.
(37) Bergstra, J.; Yamins, D.; Cox, D. D. Hyperopt: A Python Library for Optimizing the
Hyperparameters of Machine Learning Algorithms. In Proceedings of the 12th Python in science
conference, 2013; pp 13-20.
(38) PyTorch. https://pytorch.org/ (accessed July 7, 2022).
Supplementary Information of “A transferable recommender approach for selecting the best
density functional approximations in chemical discovery”
The following abbreviations are used in the main paper.
• DFA: Density functional approximation
• MAE: Mean absolute error
• VSS: Vertical spin splitting
• CSD: Cambridge Structural Database
• HF: Hartree-Fock
• LRC: Long-range correction
• RS: Range-separated
• GGA: Generalized gradient approximation
Supplementary Figure 1. Standard deviation of the absolute error for the top-5 DFAs.
Normalized distributions of standard deviation (std. dev.) of $|\Delta\Delta E_{\mathrm{H-L}}|$ for the top-5 DFAs. A kernel
density estimate (black) is also shown. It can be observed that the differences of $|\Delta\Delta E_{\mathrm{H-L}}|$ for the
top-5 DFAs are mostly small (i.e., to 1.0 kcal/mol) for TMCs in the VSS-452 set.
Supplementary Figure 2. 20 ligands used in VSS-452. The 20 small monodentate ligands
resembling those from the spectrochemical series that are used for the VSS-452 set. The coordinating
atom is shaded in gray for each ligand.
Supplementary Figure 3. MAEs derived from the 48 DFAs compared to DLPNO-CCSD(T).
MAE of $|\Delta\Delta E_{\mathrm{H-L}}|$ for the 48 DFAs of the VSS-452 set. The DFAs are sorted in ascending order
of their DFA-derived MAEs.
Supplementary Figure 4. MAEs derived from the 48 TL models compared to
DLPNO-CCSD(T). MAE of $|\Delta\Delta E_{\mathrm{H-L}}|$ for the 48 TL models of the set-aside test set of VSS-452.
The DFAs are sorted in ascending order of their TL MAEs.
Supplementary Figure 5. Theoretical bound of recommender approach with 48 DFAs. MAE
of $|\Delta\Delta E_{\mathrm{H-L}}|$ constrained by only selecting a random top-n DFA (black dots and solid line) among
the 48 DFAs. The lower bound (green line), upper bound (red line), and the feasible area (gray
shaded) are also shown when the DFA selection is constrained in top-n choices.
Supplementary Figure 6. Box plot for the performance of 5th-best DFA. MAE for the 5th-best
DFA of $|\Delta\Delta E_{\mathrm{H-L}}|$ with a box indicating their median (solid line) and mean (dashed line) for the VSS-
Supplementary Figure 7. Recommender rank-ordering performance with 48 DFAs.
Normalized distribution of the rank for selected DFA using our recommender approach, with the
cumulative percentage (blue solid line) shown according to the axis on the right. The cumulative
curve for a random guess (blue dashed line) is also shown.
Supplementary Figure 8. Recommender rank-ordering performance with 23 DFAs from
Duan et al.1. Normalized distribution of the rank for selected DFA using our recommender
approach, with the cumulative percentage (blue solid line) shown according to the axis on the right.
The cumulative curve for a random guess (blue dashed line) is also shown.
Supplementary Figure 9. Example TMCs in CSD-76. A Co complex with two tridentate ligands
(refcode: BEBJIB, left), a Co complex with one tetradentate and one bidentate ligand (refcode:
CEWTUT, middle), and a Fe complex with a hexadentate ligand (refcode: KIJNEW, right). All
atoms are colored as follows: pink for Co, orange for Fe, yellow for S, gray for C, blue for N, red
for O, and white for H.
Supplementary Figure 10. MAEs derived from the 48 TL models compared to
DLPNO-CCSD(T). MAE of $|\Delta\Delta E_{\mathrm{H-L}}|$ for the 48 TL models of the CSD-76 set. The DFAs are
sorted in ascending order of their TL MAEs.
Supplementary Figure 11. MAEs derived from the 48 DFAs compared to DLPNO-CCSD(T).
MAE of $|\Delta\Delta E_{\mathrm{H-L}}|$ for the 48 DFAs of the CSD-76 set. The DFAs are sorted in ascending order
of their DFA-derived MAEs.
Supplementary Figure 12. Comparison of ranking statistics for recommended DFAs of set-
aside test VSS-452 and CSD-76. Percentage for the DFA recommender to select top-n DFAs for
the set-aside test set of VSS-452 (red circles) and out-of-distribution CSD-76 (gray bars).
Table 1. Summary of 23 DFAs in the original work of Duan et al.1, including their rungs on
“Jacob’s ladder” of DFT, HF exchange fraction, LRC range-separation parameter (bohr⁻¹), MP2
correlation fraction, and whether empirical (i.e., D3) dispersion correction is included.
Table 2. Summary of the additional 25 DFAs compared to Duan et al.1, including their rungs on
“Jacob’s ladder” of DFT, Hartree–Fock (HF) exchange fraction, long-range correction (LRC)
range-separation parameter (bohr⁻¹), MP2 correlation fraction, and whether empirical (i.e., D3)
dispersion correction is included.
Table 3. Ranking of DFA-derived MAEs of 48 DFAs on VSS-452 and CSD-76.
Columns: MAE ranking in VSS-452; MAE ranking in CSD-76.
Table 4. Summary of data filtering statistics for VSS-452 and CSD-76.
converged with good geometry and
1. Duan, C.; Chen, S.; Taylor, M. G.; Liu, F.; Kulik, H. J., Machine learning to tame divergent density
functional approximations: a new path to consensus materials design principles. Chem Sci 2021, 12 (39),
2. Perdew, J. P., Density-Functional Approximation for the Correlation-Energy of the
Inhomogeneous Electron-Gas. Physical Review B 1986, 33 (12), 8822-8824.
3. Becke, A. D., Density-Functional Exchange-Energy Approximation with Correct Asymptotic-
Behavior. Physical Review A 1988, 38 (6), 3098-3100.
4. Devlin, F. J.; Finley, J. W.; Stephens, P. J.; Frisch, M. J., Ab-Initio Calculation of Vibrational
Absorption and Circular-Dichroism Spectra Using Density-Functional Force-Fields - a Comparison of
Local, Nonlocal, and Hybrid Density Functionals. Journal of Physical Chemistry 1995, 99 (46), 16883-
5. Miehlich, B.; Savin, A.; Stoll, H.; Preuss, H., Results Obtained with the Correlation-Energy Density
Functionals of Becke and Lee, Yang and Parr. Chemical Physics Letters 1989, 157 (3), 200-206.
6. Perdew, J. P.; Burke, K.; Ernzerhof, M., Generalized Gradient Approximation Made Simple.
Physical Review Letters 1996, 77 (18), 3865-3868.
7. Tao, J. M.; Perdew, J. P.; Staroverov, V. N.; Scuseria, G. E., Climbing the Density Functional
Ladder: Nonempirical Meta-Generalized Gradient Approximation Designed for Molecules and Solids.
Physical Review Letters 2003, 91 (14), 146401.
8. Sun, J. W.; Ruzsinszky, A.; Perdew, J. P., Strongly Constrained and Appropriately Normed
Semilocal Density Functional. Physical Review Letters 2015, 115 (3), 036402.
9. Zhao, Y.; Truhlar, D. G., A New Local Density Functional for Main-Group Thermochemistry,
Transition Metal Bonding, Thermochemical Kinetics, and Noncovalent Interactions. Journal of Chemical
Physics 2006, 125 (19), 194101.
10. Yu, H. S.; He, X.; Truhlar, D. G., MN15-L: A New Local Exchange-Correlation Functional for Kohn-
Sham Density Functional Theory with Broad Accuracy for Atoms, Molecules, and Solids. Journal of
Chemical Theory and Computation 2016, 12 (3), 1280-1293.
11. Becke, A. D., Density-Functional Thermochemistry. III. The Role of Exact Exchange. Journal of
Chemical Physics 1993, 98 (7), 5648-5652.
12. Lee, C.; Yang, W.; Parr, R. G., Development of the Colle-Salvetti Correlation-Energy Formula into
a Functional of the Electron Density. Physical Review B 1988, 37, 785-789.
13. Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J., Ab Initio Calculation of Vibrational
Absorption and Circular Dichroism Spectra Using Density Functional Force Fields. The Journal of Physical
Chemistry 1994, 98 (45), 11623-11627.
14. Perdew, J. P.; Chevary, J. A.; Vosko, S. H.; Jackson, K. A.; Pederson, M. R.; Singh, D. J.; Fiolhais, C.,
Atoms, Molecules, Solids, and Surfaces - Applications of the Generalized Gradient Approximation for
Exchange and Correlation. Physical Review B 1992, 46 (11), 6671-6687.
15. Adamo, C.; Barone, V., Toward Reliable Density Functional Methods Without Adjustable
Parameters: The PBE0 Model. Journal of Chemical Physics 1999, 110 (13), 6158-6170.
16. Chai, J. D.; Head-Gordon, M., Systematic Optimization of Long-Range Corrected Hybrid Density
Functionals. Journal of Chemical Physics 2008, 128 (8), 084106.
17. Rohrdanz, M. A.; Martins, K. M.; Herbert, J. M., A Long-Range-Corrected Density Functional That
Performs Well for Both Ground-State Properties and Time-Dependent Density Functional Theory
Excitation Energies, Including Charge-Transfer Excited States. Journal of Chemical Physics 2009, 130 (5),
18. Hui, K.; Chai, J. D., SCAN-based Hybrid and Double-Hybrid Density Functionals From Models
Without Fitted Parameters. Journal of Chemical Physics 2016, 144 (4), 044114.
19. Zhao, Y.; Truhlar, D. G., The M06 Suite of Density Functionals for Main Group Thermochemistry,
Thermochemical Kinetics, Noncovalent Interactions, Excited States, and Transition Elements: Two New
Functionals and Systematic Testing of Four M06-Class Functionals and 12 Other Functionals. Theoretical
Chemistry Accounts 2008, 120 (1-3), 215-241.
20. Yu, H. Y. S.; He, X.; Li, S. H. L.; Truhlar, D. G., MN15: A Kohn-Sham Global-Hybrid Exchange-
Correlation Density Functional With Broad Accuracy for Multi-Reference and Single-Reference Systems
and Noncovalent Interactions. Chemical Science 2016, 7 (9), 6278-6279.
21. Karton, A.; Tarnopolsky, A.; Lamere, J. F.; Schatz, G. C.; Martin, J. M. L., Highly Accurate First-
Principles Benchmark Data Sets for the Parametrization and Validation of Density Functional and Other
Approximate Methods. Derivation of a Robust, Generally Applicable, Double-Hybrid Functional for
Thermochemistry and Thermochemical Kinetics. Journal of Physical Chemistry A 2008, 112 (50), 12868-
22. Bremond, E.; Adamo, C., Seeking for Parameter-Free Double-Hybrid Functionals: The PBE0-DH
Model. Journal of Chemical Physics 2011, 135 (2), 024106.
23. Kozuch, S.; Martin, J. M. L., Spin-Component-Scaled Double Hybrids: An Extensive Search for the
Best Fifth-Rung Functionals Blending DFT and Perturbation Theory. Journal of Computational Chemistry
2013, 34 (27), 2327-2344.