All content in this area was uploaded by Michael S. Murillo on Aug 20, 2020
Machine Learning Discovery of Computational Model Efficacy Boundaries
Michael S. Murillo,1,* Mathieu Marciante,2,† and Liam G. Stanton3,‡
1Department of Computational Mathematics, Science and Engineering, Michigan State University,
East Lansing, Michigan 48824, USA
2CEA-DAM, DIF F-91297 Arpajon, France
3Department of Mathematics and Statistics, San José State University, San José, California 95192, USA
(Received 28 December 2019; accepted 30 July 2020; published 20 August 2020)
Computational models are formulated in hierarchies of variable fidelity, often with no quantitative rule
for defining the fidelity boundaries. We have constructed a dataset from a wide range of atomistic
computational models to reveal the accuracy boundary between higher-fidelity models and a simple, lower-
fidelity model. The symbolic decision boundary is discovered by optimizing a support vector machine on
the data through iterative feature engineering. This data-driven approach reveals two important results:
(i) a symbolic rule emerges that is independent of the algorithm, and (ii) the symbolic rule provides a deeper
understanding of the fidelity boundary. Specifically, our dataset is composed of radial distribution functions
from seven high-fidelity methods that cover wide ranges in the features (element, density, and temperature);
high-fidelity results are compared with a simple pair-potential model to discover the nonlinear combination
of the features, and the machine learning approach directly reveals the central role of atomic physics in
determining accuracy.
DOI: 10.1103/PhysRevLett.125.085503
Computational models of physical systems vary
markedly in accuracy and attainable scales. The costs
associated with high-fidelity (HF) models drive the need
for accurate surrogate models as well as methods that
combine fidelities [1–3]. Unfortunately, there are no simple
rules that determine the “fidelity boundary” among all
available models. Here, we construct a symbolic machine-
learning framework with the goal of discovering the fidelity
boundary between HF and low-fidelity (LF) computational
models. For our purposes, we employ HF models that
resolve atomic scales and include electronic-structure
methods that generate on-the-fly potentials. Such HF
models incur costs associated with shorter timescales
and length scales, reduced statistical convergence, and
fewer cases, among other difficulties. Choosing the optimal
fidelity level allows these costs to be minimized; in some
cases, the accessible physics phenomena can be qualita-
tively different when using a LF model. For example, the
number of particles used in HF models [4,5] is typically
many orders of magnitude lower than that of LF models
[6,7], and compromises can often be made [8] to access
important heterogeneous, nonequilibrium mesoscale
phenomena.
Machine learning (ML) offers a set of tools that
potentially provide novel approaches to solving such
problems. Increasingly, ML is being used to tackle a
wide range of problems in physics, including predicting
disruptions in burning plasmas [9], modeling atomization
energies [10], accelerating molecular dynamics (MD)
[11], enhancing many-body sampling techniques [12],
coarse-graining molecular force fields [13], learning coher-
ent structure from spatiotemporal data [14], and aiding
inertial-confinement-fusion experimental design [15],
among many others. Here, we propose to use ML not as
a deployable algorithm that can be used to make predic-
tions, but as a data-driven discovery framework that assigns
accuracy scores to our hypotheses, allowing us to discover
symbolic rules that are then independent of the specific ML
algorithms employed.
To date, most computational physics communities do not
generate and gather results with data science in mind. For
this reason, we constructed a dataset from the extant
literature, focusing on methods from the high energy-
density community because of the range of features
available, which are the element studied, the density, and
the temperature; in thermodynamic equilibrium for a single
species, these are the only three quantities needed.
The most commonly reported quantity is the equilibrium
ion-ion radial distribution function (RDF) $g(r)$; $g(r)$ values
were digitized, and the height of the first peak was used as
our metric for accuracy, as this is where the largest
deviation between the RDFs of two models will typically
occur. While other quantities could have been chosen, $g(r)$
plays a central role in determining most equilibrium
quantities, and its peak position and height are well studied,
with the height being the more sensitive of the two
quantities [16] for most materials. (The complete dataset
is available at GitHub [17].) One-hot encoding is used to
map the ratio of the peak heights into binary form, with 0
for inaccurate and 1 for accurate, for an accuracy target,
PHYSICAL REVIEW LETTERS 125, 085503 (2020)
0031-9007/20/125(8)/085503(6) 085503-1 © 2020 American Physical Society
which was taken to vary in the range 5%–15% in this work,
unless otherwise specified; this process converts the physi-
cal data into a classification problem. RDFs were obtained
from Kohn-Sham density functional theory molecular
dynamics (KS-DFT-MD) [18–24], orbital-free density
functional theory (DFT) [25–27], classical-map hyper-
netted chain [28,29], linear-response effective ions [30],
quantum Langevin MD [31], dynamically screened ion-ion
interactions [32], and quantum-statistical-potential MD
[33]. An initial exploration of the data revealed several
cases in which either no LF model would suffice (e.g., the
presence of molecular states) or there was an obvious error
(e.g., the RDF did not tend to unity), and these cases were
removed to leave 34 RDFs in our dataset. Our final
database reflected the diversity we desired to mitigate
inaccuracies in the data and fidelity variations among the
HF models.
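The labeling step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' code: the function name `accuracy_label`, the `tolerance` argument, and the peak heights in `examples` are hypothetical, and the ratio is taken as HF divided by LF, following the convention stated in the captions of Figs. 4 and 6.

```python
def accuracy_label(hf_peak, lf_peak, tolerance=0.05):
    """Return 1 (accurate) if the HF/LF first-peak-height ratio is within
    `tolerance` of unity, and 0 (inaccurate) otherwise."""
    ratio = hf_peak / lf_peak
    return 1 if abs(ratio - 1.0) <= tolerance else 0


# Hypothetical (HF, LF) first-peak heights; NOT values from the paper's dataset.
examples = [(1.50, 1.48), (2.10, 1.70), (1.20, 1.30)]

# Labels for a 5% accuracy target.
labels = [accuracy_label(hf, lf) for hf, lf in examples]

# The same data relabeled with a looser 10% accuracy target.
labels_10 = [accuracy_label(hf, lf, tolerance=0.10) for hf, lf in examples]
```

Changing `tolerance` between 0.05 and 0.15 reproduces the range of accuracy targets mentioned in the text, and shows how the same physical data yield different classification problems.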
Assessing fidelity requires a LF model, the simplest of
which is the Yukawa model, which is defined in terms of a
two-step process [8]. First, the physical domain of $N$ nuclei
is decomposed into $N$ spheres, each with the ion-sphere
radius $a = (3/4\pi n)^{1/3}$, where $n$ is the ion number density. An all-electron electronic structure
calculation is then performed around each central nucleus,
where, using a suitable definition, the electrons are de-
composed into separate densities that are either strongly or
weakly interacting with the nucleus. The strongly interact-
ing electrons are assumed to be localized near the nucleus,
and their impact is to convert the nuclear charge $Ze$ to an
ionic charge $\langle Z\rangle e$. Conversely, the weakly interacting
electrons are treated in a long-wavelength linear response
model to obtain the electronic screening cloud, with
screening length λ, around the ionic core. This procedure
yields the Yukawa ion-ion pair interaction energy between
ions
$$u_Y(r) = \frac{\langle Z\rangle^2 e^2}{r}\,\exp(-r/\lambda), \tag{1}$$
which we take as our LF model. In this work, we employed
the simplest choices for the Yukawa parameters, which are
the Thomas-Fermi values of $\langle Z\rangle$ and $\lambda$ [8]; our goal here is
not to develop a new pair potential, but to examine how to
establish a physical accuracy rule from data using the most
widely used LF model. Yukawa RDFs were computed
using standard pair-potential MD simulations.
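As a rough illustration of the two quantities defined above, the following Python sketch evaluates the ion-sphere radius and the Yukawa pair energy of Eq. (1). The function names are hypothetical, the mean ionization and the screening length are treated as user-supplied inputs (the Thomas-Fermi evaluation of these parameters is not reproduced here), and the choice of eV and angstrom units is an assumption made only for this example.

```python
import math

# e^2 in Gaussian units expressed in eV * angstrom (assumed unit system).
E2_EV_ANGSTROM = 14.399645


def ion_sphere_radius(n):
    """Ion-sphere radius a = (3 / (4 pi n))^(1/3) for ion number density n."""
    return (3.0 / (4.0 * math.pi * n)) ** (1.0 / 3.0)


def yukawa_energy(r, zbar, lam, e2=E2_EV_ANGSTROM):
    """Yukawa pair energy u_Y(r) = zbar^2 e^2 exp(-r / lam) / r.

    r    : ion-ion separation (same length unit as lam)
    zbar : mean ionization <Z> (input; not computed here)
    lam  : electronic screening length (input; not computed here)
    """
    return zbar**2 * e2 * math.exp(-r / lam) / r


# Hypothetical inputs, chosen only to exercise the functions.
a = ion_sphere_radius(n=0.06)            # n in ions per cubic angstrom
u = yukawa_energy(r=a, zbar=3.0, lam=0.5 * a)
```

In the limit of an infinite screening length the expression reduces to the bare Coulomb interaction between two ions of charge $\langle Z\rangle e$, which is a quick sanity check on any implementation.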
Two examples from the dataset are shown in Fig. 1.
Here, the HF methods KSMD [20] and QLMD [31] were
each used for two densities and temperatures. Note that the
hydrogen case is accurate for a very low temperature, but is
at an elevated density. In contrast, at much higher tempera-
tures, the Yukawa models fail to reproduce the iron results,
with moderate improvement at 10 eV. (More examples are
shown in the Supplemental Material [34].)
An alternative view of the dataset is visualized in Fig. 2.
Points are labeled as either accurate (red), where the LF
model agrees with the HF model (peak heights are within
5%), or inaccurate (blue), where the LF model does not
agree with the HF model. The upper left panel indicates that
our dataset has good coverage across temperature and
density, and that, perhaps surprisingly, no accuracy trend is
found in this plane. The next three panels reinforce this
conclusion by revealing that there is no trend in accuracy
versus temperature, density, or nuclear charge; therefore, it
is not possible to know the accuracy of the LF (Yukawa)
model based on any of these features alone.
Any ML classifier employing linear separability (a
vertical line for this 1D example) would fail; a better
approach would be to seek probability distributions using
logistic regression (LR); the LR predictions are shown as
FIG. 1. Example RDFs from the dataset: Representative RDFs
are shown for hydrogen [20] and iron [31] at various densities and
temperatures. Two curves are shown in each panel, corresponding
to the HF method (solid or black curve) and our base LF Yukawa
model (dashed or red curve).
FIG. 2. Trends in the dataset: Data points in the $T$-$\rho$ plane are
shown in the upper left plot, revealing good coverage within the
dataset. Red (larger) points and blue (smaller) points are accurate
and inaccurate, respectively, with accuracy defined here as
agreement in peak height within 5%. In the next three panels,
accuracy is plotted versus temperature, density, and nuclear
charge, showing that no simple rule for assessing accuracy
exists. The green curves show the results of a 1D (single-feature)
logistic regression. Note that some of the points overlap, which is
indicated through the intensity of the color.
solid green lines in Fig. 2. Because of the dearth of data,
these results are only notional, but they reveal the following
rough trends. The LR curve obtained using only the
temperature feature is moderately flat, and its trend is
dominated by a single data point. The density feature yields
a very flat probability distribution, indicating no predictive
power. Finally, the nuclear-charge feature is also moder-
ately flat, with a rough trend towards increased accuracy for
lower-$Z$ elements. (Alternate visualizations, and an application
to transport [31], are given in the Supplemental
Material [34].) We conclude that none of these three
features alone can predict the fidelity boundary and that
simple ML approaches are not particularly useful. Similar
studies were carried out in two dimensions, using pairs of
features, and in three dimensions, with similar results.
We developed a workflow to build new features in
higher-dimensional spaces. Our ML workflow is shown
in Fig. 3. The goal is to engineer features that yield
human-interpretable accuracy boundaries. We employ a
combination of feature engineering [35], feature selection
[36,37], and a linear classifier (see below) to create a
symbolic result [38]. To generate a physically meaningful
symbolic representation of the decision boundary, we begin
with the three basic features of temperature T, mass density
$\rho$, and nuclear charge $Z$ to form our basic feature set
$F_0 = \{T, \rho, Z\}$. As no additional physics information
exists beyond $F_0$, we engineer new features from $F_0$.
These new features are nonlinear combinations of those in
$F_0$, much like those generated in kernel methods. Note that
we employ only the three most obvious and most basic
features so as not to bias the method toward requiring
specific domain knowledge of this example application.
Because our goal is a symbolic classifier, we do not
employ nonlinear ML algorithms (e.g., kernel methods,
neural networks) [39]. Rather, we employ a linear support
vector machine (lSVM) to create a linear separability
boundary in the high-dimensional space of our engineered
features. The lSVM hyperparameter $C$ was optimized. The
lSVM coefficients are the weights of the nonlinear features, and we
use them to assign importance to each feature. The lSVM is used in a
workflow that uses cross validation (CV) and recursive
feature elimination (RFE). RFE ranks the importance of
each feature, and CV informs us of the quality of the
prediction. This scheme is an adaptation of the use of lSVM
with RFE to down-select feature spaces as a preprocessing
step for an expensive ML algorithm; here, by adding
new nonlinear features, this scheme is essentially reversed
to create additional features that have better performance.
CV guards against overfitting by learning from various
subsets of the data and predicting the remaining
data, thereby quantifying generalizability as part of the
workflow.
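A minimal sketch of such a workflow, written with scikit-learn [39], is given below. It is not the authors' implementation: the data are synthetic stand-ins (34 samples and six candidate features, with labels constructed from two of them), the grid of $C$ values is arbitrary, and the `C` used inside the RFE step is held fixed for simplicity. The sketch only shows how a linear SVM, RFE, and CV can be chained.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset): 34 samples, 6 candidate
# features; the labels depend on features 0 and 3 only.
X = rng.normal(size=(34, 6))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # RFE repeatedly drops the feature with the smallest lSVM weight.
    ("rfe", RFE(LinearSVC(C=1.0, dual=False, max_iter=10000),
                n_features_to_select=2)),
    # Final linear SVM trained on the surviving features.
    ("svm", LinearSVC(dual=False, max_iter=10000)),
])

# Cross-validated search over the lSVM hyperparameter C.
search = GridSearchCV(
    pipeline,
    param_grid={"svm__C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)

ranking = search.best_estimator_.named_steps["rfe"].ranking_
print("best C:", search.best_params_["svm__C"])
print("cross-validated accuracy:", round(search.best_score_, 2))
print("RFE ranking (1 = kept):", ranking)
```

Placing RFE inside the pipeline means the feature ranking is recomputed within every CV fold, so the reported accuracy is not contaminated by features selected using the held-out data.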
It is difficult to represent division in ML algorithms
[40,41], so we augment $F$ with inverses to extend
our feature set to $F_{\mathrm{base}} = \{T, \rho, \rho^{-1}, Z, Z^{-1}\}$. Feature
scaling was examined with no noticeable improvement
except for the replacement $T \to \log(T)$, yielding
$F = \{\log(T), \rho, \rho^{-1}, Z, Z^{-1}\}$. Because the logarithm of
$T^{-1}$ is trivially $-\log(T)$, we did not include $T^{-1}$ in the
feature set; thus, the three physical dimensions inherent
in $F_0$ are transformed into a 5D feature space. Next,
we construct all second-order polynomials from this
feature set to project into a much higher-dimensional
feature space containing all bilinear combinations
of the features and squares of the basic features;
for example, for the simplest case of $F_0$ we obtain
$F_{\mathrm{poly}} = \{1, T, \rho, Z, T^2, T\rho, TZ, \rho^2, \rho Z, Z^2\}$; importantly,
note that constants are included. Polynomial terms constructed
from the feature vector $F$ can be itemized
according to importance through RFE, which yields the
symbolic result we seek.
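The second-order expansion can be enumerated symbolically with a short, standard-library-only Python sketch. The helper `second_order_terms` and the string labels for the features are hypothetical names introduced for illustration; the code only lists which monomials appear and does not evaluate them on data.

```python
from itertools import combinations_with_replacement


def second_order_terms(names):
    """List all monomials of total degree <= 2 in the given feature names."""
    terms = ["1"]                 # constant term
    terms += list(names)          # linear terms
    for a, b in combinations_with_replacement(names, 2):
        # Squares are written a^2; mixed (bilinear) products are written a*b.
        terms.append(f"{a}^2" if a == b else f"{a}*{b}")
    return terms


# Simplest case: the three basic features.
poly_basic = second_order_terms(["T", "rho", "Z"])
print(poly_basic)

# The 5D feature set with inverses and log(T).
poly_full = second_order_terms(["logT", "rho", "1/rho", "Z", "1/Z"])
print(len(poly_full))
```

For the three basic features this gives ten terms, and for the 5D set it gives 21. Some products in the 5D case, such as a feature multiplied by its own inverse, are trivially constant, which is one reason a ranking step such as RFE is useful after the expansion.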
In practice, an iterative approach was used to find the
best combination of the basic features by updating the
feature vector based on the current best features:
$F_n \to F_{n+1}$. For example, RFE revealed that the square
of $\log(T)$ was a strong feature, and thus, the feature space
$F$ was updated to include this feature. This iterative
procedure, which we call “recursive feature updating”
(RFU), allows for higher-order powers to appear, retains
the best features, and forces new feature rankings.
Eventually, products such as $\log(T)/Z$ were identified as
strong features, and RFU led to the inequality
$$\xi = \frac{\log^2(T/\mathrm{eV})\,(\rho + 10)/(\mathrm{g/cm^3})}{Z} > 2.0, \tag{2}$$
which gave >90% accuracy on our dataset. The ratio of
peak heights is shown versus (2) in Fig. 4, which reveals
that there is a clear boundary that separates inaccurate
predictions for small values of ξ and accurate predictions
for larger values of ξ.
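The rule in Eq. (2) is simple enough to evaluate directly. The Python sketch below does so under stated assumptions: temperature is in eV, mass density is in g/cm^3, and the logarithm is taken to be the natural logarithm, which the text does not specify. The base is therefore exposed as the `log` argument. The function names and the sample inputs are hypothetical.

```python
import math


def xi(T_eV, rho_gcc, Z, log=math.log):
    """Feature of Eq. (2): log^2(T/eV) * (rho + 10)/(g/cm^3) / Z.

    The base of the logarithm is an assumption (natural log by default).
    """
    return log(T_eV) ** 2 * (rho_gcc + 10.0) / Z


def yukawa_expected_accurate(T_eV, rho_gcc, Z, threshold=2.0, log=math.log):
    """True when the rule xi > threshold predicts the LF model is accurate."""
    return xi(T_eV, rho_gcc, Z, log=log) > threshold


# Hypothetical conditions: aluminum-like Z and density at T = 10 eV.
natural = yukawa_expected_accurate(10.0, 2.7, 13)
base_ten = yukawa_expected_accurate(10.0, 2.7, 13, log=math.log10)
```

Comparing `natural` and `base_ten` shows that the prediction for a given state point can depend on the logarithm convention, so the convention should be confirmed against the published dataset before the rule is applied quantitatively.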
The decision boundary implied by ξ in temperature-
density space is shown in Fig. 5. In contrast to other
metrics, such as the Coulomb coupling or degeneracy
boundaries [42] that imply that very high temperatures are
required at high density, the temperature at which a LF
model becomes appropriate decreases with increasing density. This
FIG. 3. Machine-learning workflow: Our symbolic machine-
learning workflow is an iterative procedure that constructs the
best features from physical features (possibly scaled), their
inverses, and polynomial combinations. Recursive feature elimi-
nation is used to sort the quality of the features, which leads to a
new set of features.
result can be understood in the context of modern
computational methods in which MD simulations of
simple properties like $g(r)$ are now ubiquitous: the use
of MD “solves” the ionic strong-coupling “problem,”
which no longer adds to our uncertainty. Similarly, the use
of Thomas-Fermi inputs, which are widely available,
solves the high-density problem, because the Thomas-
Fermi model becomes more accurate at higher density.
Our RFU ML approach has naturally found these trends
from the data.
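The density and nuclear-charge trends can be made explicit by solving the boundary condition ξ = 2 for the temperature. The sketch below assumes a natural logarithm (not stated in the text), keeps only the branch with $T > 1$ eV, and uses the hypothetical function name `boundary_temperature`; the density values are arbitrary sample points rather than data from the paper.

```python
import math


def boundary_temperature(rho_gcc, Z, threshold=2.0, base=math.e):
    """Temperature (eV) at which xi equals `threshold`, on the T > 1 eV branch.

    From log^2(T) * (rho + 10) / Z = threshold it follows that
    log(T) = sqrt(threshold * Z / (rho + 10)). The log base is an assumption.
    """
    return base ** math.sqrt(threshold * Z / (rho_gcc + 10.0))


# Boundary curves for the three elements shown in Fig. 5.
for name, Z in [("H", 1), ("C", 6), ("Al", 13)]:
    curve = [boundary_temperature(rho, Z) for rho in (0.1, 1.0, 10.0, 100.0)]
    print(name, [round(T, 2) for T in curve])
```

Under these assumptions the boundary temperature falls monotonically with density and rises with nuclear charge, which matches the qualitative behavior described for Fig. 5.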
While the RFU-based ML approach described above
yields a symbolic separation boundary that can be applied
independently of the lSVM used to find it, we sought
further insight into the physics. The result (2) shows that
simpler computational methods can be used when the
temperature is high and the density is high and the nuclear
charge is low. This particular combination of features is
precisely what controls the mean ionization state (MIS)
[43] of the material.
To examine this potential finding, we again form a single
feature ζ and plot accuracy versus ζ in Fig. 6, which should
be compared with Fig. 4. From this figure, we find an
accuracy boundary of
$$\zeta = \frac{\langle Z\rangle}{Z} > 0.35. \tag{3}$$
Note that we use the fairly conservative definition of
accuracy of 10% agreement for the first peak height;
moreover, this result is conservative because some of the
fluctuations in Fig. 6 may be due to imperfect data (e.g., finite-
size errors) in the database. Taken together, the two
rules (2) and (3) lead to the conclusion that neither
temperature nor density alone, nor a combination of the
two, leads to an accuracy boundary for the Yukawa model,
but rather atomic physics: the rule states that if the material
is more than about one-third ionized ($\zeta > 0.35$), a much faster computational
model can be used. This result illustrates how the ML found
a physical feature that might have been used in the original
set of features, thereby empowering the ML with physics
guidance based on expert knowledge; here, we made no
attempt to bias the learning other than through the three
most basic features.
In summary, we have examined a framework in which
accuracy scores from ML can be used with feature
engineering and extraction to identify a symbolic boundary
using easily accessible ML libraries. To illustrate this
approach, we constructed a dataset consisting of RDFs
obtained using a wide variety of HF computational methods
and compared them with predictions from a LF model.
Simple analyses, such as LR, showed that the basic
physical features $\{Z, \rho, T\}$ are not predictive as unary
features or in pairs. More powerful ML approaches,
FIG. 5. Boundary in $T$-$\rho$ space: The decision boundary is
shown for three elements, hydrogen, carbon and aluminum, in the
temperature-density plane. LF models are expected to be accurate
above the line. These curves capture the obvious trends that LF
models are applicable for higher densities (Thomas-Fermi limit),
lower nuclear charges, and higher temperatures.
FIG. 6. Mean ionization state boundary: The ratio of $g(r)$ peak
heights (HF divided by LF) is shown versus the discovered
parameter ζ in Eq. (3). The colored bands indicate accuracy ranges of 5%,
10%, and 15%.
FIG. 4. Machine-learning boundary: The ratio of $g(r)$ peak
heights (HF divided by LF) is shown versus the discovered
parameter ξ in Eq. (2). The colored bands indicate accuracy
ranges of 5%, 10%, and 15%. The inequality for ξ in Eq. (2)
arises from drawing a vertical line near the erroneous points on
the left.
however, achieved a moderate accuracy in two dimensions
(considering pairs of features). In three dimensions, high
accuracy can be achieved with nonlinear ML algorithms,
although these algorithms do not reveal the decision
boundary in an interpretable way.
By considering various polynomial combinations of
features, including division, and excising weak features,
we find that the decision boundary is given symbolically as
$\log^2(T)(\rho + 10)/Z$. We find that this decision boundary is
closely connected to the MIS and propose a related
criterion $\zeta = \langle Z\rangle/Z$ that is based on atomic physics. The
reason that atomic physics (and ionization in particular) is
the key physics involved here is that all modern methods
naturally capture ionic strong coupling and, at high enough
temperature and/or density, the free electrons are captured
well in a Thomas-Fermi approximation. This finding
suggests that pair potentials that treat the bound electrons
with much higher fidelity [28] would potentially greatly
expand the Yukawa accuracy regime shown in Fig. 5,
allowing for significantly larger simulations with little cost
to accuracy; from an uncertainty quantification perspective
[44–46], highly converged pair-potential MD could com-
pete with HF methods in some cases. In particular, based on
the insensitivity of disparate models to the MIS [43] and to
gradient corrections in the screening [47], sensitivity to
atomic physics suggests that the most important improve-
ment to Yukawa would be a more refined pseudopotential.
For example, our original database was larger than we
present here, but many of the HF results were not properly
converged (e.g., too noisy to establish a peak height),
and we were unable to use such results. Through such
improved potentials with orders of magnitude more
particles and timesteps, qualitatively different hetero-
geneous, nonequilibrium studies [8] can be performed at
the mesoscale.
The results here suggest that a more concerted effort
should be made in the computational communities to
produce high-quality data. In particular, we found that
the density ρ was a generally weak feature, although it
appears linearly in our decision boundary. Unfortunately,
most results in the literature do not systematically explore
wide density variations and report RDFs across
those variations. For example, the MIS is not monotonic
in ρ [43], although the dataset we employed suggests that it
is; the low-density portion of Fig. 5 is likely the most
uncertain for these reasons. Ideally, more studies that vary
all features in $F_0$, such as a $\{T, \rho, Z\}$ grid of highly
converged HF RDFs and velocity autocorrelation functions
motivated by Fig. 5, would improve our ability to allow ML
techniques to improve our understanding of computational
techniques and the physics they address. Based on the
results of this work, we propose a dataset minimally of
the form $T = \{1, 5, 10, 20, 50\}$ eV, $Z = \{1, 4, 6, 13, 26\}$,
$\rho/\rho_0 = \{0.1, 0.5, 1, 2, 10\}$, where $\rho_0$ is the standard density
of the material. Most important are density variations,
which are less commonly explored in the current literature;
moreover, building databases with more challenging quan-
tities, such as the velocity autocorrelation function, would
further strengthen the quality of future ML studies. With a
concerted effort, using a wide range of interactions beyond
Yukawa to produce high-quality data, the workflow in
Fig. 3 can be adapted to a wider range of problems [48].
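For concreteness, the proposed minimal dataset can be enumerated as a full Cartesian grid. The following Python sketch uses only the standard library; the dictionary keys are hypothetical labels, and treating the proposal as a full tensor-product grid is an assumption about how the three lists are meant to be combined.

```python
from itertools import product

temperatures_eV = [1, 5, 10, 20, 50]
nuclear_charges = [1, 4, 6, 13, 26]
density_ratios = [0.1, 0.5, 1, 2, 10]   # rho / rho_0

# One entry per (T, Z, rho/rho_0) combination.
grid = [
    {"T_eV": T, "Z": Z, "rho_over_rho0": x}
    for T, Z, x in product(temperatures_eV, nuclear_charges, density_ratios)
]

print(len(grid))  # 5 * 5 * 5 combinations
```

Under this reading the proposal amounts to 125 HF calculations, each of which would supply one converged RDF (and ideally a velocity autocorrelation function) to a shared database.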
M. S. Murillo acknowledges support from the Air Force
Office of Scientific Research through Grant No. FA9550-
17-1-0394.
*Corresponding author.
murillom@msu.edu
†mathieu.marciante@cea.fr
‡liam.stanton@sjsu.edu
[1] M. Razi, A. Narayan, R. M. Kirby, and D. Bedrov,
Fast predictive models based on multi-fidelity sampling
of properties in molecular dynamics simulations,
Comput. Mater. Sci. 152, 125 (2018).
[2] G. Pilania, J. E. Gubernatis, and T. Lookman, Multi-fidelity
machine learning models for accurate bandgap predictions
of solids, Comput. Mater. Sci. 129, 156 (2017).
[3] M. Fernández-Godino, C. Park, N.-H. Kim, and R. T.
Haftka, Review of multi-fidelity models, arXiv:1609.07196.
[4] L. K. Wagner and D. M. Ceperley, Discovering correlated
fermions using quantum Monte Carlo, Rep. Prog. Phys. 79,
094501 (2016).
[5] K. P. Driver, F. Soubiran, and B. Militzer, Path integral
Monte Carlo simulations of warm dense aluminum,
Phys. Rev. E 97, 063207 (2018).
[6] J. R. Perilla, B. C. Goh, C. K. Cassidy, B. Liu, R. C.
Bernardi, T. Rudack, H. Yu, Z. Wu, and K. Schulten,
Molecular dynamics simulations of large macromolecular
complexes, Curr. Opin. Struct. Biol. 31, 64 (2015).
[7] T. C. Germann and K. Kadau, Trillion-atom molecular
dynamics becomes a reality, Int. J. Mod. Phys. C 19,
1315 (2008).
[8] L. G. Stanton, J. N. Glosli, and M. S. Murillo, Multiscale
Molecular Dynamics Model for Heterogeneous Charged
Systems, Phys. Rev. X 8, 021044 (2018).
[9] J. Kates-Harbeck, A. Svyatkovskiy, and W. Tang, Predicting
disruptive instabilities in controlled fusion plasmas through
deep learning, Nature (London) 568, 526 (2019).
[10] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. Von
Lilienfeld, Fast and Accurate Modeling of Molecular
Atomization Energies with Machine Learning, Phys. Rev.
Lett. 108, 058301 (2012).
[11] V. Botu and R. Ramprasad, Adaptive machine learning
framework to accelerate ab initio molecular dynamics, Int. J.
Quantum Chem. 115, 1074 (2015).
[12] F. Noé, S. Olsson, J. Köhler, and H. Wu, Boltzmann
generators: Sampling equilibrium states of many-body
systems with deep learning, Science 365, eaaw1147 (2019).
[13] J. Wang, S. Olsson, C. Wehmeyer, A. Perez, N. E. Charron,
G. De Fabritiis, F. Noé, and C. Clementi, Machine learning
of coarse-grained molecular dynamics force fields,
ACS Central Sci. 5, 755 (2019).
[14] A. Rupe, N. Kumar, V. Epifanov, K. Kashinath, O. Pavlyk,
F. Schlimbach, M. Patwary, S. Maidanov, V. Lee, J. P.
Crutchfield et al., Disco: Physics-based unsupervised
discovery of coherent structures in spatiotemporal systems,
arXiv:1909.11822.
[15] J. L. Peterson, K. D. Humbird, J. E. Field, S. T. Brandon,
S. H. Langer, R. C. Nora, B. K. Spears, and P. T. Springer,
Zonal flow generation in inertial confinement fusion
implosions, Phys. Plasmas 24, 032702 (2017).
[16] T. Ott and M. Bonitz, First-principle results for the
radial pair distribution function in strongly coupled
one-component plasmas, Contrib. Plasma Phys. 55, 243
(2015).
[17] https://github.com/MurilloGroupMSU.
[18] D. Hohl, V. Natoli, D. M. Ceperley, and R. M. Martin,
Molecular Dynamics in Dense Hydrogen, Phys. Rev. Lett.
71, 541 (1993).
[19] S. M. Younger, Many-atom screening effects on diffusion in
dense helium, Phys. Rev. A 45, 8657 (1992).
[20] J. Kohanoff and J.-P. Hansen, Statistical properties of the
dense hydrogen plasma: An ab initio molecular dynamics
investigation, Phys. Rev. E 54, 768 (1996).
[21] P. L. Silvestrelli, No evidence of a metal-insulator transition
in dense hot aluminum: A first-principles study, Phys. Rev.
B 60, 16382 (1999).
[22] W. Lorenzen, B. Holst, and R. Redmer, First-order liquid-
liquid phase transition in dense hydrogen, Phys. Rev. B 82,
195107 (2010).
[23] K. U. Plagemann, P. Sperling, R. Thiele, M. P. Desjarlais, C.
Fortmann, T. Döppner, H. J. Lee, S. H. Glenzer, and R.
Redmer, Dynamic structure factor in warm dense beryllium,
New J. Phys. 14, 055020 (2012).
[24] H. Sun, D. Kang, J. Dai, W. Ma, L. Zhou, and J. Zeng,
First-principles study on equation of states and electronic
structures of shock compressed Ar up to warm dense regime,
J. Chem. Phys. 144, 124503 (2016).
[25] F. Lambert, J. Clérouin, and G. Zérah, Very-high-temperature
molecular dynamics, Phys. Rev. E 73, 016403 (2006).
[26] J. Clérouin, Cooking strongly coupled plasmas, Mol. Phys.
113, 2403 (2015).
[27] J. Clérouin, P. Arnault, C. Ticknor, J. D. Kress, and L. A.
Collins, Unified Concept of Effective One Component
Plasma for Hot Dense Plasmas, Phys. Rev. Lett. 116,
115003 (2016).
[28] M. W. C. Dharma-Wardana, Electron-ion and ion-ion
potentials for modeling warm dense matter: Applications
to laser-heated or shock-compressed Al and Si, Phys. Rev. E
86, 036407 (2012).
[29] R. Bredow, Th. Bornath, W.-D. Kraeft, M. W. C.
Dharma-wardana, and R. Redmer, Classical-map hyper-
netted chain calculations for dense plasmas, Contrib. Plasma
Phys. 55, 222 (2015).
[30] E. Liberatore, C. Pierleoni, and D. M. Ceperley, Liquid-
solid transition in fully ionized hydrogen at ultra-high
pressures, J. Chem. Phys. 134, 184505 (2011).
[31] J. Dai, Y. Hou, D. Kang, H. Sun, J. Wu, and J. Yuan,
Structure, equation of state, diffusion and viscosity of warm
dense Fe under the conditions of a giant planet core, New J.
Phys. 15, 045003 (2013).
[32] K. K. Mon, N. W. Ashcroft, and G. V. Chester, Core
polarization and the structure of simple metals,
Phys. Rev. B 19, 5103 (1979).
[33] J. P. Hansen and I. R. McDonald, Microscopic Simulation
of a Hydrogen Plasma, Phys. Rev. Lett. 41, 1379 (1978).
[34] See Supplemental Material at http://link.aps.org/
supplemental/10.1103/PhysRevLett.125.085503 for addi-
tional explorations of the data set and for an application
to transport.
[35] A. Zheng and A. Casari, Feature Engineering for Machine
Learning: Principles and Techniques for Data Scientists
(O’Reilly Media, Inc., California, 2018).
[36] I. Guyon and A. Elisseeff, An introduction to variable and
feature selection, J. Mach. Learn. Res. 3, 1157 (2003).
[37] J. Hua, Z. Xiong, J. Lowey, E. Suh, and E. R. Dougherty,
Optimal number of features as a function of sample size
for various classification rules, Bioinformatics 21, 1509
(2004).
[38] D. A. Augusto and H. J. C. Barbosa, Symbolic regression
via genetic programming, in Proceedings of the Sixth
Brazilian Symposium on Neural Networks, Rio de Janeiro,
RJ, Brazil (IEEE, 2000), Vol. 1, pp. 173–178.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,
B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss,
V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn:
Machine learning in Python, J. Mach. Learn. Res. 12, 2825
(2011).
[40] K.-Y. Siu, J. Bruck, T. Kailath, and T. Hofmeister, Depth
efficient neural networks for division and related problems,
IEEE Trans. Inf. Theory 39, 946 (1993).
[41] S. S. Sahoo, C. H. Lampert, and G. Martius, Learning
equations for extrapolation and control, arXiv:1806.07259.
[42] M. S. Murillo, Strongly coupled plasma physics and high
energy-density matter, Phys. Plasmas 11, 2964 (2004).
[43] M. S. Murillo, J. Weisheit, S. B. Hansen, and M. W. C.
Dharma-Wardana, Partial ionization in dense plasmas:
Comparisons among average-atom density functional
models, Phys. Rev. E 87, 063113 (2013).
[44] P. Angelikopoulos, C. Papadimitriou, and P. Koumoutsakos,
Bayesian uncertainty quantification and propagation in
molecular dynamics simulations: A high performance
computing framework, J. Chem. Phys. 137, 144103 (2012).
[45] P. N. Patrone, A. Dienstfrey, A. R. Browning, S. Tucker, and
S. Christensen, Uncertainty quantification in molecular
dynamics studies of the glass transition temperature,
Polymer 87, 246 (2016).
[46] P. Angelikopoulos, C. Papadimitriou, and P. Koumoutsakos,
Data driven, predictive molecular dynamics for nanoscale
flow simulations under uncertainty, J. Phys. Chem. B 117,
14808 (2013).
[47] L. G. Stanton and M. S. Murillo, Unified description of
linear screening in dense plasmas, Phys. Rev. E 91, 033104
(2015).
[48] W. Guodong, S. Lanxiang, W. Wei, C. Tong, G. Meiting,
and Z. Peng, A feature selection method combined with
ridge regression and recursive feature elimination in quan-
titative analysis of laser induced breakdown spectroscopy,
Plasma Sci. Technol. 22, 074002 (2020).