All content in this area was uploaded by Waiching Sun on Mar 21, 2022
International Journal for Numerical Methods in Engineering manuscript No.
(will be inserted by the editor)
Molecular dynamics inferred neural network models for finite-strain hyperelasticity of monoclinic crystals: Sobolev training and validations against physical constraints
Nikolaos N. Vlassis · Puhan Zhao · Ran Ma · Tommy Sewell · WaiChing Sun
January 30, 2022
Abstract
We present a machine learning framework to train and validate neural networks to predict the anisotropic elastic response of a monoclinic organic molecular crystal known as Octogen (β-HMX) in the geometrically nonlinear regime. A filtered molecular dynamics (MD) simulation database is used to train the neural networks with a Sobolev norm that uses the stress measure and a reference configuration to deduce the elastic stored energy functional. To improve the accuracy of the elasticity tangent predictions originating from the learned stored energy, a transfer learning technique is used to introduce additional tangential constraints from the data, while necessary conditions (e.g., strong ellipticity, crystallographic symmetry) for the correctness of the model are either introduced as additional physical constraints or incorporated in the validation tests. Assessment of the neural networks is based on (1) the accuracy with which they reproduce the bottom-line constitutive responses predicted by MD, (2) the robustness of the models measured by detailed examination of their stability and uniqueness, and (3) the admissibility of the predicted responses with respect to mechanics principles in the finite-deformation regime. We compare the neural networks' training efficiency under different Sobolev constraints and assess the models' accuracy and robustness against MD benchmarks for β-HMX.
Keywords HMX, molecular dynamics, Sobolev training, hyperelasticity, deep learning
Nikolaos N. Vlassis, Ran Ma, WaiChing Sun (corresponding author)
Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, New York
Puhan Zhao, Tommy Sewell
Department of Chemistry, University of Missouri, Columbia, Missouri
1 Introduction
Plastic-bonded explosives (PBXs) are highly filled polymer composites in which crystallites of one or more energetic constituents are held together by a continuous polymeric binder phase. Detonation initiation in PBXs is often achieved by transmitting a mechanical shock wave into the explosive charge. Shock passage leads to an abrupt increase in stress, strain, and temperature in the material. In thermodynamic terms, the magnitude of the increase of these properties is given by the Hugoniot jump relations, which yield the locus of thermodynamic states immediately behind the shock discontinuity as a function of the input shock strength (with a parametric dependence on the initial thermodynamic state of the material). The elastic properties of the constituents in a PBX play an important role in determining the states on the Hugoniot locus. The most obvious connection is their appearance in the reactant equation of state (EOS). For a useful summary, see Hooks et al. [1]. The isotropic EOS can be built around the isothermal compression curve, typically by fitting $V = V(P)$ to the 3rd-order Birch-Murnaghan (BM) equation of state or some other convenient functional form at room temperature or zero kelvin. For the BM EOS, the fitting variables are the bulk modulus $K$ and its initial pressure derivative $K'$. More comprehensive models may account for crystal elastic anisotropy by incorporating the full elasticity tensor. The advantage of incorporating the full elasticity tensor is the higher-fidelity description of the elastic response. However, identifying the necessary material parameters may require inverse problems under shock conditions with precise measurement of the pressure and temperature dependence of the elastic coefficients. Hence, the material parameters are often inferred from the results of molecular dynamics simulations instead of experiments [2]. Furthermore, the possible coupling between the volumetric and deviatoric responses may make it even more difficult to formulate the proper inverse problem and determine the optimal set of parameters [3,4,5].
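As a rough illustration of the isotropic EOS mentioned above, the following sketch evaluates the standard third-order Birch-Murnaghan pressure-volume relation. The function name and the parameter values are assumptions chosen only for demonstration; they are not fitted to HMX or taken from this paper, and compressive pressure is taken as positive here (the usual EOS convention).

```python
import numpy as np

def birch_murnaghan_pressure(V, V0, K0, K0_prime):
    """Third-order Birch-Murnaghan isothermal pressure P(V).

    V0: reference volume, K0: bulk modulus at V0,
    K0_prime: initial pressure derivative of the bulk modulus.
    Compressive pressure is positive in this sketch.
    """
    x = (V0 / np.asarray(V, dtype=float)) ** (1.0 / 3.0)
    return 1.5 * K0 * (x**7 - x**5) * (1.0 + 0.75 * (K0_prime - 4.0) * (x**2 - 1.0))

# Illustrative (hypothetical) parameters: V0 in arbitrary units, K0 in GPa.
V0, K0, K0_prime = 1.0, 15.0, 9.0
volumes = np.linspace(1.0, 0.8, 5)
pressures = birch_murnaghan_pressure(volumes, V0, K0, K0_prime)

# Consistency check: -V dP/dV at V0 should recover the bulk modulus K0.
h = 1.0e-6
dP_dV = (birch_murnaghan_pressure(V0 + h, V0, K0, K0_prime)
         - birch_murnaghan_pressure(V0 - h, V0, K0, K0_prime)) / (2.0 * h)
bulk_modulus_at_V0 = -V0 * dP_dV
```

In a fitting context, `K0` and `K0_prime` would be the two adjustable parameters matched to an isothermal compression curve; the anisotropic models discussed next replace this scalar description with the full elasticity tensor.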
The substance octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine (HMX, also called octogen due to the symmetry of the molecular structure) is the energetic constituent in many PBXs. HMX exhibits several crystal polymorphs [6]. The thermodynamically stable form on the 300 K isotherm, for pressures between 0 and approximately 30 GPa, is known as β-HMX, for which the crystal structure is monoclinic with a unit cell containing two molecules [7]. Numerous theoretical studies of HMX physical properties and thermo-mechanical response to shocks have been reported; we do not discuss them here, but Das et al. [8] provide a recent entry point into that literature. All MD simulations discussed below were performed for β-HMX in the $P2_1/n$ space group setting.
Previous work, such as Pereverzev and Sewell [2] for the case of β-HMX, has obtained pressure- and temperature-dependent elastic coefficients by applying small strain increments to a sample at thermal equilibrium at the desired thermodynamic state and determining the corresponding stress and elasticity tangential tensor at that state. Here we assume that the finite-strain elasticity is that of a Green-elastic or hyperelastic material [9,10]. As such, we postulate that (1) the state of stress in the current configuration can be solely determined by the state of deformation of the current configuration relative to one choice of reference configuration, such as the crystal lattice vectors at (300 K, 1 atm), and (2) there exists an elastic stored energy functional whose derivative with respect to the strain measure is the energy-conjugate stress measure. Compared with the former approach, which tabulates the elasticity tensor at prescribed states for a given pressure and temperature, the hyperelasticity approach has several distinct advantages. First, the predictions of the elastic strain energy, stress measure, and elastic tangent are all bundled together into one scalar-valued tensor function, instead of separate calculations for stress and elastic tangent that might not be consistent with each other. Second, unlike the more widely used tabular approach, the hyperelasticity model does not require pressure as an input to predict elastic constitutive responses and hence maintains consistency more easily. Finally, by assuming the existence of such an elastic stored energy, the stability and uniqueness of the constitutive responses, as well as other attributes such as convexity, material frame indifference, and symmetry, can be more easily analyzed mathematically [10,3].
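To make the first advantage concrete, the relations below sketch how stress and tangent follow from a single stored energy, written here for the Green strain $\boldsymbol{E}$ and second Piola-Kirchhoff stress $\boldsymbol{S}$ pair used later in the paper. The symbol $\mathbb{C}$ for the tangent is our notational assumption for this illustration.

```latex
\boldsymbol{S} = \frac{\partial \psi}{\partial \boldsymbol{E}},
\qquad
\mathbb{C} = \frac{\partial \boldsymbol{S}}{\partial \boldsymbol{E}}
           = \frac{\partial^{2} \psi}{\partial \boldsymbol{E}\,\partial \boldsymbol{E}},
\qquad
C_{ijkl} = \frac{\partial^{2} \psi}{\partial E_{ij}\,\partial E_{kl}} = C_{klij}.
```

Because both quantities are derivatives of the same scalar function, the tangent automatically inherits major symmetry from the interchangeability of the second derivatives, which a separately tabulated stress and tangent need not satisfy.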
Nevertheless, with a few exceptions, such as Holzapfel and Ogden [11], Holzapfel et al. [12], and Latorre and Montáns [13], the majority of hyperelasticity models are limited to isotropic materials or materials of simple symmetry, such as transversely isotropic and orthotropic materials. Hyperelastic models for materials of lower symmetry, such as monoclinic or triclinic, are less common [14]. This can be attributed to the fact that the strain and the stress for anisotropic materials are not necessarily coaxial, so handcrafting a mathematical expression for the energy functional that leads to accurate predictions of stress and tangent becomes a challenging task. Alternatively, predictions of elastic responses can also be established via a Gaussian process to generate constitutive laws (e.g., Frankel et al. [15], Wang et al. [16], and Fuhg and Bouklas [17]). This family of non-parametric approaches is outside the scope of this study but will be considered in the future.
To overcome this technical barrier, we introduce a transfer learning approach that generates a neural network model for the hyperelastic response of β-HMX from molecular dynamics (MD) simulations. Our new contributions, to the best of the authors' knowledge, are listed below:
1. Traditional supervised learning approaches often employ objective/loss functions that match the stress-strain responses [18,19,20,21], the elastic stored energy [22,23], or the energy, stress, or elastic tangent fields [24,25], with the raw data considered as the ground truth. Many of these supervised learning models are obtained by training deep neural networks. This direct approach, however, is not suitable for MD data, where the change from one state to another leads to fluctuations that make direct Sobolev training unproductive [26]. To overcome this problem, we introduce a pre-training step in which the data are pre-processed through a filter and the underlying non-fluctuating patterns are extracted to train the neural network models.
Training and validation of ML anisotropic constitutive law for βHMX 3
2. We introduce a transfer learning approach where the additional desirable attributes (e.g., frame invariance) and necessary conditions for the correctness of the constitutive laws (e.g., material symmetry) can be enforced with a simple re-training.
3. We also introduce a post-training validation procedure in which the focus is not only on predicting stress-strain responses but also on the desirable properties of the elastic tangential operator. To compare with the previous literature that employs measures in the geometrically linear regime to quantify anisotropy, we introduce a reverse mapping [27] that generates the infinitesimal small-strain tangent from its finite-strain counterpart. With these metrics available, we can examine the convexity and strong ellipticity of the learned function and also evaluate whether the predicted constitutive responses exhibit the same evolution of anisotropy as the MD benchmark, while ensuring that the filtering process does not lead to non-physical responses at the continuum level. The accuracy of the model is assessed by comparing MD-simulated and learned stresses as functions of strain, and by comparing the pressure-dependent tangent stiffness from the learned model against explicit predictions of the elastic tensor reported recently [2] for β-HMX states on the 300 K hydrostatic isothermal compression curve. The latter comparison, in particular, provides an incisive test of the accuracy of the learned functional, as this information was not used explicitly as part of the training set.
The rest of the paper is organized as follows. We first provide a brief account of the database generation procedure, including pertinent details of the MD simulations, the procedure to generate stress-strain data from the MD predictions, and the procedure to filter out the high-frequency responses (Section 2). We briefly review the setup of our hyperelastic model (Section 3) and then outline the major ingredients for the supervised learning of the hyperelastic energy functional, including the Sobolev training, the Hessian sampling techniques for controlling the higher-order derivatives, and the way to incorporate the physical constraints in the training procedure (Section 4). This section is followed by the validation procedure that tests the attributes of the learned hyperelasticity models with physical constraints not included in the training problems (Section 5). The results of the numerical experiments are reported in Section 6, followed by concluding remarks in Section 7.
As for notations and symbols, bold-faced and blackboard bold-faced letters denote tensors (including vectors, which are rank-one tensors); the symbol '$\cdot$' denotes a single contraction of adjacent indices of two tensors (e.g., $\boldsymbol{a} \cdot \boldsymbol{b} = a_i b_i$ or $\boldsymbol{c} \cdot \boldsymbol{d} = c_{ij} d_{jk}$); the symbol ':' denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g., $\mathbb{C} : \boldsymbol{\varepsilon} = C_{ijkl} \varepsilon_{kl}$); the symbol '$\otimes$' denotes a juxtaposition of two vectors (e.g., $\boldsymbol{a} \otimes \boldsymbol{b} = a_i b_j$) or two symmetric second-order tensors [e.g., $(\boldsymbol{\alpha} \otimes \boldsymbol{\beta})_{ijkl} = \alpha_{ij} \beta_{kl}$]. We also define identity tensors: $\boldsymbol{I} = \delta_{ij}$, $\mathbb{I} = \delta_{ik}\delta_{jl}$, and $\bar{\mathbb{I}} = \delta_{il}\delta_{jk}$, where $\delta_{ij}$ is the Kronecker delta. We denote the Eulerian coordinates as $\{x_1, x_2, x_3\}$ and the corresponding three orthogonal basis vectors as $\boldsymbol{e}_1$, $\boldsymbol{e}_2$, and $\boldsymbol{e}_3$. As for sign conventions, unless otherwise specified, tensile stress and dilative pressure are considered positive.
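The index conventions above map directly onto `numpy.einsum` subscripts. The sketch below is only an illustration with random placeholder arrays; all variable names are ours and not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(3)
c, d = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
C4 = rng.standard_normal((3, 3, 3, 3))   # placeholder fourth-order tensor
eps = rng.standard_normal((3, 3))        # placeholder second-order tensor

single_vec = np.einsum("i,i->", a, b)          # a . b = a_i b_i
single_mat = np.einsum("ij,jk->ik", c, d)      # c . d = c_ij d_jk
double = np.einsum("ijkl,kl->ij", C4, eps)     # C : eps = C_ijkl eps_kl
dyad_vec = np.einsum("i,j->ij", a, b)          # a (x) b = a_i b_j
dyad_mat = np.einsum("ij,kl->ijkl", c, d)      # (alpha (x) beta)_ijkl

delta = np.eye(3)                                   # second-order identity
I4 = np.einsum("ik,jl->ijkl", delta, delta)         # delta_ik delta_jl
I4_bar = np.einsum("il,jk->ijkl", delta, delta)     # delta_il delta_jk

# Acting on a second-order tensor: I4 returns it unchanged, I4_bar transposes it.
same = np.einsum("ijkl,kl->ij", I4, eps)
transposed = np.einsum("ijkl,kl->ij", I4_bar, eps)
```

Writing contractions with explicit subscripts keeps the code visually close to the index notation, which helps when checking symmetries of fourth-order tangents.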
2 Database generation via molecular dynamics simulations
In this section, we discuss the specifics of the MD simulation setup used to generate the database for discovering the hyperelastic energy functional. We provide theoretical background for the simulations as well as details of the system setup. We present the simulation outputs and describe the post-processing procedure that renders them suitable for our machine learning algorithms.
Training data for the neural networks are obtained by computing the Cauchy stress tensor for isothermal samples as functions of imposed tensorial strains. The strains used correspond variously to uniaxial compression or tension, pure shear, and combination strains. The imposed strains are restricted to states below the threshold for mechanical failure of β-HMX as predicted by the MD. By learning the underlying free-energy functional, we can extract the hyperelastic response from second-order and higher-order strain derivatives. Note that whereas the MD reflects the underlying free energy, it does not yield the energy functional in a simply computable way.
2.1 Force field
The MD simulations were performed using LAMMPS [28] in conjunction with a modified version of the all-atom, fully flexible, non-reactive force field originally developed for HMX by Smith and Bharadwaj (SB) [29,30,31,32,33]. Intramolecular interactions are modeled using harmonic functions for covalent bonds, three-center angles, and improper dihedral ("wag") angles, and truncated cosine expansions for proper dihedrals. Intermolecular non-bonded interactions between atoms separated by three or more covalent bonds are modeled using Buckingham-plus-charge (exponential-6-1) pair terms. Here and in Refs. [34,35,8], a steep repulsive pair potential was incorporated between non-bonded atom pairs to prevent 'overtopping' of the exponential-6-1 potential at short non-bonded separations $R$, which can occur under shock-wave loading due to the global maximum in the potential at distances of approximately 1 Å with a divergence to negative infinity as $R \to 0$. Dispersion and Coulomb pair terms were evaluated using the particle-particle particle-mesh (PPPM) k-space method [36] with a cutoff value of 11 Å and with the PPPM precision set to $10^{-6}$.
2.2 MD Simulation cell setup
Three-dimensionally periodic (3D) primary simulation cells were generated starting from the unit-cell lattice parameters for β-HMX ($P2_1/n$ space group setting) predicted by the force field (at 300 K and 1 atm), by simple replication of the unit cell in 3D space. This results in a monoclinic-shaped primary simulation cell. The mapping of the crystal frame to the Cartesian lab frame is $\boldsymbol{a} \parallel \hat{\boldsymbol{x}}$, $\boldsymbol{b} \parallel \hat{\boldsymbol{y}}$, and $\boldsymbol{c}$ in the $+z$ space. Starting primary cell sizes for the uniaxial compressive and uniaxial tensile deformation cases were approximately 30 nm parallel to the strain direction and approximately 10 nm transverse to it; those for pure shear deformation were approximately 10 nm × 10 nm × 10 nm; and those for biaxial compression were approximately 30 nm × 30 nm × 30 nm. Figure 1 depicts a unit cell of β-HMX and snapshots of representative simulation cells prior to the beginning of deformation. Table 1 contains details of the system sizes used.
Fig. 1: Unit cell of β-HMX (panel (a)) and snapshots of representative simulation cells for (b) uniaxial compressive and tensile deformation, (c) shear deformation, and (d) biaxial compression. Atoms are colored cyan for carbon, navy for nitrogen, red for oxygen, and white for hydrogen.
Table 1: System sizes for uniaxial compressive and tensile deformation, pure shear deformation, and biaxial compression production simulations.
Simulation                      Lx (nm)   Ly (nm)   Lz (nm)   Number of molecules
Compression/Tension along x̂     30.3      10.5      10.6      12,880
Compression/Tension along ŷ     10.5      30.3      10.6      12,992
Compression/Tension along ẑ     10.5      10.5      30.4      12,800
Shear deformation               10.5      10.5      10.6      4,480
Biaxial compression             30.3      30.3      30.4      106,720
2.3 Simulation details
MD trajectories were propagated using the velocity Verlet integrator in LAMMPS [37,38]. Primary cells constructed as described in the preceding subsection were thermally equilibrated in the isochoric-isothermal (NVT) ensemble at 300 K by initially selecting atomic velocities from the 300 K Maxwell distribution, followed by 20 ps of trajectory integration. Temperature control was achieved using the Nosé-Hoover thermostat [39,40] as implemented in LAMMPS, with the damping parameter set to 50.0 fs. A 0.2 fs time step was used for the thermal equilibration.
Fifteen isothermal MD production simulations, comprising three apiece for uniaxial compression, uniaxial tension, and biaxial compression, and six for pure shear (i.e., positive and negative shear directions for three distinct shear cases), were performed at $T = 300$ K using NVT integration in conjunction with the LAMMPS fix deform command. The integration time step was 0.20 fs and the thermostat damping parameter was set to 20.0 fs. The system potential energy, temperature, pressure, Cauchy stress-tensor components, and primary cell lattice vectors were recorded at 10 fs intervals for subsequent analysis.
For the uniaxial compressive and tensile deformation simulations, the prescribed strain was applied parallel to the long direction of the primary cell while holding both the transverse cell lengths and the tilt factors constant. The strain rate was set to the constant value ±0.1/100 ps, applied uniformly at each time step. The uniaxial deformation simulations were performed for 300 ps, resulting in a total strain of 0.3 for those cases.
For the shear simulations, the system was deformed by varying one of the three tilt factors (i.e., xy, xz, and yz) while the cell edge lengths were maintained at constant values. A constant strain rate of ±0.1/100 ps was applied for 300 ps, resulting in total positive or negative shear strains of 0.3.
For the biaxial compression simulations, the primary cell was compressed along two axes simultaneously in the lab frame (i.e., $x$ and $y$, $y$ and $z$, or $x$ and $z$) while holding the third cell length and the tilt factors constant. The strain rate was set to ±0.05/100 ps along both directions. Trajectory integration was performed for 300 ps, resulting in a strain of 0.15 along each of the two affected directions.
2.3.1 MD results
Figure 2 contains the system potential energy, pressure, Cauchy stress-tensor components, and lattice vectors vs. time for the case of uniaxial compressive deformation along $\hat{\boldsymbol{y}}$. The effects of deformation are evident in the potential energy and stress-tensor components (panels (a) and (c)), where it can be seen that the sample yields at $t \approx 190$ ps. Data collected from the beginning of the simulations up to approximately 10 ps before failure were used to train the energy functional.
The Cauchy stress is obtained from the standard LAMMPS command; the defining expression can be found in Ref. [41].
2.4 Filtering MD simulation data
The raw data from the MD simulations are not expected to be smooth, due to thermal fluctuations. These fluctuations may depend on the thermostat employed and the size of the system. Such fluctuations, however, are not supposed to be captured by the hyperelasticity energy functional, which is only designed to capture the macroscopic constitutive responses.
Fig. 2: From MD, system (a) potential energy, (b) pressure, (c) Cauchy stress-tensor components, and (d) lattice vectors vs. time for uniaxial compressive deformation along $\hat{\boldsymbol{y}}$.
To deal with the MD data, we can either introduce a regularization process during the machine learning training or simply filter out the Gaussian noise that might otherwise affect the convexity, and therefore the stability, of the hyperelasticity model.
While one can filter the Cauchy stress tensor on a component-by-component basis, such a strategy may lead to a filtered Cauchy stress that depends on the coordinate system. Thus, this strategy should be avoided. While there are potentially more sophisticated techniques for filtering tensorial and multi-dimensional data (e.g., Muti and Bourennane [42]), here we introduce a spectral decomposition of the Cauchy stress such that
\[
\boldsymbol{\sigma} = \sum_{a=1}^{3} \sigma_a \, \boldsymbol{n}_a \otimes \boldsymbol{n}_a. \tag{1}
\]
Following this step, a 1D uniform (moving-average) filter, which acts as a rolling-average window on the data series, is applied to each of the eigenvalues $\sigma_a$ of the Cauchy stress and to the Euler angles that represent the orthogonal basis vectors $\boldsymbol{n}_a$. The temporal length of the filter window is 3 ps (300 MD observations at the 10 fs recording interval). This window length was selected by manual trial and error such that the noise of the tensorial time series is suppressed without greatly distorting the global recorded constitutive response. Note that highly fluctuating stress data may not only increase the difficulty of Sobolev training of the hyperelasticity energy functional but also affect the stability of the constitutive responses at the continuum scale. Hence, this preliminary step is necessary.
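A minimal sketch of the eigenvalue part of this filtering step is given below, using synthetic noisy stress data in place of MD output. It is not the authors' implementation: the function names are hypothetical, the Euler-angle filtering of the eigenvectors is omitted, and the eigenvalues are simply sorted in ascending order at each step, which assumes they remain well separated.

```python
import numpy as np

def moving_average(series, window):
    """Uniform (boxcar) filter along axis 0, keeping only fully covered samples."""
    kernel = np.ones(window) / window
    columns = [np.convolve(series[:, k], kernel, mode="valid")
               for k in range(series.shape[1])]
    return np.stack(columns, axis=1)

def filter_principal_stresses(sigma, window):
    """sigma: (T, 3, 3) symmetric Cauchy stress series.

    Returns the raw eigenvalues (T, 3) and their moving averages (T - window + 1, 3).
    """
    eigenvalues = np.linalg.eigvalsh(sigma)  # ascending order at every time step
    return eigenvalues, moving_average(eigenvalues, window)

# Window length: 3 ps of data recorded every 10 fs -> 300 samples.
sampling_fs = 10.0
window = int(round(3.0e3 / sampling_fs))

# Synthetic stand-in for MD data: a constant mean stress (GPa) plus symmetric noise.
rng = np.random.default_rng(0)
T = 2000
mean_stress = np.diag([-1.0, -0.5, -0.2])
noise = rng.normal(scale=0.05, size=(T, 3, 3))
noise = 0.5 * (noise + noise.transpose(0, 2, 1))
sigma = mean_stress[None, :, :] + noise

raw_eigs, filtered_eigs = filter_principal_stresses(sigma, window)
```

Filtering principal values rather than Cartesian components keeps the result independent of the coordinate system, which is the motivation stated above; a complete version would also smooth the eigenvector orientations and reassemble the tensor via Eq. (1).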
To examine whether the filter introduces significant bias into the filtered data, we apply our filtering procedure to two MD simulations with the same strain path but initiated from different initial conditions and using different values for the thermostat coupling parameter. The filtered and unfiltered constitutive responses are compared for both cases, as shown in Fig. 3. The two MD simulations demonstrate different fluctuation patterns, but the filtered responses are very close. The uniform filter used to process the data appears to capture almost identical behaviors for both simulations.
[Figure 3 plots: three panels of stress (S11, S12, and S12, in GPa) versus E11, each showing MD Simulation A, Filtered Data A, MD Simulation B, and Filtered Data B.]
Fig. 3: Filtering of MD simulation data with a uniform filter for a compression test along the $x_1$ axis. The filtering is performed for two MD simulations with different thermostat coupling parameters and thus different RMS fluctuations about the local mean value of the stress along the trajectories.
3 Finite strain hyperelastic neural network functional for β-HMX
In this work, we approximate a finite-strain hyperelastic energy functional for β-HMX using a feed-forward neural network architecture trained with a modified Sobolev training loss function that incorporates additional physical constraints via a transfer learning technique.
The following assumptions and setup are adopted:
1. There exists one reference configuration for β-HMX for which the stored elastic energy is zero. This configuration constitutes the reference configuration for the deformation mapping.
2. We assume that all the data used in the training are purely elastic with no path dependence.
3. Thermo-mechanical and rate-dependence effects on the elasticity are neglected.
4. A filter is used to reduce the high-frequency responses.
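The stored energy functional $\bar{\psi}$ can be written as a function of the deformation gradient $\boldsymbol{F}$. The first Piola-Kirchhoff stress $\boldsymbol{P}$ is conjugate to the deformation gradient $\boldsymbol{F}$ and can be obtained from the following relation,
\[
\boldsymbol{P}(\boldsymbol{F}) = \frac{\partial \bar{\psi}(\boldsymbol{F})}{\partial \boldsymbol{F}}. \tag{2}
\]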
Notice that a necessary condition for this energy functional to be correct is material-frame indifference. The deformation gradient is insensitive to rigid-body translation. However, to ensure SO(3) invariance, the machine-learning-generated energy functional must satisfy the following constraint,
\[
\bar{\psi}(\boldsymbol{F}) = \bar{\psi}(\boldsymbol{Q}\boldsymbol{F}), \quad \forall\, \boldsymbol{Q} \in SO(3). \tag{3}
\]
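A possible way to bypass the need to introduce additional constraints in the loss function is to derive the energy functional as a function of the Green strain tensor $\boldsymbol{E}$, which is unchanged under a superposed rotation $\boldsymbol{F}' = \boldsymbol{Q} \cdot \boldsymbol{F}$:
\[
\boldsymbol{E}' = \frac{1}{2}(\boldsymbol{C}' - \boldsymbol{I}) = \frac{1}{2}(\boldsymbol{F}'^{T} \cdot \boldsymbol{F}' - \boldsymbol{I}) = \frac{1}{2}(\boldsymbol{F}^{T} \cdot \boldsymbol{Q}^{T} \cdot \boldsymbol{Q} \cdot \boldsymbol{F} - \boldsymbol{I}) = \frac{1}{2}(\boldsymbol{F}^{T} \cdot \boldsymbol{F} - \boldsymbol{I}) = \frac{1}{2}(\boldsymbol{C} - \boldsymbol{I}) = \boldsymbol{E}, \tag{4}
\]
so we then acquire an equivalent expression:
\[
\bar{\psi}(\boldsymbol{F}) = \psi(\boldsymbol{E}). \tag{5}
\]
The second Piola-Kirchhoff stress $\boldsymbol{S}$ is conjugate to the Green strain $\boldsymbol{E}$ and is derived as:
\[
\boldsymbol{S}(\boldsymbol{E}) = \frac{\partial \psi}{\partial \boldsymbol{E}}. \tag{6}
\]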
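The invariance of the Green strain under a superposed rotation, as stated in Eq. (4), can be checked numerically. The sketch below is only an illustration: the helper names and the QR-based rotation sampler are our assumptions, not part of the paper.

```python
import numpy as np

def green_strain(F):
    """Green strain E = (F^T F - I) / 2."""
    return 0.5 * (F.T @ F - np.eye(3))

def random_rotation(rng):
    """Sample a proper rotation (det = +1) from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))      # fix the sign ambiguity of each column
    if np.linalg.det(Q) < 0.0:
        Q[:, 0] = -Q[:, 0]           # flip one column to obtain det = +1
    return Q

rng = np.random.default_rng(1)
F = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # arbitrary deformation gradient
Q = random_rotation(rng)

E = green_strain(F)
E_rotated = green_strain(Q @ F)   # strain seen after a superposed rigid rotation
max_difference = np.max(np.abs(E - E_rotated))
```

Any energy written as $\psi(\boldsymbol{E})$ therefore satisfies Eq. (3) by construction, whereas a network taking $\boldsymbol{F}$ directly has to learn this invariance.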
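The transformations between the two stress measures $\boldsymbol{P}$ and $\boldsymbol{S}$ and the Cauchy stress tensor $\boldsymbol{\sigma}$ as recorded by the MD simulations are defined as:
\[
\boldsymbol{P} = J \boldsymbol{\sigma} \boldsymbol{F}^{-T}, \quad \boldsymbol{S} = J \boldsymbol{F}^{-1} \boldsymbol{\sigma} \boldsymbol{F}^{-T}, \quad \text{and} \quad \boldsymbol{S} = \boldsymbol{F}^{-1} \boldsymbol{P}, \tag{7}
\]
where $J$ is the determinant of the deformation gradient $\boldsymbol{F}$.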
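The conversions in Eq. (7) are straightforward to apply to each recorded MD state. The following sketch uses made-up values for the deformation gradient and the Cauchy stress; the function name is hypothetical.

```python
import numpy as np

def cauchy_to_piola(sigma, F):
    """Return the first (P) and second (S) Piola-Kirchhoff stresses from Cauchy stress."""
    J = np.linalg.det(F)
    F_inv = np.linalg.inv(F)
    P = J * sigma @ F_inv.T          # P = J sigma F^{-T}
    S = J * F_inv @ sigma @ F_inv.T  # S = J F^{-1} sigma F^{-T}
    return P, S

# Illustrative state: 5% compression along x with a small xy shear (values are assumptions).
F = np.array([[0.95, 0.02, 0.0],
              [0.0,  1.0,  0.0],
              [0.0,  0.0,  1.0]])
sigma = np.array([[-1.2,  0.1,  0.0],
                  [ 0.1, -0.4,  0.05],
                  [ 0.0,  0.05, -0.3]])  # symmetric Cauchy stress, GPa

P, S = cauchy_to_piola(sigma, F)
S_from_P = np.linalg.inv(F) @ P               # S = F^{-1} P
sigma_back = P @ F.T / np.linalg.det(F)       # inverse map back to Cauchy stress
```

Note that `S` is symmetric whenever `sigma` is, while `P` generally is not, which is one reason the two training targets differ in dimension.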
In addition to frame invariance, another major benefit of expressing the energy functional in terms of the Green strain tensor is that the resultant stress measure is symmetric and the elastic tangential operator possesses both major and minor symmetries. These symmetries reduce the dimension of the input parametric space from 9 to 6 and hence simplify the training. Furthermore, while $\boldsymbol{C}$ and $\boldsymbol{E}$ can both be used as the input for the inherently frame-indifferent energy functional that yields $\boldsymbol{S}$ as the first derivative, $\boldsymbol{E} = \boldsymbol{0}$ implies that the energy functional becomes zero. Meanwhile, training $\bar{\psi}(\boldsymbol{F})$ as the learned function can be more convenient for implicit total Lagrangian solvers, where the tangent corresponding to the $\boldsymbol{P}$-$\boldsymbol{F}$ pair is required to solve the linearized system of equations.
As such, we will train two hyperelasticity functionals, $\bar{\psi}(\boldsymbol{F})$ and $\psi(\boldsymbol{E})$, which take the deformation gradient and the Green strain tensor as inputs, respectively. We will then compare the results obtained from numerical experiments. The relationships among elasticity tangential tensors corresponding to different stress-strain conjugate pairs will also be discussed in Section 5.
Remark 1 It should be noted that there are other feasible choices of input, such as the cofactors of the deformation gradient or strain invariants, that may ensure material symmetry [12], guarantee polyconvexity [43], and ensure material frame indifference. Our choices of directly using the deformation gradient and the Green strain tensor are mainly for convenience and ease of implementation. In the case of $\bar{\psi}(\boldsymbol{F})$, the training procedure is more complicated due to the necessity of enforcing frame invariance. However, direct access to the tangential stiffness of the $\boldsymbol{P}$-$\boldsymbol{F}$ conjugate pair may simplify the implementation of a total Lagrangian code.
4 Stress-based Sobolev training for stored-energy function
We introduce a neural network training technique that constructs the hyperelasticity energy functional using solely the stress data and a single reference configuration where $\boldsymbol{F} = \boldsymbol{I}$. Recall that a feed-forward neural network can be trained to approximate an energy functional $\psi$ that takes the Green-Lagrange strain tensor $\boldsymbol{E}$ as input. This energy function is parametrized by weights $\boldsymbol{W}$ and biases $\boldsymbol{b}$. The supervised learning problem that minimizes the mean squared norm of the difference between the true $\psi$ and the approximated $\hat{\psi}$ for $N$ data samples can be written as
\[
\boldsymbol{W}', \boldsymbol{b}' = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \psi_i - \hat{\psi}_i \right\|_2^2 \right), \tag{8}
\]
where $\psi_i = \psi(\boldsymbol{E}_i)$ and $\hat{\psi}_i = \hat{\psi}(\boldsymbol{E}_i)$. While this approach can reduce the discrepancy between the predicted and true free-energy values when energy data are available, minimizing the energy discrepancy alone does not guarantee that the stress predictions are accurate.
In principle, calculating the Helmholtz free energy from the detailed atomistic configurations is possible [44]. However, in this work, our focus is on cases where we have no direct access to sufficient Helmholtz free-energy data. In the following numerical experiments on β-HMX, we instead use only a reference configuration and the stress data collected from multiple deformed configurations to reconstruct the elastic stored energy. Consequently, we train two neural networks, each of which takes a proper strain measure as input and outputs the elastic stored energy. The Sobolev training then attempts to adjust the weights and biases of the neurons such that the derivatives of the stored energy match the stress measure conjugate to the input strain measure. In other words, we show that it is possible to train with labels that differ from the input and output of the neural network. This flexibility proves useful for tasks in which not all data are necessarily available or of sufficient fidelity.
4.1 Sobolev constraints for the hyperelastic energy functional
To introduce a hyperelasticity model suitable for incorporation into numerical solvers for boundary value problems, the accuracy, stability, robustness, smoothness, and uniqueness of the hyperelasticity responses are all important. Unlike neural networks that directly generate stress predictions, a hyperelasticity model is required to be sufficiently smooth and differentiable to avoid discontinuity in the predicted stress and elastic tangent [24,25,22].
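4.1.1 Hyperelastic energy functional $\hat{\psi}(\boldsymbol{E})$
Consider the stored-energy functional constructed solely from the following data.
1. A reference configuration where the Green strain tensor equals $\boldsymbol{E}_{\mathrm{ref}}$, with the corresponding second Piola-Kirchhoff stress $\boldsymbol{S}_{\mathrm{ref}}$ and reference energy $\psi_{\mathrm{ref}}$;
2. A set of second Piola-Kirchhoff stresses $\boldsymbol{S}_i$, $i = 1, 2, \ldots, N$, calculated from the Cauchy stress measured at $N$ deformed configurations inferred from MD simulations.
The corresponding loss function reads,
\[
\boldsymbol{W}', \boldsymbol{b}' = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( w_{\psi_{\mathrm{ref}}} \left\| \psi_{\mathrm{ref}} - \hat{\psi}_{\mathrm{ref}} \right\|_2^2 + w_{S_{\mathrm{ref}}} \left\| \boldsymbol{S}_{\mathrm{ref}} - \left. \frac{\partial \hat{\psi}}{\partial \boldsymbol{E}} \right|_{\boldsymbol{E} = \boldsymbol{E}_{\mathrm{ref}}} \right\|_2^2 + \frac{w_{S}}{N} \sum_{i=1}^{N} \left\| \boldsymbol{S}_i - \left. \frac{\partial \hat{\psi}}{\partial \boldsymbol{E}} \right|_{\boldsymbol{E} = \boldsymbol{E}_i} \right\|_2^2 \right), \tag{9}
\]
where $\psi_{\mathrm{ref}} = \psi(\boldsymbol{E}_{\mathrm{ref}})$ and $\hat{\psi}_{\mathrm{ref}} = \hat{\psi}(\boldsymbol{E}_{\mathrm{ref}})$ are the true and approximated values of the energy functional at strain $\boldsymbol{E}_{\mathrm{ref}}$, $N$ is the number of non-trivial stress data points, and $w_{\psi_{\mathrm{ref}}}$, $w_{S_{\mathrm{ref}}}$, and $w_{S}$ are the weighting factors for the multi-objective optimization. In this work, we use the configuration at (300 K, 1 atm) as the reference configuration and we assume that this configuration is undeformed such that $\boldsymbol{E}_{\mathrm{ref}} = \boldsymbol{0}$.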
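To illustrate how the terms of Eq. (9) can be evaluated, the sketch below uses a tiny one-hidden-layer network with a hand-derived input gradient. This is a stand-in under our own assumptions (class and function names, architecture, and synthetic "teacher" labels are all hypothetical); the paper's networks and optimizer are not reproduced here, and in practice the stress would be obtained by automatic differentiation.

```python
import numpy as np

class TinyEnergyNet:
    """One-hidden-layer tanh network psi_hat(E); a hypothetical stand-in model."""

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.5 * rng.standard_normal((hidden, 9))
        self.b1 = np.zeros(hidden)
        self.w2 = 0.5 * rng.standard_normal(hidden)
        self.b2 = 0.0

    def energy(self, E):
        e = (0.5 * (E + E.T)).ravel()      # use the symmetric part of E as input
        return float(self.w2 @ np.tanh(self.W1 @ e + self.b1) + self.b2)

    def stress(self, E):
        """S_hat = d psi_hat / d E, derived analytically and symmetrized."""
        e = (0.5 * (E + E.T)).ravel()
        h = np.tanh(self.W1 @ e + self.b1)
        G = (self.W1.T @ (self.w2 * (1.0 - h**2))).reshape(3, 3)
        return 0.5 * (G + G.T)

def sobolev_loss(net, E_ref, psi_ref, S_ref, E_data, S_data,
                 w_psi_ref=1.0, w_S_ref=1.0, w_S=1.0):
    """Reference-energy, reference-stress, and mean stress-mismatch terms of Eq. (9)."""
    loss = w_psi_ref * (psi_ref - net.energy(E_ref)) ** 2
    loss += w_S_ref * np.sum((S_ref - net.stress(E_ref)) ** 2)
    mismatch = sum(np.sum((S - net.stress(E)) ** 2) for E, S in zip(E_data, S_data))
    return float(loss + w_S * mismatch / len(E_data))

# Synthetic strain samples; stress labels come from a different "teacher" network.
rng = np.random.default_rng(2)
E_data = [0.05 * (A + A.T) for A in rng.standard_normal((20, 3, 3))]
E_ref = np.zeros((3, 3))
teacher, net = TinyEnergyNet(seed=7), TinyEnergyNet(seed=0)

loss_teacher = sobolev_loss(net, E_ref, teacher.energy(E_ref), teacher.stress(E_ref),
                            E_data, [teacher.stress(E) for E in E_data])
loss_self = sobolev_loss(net, E_ref, net.energy(E_ref), net.stress(E_ref),
                         E_data, [net.stress(E) for E in E_data])
```

Only one energy value (at the reference state) enters the loss, while every other label is a stress; this mirrors the point made above that the training labels need not coincide with the network output.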
4.1.2 Hyperelastic energy functional ¯
ψ(F)300
Another feasible option is to directly train the energy functional related to the $\boldsymbol{P}$-$\boldsymbol{F}$ pair. The drawback of this option is the more complex training, owing to the fact that the deformation gradient $\boldsymbol{F}$ is a two-point tensor that is not necessarily symmetric. Hence, the dimensions of both the inputs and the labels for the supervised training increase. Furthermore, it is also necessary to introduce an additional training step to ensure material frame indifference, which can be avoided if invariants or $\boldsymbol{E}$ are used as the input [43]. However, if such training is successful, the Hessian of this energy functional gives the tangential stiffness tensor corresponding to the $\boldsymbol{P}$-$\boldsymbol{F}$ pair. The bases of this tangential stiffness tensor make it easy to incorporate into the linearized system of equations for a total Lagrangian finite element solver, without requiring any additional algebraic operations to pull back or push forward between configurations. As such, this option is provided here. We assume that the data described in Section 4.1.1 are available and that the identical reference configuration is used. The corresponding loss function for the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ hyperelastic model is:
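$$
\boldsymbol{W}', \boldsymbol{b}' = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( w_{\bar{\psi}_{\mathrm{ref}}} \left\| \psi_{\mathrm{ref}} - \bar{\psi}_{\mathrm{ref}} \right\|_2^2 + w_{P_{\mathrm{ref}}} \left\| \boldsymbol{P}_{\mathrm{ref}} - \left. \frac{\partial \bar{\psi}}{\partial \boldsymbol{F}} \right|_{\boldsymbol{F} = \boldsymbol{F}_{\mathrm{ref}}} \right\|_2^2 + \frac{w_P}{N} \sum_{i=1}^{N} \left\| \boldsymbol{P}_i - \left. \frac{\partial \bar{\psi}}{\partial \boldsymbol{F}} \right|_{\boldsymbol{F} = \boldsymbol{F}_i} \right\|_2^2 \right), \tag{10}
$$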
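where $\psi_{\mathrm{ref}} = \psi(\boldsymbol{F}_{\mathrm{ref}})$ and $\bar{\psi}_{\mathrm{ref}} = \bar{\psi}(\boldsymbol{F}_{\mathrm{ref}})$ are the true and approximated values of the energy functional at the reference deformation gradient $\boldsymbol{F}_{\mathrm{ref}}$, $N$ is the number of nontrivial stress data points, and $w_{\bar{\psi}_{\mathrm{ref}}}$, $w_{P_{\mathrm{ref}}}$, and $w_P$ are the weighting factors for the multi-objective optimization.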
4.1.3 Transfer learning to enforce frame indifference for $\bar{\psi}(\boldsymbol{F})$
A hyperelastic model described by the conjugate pair of $\boldsymbol{P}$-$\boldsymbol{F}$ tensors is expected to satisfy the frame invariance conditions described in Eq. (3). To ensure that frame invariance is preserved during training, we reuse a previously trained neural network but modify the loss function by introducing a number $L$ of random rotations $\boldsymbol{Q}_l$, $l = 1, 2, \ldots, L$, and penalizing the violation of objectivity for a randomly selected subsample of size $L$ from the initial training sample pool by adding the following weighted objectives:
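$$
w_{\psi} \frac{1}{L} \sum_{l=1}^{L} \left\| \hat{\psi}(\boldsymbol{Q}_l \boldsymbol{F}_l) - \hat{\psi}(\boldsymbol{F}_l) \right\| + w_{P} \frac{1}{L} \sum_{l=1}^{L} \left\| \hat{\boldsymbol{P}}(\boldsymbol{Q}_l \boldsymbol{F}_l) - \boldsymbol{Q}_l \hat{\boldsymbol{P}}(\boldsymbol{F}_l) \right\|_2^2 + w_{C} \frac{1}{L} \sum_{l=1}^{L} \left\| \hat{\boldsymbol{A}}(\boldsymbol{Q}_l \boldsymbol{F}_l) - \boldsymbol{Q}_l \boldsymbol{Q}_l \hat{\boldsymbol{A}}(\boldsymbol{F}_l) \right\|_F^2, \tag{11}
$$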
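where $\hat{\boldsymbol{P}}$ and $\hat{\boldsymbol{A}}$ are the neural network approximated stress and elastic stiffness tensors, respectively, and $w_C$ is a weight for the multi-objective minimization. Note that this additional step is not necessary for the energy functional $\hat{\psi}(\boldsymbol{E})$, since $(\boldsymbol{Q}\boldsymbol{F})^T \boldsymbol{Q}\boldsymbol{F} = \boldsymbol{F}^T \boldsymbol{Q}^T \boldsymbol{Q} \boldsymbol{F} = \boldsymbol{F}^T \boldsymbol{F} = \boldsymbol{C}$ for any rotation tensor $\boldsymbol{Q} \in SO(3)$.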
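The energy and stress terms of Eq. (11) can be monitored with a few lines of code. The sketch below is only an illustration under stated assumptions: the rotations are drawn from the QR factorization of a Gaussian matrix, the stress is computed by finite differences instead of automatic differentiation, the tangent term is omitted, and both test energies (`psi_objective`, `psi_not_objective`) are hypothetical stand-ins for a trained network.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))          # fix the sign of each column
    if np.linalg.det(Q) < 0.0:
        Q[:, 0] = -Q[:, 0]               # enforce det(Q) = +1
    return Q

def first_pk_stress(psi, F, h=1e-6):
    """Central-difference estimate of P = d(psi)/dF."""
    P = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            dF = np.zeros((3, 3))
            dF[i, j] = h
            P[i, j] = (psi(F + dF) - psi(F - dF)) / (2.0 * h)
    return P

def objectivity_residuals(psi, F_samples, rotations):
    """Sample means of |psi(QF) - psi(F)| and ||P(QF) - Q P(F)||^2."""
    r_psi, r_P = [], []
    for F, Q in zip(F_samples, rotations):
        r_psi.append(abs(psi(Q @ F) - psi(F)))
        diff = first_pk_stress(psi, Q @ F) - Q @ first_pk_stress(psi, F)
        r_P.append(np.sum(diff ** 2))
    return np.mean(r_psi), np.mean(r_P)

def psi_objective(F, mu=1.0, lam=1.0):
    """Depends on F only through F^T F and det F, hence frame indifferent."""
    return 0.5 * mu * (np.trace(F.T @ F) - 3.0) + 0.5 * lam * (np.linalg.det(F) - 1.0) ** 2

def psi_not_objective(F):
    """Depends on F - I directly, hence not frame indifferent."""
    return np.sum((F - np.eye(3)) ** 2)

rng = np.random.default_rng(0)
F_samples = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(8)]
rotations = [random_rotation(rng) for _ in range(8)]
res_objective = objectivity_residuals(psi_objective, F_samples, rotations)
res_not_objective = objectivity_residuals(psi_not_objective, F_samples, rotations)
```

For the frame-indifferent energy both residuals are at round-off level, whereas the second energy produces residuals of order one; a trained network would be expected to fall somewhere in between, which is what the penalty terms are designed to reduce.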
4.2 Transfer learning to enforce crystal symmetries
The monoclinic unit cell of the single crystal β-HMX in the $P2_1/n$ space group setting is shown in Figure 4. The covariant crystal basis vectors $\boldsymbol{M}_1$, $\boldsymbol{M}_2$, and $\boldsymbol{M}_3$ represent the crystal axes in the crystal configuration.
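[Figure 4: sketch of the unit cell with lattice vectors $a$, $b$, $c$, and $c^*$, angles $\alpha$, $\beta$, and $\gamma$, crystal axes $\boldsymbol{M}_1 = [100]$, $\boldsymbol{M}_2 = [010]$, $\boldsymbol{M}_3 = [001]$, and Cartesian basis $\boldsymbol{e}_1$, $\boldsymbol{e}_2$, $\boldsymbol{e}_3$.]
Fig. 4: Monoclinic unit cell of β-HMX in the $P2_1/n$ space group setting. The lattice constants are $a = 6.53$ Å, $b = 11.03$ Å, $c = 7.35$ Å, $\alpha = \gamma = 90^\circ$, and $\beta = 102.689^\circ$ (at 295 K) [45]. The Miller indices are associated with the monoclinic crystal directions, while the vectors $\boldsymbol{e}_1$, $\boldsymbol{e}_2$, and $\boldsymbol{e}_3$ denote the basis vectors of the global Cartesian coordinate system.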
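Note that, under the monoclinic material symmetry shown in Figure 4, the crystal structure has a 2-fold rotational symmetry: it remains unchanged when the unit cell is rotated by $180^\circ$ about [010]. Therefore, the symmetry group of this monoclinic unit cell reads
$$
V_Q = \left\{ \boldsymbol{Q} \,\middle|\, \boldsymbol{Q} = \exp\left( k \pi \frac{\operatorname{spn}(\boldsymbol{M}_2)}{\| \boldsymbol{M}_2 \|} \right), \; k \in \mathbb{Z} \right\}. \tag{12}
$$
Here, the infinitesimal rotation map and the finite rotation map are defined as [46],
$$
\operatorname{spn}(\boldsymbol{\theta}) = -\boldsymbol{\varepsilon} \cdot \boldsymbol{\theta}, \qquad \exp\left[ \operatorname{spn}(\boldsymbol{\theta}) \right] = \boldsymbol{I} + \frac{\sin \theta}{\theta} \operatorname{spn}(\boldsymbol{\theta}) + \frac{1 - \cos \theta}{\theta^2} \operatorname{spn}(\boldsymbol{\theta})^2,
$$
where $\boldsymbol{\varepsilon}$ is the third-order permutation tensor and $\theta = \| \boldsymbol{\theta} \|$ is the rotation angle.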
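The two maps above are straightforward to evaluate numerically. The following sketch, with hypothetical function names, builds the skew-symmetric matrix of a rotation vector and the finite rotation from the closed-form expression, and then generates the elements of the symmetry group in Eq. (12) for $k = 0$ and $k = 1$. The axis $\boldsymbol{M}_2$ is taken to coincide with $\boldsymbol{e}_2$, which is an assumption about the crystal orientation made for this example only.

```python
import numpy as np

def spn(theta):
    """Skew-symmetric matrix of the rotation vector theta: spn(theta) @ v = cross(theta, v)."""
    x, y, z = theta
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_spn(theta):
    """Finite rotation exp[spn(theta)] from the closed-form expression in the text."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)                 # limit of the formula for a vanishing angle
    W = spn(theta)
    return (np.eye(3) + np.sin(angle) / angle * W
            + (1.0 - np.cos(angle)) / angle ** 2 * (W @ W))

def monoclinic_group(M2, k_values=(0, 1)):
    """Rotations by k*pi about the normalized axis M2, as in Eq. (12)."""
    axis = np.asarray(M2, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return [exp_spn(k * np.pi * axis) for k in k_values]

# Assumed orientation for the example: M2 aligned with e2.
Q0, Q1 = monoclinic_group([0.0, 1.0, 0.0])
```

With this orientation, `Q1` is the diagonal matrix with entries (-1, 1, -1); applying it twice returns the identity, so only two distinct group elements need to be considered in practice.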
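Consider two elastic deformations of the crystal, $\boldsymbol{F}$ and $\boldsymbol{F}^+$, where $\boldsymbol{F}$ is an arbitrary deformation and $\boldsymbol{F}^+ = \boldsymbol{F}\boldsymbol{Q}$, $\forall \boldsymbol{Q} \in V_Q$. The material symmetry of the β-HMX crystal requires that
$$
\psi(\boldsymbol{F}^+) = \psi(\boldsymbol{F}), \qquad \boldsymbol{P}^+ = \left. \frac{\partial \psi}{\partial \boldsymbol{F}} \right|_{\boldsymbol{F}^+} = \boldsymbol{P}\boldsymbol{Q}, \qquad A^+_{aBcD} = A_{aMcN} Q_{MB} Q_{ND}, \qquad \forall \boldsymbol{Q} \in V_Q, \tag{13}
$$
where $\psi$ is the elastic free energy, $\boldsymbol{P}^+$ and $\boldsymbol{P}$ are the first Piola-Kirchhoff stress tensors evaluated at $\boldsymbol{F}^+$ and $\boldsymbol{F}$, and $\boldsymbol{A}^+$ and $\boldsymbol{A}$ are the elastic stiffness tensors, such that $\dot{\boldsymbol{P}} = \boldsymbol{A} : \dot{\boldsymbol{F}}$ and $\dot{\boldsymbol{P}}^+ = \boldsymbol{A}^+ : \dot{\boldsymbol{F}}^+$.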
To ensure that the crystal symmetry is preserved, we reuse the network previously trained with the loss functions (10) and (11), and modify the loss function by introducing $M$ rotations $\boldsymbol{Q}_m \in V_Q$, $m = 1, 2, \ldots, M$, based on the material symmetry type. The added terms penalize violations of the material symmetry over $N$ samples:
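$$
\sum_{m=1}^{M} \left( w_{\psi} \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\psi}(\boldsymbol{F}_i \boldsymbol{Q}_m) - \hat{\psi}(\boldsymbol{F}_i) \right\| + w_{\hat{P}} \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\boldsymbol{P}}(\boldsymbol{F}_i \boldsymbol{Q}_m) - \hat{\boldsymbol{P}}(\boldsymbol{F}_i) \boldsymbol{Q}_m \right\|_2^2 + w_{C} \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\boldsymbol{A}}(\boldsymbol{F}_i \boldsymbol{Q}_m) - \hat{\boldsymbol{A}}(\boldsymbol{F}_i) \boldsymbol{Q}_m \boldsymbol{Q}_m \right\|_F^2 \right). \tag{14}
$$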
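To make the energy term of Eq. (14) concrete, the sketch below evaluates it for two hypothetical energies written in terms of the Green strain: one whose coupling term is invariant under the 180-degree rotation about $\boldsymbol{e}_2$ and one whose coupling term changes sign. The energies, the sample set, and the choice of $\boldsymbol{e}_2$ as the symmetry axis are assumptions for illustration; the stress and tangent terms follow the same pattern and are omitted.

```python
import numpy as np

Q_sym = np.diag([-1.0, 1.0, -1.0])   # 180-degree rotation about e2 (k = 1 in Eq. (12))

def green_strain(F):
    return 0.5 * (F.T @ F - np.eye(3))

def psi_monoclinic(F):
    """Hypothetical energy: the E11*E13 coupling is unchanged by F -> F Q_sym."""
    E = green_strain(F)
    return np.sum(E * E) + 0.3 * E[0, 0] * E[0, 2]

def psi_triclinic(F):
    """Hypothetical energy: the E11*E12 coupling changes sign under F -> F Q_sym."""
    E = green_strain(F)
    return np.sum(E * E) + 0.3 * E[0, 0] * E[0, 1]

def symmetry_energy_residual(psi, F_samples, rotations):
    """Energy term of Eq. (14): sum over rotations of the sample mean of |psi(F Q) - psi(F)|."""
    return sum(np.mean([abs(psi(F @ Q) - psi(F)) for F in F_samples])
               for Q in rotations)

rng = np.random.default_rng(1)
F_samples = [np.eye(3) + 0.05 * rng.standard_normal((3, 3)) for _ in range(10)]
r_mono = symmetry_energy_residual(psi_monoclinic, F_samples, [Q_sym])
r_tri = symmetry_energy_residual(psi_triclinic, F_samples, [Q_sym])
```

The residual vanishes for the first energy and is finite for the second, because $\boldsymbol{F} \to \boldsymbol{F}\boldsymbol{Q}$ maps $\boldsymbol{E}$ to $\boldsymbol{Q}^T \boldsymbol{E} \boldsymbol{Q}$, which flips the sign of $E_{12}$ and $E_{23}$ but leaves $E_{13}$ unchanged.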
5 Post-training validation of the predicted elastic tangential operators
In this section, we introduce numerical tests to determine whether the predicted constitutive responses are thermodynamically admissible, preserve the symmetry, and lead to unique and stable elastic responses. A subset of these criteria is required for a correct constitutive law (e.g. material frame invariance), while others, such as convexity and strong ellipticity, are not necessary conditions but are desirable properties for the stability and uniqueness of the boundary value problem. While in principle many of these physical constraints can be incorporated into the loss function of the supervised learning process, putting all the constraints explicitly into the loss function is not always ideal, as multiple constraints may alter the landscape of the loss function and thus complicate the search for the optimal energy functional [47].
As such, our goal is to introduce a suite of necessary conditions that the learned hyperelasticity constitutive law must fulfill. These conditions, along with the requirement that the constitutive law generate predictions within a threshold error, are necessary but not sufficient to guarantee the safety of using the machine learning model for high-consequence, high-risk predictions (such as those for explosives).
5.1 Mapping between finite and infinitesimal kinematics
To examine the admissibility of the hyperelasticity model and compare the finite-strain model with other published results based on the infinitesimal strain assumption, the connections among the tangents of different energy-conjugate pairs are provided below for completeness. Our first goal is to obtain the underlying small-strain tangent of the finite-strain counterpart by using the logarithmic and exponential mappings, such that the elasticity tensors predicted here and those from the literature can be compared. Recall that the logarithmic elastic strain $\boldsymbol{e}$ can be defined as [27],
$$
\boldsymbol{e} = \ln \boldsymbol{U} = \frac{1}{2} \ln \boldsymbol{C}, \tag{15}
$$
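where $\boldsymbol{U}$ is the right stretch tensor and $\boldsymbol{C}$ is the right Cauchy-Green strain tensor. The small-strain elastic tensor $\mathbb{C}^{\sigma - e}$ can be obtained from the chain rule,
$$
\mathbb{C}^{\sigma - e} = \frac{\partial \boldsymbol{\sigma}}{\partial \boldsymbol{e}} = \frac{\partial \boldsymbol{\sigma}}{\partial \boldsymbol{S}} : \frac{\partial \boldsymbol{S}}{\partial \boldsymbol{E}} : \frac{\partial \boldsymbol{E}}{\partial \boldsymbol{e}} = \frac{1}{2} \frac{\partial \boldsymbol{\sigma}}{\partial \boldsymbol{S}} : \frac{\partial \boldsymbol{S}}{\partial \boldsymbol{E}} : \frac{\partial \boldsymbol{C}}{\partial \boldsymbol{e}} = \frac{1}{2} \frac{\partial \boldsymbol{\sigma}}{\partial \boldsymbol{S}} : \frac{\partial \boldsymbol{S}}{\partial \boldsymbol{E}} : \frac{\partial \exp 2\boldsymbol{e}}{\partial \boldsymbol{e}}, \tag{16}
$$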
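where $\boldsymbol{\sigma} = J^{-1} \boldsymbol{F} \cdot \boldsymbol{S} \cdot \boldsymbol{F}^T$ is the Cauchy stress. To compute the small-strain elasticity tensor, one first rewrites Eq. (15) in an infinite series representation,
$$
\boldsymbol{C} = \exp 2\boldsymbol{e} = \sum_{n=0}^{\infty} \frac{1}{n!} (2\boldsymbol{e})^n. \tag{17}
$$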
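As such, the Cartesian components of the derivative $\partial \boldsymbol{C} / \partial \boldsymbol{e}$ read [48],
$$
\left[ \frac{\partial \boldsymbol{C}}{\partial \boldsymbol{e}} \right]_{ijkl} = \sum_{n=1}^{\infty} \frac{2^n}{n!} \sum_{m=1}^{n} \left[ \boldsymbol{e}^{m-1} \right]_{ik} \left[ \boldsymbol{e}^{n-m} \right]_{lj}. \tag{18}
$$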
Notice that Eq. (18) is an infinite series. In practice, we include only a sufficient but finite number of terms in Eq. (18) to approximate the partial derivative $\partial \boldsymbol{C} / \partial \boldsymbol{e}$. Convergence studies and benchmark data on the infinite series representation can be found in Ortiz et al. [49]. An alternative representation based on spectral decomposition is also possible (cf. Miehe [50]) but is outside the scope of this paper.
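A truncated version of the series for $\partial \boldsymbol{C} / \partial \boldsymbol{e}$ can be checked against a finite-difference derivative of $\exp 2\boldsymbol{e}$. The sketch below is an illustration only: the inner sum is taken over $m = 1, \ldots, n$, the truncation at 30 terms is an arbitrary choice for a strain of moderate size, and the function names are hypothetical.

```python
import numpy as np

def dC_de_series(e, n_terms=30):
    """Truncated series for [dC/de]_ijkl with C = exp(2e)."""
    powers = [np.eye(3)]
    for _ in range(n_terms):
        powers.append(powers[-1] @ e)        # powers[p] = e^p
    D = np.zeros((3, 3, 3, 3))
    coeff = 1.0
    for n in range(1, n_terms + 1):
        coeff *= 2.0 / n                     # coeff = 2^n / n!
        for m in range(1, n + 1):
            D += coeff * np.einsum("ik,lj->ijkl", powers[m - 1], powers[n - m])
    return D

def exp_2e(e):
    """C = exp(2e) for a symmetric e via its spectral decomposition."""
    lam, V = np.linalg.eigh(e)
    return (V * np.exp(2.0 * lam)) @ V.T

rng = np.random.default_rng(2)
G = 0.1 * rng.standard_normal((3, 3))
e = 0.5 * (G + G.T)                          # symmetric logarithmic strain
H = rng.standard_normal((3, 3))
H = 0.5 * (H + H.T)                          # symmetric perturbation direction

# Directional derivative: central finite difference versus the series.
h = 1e-6
fd = (exp_2e(e + h * H) - exp_2e(e - h * H)) / (2.0 * h)
series = np.einsum("ijkl,kl->ij", dC_de_series(e), H)
max_error = np.max(np.abs(fd - series))
```

For strains of this magnitude the two directional derivatives agree to within the finite-difference error, which suggests that a few tens of terms are more than sufficient; larger strains would require a convergence check of the kind cited above.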
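The first tangential tensor $\mathbb{C}^{P-F}$ can be related to the second derivative of the hyperelastic energy functional $\psi(\boldsymbol{E})$,
$$
\mathbb{C}^{P-F} = \frac{\partial \boldsymbol{P}}{\partial \boldsymbol{F}} = \frac{\partial \boldsymbol{S}}{\partial \boldsymbol{E}} \cdot \boldsymbol{F} \cdot \boldsymbol{F} \cdot \boldsymbol{g} + \boldsymbol{S} \otimes \boldsymbol{\delta}, \tag{19}
$$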
where $\boldsymbol{g}$ is the metric tensor. For the Cartesian coordinate system used in our training loss function, the index notation of the metric tensor is simply $g_{ij} = \delta_{ij}$. This expression is derived from Marsden and Hughes [9] (see page 215), where we simply use the chain rule to link the tangent $\partial \boldsymbol{S} / \partial \boldsymbol{E}$ with $\partial \boldsymbol{S} / \partial \boldsymbol{C}$. Note that this tensor corresponds to the first Piola-Kirchhoff stress and the deformation gradient, and does not possess minor symmetry.
In both Eq. (16) and Eq. (19), the derivative $\partial \boldsymbol{S} / \partial \boldsymbol{E}$ is obtained from the neural elastic stored energy, while the remaining terms can be obtained either analytically or via automatic differentiation.
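In Cartesian components, and with $\boldsymbol{P} = \boldsymbol{F}\boldsymbol{S}$, one reading of the relation between the two tangents is $A_{iJkL} = F_{iM} F_{kN} C_{MJNL} + \delta_{ik} S_{JL}$, where $C_{MJNL}$ denotes the components of $\partial \boldsymbol{S} / \partial \boldsymbol{E}$. The sketch below verifies this index form against a finite-difference derivative of $\boldsymbol{P}$. A St. Venant-Kirchhoff solid is used as a hypothetical stand-in for the neural stored energy, and the index placement is an assumption of this example rather than a statement of the notation used by the authors.

```python
import numpy as np

lam, mu = 1.0, 0.5
I3 = np.eye(3)
# Constant S-E tangent of a St. Venant-Kirchhoff solid (hypothetical stand-in).
C_SE = (lam * np.einsum("IJ,KL->IJKL", I3, I3)
        + mu * (np.einsum("IK,JL->IJKL", I3, I3) + np.einsum("IL,JK->IJKL", I3, I3)))

def second_pk(F):
    E = 0.5 * (F.T @ F - I3)
    return lam * np.trace(E) * I3 + 2.0 * mu * E

def tangent_PF(F):
    """A_iJkL = F_iM F_kN C_MJNL + delta_ik S_JL in Cartesian components."""
    S = second_pk(F)
    return (np.einsum("iM,kN,MJNL->iJkL", F, F, C_SE)
            + np.einsum("ik,JL->iJkL", I3, S))

def tangent_PF_fd(F, h=1e-6):
    """Central-difference derivative of P = F S with respect to F."""
    A = np.zeros((3, 3, 3, 3))
    for k in range(3):
        for L in range(3):
            dF = np.zeros((3, 3))
            dF[k, L] = h
            P_plus = (F + dF) @ second_pk(F + dF)
            P_minus = (F - dF) @ second_pk(F - dF)
            A[:, :, k, L] = (P_plus - P_minus) / (2.0 * h)
    return A

rng = np.random.default_rng(3)
F = I3 + 0.1 * rng.standard_normal((3, 3))
A = tangent_PF(F)
max_error = np.max(np.abs(A - tangent_PF_fd(F)))
has_major_symmetry = np.allclose(A, A.transpose(2, 3, 0, 1))
has_minor_symmetry = np.allclose(A, A.transpose(1, 0, 2, 3))
```

The resulting tensor retains major symmetry, as expected for a tangent derived from a stored energy, but not minor symmetry, in line with the remark above.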
5.2 Strong ellipticity
While many works are dedicated to training neural networks to predict the elastic responses of solids [51,52,22,53,54,55,24,25], surprisingly few of them analyze the stability and uniqueness of the learned neural network constitutive laws or provide any evidence of well-posedness for the trained model. Recent work by Dominik et al. [43] addresses this issue by enforcing the polyconvexity of the hyperelastic energy functional via invariants (cf. Hartmann and Neff [56]). Note that the onset of the loss of strong ellipticity does not necessarily indicate that the learned elastic energy functional is erroneous. Rather, it is considered an indicator of the onset of material instability or failure [9,57,58]. Physically, the loss of strong ellipticity may also lead to a vanishing wave propagation speed [9]. As such, our focus here is not necessarily on preventing the loss of strong ellipticity but rather on searching for the points at which the loss of strong ellipticity may first occur, to provide more interpretable physical insight into the stability of the β-HMX crystal.
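Consider $\boldsymbol{A}$ to be the acoustic tensor corresponding to $\mathbb{C}^{PF}$, where $\mathbb{C}^{PF}$ is the elastic tangential operator for the energy conjugate pair $(\boldsymbol{P}, \boldsymbol{F})$, that is,
$$
\boldsymbol{A}(\boldsymbol{N}) = \boldsymbol{N} \cdot \mathbb{C}^{PF} \cdot \boldsymbol{N}. \tag{20}
$$
The Legendre-Hadamard condition requires that for any pair of vectors $\boldsymbol{N}$ and $\boldsymbol{m}$, the following condition holds:
$$
\boldsymbol{m} \cdot \boldsymbol{A} \cdot \boldsymbol{m} \geq 0, \tag{21}
$$
where $\boldsymbol{N}$ is a Lagrangian unit vector and $\boldsymbol{m}$ is an Eulerian vector. Because we assume that β-HMX is a Green-elastic material, the necessary and sufficient conditions for strong ellipticity are (cf. Ogden [10], page 392)
$$
A_{ii}(\boldsymbol{N}) > 0, \quad i \in \{1, 2, 3\} \tag{22}
$$
$$
A_{ii}(\boldsymbol{N}) A_{jj}(\boldsymbol{N}) - A_{ij}(\boldsymbol{N})^2 > 0, \quad j \neq i \in \{1, 2, 3\} \tag{23}
$$
$$
\det \boldsymbol{A}(\boldsymbol{N}) > 0 \tag{24}
$$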
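for any $\boldsymbol{N} \in \mathbb{R}^3$. Notice that the material response is nonlinear and the acoustic tensor may vary according to the Eulerian vector $\boldsymbol{m}$. A simple way to ensure that the conditions (22)-(24) are satisfied is to construct the worst-case scenario, that is, to find the infimum and the unit vectors $\boldsymbol{N}$ that minimize $A_{ii}(\boldsymbol{N})$, $A_{ii}(\boldsymbol{N}) A_{jj}(\boldsymbol{N}) - A_{ij}(\boldsymbol{N})^2$, and $\det \boldsymbol{A}(\boldsymbol{N})$, and to check whether the three terms remain positive. Depending on the parameterization, the corresponding minimization problems can be written as
$$
f(\boldsymbol{q}) = A_{ii}(\boldsymbol{N}(\boldsymbol{q})), \quad \underset{\boldsymbol{q}}{\operatorname{argmin}} \, f(\boldsymbol{q}), \quad \boldsymbol{N}(\boldsymbol{q}) \in S^2 \tag{25}
$$
$$
g(\boldsymbol{q}) = A_{ii}(\boldsymbol{N}(\boldsymbol{q})) A_{jj}(\boldsymbol{N}(\boldsymbol{q})) - A_{ij}(\boldsymbol{N}(\boldsymbol{q}))^2, \quad \underset{\boldsymbol{q}}{\operatorname{argmin}} \, g(\boldsymbol{q}), \quad \boldsymbol{N}(\boldsymbol{q}) \in S^2 \tag{26}
$$
$$
d(\boldsymbol{q}) = \det \boldsymbol{A}(\boldsymbol{N}(\boldsymbol{q})), \quad \underset{\boldsymbol{q}}{\operatorname{argmin}} \, d(\boldsymbol{q}), \quad \boldsymbol{N}(\boldsymbol{q}) \in S^2, \tag{27}
$$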
where $\boldsymbol{q}$ represents a parametrization of the unit vector $\boldsymbol{N}(\boldsymbol{q})$. Mota et al. [59] provide a comprehensive review of how different parameterizations, namely the spherical, stereographic, projective, and tangent parameterizations, may lead to different local minimizers of the acoustic tensor in the parametric space. For the spherical parameterization, a unit vector $\boldsymbol{N}$ is an element of the unit sphere $S^2$, which can be parameterized by the spherical coordinates, that is, the polar angle $\phi \in [0, \pi]$ and the azimuthal angle $\theta \in [0, \pi]$:
$$
\boldsymbol{N}(\phi, \theta) = \sin\phi \cos\theta \, \boldsymbol{e}_1 + \sin\phi \sin\theta \, \boldsymbol{e}_2 + \cos\phi \, \boldsymbol{e}_3, \tag{28}
$$
where $\{\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3\}$ is the orthogonal basis for $\mathbb{R}^3$.
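For a fixed tangent, the three quantities in Eqs. (22)-(24) can be evaluated on a uniform grid of the spherical coordinates. The sketch below is a minimal illustration with hypothetical function names; the contraction of $\boldsymbol{N}$ with the second and fourth indices of the tangent is an assumed index placement for Eq. (20), and an isotropic tangent with Lamé constants `lam` and `mu` stands in for the tangent predicted by the network.

```python
import numpy as np

def unit_vector(phi, theta):
    """Spherical parameterization of Eq. (28)."""
    return np.array([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)])

def acoustic_tensor(A4, N):
    """A_ik(N) = N_J A4_iJkL N_L (assumed index placement for Eq. (20))."""
    return np.einsum("J,iJkL,L->ik", N, A4, N)

def ellipticity_measures(A4, N):
    """Smallest diagonal entry, smallest 2x2 principal minor, and determinant."""
    Q = acoustic_tensor(A4, N)
    diag_min = min(Q[i, i] for i in range(3))
    minor_min = min(Q[i, i] * Q[j, j] - Q[i, j] ** 2
                    for i in range(3) for j in range(3) if i != j)
    return diag_min, minor_min, np.linalg.det(Q)

def grid_minima(A4, n_phi=32, n_theta=32):
    """Minimum of each measure over a uniform (phi, theta) grid on [0, pi] x [0, pi]."""
    phis = np.linspace(0.0, np.pi, n_phi)
    thetas = np.linspace(0.0, np.pi, n_theta)
    values = np.array([ellipticity_measures(A4, unit_vector(p, t))
                       for p in phis for t in thetas])
    return values.min(axis=0)

# Hypothetical isotropic tangent used only to exercise the functions.
lam, mu = 1.0, 0.5
I3 = np.eye(3)
A4_iso = (lam * np.einsum("iJ,kL->iJkL", I3, I3)
          + mu * (np.einsum("ik,JL->iJkL", I3, I3) + np.einsum("iL,Jk->iJkL", I3, I3)))
mins = grid_minima(A4_iso)
passes = bool(np.all(mins > 0.0))
```

For the isotropic stand-in the acoustic tensor is $(\lambda + \mu) \boldsymbol{N} \otimes \boldsymbol{N} + \mu \boldsymbol{I}$, so the three minima are $\mu$, $\mu^2$, and $\mu^2 (\lambda + 2\mu)$, which provides a simple sanity check before the same functions are applied to a learned tangent.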
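To ensure stability for any given admissible deformation, we must ensure that Eqs. (22)-(24) are valid for any $\boldsymbol{F}$. While this can, in principle, be determined analytically for handcrafted energy functionals, the expression of the neural network energy functional would likely be too complicated to analyze. As such, we again resort to constructing a test of the hypothesis that the material exhibits strong ellipticity, via an attempt to find the minima, that is,
$$
f'(\boldsymbol{q}, \boldsymbol{F}) = A_{ii}(\boldsymbol{N}(\boldsymbol{q}), \boldsymbol{F}), \quad \underset{\boldsymbol{q}, \boldsymbol{F}}{\operatorname{argmin}} \, f'(\boldsymbol{q}, \boldsymbol{F}), \quad \boldsymbol{N}(\boldsymbol{q}) \in S^2, \; \boldsymbol{F} \in GL^+(3) \tag{29}
$$
$$
g'(\boldsymbol{q}, \boldsymbol{F}) = A_{ii}(\boldsymbol{N}(\boldsymbol{q}), \boldsymbol{F}) A_{jj}(\boldsymbol{N}(\boldsymbol{q}), \boldsymbol{F}) - A_{ij}(\boldsymbol{N}(\boldsymbol{q}), \boldsymbol{F})^2, \quad \underset{\boldsymbol{q}, \boldsymbol{F}}{\operatorname{argmin}} \, g'(\boldsymbol{q}, \boldsymbol{F}), \quad \boldsymbol{N}(\boldsymbol{q}) \in S^2, \; \boldsymbol{F} \in GL^+(3) \tag{30}
$$
$$
d'(\boldsymbol{q}, \boldsymbol{F}) = \det \boldsymbol{A}(\boldsymbol{N}(\boldsymbol{q}), \boldsymbol{F}), \quad \underset{\boldsymbol{q}, \boldsymbol{F}}{\operatorname{argmin}} \, d'(\boldsymbol{q}, \boldsymbol{F}), \quad \boldsymbol{N}(\boldsymbol{q}) \in S^2, \; \boldsymbol{F} \in GL^+(3). \tag{31}
$$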
It is impossible to test all possible deformation gradients in the MD simulations while maintaining the path independence of the constitutive responses, so we instead construct a test where we consider only a range of possible deformation gradients and search for the minima within this range.
The numerical strong ellipticity test is conducted via the following three steps.
1. We create two sets of point clouds in the parametric space with uniform spacing, $V_q = \{\boldsymbol{q}_1, \boldsymbol{q}_2, \boldsymbol{q}_3, \ldots\}$ and $V_F = \{\boldsymbol{F}_1, \boldsymbol{F}_2, \boldsymbol{F}_3, \ldots\}$, and select the combination of $(\boldsymbol{q}, \boldsymbol{F})$ that minimizes $f'$, $g'$, and $d'$. If other $(\boldsymbol{q}, \boldsymbol{F})$ combinations yield a value sufficiently close to the minimum (say within a 5% difference), then these additional coordinates are stored as candidate position(s) for the gradient-free search. This treatment ensures that more local optimal points can be identified and compared, and avoids the issues exhibited in Mota et al. [59].
2. We then use the candidate position(s) determined in the previous step as starting points and apply a gradient-free optimizer via the third-party gradient-free optimizer library (cf. Blanke [60]) to examine whether we can find new coordinates for which the functions $f'(\boldsymbol{q}, \boldsymbol{F})$, $g'(\boldsymbol{q}, \boldsymbol{F})$, and $d'(\boldsymbol{q}, \boldsymbol{F})$ are smaller than at the candidate position(s) identified in Step 1.
3. If Eqs. (22)-(24) are not violated in the worst case obtained from Step 2, then we consider the neural network functional to have passed the strong ellipticity test.
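The three steps above can be sketched in a few lines. The code below is a simplified illustration and not the procedure used in this work: it searches only over the direction parameters $\boldsymbol{q} = (\phi, \theta)$ at a fixed deformation, it uses a hand-written stochastic hill climber instead of the third-party library of [60], and the objective is the determinant of the acoustic tensor of a hypothetical cubic tangent with constants `c11`, `c12`, and `c44`.

```python
import itertools
import numpy as np

def coarse_candidates(f, bounds, n_per_dim=12, rel_tol=0.05):
    """Step 1: uniform grid; keep every point within rel_tol of the grid minimum."""
    axes = [np.linspace(lo, hi, n_per_dim) for lo, hi in bounds]
    points = [np.array(p) for p in itertools.product(*axes)]
    values = np.array([f(p) for p in points])
    cutoff = values.min() + rel_tol * abs(values.min())
    return [p for p, v in zip(points, values) if v <= cutoff]

def hill_climb(f, x0, bounds, step=0.05, n_iter=500, seed=0):
    """Step 2: stochastic hill climbing that accepts only moves lowering f."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    x, fx = np.array(x0, dtype=float), f(x0)
    for _ in range(n_iter):
        trial = np.clip(x + step * rng.standard_normal(x.size), lo, hi)
        f_trial = f(trial)
        if f_trial < fx:
            x, fx = trial, f_trial
    return x, fx

def passes_test(f, bounds):
    """Step 3: the worst case over all refined candidates must remain positive."""
    worst = min(hill_climb(f, x0, bounds)[1] for x0 in coarse_candidates(f, bounds))
    return worst > 0.0, worst

def det_acoustic_cubic(q, c11=2.0, c12=1.0, c44=1.0):
    """det A(N(q)) for a hypothetical cubic tangent (illustrative constants)."""
    phi, theta = q
    N = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])
    Q = ((c12 + c44) * np.outer(N, N) + c44 * np.eye(3)
         + (c11 - c12 - 2.0 * c44) * np.diag(N ** 2))
    return np.linalg.det(Q)

bounds = [(0.0, np.pi), (0.0, np.pi)]
passed, worst = passes_test(det_acoustic_cubic, bounds)
```

Keeping all grid points within 5% of the grid minimum as starting points, rather than the single best point, is what allows several nearly equivalent local minima to be compared; extending the search vector with the components of $\boldsymbol{F}$ would follow the same pattern at a higher cost.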
5.3 Convexity and growth conditions
In nonlinear elasticity in the finite-strain regime, convexity is not necessary and can be over-restrictive for physical phenomena that involve instability or buckling [14]. Nevertheless, the convexity condition has to be satisfied to predict stable elastic responses under large deformation. The convexity condition can be stated as (cf. [10]),
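$$
\psi(\boldsymbol{F}') - \psi(\boldsymbol{F}) - \operatorname{tr}\left( \boldsymbol{P}^T(\boldsymbol{F}) \cdot (\boldsymbol{F}' - \boldsymbol{F}) \right) \geq 0. \tag{32}
$$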
Because convexity is not a requirement for realistic simulations (although it might be expected for HMX), we do not incorporate this criterion in the training of the neural network. However, the uniqueness and stability of the elasticity model are not only important for predicting realistic elastic responses but also crucial if the model is to be deployed as the underlying elasticity model for crystal plasticity and damage models. Another important condition to prevent degenerate elastic behavior is due to Rosakis and Simpson [61], which requires
$$
\psi(\boldsymbol{F}) \to \infty \quad \text{as} \quad \det \boldsymbol{F} \to 0^+. \tag{33}
$$
Recall that $\det \boldsymbol{F} \to 0$ occurs only if the distance between two material points that was nonzero in the reference configuration vanishes in the current configuration. Note that it is unlikely that a material would remain elastic if the volumetric deformation is extremely large. Furthermore, enforcing this constraint explicitly in the loss function is difficult because of the infinity. Nevertheless, the constraint may provide a helpful indicator of the admissibility of the predictions extrapolated by the machine learning model. As a result, we suggest a post-training validation test in which we generate the response for deformation gradients with $\det \boldsymbol{F}$ approaching zero and observe whether the resultant energy is monotonically increasing.
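A minimal version of this growth test is sketched below. It assumes a purely volumetric compression path $\boldsymbol{F} = J^{1/3} \boldsymbol{I}$, which is only one of many possible ways to drive $\det \boldsymbol{F}$ toward zero, and it uses a compressible neo-Hookean energy as a hypothetical stand-in for the trained network; any callable that maps a 3x3 deformation gradient to an energy could be substituted.

```python
import numpy as np

def neo_hookean_energy(F, mu=1.0, lam=1.0):
    """Hypothetical stand-in for the trained network: compressible neo-Hookean energy."""
    J = np.linalg.det(F)
    I1 = np.trace(F.T @ F)
    return 0.5 * mu * (I1 - 3.0) - mu * np.log(J) + 0.5 * lam * np.log(J) ** 2

def growth_test(psi, J_values):
    """Evaluate psi along F = J^(1/3) I and report whether it increases as J decreases."""
    energies = np.array([psi(J ** (1.0 / 3.0) * np.eye(3)) for J in J_values])
    return energies, bool(np.all(np.diff(energies) > 0.0))

J_values = np.logspace(0.0, -6.0, 25)      # det F from 1 down to 1e-6
energies, is_monotone = growth_test(neo_hookean_energy, J_values)
```

A monotone increase along the sampled path is consistent with Eq. (33) but does not prove it, since the test covers a finite range of $\det \boldsymbol{F}$ and a single loading direction.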
5.4 Material Anisotropy
A predictive elasticity model must preserve the overall crystal symmetry while capturing how the anisotropy of the elasticity tensor evolves under arbitrary deformation. The degree of anisotropy of the elastic response can be measured by various metrics available in the literature (cf. [62,63,64]). Many of these anisotropy metrics (or indices) are intended for components of the elasticity tensor. Typically, the distinction between the secant and tangential elastic tensors is not taken into account. This can be confusing for materials undergoing finite deformation, where both material and geometrical nonlinearities play important roles in the degree of anisotropy of the constitutive response. More importantly, the impacts of these two types of nonlinearity should be distinguished properly so that a meaningful evaluation can be conducted.
5.4.1 Ledbetter and Migliori general anisotropy index
Here, we use the idea from the previous work of Ledbetter and Migliori [65], where the ratio between the maximum and minimum shear-wave speeds is used to define a measure of the degree of anisotropy. Interestingly, this method can also be used to detect instability, as the vanishing of the slowest wave speed is accompanied by divergence of the Ledbetter-Migliori index.
This measure can be easily extended to the finite-strain regime by replacing the infinitesimal elasticity tangent with the elasticity tensor corresponding to the first Piola-Kirchhoff stress and the deformation gradient [10]. This idea can be summarized in the following steps.
1. Generate as many unit vectors $\boldsymbol{N}$ as possible.
2. Solve the Christoffel equation for each unit vector $\boldsymbol{N}$, that is,
$$
\det\left( \boldsymbol{N} \cdot \mathbb{C}(\boldsymbol{F}) \cdot \boldsymbol{N} - \rho v^2 \boldsymbol{I} \right) = 0 \tag{34}
$$
3. Pick the largest solution $v_2$ and the smallest solution $v_1$. Then, the anisotropy index is simply
$$
A_I = v_2^2 / v_1^2 \tag{35}
$$
Here, instead of a Monte Carlo search, we can leverage the search formulated in Section 5.2 to obtain the smallest eigenvalue $v_1$ and the largest eigenvalue $v_2$ of the acoustic tensor. Again, the optimization is conducted by using a uniformly spaced point cloud to search for the initial guess; a gradient-free optimizer is then used to find the normal vectors that maximize and minimize $v$.
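The steps of the index calculation can be illustrated with a direct sampling of directions. The sketch below is not the optimizer-based search described above: it samples directions on a Fibonacci lattice, and it treats the two smallest eigenvalues of the acoustic tensor in each direction as the shear branches, which is an assumption of this example. The cubic tangent and its constants are hypothetical and serve only to exercise the function.

```python
import numpy as np

def fibonacci_sphere(n):
    """Approximately uniform unit vectors on the sphere."""
    k = np.arange(n) + 0.5
    z = 1.0 - 2.0 * k / n
    azimuth = np.pi * (1.0 + np.sqrt(5.0)) * k
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(azimuth), r * np.sin(azimuth), z], axis=1)

def anisotropy_index(A4, rho=1.0, n_dirs=2000):
    """Ratio v2^2 / v1^2 of the extreme shear-wave speeds, following Eqs. (34)-(35)."""
    shear = []
    for N in fibonacci_sphere(n_dirs):
        Q = np.einsum("J,iJkL,L->ik", N, A4, N)
        v_sq = np.linalg.eigvalsh(0.5 * (Q + Q.T)) / rho   # ascending values of v^2
        shear.extend(v_sq[:2])      # two slowest branches taken as shear waves (assumption)
    return max(shear) / min(shear)

def cubic_tangent(c11, c12, c44):
    """Fourth-order tensor with cubic symmetry (hypothetical test material)."""
    I3 = np.eye(3)
    C = (c12 * np.einsum("ij,kl->ijkl", I3, I3)
         + c44 * (np.einsum("ik,jl->ijkl", I3, I3) + np.einsum("il,jk->ijkl", I3, I3)))
    for a in range(3):
        C[a, a, a, a] += c11 - c12 - 2.0 * c44
    return C

index_isotropic = anisotropy_index(cubic_tangent(2.0, 1.0, 0.5))   # c11 = c12 + 2*c44
index_cubic = anisotropy_index(cubic_tangent(2.0, 1.0, 1.0))
```

The isotropic case returns an index of one, while the cubic case approaches the ratio $2 c_{44} / (c_{11} - c_{12})$ as the direction sampling is refined; a vanishing slowest shear speed would make the ratio diverge, which is the instability indicator mentioned above.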
6 Results
In this section, we discuss the performance of neural network models for discovering the hyperelastic energy functional from the β-HMX MD simulation data. We describe the training setup of the networks and compare the performance of the architectures. We then demonstrate the predictive capabilities of the models against the present MD simulation data and elastic constants taken from the literature for the same MD force field used here. Finally, we investigate how well the energy functional models satisfy desired properties from the hyperelasticity literature.
6.1 Training performance and learning capacity
In this section, we discuss the performance of the neural network architectures for the Sobolev constraints described in Section 4. We first demonstrate how we trained the neural networks to generate a hyperelastic energy functional from the MD simulation data. We use two different architectures to discover the hyperelastic energy functional for β-HMX. The first architecture is based on the energy conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ (Model $M_1$). The input and output variables are symmetric tensors and thus can be described by six components. The second architecture is based on the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ (Model $M_2$). In addition, we also retrain Model $M_2$ with an additional material frame indifference constraint (Eq. (11)) in the loss function (Model $M_3$). As the difference between the predictions obtained from Models $M_2$ and $M_3$ is minor, we did not enforce Eq. (11) explicitly in the last model we trained (Model $M_4$). Instead, only monoclinic symmetry is enforced as an additional term in the weighted loss function in the retraining step to ensure that the material symmetry is preserved.
Table 2: Summary of the trained models.
Model    Description
$M_1$    Energy conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ model trained via the loss function described in Eq. (9).
$M_2$    Energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model trained via the loss function described in Eq. (10).
$M_3$    Energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model trained from the pretrained model $M_2$ with the additional loss function Eq. (11) to enforce material frame indifference.
$M_4$    Energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model trained from the pretrained model $M_2$ with the additional loss function Eq. (14) to enforce monoclinic symmetry.
The energy functional neural networks have a feed-forward architecture consisting of a hidden dense layer (100 neurons / ReLU), followed by two multiply layers (cf. Vlassis and Sun [25]), then another hidden dense layer (100 neurons / ReLU), and finally an output dense layer (linear). The training and validation procedures of the neural networks are implemented in Python with the machine learning libraries Keras [66] and TensorFlow [67]. The kernel weight matrices of the layers were initialized with a Glorot uniform distribution and the bias vectors with zeros.
In total, 233,430 data points are generated from 15 MD simulations. As described in Section 2.3, this data set includes multiple loading scenarios, such as uniaxial compressive, tensile, shear, and biaxial compressive cases. This set of MD data is partitioned randomly into two mutually exclusive subsets: 70% of the data (163,400 data points) are used to train the neural network energy functional, while the remaining 30% (70,030 data points), which we refer to as unseen data herein, are used to cross-validate the results. All the models were trained for 1000 epochs with a batch size of 512, using the Nadam optimizer [68] initialized with the default values in the Keras library.
[Figure 5: log-log plots of loss versus epoch in three panels titled (a) Stress Training Loss, (b) $\psi_o$ Training Loss, and (c) $S_o/P_o$ Training Loss, with an inset of the stress training loss; the legend lists the training and validation losses of Models $M_1$ and $M_2$.]
Fig. 5: Comparison of the training loss curves for the energy conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ model ($M_1$) and the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_2$) for (a) the stress, (b) the energy, and (c) the stress value at the state of zero strain.
The training loss curves for the architectures $M_1$ and $M_2$ are shown in Fig. 5. The two architectures appear to have similar accuracy, so they will be used interchangeably below. The predictive capabilities of $M_1$ and $M_2$ are further demonstrated in Section 6.2.1.
[Figure 6: log-log plots of loss versus epoch in two panels titled (a) Frame invariance energy constraint Training Loss and (b) Frame invariance stress constraint Training Loss; the legend lists the training and validation losses of Models $M_2$ and $M_3$.]
Fig. 6: Comparison of the training loss curves for (a) the energy and (b) the stress frame invariance constraints for the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_2$) without any additional constraints in the loss function and the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_3$) trained with the additional frame invariance constraint loss function Eq. (11).
To check and, if necessary, enforce the frame invariance of the neural network hyperelastic models as described in Section 4.1.3, we conduct a transfer learning experiment by retraining the neural network model $M_2$. We first train the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_2$) for 1000 epochs without any frame invariance constraints in the loss function (i.e., Eq. (10)). We record the frame invariance metrics during training by applying random rotation tensors $\boldsymbol{Q}$ to the input deformation gradient tensors and examining whether the material response is frame invariant; that is, whether the predicted energy remains the same before and after rotation and whether the predicted stress tensor rotates accordingly. The trained model $M_2$ is then retrained with the additional frame invariance constraints in Eq. (11) for another 1000 epochs (model $M_3$). The comparison of the training curves for $M_2$ and $M_3$ is shown in Fig. 6. Model $M_2$ appears to already satisfy the frame invariance properties well, with the additional constraints of model $M_3$ mostly improving the frame invariance energy constraint.
[Figure 7: log-log plots of loss versus epoch in two panels titled (a) Symmetry energy constraint Training Loss and (b) Symmetry stress constraint Training Loss; the legend lists the training and validation losses of Models $M_2$ and $M_4$.]
Fig. 7: Comparison of the training loss curves for (a) the energy and (b) the stress symmetry constraints for the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_2$) without any symmetry constraints in the loss function and the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_4$) trained with the additional symmetry-constraint loss function Eq. (14).
We also perform a transfer learning experiment by retraining the neural network model $M_2$ to ensure that it retains the observed β-HMX crystal symmetries described in Section 4.2. We first train the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_2$) for 1000 epochs without any symmetry constraints in the loss function and record the symmetry metrics during training. By applying a rotation $\boldsymbol{Q}_{\mathrm{sym}}$ to the input deformation gradient tensors, we check whether the material response retains the expected monoclinic symmetry behavior. The check includes the constraints up to the first-order derivatives of the network. The trained model $M_2$ is then retrained with the additional symmetry constraints in Eq. (14) for another 1000 epochs (model $M_4$). The results of the two training experiments are shown in Fig. 7, where the additional symmetry constraints appear to improve both the energy and the stress symmetry constraints.
Remark 1. Rescaling of the training data. As a preprocessing step, we normalized all data to avoid the vanishing or exploding gradient problem that may occur during backpropagation [69]. The sample $X_i$ of a measure $X$ is scaled to the unit interval via
$$
\overline{X}_i := \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}, \tag{36}
$$
where $\overline{X}_i$ is the normalized sample point, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the measure $X$ in the training data set, such that all the different types of data used in this paper (e.g. strain, stress, etc.) are normalized to the range $[0, 1]$. After scaling all the measures involved in the training of the neural networks to the unit interval, no further fine-tuning of the multi-objective weight parameters in the loss functions was necessary for convergence.
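The rescaling of Eq. (36) and its inverse can be written compactly as follows. This is a minimal sketch with hypothetical function names; it assumes that each column of the data array holds one measure (for example, one strain or stress component) and that the minimum and maximum are taken column-wise over the training set only.

```python
import numpy as np

def fit_min_max(X_train):
    """Column-wise minimum and maximum of the training data."""
    return X_train.min(axis=0), X_train.max(axis=0)

def scale(X, X_min, X_max):
    """Eq. (36): map each measure to the unit interval (requires X_max > X_min)."""
    return (X - X_min) / (X_max - X_min)

def unscale(X_scaled, X_min, X_max):
    """Inverse map, used to express network outputs in physical units."""
    return X_scaled * (X_max - X_min) + X_min

rng = np.random.default_rng(4)
X_train = rng.normal(loc=2.0, scale=5.0, size=(100, 3))   # synthetic stand-in data
X_min, X_max = fit_min_max(X_train)
X_scaled = scale(X_train, X_min, X_max)
X_back = unscale(X_scaled, X_min, X_max)
```

Reusing the training-set bounds when scaling the validation data keeps both sets on the same scale, although validation samples may then fall slightly outside the unit interval.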
6.2 Validation of the constitutive responses
In this section, we validate the neural network predicted constitutive response against MD simulation data as well as β-HMX elastic coefficients from the literature. We also monitor the learned physical properties of the trained models, such as the strong ellipticity, the energy growth, and the anisotropy index.
6.2.1 Validation against unseen MD simulations
We validate the predictive performance of the learned models against unseen MD simulation loading paths. The neural network architectures considered in this section are the energy conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ model ($M_1$) and the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ model ($M_2$).
[Figure 8: three panels plotting the stress components $S_{ij}$ ($\partial \psi / \partial E_{ij}$, in GPa) against (a) $E_{11}$, (b) $E_{22}$, and (c) $E_{33}$; the legend lists $S_{11}$, $S_{22}$, $S_{33}$, $S_{12}$, $S_{23}$, and $S_{13}$.]
Fig. 8: Comparison of the predicted second Piola-Kirchhoff stress response against three uniaxial deformation MD simulations for the conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ model ($M_1$). (a) Uniaxial compressive and tensile deformation along the $x_1$ axis. (b) Uniaxial compressive and tensile deformation along the $x_2$ axis. (c) Uniaxial compressive and tensile deformation along the $x_3$ axis.
[Figure 9: three panels plotting the stress components $S_{ij}$ ($\partial \psi / \partial E_{ij}$, in GPa) against (a) $E_{12}$, (b) $E_{23}$, and (c) $E_{13}$; the legend lists $S_{11}$, $S_{22}$, $S_{33}$, $S_{12}$, $S_{23}$, and $S_{13}$.]
Fig. 9: Comparison of the predicted second Piola-Kirchhoff stress response against three shear MD simulations for the conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ model ($M_1$). (a) Shear tests for positive and negative directions along the $\boldsymbol{e}_1 \otimes \boldsymbol{e}_2$ direction. (b) Shear tests for positive and negative directions along the $\boldsymbol{e}_2 \otimes \boldsymbol{e}_3$ direction. (c) Shear tests for positive and negative directions along the $\boldsymbol{e}_1 \otimes \boldsymbol{e}_3$ direction.
The stress predictions of the networks for three uniaxial strain paths along the axes $x_1$, $x_2$, and $x_3$ are shown in Fig. 8 and Fig. 11. All the symmetric stress tensor components are plotted against the main loading direction of the MD simulation. The predictions are compared against the raw MD simulation data before the filtering preprocessing described in Section 2.4. The stress predictions for three pure shear MD simulations in the positive and negative $\boldsymbol{e}_1 \otimes \boldsymbol{e}_2$, $\boldsymbol{e}_2 \otimes \boldsymbol{e}_3$, and $\boldsymbol{e}_1 \otimes \boldsymbol{e}_3$ directions are shown in Fig. 9 and Fig. 12. Finally, the stress predictions for three biaxial compression tests along the $x_1$ and $x_2$, $x_2$ and $x_3$, and $x_1$ and $x_3$ axes are shown in Fig. 10 and Fig. 13. The stress fluctuations in the MD shear data appear to have a larger magnitude than those of the axial simulations. However, the magnitude of the fluctuations of the stress components is similar across all simulations; it only appears larger in the shear simulations because of the smaller scale of the stress response.
[Figure 10: three panels plotting the stress components $S_{ij}$ ($\partial \psi / \partial E_{ij}$, in GPa) against (a) $E_{11}$, (b) $E_{22}$, and (c) $E_{11}$; the legend lists $S_{11}$, $S_{22}$, $S_{33}$, $S_{12}$, $S_{23}$, and $S_{13}$.]
Fig. 10: Comparison of the predicted stress response against three biaxial MD simulations for the energy conjugate pair $\boldsymbol{S}$-$\boldsymbol{E}$ model ($M_1$). (a) Biaxial compression along the $x_1$ and $x_2$ axes. (b) Biaxial compression along the $x_2$ and $x_3$ axes. (c) Biaxial compression along the $x_1$ and $x_3$ axes.
Both models are able to accurately capture the shear behavior of β-HMX, which differs greatly between the positive and negative directions, as seen in Fig. 9 and Fig. 12. The shear stress response of the material appears to be highly nonlinear and exhibits directional dependence. This behavior is not expected to be captured by a material model with an invariant formulation, as it requires specific treatment of the shear response along different directions to replicate the direction-dependent behavior even qualitatively. Here, however, a more general representation of the material using the full second-order stress and strain tensors allows the neural network to recover this behavior automatically and rather precisely.
Remark 2. As seen in Fig. 11, the predictions of the models $M_2$ and $M_4$ are very close. We have also examined the other predictions, and the discrepancies between the stress predictions inferred from $M_2$ and $M_4$ are also very minor. Hence, we do not include those comparisons in the paper for brevity. In the following sections, the validation tests of the properties of the energy conjugate pair $\boldsymbol{P}$-$\boldsymbol{F}$ models are performed on model $M_2$, as the behavior of the models $M_2$, $M_3$, and $M_4$ was observed to be similar.
6.2.2 Validation of strong ellipticity
In this section, we perform the strong ellipticity tests described in Section 5.2 on the trained neural network. The neural network architecture used in this comparison is model $M_2$, which uses the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$. This model was chosen for the convenience of obtaining the fourth-order elasticity tensor needed for the acoustic tensor checks.
This check is performed by first predicting the fourth-order elasticity tensor at a specific deformation gradient. We sample 1000 unit vectors $\mathbf{N}$ on the unit sphere $S^2$ in spherical coordinates by sampling the polar angle $\phi \in [0, \pi]$ and the azimuthal angle $\theta \in [0, \pi]$ on a uniform grid, following Eq. (28). These vectors are used to construct 1000 initial acoustic tensors following Eq. (20). The acoustic tensors serve as the initial grid landscape for a gradient-free optimizer set to discover the minimum values of the three strong ellipticity tests described in Eqs. (29), (30), and (31). We use a Hill Climbing gradient-free optimizer search, using the library implemented by Blanke [60], to find the pair $(\phi, \theta)$ that minimizes the strong ellipticity check values. The Hill Climbing algorithm performs 10000 iterations of the search per test to discover the minimum value of the check, which in most tests was obtained within the first 5000 iterations of the search.
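The search described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the acoustic tensor is taken in the common form $A_{ik} = N_J C_{iJkL} N_L$, the three checks are assumed to be the leading principal minors of the acoustic tensor (Sylvester's criterion), the 25 x 40 angle grid is only one hypothetical way to obtain 1000 samples, a hand-written stochastic hill climb stands in for the optimizer library of Blanke [60], and an isotropic tangent stands in for the network-predicted one. All function names are hypothetical.

```python
import math
import random


def unit_vector(phi, theta):
    """Unit vector N for polar angle phi and azimuthal angle theta."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def acoustic_tensor(C, N):
    """Assumed form A_ik = N_J C_iJkL N_L, with the tangent stored as C[i][J][k][L]."""
    return [[sum(N[J] * C[i][J][k][L] * N[L] for J in range(3) for L in range(3))
             for k in range(3)] for i in range(3)]


def leading_minors(A):
    """Leading principal minors of a 3x3 matrix (assumed form of the three checks)."""
    m1 = A[0][0]
    m2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    m3 = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return m1, m2, m3


def isotropic_tangent(lam, mu):
    """Stand-in isotropic tangent (Lame parameters), NOT a network prediction."""
    def d(a, b):
        return 1.0 if a == b else 0.0
    return [[[[lam * d(i, j) * d(k, m) + mu * (d(i, k) * d(j, m) + d(i, m) * d(j, k))
               for m in range(3)] for k in range(3)] for j in range(3)] for i in range(3)]


def minimize_check(C, check, n_phi=25, n_theta=40, n_iter=2000, step=0.05, seed=0):
    """Grid search over (phi, theta) in [0, pi]^2, refined by a simple hill climb."""
    rng = random.Random(seed)

    def value(phi, theta):
        return leading_minors(acoustic_tensor(C, unit_vector(phi, theta)))[check]

    grid = [(math.pi * a / (n_phi - 1), math.pi * b / (n_theta - 1))
            for a in range(n_phi) for b in range(n_theta)]
    best = min(grid, key=lambda point: value(*point))
    best_value = value(*best)
    for _ in range(n_iter):
        # Propose a nearby point, clipped to the sampled angle range.
        candidate = tuple(min(max(x + rng.uniform(-step, step), 0.0), math.pi)
                          for x in best)
        candidate_value = value(*candidate)
        if candidate_value < best_value:  # accept only improvements
            best, best_value = candidate, candidate_value
    return best, best_value


tangent = isotropic_tangent(lam=2.0, mu=1.0)
for check in range(3):
    angles, minimum = minimize_check(tangent, check, n_iter=500)
    print(check, angles, minimum)
```

Under these assumptions, the tangent passes the test at the sampled state when all three reported minima are positive, and the first state along a loading path at which a minimum becomes negative marks the loss of strong ellipticity.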
The predicted strong ellipticity tests and the corresponding optimizer searches for the minimum values are shown in Fig. 14 and Fig. 15 for two different elasticity tensors. In Fig. 14, we show the ellipticity test results for the elasticity tensor close to the relaxed reference state, that is, when the deformation gradient is the identity tensor. The neural network passes all three ellipticity tests: the minimum of every test is found to be greater than zero in the unit vector search space. In Fig. 15, we show the first strain state of a biaxial compression simulation along the $x_1$ and $x_2$ axes at which the strong ellipticity test fails, that is, where the acoustic tensor determinant for Eq. (29) is first found to be less than zero (compression of approximately 8% along the $x_1$ and $x_2$ axes).
[Figure 11: six panels (a)-(f) plotting the first Piola-Kirchhoff stress components $P_{iJ} = \partial\psi/\partial F_{iJ}$ (GPa) against F11 (a, d), F22 (b, e), and F33 (c, f); the legend lists P11, P22, P33, P12, P23, P13, P21, P32, and P31.]

Fig. 11: Comparison of the predicted 1st Piola-Kirchhoff stress response against three uniaxial deformation MD simulations for the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ models $M_2$ and $M_4$. (a, d) Uniaxial compressive and tensile deformation along the $x_1$ axis for models $M_2$ and $M_4$, respectively. (b, e) Uniaxial compressive and tensile deformation along the $x_2$ axis for models $M_2$ and $M_4$, respectively. (c, f) Uniaxial compressive and tensile deformation along the $x_3$ axis for models $M_2$ and $M_4$, respectively.

[Figure 12: three panels (a)-(c) plotting the first Piola-Kirchhoff stress components $P_{iJ} = \partial\psi/\partial F_{iJ}$ (GPa) against F12, F23, and F13, respectively; the legend lists P11, P22, P33, P12, P23, P13, P21, P32, and P31.]

Fig. 12: Comparison of the predicted 1st Piola-Kirchhoff stress response against three shear MD simulations for the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ model ($M_2$). (a) Shear tests along the asymmetric positive and negative $\mathbf{e}_1 \otimes \mathbf{e}_2$ direction. (b) Shear tests along the asymmetric positive and negative $\mathbf{e}_2 \otimes \mathbf{e}_3$ direction. (c) Shear tests along the asymmetric positive and negative $\mathbf{e}_1 \otimes \mathbf{e}_3$ direction.
Given that the machine-learning-generated constitutive responses match the filtered MD simulations very well (as shown in Figs. 8-13), the loss of positive definiteness of the acoustic tensor indicates unstable elastic responses corresponding to the shear mode along the $\mathbf{N}$ direction, which could potentially be physical (C. Picu, personal communication, 2021).

[Figure 13: three panels (a)-(c) plotting the first Piola-Kirchhoff stress components $P_{iJ} = \partial\psi/\partial F_{iJ}$ (GPa) against F11, F22, and F11, respectively; the legend lists P11, P22, P33, P12, P23, P13, P21, P32, and P31.]

Fig. 13: Comparison of the predicted 1st Piola-Kirchhoff stress response against three biaxial MD simulations for the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ model ($M_2$). (a) Biaxial compression along the $x_1$ and $x_2$ axes. (b) Biaxial compression along the $x_2$ and $x_3$ axes. (c) Biaxial compression along the $x_1$ and $x_3$ axes.
6.2.3 Validation of energy growth for extrapolated predictions
In this section, we perform the validation check described in Section 5.3 to monitor whether the behavior of the predicted energy functional degenerates for very large deformations. To test this, we impose deformation gradients on the neural network model spanning several orders of magnitude, with $\det \mathbf{F}$ decreasing towards zero. The test is performed on the neural network architecture for the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ (model $M_2$). As the Jacobian decreases, the energy functional values are expected to increase monotonically (Eq. (33)). Therefore, we apply a sequence of volumetric compression deformation gradients with the Jacobian approaching zero and plot the energy against the increasing pure volumetric deformation. The results are shown in Fig. 16.
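A minimal Python sketch of such a growth check is given below. It is an illustration under stated assumptions, not the test code used for the paper: the volumetric deformation gradients are taken as $\mathbf{F} = J^{1/3}\mathbf{I}$, the Jacobian is swept over powers of ten from $10^0$ down to $10^{-6}$ (the range of the horizontal axis in Fig. 16), and a compressible neo-Hookean energy stands in for the trained network. The function names and material parameters are hypothetical.

```python
import math


def volumetric_F(J):
    """Pure volumetric deformation gradient F = J^(1/3) I, so that det F = J."""
    s = J ** (1.0 / 3.0)
    return [[s if i == j else 0.0 for j in range(3)] for i in range(3)]


def det3(F):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (F[0][0] * (F[1][1] * F[2][2] - F[1][2] * F[2][1])
            - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
            + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))


def neo_hookean_energy(F, mu=1.0, lam=1.0):
    """Stand-in stored energy (compressible neo-Hookean), NOT the trained network."""
    J = det3(F)
    trace_C = sum(F[i][j] ** 2 for i in range(3) for j in range(3))  # tr(F^T F)
    return 0.5 * mu * (trace_C - 3.0) - mu * math.log(J) + 0.5 * lam * math.log(J) ** 2


def growth_check(energy, exponents=range(0, -7, -1)):
    """Evaluate the energy for det F = 10^e and test that it grows as det F shrinks."""
    jacobians = [10.0 ** e for e in exponents]
    energies = [energy(volumetric_F(J)) for J in jacobians]
    monotone = all(later > earlier for earlier, later in zip(energies, energies[1:]))
    return jacobians, energies, monotone


jacobians, energies, monotone = growth_check(neo_hookean_energy)
for J, W in zip(jacobians, energies):
    print(f"det F = {J:.0e}, W = {W:.3f}")
print("monotonically increasing:", monotone)
```

Replacing `neo_hookean_energy` with a callable that evaluates a trained model would reproduce the structure of the test. Note that the check only establishes monotonic growth at the sampled points; it does not by itself show that the energy diverges as $\det \mathbf{F}$ approaches zero.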
Note that β-HMX may exhibit plastic yielding or damage under high pressure. When this occurs, a purely elastic response is not physically feasible. Yet continuum mechanics theory does require that the stored energy approach infinity as a finite volume of HMX crystal collapses into a point [61]. This does not happen in our trained neural network model, even though the growth rate within the training data interval seems reasonable.
A similar extrapolation issue was investigated previously by Versino et al. [70], in which the symbolic regression requires an additional artificial data point in order to prevent an incorrect prediction of softening. Presumably, a similar treatment can be applied here, either by adding an artificial data point with a very large energy at the supposedly singular point or by rigorously enforcing the singularity in the learned energy functional. At this point, robust ways to introduce singular data into the neural network and to formulate the loss function accordingly are not clear, but we intend to examine them in future studies. Nevertheless, the results do reveal that the energy functional trained by the neural network may only be valid within the interval of the data, and that any extrapolated results outside of the data interval must be used with caution, even if a significant number of physical constraints (e.g. material symmetry) have already been applied as auxiliary objectives for the supervised learning.
6.2.4 Stress-dependent anisotropy of HMX crystal
[Figure 14: six panels (a)-(f); see caption.]

Fig. 14: Validation of strong ellipticity conditions (a), (b), (c) with the criteria in Eq. (25), Eq. (26), and Eq. (27), respectively, for an elasticity tensor close to the reference strain state. The unit vectors were sampled from the surface of a unit sphere to perform the validation via a Hill Climbing gradient-free optimizer search (d, e, f). The minimum of the condition value discovered by the optimizer is marked.

In this section, we recover the predicted anisotropy index of the material response, as described in Section 5.4.1, to monitor the evolution of the material's degree of anisotropy. The anisotropy index is computed for the neural network architecture of the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ (model $M_2$). To obtain the index, the fourth-order elasticity tensor is predicted at different deformation gradients along a prescribed loading path. We then sample 1000 unit vectors $\mathbf{N}$ per deformation gradient on a uniform grid, following Eq. (28), by sampling the polar angle $\phi \in [0, \pi]$ and the azimuthal angle $\theta \in [0, \pi]$. For each elasticity tensor, 1000 initial acoustic tensors are constructed according to Eq. (20). The Hill Climbing algorithm performs 10000 iterations of the search per deformation gradient sample to discover the minimum $v_1^2$ and the maximum $v_2^2$ values of each acoustic tensor. The anisotropy index AI is then calculated using Eq. (35). The anisotropy index calculated for three loading paths is shown in Fig. 17.
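The anisotropy-index procedure of this section can be sketched in Python as follows. The sketch makes assumptions beyond the text: the index is taken as the ratio $v_2^2 / v_1^2$ of the largest to the smallest squared shear wave speed over all sampled directions, the two smallest eigenvalues of the acoustic tensor $A_{ik} = N_J C_{iJkL} N_L$ are treated as the shear modes (the mass density cancels in the ratio), a plain grid scan replaces the Hill Climbing refinement, and a cubic tangent with illustrative constants stands in for the network-predicted tangent. All names are hypothetical.

```python
import math


def unit_vector(phi, theta):
    """Unit vector N for polar angle phi and azimuthal angle theta."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def acoustic_tensor(C, N):
    """Assumed form A_ik = N_J C_iJkL N_L, with the tangent stored as C[i][J][k][L]."""
    return [[sum(N[J] * C[i][J][k][L] * N[L] for J in range(3) for L in range(3))
             for k in range(3)] for i in range(3)]


def det3(A):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def symmetric_eigenvalues(A):
    """Ascending eigenvalues of a symmetric 3x3 matrix (trigonometric closed form)."""
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    off_diagonal = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    p2 = sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * off_diagonal
    if p2 == 0.0:
        return [q, q, q]
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, 0.5 * det3(B)))  # clamp against round-off
    angle = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(angle)
    e_min = q + 2.0 * p * math.cos(angle + 2.0 * math.pi / 3.0)
    return [e_min, 3.0 * q - e_max - e_min, e_max]


def cubic_tangent(c11, c12, c44):
    """Stand-in tangent with cubic symmetry, NOT a network prediction."""
    def d(a, b):
        return 1.0 if a == b else 0.0
    return [[[[c12 * d(i, j) * d(k, m) + c44 * (d(i, k) * d(j, m) + d(i, m) * d(j, k))
               + (c11 - c12 - 2.0 * c44) * (1.0 if i == j == k == m else 0.0)
               for m in range(3)] for k in range(3)] for j in range(3)] for i in range(3)]


def anisotropy_index(C, n_phi=25, n_theta=41):
    """Ratio of the largest to the smallest shear eigenvalue over an angle grid."""
    v1_sq, v2_sq = math.inf, -math.inf
    for a in range(n_phi):
        for b in range(n_theta):
            N = unit_vector(math.pi * a / (n_phi - 1), math.pi * b / (n_theta - 1))
            slow, fast, _ = symmetric_eigenvalues(acoustic_tensor(C, N))
            v1_sq, v2_sq = min(v1_sq, slow), max(v2_sq, fast)
    return v2_sq / v1_sq


print(anisotropy_index(cubic_tangent(170.0, 120.0, 75.0)))
```

For a cubic tangent, this ratio can be compared against the Zener ratio $2C_{44}/(C_{11} - C_{12})$ as a sanity check, provided the grid contains the relevant crystal directions; for an isotropic tangent the index is one.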
Interestingly, for all three uniaxial compressive deformation cases, the Ledbetter-Migliori anisotropy index, which is the ratio of the fastest and slowest shear wave speeds of the β-HMX crystal, tends to increase significantly. The most significant changes occur when the uniaxial deformation exceeds 8%. In all three cases, the elastic anisotropy is not very pronounced when the deformation is small. However, the anisotropy index jumps from less than 10 to more than 40 for the deformation along the $x_1$ direction, and by more than two orders of magnitude for the $x_2$ and $x_3$ directions. These results signify the importance of capturing the evolving anisotropy of HMX materials.
6.2.5 Comparisons with literature calculations on elasticity
We now provide the coefficients of the elastic tangents for the $\mathbf{S}$-$\mathbf{E}$ and $\mathbf{P}$-$\mathbf{F}$ conjugate pairs obtained from the trained neural network energy functionals, models $M_1$ and $M_2$, and compare them with the elasticity tangent for the $\sigma$-$e$ conjugate pair previously reported by Pereverzev and Sewell [2].
[Figure 15: six panels (a)-(f); see caption.]

Fig. 15: (a) Loss of the strong ellipticity condition in the prediction of a biaxial compression test along the $x_1$ and $x_2$ axes. The unit vectors were sampled from the surface of a unit sphere to perform the validation via a Hill Climbing gradient-free optimizer search (b). The minimum of the condition value discovered by the optimizer is marked.

[Figure 16: predicted energy W(F) plotted against det F on a logarithmic axis from $10^{-6}$ to $10^{0}$, with the minimum det F of the training data marked.]

Fig. 16: Results for the growth condition check, Eq. (33). The predicted energy is monotonically increasing as $\det \mathbf{F}$ approaches 0. The minimum $\det \mathbf{F}$ in the training data set is also marked.

In the present work, the strain measure is obtained differently, in the sense that the models in this paper introduce only one reference configuration, such that $\mathbf{F} = \mathbf{I}$ when the Cauchy pressure is at $10^{-4}$ GPa. Meanwhile, the strain measure for β-HMX in [2] is reset at each reference pressure at which the MD simulation begins. This difference is minor for the atmospheric pressure case, for which the geometrical nonlinearity is insignificant, but may lead to significant differences in the values of the elastic tangent coefficients for high-pressure cases. Note that, due to the anisotropic nature of the elastic responses, the imposed Cauchy pressure may also lead to isochoric deformation due to volumetric-deviatoric coupling. As such, the coordinates of the reference and current configurations, $X_I$ and $x_i$, are not necessarily coaxial. Hence, a direct comparison of the values of the coefficients is not productive.
[Figure 17: three panels (a)-(c) plotting the anisotropy index AI against F11, F22, and F11, respectively.]

Fig. 17: Anisotropy index AI calculated for the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ model ($M_2$) for (a) a uniaxial compressive deformation test along the $x_1$ axis, (b) a uniaxial compressive deformation test along the $x_2$ axis, and (c) a biaxial compression test along the $x_1$ and $x_2$ axes.

Furthermore, discrepancies may also be caused by the different data denoising process employed in Pereverzev and Sewell [2]. In this paper, we employ a denoising algorithm to filter out the high-frequency oscillations in the constitutive responses before the supervised learning is conducted, whereas Pereverzev and Sewell [2] employ a finite-difference approximation with a sufficiently large strain increment to calculate the elasticity tensor.
Nevertheless, a comparison of elasticity tangent operators from previous MD simulations, as well as those obtained for different conjugate stress-strain pairs, does indicate the significance of geometric nonlinearity in the material responses and the importance of taking it into consideration in numerical simulations.
In the MD simulation, the crystal cell of β-HMX is first equilibrated at a target temperature and pressure through an isochoric-isothermal (NVT) simulation, and then strains are imposed in different directions to obtain the stress information for the differentiation. The comparison between the predicted and literature-reported elastic coefficients at a temperature of 300 K and pressures of $10^{-4}$ GPa and 5 GPa is shown in Tables 3 and 4, respectively, for the neural network models $M_1$ and $M_2$ described in Section 6.1. Note that for the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ (model $M_2$), the full tangent requires a $(9 \times 9)$ matrix to represent it in Voigt notation.
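To make the $(9 \times 9)$ representation concrete, the following Python sketch flattens a fourth-order tangent $C_{iJkL}$ into a nine-by-nine matrix. The component ordering (11, 22, 33, 12, 23, 13, 21, 32, 31) is an assumption borrowed from the legend order of the stress plots and from the first six rows of Tables 3 and 4; the paper does not state the ordering of its full matrix. An isotropic tangent is used as a stand-in, and all names are hypothetical.

```python
# Assumed ordering of index pairs (zero-based): 11, 22, 33, 12, 23, 13, 21, 32, 31.
PAIRS = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2), (1, 0), (2, 1), (2, 0)]


def to_9x9(C):
    """Flatten a tangent stored as C[i][J][k][L] into a 9x9 nested list."""
    return [[C[i][J][k][L] for (k, L) in PAIRS] for (i, J) in PAIRS]


def isotropic_tangent(lam, mu):
    """Stand-in isotropic tangent (Lame parameters), NOT a network prediction."""
    def d(a, b):
        return 1.0 if a == b else 0.0
    return [[[[lam * d(i, j) * d(k, m) + mu * (d(i, k) * d(j, m) + d(i, m) * d(j, k))
               for m in range(3)] for k in range(3)] for j in range(3)] for i in range(3)]


matrix = to_9x9(isotropic_tangent(lam=2.0, mu=1.0))
for row in matrix:
    print(" ".join(f"{entry:4.1f}" for entry in row))
```

For this stand-in the entries for the pairs (12, 12) and (12, 21) coincide. For a general tangent of the $\mathbf{P}$-$\mathbf{F}$ pair they need not, because the minor symmetries are absent, which is why six rows and columns are not sufficient.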
Table 3: Comparison of the predicted β-HMX elastic coefficients (GPa) at pressure $10^{-4}$ GPa for the energy conjugate pair $\mathbf{S}$-$\mathbf{E}$ (model $M_1$) and the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ (model $M_2$) to the ones reported by Pereverzev and Sewell [2] for (300 K, $10^{-4}$ GPa). Note that $C^{P-F}$ is not symmetric, but the additional terms are not shown for brevity.

D_ij   C_ijkl   Model M1, C^{S-E}   Model M2, C^{P-F}   Pereverzev and Sewell [2], C^{σ-e}
D11    C1111    21.354              25.861              22.97
D22    C2222    22.149              18.092              22.62
D33    C3333    21.314              21.627              21.67
D44    C1212    8.616               5.8335              8.645
D55    C2323    10.982              8.225               10.407
D66    C1313    9.497               10.078              9.527
D12    C1122    8.789               6.898               9.2
D13    C1133    12.348              12.7828             12.32
D23    C2233    15.913              13.375              12.37
D15    C1123    0.998               0.584               0.43
D25    C2223    4.247               0.877               4.47
D35    C3323    2.192               0.792               1.84
D46    C1213    2.484               1.571               2.248
Table 3 shows the elastic tangents obtained from the neural network calculations and those obtained from Pereverzev and Sewell [2]. While there are differences among the three tangents, they are relatively minor. This is expected, as the geometrical nonlinearity is not significant at this pressure.
Table 4: Comparison of the predicted β-HMX elastic coefficients (GPa) at pressure 5 GPa for the energy conjugate pair $\mathbf{S}$-$\mathbf{E}$ (model $M_1$) and the energy conjugate pair $\mathbf{P}$-$\mathbf{F}$ (model $M_2$) with the ones reported in Pereverzev and Sewell [2]. Note that $C^{P-F}$ is not symmetric, but the additional terms are not shown for brevity.

D_ij   C_ijkl   Model M1, C^{S-E}   Model M2, C^{P-F}   Pereverzev and Sewell [2], C^{σ-e}
D11    C1111    80.157              87.556              87.71
D22    C2222    68.666              53.441              67.08
D33    C3333    71.453              72.011              62.11
D44    C1212    0.358               0.033               19.461
D55    C2323    3.71                2.813               34.08
D66    C1313    2.999               1.736               19.662
D12    C1122    44.048              26.828              36.93
D13    C1133    46.187              32.423              52.95
D23    C2233    55.267              52.603              46.49
D15    C1123    0.939               0.639               11.32
D25    C2223    10.358              6.108               11.1
D35    C3323    0.421               6.120               2.48
D46    C1213    5.546               4.082               6.06
Table 4, on the other hand, shows more significant differences in the numerical values of the coefficients for the different energy-conjugate pairs. This is consistent with the derivation in Section 5.1: the deformation at this point is no longer infinitesimal, and the incorporation of the geometrical nonlinearity is necessary to capture the elastic constitutive responses properly.
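One way to see why the tangents of different conjugate pairs drift apart under pressure is the textbook relation between the two material tangents. It is stated here as a reminder, not as a reproduction of the derivation in Section 5.1, and the index notation is an assumption. Starting from $P_{iJ} = F_{iM} S_{MJ}$ and $E_{NQ} = \frac{1}{2}(F_{kN} F_{kQ} - \delta_{NQ})$, and using the minor symmetries of the $\mathbf{S}$-$\mathbf{E}$ tangent:

```latex
\begin{align}
\frac{\partial P_{iJ}}{\partial F_{kL}}
  &= \delta_{ik}\, S_{LJ}
   + F_{iM}\, \frac{\partial S_{MJ}}{\partial E_{NQ}}\,
     \frac{\partial E_{NQ}}{\partial F_{kL}}, \\
\frac{\partial E_{NQ}}{\partial F_{kL}}
  &= \tfrac{1}{2}\left( \delta_{NL} F_{kQ} + F_{kN}\, \delta_{QL} \right), \\
C^{P-F}_{iJkL}
  &= \delta_{ik}\, S_{LJ} + F_{iM}\, F_{kN}\, C^{S-E}_{MJNL}.
\end{align}
```

At the reference state, where $\mathbf{F} = \mathbf{I}$ and the stress is nearly zero, the two tangents coincide. Away from it, both the stress term $\delta_{ik} S_{LJ}$ and the factors $F_{iM} F_{kN}$ contribute, so the coefficients of the two pairs are expected to differ component by component.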
7 Conclusions
This paper introduces a mechanistic machine learning framework to infer anisotropic hyperelastic energy functionals from molecular dynamics simulations for β-HMX. Conventionally, machine learning constitutive laws are formulated to match experimental data. As such, the discrepancy between experimental data and the predictions is often the only term in the loss function for training and validation. Here, we formulate the training of the hyperelastic model not only to minimize the discrepancy with the data but also to introduce additional objectives that ensure the learned hyperelastic model obeys the physical constraints. To ensure the robustness of the predictions, we also introduce a set of validation tests to examine the admissibility (e.g. preserving material symmetry, obeying growth conditions) and stability (convexity, strong ellipticity) of the constitutive responses generated from the trained neural networks. With the use of Sobolev training and automatic differentiation to facilitate the training of constitutive laws, the resultant models exhibit highly accurate predictions within the training data range. These treatments are shown to be effective in improving the accuracy and robustness of the predictions, while the theoretical validation may provide the much-needed post hoc interpretability of the neural network constitutive laws to understand the properties of the machine learning models. More importantly, the validation exercise may provide a reliable way to reveal the weaknesses of the models and safeguard against cherry-picked interpretations, which could be a key ingredient in making black-box neural network predictions more trustworthy.
8 Data availability statements
The code used to conduct the validation tests will be available in a GitHub repository upon the publication of this manuscript. The datasets generated and/or analyzed during the current study are available from the authors upon reasonable request.
9 Acknowledgements
The authors are grateful for the insightful feedback and constructive suggestions given by the three reviewers. Fruitful discussions with Andrey Pereverzev and Bahador Bahmani are gratefully acknowledged. The efforts and labor hours are primarily supported by the Air Force Office of Scientific Research under grant contract FA9550-19-1-0318, with additional support provided to WCS and NNV from the NSF CAREER grant at the National Science Foundation under grant contracts CMMI-1846875 and OAC-1940203, and high-performance computing resources provided by the Air Force Office of Scientific Research under grant contract FA9550-21-1-0027.
References
1. Hooks DE, Ramos KJ, Bolme CA, Cawkwell MJ. Elasticity of crystalline molecular explosives. Propellants, Explosives, Pyrotechnics 2015; 40(3): 333–350.
2. Pereverzev A, Sewell T. Elastic Coefficients of β-HMX as Functions of Pressure and Temperature from Molecular Dynamics. Crystals 2020; 10(12): 1123.
3. Borja RI. Plasticity: modeling & computation. Springer Science & Business Media. 2013.
4. Bryant EC, Sun W. A mixed-mode phase field fracture model in anisotropic rocks with consistent kinematics. Computer Methods in Applied Mechanics and Engineering 2018; 342: 561–584.
5. Ma R, Sun W, Picu CR. Atomistic-model informed pressure-sensitive crystal plasticity for crystalline HMX. International Journal of Solids and Structures 2021; 232: 111170.
6. Cady HH, Smith L. Studies on the Polymorphs of HMX. 2652. Los Alamos Scientific Laboratory of the University of California. 1962.
7. Cady HH, Larson AC, Cromer DT. The crystal structure of α-HMX and a refinement of the structure of β-HMX. Acta crystallographica 1963; 16(7): 617–623.
8. Das P, Zhao P, Perera D, Sewell T, Udaykumar HS. Molecular dynamics-guided material model for the simulation of shock-induced pore collapse in β-octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine (β-HMX). J. Appl. Phys. 2021; 130(8): 085901.
9. Marsden JE, Hughes TJ. Mathematical foundations of elasticity. Courier Corporation. 1994.
10. Ogden RW. Nonlinear elastic deformations. Courier Corporation. 1997.
11. Holzapfel GA, Ogden RW. On planar biaxial tests for anisotropic nonlinearly elastic solids. A continuum mechanical framework. Mathematics and mechanics of solids 2009; 14(5): 474–489.
12. Holzapfel GA, Sommer G, Regitnig P. Anisotropic mechanical properties of tissue components in human atherosclerotic plaques. J. Biomech. Eng. 2004; 126(5): 657–665.
13. Latorre M, Montáns FJ. Anisotropic finite strain viscoelasticity based on the Sidoroff multiplicative decomposition and logarithmic strains. Computational Mechanics 2015; 56(3): 503–531.
14. Clayton JD. Nonlinear mechanics of crystals. 177. Springer Science & Business Media. 2010.
15. Frankel AL, Jones RE, Swiler LP. Tensor basis gaussian process models of hyperelastic materials. Journal of Machine Learning for Modeling and Computing 2020; 1(1).
16. Wang J, Li T, Cui F, Hui CY, Yeo J, Zehnder AT. Metamodeling of constitutive model using Gaussian process machine learning. Journal of the Mechanics and Physics of Solids 2021: 104532.
17. Fuhg JN, Bouklas N. On physics-informed data-driven isotropic and anisotropic constitutive models through probabilistic machine learning and space-filling sampling. arXiv preprint arXiv:2109.11028 2021.
18. Ghaboussi J, Garrett Jr J, Wu X. Knowledge-based modeling of material behavior with neural networks. Journal of engineering mechanics 1991; 117(1): 132–153.
19. Lefik M, Schrefler BA. Artificial neural network as an incremental nonlinear constitutive model for a finite element code. Computer methods in applied mechanics and engineering 2003; 192(28–30): 3265–3283.
20. Heider Y, Wang K, Sun W. SO(3)-invariance of informed-graph-based deep neural network for anisotropic elastoplastic materials. Computer Methods in Applied Mechanics and Engineering 2020; 363: 112875.
21. Frankel AL, Jones RE, Alleman C, Templeton JA. Predicting the mechanical response of oligocrystals with deep learning. Computational Materials Science 2019; 169: 109099.
22. Le B, Yvonnet J, He QC. Computational homogenization of nonlinear elastic materials using neural networks. International Journal for Numerical Methods in Engineering 2015; 104(12): 1061–1084.
23. Teichert GH, Natarajan A, Ven V. dA, Garikipati K. Machine learning materials physics: Integrable deep neural networks enable scale bridging by learning free energy functions. Computer Methods in Applied Mechanics and Engineering 2019; 353: 201–216.
24. Vlassis NN, Ma R, Sun W. Geometric deep learning for computational mechanics Part I: Anisotropic Hyperelasticity. Computer Methods in Applied Mechanics and Engineering 2020; 371: 113299.
25. Vlassis NN, Sun W. Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening. Computer Methods in Applied Mechanics and Engineering 2021; 377: 113695.
26. Czarnecki WM, Osindero S, Jaderberg M, Świrszcz G, Pascanu R. Sobolev Training for Neural Networks. In: ; 2017; Long Beach, CA, USA.
27. Cuitino A, Ortiz M. A material-independent method for extending stress update algorithms from small-strain plasticity to finite plasticity with multiplicative kinematics. Engineering computations 1992.
28. Plimpton S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Comp. Phys. 1995; 117: 1.
29. Smith GD, Bharadwaj RK. Quantum Chemistry Based Force Field for Simulations of HMX. J. Phys. Chem. B 1999; 103: 3570.
30. Bedrov D, Smith GD, Sewell TD. Thermal Conductivity of Liquid Octahydro-1,3,5,7-Tetranitro-1,3,5,7-Tetrazocine (HMX) from Molecular Dynamics Simulations. Chem. Phys. Lett. 2000; 324: 64.
31. Kroonblawd MP, Mathew N, Jiang S, Sewell TD. A Generalized Crystal-Cutting Method for Modeling Arbitrarily Oriented Crystals in 3D Periodic Simulation Cells with Applications to Crystal-Crystal Interfaces. Comput. Phys. Commun. 2016; 207: 232.
32. Mathew N, Sewell T. Pressure-Dependent Elastic Coefficients of β-HMX from Molecular Simulations. Prop., Explos., Pyrotech. 2018; 43: 233.
33. Chitsazi R, Kroonblawd MP, Pereverzev A, Sewell TD. A Molecular Dynamics Simulation Study of Thermal Conductivity Anisotropy in β-Octahydro-1,3,5,7-Tetranitro-1,3,5,7-Tetrazocine (β-HMX). Model. Simul. Mater. Sc. 2020; 28: 025008.
34. Zhao P, Lee S, Sewell T, Udaykumar HS. Tandem Molecular Dynamics and Continuum Studies of Shock-Induced Pore Collapse in TATB. Propellants, Explos. Pyrotech. 2020; 45: 1.
35. Kroonblawd MP, Fried LE. High Explosive Ignition through Chemically Activated Nanoscale Shear Bands. Phys. Rev. Lett. 2020; 124: 206002.
36. Hockney RW, Eastwood JW. Computer Simulation Using Particles. New York, NY: Hilger. 1988.
37. Verlet L. Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules. Phys. Rev. 1967; 159: 98.
38. Swope WC, Andersen HC, Berens PH, Wilson KR. A Computer Simulation Method for the Calculation of Equilibrium Constants for the Formation of Physical Clusters of Molecules: Application to Small Water Clusters. J. Chem. Phys. 1982; 76: 637.
39. Nosé S. A Unified Formulation of the Constant-Temperature Molecular-Dynamics Methods. J. Chem. Phys. 1984; 81: 511.
40. Hoover WG. Canonical Dynamics: Equilibrium Phase-Space Distributions. Phys. Rev. A 1985; 31: 1695.
41. LAMMPS Molecular Dynamic Simulator. LAMMPS is available at http://lammps.sandia.gov.
42. Muti D, Bourennane S. Multidimensional filtering based on a tensor approach. Signal Processing 2005; 85(12): 2338–2353.
43. Klein D, Fernández M, Martin RJ, Neff P, Weeger O. Polyconvex anisotropic hyperelasticity with neural networks. 2021.
44. Vogiatzis GG, Breemen vLC, Theodorou DN, Hütter M. Free energy calculations by molecular simulations of deformed polymer glasses. Computer Physics Communications 2020; 249: 107008.
45. Eiland PF, Pepinsky R. The crystal structure of cyclotetramethylene tetranitramine. Zeitschrift für Kristallographie-Crystalline Materials 1954; 106(1–6): 273–298.
46. Simo J, Fox D, Rifai M. On a stress resultant geometrically exact shell model. Part II: The linear theory; computational aspects. Computer Methods in Applied Mechanics and Engineering 1989; 73(1): 53–92.
47. Mavrotas G. Effective implementation of the ε-constraint method in multi-objective mathematical programming problems. Applied mathematics and computation 2009; 213(2): 455–465.
48. Abraham R, Marsden JE, Ratiu T. Manifolds, tensor analysis, and applications. 75. Springer Science & Business Media. 2012.
49. Ortiz M, Radovitzky R, Repetto E. The computation of the exponential and logarithmic mappings and their first and second linearizations. International Journal for Numerical Methods in Engineering 2001; 52(12): 1431–1441.
50. Miehe C. Comparison of two algorithms for the computation of fourth-order isotropic tensor functions. Computers & structures 1998; 66(1): 37–43.
51. Ghaboussi J, Pecknold DA, Zhang M, Haj-Ali RM. Autoprogressive training of neural network constitutive models. International Journal for Numerical Methods in Engineering 1998; 42(1): 105–126.
52. Pernot S, Lamarque CH. Application of neural networks to the modelling of some constitutive laws. Neural Networks 1999; 12(2): 371–392.
53. Hoerig C, Ghaboussi J, Insana MF. Data-driven elasticity imaging using cartesian neural network constitutive models and the autoprogressive method. IEEE transactions on medical imaging 2018; 38(5): 1150–1160.
54. Fuhg JN, Marino M, Bouklas N. Local approximate Gaussian process regression for data-driven constitutive laws: Development and comparison with neural networks. arXiv preprint arXiv:2105.04554 2021.
55. Huang DZ, Xu K, Farhat C, Darve E. Learning constitutive relations from indirect observations using deep neural networks. Journal of Computational Physics 2020; 416: 109491.
56. Hartmann S, Neff P. Polyconvexity of generalized polynomial-type hyperelastic strain energy functions for near-incompressibility. International journal of solids and structures 2003; 40(11): 2767–2791.
57. Merodio J, Neff P. A note on tensile instabilities and loss of ellipticity for a fiber-reinforced nonlinearly elastic solid. Archives of Mechanics 2006; 58(3): 293–303.
58. Miehe C, Schröder J, Becker M. Computational homogenization analysis in finite elasticity: material and structural instabilities on the micro- and macro-scales of periodic composites and their interaction. Computer Methods in Applied Mechanics and Engineering 2002; 191(44): 4971–5005.
59. Mota A, Chen Q, Foulk III JW, Ostien JT, Lai Z. A Cartesian parametrization for the numerical analysis of material instability. International Journal for Numerical Methods in Engineering 2016; 108(2): 156–180.
60. Simon Blanke. Gradient-Free-Optimizers: Simple and reliable optimization with local, global, population-based and sequential techniques in numerical search spaces. https://github.com/SimonBlanke; since 2020.
61. Rosakis P, Simpson HC. On the relation between polyconvexity and rank-one convexity in nonlinear elasticity. Journal of elasticity 1994; 37(2): 113–137.
62. Li Z, Bradt RC. The single-crystal elastic constants of cubic (3C) SiC to 1000 °C. Journal of materials science 1987; 22(7): 2557–2559.
63. Kube CM. Elastic anisotropy of crystals. AIP Advances 2016; 6(9): 095209.
64. Ranganathan SI, Ostoja-Starzewski M. Universal elastic anisotropy index. Physical Review Letters 2008; 101(5): 055504.
65. Ledbetter H, Migliori A. A general elastic-anisotropy measure. Journal of applied physics 2006; 100(6): 063516.
66. Chollet F, others. Keras. https://keras.io; 2015.
67. Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Software available from tensorflow.org.
68. Dozat T. Incorporating nesterov momentum into adam. 2016.
69. Bishop CM, others. Neural networks for pattern recognition. Oxford university press. 1995.
70. Versino D, Tonda A, Bronkhorst CA. Data driven modeling of plastic deformation. Computer Methods in Applied Mechanics and Engineering 2017; 318: 981–1004.