International Journal for Numerical Methods in Engineering manuscript No.
(will be inserted by the editor)
Molecular dynamics inferred neural network models for finite-strain hyperelasticity of monoclinic crystals: Sobolev training and validations against physical constraints
Nikolaos N. Vlassis · Puhan Zhao · Ran Ma · Tommy Sewell · WaiChing Sun

January 30, 2022
Abstract
We present a machine learning framework to train and validate neural networks that predict the anisotropic elastic response of a monoclinic organic molecular crystal known as octogen (β-HMX) in the geometrically nonlinear regime. A filtered database of molecular dynamics (MD) simulations is used to train the neural networks with a Sobolev norm that uses the stress measure and a reference configuration to deduce the elastic stored energy functional. To improve the accuracy of the elasticity tangent predictions originating from the learned stored energy, a transfer learning technique is used to introduce additional tangential constraints from the data, while necessary conditions (e.g., strong ellipticity, crystallographic symmetry) for the correctness of the model are either introduced as additional physical constraints or incorporated in the validation tests. Assessment of the neural networks is based on (1) the accuracy with which they reproduce the bottom-line constitutive responses predicted by MD, (2) the robustness of the models, measured by detailed examination of their stability and uniqueness, and (3) the admissibility of the predicted responses with respect to mechanics principles in the finite-deformation regime. We compare the neural networks' training efficiency under different Sobolev constraints and assess the models' accuracy and robustness against MD benchmarks for β-HMX.
Keywords HMX, molecular dynamics, Sobolev training, hyperelasticity, deep learning

1 Introduction
Plastic-bonded explosives (PBXs) are highly filled polymer composites in which crystallites of one or more energetic constituents are held together by a continuous polymeric binder phase. Detonation initiation in PBXs is often achieved by transmitting a mechanical shock wave into the explosive charge. Shock passage leads to an abrupt increase in stress, strain, and temperature in the material. In thermodynamic terms, the magnitude of the increase of these properties is given by the Hugoniot jump relations, which yield the locus of thermodynamic states immediately behind the shock discontinuity as a function of the input shock strength (with a parametric dependence on the initial thermodynamic state of the material). The elastic properties of the constituents in a PBX play an important role in determining the states on the Hugoniot locus. The most obvious connection is their appearance in the reactant equation of state (EOS); for a useful summary, see Hooks et al. [1]. The isotropic EOS can be built around the isothermal compression curve, typically by fitting $V = V(P)$ to the third-order Birch–Murnaghan (BM) equation of state or some other convenient functional form at room temperature or zero kelvin. For the BM EOS, the fitting variables are the bulk modulus $K$ and its initial pressure derivative $K'$. More comprehensive models may account for crystal elastic anisotropy by incorporating the full elasticity tensor. The advantage of incorporating the full elasticity tensor is a higher-fidelity description of the elastic response. However, identifying the necessary material parameters may require solving inverse problems under shock conditions with precise measurements of the pressure and temperature dependence of the elastic coefficients. Hence, the material parameters are often inferred from the results of molecular dynamics simulations instead of experiments [2]. Furthermore, the possible coupling between the volumetric and deviatoric responses may make it even more difficult to formulate the proper inverse problem and determine the optimal set of parameters [3,4,5].

Nikolaos N. Vlassis, Ran Ma, WaiChing Sun (corresponding author)
Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, New York
Puhan Zhao, Tommy Sewell
Department of Chemistry, University of Missouri, Columbia, Missouri
The substance octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine (HMX, also called octogen due to the symmetry of the molecular structure) is the energetic constituent in many PBXs. HMX exhibits several crystal polymorphs [6]. The thermodynamically stable form on the 300 K isotherm, for pressures between 0 and approximately 30 GPa, is known as β-HMX; its crystal structure is monoclinic, with a unit cell containing two molecules [7]. Numerous theoretical studies of HMX physical properties and thermomechanical response to shocks have been reported; we do not discuss them here, but Das et al. [8] provide a recent entry point into that literature. All MD simulations discussed below were performed for β-HMX in the $P2_1/n$ space-group setting.
Previous work, such as that of Pereverzev and Sewell [2] for β-HMX, has obtained pressure- and temperature-dependent elastic coefficients by applying small strain increments to a sample at thermal equilibrium at the desired thermodynamic state and determining the corresponding stress and elastic tangent tensor at that state. Here we assume that the finite-strain elasticity is that of a Green-elastic, or hyperelastic, material [9,10]. As such, we postulate that (1) the state of stress in the current configuration can be determined solely by the state of deformation of the current configuration relative to one choice of reference configuration, such as the crystal lattice vectors at (300 K, 1 atm), and (2) there exists an elastic stored energy functional whose derivative with respect to the strain measure is the energy-conjugate stress measure. Compared with the former approach, which tabulates the elasticity tensor at prescribed states for a given pressure and temperature, the hyperelasticity approach has several distinct advantages. First, the predictions of the elastic strain energy, the stress measure, and the elastic tangent are all bundled together into one scalar-valued tensor function, instead of separate calculations for stress and elastic tangent that might not be consistent with each other. Second, unlike the more widely used tabular approach, the hyperelasticity model does not require pressure as an input to predict elastic constitutive responses, which makes consistency easier to achieve. Finally, by assuming the existence of such an elastic stored energy, the stability and uniqueness of the constitutive responses, as well as other attributes such as convexity, material frame indifference, and symmetry, can be analyzed mathematically more easily [10,3].
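The first advantage can be illustrated with a minimal sketch. It assumes a hypothetical isotropic quadratic energy $\psi = \tfrac{1}{2}\mathbf{E}:\mathbb{C}:\mathbf{E}$ (made-up Lamé constants `lam`, `mu`) as a stand-in for a learned functional of the Green strain $\mathbf{E}$ introduced in Section 3. Both the stress and the tangent are obtained by differentiating the same scalar function, here by central finite differences, so they are consistent by construction. The helper names `psi`, `stress`, and `tangent` are illustrative only.

```python
import numpy as np

# Hypothetical isotropic stand-in for a learned energy psi(E); lam, mu in GPa.
lam, mu = 8.0, 5.0
I = np.eye(3)
C = (lam * np.einsum("ij,kl->ijkl", I, I)
     + mu * (np.einsum("ik,jl->ijkl", I, I) + np.einsum("il,jk->ijkl", I, I)))


def psi(E):
    """Quadratic stored energy psi = 0.5 E:C:E."""
    return 0.5 * np.einsum("ij,ijkl,kl->", E, C, E)


def stress(E, h=1e-4):
    """S_ij = d psi / d E_ij by central differences of the energy."""
    S = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            dE = np.zeros((3, 3))
            dE[i, j] = h
            S[i, j] = (psi(E + dE) - psi(E - dE)) / (2.0 * h)
    return S


def tangent(E, h=1e-4):
    """C_ijkl = d S_ij / d E_kl, differentiating the same energy once more."""
    T = np.zeros((3, 3, 3, 3))
    for k in range(3):
        for l in range(3):
            dE = np.zeros((3, 3))
            dE[k, l] = h
            T[:, :, k, l] = (stress(E + dE) - stress(E - dE)) / (2.0 * h)
    return T


E = np.array([[-0.02, 0.003, 0.0],
              [0.003, 0.01, -0.002],
              [0.0, -0.002, 0.005]])
S = stress(E)
T = tangent(E)
```

Because the tangent is the derivative of a stress that is itself the derivative of one energy, it inherits major symmetry automatically; a separately tabulated stress and tangent offer no such guarantee.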
Nevertheless, with a few exceptions, such as Holzapfel and Ogden [11], Holzapfel et al. [12], and Latorre and Montáns [13], the majority of hyperelasticity models are limited to isotropic materials or materials of simple symmetry, such as transversely isotropic and orthotropic materials. Hyperelastic models for materials of lower symmetry, such as monoclinic or triclinic, are less common [14]. This can be attributed to the fact that the strain and the stress of anisotropic materials are not necessarily coaxial, so handcrafting a mathematical expression for the energy functional that leads to accurate predictions of stress and tangent becomes a challenging task. Alternatively, predictions of elastic responses can be established via Gaussian processes that generate constitutive laws (e.g., Frankel et al. [15], Wang et al. [16], and Fuhg and Bouklas [17]). This family of nonparametric approaches is outside the scope of this study but will be considered in future work.
To overcome this technical barrier, we introduce a transfer learning approach that generates a neural network model for the hyperelastic response of β-HMX from molecular dynamics (MD) simulations. To the best of the authors' knowledge, our new contributions are as follows:
1. Traditional supervised learning approaches often employ objective/loss functions that match the stress–strain responses [18,19,20,21], the elastic stored energy [22,23], or the energy, stress, or elastic tangent fields [24,25], with the raw data considered as the ground truth. Many of these supervised learning models are obtained by training deep neural networks. This direct approach, however, is not suitable for MD data, where the change from one state to another leads to fluctuations that make direct Sobolev training unproductive [26]. To overcome this problem, we introduce a pre-training step in which the data are preprocessed through a filter and the underlying non-fluctuating patterns are extracted to train the neural network models.
Training and validation of ML anisotropic constitutive law for βHMX 3
2. We introduce a transfer learning approach in which additional desirable attributes (e.g., frame invariance) and necessary conditions for the correctness of the constitutive laws (e.g., material symmetry) can be enforced with a simple retraining.
3. We also introduce a post-training validation procedure that focuses not only on predicting stress–strain responses but also on the desirable properties of the elastic tangential operator. To compare with previous literature that measures anisotropy in the geometrically linear regime, we introduce a reverse mapping [27] that generates the infinitesimal small-strain tangent from its finite-strain counterpart. With these metrics available, we can examine the convexity and strong ellipticity of the learned function and also evaluate whether the predicted constitutive responses exhibit the same evolution of anisotropy as the MD benchmark, while ensuring that the filtering process does not lead to nonphysical responses at the continuum level. The accuracy of the model is assessed by comparing MD-simulated and learned stresses as functions of strain, and by comparing the pressure-dependent tangent stiffness from the learned model against explicit predictions of the elastic tensor reported recently [2] for β-HMX states on the 300 K hydrostatic isothermal compression curve. The latter comparison, in particular, provides an incisive test of the accuracy of the learned functional, as this information was not used explicitly as part of the training set.
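Strong ellipticity of a learned tangent can be probed numerically. The sketch below is one possible check, not the validation procedure of this work: it samples unit directions $\mathbf{n}$, forms the acoustic tensor $Q_{ik} = A_{iJkL} n_J n_L$ from a hypothetical 3×3×3×3 tangent `A`, and reports its smallest eigenvalue. The isotropic tangent used for illustration is an assumption; for it, the eigenvalues of $\mathbf{Q}$ are $\mu$ (twice) and $\lambda + 2\mu$ for every unit direction, so the expected results are known in closed form.

```python
import numpy as np


def min_acoustic_eigenvalue(A, n_dirs=2000, seed=0):
    """Smallest eigenvalue of Q_ik(n) = A_iJkL n_J n_L over random unit n.

    A positive result is consistent with, but does not prove, strong
    ellipticity (m . Q(n) . m > 0 for all unit vectors m and n).
    """
    rng = np.random.default_rng(seed)
    n = rng.normal(size=(n_dirs, 3))
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    Q = np.einsum("iJkL,pJ,pL->pik", A, n, n)
    Q = 0.5 * (Q + np.transpose(Q, (0, 2, 1)))  # only the symmetric part enters m.Q.m
    return np.linalg.eigvalsh(Q).min()


def isotropic_tangent(lam, mu):
    """Hypothetical isotropic tangent used only to exercise the check."""
    I = np.eye(3)
    return (lam * np.einsum("ij,kl->ijkl", I, I)
            + mu * (np.einsum("ik,jl->ijkl", I, I) + np.einsum("il,jk->ijkl", I, I)))
```

A strictly positive minimum over the samples is supporting evidence only, whereas a negative value at any sampled direction does demonstrate loss of strong ellipticity at that state.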
The rest of the paper is organized as follows. We first provide a brief account of the database generation procedure, including pertinent details of the MD simulations, the procedure to generate stress–strain data from the MD predictions, and the procedure to filter out the high-frequency responses (Section 2). We briefly review the setup of our hyperelastic model (Section 3) and then outline the major ingredients for the supervised learning of the hyperelastic energy functional, including the Sobolev training, the Hessian sampling techniques for controlling the higher-order derivatives, and the way to incorporate the physical constraints in the training procedure (Section 4). This section is followed by the validation procedure that tests the attributes of the learned hyperelasticity models against physical constraints not included in the training problems (Section 5). The results of the numerical experiments are reported in Section 6, followed by concluding remarks in Section 7.
As for notations and symbols, boldfaced and blackboard boldfaced letters denote tensors (including vectors, which are rank-one tensors); the symbol '$\cdot$' denotes a single contraction of adjacent indices of two tensors (e.g., $\mathbf{a}\cdot\mathbf{b} = a_i b_i$ or $\mathbf{c}\cdot\mathbf{d} = c_{ij} d_{jk}$); the symbol ':' denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g., $\mathbb{C}:\boldsymbol{\varepsilon} = C_{ijkl}\,\varepsilon_{kl}$); the symbol '$\otimes$' denotes a juxtaposition of two vectors (e.g., $\mathbf{a}\otimes\mathbf{b} = a_i b_j$) or two symmetric second-order tensors [e.g., $(\boldsymbol{\alpha}\otimes\boldsymbol{\beta})_{ijkl} = \alpha_{ij}\beta_{kl}$]. We also define the identity tensors $\mathbf{I} = \delta_{ij}$, $\mathbb{I} = \delta_{ik}\delta_{jl}$, and $\bar{\mathbb{I}} = \delta_{il}\delta_{jk}$, where $\delta_{ij}$ is the Kronecker delta. We denote the Eulerian coordinates as $\{x_1, x_2, x_3\}$ and the corresponding three orthogonal basis vectors as $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. As for sign conventions, unless specified otherwise, the directions of tensile stress and dilative pressure are considered positive.
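The index notation above maps directly onto `numpy.einsum` subscripts. The following sketch, using random arrays as stand-ins for the tensors, spells out each operation; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
c, d = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
X = rng.normal(size=(3, 3))                      # generic (non-symmetric) 2nd-order tensor
C4 = rng.normal(size=(3, 3, 3, 3))               # generic 4th-order tensor

a_dot_b = np.einsum("i,i->", a, b)               # a.b = a_i b_i
c_dot_d = np.einsum("ij,jk->ik", c, d)           # c.d = c_ij d_jk
C_ddot_X = np.einsum("ijkl,kl->ij", C4, X)       # C:X = C_ijkl X_kl
a_otimes_b = np.einsum("i,j->ij", a, b)          # (a (x) b)_ij = a_i b_j
X_otimes_X = np.einsum("ij,kl->ijkl", X, X)      # (alpha (x) beta)_ijkl = alpha_ij beta_kl

delta = np.eye(3)                                # I = delta_ij
I4 = np.einsum("ik,jl->ijkl", delta, delta)      # I4_ijkl = delta_ik delta_jl
I4_bar = np.einsum("il,jk->ijkl", delta, delta)  # I4bar_ijkl = delta_il delta_jk
```

Note that $\mathbb{I}:\mathbf{X} = \mathbf{X}$ and $\bar{\mathbb{I}}:\mathbf{X} = \mathbf{X}^T$ for any second-order tensor $\mathbf{X}$, so $\tfrac{1}{2}(\mathbb{I} + \bar{\mathbb{I}})$ extracts the symmetric part.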
2 Database generation via molecular dynamics simulations

In this section, we discuss the specifics of the MD simulation setup used to generate the database for discovering the hyperelastic energy functional. We provide a theoretical background for the simulations as well as details on the system setup. We present the output of the simulations and describe the post-processing procedure that renders them suitable for our machine learning algorithms.
Training data for the neural networks are obtained by computing the Cauchy stress tensor for isothermal samples as functions of imposed tensorial strains. The imposed strains correspond variously to uniaxial compression or tension, pure shear, and combined strains, and they are restricted to states below the threshold for mechanical failure of β-HMX as predicted by the MD. By learning the underlying free-energy functional, we can extract the hyperelastic response from second-order and higher-order strain derivatives. Note that, whereas the MD reflects the underlying free energy, it does not yield the energy functional in a readily computable form.
2.1 Force field

The MD simulations were performed using LAMMPS [28] in conjunction with a modified version of the all-atom, fully flexible, nonreactive force field originally developed for HMX by Smith and Bharadwaj (SB) [29,30,31,32,33]. Intramolecular interactions are modeled using harmonic functions for covalent bonds, three-center angles, and improper dihedral ("wag") angles, and truncated cosine expansions for proper dihedrals. Intermolecular nonbonded interactions between atoms separated by three or more covalent bonds are modeled using Buckingham-plus-charge (exponential-6-1) pair terms. Here and in Refs. [34,35,8], a steep repulsive pair potential was incorporated between nonbonded atom pairs to prevent "overtopping" of the exponential-6-1 potential at short nonbonded separations $R$, which can occur under shock-wave loading because the potential has a global maximum at distances of approximately 1 Å and diverges to negative infinity as $R \to 0$. Dispersion and Coulomb pair terms were evaluated using the particle-particle particle-mesh (PPPM) k-space method [36] with a cutoff of 11 Å and the PPPM precision set to $10^{-6}$.
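The "overtopping" problem can be made concrete with a generic exp-6 pair term $U(R) = A e^{-BR} - C/R^6$. The sketch below uses hypothetical parameters chosen only to reproduce the qualitative shape described above (a barrier top near 1 Å and divergence to $-\infty$ as $R \to 0$); they are not the SB force-field parameters, the Coulomb term is omitted, and the $D/R^{12}$ wall is an assumed stand-in for the steep repulsive potential actually used, whose functional form is not given here.

```python
import numpy as np

# Hypothetical parameters, NOT the SB force field:
# A (kcal/mol), B (1/Angstrom), C (kcal*Angstrom^6/mol)
A, B, C = 5.0e4, 3.6, 500.0
D = 20.0  # hypothetical strength of an added R^-12 wall (kcal*Angstrom^12/mol)


def u_exp6(R):
    """Buckingham exp-6 repulsion-dispersion term (charge term omitted)."""
    return A * np.exp(-B * R) - C / R**6


def u_with_wall(R):
    """exp-6 plus a steep repulsive wall that dominates at short range."""
    return u_exp6(R) + D / R**12


R = np.linspace(0.3, 2.0, 1701)       # separations in Angstrom, 0.001 spacing
R_max = R[np.argmax(u_exp6(R))]       # location of the spurious barrier top
print(f"exp-6 barrier top near R = {R_max:.2f} Angstrom")
print(f"U_exp6(0.3) = {u_exp6(0.3):.3e}; with wall = {u_with_wall(0.3):.3e}")
```

Atoms driven past the barrier top by a strong shock would fall into the unphysical attractive well at small $R$; the added wall keeps the short-range interaction repulsive while leaving the potential essentially unchanged at typical nonbonded distances.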
2.2 MD simulation cell setup

Three-dimensionally periodic (3D) primary simulation cells were generated, starting from the unit-cell lattice parameters for β-HMX ($P2_1/n$ space-group setting) predicted by the force field at 300 K and 1 atm, by simple replication of the unit cell in 3D space. This results in a monoclinic-shaped primary simulation cell. The crystal frame is mapped to the Cartesian lab frame with $\mathbf{a} \parallel \hat{x}$, $\mathbf{b} \parallel \hat{y}$, and $\mathbf{c}$ lying in the $+z$ half-space. Starting primary cell sizes for the uniaxial compressive and tensile deformation cases were approximately 30 nm parallel to the strain direction and approximately 10 nm transverse to it; those for pure shear deformation were approximately 10 nm × 10 nm × 10 nm; and those for biaxial compression were approximately 30 nm × 30 nm × 30 nm. Figure 1 depicts a unit cell of β-HMX and snapshots of representative simulation cells prior to the beginning of deformation. Table 1 contains details of the system sizes used.
Fig. 1: Unit cell of β-HMX (panel (a)) and snapshots of representative simulation cells for (b) uniaxial compressive and tensile deformation, (c) shear deformation, and (d) biaxial compression. Cyan for carbon, navy for nitrogen, red for oxygen, and white for hydrogen.
Table 1: System sizes for uniaxial compressive and tensile deformation, pure shear deformation, and biaxial compression production simulations.

Simulation                       Lx (nm)   Ly (nm)   Lz (nm)   Number of molecules
Compression/tension along x̂      30.3      10.5      10.6      12,880
Compression/tension along ŷ      10.5      30.3      10.6      12,992
Compression/tension along ẑ      10.5      10.5      30.4      12,800
Shear deformation                10.5      10.5      10.6      4,480
Biaxial compression              30.3      30.3      30.4      106,720
2.3 Simulation details

MD trajectories were propagated using the velocity Verlet integrator in LAMMPS [37,38]. Primary cells constructed as described in the preceding section were thermally equilibrated in the isochoric-isothermal (NVT) ensemble at 300 K by initially selecting atomic velocities from the 300 K Maxwell distribution, followed by 20 ps of trajectory integration. Temperature control was achieved using the Nosé–Hoover thermostat [39,40] as implemented in LAMMPS, with the damping parameter set to 50.0 fs. A 0.2 fs time step was used for the thermal equilibration.
Fifteen isothermal MD production simulations, comprising three apiece for uniaxial compression, uniaxial tension, and biaxial compression, and six for pure shear (i.e., positive and negative shear directions for three distinct shear cases), were performed at $T = 300$ K using NVT integration in conjunction with the LAMMPS fix deform command. The integration time step was 0.20 fs and the thermostat damping parameter was set to 20.0 fs. The system potential energy, temperature, pressure, Cauchy stress-tensor components, and primary-cell lattice vectors were recorded at 10 fs intervals for subsequent analysis.
For the uniaxial compressive and tensile deformation simulations, the prescribed strain was applied parallel to the long direction of the primary cell while holding both the transverse cell lengths and the tilt factors constant. The strain rate was set to the constant value ±0.1/100 ps, applied uniformly at each time step. The uniaxial deformation simulations were performed for 300 ps, resulting in a total strain of 0.3 for those cases.

For the shear simulations, the system was deformed along one of the three tilt factors (i.e., xy, xz, or yz) while the cell edge lengths were maintained at constant values. A constant strain rate of ±0.1/100 ps was applied for 300 ps, resulting in total positive or negative shear strains of 0.3.
For the biaxial compression simulations, the primary cell was compressed along two axes simultaneously in the lab frame (i.e., $x$ and $y$, $y$ and $z$, or $x$ and $z$) while holding the third cell length and the tilt factors constant. The strain rate was set to ±0.05/100 ps along both directions. Trajectory integration was performed for 300 ps, resulting in a strain of 0.15 along each of the two affected directions.
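As a quick arithmetic check of the protocol, rate times duration gives the quoted totals (0.001/ps × 300 ps = 0.3 for the uniaxial and shear runs; 0.0005/ps × 300 ps = 0.15 per axis for the biaxial runs), and 300 ps recorded every 10 fs yields 30,000 output intervals. The sketch below also builds the corresponding uniaxial deformation gradients and Green strains, assuming (our assumption, not stated in the text) that the rate is an engineering strain rate so that the cell length grows linearly, $L(t) = L_0(1 + \dot{\varepsilon} t)$, as in the `erate` style of the LAMMPS `fix deform` command. The function name `uniaxial_schedule` is hypothetical.

```python
import numpy as np


def uniaxial_schedule(rate_per_ps, duration_ps, output_interval_fs=10.0, axis=0):
    """Recorded times, engineering strain, and F for uniaxial deformation,
    assuming a linearly growing cell length L(t) = L0 (1 + rate * t)."""
    n_intervals = int(round(duration_ps * 1000.0 / output_interval_fs))
    t = np.linspace(0.0, duration_ps, n_intervals + 1)     # ps
    strain = rate_per_ps * t                                # engineering strain
    F = np.tile(np.eye(3), (t.size, 1, 1))
    F[:, axis, axis] += strain                              # stretch along one lab axis only
    return t, strain, F


rate = -0.1 / 100.0                                         # compression: -0.1 per 100 ps
t, eps, F = uniaxial_schedule(rate, 300.0, axis=1)          # loading along y
E = 0.5 * (np.einsum("nki,nkj->nij", F, F) - np.eye(3))     # Green strain E = (F^T F - I)/2
biaxial_total = (0.05 / 100.0) * 300.0                      # per-axis strain, biaxial runs
print(t.size, eps[-1], E[-1, 1, 1], biaxial_total)
```

Note that an engineering strain of −0.3 along one axis corresponds to a Green strain component of $\tfrac{1}{2}(0.7^2 - 1) = -0.255$, so the two measures differ noticeably at these strain levels.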
2.3.1 MD results

Figure 2 shows the system potential energy, pressure, Cauchy stress-tensor components, and lattice vectors vs. time for uniaxial compressive deformation along $\hat{y}$. The effects of deformation are evident in the potential energy and stress-tensor components (panels (a) and (c)), where it can be seen that the sample yields at $t \approx 190$ ps. Data collected from the beginning of the simulations up to approximately 10 ps before failure were used to train the energy functional.

The Cauchy stress is obtained from the standard LAMMPS command; the corresponding expression can be found in [41].
2.4 Filtering MD simulation data

The raw data from the MD simulations are not expected to be smooth, owing to thermal fluctuations. These fluctuations may depend on the thermostat employed and the size of the system. Such thermal fluctuations, however, are not meant to be captured by the hyperelastic energy functional, which is designed only to capture the macroscopic constitutive responses.

Fig. 2: From MD, system (a) potential energy, (b) pressure, (c) Cauchy stress-tensor components, and (d) lattice vectors vs. time for uniaxial compressive deformation along ŷ.
To deal with the MD data, we can either introduce a regularization process during the machine learning training or simply filter out the Gaussian noise that might otherwise affect the convexity, and therefore the stability, of the hyperelasticity model.
While one can filter the Cauchy stress tensor on a component-by-component basis, such a strategy may lead to a filtered Cauchy stress that depends on the coordinate system, and it should therefore be avoided. While there are potentially more sophisticated techniques for filtering tensorial and multidimensional data (e.g., Muti and Bourennane [42]), here we introduce a spectral decomposition of the Cauchy stress such that

$$\boldsymbol{\sigma} = \sum_{a=1}^{3} \sigma_a \, \mathbf{n}_a \otimes \mathbf{n}_a. \tag{1}$$
Following this step, a 1D moving-average filter is applied to each of the eigenvalues of the Cauchy stress and to the Euler angles that represent the orthogonal basis vectors $\mathbf{n}_a$. To remove the noise, we used a 1D uniform filter on the data series, which works similarly to a rolling-average window. The temporal length of the filter window is 3 ps (300 MD observations). This window length was selected by manual trial and error so that the noise of the tensorial time series is suppressed without greatly distorting the global recorded constitutive response. Note that highly fluctuating stress data may not only increase the difficulty of Sobolev training the hyperelastic energy functional but also affect the stability of the constitutive responses at the continuum scale. Hence, this preliminary step is necessary.
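The filtering steps above can be sketched as follows. This is a NumPy-only illustration under our own assumptions, not the code used to produce Fig. 3: the Euler-angle convention (intrinsic z-y-x), the eigenvector sign-continuity fix, the edge padding at the ends of the series, and all function names are hypothetical choices, since the text does not specify them. The default window of 300 samples corresponds to 3 ps at the 10 fs output interval.

```python
import numpy as np


def euler_zyx_to_matrix(a, b, c):
    """R = Rz(a) @ Ry(b) @ Rx(c) (assumed intrinsic z-y-x convention)."""
    ca, sa, cb, sb, cc, sc = np.cos(a), np.sin(a), np.cos(b), np.sin(b), np.cos(c), np.sin(c)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cc, -sc], [0.0, sc, cc]])
    return Rz @ Ry @ Rx


def matrix_to_euler_zyx(R):
    """Inverse of euler_zyx_to_matrix, valid away from gimbal lock (|b| < pi/2)."""
    b = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    a = np.arctan2(R[1, 0], R[0, 0])
    c = np.arctan2(R[2, 1], R[2, 2])
    return np.array([a, b, c])


def moving_average(x, window):
    """Centered uniform (box-car) filter along axis 0 with edge padding."""
    before, after = window // 2, window - 1 - window // 2
    xp = np.pad(x, [(before, after)] + [(0, 0)] * (x.ndim - 1), mode="edge")
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="valid"), 0, xp)


def filter_stress_series(sigma, window=300):
    """Filter a (T, 3, 3) series of symmetric Cauchy stresses via Eq. (1)."""
    T = sigma.shape[0]
    vals, angles = np.zeros((T, 3)), np.zeros((T, 3))
    prev = None
    for t in range(T):
        w, N = np.linalg.eigh(sigma[t])          # eigenvalues sigma_a, eigenvectors n_a (columns)
        if prev is not None:                     # keep eigenvector signs continuous in time
            signs = np.sign(np.sum(N * prev, axis=0))
            signs[signs == 0] = 1.0
            N = N * signs
        if np.linalg.det(N) < 0:                 # enforce a proper rotation
            N[:, 2] *= -1.0
        prev = N
        vals[t], angles[t] = w, matrix_to_euler_zyx(N)
    angles = np.unwrap(angles, axis=0)           # remove artificial 2*pi jumps
    vals_f, angles_f = moving_average(vals, window), moving_average(angles, window)
    out = np.empty_like(sigma)
    for t in range(T):
        R = euler_zyx_to_matrix(*angles_f[t])
        out[t] = R @ np.diag(vals_f[t]) @ R.T    # reassemble sum_a sigma_a n_a (x) n_a
    return out
```

Filtering the eigenvalues and the principal frame separately keeps the result independent of the lab frame, unlike component-wise filtering. The Euler-angle representation becomes ill-conditioned near gimbal lock and when eigenvalues nearly coincide (the eigenvectors are then not unique), so a practical implementation would need to handle those cases. `scipy.ndimage.uniform_filter1d` could replace `moving_average`, although its default boundary mode is `'reflect'` rather than edge padding.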
To examine whether the filter introduces significant bias into the filtered data, we apply our filtering procedure to two MD simulations with the same strain path but initiated from different initial conditions and using different values of the thermostat coupling parameter. The filtered and unfiltered constitutive responses are compared for both cases in Fig. 3. The two MD simulations exhibit different fluctuation patterns, but the filtered responses are very close: the uniform filter used to process the data appears to capture almost identical behaviors for both simulations.
[Figure 3: three panels comparing MD Simulation A and MD Simulation B with the corresponding Filtered Data A and Filtered Data B; horizontal axes E11, vertical axes S11 (GPa) in the first panel and S12 (GPa) in the second and third panels.]

Fig. 3: Filtering of MD simulation data with a uniform filter for a compression test along the $x_1$ axis. The filtering is performed for two MD simulations with different thermostat coupling parameters and thus different RMS fluctuations about the local mean value of the stress along the trajectories.
3 Finite-strain hyperelastic neural network functional for β-HMX

In this work, we approximate a finite-strain hyperelastic energy functional for β-HMX using a feed-forward neural network architecture trained with a modified Sobolev training loss function that incorporates additional physical constraints via a transfer learning technique.

We adopt the following assumptions and setup:

1. There exists one reference configuration of β-HMX for which the stored elastic energy is zero. This configuration constitutes the reference configuration for the deformation mapping.
2. We assume that all the data used in the training are purely elastic, with no path dependence.
3. Thermomechanical and rate-dependence effects on the elasticity are neglected.
4. A filter is used to reduce the high-frequency responses.
The stored energy functional $\bar{\psi}$ can be written as a function of the deformation gradient $\mathbf{F}$. The first Piola–Kirchhoff stress $\mathbf{P}$ is conjugate to the deformation gradient $\mathbf{F}$ and can be obtained from the following relation,

$$\mathbf{P}(\mathbf{F}) = \frac{\partial \bar{\psi}(\mathbf{F})}{\partial \mathbf{F}}. \tag{2}$$
Notice that a necessary condition for this energy functional to be correct is material frame indifference. The deformation gradient is already insensitive to rigid-body translation. However, to ensure SO(3) invariance, the machine-learning-generated energy functional must satisfy the following constraint,

$$\bar{\psi}(\mathbf{F}) = \bar{\psi}(\mathbf{Q}\mathbf{F}), \quad \forall\, \mathbf{Q} \in SO(3). \tag{3}$$
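A possible way to bypass the need to introduce additional constraints in the loss function is to derive the energy functional as a function of the Green strain tensor $\mathbf{E}$. Writing $\mathbf{F}' = \mathbf{Q}\cdot\mathbf{F}$ for the rotated deformation gradient, we have

$$\mathbf{E}' = \tfrac{1}{2}(\mathbf{C}' - \mathbf{I}) = \tfrac{1}{2}(\mathbf{F}'^T\cdot\mathbf{F}' - \mathbf{I}) = \tfrac{1}{2}(\mathbf{F}^T\cdot\mathbf{Q}^T\cdot\mathbf{Q}\cdot\mathbf{F} - \mathbf{I}) = \tfrac{1}{2}(\mathbf{F}^T\cdot\mathbf{F} - \mathbf{I}) = \tfrac{1}{2}(\mathbf{C} - \mathbf{I}) = \mathbf{E}, \tag{4}$$

so we then acquire an equivalent expression:

$$\bar{\psi}(\mathbf{F}) = \psi(\mathbf{E}). \tag{5}$$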
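Eq. (4) is easy to confirm numerically. The sketch below, with a hypothetical deformation gradient and a rotation sampled by QR factorization of a Gaussian matrix (an assumed sampling scheme), checks that $\mathbf{E}$ is unchanged when $\mathbf{F}$ is replaced by $\mathbf{Q}\mathbf{F}$, while $\mathbf{F}$ itself changes.

```python
import numpy as np


def green_strain(F):
    """Green strain E = (F^T F - I) / 2."""
    return 0.5 * (F.T @ F - np.eye(3))


def random_rotation(rng):
    """Random proper rotation from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(R)))   # make the factorization unique
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0                    # reflect into SO(3)
    return Q


rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # hypothetical deformation gradient
Q = random_rotation(rng)
E, E_rot = green_strain(F), green_strain(Q @ F)
```

Any function of `E` therefore satisfies Eq. (3) automatically, which is why no extra penalty term is needed for $\psi(\mathbf{E})$.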
The second Piola–Kirchhoff stress $\mathbf{S}$ is conjugate to the Green strain $\mathbf{E}$ and is derived as:

$$\mathbf{S}(\mathbf{E}) = \frac{\partial \psi}{\partial \mathbf{E}}. \tag{6}$$
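The transformations between the two stress measures ($\mathbf{P}$ and $\mathbf{S}$) and the Cauchy stress tensor recorded by the MD simulations are defined as:

$$\mathbf{P} = J\boldsymbol{\sigma}\mathbf{F}^{-T}, \quad \mathbf{S} = J\mathbf{F}^{-1}\boldsymbol{\sigma}\mathbf{F}^{-T}, \quad \text{and} \quad \mathbf{S} = \mathbf{F}^{-1}\mathbf{P}, \tag{7}$$

where $J$ is the determinant of the deformation gradient $\mathbf{F}$.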
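As a minimal sketch of Eq. (7), the function below converts a Cauchy stress into $\mathbf{P}$ and $\mathbf{S}$; the deformation gradient and stress values are hypothetical. In practice $\mathbf{F}$ would presumably be obtained from the recorded lattice vectors (e.g., $\mathbf{F} = \mathbf{H}\mathbf{H}_0^{-1}$ with $\mathbf{H}$ the matrix of cell vectors); that step is our assumption and is not shown.

```python
import numpy as np


def cauchy_to_piola(sigma, F):
    """First and second Piola-Kirchhoff stresses from Cauchy stress, Eq. (7)."""
    J = np.linalg.det(F)
    F_inv = np.linalg.inv(F)
    P = J * sigma @ F_inv.T      # P = J sigma F^-T
    S = F_inv @ P                # S = F^-1 P = J F^-1 sigma F^-T
    return P, S


# Hypothetical state: 5% compression along y plus a small xy shear.
F = np.array([[1.0, 0.01, 0.0],
              [0.0, 0.95, 0.0],
              [0.0, 0.0, 1.0]])
sigma = np.array([[-0.3, 0.02, 0.0],
                  [0.02, -0.8, 0.01],
                  [0.0, 0.01, -0.3]])   # GPa, symmetric
P, S = cauchy_to_piola(sigma, F)
```

$\mathbf{S}$ inherits the symmetry of $\boldsymbol{\sigma}$, whereas the two-point tensor $\mathbf{P}$ is generally not symmetric, which is one reason the $\mathbf{P}$–$\mathbf{F}$ formulation carries more independent labels.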
In addition to frame invariance, another major benefit of expressing the energy functional in terms of the Green strain tensor is that the resultant stress measure is symmetric and the elastic tangential operator possesses both major and minor symmetries. These symmetries reduce the dimension of the input parametric space from 9 to 6 and hence simplify the training. Furthermore, while $\mathbf{C}$ and $\mathbf{E}$ can both be used as the input for the inherently frame-indifferent energy functional that yields $\mathbf{S}$ as the first derivative, $\mathbf{E} = \mathbf{0}$ corresponds to the reference configuration, at which the energy functional is zero. Meanwhile, training $\bar{\psi}(\mathbf{F})$ as the learned function can be more convenient for implicit total Lagrangian solvers, where the tangent corresponding to the $\mathbf{P}$–$\mathbf{F}$ pair is required to solve the linearized system of equations.
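The reduction to six inputs can be sketched as follows, assuming a Voigt-style ordering (11, 22, 33, 23, 13, 12) that the text does not specify. One subtlety: when the energy is written as a function of six components, each shear entry stands for both $E_{ij}$ and $E_{ji}$, so its derivative equals $2S_{ij}$ and must be halved to recover the tensor component. The energy in the usage lines is a hypothetical isotropic stand-in, and the helper names are illustrative.

```python
import numpy as np

PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]  # assumed ordering: 11,22,33,23,13,12


def to_vec6(E):
    """Six independent components of a symmetric 3x3 tensor."""
    return np.array([E[i, j] for i, j in PAIRS])


def from_vec6(e):
    """Rebuild the symmetric tensor from its six components."""
    E = np.zeros((3, 3))
    for value, (i, j) in zip(e, PAIRS):
        E[i, j] = E[j, i] = value
    return E


def stress_from_vec6_grad(g):
    """Convert d psi / d e (six entries) into the symmetric tensor S = d psi / d E."""
    return from_vec6(np.where(np.arange(6) < 3, g, 0.5 * g))  # halve the shear entries


# Usage with a hypothetical energy psi = 0.5*lam*tr(E)^2 + mu*E:E.
lam, mu = 8.0, 5.0


def psi(E):
    return 0.5 * lam * np.trace(E) ** 2 + mu * np.sum(E * E)


e = np.array([-0.02, 0.01, 0.005, -0.002, 0.0, 0.003])
h = 1e-6
g = np.array([(psi(from_vec6(e + h * u)) - psi(from_vec6(e - h * u))) / (2.0 * h)
              for u in np.eye(6)])                 # d psi / d e by central differences
S_from_e = stress_from_vec6_grad(g)
```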
As such, we train two hyperelasticity functionals, $\bar{\psi}(\mathbf{F})$ and $\psi(\mathbf{E})$, which take the deformation gradient and the Green strain tensor as inputs, respectively, and compare the results obtained from numerical experiments. The relationships among the elasticity tangential tensors corresponding to different stress–strain conjugate pairs are discussed in Section 5.
Remark 1 It should be noted that there are other feasible choices, such as the cofactor of the deformation gradient or strain invariants, that may ensure material symmetry [12], guarantee polyconvexity [43], and ensure material frame indifference. Our choices of directly using the deformation gradient and the Green strain tensor are mainly for convenience and ease of implementation. In the case of $\bar{\psi}(\mathbf{F})$, the training procedure is more complicated owing to the necessity of enforcing frame invariance. However, direct access to the $\mathbf{P}$–$\mathbf{F}$ conjugate tangential stiffness may simplify the implementation of a total Lagrangian code.
4 Stress-based Sobolev training for the stored-energy function

We introduce a neural network training technique that constructs the hyperelasticity energy functional using solely the stress data and a single reference configuration where $\mathbf{F} = \mathbf{I}$. Recall that a feed-forward neural network can be trained to approximate an energy functional $\psi$ that takes the Green–Lagrange strain tensor $\mathbf{E}$ as input. This energy function is parametrized by weights $\mathbf{W}$ and biases $\mathbf{b}$. The supervised learning problem that minimizes the squared $L_2$ norm of the difference between the true $\psi$ and the approximated $\hat{\psi}$ over $N$ data samples can be written as

$$\mathbf{W}', \mathbf{b}' = \operatorname*{argmin}_{\mathbf{W},\mathbf{b}} \left( \frac{1}{N}\sum_{i=1}^{N} \left\| \psi_i - \hat{\psi}_i \right\|_2^2 \right), \tag{8}$$
where $\psi_i = \psi(\mathbf{E}_i)$ and $\hat{\psi}_i = \hat{\psi}(\mathbf{E}_i)$. While this approach could reduce the discrepancy between the predicted and true free-energy values if energy data are available, minimizing the energy discrepancy does not guarantee that the stress predictions are accurate.
In principle, calculating the Helmholtz free energy from the detailed atomistic configurations is possible [44]. However, in this work, our focus is on cases where we have no direct access to sufficient Helmholtz free-energy data. In the following numerical experiments on β-HMX, we instead use only a reference configuration and the stress data collected from multiple deformed configurations to reconstruct the elastic stored energy. Consequently, we introduce two trained neural networks that take a proper strain measure as input and output the elastic stored energy. Sobolev training then adjusts the weights and biases of the neurons such that the derivatives of the stored energy match the stress measures conjugate to the input strain measure. In other words, we show that it is possible to train with labels that differ from the input and output of the neural network. This flexibility has proven useful for tasks in which not all data are available or of sufficient fidelity.
4.1 Sobolev constraints for the hyperelastic energy functional

To obtain a hyperelasticity model suitable for incorporation into numerical solvers for boundary value problems, the accuracy, stability, robustness, smoothness, and uniqueness of the hyperelastic responses are all important. Unlike neural networks that directly generate stress predictions, a hyperelasticity model must be sufficiently smooth and differentiable to avoid discontinuities in the predicted stress and elastic tangent [24,25,22].
4.1.1 Hyperelastic energy functional $\hat{\psi}(\mathbf{E})$

Consider a stored-energy functional constructed solely from the following data:

1. A reference configuration where the Green strain tensor equals $\mathbf{E}_{\text{ref}}$, with the corresponding second Piola–Kirchhoff stress $\mathbf{S}_{\text{ref}}$ and reference energy $\psi_{\text{ref}}$;
2. A set of second Piola–Kirchhoff stresses $\mathbf{S}_i$, $i = 1, 2, \ldots, N$, calculated from the Cauchy stress measured at $N$ deformed configurations inferred from MD simulations.
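The corresponding loss function reads

$$\mathbf{W}', \mathbf{b}' = \operatorname*{argmin}_{\mathbf{W},\mathbf{b}} \left( w_{\psi_{\text{ref}}} \left\| \psi_{\text{ref}} - \hat{\psi}_{\text{ref}} \right\|_2^2 + w_{S_{\text{ref}}} \left\| \mathbf{S}_{\text{ref}} - \left.\frac{\partial \hat{\psi}}{\partial \mathbf{E}}\right|_{\mathbf{E}=\mathbf{E}_{\text{ref}}} \right\|_2^2 + \frac{w_S}{N}\sum_{i=1}^{N} \left\| \mathbf{S}_i - \left.\frac{\partial \hat{\psi}}{\partial \mathbf{E}}\right|_{\mathbf{E}=\mathbf{E}_i} \right\|_2^2 \right), \tag{9}$$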
where $\psi_{\text{ref}} = \psi(\mathbf{E}_{\text{ref}})$ and $\hat{\psi}_{\text{ref}} = \hat{\psi}(\mathbf{E}_{\text{ref}})$ are the true and approximated values of the energy functional at strain $\mathbf{E}_{\text{ref}}$, $N$ is the number of nontrivial stress data points, and $w_{\psi_{\text{ref}}}$, $w_{S_{\text{ref}}}$, and $w_S$ are the weighting factors for the multi-objective optimization. In this work, we use the configuration at (300 K, 1 atm) as the reference configuration, and we assume that this configuration is undeformed such that $\mathbf{E}_{\text{ref}} = \mathbf{0}$.
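A minimal sketch of the loss in Eq. (9) follows, assuming a hypothetical one-hidden-layer tanh network; the actual architecture, weights, and optimizer of this work are not reproduced. The stress is the analytic input gradient of the network output, so every stress label constrains the derivative of $\hat{\psi}$ rather than $\hat{\psi}$ itself, and symmetrizing the input makes $\partial\hat{\psi}/\partial\mathbf{E}$ symmetric. Gradients of the loss with respect to the weights, needed for training, would in practice come from an automatic differentiation framework and are omitted. Squared Frobenius norms are assumed for the tensor terms.

```python
import numpy as np


def init_mlp(n_hidden=16, seed=0):
    """Hypothetical one-hidden-layer tanh network psi_hat(E) with 9 inputs."""
    rng = np.random.default_rng(seed)
    return {"W1": 0.3 * rng.normal(size=(n_hidden, 9)), "b1": np.zeros(n_hidden),
            "w2": 0.3 * rng.normal(size=n_hidden), "b2": 0.0}


def psi_and_stress(params, E):
    """Return psi_hat(E) and S_hat = d psi_hat / d E (symmetric by construction)."""
    x = (0.5 * (E + E.T)).reshape(9)                 # network sees the symmetrized strain
    z = np.tanh(params["W1"] @ x + params["b1"])
    psi = params["w2"] @ z + params["b2"]
    G = (params["W1"].T @ (params["w2"] * (1.0 - z ** 2))).reshape(3, 3)  # d psi / d x
    return psi, 0.5 * (G + G.T)                      # chain rule through the symmetrization


def sobolev_loss(params, psi_ref, S_ref, E_ref, S_data, E_data,
                 w_psi_ref=1.0, w_S_ref=1.0, w_S=1.0):
    """Eq. (9): reference energy + reference stress + mean stress mismatch."""
    psi_hat_ref, S_hat_ref = psi_and_stress(params, E_ref)
    loss = w_psi_ref * (psi_ref - psi_hat_ref) ** 2
    loss += w_S_ref * np.sum((S_ref - S_hat_ref) ** 2)
    mismatch = [np.sum((S_i - psi_and_stress(params, E_i)[1]) ** 2)
                for S_i, E_i in zip(S_data, E_data)]
    return loss + w_S * np.mean(mismatch)


# Usage: labels generated by the network itself give zero loss; shifted labels do not.
params = init_mlp()
rng = np.random.default_rng(1)
E_data = [0.5 * (A + A.T) for A in 0.02 * rng.normal(size=(5, 3, 3))]
S_data = [psi_and_stress(params, E)[1] for E in E_data]
E_ref = np.zeros((3, 3))
psi_ref, S_ref = psi_and_stress(params, E_ref)
loss_exact = sobolev_loss(params, psi_ref, S_ref, E_ref, S_data, E_data)
loss_perturbed = sobolev_loss(params, psi_ref, S_ref, E_ref, [S + 0.01 for S in S_data], E_data)
```

In an actual training run, the reference labels would instead be $\mathbf{E}_{\text{ref}} = \mathbf{0}$ with $\psi_{\text{ref}} = 0$ (assumption 1 in Section 3) and the MD-derived $\mathbf{S}_{\text{ref}}$, and `sobolev_loss` would be minimized over `params`.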
4.1.2 Hyperelastic energy functional $\bar{\psi}(\mathbf{F})$

Another feasible option is to directly train the energy functional related to the $\mathbf{P}$–$\mathbf{F}$ pair. The drawback of this option is the more complex training, owing to the fact that the deformation gradient $\mathbf{F}$ is a two-point tensor that is not necessarily symmetric. Hence, the dimensions of both the input and the labels for the supervised training increase. Furthermore, it is also necessary to introduce an additional training step to ensure material frame indifference, which could be avoided if invariants or $\mathbf{E}$ were used as the input [43]. However, if such training is successful, the Hessian of this energy functional gives the tangential stiffness tensor corresponding to the $\mathbf{P}$–$\mathbf{F}$ pair. The bases of this tangential stiffness tensor make it easy to incorporate into the linearized system of equations of a total Lagrangian finite element solver, without requiring any additional algebraic operations to pull back or push forward between configurations. As such, this option is provided here. We assume that the data described in Section 4.1.1 are available and that the identical reference configuration is used. The corresponding loss function for the energy-conjugate pair $\mathbf{P}$–$\mathbf{F}$ hyperelastic model is:
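$$
\mathbf{W}', \mathbf{b}' = \operatorname*{argmin}_{\mathbf{W},\mathbf{b}} \left( w_{\bar{\psi}_{ref}} \left\| \psi_{ref} - \bar{\psi}_{ref} \right\|_2^2 + w_{P_{ref}} \left\| \mathbf{P}_{ref} - \left.\frac{\partial \bar{\psi}}{\partial \mathbf{F}}\right|_{\mathbf{F}=\mathbf{F}_{ref}} \right\|_2^2 + \frac{w_P}{N} \sum_{i=1}^{N} \left\| \mathbf{P}_i - \left.\frac{\partial \bar{\psi}}{\partial \mathbf{F}}\right|_{\mathbf{F}=\mathbf{F}_i} \right\|_2^2 \right), \tag{10}
$$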
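where $\psi_{ref} = \psi(\mathbf{F}_{ref})$ and $\bar{\psi}_{ref} = \bar{\psi}(\mathbf{F}_{ref})$ are the true and approximated values of the energy functional at the reference deformation gradient $\mathbf{F}_{ref}$ (the undeformed reference configuration, $\mathbf{F}_{ref} = \mathbf{I}$), $N$ is the number of nontrivial stress data points, and $w_{\bar{\psi}_{ref}}$, $w_{P_{ref}}$, and $w_P$ are the weighting factors for the multi-objective optimization.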
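Both loss functions require stress labels expressed in the measure conjugate to the network input, whereas the MD data provide Cauchy stresses. A minimal NumPy sketch of the standard conversions, assuming each MD snapshot supplies a deformation gradient $\mathbf{F}$ (for example, inferred from the simulation cell) and a symmetric Cauchy stress $\boldsymbol{\sigma}$; the function name is hypothetical:

```python
import numpy as np

def stress_measures_from_cauchy(F, sigma):
    """Return (E, S, P) from a deformation gradient F and a Cauchy stress sigma.

    Standard continuum-mechanics relations:
      J = det F,  P = J sigma F^{-T},  S = F^{-1} P = J F^{-1} sigma F^{-T},
      E = (F^T F - I) / 2.
    """
    F = np.asarray(F, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    J = np.linalg.det(F)
    if J <= 0.0:
        raise ValueError("det F must be positive")
    F_inv = np.linalg.inv(F)
    P = J * sigma @ F_inv.T           # first Piola-Kirchhoff stress (labels for Eq. (10))
    S = F_inv @ P                     # second Piola-Kirchhoff stress (labels for Eq. (9))
    E = 0.5 * (F.T @ F - np.eye(3))   # Green strain (input for Eq. (9))
    return E, S, P
```

The inverse relation $\boldsymbol{\sigma} = J^{-1}\mathbf{F}\mathbf{S}\mathbf{F}^T$ recovers the input, which is a cheap consistency check on a preprocessed data set.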
4.1.3 Transfer learning to enforce frame indifference for $\bar{\psi}(\mathbf{F})$
A hyperelastic model described by the conjugate pair of tensors $\mathbf{P}-\mathbf{F}$ is expected to satisfy the frame invariance conditions described in Eq. (3). To ensure that frame invariance is preserved during training, we reuse a previously trained neural network but modify the loss function: we introduce $L$ random rotations $\mathbf{Q}_l$, $l = 1, 2, \ldots, L$, and penalize the violation of objectivity for a randomly selected subsample of size $L$ from the initial training sample pool by adding the following weighted objectives:
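$$
w_\psi \frac{1}{L} \sum_{l=1}^{L} \left| \hat{\psi}(\mathbf{Q}_l\mathbf{F}_l) - \hat{\psi}(\mathbf{F}_l) \right| + w_P \frac{1}{L} \sum_{l=1}^{L} \left\| \hat{\mathbf{P}}(\mathbf{Q}_l\mathbf{F}_l) - \mathbf{Q}_l\hat{\mathbf{P}}(\mathbf{F}_l) \right\|_2^2 + w_C \frac{1}{L} \sum_{l=1}^{L} \left\| \hat{\mathbb{A}}(\mathbf{Q}_l\mathbf{F}_l) - \mathbf{Q}_l\mathbf{Q}_l\hat{\mathbb{A}}(\mathbf{F}_l) \right\|_F^2, \tag{11}
$$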
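where $\hat{\mathbf{P}}$ and $\hat{\mathbb{A}}$ are the neural network approximations of the stress and elastic stiffness tensors, respectively, $\mathbf{Q}_l\mathbf{Q}_l\hat{\mathbb{A}}$ denotes the rotation of both spatial legs of the two-point tangent (in components, $Q_{ae}Q_{cf}\hat{A}_{eBfD}$), and $w_\psi$, $w_P$, and $w_C$ are weights for the multi-objective minimization. Note that this additional step is not necessary for the energy functional $\psi(\mathbf{E})$, since $(\mathbf{Q}\mathbf{F})^T\mathbf{Q}\mathbf{F} = \mathbf{F}^T\mathbf{Q}^T\mathbf{Q}\mathbf{F} = \mathbf{F}^T\mathbf{F} = \mathbf{C}$ for any rotation tensor $\mathbf{Q} \in SO(3)$.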
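The energy and stress terms of Eq. (11) can be evaluated numerically for any candidate model. The NumPy sketch below assumes hypothetical callables `psi(F)` and `P(F)` for the trained energy and its stress, omits the tangent term for brevity, and draws uniformly random rotations from the QR decomposition of a Gaussian matrix (a standard technique, chosen here for illustration):

```python
import numpy as np

def random_rotation(rng):
    """Uniformly random proper rotation via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(R)))   # fix column signs so Q is Haar-distributed
    if np.linalg.det(Q) < 0.0:             # enforce det Q = +1, i.e. Q in SO(3)
        Q[:, 0] = -Q[:, 0]
    return Q

def objectivity_penalty(psi, P, F_samples, rng, w_psi=1.0, w_P=1.0):
    """Energy and stress terms of Eq. (11) over L samples F_l with random Q_l."""
    e_terms, p_terms = [], []
    for F in F_samples:
        Q = random_rotation(rng)
        # energy must not change under a superposed rotation
        e_terms.append(abs(psi(Q @ F) - psi(F)))
        # first Piola-Kirchhoff stress must rotate as P(QF) = Q P(F)
        p_terms.append(np.sum((P(Q @ F) - Q @ P(F)) ** 2))
    return w_psi * np.mean(e_terms) + w_P * np.mean(p_terms)
```

An energy written in terms of $\mathbf{E}$ (for example a St. Venant-Kirchhoff model) returns a penalty at round-off level, whereas a model such as $\psi = \tfrac{1}{2}\|\mathbf{F}-\mathbf{I}\|^2$ does not, which matches the remark above about $\psi(\mathbf{E})$.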
4.2 Transfer learning to enforce crystal symmetries
The monoclinic unit cell of single-crystal β-HMX in the $P2_1/n$ space group setting is shown in Figure 4. The covariant crystal basis vectors $\mathbf{M}_1$, $\mathbf{M}_2$, and $\mathbf{M}_3$ represent the crystal axes in the crystal configuration, as shown in Figure 4.
[Figure 4 image: unit cell with lattice vectors a, b, c and c*, lattice angles α, β, γ, crystal directions M1 = [100], M2 = [010], M3 = [001], and global basis vectors e1, e2, e3.]
Fig. 4: Monoclinic unit cell of β-HMX in the $P2_1/n$ space group setting. The lattice constants are $a = 6.53$ Å, $b = 11.03$ Å, $c = 7.35$ Å, $\alpha = \gamma = 90^\circ$, and $\beta = 102.689^\circ$ (at 295 K) [45]. The Miller indices are associated with the monoclinic crystal directions, while the vectors $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ denote the basis vectors of the global Cartesian coordinate system.
Note that, under the monoclinic material symmetry shown in Figure 4, the crystal structure exhibits 2-fold rotational symmetry: it remains unchanged when the unit cell is rotated by 180° about the [010] axis. Therefore, the symmetry group of this monoclinic unit cell reads
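$$
\mathcal{V}_Q = \left\{ \mathbf{Q} \;\middle|\; \mathbf{Q} = \exp\left[ \frac{k\pi\,\mathrm{spn}(\mathbf{M}_2)}{\|\mathbf{M}_2\|} \right],\; k \in \mathbb{Z} \right\}. \tag{12}
$$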
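Here, the infinitesimal rotation map and the finite rotation map are defined as [46]
$$
\mathrm{spn}(\boldsymbol{\theta}) = -\boldsymbol{\varepsilon}\cdot\boldsymbol{\theta}, \qquad \exp[\mathrm{spn}(\boldsymbol{\theta})] = \mathbf{I} + \frac{\sin\theta}{\theta}\,\mathrm{spn}(\boldsymbol{\theta}) + \frac{1-\cos\theta}{\theta^2}\,\mathrm{spn}(\boldsymbol{\theta})^2,
$$
where $\boldsymbol{\varepsilon}$ is the third-order permutation tensor and $\theta = \|\boldsymbol{\theta}\|$ is the rotation angle.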
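The maps $\mathrm{spn}$ and $\exp$ can be checked numerically. This NumPy sketch implements them and generates the distinct members of $\mathcal{V}_Q$; it assumes, for illustration only, that $\mathbf{M}_2 = [010]$ is aligned with the global $\mathbf{e}_2$ axis, in which case the 180° rotation is $\mathrm{diag}(-1, 1, -1)$:

```python
import numpy as np

def spn(theta):
    """Skew tensor spn(theta) = -eps . theta, so that spn(theta) @ v == cross(theta, v)."""
    t1, t2, t3 = theta
    return np.array([[0.0, -t3, t2],
                     [t3, 0.0, -t1],
                     [-t2, t1, 0.0]])

def exp_spn(theta):
    """Rodrigues formula for exp[spn(theta)] with rotation angle |theta|."""
    theta = np.asarray(theta, dtype=float)
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)
    K = spn(theta)
    return (np.eye(3) + np.sin(angle) / angle * K
            + (1.0 - np.cos(angle)) / angle**2 * K @ K)

def monoclinic_group(M2, k_values=(0, 1)):
    """Distinct members exp[k pi spn(M2)/|M2|] of V_Q (k and k + 2 give the same Q)."""
    axis = np.asarray(M2, dtype=float) / np.linalg.norm(M2)
    return [exp_spn(k * np.pi * axis) for k in k_values]
```

Because $k$ and $k+2$ produce the same rotation, $\mathcal{V}_Q$ reduces to $\{\mathbf{I}, \mathbf{Q}_{180^\circ}\}$, so a single nontrivial rotation suffices in the symmetry penalty of Eq. (14) for this crystal class.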
Consider two elastic deformations of the crystal, $\mathbf{F}$ and $\mathbf{F}^+$, where $\mathbf{F}$ is an arbitrary deformation and $\mathbf{F}^+ = \mathbf{F}\mathbf{Q}$, $\forall \mathbf{Q} \in \mathcal{V}_Q$. The material symmetry of the β-HMX crystal requires that
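$$
\psi(\mathbf{F}^+) = \psi(\mathbf{F}), \quad \mathbf{P}^+ = \left.\frac{\partial\psi}{\partial\mathbf{F}}\right|_{\mathbf{F}^+} = \mathbf{P}\mathbf{Q}, \quad A^+_{aBcD} = A_{aMcN}\,Q_{MB}\,Q_{ND}, \quad \forall \mathbf{Q} \in \mathcal{V}_Q, \tag{13}
$$
where $\psi$ is the elastic free energy, $\mathbf{P}^+$ and $\mathbf{P}$ are the first Piola-Kirchhoff stress tensors evaluated at $\mathbf{F}^+$ and $\mathbf{F}$, and $\mathbb{A}^+$ and $\mathbb{A}$ are the elastic stiffness tensors, such that $\dot{\mathbf{P}} = \mathbb{A}:\dot{\mathbf{F}}$ and $\dot{\mathbf{P}}^+ = \mathbb{A}^+:\dot{\mathbf{F}}^+$.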
To ensure that the crystal symmetry is preserved, we reuse the networks previously trained with Eqs. (10) and (11) and modify the loss function by introducing $M$ rotations $\mathbf{Q}_m \in \mathcal{V}_Q$, $m = 1, 2, \ldots, M$, determined by the material symmetry type. These rotations penalize violations of the material symmetry over $N$ samples:
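$$
\sum_{m=1}^{M} \left( w_\psi \frac{1}{N} \sum_{i=1}^{N} \left| \hat{\psi}(\mathbf{F}_i\mathbf{Q}_m) - \hat{\psi}(\mathbf{F}_i) \right| + w_{\hat{P}} \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\mathbf{P}}(\mathbf{F}_i\mathbf{Q}_m) - \hat{\mathbf{P}}(\mathbf{F}_i)\mathbf{Q}_m \right\|_2^2 + w_C \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\mathbb{A}}(\mathbf{F}_i\mathbf{Q}_m) - \hat{\mathbb{A}}(\mathbf{F}_i)\mathbf{Q}_m\mathbf{Q}_m \right\|_F^2 \right). \tag{14}
$$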
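A NumPy sketch of the energy and stress terms of Eq. (14), again omitting the tangent term. Since no trained network is available here, it uses a hypothetical quadratic energy $\psi = \tfrac{1}{2}\mathbf{E}:\mathbb{C}:\mathbf{E}$ whose stiffness is made monoclinic by averaging a random stiffness over the group $\{\mathbf{I}, \mathbf{Q}\}$; the Voigt ordering (11, 22, 33, 23, 13, 12) is also an assumption of the sketch.

```python
import numpy as np

# Voigt index map (assumed ordering 11, 22, 33, 23, 13, 12)
VOIGT = {(0, 0): 0, (1, 1): 1, (2, 2): 2, (1, 2): 3, (2, 1): 3,
         (0, 2): 4, (2, 0): 4, (0, 1): 5, (1, 0): 5}

def voigt_to_tensor(Cv):
    """Expand a symmetric 6x6 Voigt stiffness into a 3x3x3x3 tensor."""
    C = np.zeros((3, 3, 3, 3))
    for (i, j), a in VOIGT.items():
        for (k, l), b in VOIGT.items():
            C[i, j, k, l] = Cv[a, b]
    return C

def rotate4(C, Q):
    """C'_ijkl = Q_ia Q_jb Q_kc Q_ld C_abcd."""
    return np.einsum("ia,jb,kc,ld,abcd->ijkl", Q, Q, Q, Q, C)

def make_energy(C):
    """Toy quadratic energy psi(F) = E:C:E/2 and its stress P = F S (hypothetical model)."""
    def psi(F):
        E = 0.5 * (F.T @ F - np.eye(3))
        return 0.5 * np.einsum("ij,ijkl,kl->", E, C, E)
    def P(F):
        E = 0.5 * (F.T @ F - np.eye(3))
        return F @ np.einsum("ijkl,kl->ij", C, E)
    return psi, P

def symmetry_penalty(psi, P, F_samples, Q_list, w_psi=1.0, w_P=1.0):
    """Energy and stress terms of Eq. (14): psi(F Q) vs psi(F) and P(F Q) vs P(F) Q."""
    total = 0.0
    for Q in Q_list:
        e = np.mean([abs(psi(F @ Q) - psi(F)) for F in F_samples])
        s = np.mean([np.sum((P(F @ Q) - P(F) @ Q) ** 2) for F in F_samples])
        total += w_psi * e + w_P * s
    return total
```

For such a group-averaged stiffness, the penalty with $\mathbf{Q} = \mathrm{diag}(-1,1,-1)$ is at round-off level, while a 90° rotation about $\mathbf{e}_3$, which is not a member of $\mathcal{V}_Q$, gives a clearly nonzero value; this contrast illustrates what the penalty detects.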
5 Post-training validation of the predicted elastic tangential operators
In this section, we introduce numerical tests to determine whether the predicted constitutive responses are thermodynamically admissible, preserve the symmetry, and lead to unique and stable elastic responses. A subset of these criteria is required for a correct constitutive law (e.g., material frame invariance), while others, such as convexity and strong ellipticity, are not necessary conditions but are desirable properties for the stability and uniqueness of the boundary value problem. While in principle many of these physical constraints can be incorporated into the loss function during supervised learning, putting all of them explicitly into the loss function is not always ideal, as multiple constraints may alter the landscape of the loss function and thus complicate the search for the optimal energy functional [47].
As such, our goal is to introduce a suite of necessary conditions that the learned hyperelastic constitutive law must fulfill. These conditions, together with the requirement that the constitutive law generate predictions within a threshold error, are necessary but not sufficient to guarantee the safety of using the machine learning model for high-consequence, high-risk predictions (such as those for explosives).
5.1 Mapping between finite and infinitesimal kinematics
To examine the admissibility of the hyperelasticity model and to compare the finite-strain model with published results based on the infinitesimal strain assumption, the connections among the tangents of different energy-conjugate pairs are provided below for completeness. Our first goal is to obtain the small-strain tangent underlying its finite-strain counterpart by using the logarithmic and exponential mappings, such that the elasticity tensors predicted here can be compared with those from the literature.
Recall that the logarithmic elastic strain $\mathbf{e}$ can be defined as [27]
$$
\mathbf{e} = \ln\mathbf{U} = \frac{1}{2}\ln\mathbf{C}, \tag{15}
$$
where $\mathbf{U}$ is the right stretch tensor and $\mathbf{C} = \mathbf{U}^2 = \mathbf{F}^T\mathbf{F}$ is the right Cauchy-Green deformation tensor. The small-strain elastic tensor $\mathbb{C}^{\sigma-e}$ can be obtained from the chain rule,
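$$
\mathbb{C}^{\sigma-e} = \frac{\partial\boldsymbol{\sigma}}{\partial\mathbf{e}} = \frac{\partial\boldsymbol{\sigma}}{\partial\mathbf{S}} : \frac{\partial\mathbf{S}}{\partial\mathbf{E}} : \frac{\partial\mathbf{E}}{\partial\mathbf{e}} = \frac{1}{2}\frac{\partial\boldsymbol{\sigma}}{\partial\mathbf{S}} : \frac{\partial\mathbf{S}}{\partial\mathbf{E}} : \frac{\partial\mathbf{C}}{\partial\mathbf{e}} = \frac{1}{2}\frac{\partial\boldsymbol{\sigma}}{\partial\mathbf{S}} : \frac{\partial\mathbf{S}}{\partial\mathbf{E}} : \frac{\partial\exp(2\mathbf{e})}{\partial\mathbf{e}}, \tag{16}
$$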
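where $\boldsymbol{\sigma} = J^{-1}\mathbf{F}\cdot\mathbf{S}\cdot\mathbf{F}^T$ is the Cauchy stress, and the factor $1/2$ follows from $\mathbf{E} = (\mathbf{C}-\mathbf{I})/2$. To compute the small-strain elasticity tensor, one first inverts Eq. (15) and writes the result as an infinite series,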
$$
\mathbf{C} = \exp(2\mathbf{e}) = \sum_{n=0}^{\infty} \frac{1}{n!}(2\mathbf{e})^n. \tag{17}
$$
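As such, the Cartesian components of the derivative $\partial\mathbf{C}/\partial\mathbf{e}$ read [48]
$$
\left[\frac{\partial\mathbf{C}}{\partial\mathbf{e}}\right]_{ijkl} = \sum_{n=1}^{\infty} \frac{2^n}{n!} \sum_{m=1}^{n} [\mathbf{e}^{m-1}]_{ik}\,[\mathbf{e}^{n-m}]_{lj}. \tag{18}
$$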
Notice that this is an infinite series. In practice, we include only a sufficient but finite number of terms of Eq. (18) to approximate the partial derivative $\partial\mathbf{C}/\partial\mathbf{e}$. Convergence studies and benchmark data on the infinite series representation can be found in Ortiz et al. [49]. An alternative representation based on spectral decomposition is also possible (cf. Miehe [50]) but is outside the scope of this paper.
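The truncation of Eq. (18) can be verified against finite differences. In the NumPy sketch below, the number of retained terms is an arbitrary choice, the matrix exponential is itself evaluated by a truncated series (adequate only for small $\|\mathbf{e}\|$), and the nine components of $\mathbf{e}$ are treated as independent, so no minor symmetrization in $(k, l)$ is applied:

```python
import numpy as np

def log_strain(C):
    """Eq. (15): e = (1/2) ln C via the spectral decomposition of the SPD tensor C."""
    w, V = np.linalg.eigh(C)
    return 0.5 * (V * np.log(w)) @ V.T

def expm_series(A, n_terms=40):
    """Matrix exponential by truncated Taylor series, cf. Eq. (17)."""
    result, term = np.eye(3), np.eye(3)
    for n in range(1, n_terms):
        term = term @ A / n
        result = result + term
    return result

def dC_de_series(e, n_terms=30):
    """Truncated Eq. (18): sum_n 2^n/n! sum_{m=1}^{n} [e^{m-1}]_ik [e^{n-m}]_lj."""
    powers = [np.linalg.matrix_power(e, p) for p in range(n_terms)]
    D = np.zeros((3, 3, 3, 3))
    factorial = 1.0
    for n in range(1, n_terms):
        factorial *= n
        coef = 2.0**n / factorial
        for m in range(1, n + 1):
            D += coef * np.einsum("ik,lj->ijkl", powers[m - 1], powers[n - m])
    return D
```

Multiplying the result by $1/2$ gives $\partial\mathbf{E}/\partial\mathbf{e}$, the last factor in the chain rule of Eq. (16).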
The tangent $\mathbb{C}^{P-F}$ associated with the first Piola-Kirchhoff stress can be related to the second derivative of the hyperelastic energy functional $\psi(\mathbf{E})$,
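$$
\mathbb{C}^{P-F} = \frac{\partial\mathbf{P}}{\partial\mathbf{F}} = \frac{\partial\mathbf{S}}{\partial\mathbf{E}}\cdot\mathbf{F}\cdot\mathbf{F}\cdot\mathbf{g} + \mathbf{S}\otimes\boldsymbol{\delta}, \tag{19}
$$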
where $\mathbf{g}$ is the metric tensor. For the Cartesian coordinate system used in our training loss function, the index notation of the metric tensor is simply $g_{ij} = \delta_{ij}$, and Eq. (19) reads $C^{P-F}_{aBcD} = F_{aA}F_{cC}\,\partial S_{AB}/\partial E_{CD} + \delta_{ac}S_{BD}$. This expression is derived from Marsden and Hughes [9] (see page 215), where we simply use the chain rule to link the tangents $\partial\mathbf{S}/\partial\mathbf{E}$ and $\partial\mathbf{S}/\partial\mathbf{C}$. Note that this tensor corresponds to the first Piola-Kirchhoff stress and the deformation gradient and does not possess minor symmetry.
In both Eq. (16) and Eq. (19), the derivative $\partial\mathbf{S}/\partial\mathbf{E}$ is obtained from the neural elastic stored energy, while the remaining terms can be obtained either analytically or via automatic differentiation.
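The Cartesian form of Eq. (19) can be evaluated and checked against a finite-difference derivative of $\mathbf{P} = \mathbf{F}\mathbf{S}$. The NumPy sketch below assumes $\partial\mathbf{S}/\partial\mathbf{E}$ is supplied with minor symmetry (as the Hessian of a stored energy in $\mathbf{E}$ is); a St. Venant-Kirchhoff stiffness would serve only as a hypothetical test model:

```python
import numpy as np

def tangent_PF(F, S, dS_dE):
    """Eq. (19) in Cartesian components (g = identity):

    A_aBcD = F_aA F_cC dS_AB/dE_CD + delta_ac S_BD,
    i.e. the material part pushed forward on both spatial legs
    plus the geometric (initial stress) part.
    """
    material = np.einsum("aA,cC,ABCD->aBcD", F, F, dS_dE)
    geometric = np.einsum("ac,BD->aBcD", np.eye(3), S)
    return material + geometric
```

The result has major symmetry, $A_{aBcD} = A_{cDaB}$, but in general not minor symmetry, consistent with the remark above.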
5.2 Strong ellipticity
While many works are dedicated to training neural networks to predict the elastic responses of solids [51,52,22,53,54,55,24,25], surprisingly few of them analyze the stability and uniqueness of the learned neural network constitutive laws or provide any evidence of the well-posedness of the trained model. Recent work by Dominik et al. [43] addresses this issue by enforcing the polyconvexity of the hyperelastic energy functional via invariants (cf. Hartmann and Neff [56]). Note that the onset of the loss of strong ellipticity does not necessarily indicate that the learned elastic energy functional is erroneous. Rather, it is considered an indicator of the onset of material instability or failure [9,57,58]. Physically, the loss of strong ellipticity may also correspond to a vanishing wave propagation speed [9]. As such, our focus here is not necessarily on preventing the loss of strong ellipticity, but rather on searching for the points at which its onset may occur, to provide more interpretable physical insight into the stability of the β-HMX crystal.
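Consider $\mathbf{A}$ to be the acoustic tensor corresponding to $\mathbb{C}^{PF}$, where $\mathbb{C}^{PF}$ is the elastic tangential operator for the energy-conjugate pair $(\mathbf{P},\mathbf{F})$, that is,
$$
\mathbf{A}(\mathbf{N}) = \mathbf{N}\cdot\mathbb{C}^{PF}\cdot\mathbf{N}. \tag{20}
$$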
The Legendre-Hadamard condition requires that, for any pair of vectors $\mathbf{N}$ and $\mathbf{m}$, the following condition holds:
$$
\mathbf{m}\cdot\mathbf{A}(\mathbf{N})\cdot\mathbf{m} \geq 0, \tag{21}
$$
where $\mathbf{N}$ is a Lagrangian unit vector and $\mathbf{m}$ is an Eulerian vector. Because we assume that β-HMX is a Green-elastic material, the tangent possesses major symmetry, so $\mathbf{A}(\mathbf{N})$ is symmetric, and the necessary and sufficient conditions for strong ellipticity are (cf. Ogden [10], page 392)
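$$
A_{ii}(\mathbf{N}) > 0, \quad i \in \{1,2,3\}, \tag{22}
$$
$$
A_{ii}(\mathbf{N})A_{jj}(\mathbf{N}) - A_{ij}(\mathbf{N})^2 > 0, \quad j \neq i \in \{1,2,3\}, \tag{23}
$$
$$
\det\mathbf{A}(\mathbf{N}) > 0, \tag{24}
$$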
for any $\mathbf{N} \in \mathbb{R}^3$, $\mathbf{N} \neq \mathbf{0}$ (no summation is implied over repeated indices in Eqs. (22) and (23)). Notice that the material response is nonlinear, so the acoustic tensor varies with the direction $\mathbf{N}$ and with the deformation. A simple way to check that conditions (22)-(24) are satisfied is to consider the worst-case scenario: find the infima, i.e., the unit vectors $\mathbf{N}$ that minimize $A_{ii}(\mathbf{N})$, $A_{ii}(\mathbf{N})A_{jj}(\mathbf{N}) - A_{ij}(\mathbf{N})^2$, and $\det\mathbf{A}(\mathbf{N})$, and check whether these three terms remain positive. Depending on the parameterization, the corresponding minimization problems can be written as
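$$
f(\mathbf{q}) = A_{ii}(\mathbf{N}(\mathbf{q})), \quad \operatorname*{argmin}_{\mathbf{q}} f(\mathbf{q}), \quad \mathbf{N}(\mathbf{q}) \in S^2, \tag{25}
$$
$$
g(\mathbf{q}) = A_{ii}(\mathbf{N}(\mathbf{q}))A_{jj}(\mathbf{N}(\mathbf{q})) - A_{ij}(\mathbf{N}(\mathbf{q}))^2, \quad \operatorname*{argmin}_{\mathbf{q}} g(\mathbf{q}), \quad \mathbf{N}(\mathbf{q}) \in S^2, \tag{26}
$$
$$
d(\mathbf{q}) = \det\mathbf{A}(\mathbf{N}(\mathbf{q})), \quad \operatorname*{argmin}_{\mathbf{q}} d(\mathbf{q}), \quad \mathbf{N}(\mathbf{q}) \in S^2, \tag{27}
$$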
where $\mathbf{q}$ represents a parameterization of the unit vector $\mathbf{N}(\mathbf{q})$. Mota et al. [59] provide a comprehensive review of how different parameterizations, namely the spherical, stereographic, projective, and tangent parameterizations, may lead to different local minimizers of the acoustic tensor in the parametric space. For the spherical parameterization, a unit vector $\mathbf{N}$ is an element of the unit sphere $S^2$, which can be parameterized by the spherical coordinates, that is, the polar angle $\varphi \in [0,\pi]$ and the azimuthal angle $\theta \in [0,\pi]$ (half of the usual azimuthal range suffices because $\mathbf{A}(\mathbf{N}) = \mathbf{A}(-\mathbf{N})$):
$$
\mathbf{N}(\varphi,\theta) = \sin\varphi\cos\theta\,\mathbf{e}_1 + \sin\varphi\sin\theta\,\mathbf{e}_2 + \cos\varphi\,\mathbf{e}_3, \tag{28}
$$
where $\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3\}$ is the orthonormal basis of $\mathbb{R}^3$.
To ensure stability for any given admissible deformation, we must ensure that Eqs. (22)-(24) hold for any $\mathbf{F}$. While this can, in principle, be determined analytically for hand-crafted energy functionals, the expression of the neural network energy functional would likely be too complicated to analyze. As such, we again resort to constructing a test of the hypothesis that the material exhibits strong ellipticity, via an attempt to find the minima, that is,
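$$
f_0(\mathbf{q},\mathbf{F}) = A_{ii}(\mathbf{N}(\mathbf{q}),\mathbf{F}), \quad \operatorname*{argmin}_{\mathbf{q},\mathbf{F}} f_0(\mathbf{q},\mathbf{F}), \quad \mathbf{N}(\mathbf{q}) \in S^2,\ \mathbf{F} \in GL^+(3), \tag{29}
$$
$$
g_0(\mathbf{q},\mathbf{F}) = A_{ii}(\mathbf{N}(\mathbf{q}),\mathbf{F})A_{jj}(\mathbf{N}(\mathbf{q}),\mathbf{F}) - A_{ij}(\mathbf{N}(\mathbf{q}),\mathbf{F})^2, \quad \operatorname*{argmin}_{\mathbf{q},\mathbf{F}} g_0(\mathbf{q},\mathbf{F}), \quad \mathbf{N}(\mathbf{q}) \in S^2,\ \mathbf{F} \in GL^+(3), \tag{30}
$$
$$
d_0(\mathbf{q},\mathbf{F}) = \det\mathbf{A}(\mathbf{N}(\mathbf{q}),\mathbf{F}), \quad \operatorname*{argmin}_{\mathbf{q},\mathbf{F}} d_0(\mathbf{q},\mathbf{F}), \quad \mathbf{N}(\mathbf{q}) \in S^2,\ \mathbf{F} \in GL^+(3). \tag{31}
$$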
It is impossible to test all possible deformation gradients with MD simulations while maintaining the path independence of the constitutive responses, so we instead construct a test in which we consider only a range of deformation gradients and search for the minima within this range.
The numerical strong ellipticity test is conducted via the following three steps.
1. We create two point clouds with uniform spacing, one in the parametric space, $V_q = \{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3, \ldots\}$, and one for the deformation gradients, $V_F = \{\mathbf{F}_1, \mathbf{F}_2, \mathbf{F}_3, \ldots\}$, and select the combination $(\mathbf{q}, \mathbf{F})$ that minimizes $f_0$, $g_0$, and $d_0$. If other $(\mathbf{q}, \mathbf{F})$ combinations yield values sufficiently close to the minimum (say, within 5%), their coordinates are also stored as candidate positions for the gradient-free search. This treatment ensures that more local optima can be identified and compared, and it avoids the issues exhibited in Mota et al. [59].
2. We then use the candidate positions determined in the previous step as starting points and apply a gradient-free optimizer from a third-party library (cf. Blanke [60]) to examine whether we can find new coordinates for which the functions $f_0(\mathbf{q},\mathbf{F})$, $g_0(\mathbf{q},\mathbf{F})$, and $d_0(\mathbf{q},\mathbf{F})$ are smaller than at the candidate positions identified in Step 1.
3. If Eqs. (22)-(24) are not violated in the worst case obtained from Step 2, we consider the neural network functional to have passed the strong ellipticity test.
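The three steps can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the tangent is supplied by a hypothetical callable `tangent(F)` returning the $3\times3\times3\times3$ array of Eq. (19), a simple compass (pattern) search stands in for the third-party gradient-free optimizer of Blanke [60], the 5% candidate tolerance follows Step 1, and both the cap on the number of candidates and the refinement in $(\varphi, \theta)$ only (with $\mathbf{F}$ held at its grid value) are simplifications of this sketch.

```python
import numpy as np

def unit_vector(phi, theta):
    """Spherical parameterization of N, Eq. (28)."""
    return np.array([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)])

def criteria(A4, x):
    """(f, g, d) of Eqs. (22)-(24) for tangent A4 and x = (phi, theta)."""
    N = unit_vector(*x)
    Aac = np.einsum("B,aBcD,D->ac", N, A4, N)        # acoustic tensor, Eq. (20)
    f = np.min(np.diag(Aac))
    g = min(Aac[i, i] * Aac[j, j] - Aac[i, j] ** 2
            for i in range(3) for j in range(3) if i != j)
    return np.array([f, g, np.linalg.det(Aac)])

def compass_search(fun, x0, step=0.05, tol=1e-7, max_iter=5000):
    """Minimal gradient-free pattern search (stand-in for a third-party optimizer)."""
    x = np.asarray(x0, dtype=float)
    fx = fun(x)
    moves = (np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
             np.array([0.0, 1.0]), np.array([0.0, -1.0]))
    for _ in range(max_iter):
        if step < tol:
            break
        for d in moves:
            fy = fun(x + step * d)
            if fy < fx:
                x, fx = x + step * d, fy
                break
        else:                                        # no move improved: shrink the step
            step *= 0.5
    return x, fx

def strong_ellipticity_test(tangent, F_list, n_grid=37, rel_tol=0.05, max_cand=10):
    """Grid search over (phi, theta, F), local refinement, and positivity check."""
    angles = np.linspace(0.0, np.pi, n_grid)         # theta in [0, pi]: A(N) = A(-N)
    tangents = [tangent(F) for F in F_list]
    samples = [(criteria(A4, (p, t)), (p, t), A4)
               for A4 in tangents for p in angles for t in angles]
    worst = []
    for k in range(3):                               # k = 0: f_0, 1: g_0, 2: d_0
        ranked = sorted(samples, key=lambda s: s[0][k])
        v_min = ranked[0][0][k]
        # Step 1: grid points within rel_tol of the minimum become candidates
        cands = [s for s in ranked if s[0][k] <= v_min + rel_tol * abs(v_min)][:max_cand]
        best = v_min
        for _, x0, A4 in cands:                      # Step 2: gradient-free refinement
            _, val = compass_search(lambda x: criteria(A4, x)[k], x0)
            best = min(best, val)
        worst.append(best)
    worst = np.array(worst)
    return worst, bool(np.all(worst > 0.0))          # Step 3: pass if all minima > 0
```

For an isotropic tangent with $\lambda = 1$ and $\mu = -0.1$, $\det\mathbf{A}(\mathbf{N})$ stays positive while $f_0$ is negative, which is why all three checks in Eqs. (22)-(24) are evaluated rather than the determinant alone.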
5.3 Convexity and growth conditions
In nonlinear elasticity in the finite-strain regime, convexity is not necessary and can be over-restrictive for physical phenomena that involve instability or buckling [14]. Nevertheless, the convexity condition is often invoked to ensure stable elastic responses under large deformation. The convexity condition can be stated as (cf. [10])
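$$
\psi(\mathbf{F}_0) - \psi(\mathbf{F}) - \mathbf{P}(\mathbf{F}):(\mathbf{F}_0 - \mathbf{F}) \geq 0, \quad \forall\, \mathbf{F}, \mathbf{F}_0. \tag{32}
$$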
Because convexity is not a requirement for realistic simulations (although it might be expected for HMX), we do not incorporate this criterion in the training of the neural network. However, the uniqueness and stability of the elasticity model are not only important for predicting realistic elastic responses but also crucial if the model is deployed as the underlying elasticity model for crystal plasticity and damage models.
Another important condition to prevent degenerate elastic behavior is due to Rosakis and Simpson [61], which requires
$$
\psi(\mathbf{F}) \to \infty \quad \text{as} \quad \det\mathbf{F} \to 0^+. \tag{33}
$$
Recall that $\det\mathbf{F} \to 0$ only happens if the distance between two material points that was nonzero in the reference configuration vanishes in the current configuration. Note that it is unlikely that a material would remain elastic under such an extremely large volumetric deformation. Furthermore, enforcing this constraint explicitly in the loss function is difficult because it involves an unbounded target. Nevertheless, the constraint may provide a helpful indicator of the admissibility of the machine learning extrapolated predictions. As a result, we suggest a post-training validation test in which we generate the response for deformation gradients with $\det\mathbf{F}$ approaching zero and observe whether the resultant energy increases monotonically.
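A minimal version of this post-training check, sketched in NumPy under the assumption that `psi(F)` is a callable for the learned energy (hypothetical name) and that a uniaxial compression path $\mathbf{F} = \mathrm{diag}(s, 1, 1)$ with $s \to 0^+$ is an adequate probe (one path among many; the choice is ours):

```python
import numpy as np

def growth_check(psi, stretches):
    """Evaluate psi along F = diag(s, 1, 1) as det F = s decreases towards 0+.

    Returns the energies (ordered from the largest to the smallest s) and a flag
    telling whether they increase monotonically along the path.
    """
    s_sorted = np.sort(np.asarray(stretches, dtype=float))[::-1]
    energies = np.array([psi(np.diag([s, 1.0, 1.0])) for s in s_sorted])
    return energies, bool(np.all(np.diff(energies) > 0.0))
```

With a compressible neo-Hookean energy the energies grow without bound along this path, whereas a St. Venant-Kirchhoff energy increases but saturates at a finite value, so monotonicity should be read together with the magnitude of the energy as $\det\mathbf{F}$ approaches zero.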
5.4 Material Anisotropy
A predictive elasticity model must preserve the overall crystal symmetry while capturing how the anisotropy of the elasticity tensor evolves under arbitrary deformation. The degree of anisotropy of the elastic response can be measured by various metrics available in the literature (cf. [62,63,64]). Many of these anisotropy metrics (or indices) are defined in terms of the components of the elasticity tensor, and the distinction between the secant and tangential elastic tensors is typically not taken into account. This can be confusing for materials undergoing finite deformation, where both material and geometrical nonlinearities play important roles in the degree of anisotropy of the constitutive response. More importantly, the impacts of these two types of nonlinearity should be properly distinguished so that a meaningful evaluation can be conducted.
5.4.1 Ledbetter and Migliori general anisotropy index
Here, we adopt the approach of Ledbetter and Migliori [65], where the ratio between the maximum and minimum shear-wave speeds is used to define a measure of the degree of anisotropy. Interestingly, this method can also be used to detect instability, as the vanishing of the slowest wave speed is accompanied by divergence of the Ledbetter-Migliori index.
This measure can easily be extended to the finite-strain regime by replacing the infinitesimal elasticity tangent with the elasticity tensor corresponding to the first Piola-Kirchhoff stress and the deformation gradient [10]. The idea can be summarized in the following steps.
1. Generate as many unit vectors $\mathbf{N}$ as possible.
2. Solve the Christoffel equation for each unit vector $\mathbf{N}$, that is,
$$
\det\left(\mathbf{N}\cdot\mathbb{C}(\mathbf{F})\cdot\mathbf{N} - \rho v^2\mathbf{I}\right) = 0. \tag{34}
$$
3. Pick the largest solution $v_2$ and the smallest solution $v_1$. Then, the anisotropy index is simply
$$
AI = v_2^2 / v_1^2. \tag{35}
$$
Here, instead of a Monte Carlo search, we can leverage the search formulated in Section 5.2 to obtain the smallest and largest eigenvalues of the acoustic tensor, $\rho v_1^2$ and $\rho v_2^2$. Again, the optimization uses a uniformly spaced point cloud to find the initial guess, after which a gradient-free optimizer finds the normal vectors that maximize and minimize $v$.
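The three steps can be sketched with a quasi-uniform Fibonacci point set on the sphere standing in for "as many unit vectors as possible" (our choice, not the search used in the paper). The sketch follows the steps literally and takes the extreme eigenvalues over all wave branches; restricting it to the two quasi-shear branches would recover the shear-wave form of the Ledbetter-Migliori index. The tangent array `A4` (assumed to have major symmetry, so the acoustic tensor is symmetric) and the density `rho` are assumed inputs:

```python
import numpy as np

def sphere_samples(n):
    """Quasi-uniform unit vectors on S^2 from a Fibonacci lattice."""
    k = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * k / n)             # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * k         # golden-angle azimuth
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

def anisotropy_index(A4, rho=1.0, n=2000):
    """Eqs. (34)-(35): rho v^2 are the eigenvalues of N.A.N; AI = v2^2 / v1^2."""
    rho_v2 = []
    for N in sphere_samples(n):
        acoustic = np.einsum("B,aBcD,D->ac", N, A4, N)
        rho_v2.extend(np.linalg.eigvalsh(acoustic))  # solutions of the Christoffel equation
    v1_sq, v2_sq = min(rho_v2) / rho, max(rho_v2) / rho
    if v1_sq <= 0.0:
        return np.inf                              # slowest wave speed vanished: index diverges
    return v2_sq / v1_sq
```

The early return reflects the remark above: when the slowest wave speed vanishes, the index diverges, which flags the same loss of stability that the strong ellipticity test searches for.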
6 Results
In this section, we discuss the performance of neural network models in discovering the hyperelastic energy functional from the β-HMX MD simulation data. We describe the training setup of the networks and compare the performance of the architectures. We then demonstrate the predictive capabilities of the models against the present MD simulation data and against elastic constants taken from the literature for the same MD force field used here. Finally, we investigate how well the energy functional models satisfy desired properties from the hyperelasticity literature.
6.1 Training performance and learning capacity
In this section, we discuss the performance of the neural network architectures for the Sobolev constraints described in Section 4. We first demonstrate how we trained the neural networks to learn a hyperelastic energy functional from the MD simulation data. We use two different architectures to discover the hyperelastic energy functional for β-HMX. The first architecture is based on the energy-conjugate pair $\mathbf{S}-\mathbf{E}$ (Model M1). Its input strain and stress labels are symmetric tensors and can thus each be described by six components. The second architecture is based on the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ (Model M2). In addition, we retrain Model M2 with an additional material frame indifference constraint (Eq. (11)) in the loss function (Model M3). As the difference between the predictions of Models M2 and M3 is minor, we did not enforce Eq. (11) explicitly in the last model we trained (Model M4). Instead, only monoclinic symmetry is enforced, as an additional term in the weighted loss function during the retraining step, to ensure that the material symmetry is preserved.
Table 2: Summary of the trained models.
Model | Description
M1 | Energy-conjugate pair $\mathbf{S}-\mathbf{E}$ model trained via the loss function described in Eq. (9).
M2 | Energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model trained via the loss function described in Eq. (10).
M3 | Energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model retrained from the pretrained model M2 with the additional loss function Eq. (11) to enforce material frame indifference.
M4 | Energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model retrained from the pretrained model M2 with the additional loss function Eq. (14) to enforce monoclinic symmetry.
The energy functional neural networks have a feed-forward architecture consisting of a hidden dense layer (100 neurons / ReLU), followed by two multiply layers (cf. Vlassis and Sun [25]), another hidden dense layer (100 neurons / ReLU), and finally an output dense layer (linear). The training and validation procedures of the neural networks are implemented in Python with the machine learning libraries Keras [66] and TensorFlow [67]. The kernel weight matrices of the layers were initialized with a Glorot uniform distribution and the bias vectors with zeros.
In total, 233,430 data points are generated from 15 MD simulations. As described in Section 2.3, this data set includes multiple loading scenarios, such as uniaxial compression, uniaxial tension, shear, and biaxial compression. This set of MD data is randomly partitioned into two mutually exclusive subsets: 70% of the data (163,400 data points) are used to train the neural network energy functional, while 30% (70,030 data points), which we refer to as unseen data herein, are used to cross-validate the results. All models were trained for 1000 epochs with a batch size of 512, using the Nadam optimizer [68] initialized with the default values of the Keras library.
[Figure 5 plots, loss vs. epoch on log-log axes: (a) stress training loss, (b) ψ0 training loss, (c) S0/P0 training loss; curves for Model M1 and Model M2 training and validation losses.]
Fig. 5: Comparison of the training loss curves for the energy-conjugate pair $\mathbf{S}-\mathbf{E}$ model (M1) and the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M2) for (a) the stress, (b) the energy, and (c) the stress value at the state of zero strain.
The training loss curves for architectures M1 and M2 are shown in Fig. 5. The two architectures appear to have similar accuracy, so they are used interchangeably below. The predictive capabilities of M1 and M2 are further demonstrated in Section 6.2.1.
[Figure 6 plots, loss vs. epoch on log-log axes: (a) frame invariance energy constraint training loss, (b) frame invariance stress constraint training loss; curves for Model M2 and Model M3 training and validation losses.]
Fig. 6: Comparison of the training loss curves for (a) the energy and (b) the stress frame invariance constraints for the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M2) without any additional constraints in the loss function and the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M3) trained with the additional frame invariance constraint loss function Eq. (11).
To check and, if necessary, enforce the frame invariance of the neural network hyperelastic models as described in Section 4.1.3, we conduct a transfer learning experiment by retraining the neural network model M2. We first train the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M2) for 1000 epochs without any frame invariance constraints in the loss function (i.e., with Eq. (10) only). We record frame invariance metrics during training by applying random rotation tensors $\mathbf{Q}$ to the input deformation gradient tensors and examining whether the material response is frame invariant, that is, whether the predicted energy remains the same before and after rotation and whether the predicted stress tensor rotates accordingly. The trained model M2 is then retrained with the additional frame invariance constraints in Eq. (11) for another 1000 epochs (Model M3). The training curves of M2 and M3 are compared in Fig. 6. Model M2 already appears to satisfy frame invariance well, and the additional constraints of Model M3 mostly improve the energy frame invariance metric.
[Figure 7 plots, loss vs. epoch on log-log axes: (a) symmetry energy constraint training loss, (b) symmetry stress constraint training loss; curves for Model M2 and Model M4 training and validation losses.]
Fig. 7: Comparison of the training loss curves for (a) the energy and (b) the stress symmetry constraints for the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M2) without any symmetry constraints in the loss function and the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M4) trained with the additional symmetry-constraint loss function Eq. (14).
We also perform a transfer learning experiment by retraining the neural network model M2 to ensure that it retains the observed β-HMX crystal symmetries, as described in Section 4.2. We first train the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M2) for 1000 epochs without any symmetry constraints in the loss function and record the symmetry metrics during training. By applying a rotation $\mathbf{Q}_{sym}$ to the input deformation gradient tensors, we check whether the material response retains the expected monoclinic symmetry. The check includes the constraints up to the first-order derivatives of the network. The trained model M2 is then retrained with the additional symmetry constraints in Eq. (14) for another 1000 epochs (Model M4). The results of the two training experiments are shown in Fig. 7, where the additional symmetry constraints appear to improve both the energy and the stress symmetry metrics.
Remark 1. Rescaling of the training data. As a preprocessing step, we normalize all data to avoid the vanishing or exploding gradient problems that may occur during backpropagation [69]. The sample $X_i$ of a measure $X$ is scaled to the unit interval via
$$
\bar{X}_i := \frac{X_i - X_{min}}{X_{max} - X_{min}}, \tag{36}
$$
where $\bar{X}_i$ is the normalized sample point, and $X_{min}$ and $X_{max}$ are the minimum and maximum values of the measure $X$ in the training data set, such that all types of data used in this paper (e.g., strain, stress) are normalized within the range [0, 1]. After scaling all measures involved in the training of the neural networks to the unit interval, we found that no further fine-tuning of the multi-objective weight parameters in the loss functions was necessary for convergence.
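A NumPy sketch of Eq. (36), assuming one $(X_{min}, X_{max})$ pair per measure computed on the training set. Because the Sobolev terms compare derivatives, the sketch also includes the chain-rule factor that maps a derivative of the scaled energy with respect to the scaled strain back to physical units; this factor follows directly from Eq. (36) and is our addition, not a step stated in the paper:

```python
import numpy as np

class UnitIntervalScaler:
    """Eq. (36): map samples of one measure X to [0, 1] using training-set bounds."""

    def __init__(self, X_train):
        X_train = np.asarray(X_train, dtype=float)
        self.x_min, self.x_max = X_train.min(), X_train.max()

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.x_min) / (self.x_max - self.x_min)

    def inverse(self, X_bar):
        return np.asarray(X_bar, dtype=float) * (self.x_max - self.x_min) + self.x_min

def rescale_derivative(dpsi_bar_dE_bar, psi_scaler, E_scaler):
    """Chain rule: dpsi/dE = dpsi_bar/dE_bar * (psi range) / (E range)."""
    psi_range = psi_scaler.x_max - psi_scaler.x_min
    E_range = E_scaler.x_max - E_scaler.x_min
    return dpsi_bar_dE_bar * psi_range / E_range
```

The offsets $X_{min}$ cancel in the derivative, so only the ranges enter; stresses predicted in scaled units therefore have to be multiplied by this factor before they are compared with unscaled MD data.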
6.2 Validation of the constitutive responses
In this section, we validate the neural network predicted constitutive responses against MD simulation data as well as β-HMX elastic coefficients from the literature. We also monitor the learned physical properties of the trained models, such as strong ellipticity, energy growth, and the anisotropy index.
6.2.1 Validation against unseen MD simulations
We validate the predictive performance of the learned models against unseen MD simulation loading paths. The neural network architectures considered in this section are the energy-conjugate pair $\mathbf{S}-\mathbf{E}$ model (M1) and the energy-conjugate pair $\mathbf{P}-\mathbf{F}$ model (M2).
[Figure 8 plots: stress components S11, S22, S33, S12, S23, S13 (∂ψ/∂Eij, GPa) vs. (a) E11, (b) E22, (c) E33.]
Fig. 8: Comparison of the predicted second Piola-Kirchhoff stress response against three uniaxial deformation MD simulations for the conjugate pair $\mathbf{S}-\mathbf{E}$ model (M1). (a) Uniaxial compressive and tensile deformation along the $x_1$ axis. (b) Uniaxial compressive and tensile deformation along the $x_2$ axis. (c) Uniaxial compressive and tensile deformation along the $x_3$ axis.
[Figure 9 plots: stress components S11, S22, S33, S12, S23, S13 (∂ψ/∂Eij, GPa) vs. (a) E12, (b) E23, (c) E13.]
Fig. 9: Comparison of the predicted second Piola-Kirchhoff stress response against three shear MD simulations for the conjugate pair $\mathbf{S}-\mathbf{E}$ model (M1). (a) Shear tests for positive and negative directions along the $\mathbf{e}_1\otimes\mathbf{e}_2$ direction. (b) Shear tests for positive and negative directions along the $\mathbf{e}_2\otimes\mathbf{e}_3$