
Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening


Computer Methods in Applied Mechanics and Engineering manuscript No.
(will be inserted by the editor)
Nikolaos N. Vlassis · WaiChing Sun
Received: December 22, 2020 / Accepted: date
Abstract We introduce a deep learning framework designed to train smoothed elastoplasticity models with interpretable components, such as the stored elastic energy function, the yield surface, and the plastic flow, that may evolve based on a set of deep neural network predictions. By recasting the yield function as an evolving level set, we introduce a deep learning approach to deduce the solutions of the Hamilton-Jacobi equation that governs the hardening/softening mechanism. This machine learning hardening law may recover any classical hand-crafted hardening rules and discover new mechanisms that are either unknown or difficult to express with mathematical expressions. Leveraging Sobolev training to gain control over the derivatives of the learned functions, the resultant machine learning elastoplasticity models are thermodynamically consistent and interpretable while exhibiting excellent learning capacity. Using a 3D FFT solver to create a polycrystal database, numerical experiments are conducted and the implementations of each component of the models are individually verified. Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as the recurrent neural network, the 1D convolutional neural network, and the multi-step feed-forward models.
Keywords Sobolev training; multiscale; polycrystals; isotropic yield function; recurrent neural network; physics-informed constraints
1 Introduction
Plastic deformation of materials is a history-dependent process manifested by irreversible and permanent changes of microstructures, such as dislocation, pore collapse, growth of defects, and phase transition. Macroscopic constitutive models designed to capture history-dependent constitutive responses can be categorized into multiple families, such as hypoplasticity, elastoplasticity, and generalized plasticity (Pastor et al., 1990; Zienkiewicz et al., 1999; Wang et al., 2019a, 2020). For example, hypoplasticity models often do not distinguish between the reversible and irreversible strain (Dafalias, 1986; Kolymbas, 1991; Wang et al., 2016). Unlike the classical elastoplasticity models, where the plastic flow direction is given by the stress gradient of the plastic potential and its evolution is governed by a set of hardening rules (Rice, 1971; Hill, 1998; Sun, 2013; Bryant and Sun, 2019), hypoplasticity models do not employ a yield function to characterize the initial yielding. Instead, the relationship between the strain rate and the stress rate is captured by a set of evolution laws that originate from a combination of phenomenological observations and physics constraints. Interestingly, early neural network models, such as those of Ghaboussi et al. (1991), Furukawa and Yagawa (1998), Pernot and Lamarque (1999), and Lefik et al. (2009), would often adopt this approach with a purely supervised learning strategy that adjusts the weights of neurons to minimize the errors. Using the strain from current and previous time steps to estimate the current stress, these models would essentially predict the stress rate without utilizing a yield function and, hence, can be viewed as hypoplasticity models with machine learning derived evolution laws. The major issues that limit the adoption of neural network constitutive models are the lack of interpretability and the vulnerability to over-fitting. While existing regularization techniques such as dropout layers (Wang and Sun, 2018), cross-validation (Heider et al., 2020; Vlassis et al., 2020), and/or increasing the size of the database could be helpful, it remains difficult to assess the credibility of these models without the interpretability of the underlying laws deduced from the neural network. Another approach could involve symbolic regression through reinforcement learning (Wang et al., 2019a) or genetic algorithms (Versino et al., 2017) that may lead to explicitly written evolution laws. However, the fitness of these equations often comes at the expense of readability.
Corresponding author: WaiChing Sun
Associate Professor, Department of Civil Engineering and Engineering Mechanics, Columbia University, 614 SW Mudd, Mail
Code: 4709, New York, NY 10027 Tel.: 212-854-3143, Fax: 212-854-6267, E-mail:
Another common approach to predict plastic deformation is the classical elasto-plasticity model, where an elasticity model is coupled with a yield function that evolves with a set of internal variables that represent the history of the material. Within this framework, where constitutive models are driven by the evolution of the yield surface and the underlying elastic model, a significant number of works have been dedicated to refining the initial shapes and forms of the yield functions in the stress space (e.g. Mises (1913); Prager (1955); William and Warnke (1974)) and the corresponding hardening laws that govern the evolution of these yield functions with the plastic strain (e.g. Drucker (1950); Borja and Amies (1994); Taiebat and Dafalias (2008); Nielsen and Tvergaard (2010); Foster et al. (2005); Sun et al. (2014)). Generalized plasticity (Pastor et al., 1990; Zienkiewicz et al., 1999; Wang et al., 2019a) bypasses the usage of either the stress gradient of the plastic potential or the yield surface to predict the plastic flow direction. Instead, an additional phenomenological relation is deduced from experiments to predict the plastic flow direction as a function of internal variables. In both cases, there are several key upshots brought by the existence of the yield function. For instance, the existence of a yield function facilitates the geometric interpretation of plasticity and, therefore, enables us to connect mechanics concepts, such as the thermodynamic laws, with geometric concepts, such as convexity in the principal stress space (Miehe et al., 2002; Borja, 2013; Vlassis et al., 2020). Furthermore, the existence of a distinctive elastic region in the stress or strain space also allows introducing a multi-step transfer learning strategy. In this case, the machine learning of the elastic responses can be viewed as a pre-training step for the plasticity machine learning, where the predicted elastic responses can be used to determine the underlying split of the elastic and plastic strain upon the initial yielding and, hence, allow one to deduce more accurate hardening and plastic flow rules.
1.1 Why Sobolev training for plasticity
Recently, there have been attempts to rectify the limitations of machine learning models that do not
distinguish or partition the elastic and plastic strain. Xu et al. (2020), for instance, introduce a smooth
transition function to create a finite transition zone between the elastic and plastic ranges for an incremental
constitutive law generated from supervised learning.
Previous works on machine learning plasticity are often black-box models where the strain history is used as input to predict the stress via a deep neural network trained with a purely supervised learning strategy (Lefik and Schrefler, 2003; Zhang et al., 2020; Bessa et al., 2017). Since the rationale behind the predictions is stored in the weights of the neurons, it is difficult to interpret. To circumvent this limited interpretability, previous works such as Mozaffar et al. (2019) and later Zhang and Mohr (2020) introduce machine learning techniques to deduce the yield function and subsequently deduce the optimal linear or distortion hardening mechanism that minimizes the $L_2$ norm of the yield function discrepancy. Wang et al. (2019a), on the other hand, view the possibility of building different plasticity models as a directed multi-graph and introduce a reinforcement learning algorithm to deduce the optimal configuration of plasticity models among all the available options (e.g. isotropic, kinematic, rotation, and frictional hardening, isotropic/anisotropic elasticity) to generate fully interpretable plasticity models. Nevertheless, these previous approaches are not capable of deducing new hardening/softening mechanisms previously unknown to modelers. This research has three goals: (1) creating interpretable machine learning elastoplasticity models, (2) formulating a new learning strategy that enables the discovery of new softening/hardening mechanisms, and (3) leveraging the interpretability of the models to enforce thermodynamic laws so that predictions are compatible with universal principles of mechanics. Sobolev training plays an important role in achieving these goals by ensuring the regularity of the energy functionals and level set functions. By adopting Lode's coordinate system to simplify the parametrization, we introduce a simple training program that not only generates accurate and robust stress predictions but also yields an elastic energy and an elasto-plastic tangent operator that are sufficiently smooth for numerical predictions. The lack of such smoothness has been one of the technical barriers preventing the adoption of neural network models since their inception in the 1990s (Hashash et al., 2004).
1.2 Organization of the content and notations
The organization of the rest of the paper is as follows. We first provide a detailed account of the different
designs of Sobolev higher-order training introduced to generate the elastic stored energy functional, yield
function, flow rules, and hardening models in their corresponding parametric space. The setup of con-
trol experiments with other common alternative black-box models is then described. We then demonstrate
how to leverage this new design of an interpretable machine learning framework to analyze the thermo-
dynamic behavior of the machine learning derived constitutive laws while illustrating the geometrical
interpretation of the proposed modeling framework. A brief highlight for the adopted return mapping al-
gorithm implementation that leverages automatic differentiation is provided in Section 3, followed by the
numerical experiments and the conclusions that outline this work’s major findings.
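As for notations and symbols in this current work, bold-faced letters denote tensors (including vectors, which are rank-one tensors); the symbol '$\cdot$' denotes a single contraction of adjacent indices of two tensors (e.g. $\boldsymbol{a} \cdot \boldsymbol{b} = a_i b_i$ or $\boldsymbol{c} \cdot \boldsymbol{d} = c_{ij} d_{jk}$); the symbol ':' denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g. $\boldsymbol{C} : \boldsymbol{\epsilon}^e = C_{ijkl} \epsilon^e_{kl}$); the symbol '$\otimes$' denotes a juxtaposition of two vectors (e.g. $\boldsymbol{a} \otimes \boldsymbol{b} = a_i b_j$) or two symmetric second-order tensors (e.g. $(\boldsymbol{\alpha} \otimes \boldsymbol{\beta})_{ijkl} = \alpha_{ij} \beta_{kl}$). Moreover, $(\boldsymbol{\alpha} \oplus \boldsymbol{\beta})_{ijkl} = \alpha_{jl} \beta_{ik}$ and $(\boldsymbol{\alpha} \ominus \boldsymbol{\beta})_{ijkl} = \alpha_{il} \beta_{jk}$. We also define identity tensors $(\boldsymbol{I})_{ij} = \delta_{ij}$, $(\boldsymbol{I}^4)_{ijkl} = \delta_{ik} \delta_{jl}$, and $(\boldsymbol{I}^4_{\text{sym}})_{ijkl} = \frac{1}{2}(\delta_{ik} \delta_{jl} + \delta_{il} \delta_{kj})$, where $\delta_{ij}$ is the Kronecker delta. As for sign conventions, unless specified otherwise, we consider the direction of the tensile stress and dilative pressure as positive. Unless otherwise specified, the strain measure listed in this paper is not expressed in percentage.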
2 Framework for Sobolev training of elastoplasticity models
This section presents the framework to train the multiple deep neural networks that predict the elastic
stored energy functional, the yield function, and the plastic flow that will constitute the machine learn-
ing elastoplasticity model. We first discuss the neural network architecture designed to enable the learned
functions with the necessary degree of continuity for functional predictions (Section 2.1). We then intro-
duce a number of loss functions designed to optimize hyperelastic responses of materials with different
material symmetries (Section 2.2). Finally, we elaborate on the theoretical basis that enables us to treat
yield function as a signed distance function level set, formulate a supervised learning task that deduces
the learned evolving yield function against a monotonically increasing accumulated plastic strain, and
discuss the simplified implementation allowed by the material symmetry. The method that leverages an
interpretable design to enforce and validate the thermodynamic constraints and the generation of the aux-
iliary data points of the signed distance function level set to enhance the robustness of the learned yield
functions are also discussed (Section 2.3).
2.1 Neural network design for Sobolev training of a smooth scalar functional with physical constraints
Here we provide a brief account of the design of a neural network suitable for Sobolev training of a scalar functional that requires a specific minimum degree of continuity. The design presented here facilitates the specific supervised learning tasks formulated for an elasticity energy functional, a yield function, and a plastic flow, which are documented in Sections 2.2 and 2.3. Our goal is to obtain a learned function belonging to a Sobolev space, i.e., a space of functions possessing sufficient derivatives for our modeling purposes. To facilitate the Sobolev training, we must ensure that (1) the space of the learned function is spanned by basis functions that possess a sufficient degree of continuity and (2) the loss function in the supervised learning is associated with the norm of the corresponding Sobolev space. To meet the first criterion, one must employ activation functions with sufficient differentiability. In principle, this can be easily achieved by picking activation functions of a sufficient degree of continuity (e.g. sigmoid and hyperbolic tangent activation functions). However, networks that stack layers with these activation functions often suffer from the vanishing and exploding gradient problems and are, hence, not suitable for our purpose (Wang et al., 2019b; Roodschild et al., 2020). Meanwhile, the Rectified Linear Unit (ReLU) activation function is usually deployed instead as the default choice for multilayer perceptrons and convolutional neural networks due to its generally good performance and faster learning capacity. A ReLU network, however, is only piecewise linear, so its derivatives are discontinuous. To attain the required degree of continuity while circumventing the vanishing gradient problem, we introduce a simple technique which we refer to as Multiply layers. A Multiply layer receives the output vector $\boldsymbol{h}^{n-1}$ of the preceding layer as input and simply outputs $\boldsymbol{h}^{n}$, such that

$$\boldsymbol{h}^{n} = \text{Multiply}(\boldsymbol{h}^{n-1}) = \boldsymbol{h}^{n-1} \circ \boldsymbol{h}^{n-1}, \tag{1}$$

where $\circ$ is the element-wise product of two vectors. These layers are placed in between two hidden dense layers of the network to modify the output of the preceding layer. This technique enables us to create neural networks that produce a learned function of an arbitrary degree of continuity without introducing any additional weights or handcrafting any custom activation functions.
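As a minimal sketch of Eq. (1), the following NumPy snippet stacks a Dense, a Multiply, and a Dense layer. The layer sizes, the random weights, and the use of ReLU in the first dense layer are assumptions made only for illustration; they are not taken from the paper. The last lines check numerically that squaring a ReLU output removes the jump in the first derivative at zero.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, W, b, activation=None):
    """Affine map followed by an optional activation."""
    z = x @ W + b
    return activation(z) if activation is not None else z

def multiply(h):
    """Multiply layer of Eq. (1): element-wise product of the input with itself."""
    return h * h

def dmd_forward(x, params):
    """Forward pass of a Dense -> Multiply -> Dense stack (hypothetical sizes)."""
    (W1, b1), (W2, b2) = params
    h = dense(x, W1, b1, activation=relu)  # piecewise linear, kink at z = 0
    h = multiply(h)                        # relu(z)**2 has a continuous first derivative
    return dense(h, W2, b2)                # linear read-out

rng = np.random.default_rng(0)
params = [(rng.normal(size=(3, 8)), rng.normal(size=8)),
          (rng.normal(size=(8, 1)), rng.normal(size=1))]
x = rng.normal(size=(5, 3))
y = dmd_forward(x, params)

# One-sided finite-difference slopes of relu(z)**2 at z = 0 (they agree, unlike plain ReLU)
step = 1e-6
slope_left = (multiply(relu(0.0)) - multiply(relu(-step))) / step
slope_right = (multiply(relu(step)) - multiply(relu(0.0))) / step
```

Stacking further Multiply layers raises the polynomial degree of each piece, and with it the order of continuity, without adding trainable parameters.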
The performance of these different neural networks that complete the higher-order Sobolev training is
demonstrated in the numerical experiments showcased in Section 5.1. Several variations of the standard
two-layer architecture we experimented with are shown in Fig. 1.
Fig. 1: Four neural network architectures (A: ddd, B: dmdd, C: dmdmd, D: dmmdmd) with different combinations of Dense and Multiply layers; the number of Multiply layers increases from left to right. The letters d and m represent the Dense and Multiply layers, respectively, that form an architecture (e.g. dmdd represents the stacked layer structure consisting of Dense, Multiply, Dense, and Dense layers).
Remark 1 The placement and the number of intermediate Multiply layers are hyperparameters that can be fine-tuned along with the rest of the hyperparameters of the neural network (e.g. dropout rate, number of neurons per layer, number of layers). The tuning of these hyperparameters can be performed manually or through automatic hyperparameter tuning algorithms (cf. Bergstra et al. (2015); Komer et al. (2014)).
2.2 Sobolev training of a hyperelastic energy functional
The first component of the elastoplastic framework we train is the elastic stored energy $\psi^e$. The elasticity energy functional is not only useful for predicting the stress from the elastic strain, but can also be used to re-interpret the experimental data to identify the accumulated plastic strain and the plastic flow directions that are crucial for the training of the yield function and the plastic flow. In the infinitesimal strain regime, the hyperelastic energy functional $\psi^e(\boldsymbol{\epsilon}^e) \in \mathbb{R}^+$ can be defined as a non-negative valued function of the elastic infinitesimal strain, of which the first derivative is the Cauchy stress tensor $\boldsymbol{\sigma} \in \mathbb{S}$ and the Hessian is the tangential elasticity tensor $\boldsymbol{c}^e \in \mathbb{M}$:

$$\boldsymbol{\sigma} = \frac{\partial \psi^e}{\partial \boldsymbol{\epsilon}^e}, \qquad \boldsymbol{c}^e = \frac{\partial \boldsymbol{\sigma}}{\partial \boldsymbol{\epsilon}^e} = \frac{\partial^2 \psi^e}{\partial \boldsymbol{\epsilon}^e \otimes \partial \boldsymbol{\epsilon}^e}, \tag{2}$$

where $\mathbb{S}$ is the space of the second-order symmetric tensors and $\mathbb{M}$ is the space of the fourth-order tensors that possess major and minor symmetries (Heider et al., 2020). The true hyperelastic energy functional $\psi^e$ of the material is approximated by the neural network learned function $\hat{\psi}^e(\boldsymbol{\epsilon}^e | \boldsymbol{W}, \boldsymbol{b})$ with the elastic strain tensor $\boldsymbol{\epsilon}^e$ as the input, parametrized by weights $\boldsymbol{W}$ and biases $\boldsymbol{b}$ obtained from a supervised learning procedure.
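To guarantee the quality and even the existence of the stress and elastic tangent operators stemming from the learned energy function, we adopt a Sobolev training framework, introduced by Czarnecki et al. (2017), and extend it to higher-order derivative constraints. By leveraging the differentiability achieved by the Multiply layers and adopting an $H^2$ norm as the training objective, we introduce an alternative that renders the neural network model applicable for implicit solvers while eliminating the potential spurious oscillations of the tangent operators. The new training objective for the hyperelastic energy functional approximator $\hat{\psi}^e$ includes constraints for the predicted energy, stress, and stiffness values. This training objective, modeled after an $H^2$ norm, for the training samples $i \in [1, ..., N]$ would have the following form:

$$\boldsymbol{W}', \boldsymbol{b}' = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_1 \left\| \psi^e_i - \hat{\psi}^e_i \right\|_2^2 + \gamma_2 \left\| \frac{\partial \psi^e_i}{\partial \boldsymbol{\epsilon}^e_i} - \frac{\partial \hat{\psi}^e_i}{\partial \boldsymbol{\epsilon}^e_i} \right\|_2^2 + \gamma_3 \left\| \frac{\partial^2 \psi^e_i}{\partial \boldsymbol{\epsilon}^e_i \otimes \partial \boldsymbol{\epsilon}^e_i} - \frac{\partial^2 \hat{\psi}^e_i}{\partial \boldsymbol{\epsilon}^e_i \otimes \partial \boldsymbol{\epsilon}^e_i} \right\|_2^2 \right) \right). \tag{3}$$

In the above formulation, the conventional $L_2$ norm-based training objective can be obtained by setting the parameters $\gamma_2 = \gamma_3 = 0$. An $H^1$ norm-based training objective can be recovered from Eq. (3) by setting $\gamma_3 = 0$. An $H^1$ norm loss function for a hyperelastic energy functional can be seen in Vlassis et al. (2020).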
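The structure of such an $H^2$-type objective can be sketched in a few lines of NumPy. The function name `sobolev_h2_loss`, the array layout, and the linear elastic toy data (Lamé constants `lam`, `mu`) are hypothetical choices for illustration; the snippet only evaluates the weighted misfit for given predictions and does not perform any training.

```python
import numpy as np

def sobolev_h2_loss(psi, psi_hat, sig, sig_hat, c, c_hat, gammas=(1.0, 1.0, 1.0)):
    """Mean over N samples of weighted squared errors in energy, stress, and stiffness.

    psi: (N,), sig: (N, 3, 3), c: (N, 3, 3, 3, 3); *_hat are the predictions.
    """
    g1, g2, g3 = gammas
    n = psi.shape[0]
    e0 = (psi - psi_hat) ** 2
    e1 = np.sum((sig - sig_hat).reshape(n, -1) ** 2, axis=1)
    e2 = np.sum((c - c_hat).reshape(n, -1) ** 2, axis=1)
    return float(np.mean(g1 * e0 + g2 * e1 + g3 * e2))

# Toy data: isotropic linear elasticity with hypothetical Lame constants
I = np.eye(3)
lam, mu = 1.0, 0.5
C = lam * np.einsum("ij,kl->ijkl", I, I) + mu * (
    np.einsum("ik,jl->ijkl", I, I) + np.einsum("il,jk->ijkl", I, I))
rng = np.random.default_rng(0)
eps = rng.normal(scale=1e-2, size=(4, 3, 3))
eps = 0.5 * (eps + eps.transpose(0, 2, 1))      # symmetric strain samples
sig = np.einsum("ijkl,nkl->nij", C, eps)        # stress = C : eps
psi = 0.5 * np.einsum("nij,nij->n", sig, eps)   # energy = 1/2 eps : C : eps
c = np.broadcast_to(C, (4, 3, 3, 3, 3))

loss_exact = sobolev_h2_loss(psi, psi, sig, sig, c, c)
# gammas = (1, 0, 0) switches off the derivative terms (L2-type objective)
loss_l2 = sobolev_h2_loss(psi, 1.1 * psi, sig, sig, c, c, gammas=(1.0, 0.0, 0.0))
```

In an actual training loop the predicted stress and stiffness would come from differentiating the network output with respect to its input, for instance by automatic differentiation.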
In this work, we generate elasto-plastic models that are practical for implicit solvers. This, however,
was considered a difficult task in the earlier attempts on using a neural network as a replacement for
constitutive laws (cf. Hashash et al. (2004)) where the proposed solution would be to either bypass the
calculation of a tangent with an explicit time integrator or to introduce finite differences on the stress
2.2.1 Simplified training for isotropic elasticity
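In this work, our primary focus is on small strain isotropic hyperelasticity, which can be completely described in spectral form by the principal strain and stress values (without the principal directions). Thus, for isotropic infinitesimal hyperelasticity, the $H^2$ training objective of Eq. (3) for the training samples $i \in [1, ..., N]$ can be rewritten in terms of principal values as:

$$\boldsymbol{W}', \boldsymbol{b}' = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_1 \left\| \psi^e_i - \hat{\psi}^e_i \right\|_2^2 + \sum_{A=1}^{3} \gamma_2 \left\| \frac{\partial \psi^e_i}{\partial \epsilon^e_{A,i}} - \frac{\partial \hat{\psi}^e_i}{\partial \epsilon^e_{A,i}} \right\|_2^2 + \sum_{A=1}^{3} \sum_{B=1}^{3} \gamma_3 \left\| \frac{\partial^2 \psi^e_i}{\partial \epsilon^e_{A,i} \partial \epsilon^e_{B,i}} - \frac{\partial^2 \hat{\psi}^e_i}{\partial \epsilon^e_{A,i} \partial \epsilon^e_{B,i}} \right\|_2^2 \right) \right), \tag{4}$$

where $\epsilon^e_A$ for $A = 1, 2, 3$ are the principal values of the elastic strain tensor $\boldsymbol{\epsilon}^e$. The approximated energy functional for this training objective is a function of the three input principal strains and not of a full second-order tensor of 6 input components (reduced from 9 by assuming symmetry). This effectively reduces the input parametric space of the learned function and facilitates learning by reducing complexity.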
2.2.2 Simplified training for two-invariant elasticity
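The training objective can be further simplified to two input variables by adopting an invariant space, commonly used in geotechnical studies when the intermediate principal stress does not exhibit a dominating effect on the elastic response or when the intermediate principal stress is not measured at all due to limitations of the experimental apparatus (Wawersik et al., 1997; Haimson and Rudnicki, 2010). In this case, a small-strain isotropic hyperelastic law can equivalently be described with two strain invariants (volumetric strain $\epsilon^e_v$, deviatoric strain $\epsilon^e_s$). The strain invariants are defined as:

$$\epsilon^e_v = \operatorname{tr}(\boldsymbol{\epsilon}^e), \qquad \epsilon^e_s = \sqrt{\frac{2}{3}} \left\| \boldsymbol{e}^e \right\|, \qquad \boldsymbol{e}^e = \boldsymbol{\epsilon}^e - \frac{1}{3} \epsilon^e_v \boldsymbol{1}, \tag{5}$$

where $\boldsymbol{\epsilon}^e$ is the small strain tensor and $\boldsymbol{e}^e$ the deviatoric part of the small strain tensor. Using the chain rule, the Cauchy stress tensor can be described in the invariant space as follows:

$$\boldsymbol{\sigma} = \frac{\partial \psi^e}{\partial \epsilon^e_v} \frac{\partial \epsilon^e_v}{\partial \boldsymbol{\epsilon}^e} + \frac{\partial \psi^e}{\partial \epsilon^e_s} \frac{\partial \epsilon^e_s}{\partial \boldsymbol{\epsilon}^e}. \tag{6}$$

In the above, the mean pressure $p$ and deviatoric (von Mises) stress $q$ can be defined as:

$$p = \frac{\partial \psi^e}{\partial \epsilon^e_v} \equiv \frac{1}{3} \operatorname{tr}(\boldsymbol{\sigma}), \qquad q = \frac{\partial \psi^e}{\partial \epsilon^e_s} \equiv \sqrt{\frac{3}{2}} \left\| \boldsymbol{s} \right\|, \tag{7}$$

where $\boldsymbol{s}$ is the deviatoric part of the Cauchy stress tensor. Thus, the Cauchy stress tensor can be expressed by the stress invariants as:

$$\boldsymbol{\sigma} = p \boldsymbol{1} + \sqrt{\frac{2}{3}}\, q\, \hat{\boldsymbol{n}}, \tag{8}$$

where

$$\hat{\boldsymbol{n}} = \frac{\boldsymbol{e}^e}{\left\| \boldsymbol{e}^e \right\|} = \sqrt{\frac{2}{3}} \frac{\boldsymbol{e}^e}{\epsilon^e_s}. \tag{9}$$

The $H^2$ training objective of Eq. (4) for the training samples $i \in [1, ..., N]$ can now be rewritten in terms of the two strain invariants:

$$\boldsymbol{W}', \boldsymbol{b}' = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_1 \left\| \psi^e_i - \hat{\psi}^e_i \right\|_2^2 + \gamma_2 \left( \left\| p_i - \hat{p}_i \right\|_2^2 + \left\| q_i - \hat{q}_i \right\|_2^2 \right) + \gamma_3 \left( \left\| D^e_{11,i} - \hat{D}^e_{11,i} \right\|_2^2 + \left\| D^e_{22,i} - \hat{D}^e_{22,i} \right\|_2^2 + \left\| D^e_{12,i} - \hat{D}^e_{12,i} \right\|_2^2 \right) \right) \right), \tag{10}$$

where

$$D^e_{11} = \frac{\partial^2 \psi^e}{\partial (\epsilon^e_v)^2}, \qquad D^e_{22} = \frac{\partial^2 \psi^e}{\partial (\epsilon^e_s)^2}, \qquad \text{and} \qquad D^e_{12} = D^e_{21} = \frac{\partial^2 \psi^e}{\partial \epsilon^e_v \partial \epsilon^e_s}. \tag{11}$$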
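The invariant relations of Eqs. (5) to (9) can be checked numerically with a short NumPy sketch. The linear isotropic law used to generate the stress (bulk modulus `K`, shear modulus `G`) and the helper names are assumptions introduced only for this check; for that law the energy is $\psi^e = \frac{1}{2} K (\epsilon^e_v)^2 + \frac{3}{2} G (\epsilon^e_s)^2$, so that $p = K \epsilon^e_v$ and $q = 3 G \epsilon^e_s$.

```python
import numpy as np

I = np.eye(3)

def strain_invariants(eps):
    """Volumetric and deviatoric strain invariants, Eq. (5)."""
    eps_v = np.trace(eps)
    e = eps - eps_v / 3.0 * I                        # deviatoric strain
    eps_s = np.sqrt(2.0 / 3.0) * np.linalg.norm(e)   # Frobenius norm
    return eps_v, eps_s, e

def stress_invariants(sig):
    """Mean pressure and von Mises stress, right-hand identities of Eq. (7)."""
    p = np.trace(sig) / 3.0
    s = sig - p * I                                  # deviatoric stress
    q = np.sqrt(1.5) * np.linalg.norm(s)
    return p, q, s

# Hypothetical linear isotropic material used only to generate a stress state
K, G = 10.0, 4.0
eps = np.array([[0.010, 0.002, 0.000],
                [0.002, -0.004, 0.001],
                [0.000, 0.001, 0.003]])
eps_v, eps_s, e = strain_invariants(eps)
sig = K * eps_v * I + 2.0 * G * e

p, q, s = stress_invariants(sig)
n_hat = e / np.linalg.norm(e)                        # Eq. (9)
sig_rebuilt = p * I + np.sqrt(2.0 / 3.0) * q * n_hat # Eq. (8)
```

For this toy material the stress rebuilt from the two invariants and the direction $\hat{\boldsymbol{n}}$ coincides with the original tensor, which is what allows the network to work with two scalar inputs instead of six tensor components.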
To aid the implementation by a third party, a pseudocode for the supervised learning of the isotropic
hyperelastic energy functional is provided (see Algorithm 1).
Remark 2 Rescaling of the training data. In every loss function in this work, we have introduced scaling coefficients $\gamma_\alpha$ to remind the readers that it is possible to change the weighting to adjust the relative importance of different terms in the loss function. These scaling coefficients may also be viewed as the weights in a multi-objective optimization problem. In practice, we have normalized all data to avoid the vanishing or exploding gradient problem that may occur during the back-propagation process (Bishop et al., 1995). As such, normalization is performed before the training as a pre-processing step. The sample $X_i$ of a measure $X$ is scaled to a unit interval via

$$\bar{X}_i := \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}, \tag{12}$$

where $\bar{X}_i$ is the normalized sample point, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the measure $X$ in the training data set, such that all different types of data used in this paper (e.g. energy, stress, stress gradient, stiffness) are normalized within the range [0, 1].
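A minimal sketch of the rescaling in Eq. (12) is given below. The helper names `fit_minmax`, `scale`, and `unscale` and the sample values are hypothetical; each column of `X` stands for one measure, and the minimum and maximum are taken over the training set only.

```python
import numpy as np

def fit_minmax(X):
    """Column-wise minimum and maximum of the training data."""
    return X.min(axis=0), X.max(axis=0)

def scale(X, x_min, x_max):
    """Eq. (12): map each measure to the unit interval."""
    return (X - x_min) / (x_max - x_min)

def unscale(X_bar, x_min, x_max):
    """Inverse map, used to recover physical units from network outputs."""
    return X_bar * (x_max - x_min) + x_min

# Hypothetical samples: columns could be, e.g., energy and mean pressure
X = np.array([[0.0, -5.0],
              [2.0, 10.0],
              [4.0, 25.0]])
x_min, x_max = fit_minmax(X)
X_bar = scale(X, x_min, x_max)
X_back = unscale(X_bar, x_min, x_max)
```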
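Algorithm 1 $H^2$ training of a neural network hyperelastic energy functional for isotropic material.
Require: Data set of $N$ samples: strain measures $\boldsymbol{\epsilon}^e$, energy measures $\psi^e$, stress measures $\boldsymbol{\sigma}$, and stiffness measures $\boldsymbol{c}^e$.
1. Project data samples onto invariant space
   Initialize empty set of training samples $(\epsilon^e_{v,i}, \epsilon^e_{s,i}, \psi^e_i, p_i, q_i, D^e_{11,i}, D^e_{22,i}, D^e_{12,i})$ for $i$ in $[0, ..., N]$.
   for $i$ in $[0, ..., N]$ do
      Compute $(\epsilon^e_{v,i}, \epsilon^e_{s,i}, p_i, q_i, D^e_{11,i}, D^e_{22,i}, D^e_{12,i})$, given $(\boldsymbol{\epsilon}^e_i, \boldsymbol{\sigma}_i, \boldsymbol{c}^e_i)$ (see Sec. 2.2.2).
      Rescale $(\epsilon^e_{v,i}, \epsilon^e_{s,i}, \psi^e_i, p_i, q_i, D^e_{11,i}, D^e_{22,i}, D^e_{12,i})$ into $(\bar{\epsilon}^e_{v,i}, \bar{\epsilon}^e_{s,i}, \bar{\psi}^e_i, \bar{p}_i, \bar{q}_i, \bar{D}^e_{11,i}, \bar{D}^e_{22,i}, \bar{D}^e_{12,i})$ via Eq. (12).
   end for
2. Train neural network $\hat{\psi}^e(\bar{\epsilon}^e_v, \bar{\epsilon}^e_s)$ with loss function Eq. (10).
3. Output trained energy functional $\hat{\psi}^e$ neural network and exit.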
2.3 Training of an evolving yield function as a level set
This section introduces the theoretical framework that regards the evolution of a yield surface as a level set evolution problem. To facilitate a geometrical interpretation without the burden of generating a large database, we restrict our attention to constructing a yield function that remains pressure-insensitive but may otherwise evolve in any arbitrary way on the $\pi$-plane, including moving, expanding, contracting, and deforming the elastic region. The goal of the supervised learning task is to determine the optimal way the yield function should evolve such that it is consistent with the observed experimental data collected after the plastic yielding and obeys the thermodynamic constraints that can be interpreted geometrically in the stress space.
2.3.1 Reducing the dimension of the data by leveraging symmetries
Here, we provide a brief review of the geometrical interpretation of the stress space and how it can be used to reduce the dimensions of the data representation and reduce the difficulty of the machine learning tasks. In this work, we consider a convex elastic domain $\mathbb{E}$ defined by a yield surface $f$. This yield function is a function of the Cauchy stress $\boldsymbol{\sigma}$ and the internal variable $\xi$ that represents the history-dependent behavior of the material, i.e., (cf. Borja (2013)),

$$f(\boldsymbol{\sigma}, \xi): \mathbb{S} \times \mathbb{R}^1 \to \mathbb{R}, \qquad \xi = \int_0^t \dot{\lambda} \, dt, \tag{13}$$

where $\xi$ is a monotonically increasing function of time and $\dot{\lambda}$ is the rate of change of the plastic multiplier, with $\dot{\boldsymbol{\epsilon}}^p = \dot{\lambda} \, \partial g / \partial \boldsymbol{\sigma}$ and $g$ the plastic potential. The yield function returns a negative value in the elastic region and equals zero when the material is yielding. The stress on the boundary $f(\boldsymbol{\sigma}, \xi) = 0$ is, therefore, the yielding stress, and all admissible stresses belong to the closure of the elastic domain, i.e.,

$$\mathbb{E} := \{ (\boldsymbol{\sigma}, \xi) \in \mathbb{S} \times \mathbb{R}^1 \mid f(\boldsymbol{\sigma}, \xi) \le 0 \}. \tag{14}$$

First, we assume that the yield function depends only on the principal stresses. This treatment reduces the dimension of the stress representation from six to three. Then, we assume that the plastic yielding is not sensitive to the mean pressure. As such, the shape of the yield surface in the principal stress space can be sufficiently described by a projection on the $\pi$-plane, which further reduces the dimensions of the independent stress input from three to two. To further simplify the interpolation of the yield surface, we introduce a polar coordinate system on the $\pi$-plane such that different monotonic stress paths commonly obtained from triaxial tests can be easily described via the Lode's angle.
Recall that the $\pi$-plane refers to a projection of the principal stress space based on the space diagonal defined by $\sigma_1 = \sigma_2 = \sigma_3$. More specifically, the $\pi$-plane is defined by the equation:

$$\sigma_1 + \sigma_2 + \sigma_3 = 0. \tag{15}$$
Fig. 2: The principal stress space (LEFT) and the corresponding $\pi$-plane (RIGHT). The $\pi$-plane is perpendicular to the space diagonal and passes through the origin of the principal stress axes. Figure reproduced from Borja (2013).
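The transformation from the original stress space coordinate system $(\sigma_1, \sigma_2, \sigma_3)$ to the $\pi$-plane can be decomposed into two specific rotations $\boldsymbol{R}'$ and $\boldsymbol{R}''$ of the coordinate system (cf. Borja (2013)) such that

$$\begin{Bmatrix} \sigma'_1 \\ \sigma'_2 \\ \sigma'_3 \end{Bmatrix} = \boldsymbol{R}'' \boldsymbol{R}' \begin{Bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{Bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2/3} & -1/\sqrt{3} \\ 0 & 1/\sqrt{3} & \sqrt{2/3} \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 & 0 & -\sqrt{2}/2 \\ 0 & 1 & 0 \\ \sqrt{2}/2 & 0 & \sqrt{2}/2 \end{bmatrix} \begin{Bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{Bmatrix}. \tag{16}$$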
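For pressure-insensitive plasticity, $\sigma'_3$ is not needed, as the principal stress differences are a function of $\sigma'_1$ and $\sigma'_2$ and are independent of $\sigma'_3$. We opt to describe the stress states of the material on the $\pi$-plane using two stress invariants, the polar radius $\rho$ and the Lode's angle $\theta$ (Lode, 1926). These invariants are derived by solving the characteristic equation of the deviatoric component $\boldsymbol{s} \in \mathbb{S}$ of the Cauchy stress tensor, following Borja (2013):

$$s^3 - J_2 s - J_3 = 0, \tag{17}$$

where $s$ is a principal value of $\boldsymbol{s}$, and

$$J_2 = \frac{1}{2} \operatorname{tr}(\boldsymbol{s}^2), \qquad J_3 = \frac{1}{3} \operatorname{tr}(\boldsymbol{s}^3), \tag{18}$$

are respectively the second and third invariants of the tensor $\boldsymbol{s}$. Utilizing the identity:

$$\cos^3 \theta - \frac{3}{4} \cos \theta - \frac{1}{4} \cos 3\theta = 0, \tag{19}$$

and writing $s$ in polar coordinates such that:

$$s = \rho \cos \theta, \tag{20}$$

and substituting in (17), the polar radius and the Lode's angle can be retrieved as:

$$\rho = 2 \sqrt{J_2 / 3}, \qquad \text{and} \qquad \cos 3\theta = \frac{3 \sqrt{3}\, J_3}{2 J_2^{3/2}}. \tag{21}$$

In terms of the $\pi$-plane coordinates $\sigma'_1$ and $\sigma'_2$, the Lode's coordinates $\rho$ and $\theta$ can be respectively written as:

$$\rho = \sqrt{\sigma'^2_1 + \sigma'^2_2}, \qquad \text{and} \qquad \tan \theta = \frac{\sigma'_2}{\sigma'_1}. \tag{22}$$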
Thus, for an isotropic pressure-independent plasticity model, the yield surface can equivalently be described by an approximator using either the principal stresses $\sigma_1$, $\sigma_2$, and $\sigma_3$ or the stress invariants $\rho$ and $\theta$ such that:

$$f(\sigma_1, \sigma_2, \sigma_3, \xi) = \hat{f}(\rho, \theta, \xi) = 0. \tag{23}$$
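The change of coordinates behind Eqs. (16) to (21) can be illustrated with a short NumPy sketch. The signs and multiplication order of the two rotation matrices below are an assumption, chosen so that the third rotated axis coincides with the space diagonal; the function name `lode_coordinates` and the sample stress state are likewise hypothetical. The polar radius and Lode's angle follow Eq. (21), and the three principal deviatoric stresses are recovered as $\rho \cos(\theta - 2\pi k / 3)$ for $k = 0, 1, 2$.

```python
import numpy as np

# Two rotations (assumed signs): the first about the sigma_2 axis, the second about the new first axis
R1 = np.array([[np.sqrt(2) / 2, 0.0, -np.sqrt(2) / 2],
               [0.0, 1.0, 0.0],
               [np.sqrt(2) / 2, 0.0, np.sqrt(2) / 2]])
R2 = np.array([[1.0, 0.0, 0.0],
               [0.0, np.sqrt(2 / 3), -1 / np.sqrt(3)],
               [0.0, 1 / np.sqrt(3), np.sqrt(2 / 3)]])
R = R2 @ R1  # maps the space diagonal onto the third rotated axis

def lode_coordinates(sig_principal):
    """Polar radius and Lode's angle of Eq. (21) from the three principal stresses."""
    s = sig_principal - sig_principal.mean()   # principal deviatoric stresses
    J2 = 0.5 * np.sum(s ** 2)
    J3 = np.sum(s ** 3) / 3.0
    rho = 2.0 * np.sqrt(J2 / 3.0)
    cos3t = 3.0 * np.sqrt(3.0) * J3 / (2.0 * J2 ** 1.5)
    theta = np.arccos(np.clip(cos3t, -1.0, 1.0)) / 3.0
    return rho, theta

sig = np.array([3.0, 1.0, -1.0])               # hypothetical principal stresses
rho, theta = lode_coordinates(sig)

# Roots of the characteristic equation (17) rebuilt from (rho, theta)
k = np.arange(3)
s_rebuilt = rho * np.cos(theta - 2.0 * np.pi * k / 3.0)

# Adding a hydrostatic pressure does not change the first two rotated coordinates
plane_a = (R @ sig)[:2]
plane_b = (R @ (sig + 5.0))[:2]
```

Because the first two rotated coordinates are unaffected by a pressure shift, two inputs are enough for a pressure-insensitive yield function approximator.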
Fig. 3: The procedure to generate the polycrystal plasticity dataset on the $\pi$-plane. Three simulations are first performed to locate the yielding points along the principal axes (LEFT). An initial trial elastic convex region is a triangle formed by connecting these three points (MIDDLE). The rest of the simulations are then performed outside of this triangle to refine the shape of the yield surface. The $\pi$-plane is split into slices of equal arc length and data points are sampled radially (RIGHT). The initial yielding point for each loading path is located once yielding is detected in the DNS.
In this case, both the isotropy of the yield function and the symmetry along the hydrostatic axis significantly ease the training process. First, the material symmetry reduces the dimensionality of the data and, hence, reduces the demand for data points (Heider et al., 2020; Wang and Sun, 2018). Second, the geometrical interpretation of the yield function on the $\pi$-plane provides clear guidance for more effective data exploration. In our numerical examples, the material database consists of recorded constitutive responses obtained from direct numerical simulations of polycrystal assemblies. The $\pi$-plane is partitioned by Lode's angles, where each partitioned angle is assigned a stress path that moves in the radial direction on the $\pi$-plane. By assuming the convexity of the yield surface, we can identify an initial path-independent elastic region by locating the initial yielding point on each of the prescribed stress paths (see Fig. 3).
2.3.2 Data preparation for training of the yield function as a level set
Identifying the set of stresses at which the initial plastic yielding occurs is a necessary but not sufficient condition to generate a yield surface. In fact, a yield function $f(\boldsymbol{\sigma}, \xi)$ must be well-defined not just at $f = 0$ but also anywhere in the product space $\mathbb{S} \times \mathbb{R}^1$. Another key observation is that, in order for the yield function to work properly, the value of $f(\boldsymbol{\sigma}, \xi)$ inside and outside the yield surface may vary, provided that the orientation of the stress gradient remains consistent. For instance, consider two classical $J_2$ yield functions:

$$f_1(\boldsymbol{\sigma}, \xi) = \sqrt{2 J_2} - \kappa \le 0; \qquad f_2(\boldsymbol{\sigma}, \xi) = \sqrt{J_2} - \kappa / \sqrt{2} \le 0. \tag{24}$$

These two models will yield identical constitutive responses except that, in each incremental step, the plastic multiplier deduced from $f_1$ is $\sqrt{2}$ times smaller than that of $f_2$, as the stress gradient of $f_1$ is $\sqrt{2}$ times larger than that of $f_2$. With these observations in mind, we introduce a level set approach where the yield surface is postulated to be a signed distance function level set and the evolution of the yield function is governed by a Hamilton-Jacobi equation that is not solved but generated from supervised learning via the following steps.
1. Generate auxiliary data points to train the yield function as a signed distance function. In this first step,
we construct a signed distance function $\phi$ in the stress space where the internal variable
$\xi$ is fixed at a given value, i.e., $\xi = \bar{\xi}$ when yielding. Let $\Omega$ be the solution domain of the stress space
on which the signed distance function is defined. Assume that the yield function can be sufficiently
described in the π-plane. For simplicity, we adopt the polar coordinate system to parametrize the
signed distance function $\phi$ that is used to train the yield surface, i.e.,
$$x(\sigma_{11}, \sigma_{22}, \sigma_{33}, \sigma_{12}, \sigma_{23}, \sigma_{13}) = x(\sigma_1, \sigma_2, \sigma_3) = \hat{x}(\rho, \theta). \tag{25}$$
Fig. 4: Illustration of the relationship between the boundary of $E$ defined at $f_\Gamma$ (yield surface) (LEFT) and
the level set yield function defined everywhere in the stress space (RIGHT).
The signed distance function (see, for instance, Figure 4) is defined as
$$\phi(\hat{x}) = \begin{cases} d(\hat{x}) & \text{outside } f_\Gamma \text{ (inadmissible stress)} \\ 0 & \text{on } f_\Gamma \text{ (yielding)} \\ -d(\hat{x}) & \text{inside } f_\Gamma \text{ (elastic region)} \end{cases}, \tag{26}$$
where $d(\hat{x})$ is the minimum Euclidean distance between a point $\hat{x}$ of $\Omega$ and the interface
$f_\Gamma = \{\hat{x} \in \mathbb{R}^2 \mid f(\hat{x}) = 0\}$, defined as
$$d(\hat{x}) = \min\left(|\hat{x} - \hat{x}_\Gamma|\right), \tag{27}$$
where $\hat{x}_\Gamma$ is the yielding stress for a given $\xi$. The signed distance function is obtained by solving the
Eikonal equation $|\nabla^{\hat{x}} \phi| = 1$ while prescribing the signed distance function as 0 at $\hat{x} \in f_\Gamma$. In the polar
coordinate system, the Eikonal equation reads
$$\left(\frac{\partial \phi}{\partial \rho}\right)^2 + \frac{1}{\rho^2}\left(\frac{\partial \phi}{\partial \theta}\right)^2 = 1. \tag{28}$$
Note that there is a singularity at the origin of the polar coordinates of the π-plane, $\rho = 0$, and, hence, the origin is
not used as an auxiliary point to train the yield function. The Eikonal equation can also be solved
by a fast marching solver in 2D polar coordinates. It is noted that the selection of the signed
distance function as the level set representation is not limiting. Other level set functions
are expected to fulfill the same purpose. However, the signed distance function was chosen because it is
simple to implement.
(a) J2 yield function (b) Tresca yield function with back stress
(c) Argyris et al. (1974) yield function (d) Custom yield function inferred from polycrystal
Fig. 5: The inferred yield surface (red curves) and the auxiliary data points (blue dots) obtained by solving
the level set re-initialization problem that generates the signed distance function (blue contours) for four
yield surfaces. The yield function $f(x, t)$ is converted into a signed distance function with the contour
$f(x, t) = 0$ fixed. The isocontour curves represent the projection of the signed distance function level set
on the π-plane.
Figure 5 shows several examples of signed distance functions converted for classical yield surfaces or
deduced from direct numerical simulations.
2. Obtain the velocity function to constitute the Hamilton-Jacobi hardening of the yield function. After
we generate a sequence of signed distance functions for different $\xi$, we may introduce an inverse
problem to obtain the velocity function for the Hamilton-Jacobi equation that evolves the signed distance
function. Recall that the Hamilton-Jacobi equation takes the following form:
$$\frac{\partial \phi}{\partial t} + v \cdot \nabla^{\hat{x}} \phi = 0, \tag{29}$$
where $v$ is the normal velocity field that defines the geometric evolution of the boundary and, in the
case of plasticity, is chosen to describe the observed hardening mechanism. The velocity field is given by
$$v = F n, \tag{30}$$
where $F$ is a scalar function describing the magnitude of the boundary change and $n = \nabla^{\hat{x}} \phi / |\nabla^{\hat{x}} \phi|$.
Using $\nabla \phi \cdot \nabla \phi = |\nabla \phi|^2$ in Eq. (29), the level set Hamilton-Jacobi equation for stationary yield function
can be simplified as
$$\frac{\partial \phi}{\partial t} + F |\nabla^{\hat{x}} \phi| = 0. \tag{31}$$
Note that $t$ is a pseudo-time. Since each snapshot of $\phi$ obtained from Step 1 remains a signed distance
function, $|\nabla^{\hat{x}} \phi| = 1$. Next, we replace the pseudo-time $t$ with $\xi$. Assuming that data points are
collected from the different stress paths $N$ times beyond the initial yielding
point, each time with the same incremental plastic strain $\Delta\lambda$, Step 1 provides a collection of
signed distance functions $\{\phi_0, \phi_1, ..., \phi_{n+1}\}$ corresponding to $\{\xi_0, \xi_1, ..., \xi_{n+1}\}$. Then, the corresponding
velocity function can be obtained via finite difference, i.e.,
$$F_i \approx -\frac{\phi_{i+1} - \phi_i}{\xi_{i+1} - \xi_i}, \tag{32}$$
where $F_i(\rho, \theta) = F(\rho, \theta, \xi_i)$ and $i = 0, 1, 2, ..., n+1$. By setting the signed distance function that fulfills
Eq. (31) as the yield function, i.e., $f(\rho, \theta, \xi) = \phi(\rho, \theta, \xi)$, we may use experimental data generated from
the loading paths demonstrated in Fig. 3 to train a neural network that predicts a new yield function
or velocity function for an arbitrary $\xi$ (see Figure 6).
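The two steps above can be illustrated with a minimal sketch for the simplest case, a circular (J2-type) yield surface on the π-plane whose radius grows linearly with the internal variable. The radius law, the constants `RHO0` and `H`, and all function names are assumptions made for illustration only; the sign convention used for the velocity is the one implied by Eq. (31) with a unit gradient norm, i.e. F = -∂φ/∂ξ.

```python
import math

RHO0 = 1.0  # hypothetical initial yield radius on the pi-plane
H = 0.5     # hypothetical linear hardening modulus


def yield_radius(xi):
    """Assumed isotropic hardening law: the yield circle grows linearly with xi."""
    return RHO0 + H * xi


def phi(rho, theta, xi):
    """Signed distance to the circular yield surface: negative inside, positive outside."""
    return rho - yield_radius(xi)


def eikonal_residual(rho, theta, xi, h=1e-6):
    """Residual of the polar Eikonal equation, evaluated with central differences."""
    d_rho = (phi(rho + h, theta, xi) - phi(rho - h, theta, xi)) / (2.0 * h)
    d_theta = (phi(rho, theta + h, xi) - phi(rho, theta - h, xi)) / (2.0 * h)
    return d_rho ** 2 + (d_theta / rho) ** 2 - 1.0


def velocity(rho, theta, xis):
    """Finite-difference velocity between consecutive signed distance snapshots."""
    return [
        -(phi(rho, theta, xis[i + 1]) - phi(rho, theta, xis[i])) / (xis[i + 1] - xis[i])
        for i in range(len(xis) - 1)
    ]


xis = [0.0, 0.1, 0.2, 0.3]
F = velocity(1.2, 0.3, xis)
print(F)                                # each entry is close to H
print(eikonal_residual(1.2, 0.3, 0.1))  # close to zero
```

For this idealized case the recovered velocity is constant and equal to the hardening modulus, which is what one would expect from an isotropic expansion of the yield circle. For data-driven yield surfaces the same finite difference would be applied to the snapshots generated in Step 1.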
More importantly, we show that the evolution of the yield function can be modeled as a level set evolving
according to a Hamilton-Jacobi equation. This knowledge may open up many new possibilities to capture
hardening without any hand-crafted treatment. To avoid the potential cost of solving the Hamilton-Jacobi
equation, we will introduce a supervised learning procedure to obtain the updated yield function
for a given strain history represented by the internal variable $\xi$ without explicitly solving the Hamilton-Jacobi
equation (see the next sub-sections). Consequently, this treatment will enable us to create a generic
elasto-plasticity framework that can replace the hand-crafted yield functions and hardening laws without
the high computational costs and the burden of repeated modeling trial and error.
2.3.3 Training a yield function with associative plastic flow
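Assuming an associative flow rule, the generalized Hooke's law utilizing a yield function neural network
approximator $\hat{f}$ can be written in rate form as
$$\dot{\sigma} = c^e : \dot{\epsilon}^e = c^e : \left(\dot{\epsilon} - \dot{\lambda} \frac{\partial \hat{f}}{\partial \sigma}\right). \tag{33}$$
In incremental form, the predictor-corrector scheme is written as
$$\sigma_{n+1} = \sigma^{tr}_{n+1} - \Delta\lambda \, c^e_{n+1} : \frac{\partial \hat{f}}{\partial \sigma_{n+1}}, \tag{34}$$
where
$$\sigma^{tr}_{n+1} = c^e_{n+1} : \epsilon^{e\,tr}_{n+1}, \qquad \epsilon^{e\,tr}_{n+1} = \epsilon^e_n + \Delta\epsilon. \tag{35}$$
Fig. 6: The evolution of the yield surface $f_\Gamma$ is connected to a level set $\phi(\hat{x}, \xi)$ extension problem. The
velocity field $v$ of the Hamilton-Jacobi equation (31) emulates the material hardening law. The yield surface
evolution and the velocity field are inferred from the data through the neural network training.
The strain and stress tensor predictors can be written in the spectral form as follows:
$$\epsilon^{e\,tr}_{n+1} = \sum_{A=1}^{3} \epsilon^{e\,tr}_A \, n^{tr(A)} \otimes n^{tr(A)}, \qquad \sigma^{tr}_{n+1} = \sum_{A=1}^{3} \sigma^{tr}_A \, n^{tr(A)} \otimes n^{tr(A)}. \tag{36}$$
The predictor-corrector scheme can be rewritten in spectral form, omitting the subscript $(n+1)$, as
$$\sigma_A = \sigma^{tr}_A - \Delta\lambda \sum_{B=1}^{3} a^e_{AB} \frac{\partial \hat{f}}{\partial \sigma_B}, \quad A = 1, 2, 3.$$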
By assuming that the plastic flow obeys the normality rule, we may use the observed plastic flow from
the data to regularize the shape of the evolving yield function. To do so, we leverage the fact that
we have already obtained an elastic energy functional from the previous training. The plastic deformation
mode can then be obtained from the difference between the trial and the true Cauchy stress at each
incremental step where the data are recorded in an experiment or direct numerical simulation, i.e.,
$$\sigma^{tr}_A - \sigma_A = \sum_{B=1}^{3} a^e_{AB} f_B \, \Delta\lambda \quad \text{for } A = 1, 2, 3. \tag{39}$$
At each incremental step of the data-generating simulations, we know the total strain and total stress.
Knowing the underlying hyperelastic model, we can utilize an inverse mapping to estimate the elastic
strain from the current total stress and hence determine the plastic strain.
The quantities $f_1$, $f_2$, $f_3$ correspond to the amount of plastic flow in the principal directions $A = 1, 2, 3$.
A neural network approximator of the yield function should have adequately accurate stress derivatives,
which are necessary for the implementation of the return mapping algorithm, discussed in Section 3, and
for providing an accurate plastic flow in the case of associative plasticity. The normalized plastic flow
direction vector $f_{norm}$ can be defined as
$$f_{norm} = \frac{\langle f_1, f_2, f_3 \rangle}{\| \langle f_1, f_2, f_3 \rangle \|}, \tag{40}$$
and holds information about the yield function shape in the π-plane.
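In the case of the simple MLP feed-forward network, the network can be seen as an approximator function
$\hat{f} = \hat{f}(\rho, \theta, \xi \mid W, b)$ of the true yield function level set $f$ with the Lode's coordinates $\rho$, $\theta$ and
the hardening parameter $\xi$ as input, parametrized by weights $W$ and biases $b$. A classical training objective, following
an $L_2$ norm, would only constrain the predicted yield function values. The corresponding training
objective, which minimizes the discrepancy measured at $N$ sample points $(\hat{x}, \xi) \in S \times \mathbb{R}^1$, reads
$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| f_i - \hat{f}_i \right\|_2^2 \right), \tag{41}$$
where $f_i = f(\hat{x}_i, \xi_i)$ and $\hat{f}_i = \hat{f}(\hat{x}_i, \xi_i)$. A second training objective can be modeled after an $H^1$ norm,
constraining both $f$ and its first derivative with respect to the stress state $\sigma_1$, $\sigma_2$, $\sigma_3$. For a neural network
approximator parametrized as $\hat{f} = \hat{f}(\sigma_1, \sigma_2, \sigma_3, \xi \mid W, b)$ using the principal stresses as inputs, this training
objective for the training samples $i \in [1, ..., N]$ would have the following form:
$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \left\| f_i - \hat{f}_i \right\|_2^2 + \sum_{A=1}^{3} \left\| \frac{\partial f_i}{\partial \sigma_A} - \frac{\partial \hat{f}_i}{\partial \sigma_A} \right\|_2^2 \right) \right). \tag{42}$$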
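The practical difference between the two objectives can be seen in a small sketch. The code below is not the training code of this work; it evaluates an unweighted value-only loss and an unweighted value-plus-gradient loss on two hand-made samples (all numbers and function names are hypothetical) for a prediction that matches the yield function values exactly but has the wrong stress gradient at one sample.

```python
def l2_loss(f_true, f_pred):
    """Mean squared error on the yield function values only."""
    n = len(f_true)
    return sum((a - b) ** 2 for a, b in zip(f_true, f_pred)) / n


def h1_loss(f_true, f_pred, grad_true, grad_pred):
    """Value term plus mean squared error on the three principal-stress derivatives."""
    n = len(f_true)
    grad_term = sum(
        sum((ga - gb) ** 2 for ga, gb in zip(g_t, g_p))
        for g_t, g_p in zip(grad_true, grad_pred)
    ) / n
    return l2_loss(f_true, f_pred) + grad_term


# Two samples: yield function value and gradient (df/ds1, df/ds2, df/ds3).
f_true = [0.0, 0.5]
grad_true = [(0.8, -0.4, -0.4), (0.0, 0.7, -0.7)]

# A prediction with exact values but an inaccurate gradient at the second sample.
f_pred = [0.0, 0.5]
grad_pred = [(0.8, -0.4, -0.4), (0.0, 0.5, -0.5)]

print(l2_loss(f_true, f_pred))                        # blind to the gradient error
print(h1_loss(f_true, f_pred, grad_true, grad_pred))  # penalizes the gradient error
```

A value-only objective cannot distinguish this prediction from a perfect one, whereas the gradient term exposes the error in the flow direction that the return mapping algorithm would otherwise inherit.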
Utilizing an equivalent representation of the stress state with Lode's coordinates in the π-plane, the
above training objective can be further simplified. The normalized flow direction vector $f_{norm}$ in Lode's
coordinates can be described solely by an angle $\theta_f$ since the vector has a magnitude equal to unity. To
constrain the flow direction angle, we modify the loss function of this higher-order training objective by
adding a distance metric between two rotation tensors $R_{\theta,i}$ and $\hat{R}_{\theta,i}$, corresponding to $\theta_{f,i}$ and $\hat{\theta}_{f,i}$, the
flow vector directions in the π-plane for the data and the approximated yield function, respectively, for the i-th
sample. The two rotation tensors belong to the Special Orthogonal Group SO(3), and the metric is based
on the distance from the identity matrix. For the i-th sample, the rotation-related term can be calculated as
$$\Phi_i = \left\| I - R_{\theta,i} \hat{R}_{\theta,i}^T \right\|_F = \sqrt{2 \left[ 3 - \mathrm{tr}\left( R_{\theta,i} \hat{R}_{\theta,i}^T \right) \right]}, \tag{43}$$
where $\| \cdot \|_F$ is the Frobenius norm. For a neural network approximator parametrized via the Lode's coordinates
as input, i.e., $\hat{f}(\rho, \theta, \xi \mid W, b)$, the Sobolev training objective for the training samples $i \in [1, ..., N]$ reads
$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \left\| f_i - \hat{f}_i \right\|_2^2 + \gamma_9 \Phi_i \right) \right), \tag{44}$$
where we minimize both the discrepancy of the yield function and the direction of the gradient in the stress space.
Remark 3 Discrete data points for the yield function. Note that the training of the yield function involves
not just the points at $f(\rho, \theta) = 0$ but also the new auxiliary data generated from the re-initialization of
the level set/yield function. Strictly speaking, the accuracy of the elasto-plastic responses only depends
on how well the boundary of the admissible stress range $f(\rho, \theta) = 0$ is tracked. However, knowing
the yield function values inside and outside the admissible range is helpful for evolving the yield function
with sufficient smoothness. To emphasize the importance of the data at $f(\rho, \theta) = 0$, we may introduce a
higher weighting factor for these data points in Eq. (44).
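The rotation-based flow direction term of Eq. (43) and the sample weighting suggested in Remark 3 can be sketched as follows. This is an illustrative sketch only: the rotation axis (taken here as the third coordinate axis), the weighting rule, and the names `rotation_distance`, `sobolev_loss`, `gamma` and `boundary_weight` are assumptions rather than the settings used in this work.

```python
import math


def rotation(angle):
    """Rotation by `angle` about a fixed axis (assumed: the third axis), as a 3x3 list."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_distance(theta_true, theta_pred):
    """sqrt(2 * (3 - tr(R R_hat^T))), using tr(R R_hat^T) = sum_ij R_ij R_hat_ij."""
    r, r_hat = rotation(theta_true), rotation(theta_pred)
    trace = sum(r[i][j] * r_hat[i][j] for i in range(3) for j in range(3))
    return math.sqrt(max(0.0, 2.0 * (3.0 - trace)))


def frobenius_distance(theta_true, theta_pred):
    """||I - R R_hat^T||_F evaluated entry by entry, as a cross-check."""
    r, r_hat = rotation(theta_true), rotation(theta_pred)
    total = 0.0
    for i in range(3):
        for j in range(3):
            prod = sum(r[i][k] * r_hat[j][k] for k in range(3))  # (R R_hat^T)_ij
            total += ((1.0 if i == j else 0.0) - prod) ** 2
    return math.sqrt(total)


def sobolev_loss(samples, gamma=1.0, boundary_weight=1.0, tol=1e-8):
    """Mean of w_i * ((f_i - f_hat_i)^2 + gamma * Phi_i), with w_i raised where f_i = 0."""
    total = 0.0
    for f, f_hat, theta, theta_hat in samples:
        w = boundary_weight if abs(f) < tol else 1.0
        total += w * ((f - f_hat) ** 2 + gamma * rotation_distance(theta, theta_hat))
    return total / len(samples)


# (f, f_hat, theta_f, theta_f_hat): one sample on the yield surface, one auxiliary sample.
samples = [(0.0, 0.1, 0.5, 0.5), (0.4, 0.4, 1.0, 1.0)]
print(sobolev_loss(samples))
print(sobolev_loss(samples, boundary_weight=5.0))
```

Both distance functions return the same value up to round-off, which is the identity stated in Eq. (43). Raising `boundary_weight` increases the contribution of the sample located on the yield surface.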
Algorithm 2 Training of a pressure-independent isotropic yield function level set neural network.
Require: Data set of $N$ samples: stress measures $\sigma$ at yielding and accumulated plastic strain $\epsilon_p$, a number
of levels (isocontours) $L_{levels}$ for the constructed signed distance function level set (data augmentation),
and a parameter $\zeta > 1$ for the radius range of the constructed signed distance function.
1. Project stress onto π-plane
   Initialize empty set of π-plane projection training samples $(\rho_i, \theta_i)$ for $i$ in $[0, ..., N]$.
   for $i$ in $[0, ..., N]$ do
      Spectrally decompose $\sigma_i = \sum_{A=1}^{3} \sigma_{A,i} \, n_i^{(A)} \otimes n_i^{(A)}$.
      Transform $(\sigma_{1,i}, \sigma_{2,i}, \sigma_{3,i})$ into $(\sigma''_{1,i}, \sigma''_{2,i}, \sigma''_{3,i})$ via Eq. (16).
   end for
2. Construct yield function level set (data augmentation)
   Initialize empty set of augmented training samples $(\rho_m, \theta_m, \epsilon_{p,m}, f_m)$ for $m$ in $[0, ..., N \times L_{levels}]$.
   for $i$ in $[0, ..., N]$ do
      for $j$ in $[0, ..., L_{levels}]$ do
         $\rho_m \leftarrow \frac{\zeta j}{L_{levels}} \rho_i$   ▷ the signed distance function is constructed for a radius range of $[0, \zeta \rho_i]$
         $f_m \leftarrow \frac{\zeta j}{L_{levels}} \rho_i - \rho_i$   ▷ the signed distance function value range is $[-\rho_i, (\zeta - 1) \rho_i]$
         Rescale $(\rho_m, \theta_m, \epsilon_{p,m}, f_m)$ into $(\bar{\rho}_m, \bar{\theta}_m, \bar{\epsilon}_{p,m}, \bar{f}_m)$ via Eq. (12).
      end for
   end for
3. Train neural network $\hat{f}(\bar{\rho}_m, \bar{\theta}_m, \bar{\epsilon}_{p,m})$ with loss function Eq. (41).
4. Output trained yield function $\hat{f}$ neural network and exit.
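The data augmentation of step 2 in Algorithm 2 can be sketched for a single yielding point as follows. This is a minimal sketch, not the implementation used in this work: the function name, the default values, and the `skip_origin` option (motivated by the singularity of the polar Eikonal equation at the origin) are hypothetical, and the angle and the accumulated plastic strain are assumed to be copied unchanged along each radial line. The rescaling of Eq. (12) is omitted.

```python
def augment_yield_point(rho_i, theta_i, eps_p_i, zeta=2.0, n_levels=10, skip_origin=False):
    """Build signed distance samples along the radial line through one yielding point.

    Returns tuples (rho_m, theta_m, eps_p_m, f_m) with rho_m in [0, zeta * rho_i]
    and f_m = rho_m - rho_i in [-rho_i, (zeta - 1) * rho_i].
    """
    samples = []
    for j in range(n_levels + 1):
        if skip_origin and j == 0:
            continue  # the polar coordinate system is singular at rho = 0
        rho_m = zeta * j / n_levels * rho_i
        f_m = rho_m - rho_i  # signed distance measured along the radial direction
        samples.append((rho_m, theta_i, eps_p_i, f_m))
    return samples


# One yielding point at rho = 2.0, theta = 0.3 with zero accumulated plastic strain.
for sample in augment_yield_point(2.0, 0.3, 0.0, zeta=2.0, n_levels=4):
    print(sample)
```

Measuring the distance along the radial direction reproduces the exact signed distance only for circular yield surfaces; for other shapes it is an approximation of the re-initialized level set.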
2.3.4 Training yield function and non-associative plastic flow
Here, we present the training for the plastic flow without assuming that the plastic flow follows the
normality rule. As such, the yield function and the plastic flow must be trained separately. We adopt the
idea of generalized plasticity in which the plastic flow direction is directly deduced from the Sobolev
training of a neural network on the experimental data (Zienkiewicz et al.,1999).
Firstly, the yield function training is similar to the associative flow case, except that the terms that
control the stress gradient of the yield function cannot be directly obtained from the plastic flow due to the
non-associative flow rule. Nevertheless, the stress gradient of the yield function may still be constrained
by convexity (if there is no intended phase transition that requires non-convexity of the yield function).
Recall that convexity requires
$$(\sigma - \sigma^*) : \frac{\partial \hat{f}}{\partial \sigma} \ge 0, \tag{45}$$
where $\sigma^*$ is an arbitrary stress. One necessary condition we can incorporate as a thermodynamic constraint
is the special case where we simply set $\sigma^* = 0$, such that we obtain
$$\sum_{A=1}^{3} \sigma_A \hat{f}_A \ge 0, \tag{46}$$
where $\hat{f}_A = \partial \hat{f} / \partial \sigma_A$.
One way to enforce that constraint is to apply a penalty term to the loss function in Eq. (41), e.g.,
$$w_{nnp} \, \mathrm{sign}\left( -\sum_{A=1}^{3} \sigma_A \hat{f}_A \right), \tag{47}$$
where this term will not be activated if the learned yield function obeys convexity. However, the
sign operator may lead to a jump of the loss function, which is not desirable for training. As a result, a
regularized Heaviside step function $H_k$ can be used to replace the sign operator, for instance,
$$w_{nnp} \, H_k\left( -\sum_{A=1}^{3} \sigma_A \hat{f}_A \right), \tag{48}$$
where $k$ controls how sharp the transition is at $\sum_A \sigma_A \hat{f}_A = 0$. As shown in our numerical experiments, this
additional term may not be required if the raw experimental data itself does not violate the thermodynamic
restriction. To obtain a plastic flow, we again obtain the flow information incrementally from the
experimental data via the following equation, i.e.,
$$\sigma^{tr}_A - \sigma_A = \sum_{B=1}^{3} a^e_{AB} g_B \, \Delta\lambda \quad \text{for } A = 1, 2, 3. \tag{49}$$
We can gather the plastic flow information by post-processing the simulation data, similar to Equation (39).
Once the plastic flow $g_A$ is determined incrementally for different $\xi$, we then introduce another
supervised learning problem that reads
$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \gamma_{10} \left\| g_{A,i} - \hat{g}_{A,i} \right\|_2^2 \right). \tag{50}$$
The non-negative plastic work is the thermodynamic constraint that requires $\dot{W}^p = \sigma : \dot{\epsilon}^p \ge 0$. The
corresponding incremental form for isotropic materials reads
$$\Delta\lambda \sum_{A=1}^{3} \sigma_A g_A \ge 0. \tag{51}$$
Notice that the stress beyond the initial yielding point satisfies the yield condition $f = 0$. As a result, this
inequality can be recast as an additional term for the loss function that trains the yield function (Eq. (44))
such that
$$w_{nnp} \, \mathrm{sign}\left( -\Delta\lambda \sum_{A=1}^{3} \sigma_A g_A \right), \tag{52}$$
where $w_{nnp}$ is the penalty parameter. Notice that when the non-negative plastic work is fulfilled during
the training of the neural network, the penalty term is not activated and does not affect the back-propagation
step. Furthermore, if the yield function is convex and the flow rule is associative, this constraint
is always fulfilled and, hence, not necessary. This constraint, however, should be helpful to regulate
the relationship between the yield function and the plastic flow when we intend to train the plastic flow direction
independent of the stress gradient of the yield function.
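The two thermodynamic penalty terms discussed in this section (convexity and non-negative plastic work) can be sketched with a smooth switch in place of the sign operator. The specific regularization below, a hyperbolic tangent, is an assumption chosen for illustration and is not necessarily the regularized Heaviside function used in this work; the function names, `w_nnp` defaults and `k` values are likewise hypothetical.

```python
import math


def smooth_heaviside(x, k=50.0):
    """Assumed regularization of the Heaviside step: tends to 1 for x >> 0 and 0 for x << 0."""
    return 0.5 * (1.0 + math.tanh(k * x))


def convexity_penalty(sigma, grad_f, w_nnp=1.0, k=50.0):
    """Penalty that switches on when sum_A sigma_A * f_A becomes negative."""
    inner = sum(s * g for s, g in zip(sigma, grad_f))
    return w_nnp * smooth_heaviside(-inner, k)


def plastic_work_penalty(sigma, flow, dlam, w_nnp=1.0, k=50.0):
    """Penalty that switches on when the incremental plastic work becomes negative."""
    work = dlam * sum(s * g for s, g in zip(sigma, flow))
    return w_nnp * smooth_heaviside(-work, k)


sigma = (2.0, 0.0, -1.0)           # principal stresses (hypothetical values)
grad_ok = (0.8, -0.2, -0.6)        # gradient pointing away from the origin
grad_bad = (-0.8, 0.2, 0.6)        # gradient pointing toward the origin

print(convexity_penalty(sigma, grad_ok))    # close to 0: constraint satisfied
print(convexity_penalty(sigma, grad_bad))   # close to 1: constraint violated
print(plastic_work_penalty(sigma, grad_ok, dlam=0.01, k=1.0e4))
```

Because the argument of the switch carries physical units, the sharpness parameter `k` has to be chosen relative to the magnitude of the stresses and of the plastic multiplier, as the last call illustrates.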
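Algorithm 3 Return mapping algorithm in strain-space in principal axes for an isotropic hyperelastic-plastic
model
Require: Hyperelastic energy functional $\hat{\psi}^e$ neural network (see Algorithm 1) and yield function $\hat{f}$ neural
network (see Algorithm 2).
1. Compute trial elastic strain
   Compute $\epsilon^{e\,tr}_{n+1} = \epsilon^e_n + \Delta\epsilon$.
   Spectrally decompose $\epsilon^{e\,tr}_{n+1} = \sum_{A=1}^{3} \epsilon^{e\,tr}_A \, n^{tr(A)} \otimes n^{tr(A)}$.
2. Compute trial elastic stress
   Compute $\sigma^{tr}_A = \partial \hat{\psi}^e / \partial \epsilon^e_A$ at $\epsilon^{e\,tr}_{n+1}$.
3. Check yield condition and perform return mapping if loading is plastic
   if $\hat{f}(\epsilon^{e\,tr}_1, \epsilon^{e\,tr}_2, \epsilon^{e\,tr}_3, \xi_n) \le 0$ then
      Set $\sigma_{n+1} = \sum_{A=1}^{3} \sigma^{tr}_A \, n^{tr(A)} \otimes n^{tr(A)}$ and exit.
   else
      Solve for $\epsilon^e_1$, $\epsilon^e_2$, $\epsilon^e_3$, and $\xi_{n+1}$ such that $\hat{f}(\epsilon^e_1, \epsilon^e_2, \epsilon^e_3, \xi_{n+1}) = 0$.
      Compute $\sigma_{n+1} = \sum_{A=1}^{3} \sigma_A \, n^{tr(A)} \otimes n^{tr(A)}$ and exit.
   end if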
3 Implementation highlights: return mapping algorithm with automatic differentiation
Here, we provide a review of the implementation of a fully implicit stress integration algorithm used
for the proposed Hamilton-Jacobi hardening framework. For isotropic materials where the elastic strain
and stress are co-axial, the stress integration can be done via spectral decomposition as shown in Alg. 3.
An upshot of the proposed method is that there is only a small modification necessary to incorporate the
Hamilton-Jacobi hardening and the generalized plasticity.
In this work, unless otherwise stated, all the necessary information for the return mapping
algorithm about the elastic and plastic constitutive responses is derived from the trained neural networks
of the hyperelastic energy functional and yield function, respectively, using the Keras (Chollet et al.,
2015) and Tensorflow (Abadi et al.,2016) libraries. No additional explicit forms of constitutive laws are
defined. Furthermore, the algorithm requires that all the strain and stress variables are in the principal
axes. However, as stated in Section 2, in order to facilitate the machine learning algorithms, we
have opted to train with the strain invariants $\epsilon^e_v$ and $\epsilon^e_s$, and the stress invariants $\rho$ and $\theta$. Integrating the
machine learning algorithms with the return mapping algorithm requires a set of coordinate system
transformations, which, in turn, require the calculation of the partial derivatives of said transformations for use
in the chain rule formulation. The partial derivative calculation is performed using the Autograd library
(Maclaurin et al.,2015) for automatic differentiation.
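Autograd enables the automatic calculation of the partial derivatives of explicitly defined functions.
Thus, we can easily define the transformation of any input parameter space for our neural networks to the
principal space and readily have the necessary partial derivatives for the chain rule implementation. This
allows us to use equivalent expressions of our neural network approximators $\hat{\psi}^e(\epsilon^e_v, \epsilon^e_s)$ and $\hat{f}(\rho, \theta, \xi)$ in the
principal space, such that
$$\hat{\psi}^e(\epsilon^e_v, \epsilon^e_s) = \hat{\psi}^e_{principal}(\epsilon^e_1, \epsilon^e_2, \epsilon^e_3) \quad \text{and} \quad \hat{f}(\rho, \theta, \xi) = \hat{f}_{principal}(\epsilon^e_1, \epsilon^e_2, \epsilon^e_3, \xi). \tag{53}$$
In this work, integrating the neural network approximators in the return mapping requires the following
coordinate system transformations: $(\epsilon^e_v, \epsilon^e_s) \longleftrightarrow (\epsilon^e_1, \epsilon^e_2, \epsilon^e_3)$, $(\rho, \theta) \longleftrightarrow (\sigma_1, \sigma_2, \sigma_3)$,
$(\sigma''_1, \sigma''_2, \sigma''_3) \longleftrightarrow (\sigma_1, \sigma_2, \sigma_3)$, and $(\sigma''_1, \sigma''_2) \longleftrightarrow (\rho, \theta)$. These transformations require a large number of chain rules,
increasing the possibility of formulation errors and making it less flexible to replace the networks' input
space. Thus, we opt for the automation of this process using Autograd.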
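To make the role of the chain rule concrete, the sketch below writes out by hand one of these transformations, from principal elastic strains to the strain invariants, and uses it to recover principal stresses from an energy expressed in the invariants. It is an illustration under stated assumptions, not the implementation of this work: the energy is a linear elastic stand-in for the neural network, the invariant definitions (volumetric strain as the trace, deviatoric strain as sqrt(2/3) times the norm of the deviator) and the moduli `K` and `G` are assumed, and the derivatives that Autograd would generate automatically are coded explicitly.

```python
import math

K, G = 100.0, 60.0  # hypothetical bulk and shear moduli


def invariants(eps):
    """Principal strains -> (volumetric, deviatoric) strain invariants."""
    eps_v = sum(eps)
    dev = [e - eps_v / 3.0 for e in eps]
    eps_s = math.sqrt(2.0 / 3.0 * sum(d * d for d in dev))
    return eps_v, eps_s


def energy(eps_v, eps_s):
    """Stand-in for the learned energy functional in the invariant space."""
    return 0.5 * K * eps_v ** 2 + 1.5 * G * eps_s ** 2


def energy_gradient(eps_v, eps_s):
    """Partial derivatives of the energy with respect to the two invariants."""
    return K * eps_v, 3.0 * G * eps_s


def invariant_jacobian(eps):
    """Partial derivatives of the invariants with respect to each principal strain."""
    eps_v, eps_s = invariants(eps)
    dev = [e - eps_v / 3.0 for e in eps]
    d_v = [1.0, 1.0, 1.0]
    d_s = [2.0 / 3.0 * d / eps_s for d in dev]  # requires eps_s > 0
    return d_v, d_s


def principal_stress(eps):
    """Chain rule: sigma_A = dpsi/deps_v * deps_v/deps_A + dpsi/deps_s * deps_s/deps_A."""
    dpsi_dv, dpsi_ds = energy_gradient(*invariants(eps))
    d_v, d_s = invariant_jacobian(eps)
    return [dpsi_dv * d_v[a] + dpsi_ds * d_s[a] for a in range(3)]


eps = [0.02, 0.0, -0.01]
print(principal_stress(eps))
```

For this quadratic energy the result coincides with the familiar expression of linear isotropic elasticity, which offers a simple check of the hand-written Jacobian. Every additional coordinate system adds another Jacobian of this kind, which is the bookkeeping that automatic differentiation removes.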
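Because the machine learning training has created a mapping that automatically generates
an updated yield function whenever the internal variable $\xi$ is updated, there is no need to add additional
constraints for the linearized hardening rules. The return mapping algorithm can be described with a
system of four equations that are solved iteratively. For a local iteration $k$, we solve for the solution vector
$x$ such that $A^k \cdot \Delta x = r(x^k)$, $x^{k+1} \leftarrow x^k - \Delta x$, $k \leftarrow k + 1$, until the residual norm $\| r \|$ is below a set
error threshold. The residual vector $r$ and the local tangent $A^k$ can be assembled for the calculation of $x$ by
a series of neural network evaluations and automatic differentiations, such that
$$r(x) = \begin{Bmatrix} \epsilon^e_1 - \epsilon^{e\,tr}_1 + \Delta\lambda \, \hat{g}_1 \\ \epsilon^e_2 - \epsilon^{e\,tr}_2 + \Delta\lambda \, \hat{g}_2 \\ \epsilon^e_3 - \epsilon^{e\,tr}_3 + \Delta\lambda \, \hat{g}_3 \\ \hat{f} \end{Bmatrix}, \quad A^k = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \hat{g}_1 \\ c_{21} & c_{22} & c_{23} & \hat{g}_2 \\ c_{31} & c_{32} & c_{33} & \hat{g}_3 \\ \partial \hat{f} / \partial \epsilon^e_1 & \partial \hat{f} / \partial \epsilon^e_2 & \partial \hat{f} / \partial \epsilon^e_3 & \partial \hat{f} / \partial \Delta\lambda \end{bmatrix}, \quad \text{and} \quad x = \begin{Bmatrix} \epsilon^e_1 \\ \epsilon^e_2 \\ \epsilon^e_3 \\ \Delta\lambda \end{Bmatrix}, \tag{54}$$
where $\epsilon^{e\,tr}_I$ is the trial state principal strain, $\hat{g}_I = \partial \hat{f} / \partial \sigma_I$ for an associative flow rule, and
$$c_{IJ} = \delta_{IJ} + \Delta\lambda \frac{\partial \hat{g}_I}{\partial \epsilon^e_J}, \quad I, J = 1, 2, 3. \tag{55}$$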
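The local Newton iteration described above can be sketched on a case with a known closed-form answer. The sketch makes several substitutions that should be read as assumptions: linear isotropic elasticity and a perfectly plastic J2 yield function stand in for the trained networks, a central-difference tangent stands in for automatic differentiation, and all names and constants (`LAM`, `MU`, `KAPPA`, `return_mapping`) are hypothetical. The unknowns are the three principal elastic strains and the plastic multiplier, as in the four-equation system of the text.

```python
import math

LAM, MU, KAPPA = 100.0, 80.0, 1.0  # hypothetical Lame constants and yield radius


def stress(eps):
    """Principal stresses of linear isotropic elasticity."""
    tr = sum(eps)
    return [LAM * tr + 2.0 * MU * e for e in eps]


def deviator(sig):
    p = sum(sig) / 3.0
    return [s - p for s in sig]


def yield_f(sig):
    """J2 yield function sqrt(2 J2) - kappa, written in principal stresses."""
    return math.sqrt(sum(s * s for s in deviator(sig))) - KAPPA


def flow(sig):
    """Associative flow direction g_I = df/dsigma_I."""
    dev = deviator(sig)
    norm = math.sqrt(sum(s * s for s in dev))
    return [s / norm for s in dev]


def residual(x, eps_tr):
    eps, dlam = x[:3], x[3]
    sig = stress(eps)
    g = flow(sig)
    return [eps[i] - eps_tr[i] + dlam * g[i] for i in range(3)] + [yield_f(sig)]


def jacobian(x, eps_tr, h=1e-7):
    """Central-difference tangent A = dr/dx (stand-in for automatic differentiation)."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = x[:], x[:]
        xp[j] += h
        xm[j] -= h
        rp, rm = residual(xp, eps_tr), residual(xm, eps_tr)
        cols.append([(rp[i] - rm[i]) / (2.0 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            fac = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= fac * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def return_mapping(eps_tr, tol=1e-10, max_iter=20):
    """Iterate A dx = r, x <- x - dx, starting from the trial state with zero multiplier."""
    x = list(eps_tr) + [0.0]
    for _ in range(max_iter):
        r = residual(x, eps_tr)
        if math.sqrt(sum(v * v for v in r)) < tol:
            break
        dx = solve(jacobian(x, eps_tr), r)
        x = [xi - di for xi, di in zip(x, dx)]
    return x[:3], x[3]


eps_tr = [0.02, 0.0, -0.01]
eps_e, dlam = return_mapping(eps_tr)
print(yield_f(stress(eps_e)), dlam)
```

For this model the converged multiplier can be compared with the classical radial return value, the excess of the trial deviatoric stress norm over the yield radius divided by twice the shear modulus. Replacing `stress`, `yield_f` and `flow` by network evaluations and their derivatives leaves the iteration itself unchanged.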
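This framework is also readily available to implement in finite element simulations (Section 5.5). We
can assemble the algorithmic consistent tangent $c_{n+1}$ in principal axes for a global Newton iteration $n$:
$$c_{n+1} = \sum_{A=1}^{3} \sum_{B=1}^{3} a_{AB} \, m^{(A)} \otimes m^{(B)} + \frac{1}{2} \sum_{A=1}^{3} \sum_{B \ne A} \left( \frac{\sigma_B - \sigma_A}{\epsilon^{e\,tr}_B - \epsilon^{e\,tr}_A} \right) \left( m^{(AB)} \otimes m^{(AB)} + m^{(AB)} \otimes m^{(BA)} \right), \tag{56}$$
where $m^{(A)} = n^{(A)} \otimes n^{(A)}$, $m^{(AB)} = n^{(A)} \otimes n^{(B)}$, and the matrix of elastic moduli in principal axes is given as
$$a_{AB} := \frac{\partial \sigma_A}{\partial \epsilon^{e\,tr}_B}. \tag{57}$$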
Utilizing Tensorflow and Autograd, the return mapping algorithm is fully generalized for any isotropic
hyperelastic and yield function data-driven constitutive laws. It also allows for quick implementation of
any parametrization of the neural network architectures. In future work, the framework can be extended
to accommodate anisotropic responses, as well as architectures with complex internal variables and higher
descriptive power.
4 Alternative comparison models for control experiments
In this section, we will briefly review some simple black-box neural network architectures that can be
employed to predict path-dependent plasticity behaviors. The predictive capabilities of these architectures
will be compared to our neural network elastoplasticity framework in Section 5.4. Three different architectures
are designed for comparison with our framework: a multi-step feed-forward network, a recurrent
GRU network, and a 1-D convolutional network. All of these networks can capture
path-dependent behavior by utilizing different memory mechanisms.
The first architecture is a feed-forward network that takes the current strain as well as the stress
from the previous time step to predict the stress at the current time step. The feed-forward architecture
consists of fully-connected Dense layers that have the following formulation in matrix form:
$$h^{(l+1)}_{dense} = a\left( h^{(l)} W^{(l)} + b^{(l)} \right), \tag{58}$$
where $h^{(l+1)}_{dense}$ is the output of the Dense layer, $h^{(l)}$ is the output of the previous layer $l$, $a$ is an activation
function, and $W^{(l)}$, $b^{(l)}$ are the trainable weight matrix and bias vector of the layer, respectively. It is noted
that the layer formulation itself cannot hold any memory information. The memory of path-dependence
in this architecture is derived from the input of the neural network. The input is the full strain tensor $\epsilon_n$
at time step $n$ and the full stress tensor $\sigma_{n-1}$ at the previous time step $(n-1)$, both in Voigt notation. The
output prediction of the network is the stress tensor $\sigma_n$ at time step $n$. The network attempts to infer the
path-dependent behavior by associating the previous stress state with the current one. The architecture
consists of three Dense hidden layers (100 neurons each) with ReLU activation functions and the output
Dense layer with a Linear activation function.
Model        Description
MstepDense   Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Output Dense (Linear)
MGRU         GRU (32 units / tanh) → GRU (32 units / tanh) → Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Output Dense (Linear)
MConv1D      Conv1D (32 filters / ReLU) → Conv1D (64 filters / ReLU) → Conv1D (128 filters / ReLU) → Flatten → Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Output Dense (Linear)
Table 1: Summary of black-box neural network architectures used for control experiments.
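The Dense layer formulation of Eq. (58) and the way the multi-step network receives its memory through the input can be sketched in a few lines. The weights, the layer sizes and the function names below are hypothetical and serve only to show the mechanics; the networks in this work are built with Keras.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(values):
    return list(values)


def dense(h, weights, bias, activation):
    """Eq. (58): activation(h W + b), with h a row vector and W of shape (n_in, n_out)."""
    z = [
        sum(h[i] * weights[i][j] for i in range(len(h))) + bias[j]
        for j in range(len(bias))
    ]
    return activation(z)


def multistep_input(strain_n, stress_prev):
    """Concatenate the current strain and the previous stress (6 Voigt components each)."""
    return list(strain_n) + list(stress_prev)


# A 2-input, 2-output layer with identity weights: the negative pre-activation is clipped.
hidden = dense([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], relu)
print(hidden)

# The layer itself is memoryless; the history enters only through the previous stress.
x = multistep_input([1e-3, 0.0, 0.0, 0.0, 0.0, 0.0], [0.2, 0.1, 0.1, 0.0, 0.0, 0.0])
print(len(x))
```

Since the previous stress is itself a prediction during a forward simulation, any error in one step is fed back into the next input, which is one reason to compare such models on long cyclic paths.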
The second architecture is a recurrent network that learns the path-dependent behavior in the form
of time series. The architecture utilizes the Gated Recurrent Unit (GRU) layer formulation, a recurrent
architecture introduced in (Cho et al.,2014) as a variation of the popular Long Short Term Memory (LSTM)
recurrent architecture (Gers et al.,1999). The GRU cell controls memory information by utilizing three
gates (an update gate, a reset gate, and a current memory gate). This architecture takes a time series of
strain as the input with a history length of $\ell$. The variable $\ell$ is a network hyperparameter that is fine-tuned
for optimal results, and it signifies the amount of information from the previous time steps that is taken
into consideration to make a prediction for the current time step. Thus, a GRU network sample for the time
step $n$ has as input the series of strain tensors $[\epsilon_{n-\ell}, ..., \epsilon_{n-1}, \epsilon_n]$ and as output the stress tensor for the current
step $\sigma_n$. The architecture used in this work consists of two GRU hidden layers (32 recurrent units each)
with a ReLU activation function, followed by two Dense layers (100 neurons) with ReLU activations and a
Dense output layer with a linear activation function. The history variable was set to $\ell = 20$.
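The construction of the time-series samples described above can be sketched as a simple windowing of the recorded strain and stress histories. The function name `make_windows` and the toy one-component data are hypothetical; with a history length of ℓ, each input holds ℓ + 1 strain states and the target is the stress at the last of them.

```python
def make_windows(strains, stresses, ell):
    """Pair each window [eps_{n-ell}, ..., eps_n] with the stress sigma_n."""
    inputs, targets = [], []
    for n in range(ell, len(strains)):
        inputs.append(strains[n - ell:n + 1])
        targets.append(stresses[n])
    return inputs, targets


# Toy history with one strain and one stress component per step.
strains = [[0.000], [0.001], [0.002], [0.003], [0.004]]
stresses = [[0.0], [0.2], [0.4], [0.5], [0.55]]

inputs, targets = make_windows(strains, stresses, ell=2)
print(len(inputs), len(inputs[0]))
print(inputs[0], targets[0])
```

The first ℓ steps of every loading path cannot be used as targets unless the history is padded, so the number of usable samples per path shrinks as the history length grows.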
The last architecture employed in the control experiments learns the path-dependent information from
time series by extracting features through a 1-D convolution filter. The convolution filter extracts
higher-order features from time series of fixed length and has been used for time-series predictions (LeCun et al.,
1995) and audio processing (Oord et al.,2016). The input of this architecture is the series of strain tensors
$[\epsilon_{n-\ell}, ..., \epsilon_{n-1}, \epsilon_n]$ and the output is the stress tensor for the current step in Voigt notation, $\sigma_n$. The 1D
convolutional filter processes segments of the path-dependent time series data in a rolling window manner and
has length equal to $\ell$. The architecture consists of three 1D convolution layers (32, 64, and 128 filters,
respectively) with ReLU activation functions. The output features of the last convolutional layer are
flattened and then fed into two consecutive Dense layers (100 neurons) with ReLU activations, followed by a
Dense output layer with a Linear activation function.
All the architectures were trained for 500 epochs with the Nadam optimizer with a batch size of 64. They
were trained on different data sets to illustrate the comparisons with our elastoplasticity framework – the
data sets are described in the context of the numerical experiments in Section 5.4. The hyperparameters
of these architectures were fine-tuned through trial and error in an effort to provide optimal results and
a fair comparison with our elastoplasticity framework to the best of our knowledge. The three black-box
architectures are summarized in Table 1.
5 Numerical Experiments
In this section, we report the results of the numerical experiments that we conducted to verify the
implementation and evaluate the predictive capacity of the presented elastoplasticity ANN framework.
For brevity, some background materials and simple verification exercises are placed in the Appendices.
In Section 5.1, we demonstrate how the training of the hyperelastic energy functional approximator can
benefit from the use of higher-order activation functions and higher-order Sobolev constraints. In Section 5.2,
we demonstrate the training of the yield function level set neural networks and their approximation of the
evolving yield functions. In Section 5.4, we compare the three black-box architectures of Section 4 to our
elastoplasticity framework as surrogate models for a polycrystal microstructure. Finally, in Section 5.5, we
demonstrate the ability of our framework to integrate into a finite element simulation by fully replacing
the elastic and plastic constitutive models with their data-driven counterparts.
5.1 Benchmark study 1: Higher-order Sobolev training of hyperelastic energy functional
In this numerical experiment, we compare the learning capacity of feed-forward neural networks
trained on hyperelastic energy functional data. The training loss curves and the predictive capacity of
the neural networks with different configurations of Multiply layers and different order norms as loss
functions are compared. The generation of the data sets is discussed in Appendix A.
Fig. 7: Training loss comparison of feed-forward architectures with a progressively larger number of Multiply
layers with an $H^2$ training objective for (a) linear elasticity and (b) Modified Cam-Clay hyperelastic law
(Borja et al.,2001). As a higher degree of continuity is introduced in the network architecture, the accuracy
of the stiffness prediction increases, since more control is allowed for the $H^2$ terms of the training objective.
The neural network models in this work are trained on two energy functional data sets, for linear elasticity
and non-linear elasticity (Eq. (59)), of 2500 sample points each. The points are sampled on a uniform
grid of the strain invariant space $(\epsilon^e_v, \epsilon^e_s)$. In the first part of this numerical experiment, we investigate the
capability of the different feed-forward architectures to fulfill the higher-order Sobolev constraints. The
layers' kernel weight matrix was initialized with a Glorot uniform distribution and the bias vector with
zeros; the repeatability of the training process is demonstrated in Fig. 28 of Appendix D. Other
than the number of intermediate Multiply layers, all the other hyperparameters are identical among all the
architectures tested. The training objective is formulated according to Eq. (3). All the models were trained
for 1000 epochs with a batch size of 32 using the Nadam optimizer (Dozat,2016), set with default values
in the Keras library.
The performance of different architectures (see Fig. 1) is compared and the results are shown in Fig. 7.
Architecture dmmdmd consistently exhibits the best learning capacity and achieves the lowest loss in
energy, stress and tangential stiffness calculation. As expected, Architecture ddd fails in predicting the
tangential stiffness due to the insufficient degree of continuity of the activation functions.
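The reason a purely ReLU-activated network cannot represent a tangential stiffness can be checked numerically on a one-dimensional toy function. The sketch below compares a small piecewise linear function with a variant in which each hidden output is multiplied by itself; interpreting the Multiply layer as such an element-wise self-product is an assumption made for illustration, and the weights are arbitrary.

```python
def relu(x):
    return max(0.0, x)


def ddd_like(x):
    """Piecewise linear toy network: two ReLU units combined linearly."""
    return 0.7 * relu(1.5 * x + 0.2) - 0.3 * relu(-0.5 * x + 1.0)


def with_multiply(x):
    """Same units, but each hidden output is multiplied by itself before the output layer."""
    h1 = relu(1.5 * x + 0.2)
    h2 = relu(-0.5 * x + 1.0)
    return 0.7 * h1 * h1 - 0.3 * h2 * h2


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


x0 = 0.5  # a point away from the kinks of both ReLU units
print(second_derivative(ddd_like, x0))       # vanishes up to round-off
print(second_derivative(with_multiply, x0))  # non-zero curvature
```

Away from the kinks the piecewise linear function has zero curvature regardless of its weights, so a second-order training target cannot be matched, whereas the product introduces a curvature that the weights can adjust.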
Fig. 8: Training loss comparison for $L_2$, $H^1$, and $H^2$ training objectives of an architecture with three Multiply
layers (dmmdmd) for (a) linear elasticity and (b) Modified Cam-Clay hyperelastic law (Borja et al.,2001).
The $H^2$ training objective procures more accurate results than the $L_2$ and $H^1$ objectives for all of the energy,
stress, and stiffness fields.
In the second numerical experiment, we investigate the predictive accuracy of the dmmdmd architecture
(as shown in Fig. 1) trained via $L_2$, $H^1$, and $H^2$ norms. In all three cases, the neural network architecture
and the training hyperparameters are identical to the ones used in the first numerical experiment.
The results of these three training experiments can be seen in Fig. 8. The predictive capability of the model
increases when higher-order Sobolev training is utilized, with the best overall scores procured for $H^2$ norm-based
training. Czarnecki et al. (2017) had observed that constraining the $H^1$ terms in the loss function
improves the function value prediction accuracy. Our results indicate that by constraining the $H^2$ terms,
we are improving the prediction of the function values along with the first-order and the second-order
derivatives of the function.
5.2 Benchmark Study 2: Training of yield function as a level set
In this numerical experiment, we demonstrate how to train the neural network to generate a yield func-
tion whose evolution is driven by signed distance function data interpreted from experiments. The yield
function neural networks have a feed-forward architecture of a hidden Dense layer (100 neurons / ReLU),
followed by two Multiply layers, then another hidden Dense layer (100 neurons / ReLU) and an output
Dense layer (Linear). The layers’ kernel weight matrix was initialized with a Glorot uniform distribution
and the bias vector with a zero distribution – the repeatability of the training process is demonstrated in
Fig. 28 of Appendix D. All the models were trained for 2000 epochs with a batch size of 128 using the
Nadam optimizer, set with default values. The neural networks were trained on a data set of J2 plasticity
as well as data sets for 4 different polycrystal RVEs as described in Appendix B. The training loss curves
for this experiment with an $L_2$ training objective are shown in Fig. 10.
The ability to capture a yield surface directly from the data becomes crucial in materials such as the
polycrystal microstructures – where complex constitutive responses may manifest from spatial hetero-
geneity and grain boundary interactions. In Fig. 11, it is shown that a polycrystal RVE of the same size
Fig. 9: Comparison of the predictions of an $L_2$-trained ddd network with an $H^2$-trained dmmdmd
network for the energy functional, stress, and stiffness measures of the Modified Cam-Clay hyperelastic
law (Borja et al.,2001). The ddd architecture (piece-wise linear activation functions) can only predict local
second-order derivatives ($D_{11}$, $D_{22}$) to be equal to 0. The dmmdmd architecture, modified with Multiply
layers, can capture these higher-order derivatives. The stress measure is in MPa.
Fig. 10: Training loss curves for the J2 plasticity and 4 different polycrystal RVEs’ yield function level sets.
with different crystal orientations can have distinctive initial yield surfaces. Anticipating the geometry of
the yield surface in the stress space and then handcrafting it with mathematical expressions would
be a great undertaking and possibly futile – a change in the crystal properties would require deducing a
geometric shape design from scratch.
This numerical test indicates that the proposed framework may automate the discovery of new yield
surfaces while bypassing the time-consuming hand-crafting process. This framework may also be extended
to capture the plastic behavior of anisotropic materials if the six-dimensional stress space is used. This will
be considered in future work by expanding the stress invariant input space to include orientations and
possibly more descriptive plastic internal variables that are derived from the topology of the microstructure.
Fig. 11: Yield surface neural network predictions and sample points for three polycrystal RVEs with different crystal orientations.
5.2.1 Smoothed approximation of the non-smooth yield surfaces
Another useful feature of the proposed machine learning approach is that the Sobolev training can be
used to generate a smoothed approximation of a multi-yield-surface system (e.g. Mohr-Coulomb, Tresca,
crystal plasticity with multiple slip systems) on the π-plane. Classical non-smooth and multi-yield surface
models often lead to sharp tips and corners of the yield surface that make the stress gradient of the yield
function bifurcated. This not only compromises stability but also requires specialized algorithmic
designs for the return mapping algorithm to function (cf. de Souza Neto et al. (2011)). As a result, there have
been decades of efforts to hand-derive implicit functions that are smoothed approximations of well-known
multi-yield surface models (Matsuoka and Nakai (1985); Abbo and Sloan (1995)). As shown in Fig. 11, the
proposed Sobolev framework may automate this time-consuming process by simply using a combination
of data points, activation functions, and loss functions to regularize the non-smooth yield surface. The
resultant smoothed yield surface not only avoids the bifurcated stress gradient at the corners but also
enables us to use the standard return mapping algorithm for implicit stress integration without requiring
any additional numerical treatment to handle the tips and corners of the yield surface.
Furthermore, the ML-derived evolution of the yield function is also capable of replicating complex
hardening mechanisms not known a priori. In Fig. 12, we demonstrate how the neural network can predict
a hardening mechanism that has not yet been discovered in the literature. In particular, this yield surface
not only changes in size upon yielding but also deforms on the π-plane. Anticipating this mechanism
and then deducing the corresponding mathematical expression to handcraft a hardening law is not trivial.
Fig. 12: Predicted signed distance function isocontours for three evolving yield surfaces of RVE 1 of a
specimen undergoing increasing axial compression (left to right).
Our framework can interpret experimental data and deduce the optimal shapes and forms of a yield sur-
face that evolves with the strain history without the aforementioned burdens. This is not only important
for making more accurate predictions but also enables us to derive more precise plasticity models tailored
to specific specimens or data sets beyond parameter calibrations.
5.3 Benchmark Study 3: Yield function training with higher-order constraints
In this numerical experiment, we demonstrate how to use the proposed training framework to enforce
thermodynamic constraints and other desirable properties of the yield function via geometrical interpre-
tation in the stress space. Our experimental data obtained from direct numerical simulations have been
pre-processed into signed distance functions that evolve according to the accumulated plastic strain. The
constraints we are interested in enforcing here are the unit gradient $|\nabla \phi| = 1$ and the non-negativity of the
plastic dissipation.
5.3.1 Unit gradient constraint
The unit gradient constraint has been directly applied as an additional term in the loss function and
the results of the training for 4 different RVEs with and without the unit gradient $|\nabla \phi| = 1$ constraint are
compared in Fig. 13. These two sets of yield function training show that the additional unit gradient
constraint does not affect the learning capacity and the accuracy of the resultant yield function. Meanwhile,
the training that explicitly enforces the unit gradient property is also at least 3 orders of magnitude more
successful at fulfilling the unit gradient constraint than the counterpart that does not do so.
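A unit gradient penalty of this kind can be sketched as a mean squared deviation of the gradient norm from one. The snippet below is a minimal illustration under stated assumptions: the gradient is estimated by finite differences rather than by differentiating a neural network, and both level set functions, the sample points, and the names `grad_norm` and `eikonal_loss` are hypothetical.

```python
import math

def grad_norm(phi, x, y, step=1e-6):
    """Central finite-difference estimate of |grad phi| at (x, y)."""
    dx = (phi(x + step, y) - phi(x - step, y)) / (2.0 * step)
    dy = (phi(x, y + step) - phi(x, y - step)) / (2.0 * step)
    return math.hypot(dx, dy)

def eikonal_loss(phi, points):
    """Mean squared deviation of |grad phi| from 1 over the sample points."""
    return sum((grad_norm(phi, x, y) - 1.0) ** 2 for x, y in points) / len(points)

RADIUS = 1.0  # hypothetical yield radius

def signed_distance(x, y):
    return math.hypot(x, y) - RADIUS    # true signed distance to a circle

def squared_level_set(x, y):
    return x * x + y * y - RADIUS ** 2  # same zero contour, but not a distance

# Sample points on three circles of radius 0.5, 1.0, and 1.5.
points = [(0.5 * k * math.cos(a), 0.5 * k * math.sin(a))
          for k in (1, 2, 3) for a in (0.0, 1.0, 2.0, 4.0)]

loss_sdf = eikonal_loss(signed_distance, points)   # close to zero
loss_sq = eikonal_loss(squared_level_set, points)  # clearly positive
print(loss_sdf, loss_sq)
```

Both functions share the same zero level set, so a loss on function values at the yield surface alone would not distinguish them; the gradient term does.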
5.3.2 Convexity of the yield function
To ensure non-negative plastic dissipation, we enforce the convexity of the yield function through
the inequality constraint of Equation (46). The additional term in the loss function only activates
when the thermodynamic inequality is violated. By increasing the value of the loss function, it penalizes
predictions that violate the convexity conditions during training. During the training phase of the numeri-
cal experiments presented in this paper, the penalty term did not activate. Nevertheless, the penalty term in
the loss function is still employed as a safeguard to prevent the violation of the thermodynamic constraints.
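The behavior of such a safeguard, a term that stays at zero unless the inequality is violated, can be sketched with a hinge penalty. The example below is only a stand-in: it uses a midpoint convexity condition in a two-dimensional stress plane rather than Equation (46), and the two test functions and the sampled pairs are hypothetical.

```python
import math

def convexity_penalty(f, pairs):
    """Sum of hinge terms max(0, f(mid) - mean(f(a), f(b))).

    The result is zero when no sampled pair violates midpoint convexity.
    """
    total = 0.0
    for a, b in pairs:
        mid = tuple(0.5 * (ai + bi) for ai, bi in zip(a, b))
        violation = f(mid) - 0.5 * (f(a) + f(b))
        total += max(0.0, violation)
    return total

def convex_yield(s):
    return math.hypot(s[0], s[1]) - 1.0  # circular surface: convex

def nonconvex_yield(s):
    r, theta = math.hypot(s[0], s[1]), math.atan2(s[1], s[0])
    return r - (1.0 + 0.5 * math.cos(3.0 * theta))  # strongly lobed surface: non-convex

pairs = [((1.2, 0.3), (-0.4, 0.9)),
         ((0.8, -0.8), (0.1, 1.4)),
         ((-1.0, 0.2), (0.6, 0.6))]

penalty_convex = convexity_penalty(convex_yield, pairs)        # stays inactive
penalty_nonconvex = convexity_penalty(nonconvex_yield, pairs)  # activates
print(penalty_convex, penalty_nonconvex)
```

Because the hinge is zero for admissible predictions, adding it to the loss leaves the training unchanged unless a violation occurs, which is consistent with the penalty not activating in the experiments reported here.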
Fig. 13: Training loss (LEFT) and re-initialization condition loss (RIGHT) comparison for four polycrystal
RVEs’ yield functions with and without enforcing the Eikonal equation unit gradient constraint.
Fig. 14: Convexity check for randomly sampled stress points from the polycrystal RVE dataset (axes: sample points, convexity check).
We expect that this safeguard will be helpful in future work where we extend our current framework
to experimental data or to anisotropic materials for which visual inspection of the convexity in the
principal stress space or on the π-plane is no longer feasible. A verification of the convexity is performed and the results
are shown in Fig. 14, where material states were randomly sampled from the polycrystal RVE database to
test whether the inequality (46) is violated.
5.3.3 Non-negative plastic dissipation for non-associative plasticity
In the case where the plastic flow is non-associative, we enforce the additional rule expressed in Eq. (52)
to ensure the non-negativity of the plastic dissipation for isotropic materials. The remaining training
parameters of the network are identical to the ones used for the yield function learning in Section 5.2. To
verify that convexity is preserved, we perform a random check by sampling stress points on the π-plane
and computing the plastic dissipation (see Fig. 16 (a)). In this case, having the safeguard incorporated into
the loss function ensures that plastic dissipation is either zero or positive. Furthermore, there are noticeable
differences between the optimal plastic flow and the stress gradient of the yield function, which indicate
the need to introduce non-associative plastic flow in this prediction task (see Fig. 16 (b)).
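The random check described above amounts to evaluating the sign of the plastic dissipation at sampled states. A minimal sketch follows, assuming the dissipation is the contraction of the stress with the plastic flow direction expressed in principal values; the sample values and the function names are hypothetical, and Eq. (52) itself is not reproduced.

```python
def dissipation(stress, flow_direction, plastic_multiplier=1.0):
    """Plastic dissipation rate: stress contracted with the plastic strain rate."""
    return plastic_multiplier * sum(s * n for s, n in zip(stress, flow_direction))

def violations(samples):
    """Indices of sampled states whose dissipation is negative."""
    return [i for i, (stress, flow) in enumerate(samples)
            if dissipation(stress, flow) < 0.0]

# Hypothetical (principal stress, flow direction) pairs; the last one points inward.
samples = [
    ((100.0, -50.0, -50.0), (0.8165, -0.4082, -0.4082)),
    ((80.0, 20.0, -100.0), (0.6, 0.0, -0.8)),
    ((100.0, -50.0, -50.0), (-0.8165, 0.4082, 0.4082)),
]

bad = violations(samples)
print(bad)
```

An empty list of violations corresponds to the outcome reported in Fig. 16 (a), where every sampled point has zero or positive dissipation.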
Fig. 15: The ML-predicted plastic flow of the polycrystal RVE for increasing accumulated plastic strain ($\bar{\epsilon}_p$
equal to 0.01, 0.03, and 0.08).
Fig. 16: (a) The plastic flow rule is checked for non-negative dissipation (the points correspond to the
different accumulated plastic strain levels of Fig. 15). (b) $L_2$ norm comparison for the predicted plastic flow
direction between the yield function neural network and the plastic flow neural network.
5.4 Application 1: Surrogate model comparisons for polycrystal RVEs
In this example, we compare the performance of the elastoplastic model with Hamilton-Jacobi harden-
ing (introduced in Section 2) with the trained black-box models obtained from the other commonly used
recurrent neural network architectures introduced in Section 4. Our goal is to create a model to predict
the upscaled elasto-plastic responses of a polycrystal RVE undertaking unseen loading paths. The data set
generation for the neural network approximator $\hat{f}$ is described in Appendix B.
To conduct a fair and systematic comparison, we have trained, tested, and compared the recurrent
models using different amounts of data generated from loading paths of varying complexity. It
Fig. 17: Stress path in the π-plane for (a) a loading-unloading pattern and (b) a cyclic loading path. The
yield surface neural network predicts the consecutive yield surfaces for different levels of hardening. (a)
The points A, B, C, D, and E correspond to the strain-stress curve of Fig. 19. (b) The points F, G, H, I, J, K,
and L correspond to the strain-stress curve of Fig. 20.
is noted that the database used for the training of $\hat{f}$ will not be extended further than the 140
monotonic loading cases described in Appendix B.
5.4.1 Predicting cyclic responses from monotonic data
Initially, we train all the recurrent architectures with the 140 cases of monotonic loading paths (with
200 to 400 deformation states per case), sampled radially from the π-plane. The same set of data is also
used to train the approximator $\hat{f}$. All models are expected to perform adequately in blind predictions
for monotonic testing cases as they have been trained for these simple patterns as shown in Fig. 18 (a).
However, the black-box recurrent networks fail to recover even a single unloading and reloading path,
as seen in Fig. 18 (b), whereas our proposed interpretable model is able to recover loading and unloading
patterns quite well even though it was only trained on monotonic loading paths. This success is due to the
existence of a yield surface that enables our proposed model to detect unloading and, hence, trigger the
right elastic unloading responses, even in the absence of unloading data. While the lack of unloading data
may still affect the accuracy of the ML-generated hardening mechanism, the prediction of the proposed
ML plasticity model is still more robust than the black-box counterpart.
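The role of the yield surface in detecting unloading can be illustrated with a one-dimensional return mapping. The sketch below is a simplified stand-in for the algorithm of Section 3: it assumes linear elasticity and linear isotropic hardening with hypothetical parameter values, whereas the framework in this work evaluates neural networks for the energy and the yield function.

```python
def integrate_path(strains, young=200.0, yield0=1.0, hardening=20.0):
    """1D rate-independent plasticity with linear isotropic hardening."""
    eps_p, alpha, stresses = 0.0, 0.0, []
    for eps in strains:
        trial = young * (eps - eps_p)                        # elastic predictor
        f_trial = abs(trial) - (yield0 + hardening * alpha)  # yield function at trial state
        if f_trial > 0.0:                                    # plastic corrector
            dgamma = f_trial / (young + hardening)
            sign = 1.0 if trial > 0.0 else -1.0
            eps_p += dgamma * sign
            alpha += dgamma
            trial -= young * dgamma * sign
        stresses.append(trial)                               # f_trial <= 0: purely elastic step
    return stresses

# Ten monotonic loading steps up to a strain of 0.02, then three unloading steps.
path = [0.002 * k for k in range(1, 11)] + [0.02 - 0.001 * k for k in range(1, 4)]
stresses = integrate_path(path)
print(stresses[9], stresses[12])
```

Once the strain reverses, the trial state falls inside the yield surface and the stress changes with the elastic modulus alone, so no unloading data are needed to obtain the elastic unloading branch.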
5.4.2 Predicting cyclic responses from cyclic data
In the second numerical experiment, we increase the complexity of the database that the recurrent neural
networks are trained on. Following the same loading path angles as a basis on the π-plane, we generate
cases that now include complex unloading and reloading paths. We thus allow the recurrent architectures
to be exposed to previously missing elastic unloading paths. For every loading direction, we randomly
assign from 1 to 3 unloading and reloading paths, with the unloading target strain also randomly chosen
each time. Using this method,
we double the number of sampling points of the initial cases by adding random unloading and reloading
Fig. 18: Comparison of black-box neural network architectures trained on monotonic data with our
Hamilton-Jacobi hardening elastoplastic framework (introduced in Section 2). The black-box models can
capture the monotonic loading path (a) but cannot capture any unloading paths (b). Our framework can
capture both even though it has only seen monotonic data. The stress measure is in kPa.
patterns to retrain the recurrent architectures. The performances of all these models are again compared.
The results for three testing cases with cyclic loadings are shown in Fig. 19.
In the third comparison experiment, we examine the accuracy and robustness of the predictions of con-
stitutive responses for the polycrystal specimen subjected to cyclic loading and unloading paths that span
both tensile and compressive directions. The results for the cyclic testing can be seen in Fig. 20. As expected,
the black-box models, even with an extended training data set, fail to capture the cyclic behaviors. The
proposed ML elasto-plasticity model, even when only trained with monotonic data, exhibits very good
accuracy and robustness of predictions on the cyclic behaviors for the polycrystal plasticity.
5.5 Application 2: Finite element simulations with machine learning-derived polycrystal plasticity models
The return mapping algorithm of our elastoplastic neural network framework, described in Section 3,
is implemented in a series of benchmark finite element simulations. The goal of these computational exam-
ples is to demonstrate the framework’s ability to be integrated into multi-scale simulations. By predicting
the homogenized elastoplastic response of the microstructure, we can enable offline hierarchical multi-scale
predictions that are much faster than an FE$^2$ approach without compromising the accuracy and robustness.
We perform the finite element quasi-static simulation of macroscopic monotonic uniaxial displacement
of a bar depicted in Fig. 21. The domain is symmetric along the horizontal and vertical axes and the elasticity
and plasticity models used are isotropic; thus, we model one quarter of the domain to predict
the symmetric behavior. The domain is meshed with 3800 triangular elements with an average side length
of $6.75 \times 10^{-4}$ meters. The displacement $u$ is applied at the boundaries as shown in Fig. 21 in increments of
$\Delta u = 5 \times 10^{-5}$ meters. The microscopic elastoplastic behavior of every material point in the mesh is predicted
by an elastic energy functional neural network and a yield function signed distance function neural
network, integrated by the return mapping algorithm of Section 3.
Fig. 19: Comparison of black-box neural network architectures trained on random loading-unloading data
with our Hamilton-Jacobi hardening elastoplastic framework. Three different cases of loading-unloading
are demonstrated (a, b, and c). The black-box models can capture loading-unloading behaviors better than
the ones trained on monotonic data (Fig. 18) but may still show difficulty capturing some unseen unloading
paths. Our framework appears to be more robust in loading-unloading path predictions, even though it
is only trained on monotonic data. The stress measure is in kPa.
As the first numerical verification exercise, we combine a quadratic energy functional of linear elasticity
and a J2 plasticity yield function with isotropic hardening. The neural networks and their training for the
elastic response have been described in Section 5.1 and, for the plastic response, in Section 5.2 and Appendix D. The
goal displacement for the uniaxial loading simulation is $u_{goal} = 5.5 \times 10^{-3}$ meters. The results at the goal
displacement for the benchmark solution and our elastoplastic Hamilton-Jacobi hardening framework are
shown in Fig. 22 and appear to be in close agreement.
In the second numerical experiment, we simulate the behavior under uniaxial loading of the domain in
which the material points represent a polycrystalline microstructure. The elastoplastic framework in this
simulation consists of a quadratic hyperelastic energy functional neural network and a polycrystal yield
function, the training procedures of which are described in Sections 5.1 and 5.2, respectively. Both networks
were trained on FFT simulation data as described in Appendix B to predict the homogenized elastoplastic
behavior of the polycrystal.
Previously, the multiscale polycrystal constitutive behavior has been captured through a coupling of
the FFT and FEM methods (e.g. Kochmann et al. (2016, 2018)). However, the efficiency of these methods
Fig. 20: Comparison of black-box neural network architectures trained on random loading-unloading data
with our Hamilton-Jacobi hardening elastoplastic framework for a cyclic loading path. The stress measure
is in kPa.
Fig. 21: Macroscopic structure and boundary conditions used in the finite element simulations (domain dimensions
labeled 10.00 and 6.00; displacement $u$ applied at the boundaries). The domain
is symmetric along the horizontal and vertical axes so only one quarter of the domain is modeled. The units
are in mm.
Fig. 22: Von Mises stress (TOP) and accumulated plastic strain (BOTTOM) for the benchmark J2 plasticity
(LEFT) simulation and neural network J2 yield function (RIGHT) FEM simulations. The stress measure is
in kPa.
depends on the heterogeneity of the polycrystal as it can affect the computational cost of the simulations
Fig. 23: Von Mises stress (TOP) and accumulated plastic strain (BOTTOM) for the neural network polycrys-
tal yield function FEM simulations. The stress measure is in kPa.
Fig. 24: Von Mises stress curves (LEFT) and stress paths on the π-plane (RIGHT) for Points A and B of the
domain of Fig. 23. The stress measure is in kPa.
for a large number of crystals and the stability when there are sharp material property differences. In the
current work, there is no need for online FFT simulations to be run in parallel with the FEM simulations.
The neural network training database for the elasticity and the plasticity are built separately offline for
a discrete number of FFT simulations and the trained networks will be interpolating the behaviors and
making blind predictions during the FEM simulation. The results for the Von Mises
stress and the accumulated plastic strain at the displacement goal of $u_{goal} = 6.5 \times 10^{-3}$
meters are shown in Fig. 23. The stress curves and stress paths on the π-plane for two points of the
domain are also shown in Fig. 24.
6 Conclusions
The history of plasticity theory is influenced by the geometrical interpretations of mechanics concepts
in different parametric spaces (De Saint Venant, 1870; Lode, 1926; Hill, 1998; Rice, 1971). Forming a vector
space that uses different invariants or measures of stress as orthogonal bases has helped us understand
yielding and the subsequent hardening and softening through easier visualization. However, these new
mechanisms often take decades to be discovered and adopted by the mechanics community. In this work,
our contributions are two-fold. First, we leverage the geometrical interpretation of plasticity theory to es-
tablish a connection between the elastoplasticity and the level set theories. Second, we introduce a new
variety of deep machine learning that is designed to train functionals with sufficient smoothness. By us-
ing higher-order training to regularize the continuity and smoothness of the energy functional, the yield
function, the flow rules and the hardening mechanisms, we create a framework that retains the simplicity
afforded by the geometrical interpretation of the models without limiting our choices of elasticity, yield
function and hardening mechanisms. Thermodynamic constraints can be easily checked and introduced,
as the machine learning generated models are now geometrically interpretable. Finally, the most signifi-
cant part of this research is that it provides a generalized framework where the yield function may form
in any arbitrary shape and evolve in any generic way that optimizes the quality of the predictions. As
shown in the paper, the level set framework may manifest many classical plasticity models when given
the corresponding data and it may also introduce new yield surfaces and hardening laws that are difficult
to hand-craft. Compared with the predictions obtained via the black-box neural networks, the proposed
machine learning framework may yield more accurate, robust, and interpretable predictions.
A Appendix: Data generation for the hyperelasticity benchmark
In this work, the numerical experiments (Section 5) are performed on synthetic data sets generated
for two small strain hyperelastic laws. One of them is isotropic linear elasticity. The second is a small-
strain hyperelastic law designed for the Modified Cam-Clay plasticity model (Roscoe and Burland, 1968;
Houlsby, 1985; Borja et al., 2001). The hyperelastic energy functional allows full coupling between the
elastic volumetric and deviatoric responses and is described as:
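$$\psi(\epsilon^e_v, \epsilon^e_s) = -p_0 \xi \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right) - \frac{3}{2} c_\mu p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right) (\epsilon^e_s)^2, \tag{59}$$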
where $\epsilon_{v0}$ is the initial volumetric strain, $p_0$ is the initial mean pressure when $\epsilon_v = \epsilon_{v0}$, $\xi > 0$ is the elastic
compressibility index, and $c_\mu > 0$ is a constant. The hyperelastic energy functional is designed to describe
an elastic compression law where the equivalent elastic bulk modulus and the equivalent shear modulus
vary linearly with $p$, while the mean pressure $p$ varies exponentially with the change of the volumetric
strain $\Delta \epsilon_v = \epsilon_{v0} - \epsilon_v$. The specifics and the utility of this hyperelastic law are outside the scope of the current
work and will be omitted. The numerical parameters of this model were chosen as $\epsilon_{v0} = 0$, $p_0 = 100$
kPa, $c_\mu = 5.4$, and $\xi = 0.018$. Taking the partial derivatives of the energy functional with respect to the
strain invariants, the stress invariants are derived as:
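$$p = \frac{\partial \psi}{\partial \epsilon^e_v} = p_0 \left(1 + \frac{3 c_\mu}{2 \xi} (\epsilon^e_s)^2\right) \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right), \tag{60}$$
$$q = \frac{\partial \psi}{\partial \epsilon^e_s} = -3 c_\mu p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right) \epsilon^e_s. \tag{61}$$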
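The components of the symmetric stiffness Hessian matrix $D^e$ are derived by taking the second-order
partial derivatives of the energy functional with respect to the two strain invariants:
$$\begin{aligned}
D^e_{11} &= \frac{\partial^2 \psi}{\partial (\epsilon^e_v)^2} = -\frac{p_0}{\xi} \left(1 + \frac{3 c_\mu}{2 \xi} (\epsilon^e_s)^2\right) \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right), \\
D^e_{22} &= \frac{\partial^2 \psi}{\partial (\epsilon^e_s)^2} = -3 c_\mu p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right), \\
D^e_{12} &= D^e_{21} = \frac{\partial^2 \psi}{\partial \epsilon^e_v \, \partial \epsilon^e_s} = \frac{3 p_0 c_\mu \epsilon^e_s}{\xi} \exp\left(\frac{\epsilon_{v0} - \epsilon^e_v}{\xi}\right).
\end{aligned} \tag{62}$$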
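The relation between the energy functional and the stress invariants can be checked numerically. The sketch below assumes the energy takes the form $\psi = -p_0 \xi e^{\omega} - \frac{3}{2} c_\mu p_0 e^{\omega} (\epsilon^e_s)^2$ with $\omega = (\epsilon_{v0} - \epsilon^e_v)/\xi$; the signs of both terms are an assumption, and the function names and the evaluation point are hypothetical. The parameter values are the ones listed in this appendix.

```python
import math

# Parameter values as listed in the text; the sign convention of P0 is taken as given there.
P0, C_MU, XI, EPS_V0 = 100.0, 5.4, 0.018, 0.0

def energy(ev, es):
    """Assumed form of the hyperelastic energy functional."""
    w = math.exp((EPS_V0 - ev) / XI)
    return -P0 * XI * w - 1.5 * C_MU * P0 * w * es ** 2

def pressure(ev, es):
    """Closed-form derivative of the assumed energy with respect to ev."""
    w = math.exp((EPS_V0 - ev) / XI)
    return P0 * (1.0 + 1.5 * C_MU / XI * es ** 2) * w

def deviatoric_stress(ev, es):
    """Closed-form derivative of the assumed energy with respect to es."""
    return -3.0 * C_MU * P0 * math.exp((EPS_V0 - ev) / XI) * es

def central_difference(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2.0 * step)

ev, es = 0.002, 0.001  # hypothetical elastic strain invariants
p_fd = central_difference(lambda v: energy(v, es), ev)
q_fd = central_difference(lambda s: energy(ev, s), es)
print(p_fd, pressure(ev, es), q_fd, deviatoric_stress(ev, es))
```

The same finite-difference comparison applied to `pressure` and `deviatoric_stress` would check the stiffness components, which is the kind of consistency that a Sobolev training objective relies on.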
B Appendix: Data generation for polycrystal yield function
Here we provide a brief account of the direct numerical simulations that generate the data set for
the ML-generated plasticity model in Section 5. The numerical specimen is a polycrystal assembly consisting
of 49 face-centered cubic crystal grains. The crystal orientations are randomly generated using the
open source software MTEX (Bachmann et al., 2010). The crystal orientation distribution is shown
in Fig. 25 along with the crystal volume distribution. Direct numerical simulations are performed on
this numerical specimen by solving the Lippmann-Schwinger equation using the FFT spectral method with
periodic boundary conditions (Ma and Sun, 2019, 2020). The resultant stress field and plastic deformation
are homogenized and these homogenized responses constitute the material database used for training of
the machine learning plasticity model.
The elasticity model of the polycrystals is linear elasticity with a Young’s modulus of $E = 2.0799$ MPa
and a Poisson’s ratio of $\nu = 0.3$. The material’s plastic behavior was calculated using the ultimate algorithm
for crystal plasticity (Borja and Wren, 1993). The model has 12 linearly independent slip systems with a
yield stress of 100 kPa and a hardening modulus of 100 kPa. An FFT elastoplastic simulation is performed
radially for each of 140 different Lode’s angles spanning the π-plane.
Fig. 25: (a) Crystal orientations and (b) crystal volume distribution of the polycrystal RVE used for the
elastoplasticity database generation.
Each data point generated by the FFT simulations is stored in a cylindrical coordinate system with
positions specified by a radius $\rho$, an angle $\theta$, and an accumulated plastic strain $\bar{\epsilon}_p$. For every generated
sample point $(\rho_o, \theta_o)$ on the π-plane, we construct 14 signed distance function training points,
distributed uniformly in the radial direction within a distance range of $\pm \rho_o$ from
point $(\rho_o, \theta_o)$. The size of the constructed signed distance function corresponds to parameters $L_{levels} = 15$
and $\zeta = 2$ in Algorithm 2.
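The construction of the radial training points can be sketched as follows. This is an illustration under stated assumptions, not Algorithm 2: the signed distance is approximated by the radial offset from the yield point (exact only for a circular surface), the sign is taken negative inside the surface, and the function name and the sample values are hypothetical.

```python
def radial_level_set_points(rho_o, theta_o, eps_p, n_points=14):
    """Spread n_points uniformly on [0, 2 * rho_o] along the ray theta_o.

    Each point is labeled with its radial offset from the yield point
    (rho_o, theta_o), used here as a stand-in for the signed distance.
    """
    spacing = 2.0 * rho_o / (n_points - 1)
    samples = []
    for k in range(n_points):
        rho = k * spacing
        phi = rho - rho_o  # negative inside the surface, positive outside
        samples.append((rho, theta_o, eps_p, phi))
    return samples

samples = radial_level_set_points(rho_o=120.0, theta_o=0.5, eps_p=0.01)
print(len(samples), samples[0], samples[-1])
```

Repeating this for every Lode’s angle and every stored level of accumulated plastic strain gives a set of labeled points in the three input coordinates of the yield function network.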
After generating the points of the signed distance function, every point has a corresponding output
value equal to the signed distance function $\phi(\rho_o, \theta_o, \bar{\epsilon}_{p,o})$ for that point. In this way, all the signed distance
function points on an isocontour will have the same output value. This proved to be an obstacle in the
back-propagation during the neural network training, since many input combinations correspond to the same
output value. To increase the variation of the output values of each sample during training, we introduce
a helper transformation function $\zeta(\rho, \theta)$ of the output values in the data pre-processing step. Thus, during
training, every signed distance function input sample point $(\rho_o, \theta_o, \bar{\epsilon}_{p,o})$ is mapped to an output value:
$$\phi(\rho_o, \theta_o, \bar{\epsilon}_{p,o}) + \zeta(\rho_o, \theta_o). \tag{63}$$
During the prediction step, the true value of the signed distance function can be recovered by subtracting
the known value of $\zeta(\rho_o, \theta_o)$ from the prediction output. The helper function in this work was chosen as
$\bar{\rho} \cos(\theta/3)$, where $\bar{\rho}$ is the mean value of the radii in the yield function data set.
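The shift-and-recover step can be sketched in a few lines. The helper below follows the stated choice of the mean radius times $\cos(\theta/3)$; the function names, the mean radius of 100, and the sampled angles are hypothetical.

```python
import math

def helper(rho_mean, theta):
    """Helper transformation: mean radius times cos(theta / 3)."""
    return rho_mean * math.cos(theta / 3.0)

def encode(phi, rho_mean, theta):
    return phi + helper(rho_mean, theta)         # training target

def decode(prediction, rho_mean, theta):
    return prediction - helper(rho_mean, theta)  # recovered signed distance

# Four points on the same isocontour (phi = 0) at different Lode's angles.
thetas = [0.0, 1.0, 2.0, 3.0]
targets = [encode(0.0, 100.0, t) for t in thetas]
recovered = [decode(y, 100.0, t) for y, t in zip(targets, thetas)]
print(targets, recovered)
```

Points that shared a single output value before the shift now carry distinct targets, while the subtraction at prediction time restores the original signed distance.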
C Appendix: Verification exercise with custom hardening
The plasticity components of the neural network elastoplasticity framework can further be decom-
posed by separating the initial yield surface and its evolution – the hardening law. We are introducing a
method to apply custom hardening laws to the neural network approximated yield functions. The initial
yield surface is controlled by a neural network of the form $\tilde{f}(\rho, \theta)$ with only the Lode’s coordinates as inputs.
The hardening is handled by a separate hardening law. In plasticity literature, hardening is usually
implemented by transforming the yield surface, i.e. changing the yield stress value. However, in the case of
our neural network yield function approximation, the yield stress is not explicitly defined and cannot be
immediately modified. To overcome this obstacle, we apply the desired hardening laws to the neural network
input instead of the assumed yield stress. Specifically, we define a hardening law as a transformation
$L$ of the original Lode’s coordinates $\rho$ and $\theta$, such that:
$$L(\rho, \theta, \xi) = \left(L_\rho(\rho, \theta, \xi), L_\theta(\rho, \theta, \xi)\right) = (\rho_L, \theta_L), \tag{64}$$
where $L_\rho(\rho, \theta, \xi)$ and $L_\theta(\rho, \theta, \xi)$ are the parametric equations that transform $\rho$ and $\theta$ into the input variables
$\rho_L$ and $\theta_L$, respectively, after hardening, and $\xi$ is an internal hardening variable.
Common literature hardening laws can be translated into input transformations of this type and applied
to the neural network yield functions through geometric interpretation. For example, in the simple
case of isotropic hardening of the Von Mises plasticity model, hardening in the π-plane can be interpreted
as the dilation of the circular yield surface, i.e. an increase of the radius $\rho_y$ when there is yielding. In the case
of a neural network approximating the Von Mises yield function, the value of the current $\rho_y$ would not be
readily available to modify. For that reason, instead of increasing the yield radius $\rho_y$, we opt for decreasing
the input radius $\rho$ of the neural network by an equivalent amount. The transformed radius $\bar{\rho}$ is defined as:
ep, (65)
where $H$ is the material’s identified hardening modulus. Any custom hardening model can be applied with
the right conversion to an input transformation. This allows even more flexibility when assembling
the theoretical components of the elastoplastic framework. The hyperelastic energy functional, the
initial yield surface, and the hardening law are independent of each other and can be replaced separately.
Furthermore, being able to assign a hardening law as a separate process in the data-driven yield function
could prove valuable when only information of the initial yield surface is available in the data.
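The idea of hardening through an input transformation can be sketched for the Von Mises case. The example below is a minimal illustration under stated assumptions: an analytical circle stands in for the trained initial yield surface network, and the radius shift is assumed to be $\sqrt{2/3}\, H \bar{\epsilon}_p$, which may differ from the exact form of Eq. (65). All names and values are hypothetical.

```python
import math

# Radius of a Von Mises circle on the pi-plane for a 100 kPa yield stress.
RHO_Y0 = math.sqrt(2.0 / 3.0) * 100.0

def initial_yield_function(rho, theta):
    """Stand-in for the initial-surface network: a circle, independent of theta."""
    return rho - RHO_Y0

def hardened_yield_function(rho, theta, eps_p, hardening_modulus):
    """Emulate isotropic hardening by shrinking the input radius."""
    rho_shifted = rho - math.sqrt(2.0 / 3.0) * hardening_modulus * eps_p  # assumed shift
    return initial_yield_function(rho_shifted, theta)

H = 200.0  # hypothetical hardening modulus in kPa
rho_on_hardened_surface = RHO_Y0 + math.sqrt(2.0 / 3.0) * H * 0.05
value_on_surface = hardened_yield_function(rho_on_hardened_surface, 0.3, 0.05, H)
value_at_initial_radius = hardened_yield_function(RHO_Y0, 0.3, 0.05, H)
print(value_on_surface, value_at_initial_radius)
```

A stress state on the initial surface evaluates as elastic after hardening, and the zero level set moves outward by the shift, without the initial yield function being modified.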
A few different cases of custom hardening transformations are demonstrated in Fig. 26. The initial
yield surfaces are predicted from a neural network approximator, with all the approximated points having an
accumulated plastic strain $\bar{\epsilon}_p = 0$. Fig. 26 (a) and (c) showcase simple isotropic hardening cases emulated
by reducing the neural network input radius $\rho$ uniformly for all the Lode’s angles $\theta$ on the π-plane. The
hardening mechanism can be geometrically interpreted as a dilation of the initial yield surface. Fig. 26 (b)
and (d) showcase two modes of hardening acting simultaneously: a dilation and an elongation towards a
preferred direction of the initial yield surface.
In the current formulation, the neural network elastoplastic framework can consist of any isotropic
hyperelastic energy functional and isotropic yield function. To demonstrate the framework’s capability to capture
non-linear behaviors, we have implemented a fictitious highly non-linear energy functional and a fictitious non-linear
Fig. 26: Custom hardening transformations of initial neural network yield surfaces. Transformations (a)
and (c) emulate simple isotropic hardening (dilation of yield surface). Transformations (b) and (d) emulate
a mixed mode hardening mechanism (dilation and change of shape). The transformations are implemented
by modifying the neural network input radius $\rho$.
custom hardening law. The energy functional neural network is trained on a data set based on a modification
of the linear elastic energy functional with the shear part replaced with a highly non-linear term:
s. (66)
The non-linear hardening law is implemented by applying a transformation on the Lode’s radius input
of the Von Mises yield function neural network. The hardening law $\breve{L}$ provides a transformed radius:
p)6. (67)
The prediction of the framework is demonstrated in Fig. 27. The framework provides great flexibility
to decompose the material behavior into the elasticity, yield surface, and hardening law, all of which can
be individually replaced. This also allows for a combination of data-driven and handcrafted laws that can
be tuned to closely replicate observed material behaviors.
D Appendix: Verification exercise on learning classical J2 plasticity with isotropic hardening
As a part of the verification exercise, we also test whether the proposed framework is able to deduce an
elasto-plasticity model with linear elasticity and Von Mises plasticity with isotropic hardening. The elasto-
Fig. 27: Monotonic loading (LEFT), loading-unloading (MIDDLE), and cyclic loading (RIGHT) path predictions
for a fictitious non-linear energy functional and hardening elastoplasticity neural network framework.
The initial yield surface is predicted by the Von Mises yield surface neural network. The ANN
elastoplastic framework can handle highly non-linear hyperelastic energy functionals and custom hardening
laws. The stress measure is in kPa.
plastic ANN framework consists of a neural network approximating the linear elastic energy functional
and a yield function neural network that approximates a Von Mises yield surface. The hardening law of
the system is implemented in two different ways to demonstrate the flexibility of the framework: similar
to Section 5 or following the custom hardening method of Appendix C (Eq. (65)). The material has a Young’s
modulus of $E = 2.0799$ MPa, a Poisson’s ratio of $\nu = 0.3$, an initial yield stress of 100 kPa, and a hardening
modulus of $H = 0.1E$.
To demonstrate the repeatability of the training process, we perform the training of the neural networks
that represent a linear elasticity energy functional and a J2 yield function with 5 different random seeds.
The training loss curves for these experiments are shown in Fig. 28. The neural network
initialization appears to have minimal effect on the training results.
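The idea of such a seed study can be illustrated with a much smaller stand-in problem. The sketch below does not reproduce the networks or the optimizer used in this work: it replaces the neural network by a random-feature model (tanh features with seed-dependent random weights, fitted by linear least squares) and uses a one-dimensional H1 Sobolev objective that matches both the energy and its derivative, the stress. The function name `fit_energy`, the number of features, the sampling range, and the normalization are all assumptions made for illustration.

```python
import numpy as np

E = 2079.9  # Young's modulus in kPa, from the text


def fit_energy(seed, n_features=20, n_samples=200, eps_max=0.1):
    """Fit psi(eps) = 0.5*E*eps^2 with random tanh features under an H1 loss.

    Returns the mean squared (normalized) residual of energy and stress.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_features)   # seed-dependent random weights
    b = rng.normal(size=n_features)   # seed-dependent random biases

    eps = np.linspace(-eps_max, eps_max, n_samples)
    x = eps / eps_max                 # normalized input in [-1, 1]
    psi = 0.5 * E * eps**2            # target energy
    sigma = E * eps                   # target stress d(psi)/d(eps)

    phi = np.tanh(np.outer(x, w) + b)         # feature values
    dphi = (1.0 - phi**2) * w / eps_max       # feature derivatives w.r.t. eps

    # Normalize both targets so the two loss terms have comparable weight.
    s0, s1 = np.max(np.abs(psi)), np.max(np.abs(sigma))
    A = np.vstack([phi / s0, dphi / s1])
    y = np.concatenate([psi / s0, sigma / s1])

    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coeffs - y) ** 2))


# Repeat the fit with 5 different seeds and compare the final losses.
losses = [fit_energy(seed) for seed in range(5)]
```

If the final losses are all small and of similar magnitude across seeds, the random initialization has little influence on the result, which is the same qualitative check that Fig. 28 reports for the actual networks.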
The comparison of the neural network elastoplastic framework with three benchmark simulations is
shown in Fig. 29. The framework is tested against a monotonic loading path, a loading path with
multiple unloading patterns, and a cyclic loading path. The framework can adequately capture loading and
unloading patterns on which it has not been explicitly trained.
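The three types of test path can be sketched as piecewise-linear strain histories. The example below is a one-dimensional illustration only: the turning points and step counts are hypothetical, and the response is computed with the textbook 1D return-mapping algorithm for linear isotropic hardening rather than with the 3D benchmark or the neural network framework. The names `piecewise_linear_path` and `uniaxial_response` are introduced here for illustration.

```python
import numpy as np

# Constants from the text (stress in kPa); used here in a 1D surrogate model.
E, SIGMA_Y0 = 2079.9, 100.0
H = 0.1 * E


def piecewise_linear_path(turning_points, steps_per_segment=50):
    """Strain history ramping linearly between consecutive turning points."""
    segments = [
        np.linspace(a, b, steps_per_segment, endpoint=False)
        for a, b in zip(turning_points[:-1], turning_points[1:])
    ]
    return np.concatenate(segments + [np.array([turning_points[-1]])])


def uniaxial_response(strain):
    """1D elastoplastic stress history with linear isotropic hardening."""
    eps_p, alpha, stress = 0.0, 0.0, []
    for eps in strain:
        sig = E * (eps - eps_p)                    # elastic trial stress
        f = abs(sig) - (SIGMA_Y0 + H * alpha)      # trial yield function
        if f > 0.0:
            sgn = np.sign(sig)
            dgamma = f / (E + H)                   # plastic multiplier
            eps_p += dgamma * sgn
            alpha += dgamma
            sig -= E * dgamma * sgn                # return to the yield surface
        stress.append(sig)
    return np.array(stress)


# Hypothetical turning points for the three path types.
monotonic = piecewise_linear_path([0.0, 0.1])
load_unload = piecewise_linear_path([0.0, 0.06, 0.04, 0.08, 0.06, 0.1])
cyclic = piecewise_linear_path([0.0, 0.1, -0.1, 0.1])

stress_monotonic = uniaxial_response(monotonic)
stress_load_unload = uniaxial_response(load_unload)
stress_cyclic = uniaxial_response(cyclic)
```

In this surrogate, the partial unloading segments are purely elastic (slope E), and under cyclic loading the peak stress magnitude grows from one reversal to the next because isotropic hardening expands the elastic range.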
5 Acknowledgments
The authors would like to thank Dr. Ran Ma for providing the implementation of the polycrystal
microstructure generation, the FFT solver, and the information for Figure 25. The authors are supported
by the NSF CAREER grant from the Mechanics of Materials and Structures program at the National Science
Foundation under grant contracts CMMI-1846875 and OAC-1940203, and by the Dynamic Materials and
Interactions Program from the Air Force Office of Scientific Research under grant contracts FA9550-17-1-0169 and
FA9550-19-1-0318. This support is gratefully acknowledged. The views and conclusions contained in
this document are those of the authors, and should not be interpreted as representing the official policies,
either expressed or implied, of the sponsors, including the Army Research Laboratory or the U.S.
Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation herein.
6 Data availability
The data that support the findings of this study are available from the corresponding author upon request.
Fig. 28: Training loss comparison for (a) the energy, (b) the stress, and (c) the stiffness of an H2 training objective
of a dmmdmd architecture for linear elasticity, and (d) the yield function of a J2 plasticity neural network,
with 5 different random seeds.
Fig. 29: Comparison of the neural network elastoplastic framework (linear elasticity and Von Mises plasticity) with benchmark simulation data for monotonic (LEFT), loading-unloading (MIDDLE), and cyclic loading (RIGHT) paths. TOP: The yield function NN replaces the yield function and the hardening law. BOTTOM: The yield function NN predicts only the initial yield surface and a custom identified hardening law is applied. The stress measure is in kPa.