
Computer Methods in Applied Mechanics and Engineering manuscript No.

(will be inserted by the editor)

Sobolev training of thermodynamic-informed neural networks for

interpretable elasto-plasticity models with level set hardening

Nikolaos N. Vlassis · WaiChing Sun

Received: December 22, 2020 / Accepted: date

Abstract We introduce a deep learning framework designed to train smoothed elastoplasticity models with interpretable components, such as a stored elastic energy function, a yield surface, and a plastic flow that may evolve based on a set of deep neural network predictions. By recasting the yield function as an evolving level set, we introduce a deep learning approach to deduce the solutions of the Hamilton-Jacobi equation that governs the hardening/softening mechanism. This machine learning hardening law may recover any classical hand-crafted hardening rules and discover new mechanisms that are either unbeknownst or difficult to express with mathematical expressions. By leveraging Sobolev training to gain control over the derivatives of the learned functions, the resultant machine learning elastoplasticity models are thermodynamically consistent and interpretable while exhibiting excellent learning capacity. Using a 3D FFT solver to create a polycrystal database, numerical experiments are conducted and the implementations of each component of the models are individually verified. Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as the recurrent neural network, the 1D convolutional neural network, and the multi-step feed-forward models.

Keywords Sobolev training · multiscale · polycrystals · isotropic yield function · recurrent neural network · physics-informed constraints

1 Introduction

Plastic deformation of materials is a history-dependent process manifested by irreversible and permanent changes of microstructures, such as dislocation, pore collapse, growth of defects, and phase transition. Macroscopic constitutive models designed to capture history-dependent constitutive responses can be categorized into multiple families, such as hypoplasticity, elastoplasticity, and generalized plasticity (Pastor et al., 1990; Zienkiewicz et al., 1999; Wang et al., 2019a, 2020). For example, hypoplasticity models often do not distinguish the reversible and irreversible strain (Dafalias, 1986; Kolymbas, 1991; Wang et al., 2016). Unlike the classical elastoplasticity models, where the plastic flow is normal to the stress gradient of the plastic potential and its evolution is governed by a set of hardening rules (Rice, 1971; Hill, 1998; Sun, 2013; Bryant and Sun, 2019), hypoplasticity models do not employ a yield function to characterize the initial yielding. Instead, the relationship between the strain rate and the stress rate is captured by a set of evolution laws originating from a combination of phenomenological observations and physics constraints. Interestingly, the early designs of neural network models, such as Ghaboussi et al. (1991); Furukawa and Yagawa (1998); Pernot and Lamarque (1999); Lefik et al. (2009), would often adopt this approach with a purely supervised learning strategy to adjust the weights of neurons to minimize the errors. Using the

Corresponding author: WaiChing Sun
Associate Professor, Department of Civil Engineering and Engineering Mechanics, Columbia University, 614 SW Mudd, Mail Code: 4709, New York, NY 10027. Tel.: 212-854-3143, Fax: 212-854-6267, E-mail: wsun@columbia.edu


strain from current and previous time steps to estimate the current stress, these models would essentially predict the stress rate without utilizing a yield function and, hence, can be viewed as hypoplasticity models with machine-learning-derived evolution laws. The major issue that limits the adoption of neural network constitutive models is the lack of interpretability and the vulnerability to over-fitting. While there are existing regularization techniques, such as dropout layers (Wang and Sun, 2018), cross-validation (Heider et al., 2020; Vlassis et al., 2020), and/or increasing the size of the database, that could be helpful, it remains difficult to assess the credibility of such models without the interpretability of the underlying laws deduced from the neural network. Another approach could involve symbolic regression through reinforcement learning (Wang et al., 2019a) or genetic algorithms (Versino et al., 2017) that may lead to explicitly written evolution laws; however, the fitness of these equations often comes at the expense of readability.

Another common approach to predict plastic deformation is the classical elasto-plasticity model, where an elasticity model is coupled with a yield function that evolves with a set of internal variables that represent the history of the material. Within the framework of the classical elasto-plasticity model, where constitutive models are driven by the evolution of the yield surface and the underlying elastic model, there has been a significant number of works dedicated to refining the initial shapes and forms of the yield functions in the stress space (e.g. Mises (1913); Prager (1955); William and Warnke (1974)) and the corresponding hardening laws that govern the evolution of these yield functions with the plastic strain (e.g. Drucker (1950); Borja and Amies (1994); Taiebat and Dafalias (2008); Nielsen and Tvergaard (2010); Foster et al. (2005); Sun et al. (2014)). Generalized plasticity (Pastor et al., 1990; Zienkiewicz et al., 1999; Wang et al., 2019a) bypasses the usage of both the stress gradient of the plastic potential and the yield surface to predict the plastic flow direction. Instead, an additional phenomenological relation is deduced from experiments to predict the plastic flow direction as a function of internal variables. In both cases, there are several key advantages brought by the existence of the yield function. For instance, the existence of a yield function facilitates the geometric interpretation of plasticity and, therefore, enables us to connect mechanics concepts, such as the thermodynamic laws, with geometric concepts, such as convexity in the principal stress space (Miehe et al., 2002; Borja, 2013; Vlassis et al., 2020). Furthermore, the existence of a distinctive elastic region in the stress or strain space also allows introducing a multi-step transfer learning strategy. In this case, the machine learning of the elastic responses can be viewed as a pre-training step for the plasticity machine learning, where the predicted elastic responses can be used to determine the underlying split of the elastic and plastic strain upon the initial yielding and, hence, allow one to deduce more accurate hardening and plastic flow rules.

1.1 Why Sobolev training for plasticity

Recently, there have been attempts to rectify the limitations of machine learning models that do not distinguish or partition the elastic and plastic strain. Xu et al. (2020), for instance, introduce a smooth transition function to create a finite transition zone between the elastic and plastic ranges for an incremental constitutive law generated from supervised learning.

Previous works on machine learning plasticity are often black-box models where the strain history is used as input to predict the stress via a deep neural net trained with a purely supervised learning strategy (Lefik and Schrefler, 2003; Zhang et al., 2020; Bessa et al., 2017). Since the rationale behind the predictions is stored in the weights of the neurons, it is difficult to interpret. To circumvent this limited interpretability, previous works such as Mozaffar et al. (2019) and later Zhang and Mohr (2020) introduce machine learning techniques to deduce the yield function and subsequently deduce the optimal linear or distortion hardening mechanism that minimizes the $L^2$ norm of the yield function discrepancy. Wang et al. (2019a), on the other hand, view the possibility of building different plasticity models as a directed multi-graph and introduce a reinforcement learning algorithm to deduce the optimal configuration of plasticity models among all the available options (e.g. isotropic, kinematic, rotational, and frictional hardening, isotropic/anisotropic elasticity) to generate fully interpretable plasticity models. Nevertheless, these previous approaches are not capable of deducing new hardening/softening mechanisms previously unbeknownst to modelers. This research has three goals: (1) creating interpretable machine learning elastoplasticity models, (2) formulating a new learning strategy that enables the discovery of new softening/hardening mechanisms, and (3) leveraging the interpretability of the models to enforce thermodynamic laws to make predictions compatible with


universal principles of mechanics. Sobolev training plays an important role in achieving these goals by ensuring the regularity of the energy functionals and level set functions. By adopting the Lode coordinate system to simplify the parametrization, we introduce a simple training program that not only generates accurate and robust stress predictions but also yields an elastic energy and an elasto-plastic tangent operator that are sufficiently smooth for numerical predictions, overcoming one of the technical barriers that has prevented the adoption of neural network models since their inception in the 1990s (Hashash et al., 2004).

1.2 Organization of the content and notations

The organization of the rest of the paper is as follows. We first provide a detailed account of the different designs of Sobolev higher-order training introduced to generate the elastic stored energy functional, yield function, flow rules, and hardening models in their corresponding parametric spaces. The setup of control experiments with other common alternative black-box models is then described. We then demonstrate how to leverage this new design of an interpretable machine learning framework to analyze the thermodynamic behavior of the machine-learning-derived constitutive laws while illustrating the geometrical interpretation of the proposed modeling framework. A brief highlight of the adopted return mapping algorithm implementation that leverages automatic differentiation is provided in Section 3, followed by the numerical experiments and the conclusions that outline this work's major findings.

As for notations and symbols in this current work, bold-faced letters denote tensors (including vectors, which are rank-one tensors); the symbol '·' denotes a single contraction of adjacent indices of two tensors (e.g. $\boldsymbol{a} \cdot \boldsymbol{b} = a_i b_i$ or $\boldsymbol{c} \cdot \boldsymbol{d} = c_{ij} d_{jk}$); the symbol ':' denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g. $\mathbb{C} : \boldsymbol{\varepsilon}^e = C_{ijkl} \varepsilon^e_{kl}$); the symbol '⊗' denotes a juxtaposition of two vectors (e.g. $(\boldsymbol{a} \otimes \boldsymbol{b})_{ij} = a_i b_j$) or two symmetric second-order tensors (e.g. $(\boldsymbol{a} \otimes \boldsymbol{b})_{ijkl} = a_{ij} b_{kl}$). Moreover, $(\boldsymbol{a} \oplus \boldsymbol{b})_{ijkl} = a_{jl} b_{ik}$ and $(\boldsymbol{a} \ominus \boldsymbol{b})_{ijkl} = a_{il} b_{jk}$. We also define identity tensors $(\boldsymbol{I})_{ij} = \delta_{ij}$, $(\boldsymbol{I}^4)_{ijkl} = \delta_{ik}\delta_{jl}$, and $(\boldsymbol{I}^4_{\mathrm{sym}})_{ijkl} = \frac{1}{2}(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{kj})$, where $\delta_{ij}$ is the Kronecker delta. As for sign conventions, unless specified otherwise, we consider the direction of the tensile stress and dilative pressure as positive. Unless otherwise specified, the strain measures listed in this paper are not expressed in percentages.

2 Framework for Sobolev training of elastoplasticity models

This section presents the framework to train the multiple deep neural networks that predict the elastic stored energy functional, the yield function, and the plastic flow that will constitute the machine learning elastoplasticity model. We first discuss the neural network architecture designed to endow the learned functions with the necessary degree of continuity for functional predictions (Section 2.1). We then introduce a number of loss functions designed to optimize hyperelastic responses of materials with different material symmetries (Section 2.2). Finally, we elaborate on the theoretical basis that enables us to treat the yield function as a signed distance function level set, formulate a supervised learning task that deduces the learned evolving yield function against a monotonically increasing accumulated plastic strain, and discuss the simplified implementation allowed by the material symmetry. The methods that leverage an interpretable design to enforce and validate the thermodynamic constraints, as well as the generation of the auxiliary data points of the signed distance function level set to enhance the robustness of the learned yield functions, are also discussed (Section 2.3).

2.1 Neural network design for Sobolev training of a smooth scalar functional with physical constraints

Here we will provide a brief account of the design of the neural network suitable for Sobolev training of a scalar functional that requires a specific minimum degree of continuity. The design presented here will facilitate the specific supervised learning tasks formulated for an elasticity energy functional, a yield function, and a plastic flow, which are documented in Sections 2.2 and 2.3. Our goal is to obtain a learned function belonging to a Sobolev space, a space of functions possessing sufficient derivatives for our modeling purposes. To facilitate the Sobolev training, we must ensure that (1) the space of the learned function


is spanned by basis functions that possess a sufficient degree of continuity and (2) the loss function in the supervised learning is associated with the norm equipped for the corresponding Sobolev space. To meet the first criterion, one must employ activation functions with sufficient differentiability. In principle, this can be easily achieved by picking activation functions of a sufficient degree of continuity (e.g. the sigmoid and hyperbolic tangent activation functions). However, networks stacked with layers using these activation functions often suffer from the vanishing and exploding gradient problems and, hence, are not suitable for our purpose (Wang et al., 2019b; Roodschild et al., 2020). Meanwhile, the Rectified Linear Unit (ReLU) activation function is usually deployed instead as the default choice for multilayer perceptrons and convolutional neural networks due to its generally good performance and faster learning capacity. To attain the required degree of continuity while circumventing the vanishing gradient problem, we introduce a simple technique which we refer to as Multiply layers. A Multiply layer receives the output vector $\boldsymbol{h}_{n-1}$ of the preceding layer as input and simply outputs $\boldsymbol{h}_n$, such that,

$$\boldsymbol{h}_n = \mathrm{Multiply}(\boldsymbol{h}_{n-1}) = \boldsymbol{h}_{n-1} \circ \boldsymbol{h}_{n-1}, \quad (1)$$

where $\circ$ is the element-wise product of two vectors. These layers are placed in between two hidden dense layers of the network to modify the output of the preceding layer. This technique enables us to create neural networks that produce a learned function of an arbitrary degree of continuity without introducing any additional weights or handcrafting any custom activation functions.
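As a concrete illustration, Eq. (1) can be sketched in a few lines of plain NumPy; the Multiply layer simply squares the preceding layer's output element-wise, so each Multiply layer applied to a ReLU-activated output raises the differentiability of the composition by one order (ReLU is only C⁰ at the origin, while its square is C¹). This is a minimal sketch, not the exact implementation used for the results below.

```python
import numpy as np

def relu(x):
    # standard ReLU activation: continuous but only C^0 at the origin
    return np.maximum(x, 0.0)

def multiply_layer(h):
    # Eq. (1): h_n = h_{n-1} o h_{n-1}, an element-wise square
    # with no trainable weights
    return h * h

h = relu(np.array([-1.0, 0.5, 2.0]))
assert np.allclose(multiply_layer(h), [0.0, 0.25, 4.0])
```

Because the layer carries no weights, inserting it between two Dense layers changes only the smoothness of the learned function, not the parameter count.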

The performance of these different neural networks that complete the higher-order Sobolev training is

demonstrated in the numerical experiments showcased in Section 5.1. Several variations of the standard

two-layer architecture we experimented with are shown in Fig. 1.

Fig. 1: Four neural network architectures (A-D) with different combinations of dense and multiply layers: Architecture A (ddd), Architecture B (dmdd), Architecture C (dmdmd), and Architecture D (dmmdmd); the number of Multiply layers increases from A to D. The letters d and m represent the Dense and Multiply layers, respectively, that form an architecture (e.g. dmdd represents the stacked layer structure consisting of Dense → Multiply → Dense → Dense layers).

Remark 1 The placement and the number of intermediate Multiply layers are hyperparameters that can be fine-tuned along with the rest of the hyperparameters of the neural network (e.g. dropout rate, number of neurons per layer, number of layers). The tuning of these hyperparameters can be performed manually or through automatic hyperparameter tuning algorithms (cf. Bergstra et al. (2015); Komer et al. (2014)).

2.2 Sobolev training of a hyperelastic energy functional

The first component of the elastoplastic framework we train is the elastic stored energy $\psi^e$. The elastic energy functional is not only useful for predicting the stress from the elastic strain, but can also be used to re-interpret the experimental data to identify the accumulated plastic strain and the plastic flow directions that are crucial for the training of the yield function and the plastic flow. In the infinitesimal strain regime,


the hyperelastic energy functional $\psi^e(\boldsymbol{\varepsilon}^e) \in \mathbb{R}_{+}$ can be defined as a non-negative-valued function of the elastic infinitesimal strain of which the first derivative is the Cauchy stress tensor $\boldsymbol{\sigma} \in \mathcal{S}$ and the Hessian is the tangential elasticity tensor $\boldsymbol{c}^e \in \mathcal{M}$:

$$\boldsymbol{\sigma} = \frac{\partial \psi^e(\boldsymbol{\varepsilon}^e)}{\partial \boldsymbol{\varepsilon}^e}, \quad \boldsymbol{c}^e = \frac{\partial \boldsymbol{\sigma}}{\partial \boldsymbol{\varepsilon}^e} = \frac{\partial^2 \psi^e(\boldsymbol{\varepsilon}^e)}{\partial \boldsymbol{\varepsilon}^e \otimes \partial \boldsymbol{\varepsilon}^e}, \quad (2)$$

where $\mathcal{S}$ is the space of the second-order symmetric tensors and $\mathcal{M}$ is the space of the fourth-order tensors that possess major and minor symmetries (Heider et al., 2020). The true hyperelastic energy functional $\psi^e$ of the material is approximated by the neural network learned function $\widehat{\psi}^e(\boldsymbol{\varepsilon}^e \,|\, \boldsymbol{W}, \boldsymbol{b})$ with the elastic strain tensor $\boldsymbol{\varepsilon}^e$ as the input, parametrized by weights $\boldsymbol{W}$ and biases $\boldsymbol{b}$ obtained from a supervised learning procedure.
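To make Eq. (2) concrete, the sketch below evaluates the gradient of a known quadratic energy, $\psi^e = \frac{1}{2}\lambda \operatorname{tr}(\boldsymbol{\varepsilon}^e)^2 + \mu \, \boldsymbol{\varepsilon}^e : \boldsymbol{\varepsilon}^e$, by central finite differences and checks it against the closed-form stress $\boldsymbol{\sigma} = \lambda \operatorname{tr}(\boldsymbol{\varepsilon}^e)\boldsymbol{1} + 2\mu\boldsymbol{\varepsilon}^e$. In the actual framework the derivatives of the learned $\widehat{\psi}^e$ are obtained by automatic differentiation of the network; the Lamé parameters and strain values here are arbitrary placeholders.

```python
import numpy as np

lam, mu = 100.0, 80.0  # arbitrary Lamé parameters

def psi_e(eps):
    # psi^e = 0.5*lambda*tr(eps)^2 + mu*(eps : eps)
    return 0.5 * lam * np.trace(eps) ** 2 + mu * np.sum(eps * eps)

def stress_fd(eps, h=1e-6):
    # sigma = d psi^e / d eps, component-wise central differences (Eq. (2))
    sig = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            dp = np.zeros((3, 3))
            dp[i, j] = h
            sig[i, j] = (psi_e(eps + dp) - psi_e(eps - dp)) / (2 * h)
    return sig

eps = np.array([[1e-3, 2e-4, 0.0],
                [2e-4, -5e-4, 0.0],
                [0.0, 0.0, 3e-4]])
sigma_exact = lam * np.trace(eps) * np.eye(3) + 2 * mu * eps
assert np.allclose(stress_fd(eps), sigma_exact, atol=1e-6)
```

Applying the same stencil to the stress components would likewise recover the tangential elasticity tensor $\boldsymbol{c}^e$ of Eq. (2).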

To guarantee the quality and even the existence of the stress and elastic tangent operators stemming from the learned energy function, we adopt a Sobolev training framework, introduced by Czarnecki et al. (2017), and extend it to higher-order derivative constraints. By leveraging the differentiability achieved by the Multiply layers and adopting an $H^2$ norm as the training objective, we introduce an alternative that renders the neural network model applicable for implicit solvers while eliminating the potential spurious oscillations of the tangent operators. The new training objective for the hyperelastic energy functional approximator $\widehat{\psi}^e$ includes constraints for the predicted energy, stress, and stiffness values. This training objective, modeled after an $H^2$ norm, for the training samples $i \in [1, ..., N]$ has the following form:

$$\boldsymbol{W}, \boldsymbol{b} = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_1 \left\| \psi^e_i - \widehat{\psi}^e_i \right\|_2^2 + \gamma_2 \left\| \frac{\partial \psi^e_i}{\partial \boldsymbol{\varepsilon}^e_i} - \frac{\partial \widehat{\psi}^e_i}{\partial \boldsymbol{\varepsilon}^e_i} \right\|_2^2 + \gamma_3 \left\| \frac{\partial^2 \psi^e_i}{\partial \boldsymbol{\varepsilon}^e_i \otimes \partial \boldsymbol{\varepsilon}^e_i} - \frac{\partial^2 \widehat{\psi}^e_i}{\partial \boldsymbol{\varepsilon}^e_i \otimes \partial \boldsymbol{\varepsilon}^e_i} \right\|_2^2 \right) \right). \quad (3)$$

In the above formulation, the conventional $L^2$ norm-based training objective can be obtained by setting the parameters $\gamma_2 = \gamma_3 = 0$. An $H^1$ norm-based training objective can be recovered from Eq. (3) by setting $\gamma_3 = 0$. An $H^1$ norm loss function for a hyperelastic energy functional can be seen in Vlassis et al. (2020).

In this work, we generate elasto-plastic models that are practical for implicit solvers. This, however, was considered a difficult task in the earlier attempts at using a neural network as a replacement for constitutive laws (cf. Hashash et al. (2004)), where the proposed solution would be either to bypass the calculation of a tangent with an explicit time integrator or to introduce finite differences on the stress predictions.
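The objective of Eq. (3) reduces, in code, to a weighted sum of three discrepancy terms. The sketch below is a minimal NumPy version of that structure, assuming the true and predicted energies, stresses, and tangents for a batch have already been evaluated and stacked into arrays; it illustrates the loss, not the training loop itself.

```python
import numpy as np

def h2_loss(psi_true, psi_pred, sig_true, sig_pred, c_true, c_pred,
            g1=1.0, g2=1.0, g3=1.0):
    """H2-style objective of Eq. (3): energy + stress + stiffness terms.

    Each argument is an array whose first axis indexes the N training
    samples; g1, g2, g3 play the role of gamma_1, gamma_2, gamma_3.
    """
    n = psi_true.shape[0]
    loss = 0.0
    for true, pred, g in ((psi_true, psi_pred, g1),
                          (sig_true, sig_pred, g2),
                          (c_true, c_pred, g3)):
        # squared L2 norm of the discrepancy, averaged over the samples
        loss += g * np.sum((true - pred) ** 2) / n
    return loss

# g2 = g3 = 0 recovers the conventional L2 objective; g3 = 0 the H1 one.
```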

2.2.1 Simplified training for isotropic elasticity

In this work, our primary focus is on small-strain isotropic hyperelasticity, which can be completely described in spectral form by the principal strain and stress values (without the principal directions). Thus, for isotropic infinitesimal hyperelasticity, the $H^2$ training objective of Eq. (3) for the training samples $i \in [1, ..., N]$ can be rewritten in terms of principal values as:

$$\boldsymbol{W}, \boldsymbol{b} = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_1 \left\| \psi^e_i - \widehat{\psi}^e_i \right\|_2^2 + \sum_{A=1}^{3} \gamma_2 \left\| \frac{\partial \psi^e_i}{\partial \varepsilon^e_{A,i}} - \frac{\partial \widehat{\psi}^e_i}{\partial \varepsilon^e_{A,i}} \right\|_2^2 + \sum_{A=1}^{3} \sum_{B=1}^{3} \gamma_3 \left\| \frac{\partial^2 \psi^e_i}{\partial \varepsilon^e_{A,i} \partial \varepsilon^e_{B,i}} - \frac{\partial^2 \widehat{\psi}^e_i}{\partial \varepsilon^e_{A,i} \partial \varepsilon^e_{B,i}} \right\|_2^2 \right) \right), \quad (4)$$

where $\varepsilon^e_A$ for $A = 1, 2, 3$ are the principal values of the elastic strain tensor $\boldsymbol{\varepsilon}^e$. The approximated energy functional for this training objective is a function of the three input principal strains and not of a full second-order tensor of 6 input components (reduced from 9 by assuming symmetry). This effectively reduces the input parametric space of the learned function and facilitates learning by minimizing complexity.
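In practice, this reduction amounts to feeding the network the eigenvalues of each strain sample. A sketch using `numpy.linalg.eigvalsh`, which is applicable because $\boldsymbol{\varepsilon}^e$ is symmetric; the strain values are arbitrary test data:

```python
import numpy as np

def principal_strains(eps):
    # eigvalsh exploits symmetry and returns eigenvalues in ascending order;
    # the 6 independent components of a symmetric tensor reduce to 3 inputs
    return np.linalg.eigvalsh(eps)

eps = np.array([[2e-3, 1e-3, 0.0],
                [1e-3, 2e-3, 0.0],
                [0.0, 0.0, -1e-3]])
assert np.allclose(principal_strains(eps), [-1e-3, 1e-3, 3e-3])
```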


2.2.2 Simplified training for two-invariant elasticity

The training objective can be further simplified to two input variables by adopting an invariant space, commonly used in geotechnical studies when the intermediate principal stress does not exhibit a dominating effect on the elastic response or when the intermediate principal stress is not measured at all due to limitations of the experimental apparatus (Wawersik et al., 1997; Haimson and Rudnicki, 2010). In this case, a small-strain isotropic hyperelastic law can equivalently be described with two strain invariants (the volumetric strain $\varepsilon^e_v$ and the deviatoric strain $\varepsilon^e_s$). The strain invariants are defined as:

$$\varepsilon^e_v = \operatorname{tr}(\boldsymbol{\varepsilon}^e), \quad \varepsilon^e_s = \sqrt{\frac{2}{3}} \left\| \boldsymbol{e}^e \right\|, \quad \boldsymbol{e}^e = \boldsymbol{\varepsilon}^e - \frac{1}{3} \varepsilon^e_v \boldsymbol{1}, \quad (5)$$

where $\boldsymbol{\varepsilon}^e$ is the elastic small strain tensor and $\boldsymbol{e}^e$ the deviatoric part of the elastic small strain tensor. Using the chain rule,

the Cauchy stress tensor can be described in the invariant space as follows:

$$\boldsymbol{\sigma} = \frac{\partial \psi^e}{\partial \varepsilon^e_v} \frac{\partial \varepsilon^e_v}{\partial \boldsymbol{\varepsilon}^e} + \frac{\partial \psi^e}{\partial \varepsilon^e_s} \frac{\partial \varepsilon^e_s}{\partial \boldsymbol{\varepsilon}^e}. \quad (6)$$

In the above, the mean pressure $p$ and the deviatoric (von Mises) stress $q$ can be defined as:

$$p = \frac{\partial \psi^e}{\partial \varepsilon^e_v} \equiv \frac{1}{3} \operatorname{tr}(\boldsymbol{\sigma}), \quad q = \frac{\partial \psi^e}{\partial \varepsilon^e_s} \equiv \sqrt{\frac{3}{2}} \left\| \boldsymbol{s} \right\|, \quad (7)$$

where $\boldsymbol{s}$ is the deviatoric part of the Cauchy stress tensor. Thus, the Cauchy stress tensor can be expressed by the stress invariants as:

$$\boldsymbol{\sigma} = p \boldsymbol{1} + \sqrt{\frac{2}{3}} q \, \widehat{\boldsymbol{n}}, \quad (8)$$

where

$$\widehat{\boldsymbol{n}} = \boldsymbol{e}^e / \left\| \boldsymbol{e}^e \right\| = \left( \sqrt{2/3} \, / \, \varepsilon^e_s \right) \boldsymbol{e}^e. \quad (9)$$
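Equations (5)-(9) can be exercised with a short script: it computes the invariant pair $(p, q)$ from a stress tensor and then reconstructs the full tensor from Eq. (8). The tensor below is arbitrary test data, and the reconstruction uses the unit deviatoric stress direction, which for isotropic elasticity is coaxial with the $\widehat{\boldsymbol{n}}$ of Eq. (9).

```python
import numpy as np

def strain_invariants(eps):
    # Eq. (5): volumetric strain, deviatoric strain, deviatoric tensor
    ev = np.trace(eps)
    e_dev = eps - ev / 3.0 * np.eye(3)
    es = np.sqrt(2.0 / 3.0) * np.linalg.norm(e_dev)
    return ev, es, e_dev

def stress_invariants(sig):
    # Eq. (7): mean pressure p and von Mises stress q
    p = np.trace(sig) / 3.0
    s = sig - p * np.eye(3)
    q = np.sqrt(3.0 / 2.0) * np.linalg.norm(s)
    return p, q, s

sig = np.array([[50.0, 10.0, 0.0],
                [10.0, 20.0, 5.0],
                [0.0, 5.0, -10.0]])
p, q, s = stress_invariants(sig)

# Eq. (8): sigma = p*1 + sqrt(2/3)*q*n_hat, with a unit deviatoric direction
n_hat = s / np.linalg.norm(s)
sig_rec = p * np.eye(3) + np.sqrt(2.0 / 3.0) * q * n_hat
assert np.allclose(sig_rec, sig)
```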

The $H^2$ training objective of Eq. (4) for the training samples $i \in [1, ..., N]$ can now be rewritten in terms of the two strain invariants:

$$\boldsymbol{W}, \boldsymbol{b} = \underset{\boldsymbol{W}, \boldsymbol{b}}{\operatorname{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_1 \left\| \psi^e_i - \widehat{\psi}^e_i \right\|_2^2 + \gamma_4 \left\| p_i - \widehat{p}_i \right\|_2^2 + \gamma_5 \left\| q_i - \widehat{q}_i \right\|_2^2 + \sum_{\alpha=1}^{2} \sum_{\beta=1}^{2} \gamma_6 \left\| D^e_{\alpha\beta,i} - \widehat{D}^e_{\alpha\beta,i} \right\|_2^2 \right) \right), \quad (10)$$

where:

$$D^e_{11} = \frac{\partial^2 \psi}{\partial (\varepsilon^e_v)^2}, \quad D^e_{22} = \frac{\partial^2 \psi}{\partial (\varepsilon^e_s)^2}, \quad \text{and} \quad D^e_{12} = D^e_{21} = \frac{\partial^2 \psi}{\partial \varepsilon^e_v \, \partial \varepsilon^e_s}. \quad (11)$$

To aid the implementation by a third party, a pseudocode for the supervised learning of the isotropic

hyperelastic energy functional is provided (see Algorithm 1).

Remark 2 Rescaling of the training data. In every loss function in this work, we have introduced scaling coefficients $\gamma_a$ to remind the readers that it is possible to change the weighting to adjust the relative importance of the different terms in the loss function. These scaling coefficients may also be viewed as the weighting functions in a multi-objective optimization problem. In practice, we have normalized all data to avoid the vanishing or exploding gradient problems that may occur during the back-propagation process (Bishop et al., 1995). As such, normalization is performed before the training as a pre-processing step. The sample $X_i$ of a measure $X$ is scaled to the unit interval via,

$$\bar{X}_i := \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}, \quad (12)$$

where $\bar{X}_i$ is the normalized sample point, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the measure $X$ in the training data set, such that all different types of data used in this paper (e.g. energy, stress, stress gradient, stiffness) are normalized within the range $[0, 1]$.
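A minimal sketch of the pre-processing of Eq. (12) and of its inverse, which is needed to map network outputs back to physical units at prediction time; the sample values are arbitrary:

```python
import numpy as np

def minmax_scale(x):
    # Eq. (12): map each measure onto the unit interval [0, 1]
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_unscale(x_bar, x_min, x_max):
    # invert the scaling to recover physical units from predictions
    return x_bar * (x_max - x_min) + x_min

q = np.array([12.0, 40.0, 96.0, 61.0])   # e.g. von Mises stress samples
q_bar, q_min, q_max = minmax_scale(q)
assert q_bar.min() == 0.0 and q_bar.max() == 1.0
assert np.allclose(minmax_unscale(q_bar, q_min, q_max), q)
```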


Algorithm 1 $H^2$ training of a neural network hyperelastic energy functional for an isotropic material.

Require: Data set of $N$ samples: strain measures $\boldsymbol{\varepsilon}^e$, energy measures $\psi^e$, stress measures $\boldsymbol{\sigma}$, and stiffness measures $\boldsymbol{c}^e$.
1. Project the data samples onto the invariant space.
   Initialize an empty set of training samples $(\varepsilon^e_{v,i}, \varepsilon^e_{s,i}, \psi^e_i, p_i, q_i, D^e_{11,i}, D^e_{22,i}, D^e_{12,i})$ for $i$ in $[1, ..., N]$.
   for $i$ in $[1, ..., N]$ do
      Compute $(\varepsilon^e_{v,i}, \varepsilon^e_{s,i}, p_i, q_i, D^e_{11,i}, D^e_{22,i}, D^e_{12,i})$, given $\boldsymbol{\varepsilon}^e_i$, $\boldsymbol{\sigma}_i$, $\boldsymbol{c}^e_i$ (see Sec. 2.2.2).
      Rescale $(\varepsilon^e_{v,i}, \varepsilon^e_{s,i}, \psi^e_i, p_i, q_i, D^e_{11,i}, D^e_{22,i}, D^e_{12,i})$ via Eq. (12).
   end for
2. Train the neural network $\widehat{\psi}^e(\varepsilon^e_v, \varepsilon^e_s)$ with the loss function of Eq. (10).
3. Output the trained energy functional $\widehat{\psi}^e$ neural network and exit.

2.3 Training of an evolving yield function as a level set

This section introduces the theoretical framework that regards the evolution of a yield surface as a level set evolution problem. To facilitate a geometrical interpretation without the burden of generating a large database, we restrict our attention to constructing a yield function that remains pressure-insensitive but may otherwise evolve in any arbitrary way on the π-plane, including moving, expanding, contracting, and deforming the elastic region. The goal of the supervised learning task is to determine the optimal way the yield function should evolve such that it is consistent with the observed experimental data collected after the plastic yielding while obeying the thermodynamic constraints that can be interpreted geometrically in the stress space.

2.3.1 Reducing the dimension of the data by leveraging symmetries

Here, we provide a brief review of the geometrical interpretation of the stress space and how it can be used to reduce the dimensions of the data representation and reduce the difficulty of the machine learning tasks. In this work, we consider a convex elastic domain $\mathbb{E}$ defined by a yield surface $f$. This yield function is a function of the Cauchy stress $\boldsymbol{\sigma}$ and the internal variable $\xi$ that represents the history-dependent behavior of the material, i.e. (cf. Borja (2013)),

$$\xi = \int_0^t \dot{\lambda} \, dt, \quad (13)$$

where $\xi$ is a monotonically increasing function of time and $\dot{\lambda}$ is the rate of change of the plastic multiplier, with $\dot{\boldsymbol{\varepsilon}}^p = \dot{\lambda} \, \partial g / \partial \boldsymbol{\sigma}$, where $g$ is the plastic potential. The yield function returns a negative value in the elastic region and equals zero when the material is yielding. The stress on the boundary $f(\boldsymbol{\sigma}, \xi) = 0$ is, therefore, the yielding stress, and all admissible stresses belong to the closure of the elastic domain, i.e.,

$$\mathbb{E} := \{ (\boldsymbol{\sigma}, \xi) \in \mathcal{S} \times \mathbb{R}^1 \,|\, f(\boldsymbol{\sigma}, \xi) \leq 0 \}. \quad (14)$$

First, we assume that the yield function depends only on the principal stresses. This treatment reduces the dimension of the stress representation from six to three. Then, we assume that the plastic yielding is not sensitive to the mean pressure. As such, the shape of the yield surface in the principal stress space can be sufficiently described by a projection on the π-plane and, hence, the dimension of the independent stress input is further reduced from three to two. To further simplify the interpolation of the yield surface, we introduce a polar coordinate system on the π-plane such that the different monotonic stress paths commonly obtained from triaxial tests can be easily described via the Lode angle.

Recall that the π-plane refers to a projection of the principal stress space based on the space diagonal defined by $\sigma_1 = \sigma_2 = \sigma_3$. More specifically, the π-plane is defined by the equation:

$$\sigma_1 + \sigma_2 + \sigma_3 = 0. \quad (15)$$


Fig. 2: The principal stress space (LEFT) and the corresponding π-plane (RIGHT). The π-plane is perpendicular to the space diagonal $\sigma_1 = \sigma_2 = \sigma_3$ and passes through the origin of the principal stress axes. Figure reproduced from Borja (2013).

The transformation from the original stress space coordinate system $(\sigma_1, \sigma_2, \sigma_3)$ to the π-plane can be decomposed into two specific rotations $\boldsymbol{R}'$ and $\boldsymbol{R}''$ of the coordinate system (cf. Borja (2013)) such that,

$$\begin{Bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{Bmatrix} = \boldsymbol{R}' \boldsymbol{R}'' \begin{Bmatrix} \sigma_1^* \\ \sigma_2^* \\ \sigma_3^* \end{Bmatrix} = \begin{bmatrix} \sqrt{2}/2 & 0 & \sqrt{2}/2 \\ 0 & 1 & 0 \\ -\sqrt{2}/2 & 0 & \sqrt{2}/2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2/3} & 1/\sqrt{3} \\ 0 & -1/\sqrt{3} & \sqrt{2/3} \end{bmatrix} \begin{Bmatrix} \sigma_1^* \\ \sigma_2^* \\ \sigma_3^* \end{Bmatrix}. \quad (16)$$
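The composite rotation of Eq. (16) can be checked numerically: both factors are proper rotations, and the $\sigma_3^*$ axis must map onto the space diagonal $\sigma_1 = \sigma_2 = \sigma_3$ so that the remaining coordinates $(\sigma_1^*, \sigma_2^*)$ span the π-plane. A sketch:

```python
import numpy as np

s2 = np.sqrt(2.0)
R1 = np.array([[s2 / 2, 0.0, s2 / 2],
               [0.0, 1.0, 0.0],
               [-s2 / 2, 0.0, s2 / 2]])
R2 = np.array([[1.0, 0.0, 0.0],
               [0.0, np.sqrt(2.0 / 3.0), 1.0 / np.sqrt(3.0)],
               [0.0, -1.0 / np.sqrt(3.0), np.sqrt(2.0 / 3.0)]])
R = R1 @ R2  # Eq. (16): {sigma} = R' R'' {sigma*}

# both factors are rotations, so the composition is orthogonal
assert np.allclose(R @ R.T, np.eye(3))
# the sigma*_3 axis maps onto the hydrostatic axis sigma_1 = sigma_2 = sigma_3
assert np.allclose(R @ np.array([0.0, 0.0, np.sqrt(3.0)]), np.ones(3))
# conversely, the pi-plane coordinates of a stress state: sigma* = R^T sigma
sigma = np.array([30.0, 10.0, -40.0])
sigma_star = R.T @ sigma
assert np.isclose(sigma_star[2], np.sum(sigma) / np.sqrt(3.0))
```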

For pressure-insensitive plasticity, $\sigma_3^*$ is not needed, as the principal stress differences are a function of $\sigma_1^*$ and $\sigma_2^*$ and are independent of $\sigma_3^*$. We opt to describe the stress states of the material on the π-plane using two stress invariants, the polar radius $\rho$ and the Lode angle $\theta$ (Lode, 1926). These invariants are derived by solving the characteristic equation of the deviatoric component $\boldsymbol{s} \in \mathcal{S}$ of the Cauchy stress tensor, following Borja (2013):

$$s^3 - J_2 s - J_3 = 0, \quad (17)$$

where $s$ is a principal value of $\boldsymbol{s}$, and

$$J_2 = \frac{1}{2} \operatorname{tr}(\boldsymbol{s}^2), \quad J_3 = \frac{1}{3} \operatorname{tr}(\boldsymbol{s}^3), \quad (18)$$

are respectively the second and third invariants of the tensor $\boldsymbol{s}$. Utilizing the identity:

$$\cos^3 \theta - \frac{3}{4} \cos \theta - \frac{1}{4} \cos 3\theta = 0, \quad (19)$$

and writing $s$ in polar coordinates such that:

$$s = \rho \cos \theta, \quad (20)$$

and substituting in (17), the polar radius and the Lode angle can be retrieved as:

$$\rho = 2\sqrt{J_2/3}, \quad \text{and} \quad \cos 3\theta = \frac{3\sqrt{3} J_3}{2 J_2^{3/2}}. \quad (21)$$

In terms of the π-plane coordinates $\sigma_1^*$ and $\sigma_2^*$, the Lode coordinates $\rho$ and $\theta$ can respectively be written as:

$$\rho = \sqrt{(\sigma_1^*)^2 + (\sigma_2^*)^2}, \quad \text{and} \quad \tan \theta = \frac{\sigma_2^*}{\sigma_1^*}. \quad (22)$$


Thus, for an isotropic pressure-independent plasticity model, the yield surface can equivalently be described by an approximator using either the principal stresses $\sigma_1$, $\sigma_2$, and $\sigma_3$ or the stress invariants $\rho$ and $\theta$, such that:

$$f(\sigma_1, \sigma_2, \sigma_3, \xi) = \widehat{f}(\rho, \theta, \xi) = 0. \quad (23)$$
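Equations (17)-(21) can be exercised directly: starting from a trace-free principal deviatoric stress state, we form $J_2$ and $J_3$, retrieve $\rho$ and $\theta$ from Eq. (21), and confirm that $s = \rho \cos \theta$ of Eq. (20) solves the characteristic equation (17) and coincides with the largest principal deviatoric stress. The numerical values are arbitrary test data.

```python
import numpy as np

# deviatoric principal stresses (trace-free by construction)
s_principal = np.array([25.0, 5.0, -30.0])
assert np.isclose(s_principal.sum(), 0.0)

J2 = 0.5 * np.sum(s_principal ** 2)       # Eq. (18)
J3 = (1.0 / 3.0) * np.sum(s_principal ** 3)

rho = 2.0 * np.sqrt(J2 / 3.0)             # Eq. (21), polar radius
cos3t = 3.0 * np.sqrt(3.0) * J3 / (2.0 * J2 ** 1.5)
theta = np.arccos(cos3t) / 3.0            # Lode angle, 0 <= theta <= pi/3

# Eq. (20): s = rho*cos(theta) is the largest root of Eq. (17)
s_max = rho * np.cos(theta)
assert np.isclose(s_max ** 3 - J2 * s_max - J3, 0.0, atol=1e-6)
assert np.isclose(s_max, s_principal.max())
```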

Fig. 3: The procedure to generate the polycrystal plasticity dataset on the π-plane. Three simulations are first performed to locate the yielding points along the principal axes (LEFT). An initial trial elastic convex region is the triangle formed by connecting these three points (MIDDLE). The rest of the simulations are then performed outside of this triangle to refine the shape of the yield surface. The π-plane is split into slices of equal arc length and data points are sampled radially (RIGHT). The initial yielding point for each loading path is located once yielding is detected in the DNS.

In this case, both the isotropy of the yield function and the symmetry along the hydrostatic axis significantly ease the training process. First, the material symmetry reduces the dimensionality of the data and, hence, the demand for data points (Heider et al., 2020; Wang and Sun, 2018). Second, the geometrical interpretation of the yield function on the π-plane provides clear guidance for more effective data exploration. In our numerical examples, the material database constitutes recorded constitutive responses obtained from direct numerical simulations of polycrystal assemblies. The π-plane is partitioned by Lode angles, where each partitioned angle is assigned a stress path that moves in the radial direction on the π-plane. By assuming the convexity of the yield surface, we can identify an initial path-independent elastic region by locating the initial yielding point at each of the prescribed stress paths (see Fig. 3).

2.3.2 Data preparation for training of the yield function as a level set

Identifying the set of stresses at which the initial plastic yielding occurs is a necessary but not sufficient condition to generate a yield surface. In fact, a yield surface $f(\boldsymbol{\sigma}, \xi)$ must be well-defined not just at $f = 0$ but also anywhere in the product space $\mathcal{S} \times \mathbb{R}^1$. Another key observation is that, in order for the yield surface to function properly, the value of $f(\boldsymbol{\sigma}, \xi)$ inside and outside the yield surface may vary, provided that the orientation of the stress gradient remains consistent. For instance, consider two classical $J_2$ yield functions,

$$f_1(\boldsymbol{\sigma}, \xi) = \sqrt{2 J_2} - \kappa \leq 0; \quad f_2(\boldsymbol{\sigma}, \xi) = \sqrt{J_2} - \kappa/\sqrt{2} \leq 0. \quad (24)$$

These two models will yield identical constitutive responses except that, in each incremental step, the plastic multiplier deduced from $f_1$ is $\sqrt{2}$ times smaller than that of $f_2$, as the stress gradient of $f_1$ is $\sqrt{2}$ times larger than that of $f_2$. With these observations in mind, we introduce a level set approach where the yield surface is postulated to be a signed distance function level set and the evolution of the yield function is governed by a Hamilton-Jacobi equation that is not solved directly but generated from supervised learning via the following steps.


1. Generate auxiliary data points to train the signed distance yield function. In this first step, we attempt to construct a signed distance function f in the stress space in which the internal variable ξ is fixed at a given value at yielding. Let Ω be the solution domain of the stress space on which the signed distance function is defined. Assume that the yield function can be sufficiently described on the π-plane. For simplicity, we adopt the polar coordinate system to parametrize the signed distance function f that is used to train the yield surface, i.e.,

x(\sigma_{11}, \sigma_{22}, \sigma_{33}, \sigma_{12}, \sigma_{23}, \sigma_{13}) = x(\sigma_1, \sigma_2, \sigma_3) = \hat{x}(\rho, \theta).    (25)

Fig. 4: Illustration of the relationship between the boundary of the elastic domain defined at ∂Γ = φ⁻¹(0) (yield surface) (LEFT) and the level set yield function φ(x) defined everywhere in the stress space Ω (RIGHT).

The signed distance function (see, for instance, Figure 4) is defined as

f(\hat{x}, t) =
\begin{cases}
d(\hat{x}) & \text{outside } \partial\Gamma \text{ (inadmissible stress)} \\
0 & \text{on } \partial\Gamma \text{ (yielding)} \\
-d(\hat{x}) & \text{inside } \partial\Gamma \text{ (elastic region)}
\end{cases},    (26)

where d(x̂) is the minimum Euclidean distance between any point x̂ of Ω and the interface ∂Γ = {x̂ ∈ ℝ² | f(x̂) = 0}, defined as

d(\hat{x}) = \min\left( |\hat{x} - \hat{x}_\Gamma| \right),    (27)

where x̂_Γ is the yielding stress for a given ξ. The signed distance function is obtained by solving the Eikonal equation |\nabla_{\hat{x}} f| = 1 while prescribing the signed distance function to be 0 at x̂ ∈ ∂Γ. In the polar coordinate system, the Eikonal equation reads

\left( \frac{\partial f}{\partial \rho} \right)^2 + \frac{1}{\rho^2} \left( \frac{\partial f}{\partial \theta} \right)^2 = 1.    (28)

Note that there is a singularity in the polar coordinates of the π-plane at ρ = 0 and, hence, the origin is not used as an auxiliary point to train the yield function. The Eikonal equation could also simply be solved by a fast marching solver in 2D polar coordinates. It is noted that the selection of the signed distance function as the level set representation is not limiting; other level set functions are expected to fulfill the same purpose. The signed distance function was chosen, however, for its simplicity of implementation.


Fig. 5: The inferred yield surfaces (red curves) and the auxiliary data points (blue dots) obtained by solving the level set re-initialization problem that generates the signed distance function (blue contours) for four yield surfaces: (a) J2 yield function, (b) Tresca yield function with back stress, (c) Argyris et al. (1974) yield function, and (d) a custom yield function inferred from polycrystal simulations. The yield function f(x, t) is converted into a signed distance function with the contour f(x, t) = 0 fixed. The isocontour curves represent the projection of the signed distance function level set on the π-plane.

Figure 5 shows several examples of signed distance functions converted from classical yield surfaces or deduced from direct numerical simulations.


2. Obtain the velocity function to constitute the Hamilton-Jacobi hardening of the yield function. After we generate a sequence of signed distance functions for different ξ, we may introduce an inverse problem to obtain the velocity function for the Hamilton-Jacobi equation that evolves the signed distance function. Recall that the Hamilton-Jacobi equation may take the following form:

\frac{\partial f}{\partial t} + v \cdot \nabla_{\hat{x}} f = 0,    (29)

where v is the normal velocity field that defines the geometric evolution of the boundary and, in the case of plasticity, is chosen to describe the observed hardening mechanism. The velocity field is given by

v = F n,    (30)

where F is a scalar function describing the magnitude of the boundary change and n = \nabla_{\hat{x}} f / |\nabla_{\hat{x}} f|. Using \nabla f \cdot \nabla f = |\nabla f|^2 in Eq. (29), the level set Hamilton-Jacobi equation for the yield function can be simplified as

\frac{\partial f}{\partial t} + F |\nabla_{\hat{x}} f| = 0.    (31)

Note that t is a pseudo-time. Since each snapshot of f obtained from Step 1 remains a signed distance function, |\nabla_{\hat{x}} f| = 1. Next, we replace the pseudo-time t with ξ. Assuming that, along each stress path, experimental data points are collected N times beyond the initial yielding point, each time with the same incremental plastic strain Δλ, Step 1 provides a collection of signed distance functions {f₀, f₁, ..., f_{n+1}} corresponding to {ξ₀, ξ₁, ..., ξ_{n+1}}. The corresponding velocity function can then be obtained via finite differences, i.e.,

F_i \approx \frac{f_i - f_{i+1}}{\xi_{i+1} - \xi_i},    (32)

where F_i(ρ, θ) = F(ρ, θ, ξ_i) and i = 0, 1, 2, ..., n+1. By setting the signed distance function that fulfills Eq. (31) as the yield function, i.e., f(ρ, θ, ξ) = φ(ρ, θ, ξ), we may use experimental data generated from the loading paths demonstrated in Fig. 3 to train a neural network that predicts a new yield function or velocity function for an arbitrary ξ (see Figure 6).
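The finite-difference recovery of the hardening speed in Eq. (32) can be sketched as follows; the snapshots and the linearly hardening circular surface are illustrative assumptions, not data from the paper:

```python
import numpy as np

def hardening_velocity(sdf_snapshots, xi_values):
    """Finite-difference estimate of the Hamilton-Jacobi speed F (Eq. 32)
    from a sequence of signed distance function snapshots f_i(rho, theta)
    sampled at increasing internal variable values xi_i."""
    F = []
    for i in range(len(sdf_snapshots) - 1):
        F.append((sdf_snapshots[i] - sdf_snapshots[i + 1])
                 / (xi_values[i + 1] - xi_values[i]))
    return np.stack(F)

# Toy example: a circular yield surface whose radius grows linearly with xi
# (isotropic hardening), so f_i = rho - (1 + xi_i) on a radial grid.
rho = np.linspace(0.1, 3.0, 50)
xi = np.array([0.0, 0.1, 0.2, 0.3])
snapshots = [rho - (1.0 + x) for x in xi]

F = hardening_velocity(snapshots, xi)
print(F.mean())  # ~1.0: the boundary expands at unit speed in xi
```

A positive F corresponds to an expanding yield surface (hardening), consistent with the sign convention of Eq. (32).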

More importantly, we show that the evolution of the yield function can be modeled as a level set evolving according to a Hamilton-Jacobi equation. This knowledge may open up many new possibilities to capture hardening without any hand-crafted treatment. To overcome the potential cost of solving the Hamilton-Jacobi equation, we introduce a supervised learning procedure to obtain the updated yield function for a given strain history represented by the internal variable ξ without explicitly solving the Hamilton-Jacobi equation (see the following sub-sections). Consequently, this treatment enables us to create a generic elasto-plasticity framework that can replace hand-crafted yield functions and hardening laws without high computational costs and the burden of repeated modeling trial-and-error.

2.3.3 Training a yield function with associative plastic flow

Assuming an associative flow rule, the generalized Hooke's law utilizing a yield function neural network approximator f̂ can be written in rate form as:

\dot{\sigma} = \mathbf{c}^e : \left( \dot{\epsilon} - \dot{\lambda} \frac{\partial \hat{f}}{\partial \sigma} \right).    (33)

In incremental form, the predictor-corrector scheme is written as:

\sigma_{n+1} = \sigma^{tr}_{n+1} - \Delta\lambda \, \mathbf{c}^e_{n+1} : \frac{\partial \hat{f}}{\partial \sigma_{n+1}},    (34)

where


Fig. 6: The evolution of the yield surface ∂Γ is connected to a level set f(x̂, ξ) extension problem. The velocity field v of the Hamilton-Jacobi equation (31) emulates the material hardening law. The yield surface evolution and the velocity field are inferred from the data through the neural network training.

\sigma^{tr}_{n+1} = \sigma_n + \mathbf{c}^e_{n+1} : \Delta\epsilon = \mathbf{c}^e_{n+1} : \epsilon^{e\,tr}_{n+1}, \quad \epsilon^{e\,tr}_{n+1} = \epsilon^e_n + \Delta\epsilon.    (35)

The strain and stress tensor predictors can be written in spectral form as follows:

\sigma^{tr}_{n+1} = \sum_{A=1}^{3} \sigma^{tr}_{A,n+1} \, \mathbf{n}^{tr(A)}_{n+1} \otimes \mathbf{n}^{tr(A)}_{n+1}, \quad \epsilon^{e\,tr}_{n+1} = \sum_{A=1}^{3} \epsilon^{e\,tr}_{A,n+1} \, \mathbf{n}^{tr(A)}_{n+1} \otimes \mathbf{n}^{tr(A)}_{n+1}.    (36)

The predictor-corrector scheme can be rewritten in spectral form, omitting the subscript (n+1):

\sum_{A=1}^{3} \sigma_A \mathbf{n}^{(A)} \otimes \mathbf{n}^{(A)} = \sum_{A=1}^{3} \sigma^{tr}_A \mathbf{n}^{tr(A)} \otimes \mathbf{n}^{tr(A)} - \Delta\lambda \sum_{A=1}^{3} \left( \sum_{B=1}^{3} a^e_{AB} \hat{f}_B \right) \mathbf{n}^{(A)} \otimes \mathbf{n}^{(A)},    (37)

\mathbf{n}^{(A)} \otimes \mathbf{n}^{(A)} = \mathbf{n}^{tr(A)} \otimes \mathbf{n}^{tr(A)}.    (38)

By assuming that the plastic flow obeys the normality rule, we may use the observed plastic flow from the data to regularize the shape of the evolving yield function. To do so, we leverage the fact that we have already obtained an elastic energy functional from the previous training. The plastic deformation mode can then be obtained from the difference between the trial and the true Cauchy stress at each incremental step at which the data are recorded in an experiment or direct numerical simulation, i.e.,

\sigma_A = \sigma^{tr}_A - \Delta\lambda \sum_{B=1}^{3} a^e_{AB} f_B, \quad f_A = \partial f / \partial \sigma_A = \frac{\epsilon^{e\,tr}_A - \epsilon^e_A}{\Delta\lambda} \quad \text{for } A = 1, 2, 3.    (39)

At each incremental step of the data-generating simulations, we know the total strain and total stress. Knowing the underlying hyperelastic model, we can utilize an inverse mapping to estimate the elastic strain from the current total stress and hence determine the plastic strain.


The quantities f₁, f₂, f₃ correspond to the amount of plastic flow in the principal directions A = 1, 2, 3. A neural network approximator of the yield function should have sufficiently accurate stress derivatives, which are necessary for the implementation of the return mapping algorithm discussed in Section 3, so as to provide an accurate plastic flow in the case of associative plasticity. The normalized plastic flow direction vector f_norm can be defined as

\mathbf{f}_{\text{norm}} = \left( \hat{f}_1, \hat{f}_2, \hat{f}_3 \right) / \left\| \left( \hat{f}_1, \hat{f}_2, \hat{f}_3 \right) \right\|,    (40)

and holds information about the shape of the yield function on the π-plane.
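Eqs. (39) and (40) amount to a simple post-processing step on each recorded increment. A minimal sketch (the strain values below are hypothetical, not taken from the polycrystal database):

```python
import numpy as np

def plastic_flow_direction(eps_e_trial, eps_e_true, dlam):
    """Per-increment plastic flow deduced from the gap between the trial
    and the true principal elastic strains (Eq. 39), then normalized to a
    unit vector (Eq. 40)."""
    f = (np.asarray(eps_e_trial) - np.asarray(eps_e_true)) / dlam
    return f / np.linalg.norm(f)

# Hypothetical increment: purely deviatoric flow in principal strain space.
f_norm = plastic_flow_direction(eps_e_trial=[1.2e-3, 0.4e-3, 0.4e-3],
                                eps_e_true=[1.0e-3, 0.5e-3, 0.5e-3],
                                dlam=1.0e-4)
print(f_norm)  # unit vector, ~[0.816, -0.408, -0.408]
```

The inverse mapping mentioned above supplies `eps_e_true` from the recorded total stress before this step is applied.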

In the case of the simple MLP feed-forward network, the network can be seen as an approximator function f̂ = f̂(ρ, θ, ξ | W, b) of the true yield function level set f, with the Lode's coordinates ρ, θ and the hardening parameter ξ as inputs, parametrized by weights W and biases b. A classical training objective, following an L² norm, would only constrain the predicted yield function values. The corresponding training objective, which minimizes the discrepancy measured at N sample points (x̂, ξ) ∈ S × ℝ¹, reads

W, b = \underset{W, b}{\text{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \gamma_7 \left\| f_i - \hat{f}_i \right\|_2^2 \right),    (41)

where f_i = f(x̂_i, ξ_i) and f̂_i = f̂(x̂_i, ξ_i). A second training objective can be modeled after an H¹ norm, constraining both f and its first derivative with respect to the stress state σ₁, σ₂, σ₃. For a neural network approximator parametrized as f̂ = f̂(σ₁, σ₂, σ₃, ξ | W, b) using the principal stresses as inputs, this training objective for the training samples i ∈ [1, ..., N] would have the following form:

W, b = \underset{W, b}{\text{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_7 \left\| f_i - \hat{f}_i \right\|_2^2 + \sum_{A=1}^{3} \gamma_8 \left\| \frac{\partial f_i}{\partial \sigma_{A,i}} - \frac{\partial \hat{f}_i}{\partial \sigma_{A,i}} \right\|_2^2 \right) \right).    (42)

Utilizing an equivalent representation of the stress state with Lode's coordinates on the π-plane, the above training objective can be further simplified. The normalized flow direction vector f_norm in Lode's coordinates can be described solely by an angle θ_f, since the vector has unit magnitude. To constrain the flow direction angle, we modify the loss function of this higher-order training objective by adding a distance metric between two rotation tensors R_{θ,i} and R̂_{θ,i}, corresponding to θ_{f,i} and θ̂_{f,i}, the flow vector directions on the π-plane for the data and the approximated yield function, respectively, for the i-th sample. The two rotation tensors belong to the special orthogonal group SO(3), and the metric is based on the distance from the identity matrix. For the i-th sample, the rotation-related term can be calculated as

F_i = \left\| \mathbf{I} - \mathbf{R}_{\theta,i} \left( \hat{\mathbf{R}}_{\theta,i} \right)^T \right\|_F = \sqrt{2 \left( 3 - \text{tr}\left( \mathbf{R}_{\theta,i} \left( \hat{\mathbf{R}}_{\theta,i} \right)^T \right) \right)},    (43)

where ‖·‖_F is the Frobenius norm. For a neural network approximator parametrized via the Lode's coordinates as input, i.e., f̂ = f̂(ρ, θ, ξ | W, b), the Sobolev training objective for the training samples i ∈ [1, ..., N] reads

W, b = \underset{W, b}{\text{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \left( \gamma_7 \left\| f_i - \hat{f}_i \right\|_2^2 + \gamma_9 F_i \right) \right),    (44)

where we minimize both the discrepancy of the yield function values and of the direction of the gradient in the stress space.

Remark 3 (Discrete data points for the yield function). Note that the training of the yield function involves not just the points at f(ρ, θ) = 0 but also the new auxiliary data generated from the re-initialization of the level set/yield function. Strictly speaking, the accuracy of the elasto-plastic responses only depends on how well the boundary of the admissible stress range f(ρ, θ) = 0 is tracked. However, knowing the yield function values inside and outside the admissible range is helpful for evolving the yield function with sufficient smoothness. To emphasize the importance of the data across f(ρ, θ) = 0, we may introduce a higher weighting factor for these data points in Eq. (44).


Algorithm 2 Training of a pressure-independent isotropic yield function level set neural network.

Require: Data set of N samples: stress measures σ at yielding and accumulated plastic strain ε_p; a number of levels (isocontours) L_levels for the constructed signed distance function level set (data augmentation); and a parameter ζ > 1 for the radius range of the constructed signed distance function.

1. Project stress onto the π-plane
   Initialize an empty set of π-plane projection training samples (ρ_i, θ_i) for i in [0, ..., N].
   for i in [0, ..., N] do
     Spectrally decompose σ_i = Σ_{A=1}^{3} σ_{A,i} n_i^{(A)} ⊗ n_i^{(A)}.
     Transform (σ_{1,i}, σ_{2,i}, σ_{3,i}) into (σ*_{1,i}, σ*_{2,i}, σ*_{3,i}) via Eq. (16).
     ρ_i ← sqrt(σ*²_{1,i} + σ*²_{2,i})
     θ_i ← tan⁻¹(σ*_{2,i} / σ*_{1,i})
   end for

2. Construct the yield function level set (data augmentation)
   Initialize an empty set of augmented training samples (ρ_m, θ_m, ε_{p,m}, f_m) for m in [0, ..., N × L_levels].
   m ← 0.
   for i in [0, ..., N] do
     for j in [0, ..., L_levels] do
       ρ_m ← (ζ j / L_levels) ρ_i    ▷ the signed distance function is constructed for a radius range of [0, ζρ_i]
       θ_m ← θ_i
       ε_{p,m} ← ε_{p,i}
       f_m ← (ζ j / L_levels) ρ_i − ρ_i    ▷ the signed distance function value range is [−ρ_i, (ζ − 1)ρ_i]
       Rescale (ρ_m, θ_m, ε_{p,m}, f_m) via Eq. (12).
       m ← m + 1
     end for
   end for

3. Train the neural network f̂(ρ_m, θ_m, ε_{p,m}) with the loss function of Eq. (41).
4. Output the trained yield function neural network f̂ and exit.
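Step 2 of Algorithm 2 can be sketched as below; the circular yield data and the parameter values are illustrative assumptions:

```python
import numpy as np

def augment_level_set(rho_yield, theta, eps_p, L_levels=10, zeta=2.0):
    """Data augmentation of Algorithm 2, step 2: for each yield point at
    radius rho_yield along direction theta, emit L_levels+1 samples along
    the ray with signed distance labels f = rho - rho_yield, spanning
    [-rho_yield, (zeta - 1) * rho_yield]."""
    samples = []
    for r_y, th, ep in zip(rho_yield, theta, eps_p):
        for j in range(L_levels + 1):
            rho_m = (zeta * j / L_levels) * r_y
            samples.append((rho_m, th, ep, rho_m - r_y))
    return np.array(samples)

# Hypothetical yield data: three points on a circular yield surface.
data = augment_level_set(rho_yield=[1.0, 1.0, 1.0],
                         theta=[0.0, 2.0, 4.0],
                         eps_p=[0.0, 0.0, 0.0],
                         L_levels=4, zeta=2.0)
print(data.shape)                          # (15, 4)
print(data[:, 3].min(), data[:, 3].max())  # -1.0 1.0
```

The rescaling via Eq. (12) and the actual network fit (step 3) would follow on the returned array.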

2.3.4 Training yield function and non-associative plastic flow

Here, we present the training of the plastic flow without assuming that the plastic flow follows the normality rule. As such, the yield function and the plastic flow must be trained separately. We adopt the idea of generalized plasticity, in which the plastic flow direction is directly deduced from the Sobolev training of a neural network on the experimental data (Zienkiewicz et al., 1999).

Firstly, the yield function training is similar to the associative flow case, except that the terms that control the stress gradient of the yield function cannot be directly obtained from the plastic flow due to the non-associative flow rule. Nevertheless, the stress gradient of the yield function may still be constrained by convexity (if there is no intended phase transition that requires non-convexity of the yield function). Recall that convexity requires

(\sigma^* - \sigma) : \frac{\partial \hat{f}}{\partial \sigma} \leq 0,    (45)

where σ* is an arbitrary stress. One necessary condition we can incorporate as a thermodynamic constraint is the special case where we simply set σ* = 0, for which we obtain

\sum_{A=1}^{3} \sigma_A \hat{f}_A \geq 0.    (46)


One way to enforce this constraint is to apply a penalty term to the loss function in Eq. (41), e.g.,

w_{nnp} \, \text{sign}\left( -\sum_{A=1}^{3} \sigma_A \hat{f}_A \right),    (47)

where this term is not activated if the learned yield function obeys convexity. However, the sign operator may lead to a jump of the loss function, which is not desirable for training. As a result, a regularized Heaviside step function can be used to replace the sign operator, for instance,

\text{sign}_{\text{approx}}\left( -\sigma_A \hat{f}_A \right) = \frac{1}{2} + \frac{1}{2} \tanh\left( -k \sigma_A \hat{f}_A \right),    (48)

where k controls how sharp the transition at σ_A f̂_A = 0 is. As shown in our numerical experiments, this additional term may not be required if the raw experimental data themselves do not violate the thermodynamic restriction. To obtain a plastic flow, we again obtain the flow information incrementally from the experimental data via the following equations, i.e.,

\sigma_A = \sigma^{tr}_A - \Delta\lambda \sum_{B=1}^{3} a^e_{AB} g_B, \quad g_A = \partial g / \partial \sigma_A = \frac{\epsilon^{e\,tr}_A - \epsilon^e_A}{\Delta\lambda} \quad \text{for } A = 1, 2, 3.    (49)

We can gather the plastic flow information by post-processing the simulation data, similar to Equation (39). Once the plastic flow g_A is determined incrementally for different ξ, we then introduce another supervised learning problem that reads

W, b = \underset{W, b}{\text{argmin}} \left( \frac{1}{N} \sum_{i=1}^{N} \gamma_{10} \left\| g_{A,i} - \hat{g}_{A,i} \right\|_2^2 \right).    (50)

Non-negative plastic work is the thermodynamic constraint that requires Ẇ_p = σ : ε̇_p ≥ 0. The corresponding incremental form for an isotropic material reads

\Delta W = \sigma_{n+1} : \Delta\epsilon_p = \Delta\lambda \sum_{A=1}^{3} \sigma_A \hat{g}_A \geq 0.    (51)

Notice that the stress beyond the initial yielding point satisfies the yield function f = 0. As a result, this inequality can be recast as an additional term for the loss function that trains the yield function (Eq. (44)) such that

w_{nnp} \, \text{sign}\left( -\Delta\lambda \sum_{A=1}^{3} \sigma_A \hat{g}_A \right),    (52)

where w_nnp is the penalty parameter. Notice that when the non-negative plastic work constraint is fulfilled during the training of the neural network, the penalty term is not activated and does not affect the back-propagation step. Furthermore, if the yield function is convex and the flow rule is associative, this constraint is always fulfilled and, hence, not necessary. This constraint, however, should be helpful to regulate the relationship between the yield function and the plastic flow when we intend to train the plastic flow direction independently of the stress gradient of the yield function.
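The regularized penalty of Eq. (48), and its plastic-work analogue in Eq. (52), can be sketched as follows; w_nnp and k take illustrative values:

```python
import numpy as np

def smooth_penalty(dissipation, w_nnp=1.0, k=100.0):
    """tanh-regularized version of the penalty terms in Eqs. (47) and (52):
    a smoothed Heaviside replaces sign(-x), so the loss stays differentiable
    across the constraint boundary x = 0."""
    return w_nnp * (0.5 + 0.5 * np.tanh(-k * dissipation))

# The penalty is ~w_nnp when the incremental plastic work is negative
# (constraint violated) and ~0 when it is positive (constraint satisfied).
print(smooth_penalty(-0.1) > 0.99)  # True: violation penalized
print(smooth_penalty(+0.1) < 0.01)  # True: no effect when admissible
```

The sharpness parameter k trades off how closely the term approximates the exact sign operator against the smoothness of the resulting loss landscape.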


Algorithm 3 Return mapping algorithm in strain space in principal axes for an isotropic hyperelastic-plastic model.

Require: Hyperelastic energy functional neural network ψ̂^e (see Algorithm 1) and yield function neural network f̂ (see Algorithm 2).

1. Compute trial elastic strain
   Compute ε^{e tr}_{n+1} = ε^e_n + Δε.
   Spectrally decompose ε^{e tr}_{n+1} = Σ_{A=1}^{3} ε^{e tr}_A n^{tr(A)} ⊗ n^{tr(A)}.

2. Compute trial elastic stress
   Compute σ^{tr}_A = ∂ψ̂^e/∂ε^e_A at ε^{e tr}_{n+1}.

3. Check yield condition and perform return mapping if loading is plastic
   if f̂(σ^{tr}_1, σ^{tr}_2, σ^{tr}_3, ξ_n) ≤ 0 then
     Set σ_{n+1} = Σ_{A=1}^{3} σ^{tr}_A n^{tr(A)} ⊗ n^{tr(A)} and exit.
   else
     Solve for ε^e_1, ε^e_2, ε^e_3, and ξ_{n+1} such that f̂(σ_1, σ_2, σ_3, ξ_{n+1}) = 0.
     Compute σ_{n+1} = Σ_{A=1}^{3} (∂ψ̂^e/∂ε^e_A) n^{tr(A)} ⊗ n^{tr(A)} and exit.
   end if

3 Implementation highlights: return mapping algorithm with automatic differentiation

Here, we provide a review of the implementation of a fully implicit stress integration algorithm used for the proposed Hamilton-Jacobi hardening framework. For isotropic materials, where the elastic strain and stress are co-axial, the stress integration can be done via spectral decomposition, as shown in Alg. 3. An upshot of the proposed method is that only a small modification is necessary to incorporate the Hamilton-Jacobi hardening and the generalized plasticity.

In this current work, unless otherwise stated, all the necessary information for the return mapping algorithm about the elastic and plastic constitutive responses is derived from the trained neural networks of the hyperelastic energy functional and the yield function, respectively, using the Keras (Chollet et al., 2015) and Tensorflow (Abadi et al., 2016) libraries. No additional explicit forms of constitutive laws are defined. Furthermore, the algorithm requires that all strain and stress variables be in the principal axes. However, as stated in Section 2, in order to facilitate the machine learning algorithms, we have opted to train with the strain invariants ε^e_v and ε^e_s and the stress invariants ρ and θ. Integrating the machine learning algorithms with the return mapping algorithm requires a set of coordinate system transformations, which, in turn, require the calculation of the partial derivatives of said transformations for use in the chain rule formulation. The partial derivative calculation is performed using the Autograd library (Maclaurin et al., 2015) for automatic differentiation.

Autograd enables the automatic calculation of the partial derivatives of explicitly defined functions. Thus, we can easily define the transformation of any input parameter space of our neural networks to the principal space and readily have the necessary partial derivatives for the chain rule implementation. This allows the use of equivalent expressions of our neural network approximators ψ̂^e(ε^e_v, ε^e_s) and f̂(ρ, θ, ξ) in the principal space, such that:

\hat{\psi}^e(\epsilon^e_v, \epsilon^e_s) = \hat{\psi}^e_{\text{principal}}(\epsilon^e_1, \epsilon^e_2, \epsilon^e_3) \quad \text{and} \quad \hat{f}(\rho, \theta, \xi) = \hat{f}_{\text{principal}}(\epsilon^e_1, \epsilon^e_2, \epsilon^e_3, \xi).    (53)

In this work, integrating the neural network approximators in the return mapping requires the following coordinate system transformations: (ε^e_v, ε^e_s) ←→ (ε^e_1, ε^e_2, ε^e_3), (ρ, θ) ←→ (σ_1, σ_2, σ_3), (σ*_1, σ*_2) ←→ (σ_1, σ_2, σ_3), and (σ*_1, σ*_2) ←→ (ρ, θ). These transformations require a large number of chain rule evaluations, which increases the possibility of formulation errors and makes replacing the networks' input space less flexible. Thus, we opt to automate this process using Autograd.

Due to the fact that the machine learning training has created a mapping that automatically generates an updated yield function whenever the internal variable ξ is updated, there is no need to add additional constraints for linearized hardening rules. The return mapping algorithm can be described by a system of four equations that are solved iteratively. For a local iteration k, we solve for the solution vector x such that A^k · Δx = r(x^k), x^{k+1} ← x^k − Δx, k ← k + 1, until the norm of the residual r falls below a set error threshold. The residual vector r and the local tangent A^k can be assembled for the calculation of x by a series of neural network evaluations and automatic differentiations, such that:

\mathbf{r}(\mathbf{x}) = \begin{Bmatrix} \epsilon^e_1 - \epsilon^{e\,tr}_1 + \Delta\lambda \hat{g}_1 \\ \epsilon^e_2 - \epsilon^{e\,tr}_2 + \Delta\lambda \hat{g}_2 \\ \epsilon^e_3 - \epsilon^{e\,tr}_3 + \Delta\lambda \hat{g}_3 \\ \hat{f}\left( \epsilon^e_1, \epsilon^e_2, \epsilon^e_3, \xi \right) \end{Bmatrix}, \quad
\mathbf{A}^k = \mathbf{r}'\left( \mathbf{x}^k \right) = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \hat{g}_1 \\ c_{21} & c_{22} & c_{23} & \hat{g}_2 \\ c_{31} & c_{32} & c_{33} & \hat{g}_3 \\ \partial\hat{f}/\partial\epsilon^e_1 & \partial\hat{f}/\partial\epsilon^e_2 & \partial\hat{f}/\partial\epsilon^e_3 & \partial\hat{f}/\partial\xi \end{bmatrix}, \quad
\mathbf{x} = \begin{Bmatrix} \epsilon^e_1 \\ \epsilon^e_2 \\ \epsilon^e_3 \\ \Delta\lambda \end{Bmatrix},    (54)

where ε^{e tr}_I is the trial state principal strain, ĝ_I = ∂f̂/∂σ_I for an associative flow rule, and

c_{IJ} = \delta_{IJ} + \Delta\lambda \frac{\partial \hat{g}_I}{\partial \epsilon^e_J}, \quad I, J = 1, 2, 3.    (55)

This framework is also readily implementable in finite element simulations (Section 5.5). We can assemble the algorithmic consistent tangent c_{n+1} in principal axes for a global Newton iteration n:

\mathbf{c}_{n+1} = \sum_{A=1}^{3} \sum_{B=1}^{3} a_{AB} \, \mathbf{m}^{(A)} \otimes \mathbf{m}^{(B)} + \frac{1}{2} \sum_{A=1}^{3} \sum_{B \neq A} \left( \frac{\sigma_B - \sigma_A}{\epsilon^{e\,tr}_B - \epsilon^{e\,tr}_A} \right) \left( \mathbf{m}^{(AB)} \otimes \mathbf{m}^{(AB)} + \mathbf{m}^{(AB)} \otimes \mathbf{m}^{(BA)} \right),    (56)

where m^{(AB)} = n^{(A)} ⊗ n^{(B)}. The matrix of elastic moduli in principal axes is given as:

a_{AB} := \frac{\partial \sigma_A}{\partial \epsilon^{e\,tr}_B} = \sum_{C=1}^{3} \left( \frac{\partial^2 \hat{\psi}^e}{\partial \epsilon^e_A \partial \epsilon^e_C} \right) \frac{\partial \epsilon^e_C}{\partial \epsilon^{e\,tr}_B}.    (57)

Utilizing Tensorflow and Autograd, the return mapping algorithm is fully generalized for any isotropic hyperelastic and yield-function data-driven constitutive laws. It also allows for the quick implementation of any parametrization of the neural network architectures. In future work, the framework can be extended to accommodate anisotropic responses, as well as architectures with complex internal variables and higher descriptive power.

4 Alternative comparison models for control experiments

In this section, we will brieﬂy review some simple black-box neural network architectures that can be

employed to predict the path-dependent plasticity behaviors. The predictive capabilities of these behaviors

will be compared to our neural network elastoplasticity framework in Section 5.4. Three different architec-

tures will be designed for comparison with our framework: a multi-step feed-forward network, a recurrent

GRU network, and a 1-D convolutional network. All of these networks demonstrate the ability to capture

path-dependent behavior utilizing different memory mechanisms.

The first architecture is a feed-forward network that takes the current strain as well as the strain and stress from the previous time step to predict the stress at the current time step. The feed-forward architecture consists of fully-connected Dense layers that have the following formulation in matrix form:

\mathbf{h}^{(l+1)}_{\text{dense}} = a\left( \mathbf{h}^{(l)} \mathbf{W}^{(l)} + \mathbf{b}^{(l)} \right),    (58)

where h^{(l+1)}_dense is the output of the Dense layer, h^{(l)} is the output of the previous layer l, a is an activation function, and W^{(l)}, b^{(l)} are the trainable weight matrix and bias vector of the layer, respectively. It is noted that the layer formulation itself cannot hold any memory information. The memory of path-dependence in this architecture is derived from the input of the neural network. The input is the full strain tensor ε_n at time step n and the full stress tensor σ_{n−1} at the previous time step (n − 1), both in Voigt notation. The output prediction of the network is the stress tensor σ_n at time step n. The network attempts to infer the


Model      | Description
MstepDense | Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Output Dense (Linear)
MGRU       | GRU (32 units / tanh) → GRU (32 units / tanh) → Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Output Dense (Linear)
MConv1D    | Conv1D (32 filters / ReLU) → Conv1D (64 filters / ReLU) → Conv1D (128 filters / ReLU) → Flatten → Dense (100 neurons / ReLU) → Dense (100 neurons / ReLU) → Output Dense (Linear)

Table 1: Summary of black-box neural network architectures used for control experiments.

path-dependent behavior by associating the previous stress state with the current one. The architecture consists of three Dense hidden layers (100 neurons each) with ReLU activation functions and an output Dense layer with a linear activation function.

The second architecture is a recurrent network that learns the path-dependent behavior in the form of time series. The architecture utilizes the Gated Recurrent Unit (GRU) layer formulation, a recurrent architecture introduced in (Cho et al., 2014) as a variation of the popular Long Short-Term Memory (LSTM) recurrent architecture (Gers et al., 1999). The GRU cell controls memory information by utilizing three gates (an update gate, a reset gate, and a current memory gate). This architecture takes a time series of strain as input with a history length ℓ. The variable ℓ is a network hyperparameter that is fine-tuned for optimal results, and it signifies the amount of information from the previous time steps that is taken into consideration to make a prediction for the current time step. Thus, a GRU network sample for time step n has as input the series of strain tensors [ε_{n−ℓ}, ..., ε_{n−1}, ε_n] and as output the stress tensor for the current step, σ_n. The architecture used in this work consists of two GRU hidden layers (32 recurrent units each) with a ReLU activation function, followed by two Dense layers (100 neurons) with ReLU activations and a Dense output layer with a linear activation function. The history variable was set to ℓ = 20.

The last architecture employed in the control experiments learns the path-dependent information from time series by extracting features through a 1D convolution filter. The convolution filter extracts higher-order features from time series of fixed length and has been used for time-series prediction (LeCun et al., 1995) and audio processing (Oord et al., 2016). The input of this architecture is the series of strain tensors [ε_{n−ℓ}, ..., ε_{n−1}, ε_n] and the output is the stress tensor for the current step in Voigt notation, σ_n. The 1D convolutional filter processes segments of the path-dependent time series data in a rolling-window manner and has length equal to ℓ. The architecture consists of three 1D convolution layers (32, 64, and 128 filters, respectively) with ReLU activation functions. The output features of the last convolutional layer are flattened and then fed into two consecutive Dense layers (100 neurons) with ReLU activations, followed by a Dense output layer with a linear activation function.

All the architectures were trained for 500 epochs with the Nadam optimizer and a batch size of 64. They were trained on different data sets to illustrate the comparisons with our elastoplasticity framework; the data sets are described in the context of the numerical experiments in Section 5.4. The hyperparameters of these architectures were fine-tuned through trial and error in an effort to provide optimal results and a fair comparison with our elastoplasticity framework to the best of our knowledge. The three black-box architectures are summarized in Table 1.

5 Numerical Experiments

In this section, we report the results of the numerical experiments that we conducted to verify the

implementation and evaluate the predictive capacity of the presented elastoplasticity ANN framework.

For brevity, some background materials and simple veriﬁcation exercises are placed in the Appendices.

In Section 5.1, we demonstrate how the training of the hyperelastic energy functional approximator can

beneﬁt by the use of higher-order activation functions and higher-order Sobolev constraints. In Section 5.2,

we demonstrate the training of the yield function level set neural networks and their approximation of the

evolving yield functions. In Section 5.4, we compare the three recurrent architectures of Section 4to our

20 Nikolaos N. Vlassis, WaiChing Sun

elastoplasticity framework as surrogate models for a polycrystal microstructure. Finally, in Section 5.5, we

demonstrate the ability of our framework to integrate into a ﬁnite element simulation by fully replacing

the elastic and plastic constitutive models with their data-driven counterparts.

5.1 Benchmark study 1: Higher-order Sobolev training of hyperelastic energy functional

In this numerical experiment, we compare the learning capacity of feed-forward neural networks trained on hyperelastic energy functional data. The training loss curves and the predictive capacity of the neural networks with different configurations of Multiply layers and different-order norms as loss functions are compared. The generation of the data sets is discussed in Appendix A.

Fig. 7: Training loss comparison of feed-forward architectures with a progressively larger number of Multiply layers with an H² training objective for (a) linear elasticity and (b) the Modified Cam-Clay hyperelastic law (Borja et al., 2001). As a higher degree of continuity is introduced in the network architecture, the stiffness prediction accuracy increases, since more control is allowed for the H² terms of the training objective.

The neural network models in this work are trained on two energy functional data sets for linear elasticity and non-linear elasticity (Eq. (59)) of 2500 sample points each. The points are sampled on a uniform grid of the strain invariant space (ε^e_v, ε^e_s). In the first part of this numerical experiment, we investigate the capability of the different feed-forward architectures to fulfill the higher-order Sobolev constraints. The layers' kernel weight matrices were initialized with a Glorot uniform distribution and the bias vectors with zeros; the repeatability of the training process is demonstrated in Fig. 28 of Appendix D. Other than the number of intermediate Multiply layers, all other hyperparameters are identical among all the architectures tested. The training objective is formulated according to Eq. (3). All the models were trained for 1000 epochs with a batch size of 32 using the Nadam optimizer (Dozat, 2016), set with default values in the KERAS library.

The performance of the different architectures (see Fig. 1) is compared and the results are shown in Fig. 7. Architecture dmmdmd consistently exhibits the best learning capacity and achieves the lowest loss in the energy, stress, and tangential stiffness calculations. As expected, architecture ddd fails in predicting the tangential stiffness due to the insufficient degree of continuity of its activation functions.


Fig. 8: Training loss comparison for L², H¹, and H² training objectives of an architecture with three Multiply layers (dmmdmd) for (a) linear elasticity and (b) the Modified Cam-Clay hyperelastic law (Borja et al., 2001). The H² training objective procures more accurate results than the L² and H¹ objectives for all of the energy, stress, and stiffness fields.

In the second numerical experiment, we investigate the predictive accuracy of the dmmdmd architecture (as shown in Fig. 1) trained via L², H¹, and H² norms. In all three cases, the neural network architecture and the training hyperparameters are identical to those used in the first numerical experiment. The results of these three training experiments can be seen in Fig. 8. The predictive capability of the model increases when higher-order Sobolev training is utilized, with the best overall scores procured for the H² norm-based training. Czarnecki et al. (2017) observed that constraining the H¹ terms in the loss function improves the accuracy of the function value prediction. Our results indicate that by constraining the H² terms, we improve the prediction of the function values along with the first-order and second-order derivatives of the function.

5.2 Benchmark Study 2: Training of yield function as a level set

In this numerical experiment, we demonstrate how to train the neural network to generate a yield function whose evolution is driven by signed distance function data interpreted from experiments. The yield function neural networks have a feed-forward architecture of a hidden Dense layer (100 neurons / ReLU), followed by two Multiply layers, then another hidden Dense layer (100 neurons / ReLU) and an output Dense layer (Linear). The layers' kernel weight matrix was initialized with a Glorot uniform distribution and the bias vector with zeros – the repeatability of the training process is demonstrated in Fig. 28 of Appendix D. All the models were trained for 2000 epochs with a batch size of 128 using the Nadam optimizer, set with default values. The neural networks were trained on a data set of J2 plasticity as well as data sets for 4 different polycrystal RVEs, as described in Appendix B. The training loss curves for this experiment with an $L_2$ training objective are shown in Fig. 10.

The ability to capture a yield surface directly from the data becomes crucial in materials such as polycrystal microstructures, where complex constitutive responses may manifest from spatial heterogeneity and grain boundary interactions. In Fig. 11, it is shown that polycrystal RVEs of the same size with different crystal orientations can have distinctive initial yield surfaces. Anticipating the geometry of the yield surface in the stress space and then handcrafting it with mathematical expressions would be a great undertaking and possibly futile – a change in the crystal properties would require deducing a geometric shape design from scratch.

Fig. 9: Comparison of the predictions of an $L_2$-trained ddd network and an $H_2$-trained dmmdmd network for the energy functional, stress, and stiffness measures of the Modified Cam-Clay hyperelastic law (Borja et al., 2001). The ddd architecture (piece-wise linear activation functions) can only predict the local second-order derivatives ($D_{11}$, $D_{22}$) to be equal to 0. The dmmdmd architecture, modified with Multiply layers, can capture these higher-order derivatives. The stress measure is in MPa.

Fig. 10: Training loss curves for the J2 plasticity and 4 different polycrystal RVEs' yield function level sets.

This numerical test indicates that the proposed framework may automate the discovery of new yield surfaces while bypassing the time-consuming hand-crafting process. This framework may also be extended to capture the plastic behavior of anisotropic materials if the six-dimensional stress space is used. This will be considered in future work by expanding the stress invariant input space to include orientations and possibly more descriptive plastic internal variables that are derived from the topology of the microstructures.

Fig. 11: Yield surface neural network predictions for three polycrystal RVEs (RVE 2, RVE 3, and RVE 4) with different crystal orientations.

5.2.1 Smoothed approximation of the non-smooth yield surfaces

Another useful feature of the proposed machine learning approach is that the Sobolev training can be used to generate a smoothed approximation of a multi-yield-surface system (e.g. Mohr-Coulomb, Tresca, crystal plasticity with multiple slip systems) on the π-plane. Classical non-smooth and multi-yield surface models often lead to sharp tips and corners of the yield surface that render the stress gradient of the yield function bifurcated. This is not only an issue for stability but also requires specialized algorithmic designs for the return mapping algorithm to function (cf. de Souza Neto et al. (2011)). As a result, there have been decades of efforts to hand-derive implicit functions that are smoothed approximations of well-known multi-yield surface models (e.g. Matsuoka and Nakai (1985); Abbo and Sloan (1995)). As shown in Fig. 11, the proposed Sobolev framework may automate this time-consuming process by simply using a combination of data points, activation functions, and loss functions to regularize the non-smooth yield surface. The resultant smoothed yield surface not only avoids the bifurcated stress gradient at the corners but also enables us to use the standard return mapping algorithm for implicit stress integrations without requiring any additional numerical treatment to handle the tips and corners of the yield surface.

Furthermore, the ML-derived evolution of the yield function is also capable of replicating complex hardening mechanisms not known a priori. In Fig. 12, we demonstrate how the neural network can predict a hardening mechanism that has not yet been discovered in the literature. In particular, this yield surface does not only change in size upon yielding but also deforms on the π-plane. Anticipating this mechanism and then deducing the corresponding mathematical expression to handcraft a hardening law is not trivial.

Fig. 12: Predicted signed distance function isocontours for three evolving yield surfaces of RVE 1 of a specimen undergoing increasing axial compression (left to right).

Our framework can interpret experimental data and deduce the optimal shapes and forms of a yield surface that evolves with the strain history without the aforementioned burdens. This is not only important for making more accurate predictions but also enables us to derive more precise plasticity models tailored to specific specimens or data sets beyond parameter calibrations.

5.3 Benchmark Study 3: Yield function training with higher-order constraints

In this numerical experiment, we demonstrate how to use the proposed training framework to enforce thermodynamic constraints and other desirable properties of the yield function via geometrical interpretation in the stress space. Our experimental data obtained from direct numerical simulations have been pre-processed into signed distance functions that evolve according to the accumulated plastic strain. The constraints we are interested in enforcing here are the unit gradient $|\nabla f| = 1$ and the non-negativity of the plastic dissipation.

5.3.1 Unit gradient constraint

The unit gradient constraint has been directly applied as an additional term in the loss function, and the results of the training for 4 different RVEs with and without the unit gradient $|\nabla f| = 1$ constraint are compared in Fig. 13. These two sets of yield function training show that the additional unit gradient constraint does not affect the learning capacity and the accuracy of the resultant yield function. Meanwhile, the training that explicitly enforces the unit gradient property is also at least three orders of magnitude more successful at fulfilling the unit gradient constraint than the counterpart that does not.
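A minimal sketch of such an Eikonal penalty term is given below, assuming the level set network takes stress-space coordinates as inputs and that the penalty is the mean squared deviation of $|\nabla f|$ from one; the small tanh network and the penalty weight are placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder signed distance network f(x); the real model is the yield
# function architecture described in Section 5.2.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(1),
])

def eikonal_penalty(x, w_eik=1.0):
    """Penalize deviation of |grad f| from 1 over a batch of sample points."""
    x = tf.convert_to_tensor(x, tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        f = model(x)
    grad = tape.gradient(f, x)             # per-sample gradient, (N, 2)
    grad_norm = tf.norm(grad, axis=-1)     # |grad f|
    return w_eik * tf.reduce_mean((grad_norm - 1.0) ** 2)

x = np.random.rand(16, 2).astype("float32")
print(float(eikonal_penalty(x)) >= 0.0)  # True
```

In training, this term would simply be added to the signed-distance regression loss with a chosen weight.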

5.3.2 Convexity of the yield function

To enforce non-negative plastic dissipation, we enforce the convexity of the yield function through an inequality constraint via Equation (46). The additional term in the loss function only activates when the thermodynamic inequality is violated. By increasing the value of the loss function, it penalizes predictions that violate the convexity conditions during training. During the training phase of the numerical experiments presented in this paper, the penalty term did not activate. Nevertheless, the penalty term in the loss function is still employed as a safeguard to prevent the violation of the thermodynamic constraints.
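Since Equation (46) is not reproduced in this section, the snippet below only sketches the generic shape of such a one-sided safeguard: a hinge-style term that is identically zero while an inequality indicator $g \geq 0$ holds and grows quadratically once it is violated:

```python
import numpy as np

def inequality_penalty(g_values, weight=10.0):
    """Hinge penalty: zero when g >= 0, quadratic growth when g < 0.

    `g_values` stands in for the convexity indicator of Eq. (46); the
    weight is an illustrative choice.
    """
    violation = np.maximum(0.0, -np.asarray(g_values, dtype=float))
    return weight * np.mean(violation ** 2)

print(inequality_penalty([0.5, 1.0]))         # 0.0 (constraint satisfied)
print(inequality_penalty([-0.1, 0.2]) > 0.0)  # True (violation penalized)
```

Because the term vanishes on the feasible set, it leaves the training untouched unless a prediction drifts toward a non-convex configuration.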


Fig. 13: Training loss (LEFT) and re-initialization condition loss (RIGHT) comparison for four polycrystal

RVEs’ yield functions with and without enforcing the Eikonal equation unit gradient constraint.


Fig. 14: Convexity check for randomly sampled stress points from the polycrystal RVE dataset.

We expect that this safeguard will be helpful in future work where we extend our current framework to experimental data or to anisotropic materials for which the visual inspection of the convexity in the principal stress space or on the π-plane is no longer feasible. A verification of the convexity is performed and the results are shown in Fig. 14, where material states were randomly sampled from the polycrystal RVE database to test whether the inequality (46) is violated.

5.3.3 Non-negative plastic dissipation for non-associative plasticity

In the case where the plastic flow is non-associative, we enforce the additional rule expressed in Eq. (52) to ensure the non-negativity of the plastic dissipation for isotropic materials. The remaining training parameters of the network are identical to the ones used for the yield function learning in Section 5.2. To verify that the constraint is preserved, we perform a random check by sampling stress points on the π-plane and computing the plastic dissipation (see Fig. 16 (a)). In this case, having the safeguard incorporated into the loss function ensures that the plastic dissipation is either zero or positive. Furthermore, there are noticeable differences between the optimal plastic flow and the stress gradient of the yield function, which indicate the need to introduce a non-associative plastic flow in this prediction task (see Fig. 16 (b)).


Fig. 15: The ML-predicted plastic flow of the polycrystal RVE for increasing accumulated plastic strain ($\bar{\epsilon}^p$ equal to 0.01, 0.03, and 0.08).

Fig. 16: (a) The plastic flow rule is checked for non-negative dissipation (the points correspond to the different accumulated plastic strain levels of Fig. 15). (b) $L_2$ norm comparison of the predicted plastic flow direction between the yield function neural network ($\hat{f}_A$) and the plastic flow neural network ($\hat{g}_A$).

5.4 Application 1: Surrogate model comparisons for polycrystal RVEs

In this example, we compare the performance of the elastoplastic model with Hamilton-Jacobi hardening (introduced in Section 2) with the trained black-box models obtained from the other commonly used recurrent neural network architectures introduced in Section 4. Our goal is to create a model to predict the upscaled elasto-plastic responses of a polycrystal RVE undergoing unseen loading paths. The data set generation for the neural network approximator $\hat{f}$ is described in Appendix B.

To conduct a fair and systematic comparison, we have trained, tested, and compared the recurrent models with different amounts of data generated from loading paths of various types of complexity. It is noted that the database used for the training of $\hat{f}$ will not be extended further than the 140 cases of monotonic loading described in Appendix B.

Fig. 17: Stress path in the π-plane for (a) a loading-unloading pattern and (b) a cyclic loading path. The yield surface neural network predicts the consecutive yield surfaces for different levels of hardening. (a) The points A, B, C, D, and E correspond to the strain-stress curve of Fig. 19. (b) The points F, G, H, I, J, K, and L correspond to the strain-stress curve of Fig. 20.

5.4.1 Predicting cyclic responses from monotonic data

Initially, we train all the recurrent architectures with the 140 cases of monotonic loading paths (with 200 to 400 deformation states per case), sampled radially from the π-plane. The same set of data is also used to train the approximator $\hat{f}$. All models are expected to perform adequately well in blind predictions for monotonic testing cases, as they have been trained for these simple patterns, as shown in Fig. 18 (a). However, the black-box recurrent networks fail to recover even a single unloading and reloading path, as seen in Fig. 18 (b), whereas our proposed interpretable model is able to recover loading and unloading patterns quite well even though it was only trained on monotonic loading paths. This success is due to the existence of a yield surface that enables our proposed model to detect unloading and, hence, trigger the right elastic unloading responses, even in the absence of unloading data. While the lack of unloading data may still affect the accuracy of the ML-generated hardening mechanism, the prediction of the proposed ML plasticity model is still more robust than the black-box counterpart.

5.4.2 Predicting cyclic responses from cyclic data

Fig. 18: Comparison of black-box neural network architectures trained on monotonic data with our Hamilton-Jacobi hardening elastoplastic framework (introduced in Section 2). The black-box models can capture the monotonic loading path (a) but cannot capture any unloading paths (b). Our framework can capture both even though it has only seen monotonic data. The stress measure is in kPa.

In the second numerical experiment, we increase the complexity of the database that the recurrent neural networks are trained on. Following the same loading path angles as a basis on the π-plane, we generate cases that now include complex unloading and reloading paths. We thus allow the recurrent architectures to be exposed to the previously missing elastic unloading paths. For every loading direction, we randomly assign from 1 to 3 unloading and reloading paths, with the unloading target strain also randomly chosen each time. Using this method, we double the number of sampling points of the initial cases by adding random unloading and reloading patterns to retrain the recurrent architectures. The performances of all these models are again compared. The results for three testing cases with cyclic loadings are shown in Fig. 19.

In the third comparison experiment, we examine the accuracy and robustness of the predictions of constitutive responses for the polycrystal specimen subjected to cyclic loading and unloading paths that span both tensile and compressive directions. The results for the cyclic testing can be seen in Fig. 20. As expected, the black-box models – even with an extended training data set – fail to capture the cyclic behaviors. The proposed ML elasto-plasticity model – even when only trained with monotonic data – exhibits very good accuracy and robustness of predictions on the cyclic behaviors for the polycrystal plasticity.

5.5 Application 2: Finite element simulations with machine learning-derived polycrystal plasticity models

The return mapping algorithm of our elastoplastic neural network framework, described in Section 3, is implemented in a series of benchmark finite element simulations. The goal of these computational examples is to demonstrate the framework's ability to be integrated into multi-scale simulations. By predicting the homogenized elastoplastic response of the microstructure, we can enable offline hierarchical multi-scale predictions that are much faster than an FE$^2$ approach without compromising the accuracy and robustness.

We perform the finite element quasi-static simulation of the macroscopic monotonic uniaxial displacement of a bar depicted in Fig. 21. The domain is symmetric along the horizontal and vertical axes and the elasticity and plasticity models used are isotropic; thus, we model one quarter of the domain to predict the symmetric behavior. The domain is meshed with 3800 triangular elements with an average side length of $6.75 \times 10^{-4}$ meters. The displacement $u$ is applied at the boundaries as shown in Fig. 21 in increments of $\Delta u = 5 \times 10^{-5}$ meters. The microscopic elastoplastic behavior of every material point in the mesh is predicted by an elastic energy functional neural network and a yield function signed distance function neural network, integrated by the return mapping algorithm of Section 3.


Fig. 19: Comparison of black-box neural network architectures trained on random loading-unloading data with our Hamilton-Jacobi hardening elastoplastic framework. Three different cases of loading-unloading are demonstrated (a, b, and c). The black-box models can capture loading-unloading behaviors better than the ones trained on monotonic data (Fig. 18) but still may show difficulty capturing some unseen unloading paths. Our framework appears to be more robust in loading-unloading path predictions – even though it is only trained on monotonic data. The stress measure is in kPa.

As the first numerical verification exercise, we combine a quadratic energy functional of linear elasticity and a J2 plasticity yield function with isotropic hardening. The neural networks and their training are described in Section 5.1 for the elastic response and in Section 5.2 and Appendix D for the plastic response. The goal displacement for the uniaxial loading simulation is $u_{goal} = 5.5 \times 10^{-3}$ meters. The results at the goal displacement for the benchmark solution and our elastoplastic Hamilton-Jacobi hardening framework are demonstrated in Fig. 22 and appear to be in close agreement.

In the second numerical experiment, we simulate the behavior under uniaxial loading of the domain in which the material points represent a polycrystalline microstructure. The elastoplastic framework in this simulation consists of a quadratic hyperelastic energy functional neural network and a polycrystal yield function, the training procedures of which are described in Sections 5.1 and 5.2, respectively. Both networks were trained on FFT simulation data, as described in Appendix B, to predict the homogenized elastoplastic behavior of the polycrystal.

Fig. 20: Comparison of black-box neural network architectures trained on random loading-unloading data with our Hamilton-Jacobi hardening elastoplastic framework for a cyclic loading path. The stress measure is in kPa.

Fig. 21: Macroscopic structure and boundary conditions used in the finite element simulations. The domain is symmetric along the horizontal and vertical axes so only one quarter of the domain is modeled. The units are in mm.

Fig. 22: Von Mises stress (TOP) and accumulated plastic strain (BOTTOM) for the benchmark J2 plasticity (LEFT) and the neural network J2 yield function (RIGHT) FEM simulations. The stress measure is in kPa.

Fig. 23: Von Mises stress (TOP) and accumulated plastic strain (BOTTOM) for the neural network polycrystal yield function FEM simulations. The stress measure is in kPa.

Fig. 24: Von Mises stress curves (LEFT) and stress paths on the π-plane (RIGHT) for Points A and B of the domain of Fig. 23. The stress measure is in kPa.

Previously, the multiscale polycrystal constitutive behavior has been captured through a coupling of the FFT and FEM methods (e.g. Kochmann et al. (2016, 2018)). However, the efficiency of these methods depends on the heterogeneity of the polycrystal, as it can affect the computational cost of the simulations for a large number of crystals, and on the stability when there are sharp material property differences. In the current work, there is no need for online FFT simulations to be run in parallel with the FEM simulations. The neural network training databases for the elasticity and the plasticity are built separately offline from a discrete number of FFT simulations, and the trained networks interpolate the behaviors and make blind predictions during the FEM simulation. The results for the Von Mises stress and the accumulated plastic strain at the displacement goal of $u_{goal} = 6.5 \times 10^{-3}$ meters are demonstrated in Fig. 23. The stress curves and stress paths on the π-plane for two points of the domain are also demonstrated in Fig. 24.


6 Conclusions

The history of plasticity theory is influenced by the geometrical interpretations of mechanics concepts in different parametric spaces (de Saint-Venant, 1870; Lode, 1926; Hill, 1998; Rice, 1971). Forming a vector space that uses different invariants or measures of stress as orthogonal bases has helped us understand yielding and the subsequent hardening and softening through easier visualization. However, these new mechanisms often take decades to be discovered and adopted by the mechanics community. In this work, our contributions are two-fold. First, we leverage the geometrical interpretation of plasticity theory to establish a connection between the elastoplasticity and level set theories. Second, we introduce a new variety of deep machine learning that is designed to train functionals with sufficient smoothness. By using higher-order training to regularize the continuity and smoothness of the energy functional, the yield function, the flow rules, and the hardening mechanisms, we create a framework that retains the simplicity afforded by the geometrical interpretation of the models without limiting our choices of elasticity, yield function, and hardening mechanisms. Thermodynamic constraints can be easily checked and introduced, as the machine learning generated models are now geometrically interpretable. Finally, the most significant part of this research is that it provides a generalized framework where the yield function may take any arbitrary shape and evolve in any generic way that optimizes the quality of the predictions. As shown in the paper, the level set framework may manifest many classical plasticity models when given the corresponding data, and it may also introduce new yield surfaces and hardening laws that are difficult to hand-craft. Compared with the predictions obtained via the black-box neural networks, the proposed machine learning framework may yield more accurate, robust, and interpretable predictions.

A Appendix: Data generation for the hyperelasticity benchmark

In this work, the numerical experiments (Section 5) are performed on synthetic data sets generated for two small strain hyperelastic laws. One of them is isotropic linear elasticity. The second is a small-strain hyperelastic law designed for the Modified Cam-Clay plasticity model (Roscoe and Burland, 1968; Houlsby, 1985; Borja et al., 2001). The hyperelastic energy functional allows full coupling between the elastic volumetric and deviatoric responses and is described as:

$$\psi(\epsilon_v^e, \epsilon_s^e) = -p_0 \kappa \exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right) - \frac{3}{2} c_\mu p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right) (\epsilon_s^e)^2, \quad (59)$$

where $\epsilon_{v0}$ is the initial volumetric strain, $p_0$ is the initial mean pressure when $\epsilon_v = \epsilon_{v0}$, $\kappa > 0$ is the elastic compressibility index, and $c_\mu > 0$ is a constant. The hyperelastic energy functional is designed to describe an elastic compression law where the equivalent elastic bulk modulus and the equivalent shear modulus vary linearly with $-p$, while the mean pressure $p$ varies exponentially with the change of the volumetric strain $\Delta\epsilon_v = \epsilon_{v0} - \epsilon_v$. The specifics and the utility of this hyperelastic law are outside the scope of the current work and will be omitted. The numerical parameters of this model were chosen as $\epsilon_{v0} = 0$, $p_0 = -100$ kPa, $c_\mu = 5.4$, and $\kappa = 0.018$. Taking the partial derivatives of the energy functional with respect to the strain invariants, the stress invariants are derived as:

$$p = \frac{\partial \psi}{\partial \epsilon_v^e} = p_0 \left(1 + \frac{3 c_\mu}{2\kappa} (\epsilon_s^e)^2\right) \exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right), \quad (60)$$

$$q = \frac{\partial \psi}{\partial \epsilon_s^e} = -3 c_\mu p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right) \epsilon_s^e. \quad (61)$$

The components of the symmetric stiffness Hessian matrix $\mathbf{D}^e$ are derived by taking the second-order partial derivatives of the energy functional with respect to the two strain invariants:

$$D^e_{11} = \frac{\partial^2 \psi}{\partial (\epsilon_v^e)^2} = -\frac{p_0}{\kappa}\left(1 + \frac{3 c_\mu}{2\kappa}(\epsilon_s^e)^2\right)\exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right),$$

$$D^e_{22} = \frac{\partial^2 \psi}{\partial (\epsilon_s^e)^2} = -3 c_\mu p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right),$$

$$D^e_{12} = D^e_{21} = \frac{\partial^2 \psi}{\partial \epsilon_v^e \, \partial \epsilon_s^e} = \frac{3 p_0 c_\mu \epsilon_s^e}{\kappa}\exp\left(\frac{\epsilon_{v0} - \epsilon_v^e}{\kappa}\right). \quad (62)$$
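These closed-form derivatives can be spot-checked numerically; the sketch below compares central finite differences of the energy functional of Eq. (59) against the closed-form stress invariants of Eqs. (60)-(61), using the parameter values listed above (the evaluation point is an arbitrary choice):

```python
import math

# Parameters from Appendix A (p0 in kPa).
p0, kappa, c_mu, eps_v0 = -100.0, 0.018, 5.4, 0.0

def psi(ev, es):
    """Modified Cam-Clay hyperelastic energy functional, Eq. (59)."""
    e = math.exp((eps_v0 - ev) / kappa)
    return -p0 * kappa * e - 1.5 * c_mu * p0 * e * es**2

def p_closed(ev, es):
    """Mean pressure, Eq. (60)."""
    return p0 * (1.0 + 3.0 * c_mu / (2.0 * kappa) * es**2) * math.exp((eps_v0 - ev) / kappa)

def q_closed(ev, es):
    """Deviatoric stress, Eq. (61)."""
    return -3.0 * c_mu * p0 * math.exp((eps_v0 - ev) / kappa) * es

# Central finite differences stand in for the analytic partial derivatives.
h = 1e-7
ev, es = -0.01, 0.004
p_fd = (psi(ev + h, es) - psi(ev - h, es)) / (2 * h)
q_fd = (psi(ev, es + h) - psi(ev, es - h)) / (2 * h)
print(abs(p_fd - p_closed(ev, es)) < 1e-3)  # True
print(abs(q_fd - q_closed(ev, es)) < 1e-3)  # True
```

The same differencing applied to `p_closed` and `q_closed` would reproduce the Hessian components of Eq. (62).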

B Appendix: Data generation for polycrystal yield function

Here we provide a brief account of the direct numerical simulations that generate the data set for the ML-generated plasticity model in Section 5. The numerical specimen is a polycrystal assembly consisting of 49 face-centered cubic crystal grains. The crystal orientations are randomly generated using the open-source software MTEX (Bachmann et al., 2010). The crystal orientation distribution is demonstrated in Fig. 25 along with the crystal volume distribution. Direct numerical simulations are performed on this numerical specimen by solving the Lippmann-Schwinger equation using the FFT spectral method with periodic boundary conditions (Ma and Sun, 2019, 2020). The resultant stress field and plastic deformation are homogenized, and these homogenized responses constitute the material database used for the training of the machine learning plasticity model.

The elasticity model of the polycrystals is linear elasticity with a Young's modulus of $E = 2.0799$ MPa and a Poisson ratio of $\nu = 0.3$. The material's plastic behavior was calculated using the ultimate algorithm for crystal plasticity (Borja and Wren, 1993). The model has 12 linearly independent slip systems with a yield stress of 100 kPa and a hardening modulus of 100 kPa. An FFT elastoplastic simulation is performed radially for each of 140 different Lode's angles spanning the π-plane.

Fig. 25: (a) Crystal orientations and (b) crystal volume distribution of the polycrystal RVE used for the elastoplasticity database generation.

Each data point generated by the FFT simulations is stored in a cylindrical coordinate system with positions specified by a radius $r$, an angle $\theta$, and an accumulated plastic strain $\bar{\epsilon}^p$. For every generated sample point $(r_o, \theta_o)$ on the π-plane, we construct 14 signed distance function training points, distributed uniformly along the radial direction within a distance range of $\pm r_o$ from the point $(r_o, \theta_o)$. The size of the constructed signed distance function corresponds to the parameters $L_{levels} = 15$ and $z = 2$ in Algorithm 2.

After generating the points of the signed distance function, every point has a corresponding output value equal to the signed distance function $f(r_o, \theta_o, \bar{\epsilon}^p_o)$ for that point. In this way, all the signed distance function points on an isocontour will have the same output value. This proved to be an obstacle for the back-propagation during the neural network training – many input combinations correspond to the same output value. To increase the variation of the output values of each sample during training, we introduce a helper transformation function $z(r, \theta)$ on the output values in the data pre-processing step. Thus, during training, every signed distance function input sample point $(r_o, \theta_o, \bar{\epsilon}^p_o)$ is mapped to an output value:

$$f_z(r_o, \theta_o, \bar{\epsilon}^p_o) = f(r_o, \theta_o, \bar{\epsilon}^p_o) + z(r_o, \theta_o). \quad (63)$$

During the prediction step, the true value of the signed distance function can be recovered by subtracting the known value of $z(r_o, \theta_o)$ from the prediction output. The helper function in this work was chosen as $z(r, \theta) = 2\bar{r}\cos(\theta/3)$, where $\bar{r}$ is the mean value of the radii in the yield function data set.
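A minimal sketch of this output shift and its inversion (Eq. (63)) is given below; the mean radius value used in the round-trip check is hypothetical:

```python
import math

def z_helper(r, theta, r_bar):
    """Helper transformation z(r, theta) = 2 * r_bar * cos(theta / 3) (Appendix B)."""
    return 2.0 * r_bar * math.cos(theta / 3.0)

def to_training_output(f_value, r, theta, r_bar):
    """Map a signed distance value to the shifted training target, Eq. (63)."""
    return f_value + z_helper(r, theta, r_bar)

def from_prediction(fz_pred, r, theta, r_bar):
    """Recover the true signed distance from a network prediction."""
    return fz_pred - z_helper(r, theta, r_bar)

# Round trip: the transformation is exactly invertible.
r_bar = 150.0            # mean radius of the data set (hypothetical value)
f, r, theta = -3.2, 120.0, math.pi / 4
fz = to_training_output(f, r, theta, r_bar)
print(abs(from_prediction(fz, r, theta, r_bar) - f) < 1e-12)  # True
```

Because the shift depends only on the inputs $(r, \theta)$ and not on the network, it changes the regression targets without altering the underlying level set geometry.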

C Appendix: Veriﬁcation exercise with custom hardening

The plasticity components of the neural network elastoplasticity framework can be further decomposed by separating the initial yield surface from its evolution – the hardening law. We introduce a method to apply custom hardening laws to the neural network approximated yield functions. The initial yield surface is controlled by a neural network of the form $\tilde{f}(r, \theta)$ with only the Lode's coordinates as inputs. The hardening is handled by a separate hardening law. In the plasticity literature, hardening is usually implemented by transforming the yield surface – changing the yield stress value. However, in the case of our neural network yield function approximation, the yield stress is not explicitly defined and cannot be immediately modified. To overcome this obstacle, we apply the desired hardening laws to the neural network input instead of to the assumed yield stress. Specifically, we define a hardening law as a transformation $L$ of the original Lode's coordinates $r$ and $\theta$, such that:

$$L(r, \theta, \xi) = \left(L_r(r, \theta, \xi), L_\theta(r, \theta, \xi)\right) = (r_L, \theta_L), \quad (64)$$

where $L_r(r, \theta, \xi)$ and $L_\theta(r, \theta, \xi)$ are the parametric equations that transform $r$ and $\theta$ into the input variables $r_L$ and $\theta_L$, respectively, after hardening, and $\xi$ is an internal hardening variable.

Common hardening laws from the literature can be translated into input transformations of this type and applied to the neural network yield functions through geometric interpretation. For example, in the simple case of isotropic hardening of the Von Mises plasticity model, hardening in the π-plane can be interpreted as the dilation of the circular yield surface – i.e. an increase of the radius $r_y$ at which yielding occurs. In the case of a neural network approximating the Von Mises yield function, the value of the current $r_y$ would not be readily available to modify. For that reason, instead of increasing the yield radius $r_y$, we opt for decreasing the input radius $r$ of the neural network by an equivalent amount. The transformed radius $\bar{r}$ is defined as:

$$\bar{r} = \bar{L}_r(r, \bar{\epsilon}^p) = r - \sqrt{\frac{2}{3}}\, H \bar{\epsilon}^p, \quad (65)$$

where $H$ is the material's identified hardening modulus. Any custom hardening model can be applied with the right conversion to an input transformation. This allows for even more flexibility when assembling the theoretical components of the elastoplastic framework. The hyperelastic energy functional, the initial yield surface, and the hardening law are independent of each other and can be separately replaced. Furthermore, being able to assign a hardening law as a separate process in the data-driven yield function could prove valuable when only information on the initial yield surface is available in the data.
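The input-transformation idea can be sketched directly; the functions below implement Eq. (65) and the direction-dependent variant shown in Fig. 26 (b) and (d), with placeholder values for the radius, the hardening modulus, and the accumulated plastic strain:

```python
import math

def isotropic_transform(r, eps_p_bar, H):
    """r_bar = r - sqrt(2/3) * H * eps_p_bar, Eq. (65): uniform dilation."""
    return r - math.sqrt(2.0 / 3.0) * H * eps_p_bar

def mixed_mode_transform(r, theta, eps_p_bar, H):
    """Dilation plus elongation toward theta = pi/6, as in Fig. 26 (b), (d)."""
    factor = 1.0 + math.cos(theta - math.pi / 6.0) ** 2
    return r - math.sqrt(2.0 / 3.0) * H * eps_p_bar * factor

# With no accumulated plastic strain the input is unchanged (initial surface).
print(isotropic_transform(100.0, 0.0, H=100.0))  # 100.0
# Hardening shrinks the input radius seen by the yield function network.
print(isotropic_transform(100.0, 0.03, H=100.0) < 100.0)  # True
```

The transformed radius would then be fed into the trained initial yield surface network $\tilde{f}(\bar{r}, \theta)$ in place of the raw Lode radius.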

A few different cases of custom hardening transformations are demonstrated in Fig. 26. The initial yield surfaces are predicted by a neural network approximator; all the approximated points have an accumulated plastic strain $\bar{\epsilon}^p = 0$. Fig. 26 (a) and (c) showcase simple isotropic hardening cases emulated by reducing the neural network input radius $r$ uniformly for all Lode's angles $\theta$ on the $\pi$-plane. This hardening mechanism can be geometrically interpreted as a dilation of the initial yield surface. Fig. 26 (b) and (d) showcase two modes of hardening acting simultaneously: a dilation and an elongation of the initial yield surface towards a preferred direction.

In the current formulation, the neural network elastoplastic framework can consist of any isotropic hyperelastic energy functional and isotropic yield function. To demonstrate the framework's capability to capture non-linear behaviors, we have implemented a fictitious highly non-linear energy functional and a fictitious non-linear


Fig. 26: Custom hardening transformations of initial neural network yield surfaces: (a) $\bar{r} = r - \sqrt{2/3}\, H \bar{\epsilon}^p$; (b) $\bar{r} = r - \sqrt{2/3}\, H \bar{\epsilon}^p \left(1 + \cos^2(\theta - \pi/6)\right)$; (c) $\bar{r} = r - \sqrt{2/3}\, H \bar{\epsilon}^p$; (d) $\bar{r} = r - \sqrt{2/3}\, H \bar{\epsilon}^p \left(1 + \cos^2(\theta - \pi/6)\right)$. Transformations (a) and (c) emulate simple isotropic hardening (dilation of the yield surface). Transformations (b) and (d) emulate a mixed-mode hardening mechanism (dilation and change of shape). The transformations are implemented by modifying the neural network input radius $r$.

custom hardening law. The energy functional neural network is trained on a data set based on a modification of the linear elastic energy functional, with the shear part replaced by a highly non-linear term:
\[
\breve{\psi}\left(\epsilon^e_v, \epsilon^e_s\right) = \frac{1}{2} K \left(\epsilon^e_v\right)^2 + \frac{3}{2} G \left(\epsilon^e_s\right)^4. \tag{66}
\]

The non-linear hardening law is implemented by applying a transformation to the Lode's radius input of the Von Mises yield function neural network. The hardening law $\breve{\mathcal{L}}$ provides a transformed radius:
\[
\breve{r} = \breve{\mathcal{L}}_r\left(r, \bar{\epsilon}^p\right) = r \left(1 - \left(\bar{\epsilon}^p\right)^2\right)^6. \tag{67}
\]
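A sketch of Eq. (67) as an input transformation (the function name is an assumption) makes its behavior explicit: the scaling factor is 1 at $\bar{\epsilon}^p = 0$ and decays rapidly, so the input radius shrinks and the apparent yield surface dilates:

```python
def transformed_radius(r, ep_bar):
    """Nonlinear hardening law of Eq. (67): the Lode radius input of the
    yield-function network is scaled by (1 - ep_bar**2)**6 before the
    network is evaluated."""
    return r * (1.0 - ep_bar**2) ** 6
```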

The prediction of the framework is demonstrated in Fig. 27. The framework provides great flexibility in decomposing the material behavior into the elasticity, the yield surface, and the hardening law, all of which can be individually replaced. This also allows for a combination of data-driven and handcrafted laws that can be tuned to closely replicate observed material behaviors.

D Appendix: Veriﬁcation exercise on learning classical J2 plasticity with isotropic hardening

As a part of the veriﬁcation exercise, we also test whether the proposed framework is able to deduce an

elasto-plasticity model with linear elasticity and Von Mises plasticity with isotropic hardening. The elasto-


Fig. 27: Monotonic loading (LEFT), loading-unloading (MIDDLE), and cyclic loading (RIGHT) path predictions for a fictitious non-linear energy functional and hardening elastoplasticity neural network framework. The initial yield surface is predicted by the Von Mises yield surface neural network. The ANN elastoplastic framework can handle highly non-linear hyperelastic energy functionals and custom hardening laws. The stress measure is in kPa.

plastic ANN framework consists of a neural network approximating the linear elastic energy functional and a yield function neural network that approximates a Von Mises yield surface. The hardening law of the system is implemented in two different ways to demonstrate the flexibility of the framework: similar to Section 5, or following the custom hardening method of Section C (Eq. (65)). The material has a Young's modulus of $E = 2.0799$ MPa, a Poisson's ratio of $\nu = 0.3$, an initial yield stress of 100 kPa, and a hardening modulus of $H = 0.1E$.

To demonstrate the repeatability of the training process, we perform the training of the neural networks that represent a linear elasticity energy functional and a J2 yield function with 5 different random seeds. The training loss functions for these training experiments are demonstrated in Fig. 28. The neural network initialization appeared to have minimal effect on the training results.
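The $H^2$ training objective tracked in Fig. 28 penalizes the misfit of the energy together with its first (stress) and second (stiffness) derivatives. A minimal sketch of such a Sobolev loss (the weights and array shapes are assumptions; in practice the derivative terms come from automatic differentiation of the energy network) is:

```python
import numpy as np

def sobolev_h2_loss(psi_pred, psi_true, sig_pred, sig_true,
                    C_pred, C_true, w=(1.0, 1.0, 1.0)):
    """H^2-style Sobolev objective: mean-squared error on the energy plus
    penalty terms on its first (stress) and second (stiffness) derivatives."""
    loss_psi = np.mean((np.asarray(psi_pred) - np.asarray(psi_true)) ** 2)
    loss_sig = np.mean((np.asarray(sig_pred) - np.asarray(sig_true)) ** 2)
    loss_C = np.mean((np.asarray(C_pred) - np.asarray(C_true)) ** 2)
    return w[0] * loss_psi + w[1] * loss_sig + w[2] * loss_C
```

Constraining the derivatives in this way is what gives the learned energy functional a smooth, thermodynamically usable stress and tangent response rather than merely a good pointwise fit.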

The comparison of the neural network elastoplastic framework with three benchmark simulations is shown in Fig. 29. The framework is tested against a monotonic loading path, a loading path with multiple unloading patterns, and a cyclic loading path. The framework can adequately capture loading and unloading patterns it has not been explicitly trained on.
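For reference, the classical algorithm that generates such a benchmark can be sketched in one dimension with the parameters above ($E = 2079.9$ kPa, $\sigma_y = 100$ kPa, $H = 0.1E$). This is a hedged illustration of the standard return-mapping scheme, not the benchmark implementation used in the paper:

```python
import numpy as np

E = 2079.9    # Young's modulus, kPa (2.0799 MPa)
SY0 = 100.0   # initial yield stress, kPa
H = 0.1 * E   # hardening modulus

def uniaxial_j2(strains):
    """1D return mapping for J2 plasticity with linear isotropic hardening.
    Returns the stress at each prescribed strain."""
    eps_p, ep_bar = 0.0, 0.0  # plastic strain and accumulated plastic strain
    stresses = []
    for eps in strains:
        sig_tr = E * (eps - eps_p)               # elastic trial stress
        f_tr = abs(sig_tr) - (SY0 + H * ep_bar)  # trial yield function
        if f_tr > 0.0:                           # plastic step: radial return
            dgamma = f_tr / (E + H)
            eps_p += dgamma * np.sign(sig_tr)
            ep_bar += dgamma
        stresses.append(E * (eps - eps_p))
    return np.array(stresses)
```

For monotonic loading with linear hardening, this scheme reproduces the closed-form post-yield stress $\sigma = E(H\epsilon + \sigma_y)/(E + H)$, which is the curve the ANN framework is compared against.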

5 Acknowledgments

The authors would like to thank Dr. Ran Ma for providing the implementation of the polycrystal microstructure generation, the FFT solver, and the information for Figure 25. The authors are supported by the NSF CAREER grant from the Mechanics of Materials and Structures program at the National Science Foundation under grant contracts CMMI-1846875 and OAC-1940203, and by the Dynamic Materials and Interactions Program from the Air Force Office of Scientific Research under grant contracts FA9550-17-1-0169 and FA9550-19-1-0318. This support is gratefully acknowledged. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsors, including the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

6 Data availability

The data that support the ﬁndings of this study are available from the corresponding author upon

request.


Fig. 28: Training loss comparison for (a) the energy, (b) the stress, and (c) the stiffness terms of an $H^2$ training objective of a dmmdmd architecture for linear elasticity, and (d) the yield function of a J2 plasticity neural network, each with 5 different random seeds.


Fig. 29: Comparison of the neural network elastoplastic framework (linear elasticity and Von Mises plasticity) with benchmark simulation data for monotonic (LEFT), loading-unloading (MIDDLE), and cyclic loading (RIGHT) paths. TOP: The yield function NN replaces the yield function and the hardening law. BOTTOM: The yield function NN predicts only the initial yield surface and a custom identified hardening law is applied. The stress measure is in kPa.
