Component-based machine learning paradigm
for discovering rate-dependent and
pressure-sensitive level-set plasticity models
Nikolaos N. Vlassis
Postdoctoral Research Scientist
Department of Civil Engineering and Engineering Mechanics
Columbia University
New York, New York 10027
WaiChing Sun
Associate Professor
Department of Civil Engineering and Engineering Mechanics
Columbia University
New York, New York 10027
Conventionally, neural network constitutive laws for path-dependent elasto-plastic solids are trained via supervised learning performed on recurrent neural networks, with the time history of strain as input and the stress as output. However, training a neural network to replicate path-dependent constitutive responses requires significantly more data due to the path dependence. This demand for diverse and abundant accurate data, as well as the lack of interpretability to guide the data generation process, could become major roadblocks for engineering applications. In this work, we attempt to simplify these training processes and improve the interpretability of the trained models by breaking down the training of material models into multiple supervised machine learning programs for elasticity, initial yielding, and hardening laws that can be conducted sequentially. To predict pressure-sensitivity and rate dependence of the plastic responses, we reformulate the Hamilton-Jacobi equation such that the yield function is parametrized in a product space spanned by the principal stress, the accumulated plastic strain, and time. To test the versatility of the neural network meta-modeling framework, we conduct multiple numerical experiments where neural networks are trained and validated against (1) data generated from known benchmark models, (2) data obtained from physical experiments, and (3) data inferred from homogenizing sub-scale direct numerical simulations of microstructures. The neural network model is also incorporated into an offline FFT-FEM model to improve the efficiency of the multiscale calculations.
Corresponding author
1 Introduction
One of the century-old challenges for mechanics researchers is to formulate plasticity theories that predict the relationship among strain history, plastic deformation, and stress for materials governed by different deformation mech-
anisms. As plastic deformation accumulates, the dissipation
and plastic work may lead the yielding criteria to evolve, and
cause a variety of hardening/softening mechanisms to mani-
fest as the evolution of microstructures, such as twinning [1],
dislocation [2], pore collapse [3], void nucleation [4], and re-
arranging of particles [5]. Generations of scholars, including Coulomb [6], von Mises [7], and Drucker and Prager [8], spent decades creating new plasticity theories to incorporate new causality relations and hypotheses for path-dependent materials. In stress-based plasticity theories, the yield function is expressed as a function of stress and internal variables, with the hardening laws (e.g. isotropic, kinematic, rotational, or mixed-mode hardening) deduced from experimental observations and sub-scale micro-mechanical simulations (e.g. dislocation dynamics, molecular simulations).
In the past decades, new plasticity models have often been generated by modifying existing models with different expressions for the yield functions or the hardening laws. For instance, a search in Google Scholar for "modified Johnson-Cook model" and "modified Gurson model" reveals more than 632,000 and 8,350 results, respectively (as of 7/8/2021). A vast majority of these published works are dedicated to manually modifying the original model with new evolution laws or shapes of the yield surface that accompany new physics, new materials, or new insights for more precise predictions. While
this conventional workflow has led to numerous improvements in modeling, the more sophisticated models are often inherently harder to tune due to the expansion of the parametric space. This expansion not only makes it less feasible to determine the optimal mathematical expressions through a manual trial-and-error effort (even after the causality of the yielding and hardening is known [9]), but also requires solving more complicated inverse problems to identify the material parameters [10, 11, 12].
The recent success of deep neural networks has inspired
a new trend where one may simply build a forecast engine
by training a network with a pair of strain and stress his-
tories [13, 14]. To replicate the history dependence of the plastic deformation, earlier neural network approaches would employ strain and stress from multiple previous time steps to predict new stress states [15], whereas more recent works such as [16] and [17] employ recurrent neural networks, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit networks, to introduce memory effects. The promise and expectation are that the continuous
advancement of neural networks or other machine learning
techniques might one day replace the modeling paradigm
currently employed in the engineering industry with supe-
rior accuracy, efficiency, and robustness [18, 16]. However, the early success in the 90s and the recent resurrection of optimism about neural network predictions of constitutive responses have so far had a limited impact on industry applications. This reluctance is not entirely unjustified. In fact, recent studies and workshops conducted by the US Department of Energy have cited the lack of domain awareness, interpretability, and robustness as some of the major technical barriers to the revolution of AI for scientific machine learning [19]. To facilitate changes in the industry, the trustworthiness of the predictions is essential, and interpretability is a necessary condition to overcome these obstacles [20].
As such, our focus in this work is to explore the possibility of building an interpretable machine learning paradigm capable of serving as the interface between plasticity modelers and artificial intelligence. Accordingly, our focus has shifted from solely using AI to make predictions to building AI that creates plasticity theories that are compatible with domain knowledge, easily interpreted, and capable of improving not only the accuracy but also the robustness of existing models. We propose the training of elastoplasticity models through multiple supervised learning problems to generate the model components of knowledge separately (i.e. elastic stored energy, yield function, and hardening laws). The resultant model is the composition of these machine-generated knowledge components and is fully interpretable by modelers.
To achieve this goal, we have recast both rate-independent and rate-dependent plasticity as a Hamilton-Jacobi problem in a parametric space spanned by the principal stress, the accumulated plastic strain, the plastic strain rate, and the real time. Meanwhile, the anisotropy of plastic yielding is captured by mapping the yield function level sets of the same material under different orientations through supervised neural network training on a chosen yield function projection basis. These treatments enable us to create a general mathematical description for a large number of existing plasticity models, including the von Mises, Drucker-Prager, and Cam-clay models combined with any possible hardening law, as merely special cases of the level set plasticity model. Instead of solving the level set extension problem in the parametric space, we formulate a supervised learning problem to generate the constitutive updates from neural computation and speed up calculations compared to classical hierarchical multiscale computation [21, 22, 23, 24].
More importantly, this new AI-enabled framework represents a paradigm shift in which the goal of machine learning moves from merely generating forecast engines for mechanistic predictions to creating interpretable mathematical models that inherently obey physical laws with the assistance of machine learning. The resultant model does not require recurrent neural networks, is easier to train, and provides more robust results for blind predictions.
2 Level set plasticity
The goal of this section is to extend the previously published work [25] to incorporate pressure sensitivity and rate dependence. The mathematical framework is very similar, except that the introduction of pressure dependence and rate dependence may lead to a higher-dimensional space for the Hamilton-Jacobi problems and therefore a higher demand on data. These new implications are highlighted in this section. Details of the initial implementation of the level set plasticity model can be found in [25]. The algorithm used to generate the yield function level sets and train the plasticity model neural networks is summarized in Algorithm 1.
Here we formulate the machine learning plasticity problem not as a single supervised learning task, but by splitting the task into multiple smaller ones (predicting a stored elastic energy functional, predicting a yield surface, introducing a mapping for anisotropy), each constituting one neural network trained for a sub-goal. Then, complex behaviors can be predicted by integrating these networks in a level set plasticity framework. This treatment not only improves the predictions but, more importantly, introduces a learning structure where the causal relations of the individual components are clearly defined without losing the generality of individual model predictions. As pointed out by recent work on interpretable machine learning [20] and [26], this component design helps promote both simulatability (the ability to internally simulate and reason about the overall predictions) and modularity (the ability to interpret portions of the predictions independently) of the AI-generated models.
We introduce a new concept of treating the yield surfaces in the parametric space composed of stress, accumulated plastic strain, and strain rate as a level set. We also discuss the importance of leveraging material symmetries to reduce the data demand for the supervised machine learning problem. Previously, [27] and [25] introduced NURBS- and machine learning-based interpolations, respectively, to generate yield surfaces with isotropic rate-independent plasticity. The key departure here is the new
Fig. 1. Universal training process for level set yield functions: 1) gather yield surface data points, 2) generate level set through the initialization
process, and 3) train neural network on the level set data (the zeroth level of the predicted level set is the approximated yield surface).
capacity to generalize the learning algorithm for anisotropic
rate-dependent/independent plasticity.
An important factor that dictates whether the training of the machine learning model with limited data can be successful is how material symmetry is leveraged. For example, the data collection can be significantly reduced for isotropic plasticity, as the principal strain and stress are co-axial. Another important aspect to consider is how to leverage material symmetry to select the coordinate system that represents the same data in the parametric space. For instance, a Euclidean space spanned by the values of the three principal stresses could be sufficient for an isotropic yield function and hence leads to a simpler supervised learning problem than those that use all 6 stress components. Furthermore, the choice of the coordinate system may affect how one plans to collect the data and vice versa. For instance, while it is possible to formulate the level set problem with the principal stresses as the Cartesian basis, i.e., (σ1, σ2, σ3), it might be even more efficient to consider the usage of (q, p) stress for experimental data obtained from conventional triaxial tests, where only two distinct principal stresses can be controlled. In this latter case, the anisotropy and the dependence on all three invariants of the constitutive responses could not be sufficiently captured from the data gathered by this set of experiments alone. Hence, increasing the dimensions of the parametric space for the elastic energy and the yield function would not be beneficial. In the numerical experiments we conducted, we adopt the cylindrical coordinates (see Eq. (1)) for the π-plane orthogonal to the hydrostatic axis where σ1 = σ2 = σ3 (cf. [28]). This treatment enables us to detect any symmetry on the π-plane that might allow us to reduce the dimensions of the data and potentially simplify the training of the neural network with less data.
σ'' = R_b R_a σ,   where

R_a = [ √2/2   0  −√2/2 ;
          0    1     0  ;
        √2/2   0   √2/2 ],

R_b = [ 1      0        0     ;
        0   √(2/3)  −√(1/3)  ;
        0   √(1/3)   √(2/3)  ].     (1)
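The π-plane rotation of Eq. (1) can be sanity-checked numerically: a purely hydrostatic stress state σ1 = σ2 = σ3 should map onto the third rotated axis, leaving the first two (π-plane) components at zero. The sign pattern of the matrices below is our assumption for a standard construction in which the third rotated axis aligns with the hydrostatic direction.

```python
import numpy as np

# 45-degree rotation about the sigma_2 axis (signs assumed)
Ra = np.array([[np.sqrt(2) / 2, 0.0, -np.sqrt(2) / 2],
               [0.0, 1.0, 0.0],
               [np.sqrt(2) / 2, 0.0, np.sqrt(2) / 2]])

# Second rotation aligning the third axis with the hydrostatic direction
Rb = np.array([[1.0, 0.0, 0.0],
               [0.0, np.sqrt(2.0 / 3.0), -np.sqrt(1.0 / 3.0)],
               [0.0, np.sqrt(1.0 / 3.0), np.sqrt(2.0 / 3.0)]])

def to_pi_plane(principal_stress):
    """Map principal stresses (s1, s2, s3) to the rotated frame of Eq. (1)."""
    return Rb @ Ra @ np.asarray(principal_stress, dtype=float)

# A purely hydrostatic state has no pi-plane (deviatoric) component.
hydro = to_pi_plane([1.0, 1.0, 1.0])
```

Both factors are proper rotations, so the transformation preserves stress magnitudes and is trivially invertible.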
Before translating the yield surface fΓ data points into a yield function level set, we reduce the dimensionality of the stress point x representation. In the case of isotropic pressure-dependent plasticity, we can reduce the stress representation from 6 dimensions, x(σ11, σ22, σ33, σ12, σ23, σ13) (already reduced from 9 due to the balance of angular momentum), to an equivalent three-stress-invariant representation x(p, ρ, θ). In this representation, p is the mean pressure, while ρ and θ are the Lode's radius and angle, respectively.
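For concreteness, the (p, ρ, θ) representation can be computed from a full stress tensor as sketched below; the Lode angle convention used here (θ = 0 on the tensile meridian) is our assumption, as conventions vary across the literature.

```python
import numpy as np

def stress_invariants(sigma):
    """Reduce a symmetric 3x3 stress tensor to (p, rho, theta).

    p     : mean pressure, tr(sigma)/3
    rho   : Lode radius, sqrt(2 J2), the norm of the deviatoric stress
    theta : Lode angle, from cos(3 theta) = (3 sqrt(3)/2) J3 / J2^(3/2)
    """
    sigma = np.asarray(sigma, dtype=float)
    p = np.trace(sigma) / 3.0
    s = sigma - p * np.eye(3)           # deviatoric stress
    J2 = 0.5 * np.tensordot(s, s)       # second deviatoric invariant
    J3 = np.linalg.det(s)               # third deviatoric invariant
    rho = np.sqrt(2.0 * J2)
    if J2 < 1e-14:                      # hydrostatic state: angle undefined
        return p, 0.0, 0.0
    cos3t = np.clip(1.5 * np.sqrt(3.0) * J3 / J2 ** 1.5, -1.0, 1.0)
    theta = np.arccos(cos3t) / 3.0
    return p, rho, theta

# Uniaxial tension along axis 1 lies on the tensile meridian (theta = 0).
p, rho, theta = stress_invariants(np.diag([1.0, 0.0, 0.0]))
```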
The yield function is then postulated to be a signed distance function defined as:

φ(x̂, ξ, ξ̇, t) =  +d(x̂)  outside fΓ (inadmissible stress),
                    0      on fΓ (yielding),                  (2)
                  −d(x̂)  inside fΓ (elastic region),

where d(x̂) is the minimum Euclidean distance between any point x̂ of the solution domain of the stress space where the signed distance function is defined and the yield surface fΓ = {x̂ | f(x̂) = 0}, defined as:

d(x̂) = min |x̂ − x̂Γ|,                                        (3)

where x̂Γ is the yielding stress for a given value of accumulated plastic strain ξ and its rate ξ̇ at time t. The plastic internal variable ξ is monotonically increasing and represents
the history-dependent behavior of the material. The time t signifies a snapshot of the current state of the level set φ for the current value of the plastic internal variable ξ and its rate ξ̇.
2.1 Data augmentation through signed distance function generation
We can now pre-process the stress point cloud of the yield surface for a given ξ and ξ̇ by solving the Eikonal equation ‖∇x̂ φ‖ = 1 while prescribing the signed distance function to 0 at x̂ ∈ fΓ. For every stress point in the yield surface data set, we generate a discrete number of auxiliary points that construct a signed distance function. In the context of level set theory, this can be seen as solving the level set initialization problem. In the context of machine learning, the signed distance function construction can be interpreted as a method of data augmentation: a large number of auxiliary data samples, for which the level set values are nonzero (f ≠ 0), are introduced to improve the training performance as well as the accuracy and robustness of both the learned yield function f and, equally importantly, its stress gradient ∂f/∂σij. A schematic of the yield surface data pre-processing into a signed distance function is demonstrated in Fig. 1. The color is the value of the signed distance yield function: it is negative in the elastic region and positive in the inadmissible stress region. The material yields if the current stress is at a location where the value of the yield function equals zero. It is noted that the signed distance function has been selected as the preferred level set function due to the simplicity of the implementation; the yield function can be formulated on other level set function bases, the benefits of which will be considered in future work.
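A minimal sketch of this augmentation step for a single π-plane direction: given a yield radius at one Lode angle, auxiliary samples are placed along the ray at radii up to ζ times the yield radius, and their signed distance values follow from the radial offset (negative inside the elastic region, positive outside). The radial construction mirrors the level set initialization described above; the helper name is hypothetical.

```python
import numpy as np

def augment_yield_point(rho_yield, zeta=2.0, n_levels=15):
    """Generate signed-distance samples along one radial direction.

    For a yield radius rho_yield, place n_levels + 1 auxiliary points with
    radii in [0, zeta * rho_yield]; the signed distance of each point is its
    radial offset from the yield surface (negative inside, positive outside).
    """
    j = np.arange(n_levels + 1)
    rho_m = (j / n_levels) * zeta * rho_yield   # sample radii
    f_m = rho_m - rho_yield                     # signed distance values
    return rho_m, f_m

rho_m, f_m = augment_yield_point(rho_yield=1.0, zeta=2.0, n_levels=15)
```

The resulting value range [−ρ, (ζ − 1)ρ] matches the radius range [0, ζρ] of the constructed level set, so a single yield point yields n_levels + 1 training samples.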
2.2 Hardening as a level set extension problem
After pre-processing the yield surface fΓ data points for a sequence of internal variable values ξ and rates ξ̇ into a level set by solving the level set initialization problem, we will recover the velocity function of a Hamilton-Jacobi equation of a level set extension problem to describe the temporal evolution of the level set. A general Hamilton-Jacobi equation reads:

∂φ/∂t + v · ∇x̂ φ = 0,                                        (4)

where v is the normal velocity field that describes the geometric evolution of the boundary (yield surface fΓ). In the context of plasticity, the velocity field corresponds to the observed hardening mechanism. The velocity vector field can be described by a magnitude scalar function F and a direction vector field n = ∇x̂ φ / ‖∇x̂ φ‖ such that:

v = F n.                                                      (5)

Substituting into Eq. (4):

∂φ/∂t + F ‖∇x̂ φ‖ = 0,                                        (6)

where Fi ‖∇x̂ φi‖ ≈ −(φi+1(ξi+1, ξ̇i+1) − φi(ξi, ξ̇i)) / Δt. In the above equation, Fi(p, ρ, θ, ξ, ξ̇) = F(p, ρ, θ, ξ, ξ̇)|t=ti for i = 0, 1, 2, ..., n+1 is the finite-difference-approximated scalar velocity (hardening) function that corresponds to the pre-processed collection of signed distance functions {φ0, φ1, ..., φn+1} at times {t0, t1, ..., tn+1}. Thus, we have recast a yield function f into a signed distance function φ, such that f(p, ρ, θ, ξ, ξ̇) = φ(p, ρ, θ, ξ, ξ̇). We can now formulate a machine learning problem to approximate the level set yield function f with its neural network yield function counterpart f̂(p, ρ, θ, ξ, ξ̇ | W, b), parametrized by weights W and biases b to be optimized during training.
The training objective for the neural network optimization is to minimize the loss function of Eq. (7) at the training samples (x̂, ξ, ξ̇, t)i for i ∈ [1, ..., N], where we have added a penalty term, weighted by a factor wp, that activates when the yield function violates convexity during training.
It is noted that the Hamilton-Jacobi equation described in this section will not be solved numerically, although doing so is theoretically possible (e.g. with a fast marching solver). Its solution will instead be predicted directly by a neural network. The zeroth level of the neural network predicted level set is the yield surface, and the neural network approximated velocity field is the data-driven hardening mechanism.
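The finite-difference recovery of the hardening velocity can be illustrated on a toy example: two signed distance fields of concentric circles, where the "yield surface" dilates (isotropic hardening) at a known rate. The grid resolution and the finite-difference gradient below are illustrative choices, not the paper's numerical setup.

```python
import numpy as np

# Signed distance fields for two snapshots of a circular yield surface
# (radius 1.0 growing to 1.2), sampled on a Cartesian grid of the pi-plane.
x = np.linspace(-2.0, 2.0, 201)
X, Y = np.meshgrid(x, x, indexing="ij")
r = np.hypot(X, Y)
phi_0 = r - 1.0      # level set at t_0
phi_1 = r - 1.2      # level set at t_1
dt = 1.0

# Hamilton-Jacobi residual: phi_t + F * |grad phi| = 0, hence
# F ~ -(phi_1 - phi_0) / (dt * |grad phi_0|).
gx, gy = np.gradient(phi_0, x, x)
grad_norm = np.hypot(gx, gy)

# Exclude the singular origin of the distance field before dividing.
mask = r > 0.5
F = -(phi_1 - phi_0)[mask] / (dt * grad_norm[mask])
```

Because the surface dilates uniformly by 0.2 per unit time, the recovered velocity field F is approximately 0.2 everywhere away from the origin; this is exactly the quantity the neural network learns from the pre-processed level set snapshots.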
Remark 1 (Rescaling of the training data). In every loss function in this work, we have introduced scaling coefficients γα to remind the readers that it is possible to change the weighting to adjust the relative importance of different terms in the loss function. These scaling coefficients may also be viewed as the weighting functions in a multi-objective optimization problem. In practice, we have normalized all data to avoid the vanishing or exploding gradient problem that may occur during the back-propagation process [29]. As such, normalization is performed before the training as a pre-processing step. A sample Xi of a measure X is scaled to the unit interval via Eq. (8).
Algorithm 1 Training of a pressure- and rate-dependent isotropic yield function level set neural network.
Require: Data set of N samples: stress measures σ at yielding, accumulated plastic strain εp, and accumulated plastic strain rate ε̇p; a number of levels (isocontours) Llevels for the constructed signed distance function level set (data augmentation); and a parameter ζ > 1 for the radius range of the constructed signed distance function.
1. Project stress onto the π-plane.
Initialize an empty set of π-plane projection training samples (ρi, θi, pi) for i in [0, ..., N].
for i in [0, ..., N] do
  Spectrally decompose σi into its principal values (σ1,i, σ2,i, σ3,i).
  Transform (σ1,i, σ2,i, σ3,i) into σ''i via Eq. (1).
end for
2. Construct the yield function level set (data augmentation).
Initialize an empty set of augmented training samples (ρm, θm, pm, εp,m, ε̇p,m, fm) for m in [0, ..., N × Llevels].
for i in [0, ..., N] do
  for j in [0, ..., Llevels] do
    Set ρm = (j / Llevels) ζ ρi, so that the signed distance function is constructed for a radius range of [0, ζρi].
    Set fm = ρm − ρi, so that the signed distance function value range is [−ρi, (ζ − 1)ρi].
    Rescale (ρm, θm, pm, εp,m, ε̇p,m, fm) via Eq. (8).
  end for
end for
3. Train the neural network f̂(ρm, θm, pm, εp,m, ε̇p,m) with the loss function of Eq. (7).
4. Output the trained yield function neural network f̂ and exit.
X̄i = (Xi − Xmin) / (Xmax − Xmin),                             (8)

where X̄i is the normalized sample point, and Xmin and Xmax are the minimum and maximum values of the measure X in the training data set, such that all different types of data used in this paper (e.g. energy, stress, stress gradient, stiffness) are normalized within the range [0, 1].
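Eq. (8) is the familiar min-max normalization; a minimal sketch, with the inverse map needed to convert network predictions back to physical units (the helper names are illustrative):

```python
import numpy as np

def minmax_scale(X):
    """Scale samples of a measure X to [0, 1] via Eq. (8).

    Returns the scaled samples together with (X_min, X_max) so that
    predictions can be mapped back to physical units after training.
    """
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(), X.max()
    return (X - x_min) / (x_max - x_min), (x_min, x_max)

def minmax_unscale(X_bar, bounds):
    """Invert Eq. (8), recovering the physical-unit samples."""
    x_min, x_max = bounds
    return X_bar * (x_max - x_min) + x_min

stress = np.array([-50.0, 0.0, 25.0, 100.0])   # e.g. stresses in kPa
scaled, bounds = minmax_scale(stress)
```

Storing the (X_min, X_max) bounds alongside the trained network is essential, since the same affine map must be applied to any query point at prediction time.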
2.3 High-order Sobolev training
In this work, we distinguish between the material's elastic and plastic behaviors by training two different neural network model components: a hyperelastic energy functional and a yield function level set that evolves according to the accumulated plastic strain. These components are then combined in a specific form of the return mapping algorithm that may take an arbitrary elasticity model and a yield function with a generic hardening law to generate the constitutive update for the class of inelastic materials that have a distinct elastic region defined in a parametric space. The hyperelastic network counterpart is expected to have interpretable derivatives: the first derivative of the energy functional with respect to the strain should be a valid stress tensor, and the second derivative a valid stiffness tensor. We adopt a Sobolev training objective, first introduced in [30], and extend it to higher-order constraints, to train the energy functional approximator ψ̂e(εe | W, b) using a loss function that penalizes the mismatch in the predicted energy, stress, and stiffness.
A benefit of using Sobolev training is the notable data efficiency. Sobolev training has been shown to produce more accurate and smoother predictions for the energy, stress, and stiffness fields for the same amount of data compared to classical L2 norm approaches that solely constrain the predicted energy values [25].
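The higher-order Sobolev objective can be illustrated on a one-dimensional linear-elastic toy problem, where the energy, its first derivative (stress), and its second derivative (stiffness) are all penalized. The γ weights and helper names below are illustrative, and in practice the derivative terms come from automatic differentiation of the network rather than closed-form expressions.

```python
import numpy as np

def sobolev_loss(psi_pred, sig_pred, C_pred, psi_true, sig_true, C_true,
                 gammas=(1.0, 1.0, 1.0)):
    """H2-style Sobolev loss: weighted MSE on energy, stress, and stiffness."""
    g0, g1, g2 = gammas
    return (g0 * np.mean((psi_pred - psi_true) ** 2)
            + g1 * np.mean((sig_pred - sig_true) ** 2)
            + g2 * np.mean((C_pred - C_true) ** 2))

# Toy 1D linear elasticity: psi = 0.5 E eps^2, sigma = E eps, C = E.
E_true, E_model = 200.0, 180.0
eps = np.linspace(0.0, 0.01, 50)

def response(E, eps):
    return 0.5 * E * eps ** 2, E * eps, np.full_like(eps, E)

loss_exact = sobolev_loss(*response(E_true, eps), *response(E_true, eps))
loss_wrong = sobolev_loss(*response(E_model, eps), *response(E_true, eps))
```

The key point is that a model matching only the energy values could still exhibit a noisy stress or stiffness; constraining the derivatives directly removes that freedom.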
3 Numerical experiments
In this section, we demonstrate the AI's capacity to rediscover plasticity models from the literature, explore the model's ability to capture highly complex new hardening modes, and, finally, showcase how the AI can discover the yield surface of a new polycrystal material and replace the plasticity model in a finite element simulation. To test
whether the machine learning approach can be generalized,
we purposely test the AI against a wide range of material
data sets for soil, rock, poly-crystal, and steel. In particu-
lar, we employ three types of data sets, (1) data generated
from known literature models, (2) data obtained from exper-
iments, and (3) data obtained from sub-scale direct numeri-
cal simulations of microstructures. The first type of data is used as a benchmark to verify whether the neural network can correctly deduce the plastic deformation mechanisms (yield surface and hardening) when given the corre-
sponding data. The second and third types of data are used to
validate and examine the AI’s ability to discover new plastic
deformation mechanisms with a geometrical interpretation in
the stress space.
3.1 Verification examples
The purpose of this example is to showcase our algorithm's capacity to reproduce the modeling capabilities of classical plasticity theory. We first demonstrate our algorithm's ability to recover yield surfaces and hardening mechanisms from the classical plasticity literature. We then demonstrate the framework's capacity to make predictions calibrated on experimental data for pressure-dependent and rate-dependent plasticity.
3.1.1 Verification on classical plasticity theories
The proposed AI can readily reproduce numerous yield
function models from the plasticity literature, following the
same universal data pre-processing and neural network train-
ing algorithm. For this benchmark experiment, we gener-
ate synthetic data sets for four initial yield surfaces of in-
creasing shape complexity: the J2 [7] (cylinder), Drucker-
Prager [8] (cone), Modified Cam-Clay [31] (oval), and Ar-
gyris [32] (ovoid with triangular cross-section) yield sur-
faces. We simultaneously study four common hardening
mechanisms that transform and/or translate these surfaces
in the 3D stress space: isotropic hardening (cylinder dila-
tion), rotational hardening (cone rotation), kinematic hard-
ening (translation along the hydrostatic axis), and softening.
The data sets for these yield surfaces are populated by sampling from the above-mentioned literature yield functions. The sampling was performed on a uniform grid of the stress invariants and the accumulated plastic strain. We sam-
ple 50 data points along the mean pressure axis, 100 data
points along the angle axis, and 10 data points along the ac-
cumulated plastic strain axis (a total of 50000 data samples
per yield function data set). The yield surface data points
are pre-processed into a signed distance function level set
database through the level set initialization procedure. For
each yield surface, 15 levels are constructed: the yielding
level, 7 in the elastic region, and 7 in the region of inadmis-
sible stress. After data augmentation, the training data set
consists of 750000 level set sample points.
For each level set database, we train a feed-forward neu-
ral network to approximate the initial yield function and its
evolution. The yield function neural networks consist of a
hidden Dense layer (100 neurons / ReLU), followed by two
Multiply layers, then another hidden Dense layer (100 neu-
rons / ReLU) and an output Dense layer (Linear). The use
of Multiply layers was first introduced in [25] to increase the
continuity of the activation functions of neural network func-
tional approximators. They were shown to allow for greater
control over the network’s higher-order derivatives and the
application of higher-order Sobolev constraints in the loss
function. The layers’ kernel weight matrix was initialized
with a Glorot uniform distribution and the bias vector with
a zero distribution. All the models were trained for 2000
epochs with a batch size of 128 using the NAdam optimizer,
set with default values of the Keras library [33].
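The architecture described above can be sketched in Keras as follows. The exact wiring of the Multiply layers is our reading of [25] (each one squares the hidden activations elementwise to raise the continuity of the approximator); the input is the five-dimensional (p, ρ, θ, ξ, ξ̇) parametric point and the output the level set value.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_yield_level_set_nn(n_inputs=5):
    """Dense(100) / Multiply x2 / Dense(100) / Dense(1) level set network."""
    init = dict(kernel_initializer="glorot_uniform", bias_initializer="zeros")
    x = layers.Input(shape=(n_inputs,))
    h = layers.Dense(100, activation="relu", **init)(x)
    h = layers.Multiply()([h, h])          # squares activations elementwise
    h = layers.Multiply()([h, h])
    h = layers.Dense(100, activation="relu", **init)(h)
    f = layers.Dense(1, activation="linear", **init)(h)
    model = tf.keras.Model(inputs=x, outputs=f)
    model.compile(optimizer=tf.keras.optimizers.Nadam(), loss="mse")
    return model

model = build_yield_level_set_nn()
pred = model(np.zeros((4, 5), dtype=np.float32))
```

A network of this shape would then be fitted to the augmented level set samples with model.fit, following the training setup described above (2000 epochs, batch size 128).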
The neural network predicted yield surfaces are demon-
strated in Fig. 2. For each model, three surfaces are shown
for three different levels of accumulated plastic strain. It is
highlighted that, given an accumulated plastic strain value,
we can recover the entire yield locus.
3.1.2 Level set plasticity model discovery for rate-
dependent and anisotropic materials
In this section, we test the framework's capacity to make predictions on rate-dependent and anisotropic data.
To test the trained neural network's prediction of rate-dependent responses, we incorporate data from the published work [34] for steel that exhibits different yielding stress under different strain rates. In the numerical experiments, we use experimental data collected at strain rates ranging from 0 to 0.02 s⁻¹ as the training data, sampled in a uniform grid of 10 strain rate increments. The yield surface is sampled at 25 points along the mean pressure axis, 100 points along the angle axis, and 10 points along the accumulated plastic strain axis (a total of 250000 sample points). The data
are pre-processed into signed distance functions of 15 lev-
els, generating 3750000 training sample points. The neural
network used for this viscoplastic model training follows the
same architecture as the yield function neural networks de-
scribed in the previous section.
We use the experimental data collected at strain rates 10⁻⁴, 5 × 10⁻³, and 0.02 s⁻¹ to validate the ability of the
model to make blind predictions for unseen events. Figure
3(a) shows the results of the six predictions that the AI gen-
erated for unseen data. The left figure shows the stress-strain
predictions on the uniaxial tensile tests of three different
loading rates, while the right figure shows the stress-strain
predictions on the simple shear test counterpart. In both
cases, the predictions match well with the unseen benchmark
data that is excluded from the training data set.
As for the anisotropic predictions, Figure 4 shows the
machine learning generated mapping that predicts how the
yield surface in the principal stress space evolves for dif-
ferent material orientations. The data we employed in this
second experiment is generated from an FFT solver that sim-
ulates the polycrystal plasticity of a specimen composed of
FCC crystal grains. We sample the material constitutive be-
havior at 10 microstructure orientations at 150 Lode angle
sampling directions and pre-process the data into signed dis-
tance functions of 15 levels, generating 22500 training sam-
ple points for the projection mapping neural network. Working in the pressure-independent stress space, the network takes as input the true stress invariants and the microstructural orientation information that describes the anisotropy (in the case of the polycrystals studied in this work, the polycrystal orientations as three Euler angles) and outputs the reference stress space invariants. The network has the following layer
structure: Dense layer (200 neurons / ReLU), Multiply layer,
Dense layer (200 neurons / ReLU), Multiply layer, followed
by three more Dense layers (200 neurons / ReLU) and an
output Dense layer (Linear). The layers’ kernel weight ma-
trix was initialized with a Glorot uniform distribution and the
bias vector with a zero distribution. The model was trained
for 2000 epochs with a batch size of 256 using the NAdam
Fig. 2. AI can rediscover classical plasticity models: J2 plasticity model with isotropic hardening (top left), Drucker-Prager model with
rotational hardening (top right), MCC model with kinematic hardening (Bauschinger effect) (bottom left), and Argyris model with softening
(bottom right). The corresponding benchmark and predicted strain-stress curves are also demonstrated. The stress measure is in kPa.
Fig. 3. Neural network predicted viscoplastic response for increas-
ing loading strain rates for a tension (left) and shear (right) load-
ing test performed on mild-steel beams (experimental data obtained
from [34]).
optimizer. The predictions of the mapping suggest that it is possible to generate a single mapping function that maps all yield surfaces obtained from different polycrystal specimens of different orientations onto a reference stress domain, denoted as σ''.
3.2 Demonstration of model discovery capacity
Yield surface discovery in the literature has been limited
by the difficulty of deriving mathematical expressions for
higher-complexity geometrical shapes that represent them.
Additional obstacles arise when there is a need to describe the
smooth transition from the shape of the initial yield surface
to that of a state with more accumulated plasticity. The al-
gorithm's capability to discover new yield surfaces and hard-
ening mechanisms automatically, directly from the data, over-
comes these impediments.
To test this, we construct a fictitious yield surface
database that is based on the Argyris model [32] and com-
bines the Modified Cam-Clay [31] hardening mechanism
along with a transformation of the elastic region's cross-
section from a triangular shape to a circle.
Fig. 4. The framework can capture anisotropic responses by projecting anisotropic yield surfaces onto a master projection basis curve using a neural network stress space mapping ϕNN.
is sampled at a total of 50000 points and pre-processed to
generate 750000 level set sample points. The predictions for
the yield surface and underlying level set for increasing ac-
cumulated plastic strain are demonstrated in Fig. 5. Deriving
a mathematical expression for this data set is not straightforward.
Even if the derivation is successful, the resultant mathematical
expression might require additional material parameters that
lack a physical underpinning. The capability of the neural
network to approximate arbitrary functions therefore offers
us a flexible and simple treatment for handling the evolution
of the yield function.
Fig. 5. AI discovered yield surfaces and hardening mechanisms as evolving level sets from synthetic data. The yield surface and the corresponding yield function level set evolve according to the increasing accumulated plastic strain.
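The pre-processing step that expands each sampled yield-surface point into multiple level-set training samples (e.g., 50000 points × 15 levels = 750000 samples) can be sketched as follows. The polar parametrization in a deviatoric plane and the radial scaling range are illustrative assumptions, not the paper's exact scheme.

```python
def level_set_samples(surface_pts, n_levels=15, scale_lo=0.5, scale_hi=1.5):
    """Expand yield-surface points into level-set training samples.
    surface_pts: (theta, rho) polar points on the yield surface in a
    deviatoric plane. Each point is re-sampled at n_levels radii around
    the surface and labeled with its signed distance along the ray
    (negative inside the elastic region, positive outside)."""
    samples = []
    for theta, rho in surface_pts:
        for k in range(n_levels):
            s = scale_lo + (scale_hi - scale_lo) * k / (n_levels - 1)
            r = s * rho
            samples.append((theta, r, r - rho))  # (angle, radius, level value)
    return samples

samples = level_set_samples([(0.0, 1.0), (0.5, 2.0)])
```

The zero level of the third entry recovers the yield surface itself, while the off-zero levels supervise the signed-distance structure of the learned yield function.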
To analyze the sensitivity with respect to the random
neural network weight initialization, we have repeated the
supervised learning for the synthetic yield function problem
showcased in Fig. 5 five times, each with a different random
seed. The results, which are shown in Fig. 6, indicate that
the training and the resultant losses for both the training and
testing cases are close. This result suggests that the training
is not very sensitive to the random seeds. Furthermore, the
small difference in the training and testing loss of the yield
function also suggests that there is no significant overfitting.
Our proposed algorithm also automates the discovery
of yield surfaces for new materials. We generate a yield
surface database for a randomly generated polycrystal mi-
crostructure through efficient data sampling of the invariant
stress space with FFT solver elastoplastic simulations. To
gather the yield surface data points for the polycrystal mate-
rial, we subdivide the π-plane uniformly at 140 Lode’s angles
and sample the stress space with monotonic loading simula-
tions at each angle direction. The yield surface data points
are gathered as soon as yielding is first detected, recording
the stress response and the accumulated plastic strain. The
FFT simulations provide 157500 sample points that are pre-
processed into 2362500 level set sample points. It is noted
that the material was observed to be pressure-independent.
Thus, sampling on the π-plane at a constant mean pressure
was enough to capture the entire stress response.
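The uniform π-plane sampling described above amounts to generating one unit deviatoric stress direction per Lode angle and loading monotonically along it. The principal-stress parametrization below is a standard one and is an assumption about, not a transcription of, the paper's implementation.

```python
import math

def pi_plane_directions(n_angles=140):
    """Unit-norm principal deviatoric stress directions on the pi-plane,
    one per Lode angle. The trace is zero by construction, so scaling a
    direction traces a purely deviatoric (pressure-independent)
    monotonic loading path at constant mean pressure."""
    dirs = []
    c = math.sqrt(2.0 / 3.0)
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        dirs.append((c * math.cos(theta),
                     c * math.cos(theta - 2.0 * math.pi / 3.0),
                     c * math.cos(theta + 2.0 * math.pi / 3.0)))
    return dirs

dirs = pi_plane_directions()
```

Each direction can then be handed to the FFT solver as a stress path; the first stress state at which yielding is detected contributes one yield-surface sample at that Lode angle.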
The yield surface data points are pre-processed into a
level set database, and the results of the trained polycrystal
neural network yield function are demonstrated in Fig. 7.
The neural network parameters for the new model training
in this section remain identical to those previously described.
Investing the modeling effort to describe the complex yielding
behavior of a material by hand could prove futile – especially if
the material is highly heterogeneous. Conceiving a new yield
function for every new material studied can become rather
impractical, and automation in yield surface generation can
accelerate the plasticity study of novel materials.
Fig. 6. Loss vs. epoch for the synthetic yield function shown in Fig. 5 for the training data set (top) and the testing data set for cross validation (bottom). The test data set is mutually exclusive with the training data set.
Fig. 7. Yield function level set of a new polycrystal microstructure for increasing accumulated plastic strain.
3.3 Offline multiscale FFT-FEM numerical experiments
In engineering practice, a constitutive law is seldom
used as a standalone forecast engine but is often incorporated
into a solver that provides a discretized numerical solution.
Here, we test whether the AI-generated models can be de-
ployed into an existing finite element solution. The yield
surface neural networks combined with a hyperelastic en-
ergy functional neural network can be readily plugged into
a strain space return mapping algorithm to make strain-stress
predictions. In this work, we utilize a linear elasticity energy
functional as the neural network that will provide the elastic
response in the algorithm. We train a two-layer feed-forward
neural network that takes the elastic volumetric (ε_v^e) and
deviatoric (ε_s^e) strain invariants as input to approximate the
hyperelastic energy functional ψ^e. The network is trained on
2500 data points sampled from a uniform grid of (ε_v^e, ε_s^e)
pairs. The architecture consists of a hidden Dense layer (100
neurons / ReLU), followed by two Multiply layers, then another
hidden Dense layer (100 neurons / ReLU), and an output Dense
layer (Linear). The models were trained for 1000 epochs
with a batch size of 32 using the NAdam optimizer [35], set
with default values in the Keras library. Using a Sobolev
training framework, the model was optimized with a higher-
order H^2 training objective – the loss function constrains the
predicted energy, stress, and stiffness similar to (9). The re-
sulting stress predictions for the literature yield surfaces under
random cyclic loading and unloading strain paths are demon-
strated in Fig. 2 for each approximated yield surface model.
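An H²-style Sobolev objective of this kind can be sketched as a weighted sum of errors on the energy and its first two derivatives. The weights, the dictionary interface, and the plain-Python mean-squared-error form below are illustrative assumptions, not the paper's loss implementation.

```python
def sobolev_h2_loss(pred, true, w0=1.0, w1=1.0, w2=1.0):
    """H2-style training objective: penalize errors in the predicted
    energy psi, its first derivative (stress), and its second derivative
    (stiffness) simultaneously. pred and true are dicts with keys
    'psi', 'stress', 'stiffness', each a list of per-sample values."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return (w0 * mse(pred["psi"], true["psi"])
            + w1 * mse(pred["stress"], true["stress"])
            + w2 * mse(pred["stiffness"], true["stiffness"]))

loss = sobolev_h2_loss(
    {"psi": [1.0, 2.0], "stress": [0.5, 0.5], "stiffness": [3.0, 3.0]},
    {"psi": [1.0, 2.0], "stress": [0.5, 1.5], "stiffness": [3.0, 3.0]},
)
```

Because the derivative terms enter the objective directly, the trained energy functional yields stress and stiffness predictions of controlled accuracy rather than leaving them as unconstrained by-products of differentiation.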
We have also successfully incorporated the trained neu-
ral network plasticity model into a finite element solver to de-
liver an excellent match with the higher-cost FFT-FEM pre-
dictions for unseen loading paths not included in the training
data set. The discovered yield function for a randomly gen-
erated polycrystal microstructure is demonstrated in Fig. 7.
In Fig. 8, the polycrystal plasticity model trained by a neu-
ral network is used to replace the FFT solver that provides
the constitutive updates from DNS simulations at the sub-
scale level. The simulation is performed on a square plate
with a circular hole supported on frictionless rollers on the
top and bottom surfaces. Results shown in Fig. 8 indicate
that the NN-FEM model is capable of replacing the compu-
tationally heavy FFT-FEM simulations (cf. [36]) at a frac-
tion of the cost. In this offline multiscale problem, the finite
element mesh contains 960 elements with 2880 integration points.
An FFT-FEM framework takes an average of 11110 seconds
(approximately 3.85 seconds per integration point) to
complete the incremental constitutive updates for all integra-
tion points, whereas the neural network counterpart requires
an average of 230 seconds (approximately 0.08 seconds per
integration point) to finish the same task on a MacBook Pro
with an 8-core CPU. As for the overhead cost of generating the
training data from the FFT polycrystal simulations, the time
to generate the training data set for the polycrystal yield func-
tion (157500 yield function sample points) is approximately
5 hours.
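As a concrete illustration of how such components plug into a strain-space return mapping, the sketch below uses a 1D J2 model with linear elasticity and linear isotropic hardening in place of the trained networks. The material constants (E, σ_y, H) are hypothetical placeholders for quantities that, in the paper's framework, would come from the energy-functional and level-set yield-function networks.

```python
def return_map_1d(eps, eps_p, alpha, E=200e3, sigma_y=250.0, H=1e3):
    """One strain-driven step of the elastic predictor / plastic corrector
    scheme. eps: total strain; eps_p: plastic strain; alpha: accumulated
    plastic strain. In the component-based framework, the elastic law
    would be evaluated from the hyperelastic energy network and the
    yield check from the neural network yield function."""
    sigma_tr = E * (eps - eps_p)                  # elastic trial stress
    f_tr = abs(sigma_tr) - (sigma_y + H * alpha)  # trial yield function
    if f_tr <= 0.0:
        return sigma_tr, eps_p, alpha             # step stays elastic
    dgamma = f_tr / (E + H)                       # consistency condition (closed form here)
    n = 1.0 if sigma_tr >= 0.0 else -1.0          # plastic flow direction
    return (sigma_tr - E * dgamma * n,            # corrected stress
            eps_p + dgamma * n,                   # updated plastic strain
            alpha + dgamma)                       # updated hardening variable

sigma, eps_p_new, alpha_new = return_map_1d(0.002, 0.0, 0.0)
```

With a learned yield function, the closed-form consistency solve is replaced by a Newton iteration on f, using the automatically differentiated stress gradient of the network.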
4 Discussion
The proposed algorithm provides a general approach to
discovering complex yield surface shapes and their evolu-
tion directly from data. In the results section of this work, all
yield functions and hardening mechanisms are predicted by
neural networks without any specific modeler intervention
or hand-crafted derivations. All models in this work, be they
models from the plasticity literature or models designed for
new materials, followed identical data pre-processing, neural
network training, and return mapping implementation proce-
dures.
Fig. 8. The discovered yield function can be readily implemented in FEM simulations, replacing the FFT solver. The accumulated plastic strain profile for an FEM simulation and the predicted stress responses at different points of the domain against the FFT benchmark simulations are also shown. The stress measure is in kPa.
Our neural network yield functions provide a unique ad-
vantage in crafting interpretable data-driven plasticity mod-
els. The capacity to predict and visualize the entire yield
locus at every time step of an elastoplastic simulation al-
lows for the anticipation of elastic or plastic responses and
the inspection of thermodynamic consistency (e.g., convexity).
In particular, by adopting a lower-dimensional stress rep-
resentation (Lode’s coordinates), not only is the model com-
plexity reduced but also a transparent yield surface data sam-
pling scheme becomes possible. The alternative of random
sampling of strain paths comes with the uncertainty of suffi-
ciently visiting the yield surface in the entire stress space.
4.1 Physics underpinning for the partition of elastic and
plastic strain
Decomposing the elastoplastic behavior prediction into
two simple feed-forward neural networks – a hyperelastic en-
ergy functional and a yield function – is central to the al-
gorithm’s interpretability and allows for a clear-cut distinc-
tion of elastic and plastic behavior. This is not necessarily
true with the classical recurrent network approach, such as
the common LSTM or GRU [16,37]. When training neu-
ral networks with these architectures, the elastic and plastic
constitutive responses are often indistinguishable. This treat-
ment not only causes issues with interpretability but also
renders the black-box predictions vulnerable to erroneous
causality or correlation structures. For instance, experimen-
tal data of the nonlinear elasticity response may actually af-
fect the yielding response as there is no explicit mechanism
to distinguish the two. By contrast, the models trained on
monotonic loading data in our framework can readily predict
non-monotonic constitutive responses due to the explicitly
defined elastic range, whereas the black-box alternative
cannot (see Fig. 2).
Furthermore, the recurrent network's dependency on the in-
put strain rate, the importance of the sampling frequencies in
the time domain, and the more difficult training due to
vanishing or exploding gradients [38] are rarely addressed in
the machine learning plasticity literature.
Note that the machine learning algorithm proposed here
does not exhibit better interpretability than the hand-crafted
counterpart, but it is easier to interpret than the RNN and
multi-step ANN approaches that do not provide a definite dis-
tinction between the elastic region and yielding. An ex-
ception is the recent work by Huang and Darve [39], in which
the neural network partitioned the total strain into elastic and
plastic components via a partition-of-unity function. Never-
theless, when a continuous weighting function (such as a sig-
moid function) is used to partition the elastic and plastic
strain, it may introduce a transition zone where the material
is considered both path-independent and path-dependent.
4.2 Representation of parametric space and geometri-
cal interpretation of elastoplasticity models
Another advantage of the interpretable machine learning
approach is that the geometrical interpretation is helpful for
determining the optimal data exploration strategies. Given
the fact that both real experiments and direct numerical simula-
tions are often costly to conduct, a Monte Carlo simulation
to randomly sample the parametric space for path-dependent
materials is too costly to be feasible [40]. By introducing
the level set to define the yielding criterion, however, we can
conceptualize the elastic range as a multi-dimensional ob-
ject in a Euclidean space. This feature may help us visualize
the abstract concept of yielding in a Euclidean space and
estimate the sufficiency of the data by defining a proper
metric in the parametric vector space, and then decide on a
distribution of data that better captures important features –
replicating sharp gradients, determining convexity by checking
the Hessian, and ensuring connectivity of the learned models.
These tasks are not necessarily impossi-
ble but are difficult to achieve with a black-box model.
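For example, a pointwise convexity check of a learned two-dimensional yield function can be performed by sampling a finite-difference Hessian. This is a generic sketch under the stated 2D assumption, not the paper's implementation; `f` stands in for any learned level-set function.

```python
def locally_convex(f, x, y, h=1e-4, tol=1e-8):
    """Check local convexity of a scalar function f(x, y) by testing
    whether the central-difference Hessian is positive semi-definite
    (nonnegative diagonal entries and determinant for the 2x2 case)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx >= -tol and fyy >= -tol and fxx * fyy - fxy * fxy >= -tol

convex = locally_convex(lambda x, y: x * x + y * y, 0.3, -0.2)  # convex bowl
saddle = locally_convex(lambda x, y: x * x - y * y, 0.3, -0.2)  # saddle point
```

For a trained network the finite differences can be replaced by automatic differentiation, which is the route the Sobolev-trained models make convenient.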
4.3 Smoothness of the machine learning plasticity models
Training the neural networks of this work with activation
functions of a higher degree of continuity and higher-order
Sobolev loss function constraints allows one to control the
prediction accuracy of the derivatives of the approximated
functionals. This control of the stress gradient of the yield
function is crucial, and the automatic differentiation
used in the back-propagation can help us generate suffi-
ciently smooth elastoplastic tangent operators suitable for
PDE solvers. On the other hand, classical black-box neural
network elastoplasticity approaches usually do not control
the quality of the derivatives of the trained functions. While
finite difference methods can be used to approximate the tan-
gent tensor obtained from a neural network without Sobolev
training if necessary [41], the smoothness and accuracy of
the approximated tangent cannot be guaranteed. Further-
more, the Sobolev training and higher-order activation func-
tions allow controlling the smoothness and continuity of the
yield surface. This can be a more efficient alternative to the
current practice, where a plasticity model with a non-smooth
yield surface either requires a specialized algorithmic treatment
to generate incremental constitutive updates [42] or must be
modified manually into a smoothed version to bypass the
numerical barrier [43,44].
In principle, the approach may generate a sufficiently
smooth yield surface in parametric space of different dimen-
sions (e.g. principal stress space, strain space, porosity-stress
space). However, if the yield surface is non-smooth for phys-
ical reasons, then (1) specific supervised learning algorithms
that detect the singular point and (2) the corresponding spe-
cific treatment to handle the bifurcated stress gradient of the
yield surface are both necessary. Furthermore, unlike the
classical hand-crafted model or models generated from ge-
ometric learning (see [45]) that are designed for an entire
class of materials of similar but distinctive microstructures,
the proposed algorithm is designed to generate a surrogate
model specifically tailored for one RVE or specimen.
4.4 Comparison with parameter identification of pre-
determined models
Note that, while both parameter identification and su-
pervised machine learning involve solving inverse problems
and, in many cases, multi-objective optimization, the pro-
posed approach does not assume specific forms of equations
a priori for the hyperelasticity energy functional and yield
function. With a sufficiently expressive neural network architecture,
the neural network approach may offer more flexibility in find-
ing the optimal forms of equations (see the universal approxi-
mation theorem [46]). However, this flexibility comes at the
expense of having to deal with the Banach space (cf. Parhi
and Nowak [47] and Weinan and Wojtowytsch [48]) of much
higher dimensions (of the neural network learned function)
than the Euclidean space for a typical parameter identifica-
tion problem.
A similar analogy can be drawn between nonparamet-
ric/symbolic regression and polynomial regression where
the lack of predetermined form of the former approach of-
fers greater flexibility but also increases the difficulty of the
inverse problem. As demonstrated in the previous work
(cf. Wang et al. [49]), even in the case where the inverse
problem is merely used to determine the optimal set of
choices among a handful of pre-determined components of
the elasto-plasticity model, the additional effort and cost to
solve the combinatorial optimization on top of the CPU time
required for the parameter identification process can
be enormous.
This complexity motivates us to propose this alternative
paradigm that enables us to learn the elasto-plasticity prob-
lem in a divide-and-conquer manner, i.e., (1) learning the
elasticity first, (2) then the initial yield function, and (3) then the
hardening/softening rules that evolve the yield function, all
with multilayer perceptrons. In the cases demonstrated
here, there is no need to use recurrent neural networks, which
are more difficult to train and regularize than the
simpler multilayer perceptrons [50]. In the future, we
may explore proper ways to generate more complex rules for
the yield function evolution with recurrent neural networks,
but this is outside the scope of this study.
5 Conclusion
We propose a generalized machine learning paradigm
capable of generating pressure-sensitive and rate-dependent
plasticity models consisting of interpretable compo-
nents. The component-based approach enables geometrical inter-
pretation of the hyperelastic energy and yield function in
the corresponding stress and strain spaces. This treatment
allows us to examine thermodynamic constraints through geo-
metrical interpretation (e.g., convexity) and provides a higher
degree of modularity and simulatability required to interpret
mechanisms of plastic deformation. In the numerical exper-
iments presented in this paper, we first verify the capacity of
the paradigm to recover existing plasticity models with the
corresponding data. Then we provide additional examples
to show that the revised Hamilton-Jacobi solution formu-
lated for rate-dependent plasticity may generate model from
experimental data for steel. Finally, the machine learning
paradigm is used to generate macroscopic elasto-plasticity
surrogate model from FFT simulations of polycrystal con-
sists of FCC grains. The resultant macroscopic surrogate
model is tested against FFT direct numerical simulation at
the Gauss point. The results of the numerical experiments
that the generated models are able to recover old plasticity
laws but also capable of deducing new ones, with a reason-
able level of predictive and descriptive accuracy for the given
amount of data. This interpretability is necessary for ensur-
ing trustworthiness for engineering applications.
Acknowledgments
The authors would like to thank Dr. Ran Ma for pro-
viding the implementation of the polycrystal microstructure
generation and the FFT solver. The authors are supported by
the NSF CAREER grant from Mechanics of Materials and
Structures program at National Science Foundation under
grant contracts CMMI-1846875 and OAC-1940203, the Dy-
namic Materials and Interactions Program from the Air Force
Office of Scientific Research under grant contracts FA9550-
17-1-0169 and FA9550-19-1-0318. These supports are grate-
fully acknowledged. The views and conclusions contained
in this document are those of the authors, and should not
be interpreted as representing the official policies, either ex-
pressed or implied, of the sponsors, including the Army Re-
search Laboratory or the U.S. Government. The U.S. Gov-
ernment is authorized to reproduce and distribute reprints for
Government purposes notwithstanding any copyright nota-
tion herein.
References
[1] Cheng, Z., Zhou, H., Lu, Q., Gao, H., and Lu, L., 2018.
“Extra strengthening and work hardening in gradient
nanotwinned metals”. Science, 362(6414).
[2] Van der Giessen, E., and Needleman, A., 1995. “Dis-
crete dislocation plasticity: a simple planar model”.
Modelling and Simulation in Materials Science and En-
gineering, 3(5), p. 689.
[3] Aydin, A., Borja, R. I., and Eichhubl, P., 2006. “Geo-
logical and mathematical framework for failure modes
in granular rock”. Journal of Structural Geology, 28(1),
pp. 83–98.
[4] Gurson, A. L., 1977. “Continuum Theory of Duc-
tile Rupture by Void Nucleation and Growth: Part I—
Yield Criteria and Flow Rules for Porous Ductile Me-
dia”. Journal of Engineering Materials and Technol-
ogy, 99(1), 01, pp. 2–15.
[5] Martin, C., Bouvard, D., and Shima, S., 2003. “Study
of particle rearrangement during powder compaction
by the discrete element method”. Journal of the Me-
chanics and Physics of Solids, 51(4), pp. 667–693.
[6] Coulomb, C. A., 1773. Essai sur une application des
règles de maximis et minimis à quelques problèmes de
statique relatifs à l'architecture. Tech. rep., Mém. Div.
Sav. Acad.
[7] Mises, R. v., 1913. “Mechanik der festen Körper
im plastisch-deformablen Zustand”. Nachrichten von
der Gesellschaft der Wissenschaften zu Göttingen,
Mathematisch-Physikalische Klasse, 1913, pp. 582–
[8] Drucker, D. C., and Prager, W., 1952. “Soil mechan-
ics and plastic analysis or limit design”. Quarterly of
applied mathematics, 10(2), pp. 157–165.
[9] Sun, X., Bahmani, B., Vlassis, N. N., Sun, W., and
Xu, Y., 2021. “Data-driven discovery of interpretable
causal relations for deep learning material laws with
uncertainty propagation”. Granular Matter, pp. doi:
[10] Ehlers, W., and Scholz, B., 2007. “An inverse algorithm
for the identification and the sensitivity analysis of the
parameters governing micropolar elasto-plastic granu-
lar material”. Archive of Applied Mechanics, 77(12),
pp. 911–931.
[11] Wang, K., Sun, W., Salager, S., Na, S., and Khaddour,
G., 2016. “Identifying material parameters for a micro-
polar plasticity model via x-ray micro-computed tomo-
graphic (ct) images: lessons learned from the curve-
fitting exercises”. International Journal for Multiscale
Computational Engineering, 14(4).
[12] Jang, J., and Smyth, A. W., 2017. “Model updating of a
full-scale fe model with nonlinear constraint equations
and sensitivity-based cluster analysis for updating pa-
rameters”. Mechanical Systems and Signal Processing,
83, pp. 337–355.
[13] Ghaboussi, J., Pecknold, D. A., Zhang, M., and Haj-
Ali, R. M., 1998. “Autoprogressive training of neu-
ral network constitutive models”. International Journal
for Numerical Methods in Engineering, 42(1), pp. 105–
[14] Ghaboussi, J., Garrett Jr, J., and Wu, X., 1991.
“Knowledge-based modeling of material behavior with
neural networks”. Journal of engineering mechanics,
117(1), pp. 132–153.
[15] Lefik, M., and Schrefler, B. A., 2003. “Artificial neu-
ral network as an incremental non-linear constitutive
model for a finite element code”. Computer meth-
ods in applied mechanics and engineering, 192(28-30),
pp. 3265–3283.
[16] Mozaffar, M., Bostanabad, R., Chen, W., Ehmann, K.,
Cao, J., and Bessa, M., 2019. “Deep learning predicts
path-dependent plasticity”. Proceedings of the National
Academy of Sciences, 116(52), pp. 26414–26420.
[17] Wang, K., and Sun, W., 2018. A multiscale multi-
permeability poroplasticity model linked by recur-
sive homogenizations and deep learning”. Computer
Methods in Applied Mechanics and Engineering, 334,
pp. 337–380.
[18] Chi, H., Zhang, Y., Tang, T. L. E., Mirabella, L., Dal-
loro, L., Song, L., and Paulino, G. H., 2021. “Univer-
sal machine learning for topology optimization”. Com-
puter Methods in Applied Mechanics and Engineering,
375, p. 112739.
[19] Baker, N., Alexander, F., Bremer, T., Hagberg, A.,
Kevrekidis, Y., Najm, H., Parashar, M., Patra, A.,
Sethian, J., Wild, S., et al., 2019. Workshop report
on basic research needs for scientific machine learn-
ing: Core technologies for artificial intelligence. Tech.
rep., USDOE Office of Science (SC), Washington, DC
(United States).
[20] Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl,
R., and Yu, B., 2019. “Definitions, methods, and appli-
cations in interpretable machine learning”. Proceed-
ings of the National Academy of Sciences, 116(44),
pp. 22071–22080.
[21] Liu, Y., Sun, W., Yuan, Z., and Fish, J., 2016. “A nonlo-
cal multiscale discrete-continuum model for predicting
mechanical behavior of granular materials”. Interna-
tional Journal for Numerical Methods in Engineering,
106(2), pp. 129–160.
[22] Wang, K., and Sun, W., 2016. A semi-implicit
discrete-continuum coupling method for porous media
based on the effective stress principle at finite strain”.
Computer Methods in Applied Mechanics and Engi-
neering, 304, pp. 546–583.
[23] Feyel, F., and Chaboche, J.-L., 2000. “Fe2 multi-
scale approach for modelling the elastoviscoplastic be-
haviour of long fibre sic/ti composite materials”. Com-
puter methods in applied mechanics and engineering,
183(3-4), pp. 309–330.
[24] Hartmaier, A., Buehler, M. J., and Gao, H., 2005.
“Multiscale modeling of deformation in polycrystalline
thin metal films on substrates”. Advanced Engineering
Materials, 7(3), pp. 165–169.
[25] Vlassis, N. N., and Sun, W., 2021. “Sobolev training
of thermodynamic-informed neural networks for inter-
pretable elasto-plasticity models with level set harden-
ing”. Computer Methods in Applied Mechanics and En-
gineering, 377, p. 113695.
[26] Molnar, C., Casalicchio, G., and Bischl, B., 2018. “iml:
An r package for interpretable machine learning”. Jour-
nal of Open Source Software, 3(26), p. 786.
[27] Coombs, W. M., and Motlagh, Y. G., 2017. “Nurbs
plasticity: yield surface evolution and implicit stress in-
tegration for isotropic hardening”. Computer Methods
in Applied Mechanics and Engineering, 324, pp. 204–
[28] Borja, R. I., 2013. Plasticity: modeling & computation.
Springer Science & Business Media.
[29] Bishop, C. M., et al., 1995. Neural networks for pattern
recognition. Oxford university press.
[30] Czarnecki, W. M., Osindero, S., Jaderberg, M.,
Swirszcz, G., and Pascanu, R., 2017. “Sobolev train-
ing for neural networks”. In Advances in Neural Infor-
mation Processing Systems.
[31] Roscoe, K., and Burland, J., 1970. “On the generalized
stress-strain behavior of “wet” clay: 60. k. h. roscoe
and j. b. burland. engineering plasticity (papers for a
conference held in cambridge, mar. 1968), cambridge,
university press, 535–609 (1968)”. Journal of Terrame-
chanics, 7(2), pp. 107–108.
[32] Argyris, J., Faust, G., Szimmat, J., Warnke, E., and
Willam, K., 1974. “Recent developments in the finite
element analysis of prestressed concrete reactor ves-
sels”. Nuclear Engineering and Design, 28(1), pp. 42–
[33] Chollet, F., et al., 2015. Keras.
[34] Cowper, G. R., and Symonds, P. S., 1957. Strain-
hardening and strain-rate effects in the impact loading
of cantilever beams. Tech. rep., Brown Univ Provi-
dence Ri.
[35] Dozat, T., 2016. Incorporating nesterov momentum
into adam.
[36] Kochmann, J., Wulfinghoff, S., Reese, S., Mianroodi,
J. R., and Svendsen, B., 2016. “Two-scale fe-fft-and
phase-field-based computational modeling of bulk mi-
crostructural evolution and macroscopic material be-
havior”. Computer Methods in Applied Mechanics and
Engineering, 305, pp. 89–110.
[37] Fuchs, A., Heider, Y., Wang, K., Sun, W., and Kaliske,
M., 2021. “Dnn2: A hyper-parameter reinforcement
learning game for self-design of neural network based
elasto-plastic constitutive descriptions”. Computers &
Structures, 249, p. 106505.
[38] Pascanu, R., Mikolov, T., and Bengio, Y., 2013. “On
the difficulty of training recurrent neural networks”. In
International conference on machine learning, PMLR,
pp. 1310–1318.
[39] Xu, K., Huang, D. Z., and Darve, E., 2021. “Learning
constitutive relations using symmetric positive definite
neural networks”. Journal of Computational Physics,
428, p. 110072.
[40] Giunta, A., Wojtkiewicz, S., and Eldred, M., 2003.
“Overview of modern design of experiments methods
for computational simulations”. In 41st Aerospace Sci-
ences Meeting and Exhibit, p. 649.
[41] Hashash, Y., Jung, S., and Ghaboussi, J., 2004. “Nu-
merical implementation of a neural network based ma-
terial model in finite element analysis”. International
Journal for numerical methods in engineering, 59(7),
pp. 989–1005.
[42] de Souza Neto, E. A., Peric, D., and Owen, D. R., 2011.
Computational methods for plasticity: theory and ap-
plications. John Wiley & Sons.
[43] Abbo, A., and Sloan, S., 1995. “A smooth hyperbolic
approximation to the mohr-coulomb yield criterion”.
Computers & structures, 54(3), pp. 427–441.
[44] Matsuoka, H., and Nakai, T., 1974. “Stress-
deformation and strength characteristics of soil under
three different principal stresses”. In Proceedings of
the Japan Society of Civil Engineers, no. 232, Japan
Society of Civil Engineers, pp. 59–70.
[45] Vlassis, N. N., Ma, R., and Sun, W., 2020. “Geomet-
ric deep learning for computational mechanics part i:
Anisotropic hyperelasticity”. Computer Methods in Ap-
plied Mechanics and Engineering, 371, p. 113299.
[46] Scarselli, F., and Tsoi, A. C., 1998. “Universal approxi-
mation using feedforward neural networks: A survey of
some existing methods, and some new results”. Neural
networks, 11(1), pp. 15–37.
[47] Parhi, R., and Nowak, R. D., 2020. “Banach space
representer theorems for neural networks and ridge
splines”. arXiv preprint arXiv:2006.05626.
[48] Weinan, E., and Wojtowytsch, S., 2020. “On the
banach spaces associated with multi-layer relu net-
works: Function representation, approximation the-
ory and gradient descent dynamics”. arXiv preprint
[49] Wang, K., Sun, W., and Du, Q., 2019. “A cooperative
game for automated learning of elasto-plasticity knowl-
edge graphs and models with ai-guided experimenta-
tion”. Computational Mechanics, 64(2), pp. 467–499.
[50] Zaremba, W., Sutskever, I., and Vinyals, O., 2014. “Re-
current neural network regularization”. arXiv preprint
... Those include model-free data-driven methods that operate directly on the underlying data set, as presented in [1]. On the other hand, model-based approaches that rely on training artificial neural networks (ANNs) on the data set have been applied to large success for constitutive modeling [2][3][4]. However, all those descriptions rely on a constitutive law or a single mesostructural representation with characteristic properties. ...
Full-text available
Numerical modeling and optimization of advanced composite materials can require huge computational effort when considering their heterogeneous mesostructure and interactions between different material phases within the framework of multiscale modeling. Employing machine learning methods for computational homogenization enables the reduction of computational effort for the evaluation of the mesostructural behavior while retaining high accuracy. Classically, one unit cell with representative characteristics of the material is chosen for the description of the heterogeneous structure, which presents a simplification of the actual composite. This contribution presents a neural network-based approach for computational homogenization of composite materials with the ability to consider arbitrary compositions of the mesostructure. Therefore, various statistical volume elements and their respective constitutive responses are evaluated. Thereby, the naturally occurring fluctuation within the composition of the phases can be considered. Different approaches using distinct metrics to represent the arbitrary mesostructures are investigated in terms of required computational effort and accuracy.
... [37][38][39][40] Thereby, the response of a representative volume element is reproduced by a network including a collection of connected mechanistic building blocks with analytical homogenization solutions, which enables to describe a complex effective response without the loss of essential physics. Within the works [41][42][43][44][45][46] the idea to replace parts of classical models with NNs is pursued to achieve this. For example, the yield function or the evolution equations are described by FNNs instead of using a particular model. ...
Full-text available
The mathematical formulation of constitutive models to describe the path‐dependent, that is, inelastic, behavior of materials is a challenging task and has been a focus in mechanics research for several decades. There have been increased efforts to facilitate or automate this task through data‐driven techniques, impelled in particular by the recent revival of neural networks (NNs) in computational mechanics. However, it seems questionable to simply not consider fundamental findings of constitutive modeling originating from the last decades research within NN‐based approaches. Herein, we propose a comparative study on different feedforward and recurrent neural network architectures to model 1D small strain inelasticity. Within this study, we divide the models into three basic classes: black box NNs, NNs enforcing physics in a weak form, and NNs enforcing physics in a strong form. Thereby, the first class of networks can learn constitutive relations from data while the underlying physics are completely ignored, whereas the latter two are constructed such that they can account for fundamental physics, where special attention is paid to the second law of thermodynamics in this work. Conventional linear and nonlinear viscoelastic as well as elastoplastic models are used for training data generation and, later on, as reference. After training with random walk time sequences containing information on stress, strain, and—for some models—internal variables, the NN‐based models are compared to the reference solution, whereby interpolation and extrapolation are considered. Besides the quality of the stress prediction, the related free energy and dissipation rate are analyzed to evaluate the models. Overall, the presented study enables a clear recording of the advantages and disadvantages of different NN architectures to model inelasticity and gives guidance on how to train and apply these models.
... For instance, the material model in Ames et al. [1] requires 37 material parameters and complex forms to capture the thermo-mechanical response of amorphous polymers, and the model in Ma and Sun [45] requires 25 material parameters to capture the crystal plasticity and phase transition of salt under high pressure and high temperature. While the use of machine learning to generate constitutive laws [30,64] may enable one to bypass the need to identify a particular model form, the data required to obtain a sufficiently accurate neural network or Gaussian process model may be considerable and complex/ambiguous, and hence costly to obtain [16,22,66,67]. ...
Experimental data are often costly to obtain, which makes it difficult to calibrate complex models. For many models an experimental design that produces the best calibration given a limited experimental budget is not obvious. This paper introduces a deep reinforcement learning (RL) algorithm for design of experiments that maximizes the information gain measured by Kullback–Leibler divergence obtained via the Kalman filter (KF). This combination enables experimental design for rapid online experiments where manual trial-and-error is not feasible in the high-dimensional parametric design space. We formulate possible configurations of experiments as a decision tree and a Markov decision process, where a finite choice of actions is available at each incremental step. Once an action is taken, a variety of measurements are used to update the state of the experiment. This new data leads to a Bayesian update of the parameters by the KF, which is used to enhance the state representation. In contrast to the Nash–Sutcliffe efficiency index, which requires additional sampling to test hypotheses for forward predictions, the KF can lower the cost of experiments by directly estimating the values of new data acquired through additional actions. In this work our applications focus on mechanical testing of materials. Numerical experiments with complex, history-dependent models are used to verify the implementation and benchmark the performance of the RL-designed experiments.
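The core reward signal described above — information gain measured as the Kullback–Leibler divergence between the Gaussian parameter belief before and after a Kalman-filter update — can be sketched in a few lines. The two-parameter model, observation matrix, and noise levels below are illustrative assumptions, not the paper's actual experimental design:

```python
import numpy as np

def kf_update(mu, P, H, R, y):
    """One Kalman-filter measurement update for a static parameter vector.
    mu, P: prior mean and covariance; H: observation matrix; R: noise cov; y: data."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    mu_post = mu + K @ (y - H @ mu)
    P_post = (np.eye(len(mu)) - K @ H) @ P
    return mu_post, P_post

def gaussian_kl(mu0, P0, mu1, P1):
    """KL(N(mu0, P0) || N(mu1, P1)): information gained relative to the prior."""
    k = len(mu0)
    P1_inv = np.linalg.inv(P1)
    dm = mu1 - mu0
    return 0.5 * (np.trace(P1_inv @ P0) + dm @ P1_inv @ dm - k
                  + np.log(np.linalg.det(P1) / np.linalg.det(P0)))

# toy example: a 2-parameter model observed through one noisy scalar measurement
mu, P = np.zeros(2), np.eye(2)
H = np.array([[1.0, 0.5]])
R = np.array([[0.1]])
y = np.array([1.2])
mu_post, P_post = kf_update(mu, P, H, R, y)
gain = gaussian_kl(mu_post, P_post, mu, P)   # candidate RL reward for this action
```

Because the KF updates the belief in closed form, this gain can be evaluated for each candidate action without running the full forward model, which is what lets the RL agent rank experiments cheaply.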
This paper introduces a neural kernel method to generate machine learning plasticity models for micropolar and micromorphic materials that lack material symmetry and have internal structures. Since these complex materials often require higher-dimensional parametric space to be precisely characterized, we introduce a representation learning step where we first learn a feature vector space isomorphic to a finite-dimensional subspace of the original parametric function space from the augmented labeled data expanded from the narrow band of the yield data. This approach simplifies the data augmentation step and enables us to construct the high-dimensional yield surface in a feature space spanned by the feature kernels. In the numerical examples, we first verified the implementations with data generated from known models, then tested the capacity of the models to discover feature spaces from meso-scale simulation data generated from representative elementary volume (RVE) of heterogeneous materials with internal structures. The neural kernel plasticity model and other alternative machine learning approaches are compared in a computational homogenization problem for layered geomaterials. The results indicate that the neural kernel feature space may lead to more robust forward predictions against sparse and high-dimensional data.
Conventional neural network elastoplasticity models are often perceived as lacking interpretability. This paper introduces a two-step machine-learning approach that returns mathematical models interpretable by human experts. In particular, we introduce a surrogate model where yield surfaces are expressed in terms of a set of single-variable feature mappings obtained from supervised learning. A postprocessing step is then used to re-interpret the set of single-variable neural network mapping functions into mathematical form through symbolic regression. This divide-and-conquer approach provides several important advantages. First, it enables us to overcome the scaling issue of symbolic regression algorithms. From a practical perspective, it enhances the portability of learned models for partial differential equation solvers written in different programming languages. Finally, it enables us to have a concrete understanding of the attributes of the materials, such as convexity and symmetries of models, through automated derivations and reasoning. Numerical examples have been provided, along with an open-source code to enable third-party validation.
The shapes and morphological features of grains in sand assemblies have far-reaching implications in many engineering applications, such as geotechnical engineering, computer animations, petroleum engineering, and concentrated solar power. Yet, our understanding of the influence of grain geometries on macroscopic response is often only qualitative, due to the limited availability of high-quality 3D grain geometry data. In this paper, we introduce a denoising diffusion algorithm that uses a set of point clouds collected from the surface of individual sand grains to generate grains in the latent space. By employing a point cloud autoencoder, the three-dimensional point cloud structures of sand grains are first encoded into a lower-dimensional latent space. A generative denoising diffusion probabilistic model is trained to produce synthetic sand that maximizes the log-likelihood of the generated samples belonging to the original data distribution measured by a Kullback-Leibler divergence. Numerical experiments suggest that the proposed method is capable of generating realistic grains with morphology, shapes and sizes consistent with the training data inferred from an F50 sand database. We then use a rigid contact dynamic simulator to pour the synthetic sand in a confined volume to form granular assemblies in a static equilibrium state with targeted distribution properties. To ensure third-party validation, 50,000 synthetic sand grains and the 1,542 real synchrotron microcomputed tomography (SMT) scans of the F50 sand, as well as the granular assemblies composed of synthetic sand grains are made available in an open-source repository.
This paper introduces a publicly available PyTorch-ABAQUS deep-learning framework of a family of plasticity models where the yield surface is implicitly represented by a scalar-valued function. In particular, our focus is to introduce a practical framework that can be deployed for engineering analysis that employs a user-defined material subroutine (UMAT/VUMAT) for ABAQUS, which is written in FORTRAN. To accomplish this task while leveraging the back-propagation learning algorithm to speed up the neural-network training, we introduce an interface code where the weights and biases of the trained neural networks obtained via the PyTorch library can be automatically converted into a generic FORTRAN code that can be a part of the UMAT/VUMAT algorithm. To enable third-party validation, we purposely make all the data sets, source code used to train the neural-network-based constitutive models, and the trained models available in a public repository. Furthermore, the practicality of the workflow is then further tested on a dataset for anisotropic yield function to showcase the extensibility of the proposed framework. A number of representative numerical experiments are used to examine the accuracy, robustness, and reproducibility of the results generated by the neural network models.
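The weight-conversion step that the abstract above describes — exporting trained PyTorch weights so a FORTRAN UMAT/VUMAT can evaluate the network — can be illustrated with a minimal export routine. The file layout, function name, and column-major flattening below are invented for illustration and are not the framework's actual interface:

```python
import numpy as np

def export_dense_layers(layers, path):
    """Write each (W, b) pair of a dense network to a plain text file:
    a header with the layer count, then per layer its (rows, cols) followed by
    W flattened column-major (FORTRAN order) and then b, one number per line,
    so a FORTRAN reader can reconstruct the network with plain READ statements.
    (Illustrative file layout, not the published PyTorch-ABAQUS interface.)"""
    with open(path, "w") as f:
        f.write(f"{len(layers)}\n")
        for W, b in layers:
            f.write(f"{W.shape[0]} {W.shape[1]}\n")
            for v in np.concatenate([W.ravel(order="F"), b.ravel()]):
                f.write(f"{v:.17e}\n")   # full double precision round-trip

# dummy two-layer network: 2 inputs -> 4 hidden -> 1 output
layers = [(np.random.randn(4, 2), np.random.randn(4)),
          (np.random.randn(1, 4), np.random.randn(1))]
export_dense_layers(layers, "weights.txt")
```

Writing the matrices in FORTRAN (column-major) order means the consuming subroutine can fill its arrays with a single implied-DO read, without any index gymnastics on the FORTRAN side.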
We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed "Neural Constitutive Laws" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.
This paper presents a computational framework that generates ensemble predictive mechanics models with uncertainty quantification (UQ). We first develop a causal discovery algorithm to infer causal relations among time-history data measured during each representative volume element (RVE) simulation through a directed acyclic graph. With multiple plausible sets of causal relationships estimated from multiple RVE simulations, the predictions are propagated in the derived causal graph while using a deep neural network equipped with dropout layers as a Bayesian approximation for UQ. We select two representative numerical examples (traction-separation laws for frictional interfaces, elastoplasticity models for granular assemblies) to examine the accuracy and robustness of the proposed causal discovery method for the common material law predictions in civil engineering applications.
We introduce a deep learning framework designed to train smoothed elastoplasticity models with interpretable components, such as the stored elastic energy function, yield surface, and plastic flow, that may evolve based on a set of deep neural network predictions. By recasting the yield function as an evolving level set, we introduce a deep learning approach to deduce the solutions of the Hamilton-Jacobi equation that governs the hardening/softening mechanism. This machine learning hardening law may recover any classical hand-crafted hardening rules and discover new mechanisms that are either unbeknownst or difficult to express with mathematical expressions. Leveraging Sobolev training to gain control over the derivatives of the learned functions, the resultant machine learning elastoplasticity models are thermodynamically consistent and interpretable, while exhibiting excellent learning capacity. Using a 3D FFT solver to create a polycrystal database, numerical experiments are conducted and the implementations of each component of the models are individually verified. Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as the recurrent neural network, the 1D convolutional neural network, and the multi-step feed-forward models.
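The Sobolev-training idea invoked above — fitting an energy functional while simultaneously penalizing the error in its derivative, so that the stress is recovered by differentiating the learned energy — can be illustrated with a minimal 1D least-squares surrogate. The polynomial basis, the quadratic reference energy, and the loss weighting are illustrative assumptions, not the paper's network architecture:

```python
import numpy as np

# Surrogate energy psi_hat(eps) = w . phi(eps) with a fixed polynomial basis;
# the stress is its exact derivative, d(psi_hat)/d(eps) = w . phi'(eps).
def features(eps):
    return np.stack([eps**2, eps**3, eps**4], axis=-1)       # basis for psi

def dfeatures(eps):
    return np.stack([2*eps, 3*eps**2, 4*eps**3], axis=-1)    # basis derivative

def sobolev_loss(w, eps, psi, sig, alpha=1.0):
    """L2 error on the energy plus weighted L2 error on the stress."""
    r_psi = features(eps) @ w - psi
    r_sig = dfeatures(eps) @ w - sig
    return np.mean(r_psi**2) + alpha * np.mean(r_sig**2)

# synthetic data from a known linear-elastic energy psi = 0.5*E*eps^2
E = 10.0
eps = np.linspace(-0.1, 0.1, 21)
psi = 0.5 * E * eps**2
sig = E * eps            # stress = d(psi)/d(eps)

# for a linear-in-w model the Sobolev fit is a stacked least-squares problem
A = np.vstack([features(eps), dfeatures(eps)])
b = np.concatenate([psi, sig])
w, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Because the stress enters the loss as an exact derivative of the fitted energy rather than as an independent output, the recovered model cannot produce an energy/stress pair that contradicts itself, which is the consistency property the abstract appeals to.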
We present a new neural-network architecture, called the Cholesky-factored symmetric positive definite neural network (SPD-NN), for modeling constitutive relations in computational mechanics. Instead of directly predicting the stress of the material, the SPD-NN trains a neural network to predict the Cholesky factor of the tangent stiffness matrix, based on which the stress is calculated in incremental form. As a result of this special structure, SPD-NN weakly imposes convexity on the strain energy function, satisfies the second order work criterion (Hill's criterion) and time consistency for path-dependent materials, and therefore improves numerical stability, especially when the SPD-NN is used in finite element simulations. Depending on the types of available data, we propose two training methods, namely direct training for strain and stress pairs and indirect training for loads and displacement pairs. We demonstrate the effectiveness of SPD-NN on hyperelastic, elasto-plastic, and multiscale fiber-reinforced plate problems from solid mechanics. The generality and robustness of SPD-NN make it a promising tool for a wide range of constitutive modeling applications.
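The Cholesky-factor parametrization that guarantees a symmetric positive definite (SPD) tangent stiffness can be sketched as follows. The 2D Voigt dimension, the softplus transform on the diagonal, and the dummy parameter vector standing in for a network output are assumptions for illustration, not the SPD-NN's exact layer design:

```python
import numpy as np

def spd_from_cholesky(theta):
    """Assemble an SPD tangent stiffness C = L L^T from an unconstrained
    parameter vector theta holding the lower-triangular entries of L.
    A softplus on the diagonal keeps those entries strictly positive, so L is
    full rank and C is guaranteed SPD for any theta."""
    n = 3                               # e.g. 2D Voigt notation: [s11, s22, s12]
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = theta
    diag = np.arange(n)
    L[diag, diag] = np.log1p(np.exp(L[diag, diag]))   # softplus > 0
    return L @ L.T

# incremental stress update: d_sigma = C(theta) @ d_eps
theta = np.array([0.3, -0.2, 0.8, 0.1, 0.0, 0.5])   # dummy network output
C = spd_from_cholesky(theta)
d_eps = np.array([1e-3, 0.0, 0.0])
d_sig = C @ d_eps
eigvals = np.linalg.eigvalsh(C)
```

Predicting the factor L instead of C itself is what makes the constraint structural: no projection or penalty term is needed, because every point in the network's output space already maps to a valid SPD tangent.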
We present a machine learning approach that integrates geometric deep learning and Sobolev training to generate a family of finite strain anisotropic hyperelastic models that predict the homogenized responses of polycrystals previously unseen during the training. While hand-crafted hyperelasticity models often incorporate homogenized measures of microstructural attributes, such as the porosity or the averaged orientation of constituents, these measures may not adequately reflect the topological structures of the attributes. We fill this knowledge gap by introducing the concept of the weighted graph as a new high-dimensional descriptor that represents topological information, such as the connectivity of anisotropic grains in an assembly. By leveraging a graph convolutional deep neural network in a hybrid machine learning architecture previously used in Frankel et al. 2019, the artificial intelligence extracts low-dimensional features from the weighted graphs and subsequently learns the influence of these low-dimensional features on the resultant stored elastic energy functionals. To ensure smoothness and prevent unintentionally generating a non-convex stored energy functional, we adopt the Sobolev training method for neural networks such that a stress measure is obtained implicitly by taking directional derivatives of the trained energy functional. Results from numerical experiments suggest that Sobolev training is capable of generating a hyperelastic energy functional that predicts both the elastic energy and stress measures more accurately than the classical training that minimizes the L2 norm. Verification exercises against unseen benchmark FFT simulations and phase-field fracture simulations using the geometric learning generated elastic energy functional are conducted to demonstrate the quality of the predictions.
We introduce a multi-agent meta-modeling game to generate data, knowledge, and models that make predictions on constitutive responses of elasto-plastic materials. We introduce a new concept from graph theory where a modeler agent is tasked with evaluating all the modeling options recast as a directed multigraph and find the optimal path that links the source of the directed graph (e.g. strain history) to the target (e.g. stress) measured by an objective function. Meanwhile, the data agent, which is tasked with generating data from real or virtual experiments (e.g. molecular dynamics, discrete element simulations), interacts with the modeling agent sequentially and uses reinforcement learning to design new experiments to optimize the prediction capacity. Consequently, this treatment enables us to emulate an idealized scientific collaboration as selections of the optimal choices in a decision tree search done automatically via deep reinforcement learning.
This contribution presents a meta-modeling framework that employs artificial intelligence to design a neural network that replicates the path-dependent constitutive responses of composite materials sampled by a numerical testing procedure of Representative Volume Elements (RVE). A Deep Reinforcement Learning (DRL) combinatorics game is invented to automatically search for the optimal set of hyper-parameters from a decision tree. Besides the typical hyper-parameters for ANN training, such as the network topology, the size and composition of the considered training data are incorporated as additional hyper-parameters to help investigate the amount of data necessary for training and validation. The proposed meta-modeling framework is able to identify hyper-parameter configurations with a weighted trade-off between prediction accuracy and computational cost. The capabilities and limitations of the introduced framework are shown and discussed via several numerical examples. Moreover, the possibility of transferring the gained knowledge of hyper-parameters among different RVEs is explored in numerical experiments.
We put forward a general machine learning-based topology optimization framework, which greatly accelerates the design process of large-scale problems, without sacrifice in accuracy. The proposed framework has three distinguishing features. First, a novel online training concept is established using data from earlier iterations of the topology optimization process. Thus, the training is done during, rather than before, the topology optimization. Second, a tailored two-scale topology optimization formulation is adopted, which introduces a localized online training strategy. This training strategy can improve both the scalability and accuracy of the proposed framework. Third, an online updating scheme is synergistically incorporated, which continuously improves the prediction accuracy of the machine learning models by providing new data generated from actual physical simulations. Through numerical investigations and design examples, we demonstrate that the aforementioned framework is highly scalable and can efficiently handle design problems with a wide range of discretization levels, different load and boundary conditions, and various design considerations (e.g., the presence of non-designable regions).
Stronger copper through twin power Materials with structural gradients often have unique combinations of properties. Gradient-structured materials are found in nature and can be engineered. Cheng et al. made a structural gradient by introducing gradients of crystallographic twins into copper. This strategy creates bundles of dislocations in the crystal interiors, which makes the metal stronger than any of the individual components. This method offers promise for developing high-performance metals. Science, this issue p. eaau1925