
Geometric deep learning for computational mechanics Part I: Anisotropic Hyperelasticity



This paper is the first attempt to use geometric deep learning and Sobolev training to incorporate non-Euclidean microstructural data such that anisotropic hyperelastic material machine learning models can be trained in the finite deformation range. While traditional hyperelasticity models often incorporate homogenized measures of microstructural attributes, such as the porosity or the averaged orientation of constituents, these measures cannot reflect the topological structures of the attributes. We fill this knowledge gap by introducing the concept of the weighted graph as a new means to store topological information, such as the connectivity of anisotropic grains in assemblies. Then, by leveraging a graph convolutional deep neural network architecture in the spectral domain, we introduce a mechanism to incorporate these non-Euclidean weighted graph data directly as input for training and for predicting the elastic responses of materials with complex microstructures. To ensure smoothness and prevent non-convexity of the trained stored energy functional, we introduce a Sobolev training technique for neural networks such that the stress measure is obtained implicitly by taking directional derivatives of the trained energy functional. By optimizing the neural network to approximate both the energy functional output and the stress measure, we introduce a training procedure that improves efficiency and generalizes the learned energy functional to different microstructures. The trained hybrid neural network model is then used to generate new stored energy functionals for unseen microstructures in a parametric study to predict the influence of elastic anisotropy on the nucleation and propagation of fracture in the brittle regime.
Computer Methods in Applied Mechanics and Engineering manuscript No.
(will be inserted by the editor)
Geometric deep learning for computational mechanics Part I: Anisotropic Hyperelasticity
Nikolaos N. Vlassis · Ran Ma · WaiChing Sun
Received: May 18, 2020 / Accepted: date
Abstract We present a machine learning approach that integrates geometric deep learning and Sobolev training to generate a family of finite strain anisotropic hyperelastic models that predict the homogenized responses of polycrystals previously unseen during the training. While hand-crafted hyperelasticity models often incorporate homogenized measures of microstructural attributes, such as the porosity or the averaged orientation of constituents, these measures may not adequately reflect the topological structures of the attributes. We fill this knowledge gap by introducing the concept of the weighted graph as a new high-dimensional descriptor that represents topological information, such as the connectivity of anisotropic grains in an assembly. By leveraging a graph convolutional deep neural network in a hybrid machine learning architecture previously used in Frankel et al. (2019), the neural network extracts low-dimensional features from the weighted graphs and subsequently learns the influence of these low-dimensional features on the resultant stored elastic energy functionals. To ensure smoothness and prevent unintentionally generating a non-convex stored energy functional, we adopt the Sobolev training method for neural networks such that a stress measure is obtained implicitly by taking directional derivatives of the trained energy functional. Results from numerical experiments suggest that Sobolev training is capable of generating a hyperelastic energy functional that predicts both the elastic energy and stress measures more accurately than the classical training that minimizes $L_2$ norms. Verification exercises against unseen benchmark FFT simulations and phase-field fracture simulations using the geometric learning generated elastic energy functional are conducted to demonstrate the quality of the predictions.
Keywords geometric machine learning; graph; polycrystals; microstructures; anisotropic energy functional; phase-field fracture
1 Introduction
Conventional constitutive modeling efforts often rely on human interpretations of geometric descriptors of microstructures. These descriptors, such as volume fraction of void/constituents, dislocation density, twinning, degradation function, slip system, orientation, and shape factors, are often incorporated as state variables in a system of ordinary differential equations that leads to the constitutive responses at a material point. Classical examples include the family of Gurson models in which the volume fraction of voids is related to ductile fracture (Gurson, 1977; Needleman, 1987; Zhang et al., 2000; Nahshon and Hutchinson, 2008; Nielsen and Tvergaard, 2010), critical state plasticity in which porosity and over-consolidation ratio dictate the plastic dilatancy and hardening law (Schofield and Wroth, 1968; Borja and Lee, 1990; Manzari and Dafalias, 1997; Sun, 2013; Liu et al., 2016; Wang et al., 2016b), and crystal plasticity where the activation of slip systems leads to plastic deformation (Anand and Kothari, 1996; Na and Sun, 2018; Ma et al., 2018). In these cases, a specific subset of descriptors is often incorporated manually such that the most crucial deformation mechanisms for the stress-strain relationships are described mathematically.

Corresponding author: WaiChing Sun
Assistant Professor, Department of Civil Engineering and Engineering Mechanics, Columbia University, 614 SW Mudd, Mail Code: 4709, New York, NY 10027 Tel.: 212-854-3143, Fax: 212-854-6267, E-mail:

While this approach has achieved a level of success, especially for isotropic materials, materials with complex microstructures often require more complex geometric and topological descriptors to sufficiently describe the geometrical features (Jerphagnon et al., 1978; Sun and Mota, 2014; Kuhn et al., 2015). Human interpretation limits the complexity of the state variables and may lead to missed opportunities to use all the available information about the microstructure, which could in turn reduce the prediction quality. A data-driven approach should be considered to discover constitutive law mechanisms when human interpretation capabilities become restrictive (Kirchdoerfer and Ortiz, 2016; Eggersmann et al., 2019; He and Chen, 2019; Stoffel et al., 2019; Bessa et al., 2017; Liu et al., 2018). In this work, we consider the general form of a strain energy functional that reads,
$$\psi = \psi(\mathbf{F}, \mathbb{G}), \tag{1}$$
where $\mathbb{G}$ is a graph that stores the non-Euclidean data of the microstructures (e.g. crystal connectivity, grain connectivity). Specifically, we attempt to train a neural network approximator of the anisotropic stored elastic energy functional across different polycrystals, with the weighted crystal connectivity graph as the sole extra input describing the anisotropy.
Fig. 1: Polycrystal interpreted as a weighted connectivity graph. The graph is undirected and weighted at
the nodes.
It can be difficult to directly incorporate either Euclidean or non-Euclidean data into a hand-crafted constitutive model. There have been attempts to infer information directly from scanned microstructural images using neural networks that utilize a convolutional layer architecture (CNN) (Lubbers et al., 2017). The endeavor to distill physically meaningful and interpretable features from scanned microstructural images stored in a Euclidean grid can be a complex and sometimes futile process. While recent advancements in convolutional neural networks have provided an effective means to extract features that lead to superhuman performance for image classification tasks (Krizhevsky et al., 2012), similar success has not been recorded for mechanics predictions. Image-related problems, such as camera noise, saturation, and image compression, as well as ring artifacts, which often occur in micro-CT images, may lead to issues in the deconvolution operators and, in some cases, may constitute an obstacle in acquiring useful and interpretable features from the image dataset (Xu et al., 2014). In some cases, over-fitting and under-fitting can both render the trained CNN extremely vulnerable to adversarial attacks and hence not suitable for high-risk, high-regret applications.
As demonstrated in previous works (Jones et al., 2018; Frankel et al., 2019), using images directly as an additional input to the polycrystal energy functional approximator may be contingent on the quality and size of the training pool. A large number of images, possibly in three dimensions and at a sufficiently high resolution, would be necessary to represent the latent features that help the approximator distinguish successfully between different polycrystals. Using data in a Euclidean grid is an esoteric process that depends on empirical evidence that the current training sample holds adequate information to infer features useful in formulating a constitutive law. However, gathering that evidence can be laborious, as it requires numerous trial-and-error runs and is weighed down by the heavy computational costs of performing filtering on Euclidean data (e.g. on high-resolution 3D image voxels).
Graph representation of the data structures can provide a momentous head start to overcome this very impediment. An example is the connectivity graph used in the granular mechanics community, where the formation and evolution of force chains are linked to macroscopic phenomena, such as shear band formation and failure (Satake, 1992; Kuhn et al., 2015; Sun et al., 2013; Tordesillas et al., 2014; Wang and Sun, 2019a,b). The distinct advantage of the graph representation of data is the relatively high interpretability of the data structures (Kuhn et al., 2015; Wang and Sun, 2019a). This graph representation is not only helpful for understanding the topology of interrelated entities in a network but also provides a convenient means to create universal and interpretable features via graph convolutional neural networks (Altae-Tran et al., 2017; Xie and Grossman, 2018).
At the same time, by carefully selecting appropriate graph weights, one may incorporate only the essential information of microstructural data critical for mechanics predictions, which could prove to be more interpretable, flexible, economical, and efficient than incorporating feature spaces inferred from 3D voxel images. Furthermore, since one may easily use rotationally and translationally invariant data as weights, the graph approach is also advantageous for predicting constitutive responses that require frame indifference.
Currently, machine learning applications often employ two families of algorithms to take graphs as inputs, i.e., representation learning algorithms and graph neural networks. The former usually refers to unsupervised methods that convert graph data structures into formats or features that are easily comprehensible by machine learning algorithms (Bengio et al., 2013). The latter refers to neural network algorithms that accept graphs as inputs with layer formulations that can operate directly on graph structures (Scarselli et al., 2008). Representation learning on graphs shares concepts with the popular embedding techniques for text and speech recognition (Mikolov et al., 2013) to encode the input in a vector format that can be utilized by common regression and classification algorithms. There have been multiple studies on encoding graph structures, spanning from the level of nodes (Grover and Leskovec, 2016) up to the level of entire graphs (Perozzi, Al-Rfou, and Skiena, 2014; Narayanan, Chandramohan, Venkatesan, Chen, Liu, and Jaiswal, 2017). Graph embedding algorithms, like DeepWalk (Perozzi et al., 2014), utilize techniques such as random walks to "read" sequences of neighboring nodes, resembling reading word sequences in a sentence, and encode those graph data in an unsupervised fashion.
While these algorithms have proven to be rather powerful and demonstrate competitive results in tasks like classification problems, they do come with disadvantages that can be limiting for use in engineering problems. Graph representation algorithms work very well on encoding the training dataset. However, they can be difficult to generalize and cannot accommodate dynamic data structures. This can prove problematic for mechanics problems, where we expect a model to be as general as possible in terms of material structure variations (e.g. polycrystals, granular assemblies). Furthermore, representation learning algorithms can be difficult to combine with another neural network architecture for a supervised learning task in a sequential manner. In particular, when the representation learning is performed separately and independently from the supervised learning task that generates the energy functional approximation, there is no guarantee that the clustering or classifications obtained from the representation learning are physically meaningful. Hence, the representation learning may not be capable of generating features that facilitate the energy functional prediction task in a completely unsupervised setting.
For the above reasons, we have opted for a hybrid neural network architecture that combines an unsupervised graph convolutional neural network with a multilayer perceptron to perform the regression task of predicting an energy functional. Both branches of our suggested hybrid architecture learn simultaneously from the same back-propagation process with a common loss function tailored to the approximated function. The graph encoder part, borrowing its name from the popular autoencoder architecture (Ranzato et al., 2007; Vincent et al., 2008), learns and adjusts its weights to encode input graphs in a manner that serves the approximation task at hand. Thus, it eliminates the obstacle of coordinating the asynchronous steps of graph embedding and approximator training by fitting the graph encoder and the energy functional approximator in parallel with a common training goal (loss function).
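As for notations and symbols in this current work, bold-faced letters denote tensors (including vectors, which are rank-one tensors); the symbol '$\cdot$' denotes a single contraction of adjacent indices of two tensors (e.g. $\mathbf{a} \cdot \mathbf{b} = a_i b_i$ or $\mathbf{c} \cdot \mathbf{d} = c_{ij} d_{jk}$); the symbol ':' denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g. $\mathbf{C} : \boldsymbol{\epsilon}^e = C_{ijkl} \epsilon^e_{kl}$); the symbol '$\otimes$' denotes a juxtaposition of two vectors (e.g. $\mathbf{a} \otimes \mathbf{b} = a_i b_j$) or two symmetric second-order tensors (e.g. $(\boldsymbol{\alpha} \otimes \boldsymbol{\beta})_{ijkl} = \alpha_{ij} \beta_{kl}$). Moreover, $(\boldsymbol{\alpha} \oplus \boldsymbol{\beta})_{ijkl} = \alpha_{jl} \beta_{ik}$ and $(\boldsymbol{\alpha} \ominus \boldsymbol{\beta})_{ijkl} = \alpha_{il} \beta_{jk}$. We also define identity tensors $(\mathbf{I})_{ij} = \delta_{ij}$, $(\mathbf{I}^4)_{ijkl} = \delta_{ik} \delta_{jl}$, and $(\mathbf{I}^4_{\text{sym}})_{ijkl} = \frac{1}{2}(\delta_{ik} \delta_{jl} + \delta_{il} \delta_{kj})$, where $\delta_{ij}$ is the Kronecker delta. As for sign conventions, unless specified otherwise, we consider the direction of the tensile stress and dilative pressure as positive.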
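The index expressions in the notation paragraph map directly onto Einstein-summation calls. The following is a minimal NumPy sketch, not part of the original work, that evaluates each index pattern on random 3D tensors and checks the action of the two fourth-order identity tensors under double contraction. All variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
c, d = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
alpha, beta = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

# Single contraction of adjacent indices: a_i b_i and c_ij d_jk
a_dot_b = np.einsum("i,i->", a, b)
c_dot_d = np.einsum("ij,jk->ik", c, d)

# Juxtaposition of two vectors: a_i b_j
a_b = np.einsum("i,j->ij", a, b)

# Fourth-order products of two second-order tensors (index patterns from the text)
t_ij_kl = np.einsum("ij,kl->ijkl", alpha, beta)  # alpha_ij beta_kl
t_jl_ik = np.einsum("jl,ik->ijkl", alpha, beta)  # alpha_jl beta_ik
t_il_jk = np.einsum("il,jk->ijkl", alpha, beta)  # alpha_il beta_jk

# Identity tensors: delta_ij, delta_ik delta_jl, and the symmetrized version
I2 = np.eye(3)
I4 = np.einsum("ik,jl->ijkl", I2, I2)
I4_sym = 0.5 * (I4 + np.einsum("il,kj->ijkl", I2, I2))

# Double contraction of a fourth-order tensor with a second-order tensor
I4_c = np.einsum("ijkl,kl->ij", I4, c)          # returns c unchanged
I4_sym_c = np.einsum("ijkl,kl->ij", I4_sym, c)  # returns the symmetric part of c
```

The last two lines illustrate why the symmetrized identity is useful: contracting it with any second-order tensor extracts that tensor's symmetric part.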
2 Graphs as non-Euclidean descriptors for micro-structures
This section provides a detailed account of how to incorporate microstructural data represented by weighted graphs as descriptors for modeling hyperelastic responses of different microstructures. In particular, we describe how topological information of an assembly composed of grains with different properties can be effectively represented as a node-weighted graph (Section 2.1). A brief review of some basic concepts of graph theory is included in Appendix A. An illustrative example of the graph representation for the polycrystal structure in Figure 2 is included in Appendix B.
2.1 Polycrystals represented as node-weighted undirected graphs
Inferring microstructural data as weighted graphs from field data may require pooling, a down-sampling procedure that converts field data of a specified domain into low-dimensional features that preserve topological information. Examples of pooling include inferring the grain connectivity graph from micro-CT images of assemblies (Jaquet et al., 2013; Wang et al., 2016a) or from realizations of micro-structures generated by software packages such as Neper or DREAM.3D (Quey et al., 2011; Groeber and Jackson, 2014). In these cases, the node and edge sets can be defined in a rather intuitive manner, as the micro-structures are formed by assemblies consisting of parts (grains) connected in a specific way represented by the edge set, as illustrated in Figure 2. In this work, we treat each crystal grain as a node or vertex in a graph and create an edge for each in-contact grain pair. Attributes of each grain are represented in a collection of node weights. The features of the edges (such as the contact surface areas and roughness) are neglected to simplify the learning procedures but will be considered in the future. For simplicity, we also assume that the polycrystal contains no voids and the contacts remain intact.
Given a polycrystal microstructure consisting of a finite number of crystal grains $N$, we define the graph representation used in this work. The shapes of the grains are idealized as polyhedra such that each face of a grain may be in contact with at most one face of another grain. As such, an edge is assigned between each in-contact (adjacent) grain pair such that there exist $E$ edges in the graph. The collection of the grains is then represented as a vertex set $\mathbb{V} = \{v_1, \ldots, v_N\}$, and the collection of edges as an edge set $\mathbb{E} \subseteq \mathbb{V} \times \mathbb{V}$. There can only be one unique edge defined between two vertices, and the order of the vertices of an edge does not matter, i.e. the pairs are unordered or undirected. As a result, the connectivity of the polycrystal can then be represented by an undirected graph $\mathbb{G} = (\mathbb{V}, \mathbb{E})$ (cf. the definitions in Appendix A).
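As a small illustration of the vertex and edge sets just defined, the sketch below builds an undirected connectivity graph from a list of in-contact grain pairs. It is a hypothetical helper, not the authors' preprocessing code: the function name, the zero-based integer grain labels, and the sample contact list are all assumptions. Storing each edge as a `frozenset` enforces that pairs are unordered and that at most one edge exists between two vertices.

```python
def build_connectivity_graph(num_grains, contact_pairs):
    """Return (vertices, edges) of an undirected grain connectivity graph.

    num_grains    -- number of crystal grains N (grains are labeled 0..N-1 here)
    contact_pairs -- iterable of (i, j) grain labels that share a face
    """
    vertices = list(range(num_grains))
    edges = set()
    for i, j in contact_pairs:
        if not (0 <= i < num_grains and 0 <= j < num_grains):
            raise ValueError(f"grain label out of range in pair ({i}, {j})")
        if i == j:
            continue  # a grain is not in contact with itself
        # frozenset makes (i, j) and (j, i) the same edge, so duplicates collapse
        edges.add(frozenset((i, j)))
    return vertices, edges


# Hypothetical 4-grain polycrystal; the pair (1, 0) repeats the contact (0, 1)
contacts = [(0, 1), (1, 0), (1, 2), (2, 3), (0, 2)]
V, E = build_connectivity_graph(4, contacts)
num_edges = len(E)  # number of unique undirected edges
```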
Note that $\mathbb{G}$ alone only provides the connectivity information. To predict the elasticity of the polycrystals, more information about the geometrical features and mechanical properties of grains must be extracted in the machine learning process. In our design, these features and properties are stored as weights assigned to each vertex, and the purpose of the geometric learning is to find a set of descriptors of lower dimensions than the weighted graph such that they can be directly incorporated into the energy functional calculations. For each vertex $v_i$ in the graph, we define a feature vector $\mathbf{f}_i = \{f^i_1, \ldots, f^i_D\}$, where $D$ is the number of node weights that represent the geometrical features (e.g. size, number of faces, aspect ratios) and mechanical properties (e.g. elastic moduli, crystal orientation) of the grain $v_i$. In this work, the feature vectors store information about the volume, the crystal orientation (in Euler angles), the total surface area, the number of faces, the number of neighbors, as well as other shape descriptors (e.g. the equivalent diameter) for every crystal grain in the polycrystals. The set of node weights for the entire graph reads
$$\mathbb{F} = \{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_N\}. \tag{2}$$
Fig. 2: An illustrative example of a polycrystal (a) represented as an undirected weighted graph (b). If two crystals in the formation share a face, their nodes are also connected in the graph. Each node is weighted by two features $f_A$ and $f_B$ in this illustrative example.
Remark 1 Note that including edge weights in the learning problem is likely to provide a richer source of data. Information such as the attributes of each contact between a grain pair, including the contact surface area and the angle of contact, could be used as weights for the edges of the graph. While the current work is solely focused on node-weighted graphs, future work will examine an efficient way to generate an energy functional from a graph with both node and edge weights.
2.2 Adjacency matrix, graph Laplacian and feature matrix
In order to prepare the available data for the geometric learning, it is often more convenient to adopt a matrix representation of a graph. In this work, the geometric learning algorithm used requires the normalized symmetric Laplacian matrix $\mathbf{L}^{\text{sym}}$ and the node feature matrix $\mathbf{X}$ as inputs (see Appendix A).
First of all, we define the node feature matrix $\mathbf{X}$ by simply stacking the feature vectors $\mathbf{f}_i$ together. As such, the dimension of the node feature matrix $\mathbf{X}$ is $N \times D$. The symmetric normalized Laplacian is obtained from the Laplacian matrix $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{A}$ and $\mathbf{D}$ are the adjacency and degree matrices (cf. Def. 9 and Def. 10). The adjacency matrix $\mathbf{A}$ is a symmetric matrix of dimensions $N \times N$ representing the connectivity of the microstructure. The entries $\alpha_{ij}$ of the matrix $\mathbf{A}$ are defined as,
$$\alpha_{ij} = \begin{cases} 1, & v_i \text{ is adjacent to } v_j \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$
The degree $d_i$ of a vertex $v_i$ is defined as the total number of adjacent (neighboring) vertices to $v_i$ or, equivalently, the number of crystal grains in contact with the crystal grain $v_i$. The matrix $\mathbf{D}$ is an $N \times N$ diagonal matrix. The entries $r_{ij}$ of the diagonal matrix $\mathbf{D}$ are defined as,
$$r_{ij} = \begin{cases} d_i, & i = j \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$
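The symmetric normalized Laplacian matrix $\mathbf{L}^{\text{sym}} = \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$ represents the graph connectivity structure and is one of the inputs for the geometric learning algorithm described in Section 3.2. The matrix $\mathbf{L}^{\text{sym}}$ is of $N \times N$ dimensions. The entries $l^{\text{sym}}_{ij}$ of the matrix $\mathbf{L}^{\text{sym}}$ read,
$$l^{\text{sym}}_{ij} = \begin{cases} 1, & i = j \text{ and } d_i \neq 0 \\ -(d_i d_j)^{-1/2}, & i \neq j \text{ and } v_i \text{ is adjacent to } v_j \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$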
Note that while the Laplacian matrix $\mathbf{L}$ and the symmetric normalized Laplacian matrix $\mathbf{L}^{\text{sym}}$ both represent the connectivity of the grains in the polycrystals, the normalized $\mathbf{L}^{\text{sym}}$ is often a more popular choice for spectral-based graph convolutions due to its symmetric and positive semi-definite properties.
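The matrices of this section can be assembled in a few lines. The sketch below is an illustrative NumPy implementation under stated assumptions: the function name, the zero-based node labels, the sample edge list, and the two made-up node features per grain are hypothetical and do not come from the paper. It builds the adjacency matrix, the degree matrix, the Laplacian, and the symmetric normalized Laplacian, and stacks per-node feature vectors into the $N \times D$ feature matrix.

```python
import numpy as np


def graph_matrices(num_nodes, edges):
    """Return (A, D, L, L_sym) for an undirected graph given as (i, j) pairs."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = 1.0  # entries are 1 where v_i is adjacent to v_j
        A[j, i] = 1.0  # undirected graph, so A is symmetric
    d = A.sum(axis=1)  # degree d_i = number of grains in contact with grain i
    D = np.diag(d)
    L = D - A  # graph Laplacian

    # D^{-1/2}, leaving zeros for isolated nodes (d_i = 0)
    d_inv_sqrt = np.zeros_like(d)
    mask = d > 0
    d_inv_sqrt[mask] = d[mask] ** -0.5
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return A, D, L, L_sym


# Hypothetical 4-grain polycrystal with 4 contacts
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
A, D, L, L_sym = graph_matrices(4, edges)

# Node feature matrix X: one row per grain, one column per node weight.
# The two columns (e.g. volume, number of faces) are made-up values.
feature_vectors = [[1.2, 12.0], [0.8, 10.0], [1.5, 14.0], [0.6, 9.0]]
X = np.vstack(feature_vectors)  # shape (N, D) = (4, 2)

eigenvalues = np.linalg.eigvalsh(L_sym)  # real, since L_sym is symmetric
```

For a graph without isolated nodes, the diagonal of `L_sym` equals one and the off-diagonal entry for an adjacent pair equals $-(d_i d_j)^{-1/2}$, matching the entry-wise definition above. The eigenvalues are non-negative, which reflects the positive semi-definite property.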
3 Deep learning on graphs
Machine learning often involves algorithms designed to generate functions to represent the available data. Some common applications in machine learning are those of regression and classification. A regression algorithm attempts to make predictions of a numerical value provided with input data. A classification algorithm attempts to assign a label to an input and place it in one or more classes/categories that it belongs to. Classification tasks can be supervised, if information for the true labels of the inputs is available during the learning process. Classification tasks can also be unsupervised, if the algorithm is not exposed to the true labels of the input during the learning process but attempts to infer labels for the input by learning properties of the input dataset structure. The hybrid geometric learning neural network introduced in this work simultaneously performs an unsupervised classification of polycrystal graph structures and the regression of an anisotropic elastic energy potential functional. This combination of unsupervised learning classification and supervised learning regression was first adopted for solid mechanics problems in Frankel et al. (2019), where a convolutional neural network is used to generate features of 3D voxel images to aid predictions of elasto-plastic responses under monotonically increasing strain. In that work, the only feature of the grain incorporated in the learning problem is the crystal orientation. In our design, a 3D voxel image is first converted into a lower dimensional weighted graph whose connectivity information is stored in the symmetric normalized graph Laplacian $\mathbf{L}^{\text{sym}}$, while we consider multiple effective properties of each grain stored in the matrix $\mathbf{X}$. Then, a geometric learning encoder is trained to provide an even lower dimensional latent representation of the weighted graph to aid the predictions of the hyperelastic energy functional.
In this section, we provide a brief description of the supervised and unsupervised components of the hybrid architecture. The supervised learning component conducted via regression with the multilayer perceptron (MLP) is reviewed in Section 3.1. The graph convolution technique that carries out the unsupervised classification of the polycrystals is described in Section 3.2. Finally, in Section 3.3, we introduce our hybrid architecture that combines these two architectures to perform their tasks simultaneously.
3.1 Deep learning for supervised regression
The architecture described in this section constitutes the energy functional regression branch of the hybrid architecture described in Section 3.3. The regression task is completed via training an artificial neural network (ANN) with multiple layers. While there are other feasible options, such as support vector regression machines (Drucker et al., 1997) or Gaussian process regression (Quiñonero-Candela and Rasmussen, 2005), we choose to train a multilayer perceptron (MLP), often called a feed-forward neural network, due to the ease of implementation via various existing libraries and the fact that it is a universal function approximator.
The formulation for the two-layer perceptron in Fig. 3, which will also be used in this work, is presented below as a series of matrix multiplications:
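Fig. 3: A two-layer perceptron. The input vector $\mathbf{x}$ has $d$ features, each of the two hidden layers $\mathbf{h}_l$ has $m$ neurons.
$$\mathbf{z}_1 = \mathbf{x} \mathbf{W}_1 + \mathbf{b}_1 \tag{6}$$
$$\mathbf{h}_1 = \sigma(\mathbf{z}_1) \tag{7}$$
$$\mathbf{z}_2 = \mathbf{h}_1 \mathbf{W}_2 + \mathbf{b}_2 \tag{8}$$
$$\mathbf{h}_2 = \sigma(\mathbf{z}_2) \tag{9}$$
$$\mathbf{z}_{\text{out}} = \mathbf{h}_2 \mathbf{W}_3 + \mathbf{b}_3 \tag{10}$$
$$\hat{\mathbf{y}} = \sigma_{\text{out}}(\mathbf{z}_{\text{out}}). \tag{11}$$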
In the above formulation, the input vector $\mathbf{x}$ contains the features of a sample, the weight matrix $\mathbf{W}_l$ contains the weights (the parameters of the network), and $\mathbf{b}_l$ is the bias vector for every layer. The function $\sigma$ is the chosen activation function for the hidden layers. In the current work, the ELU function is used as an activation function for the MLP hidden layers, defined as:
$$\text{ELU}(\alpha) = \begin{cases} e^{\alpha} - 1, & \alpha < 0 \\ \alpha, & \alpha \geq 0. \end{cases} \tag{12}$$
The vector $\mathbf{h}_l$ contains the activation function values for every neuron in the hidden layer. The vector $\hat{\mathbf{y}}$ is the output vector of the network with linear activation $\sigma_{\text{out}}(\bullet) = (\bullet)$.
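The forward pass of such a perceptron can be sketched in plain NumPy as a chain of matrix multiplications with ELU hidden activations and a linear output. This is a minimal illustration, not the Keras implementation used in the paper: the function names, the random initialization scheme, and the layer sizes in the usage lines are assumptions.

```python
import numpy as np


def elu(z):
    """ELU activation: exp(z) - 1 for z < 0, identity otherwise."""
    # np.minimum avoids overflow in exp for large positive entries
    return np.where(z < 0.0, np.exp(np.minimum(z, 0.0)) - 1.0, z)


def init_params(d, m, n_out, seed=0):
    """Random weights and zero biases for a perceptron with two hidden layers."""
    rng = np.random.default_rng(seed)
    shapes = [(d, m), (m, m), (m, n_out)]
    return [
        (rng.normal(scale=1.0 / np.sqrt(rows), size=(rows, cols)), np.zeros(cols))
        for rows, cols in shapes
    ]


def mlp_forward(x, params):
    """Two hidden ELU layers followed by a linear output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = elu(x @ W1 + b1)  # first hidden layer
    h2 = elu(h1 @ W2 + b2)  # second hidden layer
    return h2 @ W3 + b3  # linear output activation


# Usage: a batch of 5 samples with d = 6 input features, m = 200 neurons per layer
params = init_params(d=6, m=200, n_out=1)
x_batch = np.random.default_rng(1).normal(size=(5, 6))
y_hat = mlp_forward(x_batch, params)  # shape (5, 1)
```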
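Defining $\mathbf{y}$ as the true function values corresponding to the inputs $\mathbf{x}$, the MLP architecture can be simplified as an approximator function $\hat{\mathbf{y}} = \hat{\mathbf{y}}(\mathbf{x} | \mathbf{W}, \mathbf{b})$ of the true function $\mathbf{y}$ with inputs $\mathbf{x}$, parametrized by $\mathbf{W}$ and $\mathbf{b}$, such that:
$$\mathbf{W}', \mathbf{b}' = \underset{\mathbf{W}, \mathbf{b}}{\operatorname{argmin}} \; \ell(\hat{\mathbf{y}}(\mathbf{x} | \mathbf{W}, \mathbf{b}), \mathbf{y}), \tag{13}$$
where $\mathbf{W}'$ and $\mathbf{b}'$ are the optimal weights and biases of the neural network obtained from the optimization (training) process such that a defined loss function $\ell$ is minimized. The loss functions used in this work are discussed in Section 4.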
The fully-connected (Dense) layer that is used as the hidden layer for a standard MLP architecture has the following general formulation:
$$\mathbf{h}^{(l+1)}_{\text{dense}} = \sigma(\mathbf{h}^{(l)} \mathbf{W}^{(l)} + \mathbf{b}^{(l)}). \tag{14}$$
In the supervised learning branch, the neural network consists of two layers and each layer contains 200 neurons. The number of layers and the number of neurons per layer are hyperparameters. The optimal combination of hyperparameters can be estimated through repeated trial and error or sometimes through Bayesian optimization (Gardner et al., 2014). To examine whether overfitting occurs, we use k-fold cross-validation to split the training and testing data and measure the differences in performance when the trained neural network is used to make predictions within and outside the training data. A brief review of this issue can be found in Wang and Sun (2018).
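A k-fold split of the sample indices can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' data pipeline: the function name, the shuffling seed, and the sample count are hypothetical, and `k` is assumed to be at least 2. In each fold, the model would be trained on `train_idx` and evaluated on `test_idx`, and a large gap between the two errors would indicate overfitting.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # shuffle once so folds are random
    folds = np.array_split(perm, k)  # k nearly equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


# Usage: 10 hypothetical samples split into 5 folds
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    # Train on train_idx and evaluate on test_idx here; only sizes are shown
    fold_sizes = (len(train_idx), len(test_idx))
```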
3.2 Graph convolution network for unsupervised classification of polycrystals
Geometric learning refers to the extension of previously established neural network techniques to graph structures and manifold-structured data. Graph Neural Networks (GNN) refer to a specific type of neural network architecture that operates directly on graph structures. An extensive summary of different graph neural network architectures currently developed can be found in Wu et al. (2019). Graph convolution networks (GCN) (Defferrard, Bresson, and Vandergheynst, 2016; Kipf and Welling, 2017) are variations of graph neural networks that bear similarities with the highly popular convolutional neural network (CNN) algorithms, commonly used in image processing (Lecun, Bottou, Bengio, and Haffner, 1998; Krizhevsky, Sutskever, and Hinton, 2012). The shared term convolutional refers to the use of filter parameters that are shared over all locations in the graph, similar to image processing. Graph convolution networks are designed to learn a function of features or signals in graphs $\mathbb{G} = (\mathbb{V}, \mathbb{E})$, and they have demonstrated competitive scores at tasks of classification (Kipf and Welling, 2017; Simonovsky and Komodakis, 2017; Kearnes, McCloskey, Berndl, Pande, and Riley, 2016; Altae-Tran et al., 2017).
In this current work, we utilize a GCN layer implementation similar to that introduced in Kipf and Welling (2017). The implementation is based on the open-source neural network library Keras (Chollet et al., 2015) and the open-source library Spektral (Grattarola, 2019) on graph neural networks. The GCN layers are the ones that learn from the polycrystal connectivity graph information. A GCN layer accepts two inputs, a symmetric normalized graph Laplacian matrix $\mathbf{L}^{\text{sym}}$ and a node feature matrix $\mathbf{X}$, as described in Section 2.2. The matrix $\mathbf{L}^{\text{sym}}$ holds information about the graph structure. The matrix $\mathbf{X}$ holds information about the features of every node in the graph, i.e. every crystal in the polycrystal. In matrix form, the GCN layer has the following structure:
$$\mathbf{h}^{(l+1)}_{\text{GCN}} = \sigma(\mathbf{L}^{\text{sym}} \mathbf{h}^{(l)} \mathbf{W}^{(l)} + \mathbf{b}^{(l)}). \tag{15}$$
In the above formulation, $\mathbf{h}^{(l)}$ is the output of a layer $l$. For $l = 0$, the first GCN layer of the network accepts the graph features as input such that $\mathbf{h}^{(0)} = \mathbf{X}$. For $l \geq 1$, $\mathbf{h}^{(l)}$ represents a higher dimensional representation of the graph features that is produced from the convolution function, similar to a CNN layer. The function $\sigma$ is a non-linear activation function. In this work, the GCN layers use the Rectified Linear Unit activation function, defined as $\text{ReLU}(\bullet) = \max(0, \bullet)$. The weight matrix $\mathbf{W}^{(l)}$ and bias vector $\mathbf{b}^{(l)}$ are the parameters of the layer that will be optimized during training.
The matrix $\mathbf{L}^{\text{sym}}$ acts as an operator on the node feature matrix $\mathbf{X}$ so that, for every node, the sum of the features of every neighboring node and of the node itself is accounted for. The $i$-th row of the feature matrix $\mathbf{h}^{(l)}$ represents the weights for the $i$-th node/crystal in the graph. The output of the multiplication with the $i$-th row of the feature matrix, controlled by the $i$-th row of the $\mathbf{L}^{\text{sym}}$ matrix, corresponds to a weighted aggregation of the features of the $i$-th node and all its neighbors. In order to include the features of the node itself, the matrix $\mathbf{L}^{\text{sym}}$ is derived, as defined in Section 2.2 and demonstrated in Appendix B, from the binary adjacency matrix $\hat{\mathbf{A}}$ allowing self-loops and the equivalent degree matrix $\mathbf{D}$. Using the normalized Laplacian matrix $\mathbf{L}^{\text{sym}}$, instead of the adjacency matrix $\hat{\mathbf{A}}$, for feature filtering remedies possible numerical and vanishing/exploding gradient issues when using the GCN layer in deep neural networks.
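The aggregation performed by a single GCN layer can be sketched in NumPy as below. This is an illustrative stand-in, not the Spektral layer used in the paper. In particular, the construction of the graph operator from the self-loop adjacency matrix as $\hat{\mathbf{D}}^{-1/2} \hat{\mathbf{A}} \hat{\mathbf{D}}^{-1/2}$ follows the renormalization commonly associated with Kipf and Welling (2017) and is an assumption here, since the exact construction in Appendix B is not reproduced in this section. Function names and sizes are hypothetical.

```python
import numpy as np


def normalized_operator_with_self_loops(A):
    """Assumed graph operator: D_hat^{-1/2} (A + I) D_hat^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])  # binary adjacency with self-loops
    d_hat = A_hat.sum(axis=1)  # degrees are >= 1 thanks to the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(d_hat)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]


def gcn_layer(op, H, W, b):
    """One graph convolution: ReLU(op @ H @ W + b)."""
    # Row i of (op @ H) is a weighted sum of the features of node i and its neighbors
    return np.maximum(0.0, op @ H @ W + b)


# Hypothetical 4-grain graph with D = 3 node features and 8 output channels
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W0, b0 = rng.normal(size=(3, 8)), np.zeros(8)

op = normalized_operator_with_self_loops(A)
H1 = gcn_layer(op, X, W0, b0)  # shape (4, 8): one feature row per grain
```

A useful property of this layer is that relabeling the grains only permutes the rows of the output, so the learned filter does not depend on the arbitrary node ordering.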
This type of spatial filtering can be of great use in constitutive modeling of microstructures where both the statistics and the topology of attributes may significantly affect the macroscopic outcome. In the case of the polycrystals, for example, the neural network model does not learn solely from the features of every crystal separately. It also learns by aggregating the features of the neighboring crystals in the graph to potentially reveal a behavior that stems from the feature correlation among different nodes. This property makes this filtering function a strong candidate for learning on spatially heterogeneous material
Fig. 4: Hybrid neural network architecture. The network is comprised of two branches - a graph convolutional encoder and a multi-layer perceptron. The first branch accepts the graph structure (normalized Laplacian $L^{\mathrm{sym}}$) and graph weights (feature matrix $X$) (Input A) as inputs and outputs an encoded feature vector. The second branch accepts the concatenated encoded feature vector and the right Cauchy–Green deformation tensor $C$ in Voigt notation (Input B) as inputs and outputs the energy functional $\hat{\psi}$.
Remark 2 It is noted that the GCN method can use unweighted graphs as input - in that case the feature matrix is a vector of length $N$ with every component equal to unity, as suggested in Kipf and Welling (2017). However, because unweighted graphs carry significantly less information, we speculate that a neural network trained on unweighted graphs is likely to perform worse than one trained on the weighted graph counterparts. The effects of incorporating different combinations of node features on the performance of predictions are examined in the numerical experiments performed in Section 7.
3.3 Hybrid neural network architecture for simultaneous unsupervised classification and regression
The hybrid network architecture employed in this work is designed to perform two tasks simultaneously, guided by a common objective function. This hybrid design was first applied to mechanics problems by Frankel et al. (2019), who combine a spatial convolution network with a recurrent neural network to predict constitutive responses. In this work, we adopt the hybrid design where a graph convolutional neural network is combined with a feed-forward neural network to generate an elastic stored energy that leads to constitutive responses. While both approaches generate feature vectors as additional inputs for the mechanical predictions, the feature vectors generated from the data stored in voxels (i.e. the Euclidean data) in Frankel et al. (2019) and the feature vectors generated from the weighted graph (i.e. the non-Euclidean data) are fundamentally different. This is because the graph convolutional approach requires only grain-scale data, where all the features of a crystal grain are stored as weights at each node, whereas the spatial convolutional approach uses information that is stored at each voxel and is, hence, potentially much larger, especially for higher voxel grid resolutions.
The first task is the unsupervised classification of the connectivity graphs of the polycrystals. This is carried out by the first branch of the hybrid architecture, which resembles a convolutional encoder, commonly used in image classification (Lecun, Bottou, Bengio, and Haffner, 1998; Krizhevsky, Sutskever, and Hinton, 2012) and autoencoders (Ranzato et al., 2007; Vincent et al., 2008). However, the convolutional layers now follow the aforementioned GCN layer formulation. A convolutional encoder passes a complex structure (e.g. images, graphs) through a series of filters to generate a higher level representation and to encode, or compress, the information in a structure of lower dimensions (e.g. a vector). It is common practice, for example, in image classification (Krizhevsky et al., 2012), to pass an image through a series of
10 Nikolaos N. Vlassis et al.
stacked convolutional layers that increase the feature space dimensionality, and then encode the information in a vector through a multilayer perceptron - a series of stacked fully connected layers. The weights of every layer in the network are optimized using a loss function so that the output vector matches the classification labels of the input image.
A similar concept is employed for the geometric learning encoder branch of the hybrid architecture. This branch accepts as inputs the normalized graph Laplacian and the node feature matrices. The graph convolutional layers read the graph features and increase the dimensionality of the node features. These features are flattened and then fed into fully connected layers that encode the graph information in a feature vector.
The second task performed by the hybrid network is a supervised regression task - the prediction of the energy functional. The architecture of this branch of the network follows that of a simple feed-forward network with fully connected layers, similar to the one described in Section 3.1. The input of this branch is the encoded feature vector, arriving from the geometric learning encoder branch, concatenated with the second-order right Cauchy–Green deformation tensor $C$ in Voigt vector notation. The output of this branch is the predicted energy functional $\hat{\psi}$. It is noted that in this work an elastic energy functional is predicted, and behavior that is not history-dependent can be adequately mapped with feed-forward architectures. Applications of geometric learning to plastic behavior will be the object of future work and will require recurrent network architectures that can capture the material’s behavior history, similar to Wang and Sun
The layer weights of these two branches are updated in tandem with a common back-propagation algorithm and an objective function that rewards better energy functional and stress field predictions, using a Sobolev training procedure, described in Section 4.
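The data flow through the two branches can be sketched as a forward pass in NumPy. This is only a schematic under stated assumptions: the weights are random and untrained, biases and Dropout are omitted, the activation of the encoded feature vector is assumed linear, and the function names are hypothetical. The layer widths (64 GCN filters, 100 and 200 Dense neurons, 9-component encoded vector) follow the values reported in Section 7.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0.0, z)

def elu(z):
    # exp is only evaluated on the non-positive part to avoid overflow
    return np.where(z > 0.0, z, np.exp(np.minimum(z, 0.0)) - 1.0)

def identity(z):
    return z

def dense(x, n_out, activation):
    """Fully connected layer with freshly sampled (untrained) weights and no bias."""
    w = rng.normal(scale=0.1, size=(x.shape[-1], n_out))
    return activation(x @ w)

def hybrid_forward(l_sym, X, c_voigt):
    """Map (graph operator, node features, C in Voigt form) to a scalar energy."""
    # Branch 1: graph encoder (two GCN layers, flatten, Dense layers, encoded vector).
    h = relu(l_sym @ dense(X, 64, identity))
    h = relu(l_sym @ dense(h, 64, identity))
    z = dense(h.reshape(-1), 100, relu)
    z = dense(z, 100, relu)
    code = dense(z, 9, identity)
    # Branch 2: MLP acting on the encoded vector concatenated with C.
    y = np.concatenate([code, c_voigt])
    y = dense(y, 200, elu)
    y = dense(y, 200, elu)
    return dense(y, 1, identity)[0]

# Toy inputs: 5 grains with 4 features each, identity graph operator, undeformed C = I.
L_toy = np.eye(5)
X_toy = rng.random((5, 4))
C_voigt = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
psi_hat = hybrid_forward(L_toy, X_toy, C_voigt)
```

In an actual training setup both branches would share one loss so that the gradients reach the graph encoder through the encoded feature vector.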
Simultaneously, we implement regularization on the graph encoder branch of the hybrid architecture, in the form of Dropout layers (Srivastava et al., 2014). We have found that regularization techniques provide a competent method for combating the overfitting issues addressed later in this work. This work is a first attempt at utilizing geometric learning in material mechanics, and model refinement will be considered when approaching more complex problems in the future (e.g. history-dependent plasticity problems).
4 Sobolev training for hyperelastic energy functional predictions
In principle, forecast engines for elastic constitutive responses are trained on (1) an energy-conjugate pair of stress and strain measures (Ghaboussi, Garrett Jr, and Wu, 1991; Wang and Sun, 2018; Lefik, Boso, and Schrefler, 2009), (2) a power-conjugate pair of stress and strain rates (Liu et al., 2019), or (3) a pair of strain measure and Helmholtz stored energy (Lu et al., 2019; Huang et al., 2019). While options (1) and (2) can both be simple and easy to train once the proper configuration of the neural networks is determined, one critical drawback is that the resultant model may predict a non-convex energy response and exhibit ad-hoc path-dependence (Zytynski et al., 1978; Borja et al., 1997).
An alternative is to introduce supervised learning that takes a strain measure as input and outputs the stored energy functional. This formulation leads to the so-called hyperelastic or Green-elastic material, which postulates the existence of a Helmholtz free-energy function (Holzapfel et al., 2000). The concept of learning a free energy function as a means to describe multi-scale materials has been previously explored (Le, Yvonnet, and He, 2015; Teichert, Natarajan, Van der Ven, and Garikipati, 2019). However, without direct control of the gradient of the energy functional, the predicted stress and elastic tangential operator may not be sufficiently smooth unless the activation functions and the architecture of the neural network are carefully designed. To rectify the drawbacks of these existing options, we leverage the recent work on Sobolev training (Czarnecki et al., 2017), in which we incorporate both the stored elastic energy functional and its derivatives (i.e. the conjugate stress tensor) into the loss function, such that the objective of the training is to minimize not only the errors of the energy predictions but also the discrepancy of the stress response.
Traditional deep learning regression algorithms aim to train a neural network to approximate a function by minimizing the discrepancy between the predicted values and the benchmark data. However, the metric or norm used to measure discrepancy is often the $L_2$ norm, which does not regularize the derivative or gradients of the learned function. When combined with the types of activation functions that include a high-frequency basis, the learned function may exhibit spurious oscillations and, hence, be unsuitable as a hyperelastic energy function, which requires convexity.
The Sobolev training method that we adopt from Czarnecki et al. (2017) is designed to maximize the utilization of data by leveraging the available additional higher order data in the form of higher order constraints in the training objective function. In the Sobolev training, objective functions are constructed for minimizing the $H^K$ Sobolev norms of the corresponding Sobolev space. Recall that a Sobolev space refers to the space of functions equipped with a norm comprised of $L^p$ norms of the functions and their derivatives up to a certain order $K$.
Since it has been shown that neural networks with the ReLU activation function (as well as functions similar to that) can be universal approximators for $C^1$ functions in a Sobolev space (Sonoda and Murata, 2017), our goal here is to directly predict the elastic energy functional by using the Sobolev norm as loss function to train the hybrid neural network models.
This work focuses on the prediction of the elastic stored energy functional listed in Eq. 1; thus, for simplicity, the superscript $e$ (denoting elastic behavior) will be omitted for all energy, strain, stress, and stiffness scalar and tensor values herein. In the case of the simple MLP feed-forward network, the network can be seen as an approximator function $\hat{\psi}(C \mid W, b)$ of the true energy functional $\psi$ with input the right Cauchy–Green deformation tensor $C$, parametrized by weights $W$ and biases $b$. In the case of the hybrid neural network architecture, the network can be seen as an approximator function $\hat{\psi}(C, G \mid W, b)$ of the true energy functional $\psi$ with input the polycrystal connectivity graph information (as described in Fig. 4) and the tensor $C$, parametrized by weights $W$ and biases $b$. The first training objective in Equation 16 for the training samples $i \in [1, ..., N]$ is modeled after an $L_2$ norm, constraining only $\psi$:
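$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \psi_i - \hat{\psi}_i \right\|_2^2 \right). \tag{16}$$
The second training objective in Equation 17 for the training samples $i \in [1, ..., N]$ is modeled after an $H^1$ norm, constraining both $\psi$ and its first derivative with respect to $C$ - i.e. one half of the 2nd Piola–Kirchhoff stress tensor $S$:
$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \left[ \left\| \psi_i - \hat{\psi}_i \right\|_2^2 + \left\| \frac{\partial \psi_i}{\partial C_i} - \frac{\partial \hat{\psi}_i}{\partial C_i} \right\|_2^2 \right] \right), \tag{17}$$
where in the above:
$$S = 2 \frac{\partial \psi}{\partial C}. \tag{18}$$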
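The difference between the two training objectives (Eq. 16 and Eq. 17) can be made concrete with a small NumPy sketch. It assumes that energy values and energy gradients with respect to $C$ are already available as arrays; the function names and the toy numbers are hypothetical, and in practice the predicted gradient would come from automatic differentiation of the network.

```python
import numpy as np

def l2_loss(psi_true, psi_pred):
    """L2-type objective: mean squared error of the energy only."""
    return float(np.mean((psi_true - psi_pred) ** 2))

def h1_loss(psi_true, psi_pred, dpsi_true, dpsi_pred):
    """H1-type objective: energy error plus squared error of d(psi)/dC per sample."""
    energy_term = (psi_true - psi_pred) ** 2
    # Sum the squared gradient error over all tensor components of each sample.
    component_axes = tuple(range(1, dpsi_true.ndim))
    grad_term = np.sum((dpsi_true - dpsi_pred) ** 2, axis=component_axes)
    return float(np.mean(energy_term + grad_term))

# Toy data: 4 samples, gradients stored as 3x3 tensors (illustrative values only).
psi_true = np.array([1.0, 2.0, 3.0, 4.0])
dpsi_true = np.ones((4, 3, 3))

# A prediction that matches every energy value but has a 10% error in the gradient.
psi_pred = psi_true.copy()
dpsi_pred = 0.9 * dpsi_true

loss_l2 = l2_loss(psi_true, psi_pred)
loss_h1 = h1_loss(psi_true, psi_pred, dpsi_true, dpsi_pred)
```

Here the L2-type loss cannot see the gradient error at all, while the H1-type loss penalizes it, which is the motivation for the Sobolev objective.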
It is noted that higher order objective functions can be constructed as well, such as an $H^2$ norm constraining the predicted $\hat{\psi}$, stress, and stiffness values. This would be expected to yield even more accurate $\hat{\psi}$ results, smoother stress predictions, and more accurate stiffness predictions. However, since a neural network with the ReLU and its adjacent activation functions is a combination of piecewise linear functions, whose second-order derivative is zero almost everywhere, it becomes innately difficult to control the second-order derivative during training. Thus, in this work we mainly focus on the first-order Sobolev method. In case it is desired to control the behavior of the stiffness tensor, a first-order Sobolev training scheme can be designed with strain as input and stress as output. The gradient of this approximated relationship would be the stiffness tensor. This experiment would also be meaningful and useful in finite element simulations.
It is noted that, in this work, the Sobolev training is implemented using the available stress information as the higher order constraint, assuring that the predicted stress tensors are accurate component-wise. In simpler terms, the $H^1$ norm constrains every single component of the second-order stress tensor. It is expected that this could be handled more efficiently and elegantly by constraining the spectral decomposition of the stress tensor - the principal values and directions. It has been shown in Heider et al. (2020) that using loss functions structured to constrain tensorial values in such a manner can be beneficial in mechanics-oriented problems; this will be investigated in future work.
Fig. 5: Schematic of the training procedure of a hyperelastic material surrogate model with the right Cauchy–Green deformation tensor $C$ as input and the energy functional $\hat{\psi}$ as output. A Sobolev trained surrogate model will output smooth $\hat{\psi}$ predictions and the gradient of the model with respect to $C$ will be a valid stress tensor.
5 Verification exercises for checking compatibility with physical constraints
While data-driven techniques, such as the neural network architectures discussed in this work, have provided unprecedented efficiency in generating constitutive laws, the consistency of these laws with well-known principles of mechanics can be rather dubious. Generating black-box constitutive models by blindly learning from the available data is considered to be one of the pitfalls of data-driven methods. If the necessary precautions are not taken, a data-driven model, while appearing to be highly accurate in replicating the behaviors discerned from the available database, may lack the utility of a mechanically consistent law and, thus, be inappropriate for describing physical phenomena. In this work, we leverage the mechanical knowledge on fundamental properties of hyperelastic constitutive laws to check and - if necessary - enforce the consistency of the approximated material models with said properties. In particular, the generated neural network energy functional models are tested for their objectivity, isotropy (or lack thereof), and convexity properties. A brief discussion of these desired properties is presented in this section.
5.1 Objectivity
Objectivity requires that the energy and stress response of a deformed elastic body remains unchanged when rigid body motion takes place. The trained models are expected to meet the objectivity condition - i.e. the material response should not depend on the choice of the reference frame. While translation invariance is automatically ensured by describing the material response as a function of the deformation, invariance for rigid body rotations is not necessarily imposed and must be checked. For a given microstructure represented by a graph $G$, the definition of objectivity for an elastic energy functional $\psi$ formulation is described as follows (Borja, 2013; Kirchdoerfer and Ortiz, 2016):
$$\psi(QF, G) = \psi(F, G) \quad \text{for all } F \in GL^{+}(3, \mathbb{R}),\ Q \in SO(3), \tag{19}$$
where $Q$ is a rotation tensor. The above definition can be proven to expand for the equivalent stress and stiffness measures:
$$P(QF, G) = Q P(F, G) \quad \text{for all } F \in GL^{+}(3, \mathbb{R}),\ Q \in SO(3), \tag{20}$$
$$c(QF, G) = Q \otimes Q \, c(F, G) \quad \text{for all } F \in GL^{+}(3, \mathbb{R}),\ Q \in SO(3). \tag{21}$$
Thus, a constitutive law is frame-indifferent if the responses for the energy, the stress, and the stiffness predictions are left rotationally invariant. This is automatically satisfied when the response is modeled as an equivalent function of the right Cauchy–Green deformation tensor $C$, since:
$$C^{+} = (F^{+})^{T} F^{+} = F^{T} Q^{T} Q F = F^{T} F \equiv C, \quad \text{for all } F^{+} = QF. \tag{22}$$
By training all the models in this work as a function of the right Cauchy–Green deformation tensor $C$, this condition is automatically satisfied and, thus, it will not be further checked.
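The invariance of $C$ under a superposed rotation (Eq. 22) is easy to confirm numerically. The following sketch samples a random rotation from the QR factorization of a Gaussian matrix, which is one common way to do so and an assumption of this example; the helper name and the sample deformation gradient are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Sample a proper rotation Q in SO(3) via the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))     # fix the column signs of the factorization
    if np.linalg.det(q) < 0.0:      # flip one axis so that det(Q) = +1
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
F = np.eye(3) + 0.1 * rng.random((3, 3))   # deformation gradient close to the identity
Q = random_rotation(rng)

C = F.T @ F                    # right Cauchy-Green tensor of F
C_rotated = (Q @ F).T @ (Q @ F)   # right Cauchy-Green tensor of QF
objective = np.allclose(C, C_rotated)
```

Any energy that takes only $C$ as its mechanical input therefore returns the same value for $F$ and $QF$.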
5.2 Isotropy
For a given microstructure represented by a graph $G$, the material response described by a constitutive law is expected to be isotropic if the following is valid:
$$\psi(FQ, G) = \psi(F, G) \quad \text{for all } F \in GL^{+}(3, \mathbb{R}),\ Q \in SO(3). \tag{23}$$
This expands to the stress and stiffness response of the material:
$$P(FQ, G) = P(F, G) Q \quad \text{for all } F \in GL^{+}(3, \mathbb{R}),\ Q \in SO(3), \tag{24}$$
$$c(FQ, G) = c(F, G) \, Q \otimes Q \quad \text{for all } F \in GL^{+}(3, \mathbb{R}),\ Q \in SO(3). \tag{25}$$
Thus, for a material to be isotropic, its response must be right rotationally invariant. In the case that the response is anisotropic, as in the inherently anisotropic material studied in this work, the above should not be valid.
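A minimal numerical version of the isotropy check in Eq. 23 is sketched below. The two energies are toy examples chosen for illustration (a trace-based isotropic energy and the same energy with an added fiber-like term along a fixed direction); they are assumptions of this sketch and not the models trained in this work.

```python
import numpy as np

def psi_iso(C):
    """Toy isotropic energy: depends on C only through its trace."""
    return 0.5 * (np.trace(C) - 3.0)

def psi_aniso(C, a=np.array([1.0, 0.0, 0.0])):
    """Toy anisotropic energy: adds a term that depends on the stretch along direction a."""
    return psi_iso(C) + 0.5 * (a @ C @ a - 1.0) ** 2

def energy_of_F(psi, F):
    return psi(F.T @ F)

F = np.diag([1.2, 1.0, 0.9])
# Rotation by 90 degrees about the third axis, applied on the right (F -> FQ).
Q = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

iso_invariant = np.isclose(energy_of_F(psi_iso, F @ Q), energy_of_F(psi_iso, F))
aniso_invariant = np.isclose(energy_of_F(psi_aniso, F @ Q), energy_of_F(psi_aniso, F))
```

The isotropic energy passes the check, while the energy with a preferred direction does not, which is the behavior expected of an anisotropic material.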
5.3 Convexity
To ensure the thermodynamic consistency of the trained neural network models, the predicted energy functional must be convex. Testing the convexity of a black-box data-driven function without an explicitly stated equation is not necessarily a straightforward process. Certain algorithms have been developed to estimate the convexity of black-box functions (Tamura and Gallagher, 2019); however, this is outside the scope of this work and will be considered in the future. While convexity would be straightforward to check visually for a low-dimensional function, this is not necessarily true for a high-dimensional function described by the hybrid models.
A function $f: \mathbb{R}^n \to \mathbb{R}$ is convex over a compact domain $D$ if, for all $x, y \in D$ and all $\lambda \in [0, 1]$:
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y). \tag{26}$$
For a twice differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ over a compact domain $D$, the definition of convexity can be proven to be equivalent to the following statement:
$$f(y) \ge f(x) + \nabla f(x)^{T} (y - x), \quad \text{for all } x, y \in D. \tag{27}$$
The above can be interpreted as the first-order Taylor expansion at any point of the domain being a global under-estimator of the function $f$. In terms of the approximated black-box function $\hat{\psi}(C, G)$ used in the current work, the inequality 27 can be rewritten as:
$$\hat{\psi}(C_\alpha, G) \ge \hat{\psi}(C_\beta, G) + \frac{\partial \hat{\psi}}{\partial C}(C_\beta, G) : (C_\alpha - C_\beta), \quad \text{for all } C_\alpha, C_\beta \in D. \tag{28}$$
The above constitutes a necessary condition for the approximated energy functional for a specific polycrystal (represented by the connectivity graph $G$) to be convex, if it is valid for any pair of right Cauchy–Green deformation tensors $C_\alpha$ and $C_\beta$ in a compact domain $D$. This check is shown to be satisfied in Section 7.3.
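The pairwise check of the first-order condition in Eq. 28 can be sketched as a loop over sampled deformation tensors. The energy and its gradient are passed in as callables; the quadratic toy energy, the sampling of $C$, and the tolerance are assumptions of this example. Passing the check on a finite sample is only evidence of convexity on that sample, not a proof.

```python
import numpy as np

def first_order_convexity_check(psi, dpsi, samples, tol=1e-10):
    """Return True if psi(Ca) >= psi(Cb) + dpsi(Cb) : (Ca - Cb) for all sampled pairs."""
    for Ca in samples:
        for Cb in samples:
            # Double contraction of the gradient with the difference of the two tensors.
            lower_bound = psi(Cb) + np.tensordot(dpsi(Cb), Ca - Cb, axes=2)
            if psi(Ca) < lower_bound - tol:
                return False
    return True

# Toy convex energy and its analytical gradient with respect to C.
def psi_convex(C):
    return 0.5 * np.sum((C - np.eye(3)) ** 2)

def dpsi_convex(C):
    return C - np.eye(3)

rng = np.random.default_rng(0)
samples = []
for _ in range(10):
    F = np.eye(3) + 0.1 * rng.random((3, 3))
    samples.append(F.T @ F)

convex_ok = first_order_convexity_check(psi_convex, dpsi_convex, samples)
# Flipping the sign gives a concave function, which must fail the check.
concave_ok = first_order_convexity_check(lambda C: -psi_convex(C),
                                         lambda C: -dpsi_convex(C), samples)
```

For a trained network, `psi` and `dpsi` would be the predicted energy and its gradient obtained by automatic differentiation for a fixed graph $G$.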
Remark 3 The trained neural network models in this work will be shown in Section 7 to satisfy the checks and necessary conditions for being consistent with the expected objectivity, anisotropy, and convexity principles. However, in the case where one or more of these properties appears to be absent, it can be enforced during the optimization procedure by modifying the loss function. Additional weighted penalty terms could be added to the loss function to promote consistency with the required mechanical principles. For example, in the case of objectivity, the additional training objective, parallel to those expressed in Eq. 16 and 17, could be expressed as:
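$$\min_{W, b} \left( \frac{1}{N} \sum_{i=1}^{N} \lambda \left\| \hat{\psi}(Q F_i, G \mid W, b) - \hat{\psi}(F_i, G \mid W, b) \right\|_2^2 \right), \quad Q \in SO(3), \tag{29}$$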
where $\lambda$ is a weight variable, chosen in $[0, 1]$, setting the importance of this objective in the now multi-objective loss function, and $Q$ are randomly sampled rigid rotations from the $SO(3)$ group. Constraints of this kind were not deemed necessary in the current paper and will be investigated in future work.
6 FFT offline database generation
This section first introduces the fast Fourier transform (FFT) based method for the mesoscale homogenization problem, which was chosen to efficiently provide the database of graph structures and material responses to be used in geometric learning. Following that, the anisotropic Fung hyperelastic model is briefly summarized as the constitutive relation at the basis of the simulations. Finally, the numerical setup is introduced, focusing on the numerical discretization, grain structure generation, and initial orientation of the structures in question.
6.1 FFT based method with periodic boundary condition
This section deals with solving the mesoscale homogenization problem using an FFT-based method. Supposing that the mesoscale problem is defined in a 3D periodic domain, where the displacement field is periodic while the surface traction is anti-periodic, the homogenized deformation gradient $\bar{F}$ and first P-K stress $\bar{P}$ can be defined as:
$$\bar{F} = \langle F \rangle, \quad \bar{P} = \langle P \rangle, \tag{30}$$
where $\langle \cdot \rangle$ denotes the volume average operation.
Within a time step, when the average deformation gradient increment $\Delta \bar{F}$ is prescribed, the local stress $P$ within the periodic domain can be computed by solving the Lippmann–Schwinger equation:
$$F + \Gamma^{0} \ast \left( P(F) - C^{0} : F \right) = \bar{F}, \tag{31}$$
where $\ast$ denotes a convolution operation, $\Gamma^{0}$ is Green’s operator, and $C^{0}$ is the homogeneous stiffness of the reference material. The convolution operation can be conveniently performed in the Fourier domain, so the Lippmann–Schwinger equation is usually solved by the FFT based spectral method (Ma and Sun, 2019). Note that due to the periodicity of the trigonometric basis functions, the displacement field and the strain field are always periodic.
6.2 Anisotropic Fung elasticity
An anisotropic elasticity model at the mesoscale level is utilized to generate the homogenized response database for training the graph-based model at the macroscale. In this section, a generalized Fung elasticity model is utilized as the mesoscale constitutive relation due to its frame-invariance and convenient implementation (Fung, 1965; Ateshian and Costa, 2009).
In the generalized Fung elasticity model, the strain energy density function $W$ is written as:
$$W = \frac{1}{2} c \left[ \exp(Q) - 1 \right], \quad Q = \frac{1}{2} E : a : E, \tag{32}$$
where $c$ is a scalar material constant, $E$ is the Green strain tensor, and $a$ is the fourth-order stiffness tensor. The material anisotropy is reflected in the stiffness tensor $a$, which is a function of the spatial orientation and the material symmetry type.
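The generalized Fung energy of Eq. 32 can be evaluated with a few lines of NumPy. In this sketch the fourth-order tensor is built as an isotropic placeholder with hypothetical Lamé-type parameters `lam` and `mu`; these are not the orthotropic constants used for the database, and the function name is an assumption of the example.

```python
import numpy as np

def fung_energy(F, a, c):
    """W = c/2 (exp(Q) - 1) with Q = 1/2 E : a : E and E the Green strain of F."""
    E = 0.5 * (F.T @ F - np.eye(3))
    Q = 0.5 * np.einsum("ij,ijkl,kl->", E, a, E)
    return 0.5 * c * (np.exp(Q) - 1.0)

# Placeholder isotropic fourth-order tensor (illustrative values only).
lam, mu = 1.0, 1.0
I = np.eye(3)
a_iso = (lam * np.einsum("ij,kl->ijkl", I, I)
         + mu * (np.einsum("ik,jl->ijkl", I, I) + np.einsum("il,jk->ijkl", I, I)))

W_rest = fung_energy(np.eye(3), a_iso, c=2.0)                       # undeformed state
W_stretch = fung_energy(np.diag([1.05, 1.0, 1.0]), a_iso, c=2.0)    # 5% uniaxial stretch
```

The energy vanishes in the undeformed state and grows exponentially with the quadratic form $Q$, which is the characteristic stiffening behavior of Fung-type models.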
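For a material with orthotropic symmetry, the strain energy density can be written in a simpler form as:
$$Q = c^{-1} \sum_{a=1}^{3} \left[ 2 \mu_a A^{0}_{a} : E^{2} + \sum_{b=1}^{3} \lambda_{ab} \left( A^{0}_{a} : E \right) \left( A^{0}_{b} : E \right) \right], \quad A^{0}_{a} = a^{0}_{a} \otimes a^{0}_{a}, \tag{33}$$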
where $\mu_a$ and $\lambda_{ab}$ are anisotropic Lamé constants, and $a^{0}_{a}$ is the unit vector of the orthotropic plane normal, which represents the orientation of the material point in the reference configuration. Note that $\lambda_{ab}$ is a symmetric second-order tensor, and the material symmetry type becomes cubic symmetry when certain values of $\lambda$ and $\mu$ are adopted.
The elastic constants take the value:
$$c = 2\ \text{(MPa)}, \quad \lambda = \begin{bmatrix} 0.6 & 0.7 & 0.6 \\ 0.7 & 1.4 & 0.7 \\ 0.6 & 0.7 & 0.5 \end{bmatrix} \text{(MPa)}, \quad \mu = \ \text{(MPa)}, \tag{34}$$
and remain constant across all the mesoscale simulations. The only changing variables are the grain structure and the initial orientation of the representative volume element (RVE), which are introduced in the following section.
6.3 Numerical aspects of the database generation
The grain structures and initial orientations of the mesoscale simulations are randomly generated in the parameter space to generate the database. The mesoscale RVE is equally divided into 49 × 49 × 49 grid points to maintain a high enough resolution at an acceptable computational cost. The grain structures are generated by the open-source software NEPER (Quey et al., 2011). An equiaxed grain structure is randomly generated with 40 to 50 grains. A sample RVE is shown in Figure 6.
The initial orientations are generated using the open-source software MTEX (Bachmann et al., 2010). The orientation distribution function (ODF) is randomly generated by combining uniform orientation and unimodal orientation:
$$f(x; g) = w + (1 - w) \psi(x, g), \quad x \in SO(3), \tag{35}$$
where $w \in [0, 1]$ is a random weight value, $g \in SO(3)$ is a random modal orientation, and $\psi(x, g)$ is the von Mises–Fisher distribution function considering cubic symmetry. The half width of the unimodal texture $\psi(x, g)$ is 10°, and the preferential orientation $g$ of the unimodal texture is also randomly generated. A sample initial ODF is shown in Figure 6(b).
The average strain is randomly generated from an acceptable strain space, and simulations are performed for each RVE with 200 average strains. Note that the constitutive relation is hyperelastic, so the simulation result is path independent. To avoid any numerical convergence issues, the range of each strain component of $(F - I)$ is between 0.0 and 0.1 in the RVE coordinate.
7 Numerical Experiments
One major advantage of the hybrid (Frankel et al., 2019) or graph-based training (Wang and Sun, 2018) is that the resultant neural network is suitable as a surrogate model not only for one RVE but for a family of RVEs with different microstructures. In this section, we present the results of 13 sets of numerical experiments grouped in four subsections to examine and demonstrate the performance of the neural network models we trained. In Section 7.1, we conduct a numerical experiment to examine the neural network trained by the Sobolev method and compare its predictions with those obtained from the classical loss function that employs the $L_2$ norm. In Section 7.2, we include 4 sets of numerical experiments (a k-fold validation of a Sobolev trained MLP on data for a single polycrystal, a number of input features test for the hybrid architecture, an overfitting check, and a comparison of the k-fold validation results from both the hybrid and the MLP models). In Section 7.3, we include 5 additional verification tests (homogenization experiments, blind predictions for microstructures in the training set, blind predictions for microstructures in the testing set, an isomorphism check, and a model convexity check) to further examine whether the predictions violate any necessary conditions that are not explicitly enforced in the objective functions but are crucial for forward predictions. In Section 7.4, we introduce 3 sets of tests (dynamic simulations of a single polycrystal, mesh refinement simulations, comparison of crack patterns for different polycrystal inputs) to demonstrate the potential applications of the hyperelastic energy functional predictions for brittle fracture problems using the hybrid architecture. Each of these experiments includes multiple calculations that involve both predictions within the calibration range and forward predictions for unseen data previously unused in the training process. The abbreviations we have used for all the model architectures we implemented and tested are summarized in Table 1.
(a) Sample polycrystal microstructure. (b) Sample initial orientation.
Fig. 6: Sample of the randomly generated initial microstructure: (a) Initial RVE with 50 equiaxed grains, which is equally discretized by 49 × 49 × 49 grid points; (b) Pole figure plot of initial orientation distribution function (ODF) combining uniform and unimodal ODF. The Euler angles of the unimodal direction are (304°, 61°, 211°) in Bunge notation, and the half width of the unimodal ODF is 10°. The weight value is 0.50 for uniform ODF and 0.50 for unimodal ODF.
Table 1: Summary of the considered model and training algorithm combinations.
Model | Description
$M^{L_2}_{mlp}$ | Multilayer perceptron feed-forward architecture. Loss function used is the $L_2$ norm (Eq. 16).
$M^{H^1}_{mlp}$ | Multilayer perceptron feed-forward architecture. Loss function used is the $H^1$ norm (Eq. 17).
$M^{H^1}_{hybrid}$ | Hybrid architecture described in Fig. 4. Loss function used is the $H^1$ norm (Eq. 17).
$M^{H^1}_{reg}$ | Hybrid architecture described in Fig. 4. Loss function used is the $H^1$ norm (Eq. 17). The geometric learning branch of the network is regularized against overfitting.
To compare the performance of the training and testing results, the scaled MSE performances of different models are represented using non-parametric, empirical cumulative distribution functions (eCDFs), as in (Kendall et al., 1946; Gentle, 2009). The results are plotted as scaled MSE vs eCDF curves on a semilogarithmic scale for the training and testing partitions of the dataset. The distance between these curves can be a qualitative metric for the performance of the various models on various datasets - e.g. the distance between the eCDF curves of a model for the training and testing datasets is a qualitative metric of the overfitting phenomenon. For a dataset of $M$ samples with $\mathrm{MSE}_i$ sorted in ascending order, the eCDF can be computed as follows:
$$F_M(\mathrm{MSE}) = \begin{cases} 0, & \mathrm{MSE} < \mathrm{MSE}_1, \\ \dfrac{r}{M}, & \mathrm{MSE}_r \le \mathrm{MSE} < \mathrm{MSE}_{r+1},\ r = 1, ..., M - 1, \\ 1, & \mathrm{MSE}_M \le \mathrm{MSE}. \end{cases} \tag{36}$$
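The eCDF of a set of per-sample errors can be computed directly from the sorted values. The sketch below is a generic NumPy implementation of the standard empirical distribution function; the function names and the four sample error values are hypothetical.

```python
import numpy as np

def ecdf(mse_values):
    """Return the sorted errors and the eCDF value r/M attained at each of them."""
    x = np.sort(np.asarray(mse_values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def ecdf_at(mse_values, query):
    """Evaluate the eCDF at a query value: fraction of samples with MSE <= query."""
    x = np.sort(np.asarray(mse_values, dtype=float))
    return np.searchsorted(x, query, side="right") / x.size

errors = [3e-4, 1e-5, 2e-3, 5e-4]      # illustrative scaled MSE values
x_sorted, y_steps = ecdf(errors)       # points for a step plot on a log x-axis
half = ecdf_at(errors, 4e-4)           # two of the four samples are below 4e-4
```

Plotting `x_sorted` against `y_steps` on a semilogarithmic axis for the training and testing partitions gives the curve pairs whose separation indicates overfitting.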
To compare all the models on equal terms, the neural network training hyperparameters throughout all the experiments were kept identical wherever possible. All the strain and node weight inputs as well as the energy and stress outputs were scaled to the range $[0, 1]$ during the neural network training. The learning capacity of the models (i.e. layer depth and layer dimensions) for the multilayer perceptrons $M^{L_2}_{mlp}$ and $M^{H^1}_{mlp}$, as well as the multilayer perceptron branch of the $M^{H^1}_{hybrid}$ and $M^{H^1}_{reg}$, were kept identical. The multilayer perceptron branch in all the networks consists of two Dense layers (200 neurons each) with ELU activation functions. The geometric learning branch consists of two GCN layers (64 filters each with ReLU activation functions) followed by two Dense layers (100 neurons each with ReLU activation functions). The encoded feature vector layer was chosen to have 9 neurons. For the $M^{H^1}_{reg}$ model, Dropout layers (dropout rate of 0.2) are defined between every GCN and Dense layer in the geometric learning branch. The optimizer used for the training of the neural networks was Nadam, and all the networks were trained for up to 1000 epochs, utilizing an early stopping algorithm to terminate training when the performance stopped improving. The hyperparameter space of the neural network architecture was deemed too large for a comprehensive parameter search study, and the network was tuned through consecutive trial and error iterations. An illustrative example of this trial and error process to tune the number of neurons of the encoded feature vector is demonstrated in Appendix D. In this work, the values used for the hyperparameters were deemed adequate to provide as accurate results as possible for all methods while maintaining fair comparison terms. The optimization of these hyperparameters to achieve the maximum possible accuracy will be the objective of future work.
Remark 4 Since the energy functional $\psi$ and the stress values are on different scales of magnitude, the prediction errors are demonstrated using a common scaled metric. For all the numerical experiments in this work, to demonstrate the discrepancy between the predicted values of energy ($\psi_{\text{pred}}$) and the equivalent true energy values ($\psi_{\text{true}}$) as well as between the predicted principal values of the 2nd Piola–Kirchhoff stress tensor ($S_{A,\text{pred}}$) and the equivalent true principal values ($S_{A,\text{true}}$) for $A = 1, 2, 3$, the following scaled mean squared error (scaled MSE) metrics are defined respectively for a sample of size $M$:
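$$\text{scaled MSE}_{\psi} = \frac{1}{M} \sum_{i=1}^{M} \left[ (\bar{\psi}_{\text{true}})_i - (\bar{\psi}_{\text{pred}})_i \right]^2 \quad \text{with} \quad \bar{\psi} := \frac{\psi - \psi_{\min}}{\psi_{\max} - \psi_{\min}}. \tag{37}$$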
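$$\text{scaled MSE}_{S_A} = \frac{1}{3M} \sum_{i=1}^{M} \sum_{A=1}^{3} \left[ (\bar{S}_{A,\text{true}})_i - (\bar{S}_{A,\text{pred}})_i \right]^2 \quad \text{with} \quad \bar{S}_A := \frac{S_A - S_{A,\min}}{S_{A,\max} - S_{A,\min}}. \tag{38}$$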
The functions mentioned above scale the values $\psi_{\text{pred}}$, $\psi_{\text{true}}$, $S_{A,\text{pred}}$, and $S_{A,\text{true}}$ to be in the feature range $[0, 1]$. It is noted that the scaling functions $\bar{\psi}$ and $\bar{S}_A$ are defined on the training data set - i.e. the values $\psi_{\min}$, $\psi_{\max}$, $S_{A,\min}$, and $S_{A,\max}$ are derived from the true values of the training data.
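The scaled error metric for the energy can be sketched as follows. The key point carried over from the remark is that the minimum and maximum are fitted on the true values of the training set and then reused for any other partition; the function names and numbers are hypothetical.

```python
import numpy as np

def fit_min_max(train_values):
    """Extract the scaling bounds from the true values of the training set."""
    return float(np.min(train_values)), float(np.max(train_values))

def scaled_mse(true, pred, v_min, v_max):
    """Mean squared error after min-max scaling both arrays with the same bounds."""
    def scale(v):
        return (np.asarray(v, dtype=float) - v_min) / (v_max - v_min)
    return float(np.mean((scale(true) - scale(pred)) ** 2))

psi_train = np.array([0.0, 2.5, 10.0])       # illustrative training energies
psi_min, psi_max = fit_min_max(psi_train)

psi_true = np.array([2.0, 4.0])              # illustrative test energies
psi_pred = np.array([3.0, 4.0])
error = scaled_mse(psi_true, psi_pred, psi_min, psi_max)
```

Because the bounds come from the training data, scaled test values may fall slightly outside $[0, 1]$; the metric remains well defined in that case.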
Remark 5 When comparing the performance of models in predicting the directions of a second-order stress tensor, we utilize a distance function between two rotation tensors $R_1$, $R_2$ belonging to the Special Orthogonal Group, $SO(3)$. The rotation tensors are constructed by concatenating the orthogonal, normalized eigenvectors of the stress tensors. The Euclidean distance measure $\phi_{Eu}$, discussed in detail in (Huynh, 2009; Heider, Wang, and Sun, 2020), can be expressed as:
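$$\phi_{\text{Eu}} = \sqrt{d(\bar{\phi}_1, \bar{\phi}_2)^2 + d(\bar{\theta}_1, \bar{\theta}_2)^2 + d(\bar{\psi}_1, \bar{\psi}_2)^2}. \tag{39}$$
In the above, $\{\bar{\phi}_i, \bar{\theta}_i, \bar{\psi}_i\} \in E \subset \mathbb{R}^3$ are the set of Euler angles associated with $R_i$, and the Euclidean distance $d$ between two scalar-valued quantities $\alpha_1$, $\alpha_2$ is expressed as $d(\alpha_1, \alpha_2) = \min\{|\alpha_1 - \alpha_2|,\ 2\pi - |\alpha_1 - \alpha_2|\} \in [0, \pi]$. The distance measure $\phi_{\text{Eu}}$ belongs to the range $[0, \pi\sqrt{3}]$, and the results used in the figures in this work are presented normalized to the range $[0, 1]$. For this distance measure, the statement $\phi_{\text{Eu}}(R_1, R_2) = 0$ is equivalent to $R_1 = R_2$.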
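As a minimal sketch of the distance in Remark 5, the following Python functions evaluate the wrapped angular distance $d$ and the Euler-angle distance $\phi_{\text{Eu}}$. The function names are hypothetical, and the angles are assumed to be given in radians within $[-\pi, \pi]$ so that $|\alpha_1 - \alpha_2| \le 2\pi$.

```python
import math


def angle_distance(alpha_1, alpha_2):
    """d(a1, a2) = min(|a1 - a2|, 2*pi - |a1 - a2|), in [0, pi]."""
    diff = abs(alpha_1 - alpha_2)
    return min(diff, 2.0 * math.pi - diff)


def euler_distance(euler_1, euler_2):
    """Euclidean distance between two triplets of Euler angles."""
    return math.sqrt(
        sum(angle_distance(a, b) ** 2 for a, b in zip(euler_1, euler_2))
    )


def normalized_euler_distance(euler_1, euler_2):
    """Distance rescaled from [0, pi*sqrt(3)] to [0, 1]."""
    return euler_distance(euler_1, euler_2) / (math.pi * math.sqrt(3.0))


same = euler_distance((0.1, 0.2, 0.3), (0.1, 0.2, 0.3))
farthest = normalized_euler_distance((0.0, 0.0, 0.0), (math.pi, math.pi, math.pi))
```

The wrap-around term matters near the ends of the interval: two angles of 3.0 and -3.0 rad differ by about 0.28 rad, not 6.0 rad.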
7.1 Numerical Experiment 1: Generating an isotropic hyperelastic energy functional with Sobolev training
In this section, a numerical experiment is presented to demonstrate the benefits of training a neural network on hyperelastic energy functional data in the Sobolev training framework. The experiment was performed on synthetic data generated from a small-strain hyperelastic energy functional designed for the Modified Cam-Clay plasticity model (Roscoe and Burland, 1968; Houlsby, 1985; Borja et al., 2001). The hyperelastic stored energy functional is described in a strain invariant space (volumetric strain $\epsilon_v$, deviatoric strain $\epsilon_s$). The strain invariants are defined as:
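$$\epsilon_v = \operatorname{tr}(\boldsymbol{\epsilon}), \quad \epsilon_s = \sqrt{\frac{2}{3}}\,\|\boldsymbol{e}\|, \quad \boldsymbol{e} = \boldsymbol{\epsilon} - \frac{1}{3}\epsilon_v \boldsymbol{1}, \tag{40}$$
where $\boldsymbol{\epsilon}$ is the small strain tensor and $\boldsymbol{e}$ the deviatoric part of the small strain tensor. Using the chain rule, the Cauchy stress tensor can be described in the invariant space as follows:
$$\boldsymbol{\sigma} = \frac{\partial \psi}{\partial \epsilon_v}\frac{\partial \epsilon_v}{\partial \boldsymbol{\epsilon}} + \frac{\partial \psi}{\partial \epsilon_s}\frac{\partial \epsilon_s}{\partial \boldsymbol{\epsilon}}. \tag{41}$$
In the above, the mean pressure $p$ and deviatoric (von Mises) stress $q$ can be defined as:
$$p = \frac{1}{3}\operatorname{tr}(\boldsymbol{\sigma}), \quad q = \sqrt{\frac{3}{2}}\,\|\boldsymbol{s}\|, \tag{42}$$
where $\boldsymbol{s}$ is the deviatoric part of the Cauchy stress tensor. Thus, the Cauchy stress tensor can be expressed by the stress invariants as:
$$\boldsymbol{\sigma} = p\,\boldsymbol{1} + \sqrt{\frac{2}{3}}\,q\,\hat{\boldsymbol{n}}, \tag{43}$$
where
$$\hat{\boldsymbol{n}} = \boldsymbol{e}/\|\boldsymbol{e}\| = \sqrt{2/3}\;\boldsymbol{e}/\epsilon_s. \tag{44}$$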
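The hyperelastic energy functional allows full coupling between the elastic volumetric and deviatoric responses and is described as:
$$\psi(\epsilon_v, \epsilon_s) = -p_0\,\kappa \exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right) - \frac{3}{2}\,c_\mu\,p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right)(\epsilon_s)^2, \tag{45}$$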
where $\epsilon_{v0}$ is the initial volumetric strain, $p_0$ is the initial mean pressure when $\epsilon_v = \epsilon_{v0}$, $\kappa > 0$ is the elastic compressibility index, and $c_\mu > 0$ is a constant. The hyperelastic energy functional is designed to describe an elastic compression law where the equivalent elastic bulk modulus and the equivalent shear modulus vary linearly with $p$, while the mean pressure $p$ varies exponentially with the change of the volumetric strain $\Delta\epsilon_v = \epsilon_{v0} - \epsilon_v$. The specifics and the utility of this hyperelastic law are outside the scope of this work and will be omitted. The numerical parameters of this model were chosen as $\epsilon_{v0} = 0$, $p_0 = 100$ kPa, $c_\mu = 5.4$, and $\kappa = 0.018$. By taking the partial derivatives of the energy functional with respect to the strain invariants, the stress invariants are derived as:
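$$p = \frac{\partial \psi}{\partial \epsilon_v} = p_0\left(1 + \frac{3 c_\mu}{2\kappa}(\epsilon_s)^2\right)\exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right), \tag{46}$$
$$q = \frac{\partial \psi}{\partial \epsilon_s} = -3\,c_\mu\,p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right)\epsilon_s. \tag{47}$$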
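The relation between the energy functional and the stress invariants can be checked numerically. The sketch below, in plain Python, evaluates the energy and the closed-form invariants and compares the latter with central finite differences of the energy. It assumes the energy has the form $\psi = -p_0 \kappa \exp(\omega) - \tfrac{3}{2} c_\mu p_0 \exp(\omega)\,\epsilon_s^2$ with $\omega = (\epsilon_{v0} - \epsilon_v)/\kappa$. The function names and the sample strain state are hypothetical, and $p_0$ is entered with the magnitude quoted in the text; its sign convention is an assumption that does not affect the derivative check.

```python
import math

EPS_V0 = 0.0   # initial volumetric strain
P0 = 100.0     # kPa, magnitude as quoted; sign convention is an assumption
C_MU = 5.4
KAPPA = 0.018


def energy(eps_v, eps_s):
    """Stored energy psi(eps_v, eps_s) of the coupled hyperelastic law."""
    w = math.exp((EPS_V0 - eps_v) / KAPPA)
    return -P0 * KAPPA * w - 1.5 * C_MU * P0 * w * eps_s ** 2


def mean_pressure(eps_v, eps_s):
    """Closed-form p = d(psi)/d(eps_v)."""
    w = math.exp((EPS_V0 - eps_v) / KAPPA)
    return P0 * (1.0 + 1.5 * C_MU / KAPPA * eps_s ** 2) * w


def deviatoric_stress(eps_v, eps_s):
    """Closed-form q = d(psi)/d(eps_s)."""
    w = math.exp((EPS_V0 - eps_v) / KAPPA)
    return -3.0 * C_MU * P0 * w * eps_s


def central_difference(func, x, step=1e-6):
    """Second-order finite-difference approximation of d(func)/dx."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


ev, es = -0.01, 0.02  # hypothetical strain state
p_fd = central_difference(lambda x: energy(x, es), ev)
q_fd = central_difference(lambda x: energy(ev, x), es)
```

In Sobolev training the same derivatives are taken of the neural network output instead of the closed-form energy, which is why the stress targets constrain the slope of the learned functional.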
To assess the performance of the Sobolev training method, a feed-forward neural network with two hidden layers is trained on synthetic data generated for the above hyperelastic law. The training data set includes 225 data points, sampled as shown in Fig. 7, 25 of which are randomly selected to be used as a validation set during training. The testing is performed on 1000 data points. The inputs of the neural network are the two strain invariants $\epsilon_v$, $\epsilon_s$ and the output is the predicted energy $\psi$. The network has two hidden Dense layers (100 neurons each) with ELU activation functions and an output Dense layer with a linear activation function. The training experiment is performed with an $L_2$ norm loss function (constraining only the predicted $\psi$ values) and with an $H^1$ norm loss function (constraining the predicted $\psi$, $p$, and $q$ values).
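The difference between the two loss functions can be illustrated with a small Python sketch. It is not the training code used in this work: the function names are hypothetical, the predicted values are plain lists rather than network outputs, and the equal weighting of the three terms in the $H^1$-type loss is an assumption (the loss actually used is the one defined in Section 4).

```python
def mean_squared_error(true_values, pred_values):
    """Average of squared differences over a sample."""
    if len(true_values) != len(pred_values):
        raise ValueError("samples must have equal length")
    n = len(true_values)
    return sum((t - p) ** 2 for t, p in zip(true_values, pred_values)) / n


def l2_loss(psi_true, psi_pred):
    """L2-type loss: constrains only the predicted energy values."""
    return mean_squared_error(psi_true, psi_pred)


def h1_loss(psi_true, psi_pred, p_true, p_pred, q_true, q_pred):
    """H1-type loss: also constrains the derivatives p and q.

    Equal weights on the three terms are an assumption of this sketch.
    """
    return (
        mean_squared_error(psi_true, psi_pred)
        + mean_squared_error(p_true, p_pred)
        + mean_squared_error(q_true, q_pred)
    )


# Hypothetical sample: the energy is matched exactly, the derivatives are not.
psi_t, psi_p = [1.0, 2.0], [1.0, 2.0]
p_t, p_p = [0.0, 0.0], [1.0, 1.0]
q_t, q_p = [0.0, 0.0], [0.0, 2.0]
loss_l2 = l2_loss(psi_t, psi_p)
loss_h1 = h1_loss(psi_t, psi_p, p_t, p_p, q_t, q_p)
```

In this sample the $L_2$-type loss is zero although the derivatives are wrong, while the $H^1$-type loss penalizes them. This is the mechanism by which Sobolev training suppresses oscillations in the predicted stress invariants.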
Fig. 7: Comparison of $L_2$ and $H^1$ norm training performance for a hyperelastic energy functional used for the Modified Cam-Clay plasticity model (Borja et al., 2001).
The results of the two training experiments are shown in Fig. 7 and Fig. 8. Both training algorithms appear able to capture the energy functional $\psi$ values well, with the $H^1$-trained model demonstrating slightly higher accuracy. However, closer examination of the results shown in Fig. 8 reveals that the neural network trained with the $H^1$ norm performs better in predicting both the energy functional and the first derivatives that lead to the stress invariants $p$ and $q$. In particular, the neural network trained with the $L_2$ norm generates a mean pressure and deviatoric stress response that oscillates spuriously with respect to the strain, whereas the $H^1$ counterpart produces results that exhibit no oscillation. Such oscillation is undesirable, particularly if the neural network predictions were to be incorporated into an implicit finite element model.
Fig. 8: Comparison of $L_2$ and $H^1$ predictions for the energy functional $\psi$, the stress invariant $p$, and the stress invariant $q$.

7.2 Numerical Experiment 2: Training an anisotropic hyperelastic model for polycrystals with non-Euclidean data

To determine whether the incorporation of graph data improves the accuracy and robustness of the forward prediction, we conduct both the hybrid learning and the classical supervised machine learning. The latter is used as a control experiment. The ability to capture the elastic stored energy functional of a single polycrystal is initially tested with an MLP model. A two-hidden-layer feed-forward neural network is trained and tested on 200 sample points, i.e. 200 different, randomly generated deformation tensors with their corresponding elastic stored energy and stress measures, for only one of the generated microstructures. A Sobolev-trained model as described in Section 4 (model type $M^{H^1}_{\text{mlp}}$) was utilized. This architecture also constitutes the multilayer perceptron branch of the hybrid network described previously in Fig. 4. To reduce as much as possible any bias from the choice of data split, the network's capability is tested with a K-fold cross-validation algorithm (cf. Bengio and Grandvalet, 2004). The 200 sample points are separated into 10 folds of 20 sample points each and, in turn, each fold is selected as the testing set while the rest are used as the training set for the network.
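The fold construction described above can be sketched as follows. This is a minimal standard-library illustration: the function name `k_fold_splits` and the shuffling seed are hypothetical, and the sample indices stand in for the actual deformation/energy/stress samples.

```python
import random


def k_fold_splits(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) for each fold in turn."""
    if n_samples % n_folds != 0:
        raise ValueError("this sketch assumes folds of equal size")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // n_folds
    folds = [
        indices[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)
    ]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# 200 sample points, 10 folds of 20 points each, as in the experiment.
splits = list(k_fold_splits(200, 10))
```

Each sample appears in exactly one testing fold, so every sample is used as blind data once and as training data nine times.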
The K-fold testing results can be seen in Fig. 9, where the model can predict the data for a single RVE formation adequately, as well as interpolate smoothly between the data points to generate the response surface estimations for the energy and the stress field (Fig. 10). Good performance for both training and testing on a single polycrystal structure was expected, as no additional input is necessary other than the strain tensor. Any additional input, i.e. structural information, would be redundant in the training since it would be constant for the specific RVE.
Fig. 9: Correlation plot for true vs predicted K-fold testing results for the energy functional $\psi$ (left) and the first component of the 2nd Piola-Kirchhoff stress tensor (right) by a surrogate neural network model $M^{H^1}_{\text{mlp}}$ trained on data for a single RVE.

Fig. 10: Estimated $\psi$ energy functional surface (left) and the first component of the 2nd Piola-Kirchhoff stress tensor (right) generated by a surrogate neural network model ($M^{H^1}_{\text{mlp}}$) trained on data for a single RVE.

In this work, we generalize the learning problem by introducing the polycrystal weighted connectivity graph as additional input data. This connectivity graph is inferred directly from the microstructure by assigning each grain in the polycrystal as a vertex (node) and assigning an edge to each grain contact pair. As discussed in Section 2.1, the nodes of the graphs can have weights carrying information about the crystals they represent in the form of a feature matrix $X$. The available node features in the data set are described in Appendix C. A model of type $M^{H^1}_{\text{hybrid}}$ is trained and tested on 150 polycrystals (100 RVEs in the training set and 50 RVEs in the testing set) with different sets of features to evaluate the effect of the node features on the model's performance. Four combinations of data were tested: a model $M^{1}_{\text{hybrid}}$ with the crystal volume as the only feature, a model $M^{4}_{\text{hybrid}}$ with the crystal volume and the crystal Euler angles as features, a model $M^{8}_{\text{hybrid}}$ that utilizes the crystal volumes, the Euler angles, the equivalent diameter, the number of faces, the total area of the faces, and the number of neighbors, and, finally, a model $M^{11}_{\text{hybrid}}$ that utilizes all the available features. The abbreviations of the model names are also described in Table 2. The results of the training experiment are shown in Fig. 11. Increasing the number of available node features during training generally appears to improve the model's performance in training and testing. The model that uses only the crystal volumes as node features demonstrates the lowest performance. The largest improvement in performance is observed when the crystal Euler angles are included in the feature matrix.
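The construction of the connectivity graph from grain contacts can be sketched in plain Python. The snippet builds an adjacency matrix with one edge per grain contact pair, the corresponding symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$, and a node feature matrix with one row per grain. The function names, the three-grain example, and the feature values are hypothetical, and the unit edge weights are an assumption of this sketch.

```python
import math


def build_adjacency(n_grains, contact_pairs):
    """Adjacency matrix with one undirected edge per grain contact pair."""
    adjacency = [[0.0] * n_grains for _ in range(n_grains)]
    for i, j in contact_pairs:
        if i != j:
            adjacency[i][j] = 1.0
            adjacency[j][i] = 1.0
    return adjacency


def normalized_laplacian(adjacency):
    """Symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2)."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                laplacian[i][j] = 1.0 if degree[i] > 0 else 0.0
            elif adjacency[i][j] != 0.0:
                laplacian[i][j] = -adjacency[i][j] / math.sqrt(
                    degree[i] * degree[j]
                )
    return laplacian


# Hypothetical polycrystal with three grains: 0 touches 1, 1 touches 2.
adjacency = build_adjacency(3, [(0, 1), (1, 2)])
laplacian = normalized_laplacian(adjacency)

# Node feature matrix X: one row per grain, here [volume, three Euler angles],
# which mirrors the four-feature variant of the model.
features = [
    [0.30, 0.1, 0.2, 0.3],
    [0.45, 1.0, 0.5, 0.2],
    [0.25, 2.0, 0.4, 0.9],
]
```

Adding further columns to `features` (equivalent diameter, number of faces, and so on) changes only the width of $X$; the graph structure stays the same.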
Table 2: Abbreviations of hybrid model names with different numbers of node weight features.

| Model | Node weight features |
| --- | --- |
| $M^{1}_{\text{hybrid}}$ | crystal volume |
| $M^{4}_{\text{hybrid}}$ | crystal volume and three Euler angles |
| $M^{8}_{\text{hybrid}}$ | crystal volume, three Euler angles, equivalent diameter, number of faces, total area of faces, and number of neighbors |
| $M^{11}_{\text{hybrid}}$ | crystal volume, three Euler angles, equivalent diameter, number of faces, total area of faces, number of neighbors, and centroid position |

Fig. 11: Comparison of the model's performance for different numbers of node weight features ($M^{1}_{\text{hybrid}}$, $M^{4}_{\text{hybrid}}$, $M^{8}_{\text{hybrid}}$, and $M^{11}_{\text{hybrid}}$) for the second Piola-Kirchhoff stress tensor $S$ principal values (scaled MSE) and direction predictions ($\phi_{\text{Eu}}$). The abbreviations of the model names are described in Table 2.

As previously mentioned in Section 3.3, the proposed hybrid architecture can be prone to overfitting. To avoid that, we utilize additional Dropout layers in the geometric learning branch of the network as a regularization method during training. The models representing the hybrid architecture without and with regularization are of the type $M^{H^1}_{\text{hybrid}}$ and $M^{H^1}_{\text{reg}}$, respectively. To demonstrate the effect, the two models are trained and tested on 150 polycrystals (100 RVEs in the training set and 50 RVEs in the testing set). The comparison results are shown in Fig. 12. The $M^{H^1}_{\text{hybrid}}$ model appears to be prone to overfitting: the training error is lower than the blind prediction error. This issue can be alleviated with regularization techniques that promote the model's robustness in blind predictions. This can be seen qualitatively in the scaled MSE vs eCDF plot for the $M^{H^1}_{\text{reg}}$ model: the distance between the training and testing curves closes for the regularized model. Since the $M^{H^1}_{\text{reg}}$ model appears to produce superior blind prediction accuracy compared to the non-regularized model $M^{H^1}_{\text{hybrid}}$, from this point on we work and compare with the $M^{H^1}_{\text{reg}}$ model and omit the $M^{H^1}_{\text{hybrid}}$ model for simplicity.
Fig. 12: Scaled mean squared error comparison for the second Piola-Kirchhoff stress tensor $S$ principal values and $\phi_{\text{Eu}}$ error for the $S$ direction predictions for the models $M^{H^1}_{\text{hybrid}}$ and $M^{H^1}_{\text{reg}}$.

The ability of the hybrid architecture proposed in Fig. 4 to leverage the information from a weighted connectivity graph to expand learning over multiple polycrystals, in comparison with the classical multilayer perceptron methods, is tested in the following experiment. A K-fold validation algorithm is performed on 100 generated polycrystal RVEs. The 100 RVEs are separated into 5 folds of 20 RVEs each. In doing so, every polycrystal RVE is considered as blind data for the model at least once. The K-fold cross-validation algorithm is repeated for the model architectures and training algorithms $M^{L_2}_{\text{mlp}}$, $M^{H^1}_{\text{mlp}}$, and $M^{H^1}_{\text{reg}}$. The results are presented in Fig. 13 as scaled MSE vs eCDF curves for the energy functional $\psi$ and the second Piola-Kirchhoff stress tensor $S$ principal values, and as $\phi_{\text{Eu}}$ vs eCDF for the principal direction predictions. It can be seen that using the Sobolev training method greatly reduces the blind prediction errors: both the $M^{L_2}_{\text{mlp}}$ energy and stress prediction errors are higher than those of the $M^{H^1}_{\text{mlp}}$ and $M^{H^1}_{\text{reg}}$ models. The $M^{H^1}_{\text{reg}}$ model produces better predictions than the $M^{H^1}_{\text{mlp}}$ model, as it can distinguish between different RVE behaviors.
In addition to the performance measured by this quantitative metric, enabling the weighted graph as an additional input for the hybrid network also provides the opportunity to further generalize the learning process. In Fig. 14, the energy potential surface estimations are shown for the simple multilayer perceptron and the hybrid architecture for two different polycrystals. Without the graph as input, the network cannot distinguish between the two behaviors, while the hybrid architecture estimates two distinct elastic stored energy surfaces. The weighted connectivity graph of each polycrystal is encoded in a perceivably different feature vector that helps the downstream multilayer perceptron identify and interpolate among different behaviors for the RVEs. Furthermore, this hybrid strategy, if trained successfully and carefully validated, is also potentially more efficient than a black-box surrogate model, as the hybrid model does not require a new training process when encountering a new RVE that can be sufficiently described by a weighted graph.
7.3 Numerical Experiment 3: Verification tests on unseen data

To ensure that the constitutive response predicted by the trained neural network is consistent with known mechanics principles, we introduce numerical experiments to verify our ML model and assess the accuracy and robustness of the predictions made by the graph-based model. The predictive capacity of the neural network is tested for blind predictions against homogeneous RVE simulations and new FFT simulations performed on microstructures within and outside the range of the training dataset. The hybrid architecture is also tested on whether it can satisfy the isomorphism condition of the graph inputs and on whether it produces convex energy functionals.
7.3.1 Verification Test 1: Responses of unseen homogeneous anisotropic RVEs

In this blind verification, our goal is to check whether the machine learning model predicts the right anisotropic responses of a homogeneous RVE. We use the model trained with the training data described in Appendix C to make a forward prediction on 5 RVEs with all grains of the same crystalline orientation. Since there is no cohesive zone model used in the grain boundary, setting the crystal orientation identical for all the grains essentially makes the RVEs homogeneous.
Fig. 13: Scaled MSE vs eCDF curves for the $\psi$ energy functional (top left), scaled MSE vs eCDF for the second Piola-Kirchhoff stress tensor $S$ principal values (top right), and $\phi_{\text{Eu}}$ vs eCDF for the second Piola-Kirchhoff stress tensor $S$ principal direction predictions (bottom) for the models $M^{L_2}_{\text{mlp}}$, $M^{H^1}_{\text{mlp}}$, and $M^{H^1}_{\text{reg}}$. The models' performance is tested with a K-fold algorithm on a dataset of 100 RVEs - only the blind prediction results are shown.

The ML model then takes the weighted graph that represents the topology of the microstructures as additional input and is used to predict the constitutive responses of these 5 extra microstructures unseen during the training. The results of uniaxial unconfined tensile tests performed on the 5 RVEs for two different orientations, $(0^\circ, 0^\circ, 0^\circ)$ and $(45^\circ, 45^\circ, 45^\circ)$, are compared with the benchmark solution in Fig. 15. We also applied pure shear loading on all three pairs of orthogonal planes of these 5 RVEs with the $(0^\circ, 0^\circ, 0^\circ)$ orientation. The comparison with the benchmark solution is shown in Fig. 16.
7.3.2 Verification Test 2: Blind test of RVEs with unseen FFT simulations

In this verification test, we test the model's blind prediction capabilities against unseen data generated by FFT simulations. The hybrid architecture model is tested against uniaxial unconfined tension tests and pure shear tests conducted using the FFT solver. The hybrid architecture used in this test was trained on data from 100 RVEs. It is noted that the model was trained solely on randomly generated deformation gradients and, thus, the strain paths prescribed for these tests are unseen. The results of these tests for three RVEs sampled from the training dataset can be seen in Fig. 17, while the results for three unseen microstructures sampled from the testing dataset can be seen in Fig. 18.
Fig. 14: Without any additional input (other than the strain tensor), the neural network cannot differentiate between these two polycrystals. The two anisotropic behaviors can be distinguished when the weighted connectivity graph is also provided as input. Through the unsupervised encoding branch of the hybrid architecture, each polycrystal is mapped to an encoded feature vector. The feature vector is fed to the multilayer perceptron branch and produces a unique energy prediction.

Fig. 15: Estimated $\psi$ and $S_{11}$ responses for 5 unseen RVEs homogenized at Euler angles $(0^\circ, 0^\circ, 0^\circ)$ (top) and at $(45^\circ, 45^\circ, 45^\circ)$ (bottom) of crystal orientations in Bunge notation under uniaxial unconfined tensile loading.

Fig. 16: Estimated $S_{12}$, $S_{23}$, and $S_{13}$ responses for 5 unseen RVEs homogenized at Euler angles $(0^\circ, 0^\circ, 0^\circ)$ of crystal orientations in Bunge notation under pure shear loading for 3 directions.

7.3.3 Verification Test 3: Test of graph isomorphism responses

In this test, we check whether the trained geometric learning models produce the same predictions for isomorphic graphs. The definition of graph isomorphism is provided in Appendix A. With the original graph structure of the input known, we can generate any number of isomorphic graphs by applying the same random permutation to the rows and the columns of the original normalized Laplacian matrix of the graph. The same random permutation is applied to the rows of the feature matrix of the input as well. The permuted isomorphic graphs and the corresponding feature matrices carry the same information for the microstructure, and the predictions of the hybrid architecture should be consistent under any permutation of the graph input. To test this hypothesis, 10 iterations of permutations were performed on the graph Laplacian matrix and feature matrix of a microstructure to produce isomorphic graph representations of that microstructure. These isomorphic graph inputs were then used to make predictions against unseen FFT simulation data. The results of this experiment can be seen in Fig. 19, where the isomorphic graph inputs produce consistent responses.
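The permutation procedure can be sketched in plain Python. The same permutation is applied to the rows and columns of the Laplacian and to the rows of the feature matrix, and a simple permutation-invariant readout is compared before and after. The readout used here (one multiplication $LX$ followed by a sum of squares over the nodes) is only a toy stand-in chosen for this sketch, not the spectral graph convolutional network of this work; all names and numbers are hypothetical.

```python
import math
import random


def permute_graph(laplacian, features, perm):
    """Apply one permutation to rows/columns of L and to rows of X."""
    n = len(perm)
    new_l = [[laplacian[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    new_x = [list(features[perm[i]]) for i in range(n)]
    return new_l, new_x


def pooled_readout(laplacian, features):
    """Toy invariant readout: sum over nodes of (L X)^2, per feature column."""
    n, n_feat = len(laplacian), len(features[0])
    conv = [
        [sum(laplacian[i][k] * features[k][c] for k in range(n))
         for c in range(n_feat)]
        for i in range(n)
    ]
    return [sum(conv[i][c] ** 2 for i in range(n)) for c in range(n_feat)]


# Hypothetical three-grain graph (normalized Laplacian of a path graph).
s = 1.0 / math.sqrt(2.0)
laplacian = [[1.0, -s, 0.0], [-s, 1.0, -s], [0.0, -s, 1.0]]
features = [[1.0, 0.2], [2.0, 0.1], [0.5, 0.7]]

reference = pooled_readout(laplacian, features)
rng = random.Random(0)
max_deviation = 0.0
for _ in range(10):  # 10 permutations, as in the test described above
    perm = list(range(3))
    rng.shuffle(perm)
    perm_l, perm_x = permute_graph(laplacian, features, perm)
    readout = pooled_readout(perm_l, perm_x)
    deviation = max(abs(a - b) for a, b in zip(readout, reference))
    max_deviation = max(max_deviation, deviation)
```

The readout is unchanged (up to floating-point round-off) because permuting $L$ and $X$ together only reorders the rows of $LX$, and the sum over nodes does not depend on that order.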
7.3.4 Verification Test 4: Convexity check

To check the convexity of the trained hybrid models, a numerical check was conducted following Eq. 28. The tensors $C_\alpha$ and $C_\beta$ were chosen to be right Cauchy-Green deformation tensors sampled from the training and testing sets of deformations. The input $G$ was checked for all the 150 RVEs the hybrid architecture was trained and tested on. For every graph input, the approximated energy functional must be convex. Thus, to verify that for all the polycrystal formations, the convexity check is repeated for every RVE in the dataset. It is noted that, while these checks describe a necessary condition for convexity, they do not describe a sufficient condition, and more robust methods of checking convexity will be considered in the future. For a specific polycrystal (graph input), the network has six independent variables, namely the components of the deformation tensor $C$. To check the convexity, for every RVE in the dataset, deformation tensors $C$ are sampled in a grid and are checked pairwise (approximately 265,000 combinations of points/checks per RVE) and are found to satisfy the inequality in Eq. 28. In Fig. 20, a sub-sample of 100 convexity checks for three RVEs is shown.
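A pairwise check of this kind can be sketched as follows. Several assumptions are made and should be read as such: the inequality is taken to be the standard convexity condition $\psi(\lambda C_\alpha + (1-\lambda) C_\beta) \le \lambda\,\psi(C_\alpha) + (1-\lambda)\,\psi(C_\beta)$, which may differ in form from Eq. 28; the grid uses three levels per independent component, which gives $3^6 = 729$ points and $\binom{729}{2} = 265{,}356$ pairs, consistent with the quoted number of checks but not stated in the text; and `toy_energy` is a hypothetical convex stand-in for the trained network.

```python
from itertools import combinations, product


def grid_points(levels, n_components=6):
    """All grid combinations of the six independent components of C."""
    return list(product(levels, repeat=n_components))


def toy_energy(c):
    """Hypothetical convex stand-in for the trained energy functional."""
    identity = (1.0, 1.0, 1.0, 0.0, 0.0, 0.0)
    return sum((x - x0) ** 2 for x, x0 in zip(c, identity))


def satisfies_convexity(energy, c_a, c_b, lam=0.5, tol=1e-12):
    """Check psi(lam*a + (1-lam)*b) <= lam*psi(a) + (1-lam)*psi(b)."""
    mix = tuple(lam * a + (1.0 - lam) * b for a, b in zip(c_a, c_b))
    lhs = energy(mix)
    rhs = lam * energy(c_a) + (1.0 - lam) * energy(c_b)
    return lhs <= rhs + tol


points = grid_points((0.9, 1.0, 1.1))
n_pairs = 0
n_failed = 0
for c_a, c_b in combinations(points, 2):
    n_pairs += 1
    if not satisfies_convexity(toy_energy, c_a, c_b):
        n_failed += 1
print(n_pairs)  # 265356
```

As noted in the text, passing such pairwise checks is a necessary condition only: a function can satisfy the inequality on a finite set of sampled pairs and still be non-convex between them.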
Fig. 17: Comparison of hybrid model predictions with FFT simulation data for 3 RVEs from the training
data set. The tests conducted are uniaxial unconfined tension (left and middle columns) and pure shear
(right column).
7.4 Numerical Experiment 4: Parametric studies on anisotropic responses of polycrystals in phase-field fracture
The anisotropic elastic responses predicted using the hybrid neural network trained with both non-Euclidean descriptors and FFT simulations performed on polycrystals are further examined in phase-field fracture simulations in which the stored energy functional generated from the hybrid learning model is degraded according to a driving force. In this series of parametric studies, the Kalthoff-Winkler experiment is numerically simulated via a phase-field model in which the elasticity is predicted by the hybrid neural network (Kalthoff and Winkler, 1988; Kalthoff, 2000). We assume that the effective stress theory (Simo and Ju, 1987) is valid such that the stored energy can be written as the product of a degradation function and the stored elastic energy. The degradation function and the driving force are both pre-defined in this study. The training of an incremental functional for the path-dependent constitutive responses will be considered in the second part of this series of work.
Fig. 18: Comparison of hybrid model predictions with FFT simulation data for 3 RVEs from the testing data set. The tests conducted are uniaxial unconfined tension (left and middle columns) and pure shear (right column).

In the first numerical experiment, we conduct a parametric study by varying the orientation of the RVE to analyze how the elastic anisotropy predicted by the graph-dependent energy functional affects the nucleation and propagation of cracks. In the second numerical experiment, the hybrid neural network is given new microstructures. Forward predictions of the elasticity of the two new RVEs are made by the hybrid neural network without further calibration. We then compare the crack patterns for the two RVEs and compare the predictions made without the graph input to analyze the impact of the incorporation of non-Euclidean descriptors on the quality of the predictions of crack growth.
While previous work, such as Kochmann et al. (2018), has utilized FFT simulations to generate incremental constitutive updates, the efficiency of the FFT-FEM model may highly depend on the complexity of the microstructures and the existence of a sharp gradient of material properties of the RVEs. In this work, the FFT simulations are not performed during the multiscale simulations. Instead, they are used as the training and validation data to generate an ML surrogate model following the treatment in Wang and Sun (2018) and Wang and Sun (2019b).
For brevity, we omit the detailed description of the phase-field model for brittle fracture. Interested readers are referred to, for instance, Bourdin et al. (2008) and Borden et al. (2012a). In this work, we adopt the viscous regularized version of the phase-field brittle fracture model in Miehe et al. (2010b), in which the degradation function and the critical energy release rate are pre-defined. The equations solved are the balance of linear momentum and the rate-dependent phase-field governing equation:
$$\nabla_X \cdot P + B = \rho\,\ddot{U}, \tag{48}$$
$$g_c\left(\partial_d \gamma - \nabla_X \cdot [\partial_{\nabla d}\gamma]\right) + \eta\,\dot{d} = 2(1 - d)\,\mathcal{H}, \tag{49}$$
where $\gamma$ is the crack density function that represents the diffusive fracture, i.e.,
$$\gamma(d, \nabla d) = \frac{1}{2 l_0} d^2 + \frac{l_0}{2}|\nabla d|^2. \tag{50}$$

Fig. 19: Hybrid model prediction for isomorphic graph inputs. The mean value and the range of the predictions are compared to the FFT simulation benchmark results for an unconfined uniaxial tension test (top row) and pure shear test (bottom row).

Fig. 20: Approximated energy functional convexity check results for three different polycrystals. Each point represents a convexity check and must be above the $[\text{LHS} - \text{RHS} = 0]$ line so that the inequality in Eq. 28 is satisfied.
The problem is solved following a standard staggered time discretization (Borden et al., 2012b) such that the balance of linear momentum and the phase-field governing equations are updated sequentially.

In Eq. (48), $P$ is the first Piola-Kirchhoff stress tensor, $B$ is the body force, and $\ddot{U}$ is the second time derivative of the displacement $U$. In Eq. (49), following (Miehe et al., 2010a), $d$ refers to the phase-field variable, with $d = 0$ signifying the undamaged and $d = 1$ the fully damaged material. The variable $l_0$ refers to the length scale parameter used to approximate the sharp crack topology as a diffusive crack profile, such that as $l_0 \to 0$ the sharp crack is recovered. The parameter $g_c$ is the critical energy release rate from the Griffith crack theory. The parameter $\eta$ refers to an artificial viscosity term used to regularize the crack propagation by giving it a viscous resistance. The term $\mathcal{H}$ is the force driving the crack propagation and, in order to have an irreversible crack propagation in tension, it is defined as the maximum tensile ("positive") elastic energy that a material point has experienced up to the current time step $t_n$, formulated as:
$$\mathcal{H}(F_{t_n}, G) = \max_{t_n \le t} \psi^{+}(F_{t_n}, G). \tag{51}$$
The degradation of the energy due to fracture should take place only under tension and can be linked to that of the undamaged elastic solid as:
$$\psi(F, d, G) = (g(d) + r)\,\psi^{+}(F, G) + \psi^{-}(F, G). \tag{52}$$
The parameter $r$ refers to a residual energy remaining even in the fully damaged material and it is set to $r \approx 0$ for these experiments. For these numerical experiments, the degradation function used was the commonly used quadratic (Miehe, Hofacker, and Welschinger, 2010a):
$$g(d) = (1 - d)^2 \quad \text{with} \quad g(0) = 1 \ \text{and} \ g(1) = 0. \tag{53}$$
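The quadratic degradation, the degraded energy, and the irreversible history variable can be sketched with a few Python functions. The function names and the sample energy values are hypothetical; the history update is the running maximum of the tensile energy at a material point.

```python
def degradation(d):
    """Quadratic degradation g(d) = (1 - d)^2."""
    return (1.0 - d) ** 2


def degraded_energy(psi_plus, psi_minus, d, residual=0.0):
    """psi = (g(d) + r) * psi_plus + psi_minus; only the tensile part degrades."""
    return (degradation(d) + residual) * psi_plus + psi_minus


def update_history(history_prev, psi_plus_now):
    """Driving force H: maximum tensile energy experienced so far."""
    return max(history_prev, psi_plus_now)


# Hypothetical tensile-energy history at one material point.
tensile_energy_path = [0.1, 0.4, 0.2, 0.3]
history = []
h = 0.0
for psi_plus in tensile_energy_path:
    h = update_history(h, psi_plus)
    history.append(h)
```

Because the history variable never decreases, unloading (the drop from 0.4 to 0.2 above) does not reduce the driving force, which is what makes the crack growth irreversible.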
In order to perform a tensile-compressive split, the deformation gradient is split into a volumetric and an isochoric part. The energy and the stress response of the material should not be degraded under compression. The split of the deformation gradient, following (de Souza Neto et al., 2011), is performed as follows:
$$F = F_{\text{iso}} F_{\text{vol}} = F_{\text{vol}} F_{\text{iso}}, \tag{54}$$
where the volumetric component of $F$ is defined as
$$F_{\text{vol}} = (\det F)^{1/3}\, I, \tag{55}$$
and the volume-preserving isochoric component as
$$F_{\text{iso}} = (\det F)^{-1/3}\, F. \tag{56}$$
The strain energy is, thus, split into a "tensile" and a "compressive" part, such that:
$$\psi^{+} = \begin{cases} \psi(F, G) & J \ge 1 \\ \psi(F, G) - \psi(F_{\text{vol}}, G) & J < 1, \end{cases} \tag{57}$$
$$\psi^{-} = \begin{cases} 0 & J \ge 1 \\ \psi(F_{\text{vol}}, G) & J < 1, \end{cases} \tag{58}$$
where $J = \det(F)$. In these examples, the energy values are calculated using the hybrid architecture neural network model, whose derivatives with respect to the strain input give the stress. Since the model's input is in terms of the right Cauchy-Green deformation tensor, the degraded stress is calculated as:
$$P(F, d, G) = 2F\left[g(d)\,\frac{\partial \hat{\psi}^{+}}{\partial C} + \frac{\partial \hat{\psi}^{-}}{\partial C}\right]. \tag{59}$$
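The volumetric-isochoric split and the resulting tensile/compressive energy split can be sketched in plain Python for 3x3 deformation gradients stored as nested lists. The energy `toy_energy` is a hypothetical stand-in for the hybrid neural network (it depends on $F$ only through $C = F^T F$), and the piecewise split assumes that the full energy is treated as tensile when $J \ge 1$ and that only the volumetric energy is treated as compressive when $J < 1$.

```python
def det3(f):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        f[0][0] * (f[1][1] * f[2][2] - f[1][2] * f[2][1])
        - f[0][1] * (f[1][0] * f[2][2] - f[1][2] * f[2][0])
        + f[0][2] * (f[1][0] * f[2][1] - f[1][1] * f[2][0])
    )


def volumetric_isochoric_split(f):
    """Return J, F_vol = J^(1/3) I and F_iso = J^(-1/3) F."""
    j = det3(f)
    if j <= 0.0:
        raise ValueError("det(F) must be positive")
    a = j ** (1.0 / 3.0)
    f_vol = [[a if r == c else 0.0 for c in range(3)] for r in range(3)]
    f_iso = [[f[r][c] / a for c in range(3)] for r in range(3)]
    return j, f_vol, f_iso


def toy_energy(f):
    """Hypothetical stand-in energy: 0.5 * ||F^T F - I||^2."""
    total = 0.0
    for r in range(3):
        for c in range(3):
            c_rc = sum(f[k][r] * f[k][c] for k in range(3))
            total += (c_rc - (1.0 if r == c else 0.0)) ** 2
    return 0.5 * total


def split_energy(energy, f):
    """Return (psi_plus, psi_minus) for the tensile/compressive split."""
    j, f_vol, _ = volumetric_isochoric_split(f)
    psi_total = energy(f)
    if j < 1.0:
        psi_minus = energy(f_vol)
        return psi_total - psi_minus, psi_minus
    return psi_total, 0.0


f_compressed = [[0.9, 0.1, 0.0], [0.0, 0.95, 0.0], [0.0, 0.0, 1.0]]
j, f_vol, f_iso = volumetric_isochoric_split(f_compressed)
psi_plus, psi_minus = split_energy(toy_energy, f_compressed)
```

In both branches `psi_plus + psi_minus` equals the undegraded energy, so the undamaged response is recovered when $d = 0$ and $r = 0$.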
The experiment in question studies the crack propagation due to the high-velocity impact of a projectile. The geometry and boundary conditions of the domain, as well as the configuration of the pre-existing crack, are shown in Fig. 21. It is noted that, while only half of the domain of the problem is studied in this work, the rest of the domain would not necessarily demonstrate a symmetric response due to the material's anisotropic behavior. In this preliminary study, the half domain was deemed adequate to illustrate and compare the different anisotropic responses of the model in question. Kalthoff and Winkler (1988) and Kalthoff (2000) observed the crack to propagate at $70^\circ$ for an isotropic material, a result that has previously been reproduced with numerical simulations in other studies (Belytschko et al., 2003; Song et al., 2008; Borden et al., 2012b). The experiment is conducted for two impact velocities ($v_0 = 16.5$ m/s and $v_0 = 33.0$ m/s) to test the crack branching phenomenon expected for higher impact velocities.
Fig. 21: The geometry and boundary conditions of the domain for the dynamic shear loading experiment.
The velocity is prescribed progressively at the bottom left corner of the domain. The mesh is designed to
have a pre-existing crack of 50.0 mm.
The experiment lasts for 80 µs and the prescribed velocity is applied progressively following the scheme below for $t_0 = 1$ µs:
The domain is meshed uniformly with 20,000 triangular elements and the length scale is chosen to be $l_0 = 1.2 \times 10^{-3}$ m. While this mesh is rather coarse compared to previous studies of the same problem, it was deemed adequate to simulate the problem at hand with acceptable accuracy to qualitatively demonstrate the anisotropic model behavior. The time step used for the explicit method was chosen to be $\Delta t = 5 \times 10^{-8}$ s to provide stable results. Changing the time step of the explicit method did not appear to affect the phase-field solution, as long as the explicit solver for the momentum equation was stable.
For the first numerical example, the experiment is initially conducted on an isotropic material with parameters commonly used in the literature ($E = 190$ GPa, $\nu = 0.3$, $l_0 = 1.2 \times 10^{-3}$ m) to verify the formulation and compare with the anisotropic results. It can be seen that, with the current formulation, the isotropic model can recover the approximately $70^\circ$ angle previously reported in the experiments and numerical simulations. Following that, the behavior of a single polycrystal was tested. In other words, the graph input of the hybrid architecture material model remained constant for all the simulations. The material model is a trained neural network of type $M^{H^1}_{\text{reg}}$ with the graph input set constant. All the neural networks used in this section were trained on a dataset of 100 RVEs with 200 sample points each. The purpose of this experiment is to show that by rotating the highly anisotropic RVE, under the same boundary conditions, different wave propagation and crack nucleation patterns can be observed. This experiment could be paralleled to rotating a transversely isotropic material: different fiber orientations should produce different results under identical boundary conditions. In Fig. 22, it is demonstrated that the neural network material model is indeed anisotropic, showing varying behaviors while rotating the RVE by $0^\circ$, $30^\circ$, and $60^\circ$. The nature of the anisotropy becomes more apparent when the impact velocity is doubled and the crack branching is more prevalent.

Fig. 22: Crack patterns at 65 µs for the dynamic shear loading experiment for the isotropic material and the anisotropic material for a constant graph, rotated at various angles. The left column shows the experiments for $v = 16.5$ m/s and the right column for $v = 33.0$ m/s. Panels: isotropic (a, b); $\phi = 0^\circ$ (c, d); $\phi = 30^\circ$ (e, f); $\phi = 60^\circ$ (g, h).
Dynamic simulations can be prone to numerical instabilities that may affect the predicted crack
propagation patterns (Wei and Chen, 2018). To ensure that the crack propagation patterns demonstrated in this
experiment are not artifacts of mesh-dependent numerical instabilities, the simulations were repeated on
different meshes and at different levels of mesh refinement. The simulation shown in Fig. 22 (d) (v = 33.0 m/s,
φ = 0°) was repeated on a mesh with 11,450 quadrilateral elements, as well as on a mesh with 80,000
triangular elements. For the quadrilateral and the refined triangular meshes, the simulation time steps
were chosen to be $\Delta t = 5 \times 10^{-8}$ s and $\Delta t = 2.5 \times 10^{-8}$ s, respectively. The simulation shown in Fig. 22
(h) (v = 33.0 m/s, φ = 60°) was also repeated on a mesh with 80,000 triangular elements to investigate
whether the mesh would affect the crack propagation patterns of the RVEs under rotation. The comparison
Fig. 23: Crack patterns at 65 µs (left column) and 85 µs (middle column) (v = 33.0 m/s) for the dynamic
shear loading experiment on three different meshes: 11,450 quadrilateral elements (top row, a-c), 20,000
triangular elements (middle row, d-f), and 80,000 triangular elements (bottom row, g-i). The area around the
branching at 85 µs is zoomed in to demonstrate the mesh resolution (right column).
of the crack patterns is demonstrated in Fig. 23 and Fig. 24. Neither the selection of element shape nor the
level of refinement seems to greatly affect the propagated cracks. The main cracks for all the
simulations at different levels of refinement appear to be close to identical. The secondary cracks in Fig. 23
(h) as well as Fig. 23 (b) appear to be more defined, which is expected due to the higher resolution of the
mesh.
Fig. 24: Crack patterns at 70 µs (v = 33.0 m/s) for the dynamic shear loading experiment for the RVE rotated
at φ = 60° on two different meshes: 20,000 triangular elements (a, left), and 80,000 triangular elements (b, right).
Fig. 25: Crack patterns at 30 µs, 50 µs, 65 µs, and 85 µs for the dynamic shear loading experiment with an impact
velocity of v = 33.0 m/s for a model without a graph input (a, b, c, d) and two different polycrystals, RVE A
(e, f, g, h) and RVE B (i, j, k, l). All the parameters are identical for all the simulations except for the graph input.
For the second numerical experiment, the material response was tested for different polycrystals (model
type $\mathcal{M}^{H^1}_{\text{reg}}$) as well as for a model without any graph inputs (type $\mathcal{M}^{H^1}_{\text{mlp}}$). The aim of this experiment was
to verify that the hybrid architecture and the graph input can capture the anisotropy of the polycrystal
material that originates from the interactions between crystals, as expressed by the connectivity graph.
The above experiment was repeated for different graph inputs and the results are demonstrated in Fig. 25.
In the absence of a graph input, while there is crack propagation, the results look noisy and the direction of
the propagation is not similar to that of specific RVEs, something that could potentially be attributed to the
model being trained on multiple polycrystal behaviors. For the model with the graph input, the difference
in behaviors appears to become more apparent in the areas where branching is more prevalent, with the
polycrystal affecting the crack branching phenomena. No additional anisotropy measures or crack
branching criteria were utilized for these simulations. The sole additional information in the input of the material
model is the weighted connectivity graph.
8 Conclusion
We introduce a machine learning method that incorporates geometric learning to extract low-dimensional
descriptors from microstructures represented by weighted graphs and uses these low-dimensional
descriptors to enhance a supervised learning model that predicts the stored elastic energy functional via Sobolev training.
By utilizing non-Euclidean data structures, we introduce these weighted graphs as new descriptors for
geometric learning such that the hybrid deep learning model can produce an energy functional that leverages the
rich microstructural information not describable by the classical Euclidean descriptors, such as porosity
and density. To overcome the potential spurious oscillations of the learned functions due to the lack of
constraints on their derivatives, we adopt Sobolev training, and the resultant hyperelastic energy functional
is more accurate and smoother than those obtained with classical machine learning techniques. This work also lays the
foundation for several new potential research directions. For instance, the energy functional approach can
be extended to a variational constitutive update framework where a discrete Lagrangian can be constructed
incrementally to predict path-dependent behaviors. Furthermore, more sophisticated geometric learning
methods that involve directed graphs (for representing hierarchical information), edge-weighted graphs
(for representing attributes of grain contacts), and the evolution of the graphs (for path-dependent
behaviors) will provide a fuller picture for examining the relationships between the topology of microstructures and
the resultant macroscopic responses.
9 Acknowledgments
The authors would like to thank the two anonymous reviewers for their insightful feedback and their
many suggestions that helped improve the quality of this paper. The authors are supported by the NSF
CAREER grant from the Mechanics of Materials and Structures program at the National Science Foundation under
grant contracts CMMI-1846875 and OAC-1940203, and by the Dynamic Materials and Interactions Program from
the Air Force Office of Scientific Research under grant contracts FA9550-17-1-0169 and FA9550-19-1-0318.
This support is gratefully acknowledged. The views and conclusions contained in this document are
those of the authors, and should not be interpreted as representing the official policies, either expressed
or implied, of the sponsors, including the Army Research Laboratory or the U.S. Government. The U.S.
Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding
any copyright notation herein.
A Appendix: Graph theory terminologies and definitions
In this section, a brief review of several terms of graph theory is provided to facilitate the illustration of
the concepts in this current work. More elaborate descriptions can be found in Graham et al. (1989); West
et al. (2001); Bang-Jensen and Gutin (2008):
Definition 1 A graph is a two-tuple $G = (V, E)$ where $V = \{v_1, ..., v_N\}$ is a non-empty vertex set (also
referred to as nodes) and $E \subseteq V \times V$ is an edge set. To define a graph, there exists a relation that associates
each edge with two vertices (not necessarily distinct). These two vertices are called the edge's endpoints.
The pair of endpoints can either be unordered or ordered.
Definition 2 An undirected graph is a graph whose edge set $E \subseteq V \times V$ connects unordered pairs of
vertices together.
Fig. 26: Different types of graphs. (a) Undirected (simple) binary graph. (b) Directed binary graph.
(c) Edge-weighted undirected graph. (d) Node-weighted undirected graph.
Definition 3 A loop is an edge whose endpoint vertices are the same. When all the nodes in the graph are
in a loop with themselves, the graph is referred to as allowing self-loops.
Definition 4 Multiple edges are edges having the same pair of endpoint vertices.
Definition 5 A simple graph is a graph that does not have loops or multiple edges.
Definition 6 Two vertices that are connected by an edge are referred to as adjacent or as neighbors.
Definition 7 The term weighted graph traditionally refers to a graph that consists of edges that are associated
with an edge-weight function $w_{ij}: E \to \mathbb{R}^n$ with $(i, j) \in E$ that maps all edges in $E$ onto a set of real
numbers. $n$ is the total number of edge weights and each set of edge weights can be represented by a
matrix $W$ with components $w_{ij}$.
In this current work, unless otherwise stated, we will be referring to weighted graphs as graphs weighted
at the vertices: each node carries information as a set of weights that quantify features of microstructures.
All vertices are associated with a vertex-weight function $f_v: V \to \mathbb{R}^D$ with $v \in V$ that maps all vertices
in $V$ onto a set of real numbers, where $D$ is the number of weights (features). The node weights can be
represented by an $N \times D$ matrix $X$ with components $x_{ik}$, where the index $i \in [1, ..., N]$ represents the node
and the index $k \in [1, ..., D]$ represents the type of node weight (feature).
Definition 8 A graph whose edges are unweighted ($w_e = 1 \;\, \forall e \in E$) can be called a binary graph.
To facilitate the description of graph structures, several terms for representing graphs are introduced:
Definition 9 The adjacency matrix $A$ of a graph $G$ is the $N \times N$ matrix whose entry $\alpha_{ij}$ is the number of
edges in $G$ with endpoints $\{v_i, v_j\}$, as shown in Eq. 3.
Definition 10 If the vertex $v$ is an endpoint of edge $e$, then $v$ and $e$ are incident. The degree $d$ of a vertex
$v$ is the number of incident edges. The degree matrix $D$ of a graph $G$ is the $N \times N$ diagonal matrix with
diagonal entries $d_i$ equal to the degree of vertex $v_i$, as shown in Eq. 4.
Definition 11 An isomorphism from a graph $G$ to another graph $H$ is a bijection $m$ that maps $V(G)$
to $V(H)$ and $E(G)$ to $E(H)$ such that each edge of $G$ with endpoints $u$ and $v$ is mapped to an edge
with endpoints $m(u)$ and $m(v)$. Applying the same permutation to both the rows and the columns of the
adjacency matrix of graph $G$ results in the adjacency matrix of an isomorphic graph $H$.
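The permutation statement in Definition 11 can be checked numerically. The following is a minimal sketch on a small made-up graph; the adjacency matrix, the relabeling `perm`, and the helper `laplacian_spectrum` are illustrative assumptions. It applies one permutation to the rows and columns of the adjacency matrix and confirms that quantities that do not depend on vertex labels (degree sequence, Laplacian eigenvalues) are unchanged.

```python
import numpy as np

# Adjacency matrix of a small undirected graph with edges 1-2, 2-3, 2-4, 3-4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]])

# Hypothetical relabeling: vertex k of H corresponds to vertex perm[k] of G.
perm = np.array([2, 0, 3, 1])
P = np.eye(len(perm), dtype=int)[perm]  # permutation matrix

# The same permutation is applied to the rows and to the columns.
A_H = P @ A @ P.T


def laplacian_spectrum(adj):
    """Sorted eigenvalues of the unnormalized Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)


print(A_H)
print(np.sort(A.sum(axis=1)), np.sort(A_H.sum(axis=1)))
print(np.allclose(laplacian_spectrum(A), laplacian_spectrum(A_H)))
```

Matching degree sequences and spectra are necessary but not sufficient for isomorphism; here the isomorphism holds by construction because `A_H` is built from `A` by relabeling.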
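Definition 12 The unnormalized Laplacian operator $\Delta$ is defined such that:
$$\begin{aligned}
(\Delta f)_i &= \sum_{j:(i,j) \in E} w_{ij} (f_i - f_j) && (61) \\
&= f_i \sum_{j:(i,j) \in E} w_{ij} - \sum_{j:(i,j) \in E} w_{ij} f_j. && (62)
\end{aligned}$$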
By writing the equation above in matrix form, the unnormalized Laplacian matrix of a graph $G$ is the
$N \times N$ positive semi-definite matrix defined as $\Delta = D - W$.
In this current work, binary graphs will be used; thus, the equivalent expression is used for the
unnormalized Laplacian matrix $L$, defined as $L = D - A$ with the entries $l_{ij}$ calculated as:
$$l_{ij} = \begin{cases}
d_i, & i = j \\
-1, & i \neq j \text{ and } v_i \text{ is adjacent to } v_j \\
0, & \text{otherwise.}
\end{cases} \qquad (63)$$
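The equivalence between the operator form of the Laplacian and its matrix form, as well as positive semi-definiteness, can be verified on a small example. The sketch below assumes a random symmetric weight matrix and a random vertex signal `f`; both are made up for illustration. Missing edges are represented by zero weights, so summing over all j is the same as summing over the neighbors of vertex i.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Random symmetric, non-negative edge weights with a zero diagonal (illustrative).
W = np.triu(rng.random((N, N)), k=1)
W = W + W.T
W[W < 0.4] = 0.0  # remove weak edges; symmetry is preserved

D = np.diag(W.sum(axis=1))  # weighted degree matrix
L = D - W                   # unnormalized Laplacian matrix

f = rng.standard_normal(N)  # a scalar signal defined on the vertices

# Operator form: (Delta f)_i = sum_j w_ij (f_i - f_j); absent edges have w_ij = 0.
lap_f = np.array([np.sum(W[i] * (f[i] - f)) for i in range(N)])

# Quadratic form: f^T L f = 1/2 sum_ij w_ij (f_i - f_j)^2 >= 0.
energy = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)

print(np.allclose(lap_f, L @ f))              # operator form equals matrix form
print(np.isclose(f @ L @ f, energy))          # quadratic form identity
print(np.linalg.eigvalsh(L).min() >= -1e-12)  # positive semi-definite
```

For a binary graph the same code applies with `W` replaced by the adjacency matrix, which gives the entries listed in the piecewise definition above.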
Definition 13 For binary graphs, the symmetric normalized Laplacian matrix $L^{\text{sym}}$ of a graph $G$ is the
$N \times N$ matrix defined as:
$$L^{\text{sym}} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}. \qquad (64)$$
The entries $l^{\text{sym}}_{ij}$ of the matrix $L^{\text{sym}}$ are shown in Eq. 5.
B Appendix: Sample problem of graph representation of polycrystal microstructures
To demonstrate how the graphs used to represent a polycrystalline assembly are generated, we introduce
a simple example where an assembly consisting of 5 crystals, shown in Fig. 2(a), is converted into a
node-weighted graph. Each node of the graph represents a crystal. An edge is defined between two nodes if
the corresponding crystals are connected, i.e., share a surface. The graph is undirected, meaning that there is no direction specified
for the edges. The vertex set $V$ and edge set $E$ for this specific graph are $V = \{v_1, v_2, v_3, v_4, v_5\}$ and
$E = \{e_{12}, e_{23}, e_{34}, e_{35}, e_{45}\}$, respectively.
An undirected graph can be represented by an adjacency matrix $A$ (cf. Def. 9) that holds information for
the connectivity of the nodes. The entries of the adjacency matrix $A$, in this case, are binary: each entry of
the matrix is 0 if an edge does not exist between two nodes and 1 if it does. Thus, for the example in Fig. 2,
crystals 1 and 2 are connected, so the entries (1, 2) and (2, 1) of the matrix $A$ are 1, while crystals 1
and 3 are not, so the entries (1, 3) and (3, 1) are 0, and so on. If the graph allows self-loops, then the
entries in the diagonal of the matrix are equal to 1 and the adjacency matrix with self-loops is defined as
$\hat{A} = A + I$. The complete symmetric matrices $A$ and $\hat{A}$ for this example will be:
$$A = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0
\end{bmatrix}, \quad
\hat{A} = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1
\end{bmatrix}. \qquad (65)$$
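The adjacency matrices of this five-crystal example follow mechanically from the edge set. A minimal sketch, assuming the edges are stored as 1-based index pairs (the variable names are illustrative):

```python
import numpy as np

N = 5
# Edge set of the example: e12, e23, e34, e35, e45 (1-based crystal indices).
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

A = np.zeros((N, N), dtype=int)
for i, j in edges:
    # Undirected graph: each edge fills two symmetric entries.
    A[i - 1, j - 1] = 1
    A[j - 1, i - 1] = 1

# Adjacency matrix with self-loops.
A_hat = A + np.eye(N, dtype=int)

print(A)
print(A_hat)
```

Each undirected edge contributes two ones, so the entries of `A` sum to twice the number of edges.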
A diagonal degree matrix $D$ can also be useful to describe a graph representation. The degree matrix
$D$ only has diagonal terms that equal the number of neighbors of the node represented in that row. The
diagonal terms can simply be calculated by summing all the entries in each row of the adjacency matrix.
It is noted that, when self-loops are allowed, a node is a neighbor of itself, thus it must be added to the
number of total neighbors for each node. The degree matrix $D$ for the example graph in Fig. 2 would be:
$$D = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 \\
0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 2
\end{bmatrix}. \qquad (66)$$
The polycrystal connectivity graph can be represented by its graph Laplacian matrix $L$, defined as
$L = D - A$, as well as the normalized symmetric graph Laplacian matrix $L^{\text{sym}} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$. The two
matrices for the example of Fig. 2 are calculated below:
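$$L = \begin{bmatrix}
1 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 3 & -1 & -1 \\
0 & 0 & -1 & 2 & -1 \\
0 & 0 & -1 & -1 & 2
\end{bmatrix}, \quad
L^{\text{sym}} = \begin{bmatrix}
1 & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
-\frac{1}{\sqrt{2}} & 1 & -\frac{1}{\sqrt{6}} & 0 & 0 \\
0 & -\frac{1}{\sqrt{6}} & 1 & -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} \\
0 & 0 & -\frac{1}{\sqrt{6}} & 1 & -\frac{1}{2} \\
0 & 0 & -\frac{1}{\sqrt{6}} & -\frac{1}{2} & 1
\end{bmatrix}. \qquad (67)$$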
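The Laplacian matrices of the five-crystal example can be reproduced in a few lines. This sketch assumes the degree matrix is built from the adjacency matrix without self-loops, which is the reading that matches $L = D - A$; the variable names are illustrative.

```python
import numpy as np

# Adjacency matrix of the five-crystal example (edges 1-2, 2-3, 3-4, 3-5, 4-5).
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]])

d = A.sum(axis=1)  # vertex degrees: row sums of A
D = np.diag(d)     # degree matrix
L = D - A          # unnormalized Laplacian

# Symmetric normalized Laplacian: D^{-1/2} L D^{-1/2}.
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt

np.set_printoptions(precision=4, suppress=True)
print(d)
print(L)
print(L_sym)
```

The off-diagonal entries of `L_sym` equal $-1/\sqrt{d_i d_j}$ for adjacent crystals, and the diagonal entries are all 1. The construction requires every vertex to have at least one neighbor, which holds for this example.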
Assume that, for the example in Fig. 2, there is information available for two features $A$ and $B$ for
each crystal in the graph that will be used as node weights; this could be the volume of each crystal, the
orientations, and so on. The node weights for each crystal represented by a vertex $v_i$ can be described as a
vector, $f_i = (f_A, f_B)$, such that each component of the vector corresponds to a feature of the $i$-th node. The
node features can all be represented in a feature matrix $X$ where each row corresponds to a node and each
column corresponds to a feature. For the example in question, the feature matrix would be:
$$X = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \\ f_5 \end{bmatrix}, \quad f_i = (f_A, f_B) \text{ evaluated at node } i. \qquad (68)$$
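To make the role of the feature matrix concrete, the sketch below assembles a 5 × 2 matrix with made-up feature values and passes it through one graph-convolution step. The propagation rule used here, $\text{ReLU}(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} X \Theta)$ with $\hat{D}$ the degree matrix of $\hat{A}$, is the widely used renormalized rule from the graph convolutional network literature; that it matches the layer used in this work is an assumption, and the feature values, the weight matrix `Theta`, and the function name `gcn_layer` are hypothetical.

```python
import numpy as np

# Adjacency matrix of the five-crystal example, plus self-loops.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(5)

# Hypothetical node features: column 0 = feature A, column 1 = feature B.
X = np.array([[0.20, 0.1],
              [0.15, 0.7],
              [0.30, 0.4],
              [0.25, 0.9],
              [0.10, 0.5]])


def gcn_layer(adj_hat, features, weights):
    """One propagation step: ReLU(D_hat^{-1/2} A_hat D_hat^{-1/2} X Theta)."""
    d_hat = adj_hat.sum(axis=1)
    S = adj_hat / np.sqrt(np.outer(d_hat, d_hat))  # symmetric normalization
    return np.maximum(S @ features @ weights, 0.0)


rng = np.random.default_rng(0)
Theta = rng.standard_normal((2, 3))  # hypothetical weights: 2 inputs, 3 outputs

H = gcn_layer(A_hat, X, Theta)
print(H.shape)  # one row of encoded features per crystal
```

Relabeling the crystals only permutes the rows of `H`, so the encoded features do not depend on the order in which the crystals are numbered.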
C Appendix: Database statistics
This section describes the statistics for the generated microstructures that were used for training the
geometric learning based neural network model. The database consists of 150 polycrystal RVEs generated
as described in Section 6.3. For every polycrystal in the database, the grain connectivity information is
available in the form of adjacency matrices. For every crystal in a polycrystal RVE, information is
available on the crystal features that will be used as weights for the undirected graph input. The available
node features in the data set contain information on the volume, the three Euler angles (in Bunge
notation), the equivalent diameter (the diameter of the sphere with the same volume as the crystal), the
number of faces, the total area of the faces, the number of neighbors, and the centroid position vector of
each crystal. More elaborate descriptions of the crystal features can be found in the documentation of the
open-source software NEPER (Quey et al., 2011). The distribution of the polycrystal features, separated
into 100 training and 50 testing cases, is demonstrated in Fig. 27.
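One way to hold such a database entry in memory is sketched below. The container `PolycrystalGraph`, the column names in `FEATURE_NAMES`, and the helper `split_indices` are hypothetical; the centroid is assumed to have three components, and the random 100/50 split is an assumption, since the text does not state how the training and testing cases were selected.

```python
from dataclasses import dataclass

import numpy as np

# Assumed column layout for the node features listed above (names are hypothetical).
FEATURE_NAMES = (
    "volume",
    "euler_phi1", "euler_Phi", "euler_phi2",  # Bunge Euler angles
    "equivalent_diameter",
    "num_faces",
    "total_face_area",
    "num_neighbors",
    "centroid_x", "centroid_y", "centroid_z",
)


def equivalent_diameter(volume):
    """Diameter of the sphere whose volume equals the crystal volume."""
    return (6.0 * np.asarray(volume, dtype=float) / np.pi) ** (1.0 / 3.0)


@dataclass
class PolycrystalGraph:
    adjacency: np.ndarray  # (N, N) binary, symmetric
    features: np.ndarray   # (N, len(FEATURE_NAMES)) node weights

    def __post_init__(self):
        n = self.adjacency.shape[0]
        if self.adjacency.shape != (n, n):
            raise ValueError("adjacency must be square")
        if not np.array_equal(self.adjacency, self.adjacency.T):
            raise ValueError("adjacency must be symmetric (undirected graph)")
        if self.features.shape != (n, len(FEATURE_NAMES)):
            raise ValueError("features must have one row per crystal")


def split_indices(n_total=150, n_train=100, seed=0):
    """Random train/test split of RVE indices (the random choice is an assumption)."""
    order = np.random.default_rng(seed).permutation(n_total)
    return np.sort(order[:n_train]), np.sort(order[n_train:])


train_idx, test_idx = split_indices()
print(len(train_idx), len(test_idx))
```

Validating the shapes at construction time catches mismatches between the connectivity and the feature table before they reach the network.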
Fig. 27: Feature distributions for the 150 polycrystals in the database, separated into 100 polycrystals used
for training and 50 polycrystals used for testing.
D Appendix: Encoded feature vector dimension
The hyperparameter space of a complex architecture is too large to conduct a comprehensive
hyperparameter search and provide a confident explanation of how the dimensions of the hybrid architecture
affect the model's performance. The number of neurons of the encoded feature vector layer was one of
these tuned hyperparameters. To provide insight into how the encoded feature vector layer dimension
was chosen, we provide the results from three training experiments that we conducted while
testing various architectures for the neural network through trial and error. We noticed that, in iterations
of the network with fewer than 9 neurons in this layer, the predictions were less accurate. For feature
dimensions much higher than 9 neurons, the performance did not seem to improve drastically and, thus, they
were not chosen, in order to reduce the training time of the network. In Fig. 28, we show the performance
results of training the hybrid neural network architecture with an encoded feature vector dimension of 1,
9, and 32 neurons.
Fig. 28: Comparison of hybrid neural network performance for architectures with encoded feature vector
dimensions of 1, 9, and 32 neurons.
References
Han Altae-Tran, Bharath Ramsundar, Aneesh S Pappu, and Vijay Pande. Low data drug discovery with one-shot learning. ACS Central Science, 3(4):283–293, 2017.
L Anand and M Kothari. A computational procedure for rate-independent crystal plasticity. Journal of the Mechanics and Physics of Solids, 44(4):525–558, 1996.
Gerard A Ateshian and Kevin D Costa. A frame-invariant formulation of Fung elasticity. Journal of Biomechanics, 42(6):781–785, 2009.
F. Bachmann, Ralf Hielscher, and Helmut Schaeben. Texture Analysis with MTEX – Free and Open Source Software Toolbox. 2010. doi: 10.4028/
Jørgen Bang-Jensen and Gregory Z Gutin. Digraphs: theory, algorithms and applications. Springer Science & Business Media, 2008.
Ted Belytschko, Hao Chen, Jingxiao Xu, and Goangseup Zi. Dynamic crack propagation based on loss of hyperbolicity and a new discontinuous enrichment. International Journal for Numerical Methods in Engineering, 58(12):1873–1905, 2003. ISSN 1097-0207. doi: 10.1002/nme.941.
Yoshua Bengio and Yves Grandvalet. No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research, 5(Sep):1089–1105, 2004.
Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
MA Bessa, R Bostanabad, Z Liu, A Hu, Daniel W Apley, C Brinson, Wei Chen, and Wing Kam Liu. A framework for data-driven analysis of materials under uncertainty: Countering the curse of dimensionality. Computer Methods in Applied Mechanics and Engineering, 320:633–667, 2017.
Michael J Borden, Clemens V Verhoosel, Michael A Scott, Thomas JR Hughes, and Chad M Landis. A phase-field description of dynamic brittle fracture. Computer Methods in Applied Mechanics and Engineering, 217:77–95, 2012a.
M.J. Borden, C.V. Verhoosel, M.A. Scott, T.J.R. Hughes, and C.M. Landis. A phase-field description of dynamic brittle fracture. Computer Methods in Applied Mechanics and Engineering, 217-220:77–95, 2012b. ISSN 00457825. doi: 10.1016/j.cma.2012.01.008.
Ronaldo I Borja. Plasticity. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. ISBN 978-3-642-38546-9. doi: 10.1007/978-3-642-38547-6.
Ronaldo I Borja and Seung R Lee. Cam-clay plasticity, part 1: implicit integration of elasto-plastic constitutive relations. Computer Methods in Applied Mechanics and Engineering, 78(1):49–72, 1990.