
Computer Methods in Applied Mechanics and Engineering manuscript No.

(will be inserted by the editor)

Geometric deep learning for computational mechanics Part I: Anisotropic Hyperelasticity

Nikolaos N. Vlassis · Ran Ma · WaiChing Sun

Received: May 18, 2020 / Accepted: date

Abstract We present a machine learning approach that integrates geometric deep learning and Sobolev training to generate a family of finite strain anisotropic hyperelastic models that predict the homogenized responses of polycrystals previously unseen during the training. While hand-crafted hyperelasticity models often incorporate homogenized measures of microstructural attributes, such as the porosity or the averaged orientation of constituents, these measures may not adequately reflect the topological structures of the attributes. We fill this knowledge gap by introducing the concept of the weighted graph as a new high-dimensional descriptor that represents topological information, such as the connectivity of anisotropic grains in an assembly. By leveraging a graph convolutional deep neural network in a hybrid machine learning architecture previously used in Frankel et al. (2019), the artificial intelligence extracts low-dimensional features from the weighted graphs and subsequently learns the influence of these low-dimensional features on the resultant stored elastic energy functionals. To ensure smoothness and prevent unintentionally generating a non-convex stored energy functional, we adopt the Sobolev training method for neural networks such that a stress measure is obtained implicitly by taking directional derivatives of the trained energy functional. Results from numerical experiments suggest that Sobolev training is capable of generating a hyperelastic energy functional that predicts both the elastic energy and stress measures more accurately than the classical training that minimizes L2 norms. Verification exercises against unseen benchmark FFT simulations and phase-field fracture simulations using the geometric learning generated elastic energy functional are conducted to demonstrate the quality of the predictions.

Keywords geometric machine learning; graph; polycrystals; microstructures; anisotropic energy functional; phase-field fracture

1 Introduction

Conventional constitutive modeling efforts often rely on human interpretations of geometric descriptors of microstructures. These descriptors, such as volume fraction of voids/constituents, dislocation density, twinning, degradation function, slip system, orientation, and shape factors, are often incorporated as state variables in a system of ordinary differential equations that leads to the constitutive responses at a material point. Classical examples include the family of Gurson models in which the volume fraction of voids is related to ductile fracture (Gurson, 1977; Needleman, 1987; Zhang et al., 2000; Nahshon and Hutchinson, 2008; Nielsen and Tvergaard, 2010), critical state plasticity in which porosity and over-consolidation ratio dictate the plastic dilatancy and hardening law (Schofield and Wroth, 1968; Borja and Lee, 1990; Manzari and Dafalias, 1997; Sun, 2013; Liu et al., 2016; Wang et al., 2016b), and crystal plasticity where the activation of slip systems leads to plastic deformation (Anand and Kothari, 1996; Na and Sun, 2018; Ma et al., 2018).

Corresponding author: WaiChing Sun
Assistant Professor, Department of Civil Engineering and Engineering Mechanics, Columbia University, 614 SW Mudd, Mail Code: 4709, New York, NY 10027. Tel.: 212-854-3143, Fax: 212-854-6267, E-mail: wsun@columbia.edu


In these cases, a specific subset of descriptors is often incorporated manually such that the most crucial deformation mechanisms for the stress-strain relationships are described mathematically.

While this approach has achieved a level of success, especially for isotropic materials, materials of complex microstructures often require more complex geometric and topological descriptors to sufficiently describe the geometrical features (Jerphagnon et al., 1978; Sun and Mota, 2014; Kuhn et al., 2015). The human interpretation limits the complexity of the state variables and may lead to a lost opportunity to utilize all the available information for the microstructure, which could in turn reduce the prediction quality. A data-driven approach should be considered to discover constitutive law mechanisms when human interpretation capabilities become restrictive (Kirchdoerfer and Ortiz, 2016; Eggersmann et al., 2019; He and Chen, 2019; Stoffel et al., 2019; Bessa et al., 2017; Liu et al., 2018). In this work, we consider the general form of a strain energy functional that reads,

ψ = ψ(F, G),  P = ∂ψ/∂F,  (1)

where G is a graph that stores the non-Euclidean data of the microstructures (e.g. crystal connectivity, grain connectivity). Specifically, we attempt to train a neural network approximator of the anisotropic stored elastic energy functional across different polycrystals, with the sole extra input to describe the anisotropy being the weighted crystal connectivity graph.

Fig. 1: Polycrystal interpreted as a weighted connectivity graph. The graph is undirected and weighted at the nodes.

It can be difficult to directly incorporate either Euclidean or non-Euclidean data into a hand-crafted constitutive model. There have been attempts to infer information directly from scanned microstructural images using neural networks that utilize a convolutional layer architecture (CNN) (Lubbers et al., 2017). The endeavor to distill physically meaningful and interpretable features from scanned microstructural images stored in a Euclidean grid can be a complex and sometimes futile process. While recent advancements in convolutional neural networks have provided an effective means to extract features that lead to extraordinary superhuman performance for image classification tasks (Krizhevsky et al., 2012), similar success has not been recorded for mechanics predictions. Image-related problems, such as camera noise, saturation, and image compression, as well as ring artifacts, which often occur in micro-CT images, may lead to issues in the deconvolution operators and, in some cases, may constitute an obstacle to acquiring useful and interpretable features from the image dataset (Xu et al., 2014). In some cases, over-fitting and under-fitting can both render the trained CNN extremely vulnerable to adversarial attacks and hence not suitable for high-risk, high-regret applications.


As demonstrated in previous works (Jones et al., 2018; Frankel et al., 2019), using images directly as an additional input to the polycrystal energy functional approximator may be contingent on the quality and size of the training pool. A large number of images, possibly in three dimensions and in high enough resolution, would be necessary to represent the latent features that help the approximator distinguish successfully between different polycrystals. Using data in a Euclidean grid is an esoteric process that depends on empirical evidence that the current training sample holds adequate information to infer features useful in formulating a constitutive law. However, gathering that evidence can be a laborious process, as it requires numerous trial and error runs and is weighed down by the heavy computational cost of performing filtering on Euclidean data (e.g. on high-resolution 3D image voxels).

Graph representation of the data structures can provide a momentous head start to overcome this very impediment. An example is the connectivity graph used in the granular mechanics community, where the formation and evolution of force chains are linked to macroscopic phenomena, such as shear band formation and failures (Satake, 1992; Kuhn et al., 2015; Sun et al., 2013; Tordesillas et al., 2014; Wang and Sun, 2019a,b). The distinct advantage of the graph representation of data is the relatively high interpretability of the data structures (Kuhn et al., 2015; Wang and Sun, 2019a). This graph representation is not only helpful for understanding the topology of interrelated entities in a network but also provides a convenient means to create universal and interpretable features via graph convolutional neural networks (Altae-Tran et al., 2017; Xie and Grossman, 2018).

At the same time, by judiciously selecting appropriate graph weights, one may incorporate only the essential information of microstructural data critical for mechanics predictions, which could prove to be more interpretable, flexible, economical, and efficient than incorporating feature spaces inferred from 3D voxel images. Furthermore, since one may easily use rotationally and translationally invariant data as weights, the graph approach is also advantageous for predicting constitutive responses that require frame indifference.

Currently, machine learning applications often employ two families of algorithms to take graphs as inputs: representation learning algorithms and graph neural networks. The former usually refers to unsupervised methods that convert graph data structures into formats or features that are easily comprehensible by machine learning algorithms (Bengio et al., 2013). The latter refers to neural network algorithms that accept graphs as inputs, with layer formulations that can operate directly on graph structures (Scarselli et al., 2008). Representation learning on graphs shares concepts with the rather popular embedding techniques in text and speech recognition (Mikolov et al., 2013) to encode the input in a vector format that can be utilized by common regression and classification algorithms. There have been multiple studies on encoding graph structures, spanning from the level of nodes (Grover and Leskovec, 2016) up to the level of entire graphs (Perozzi, Al-Rfou, and Skiena, 2014; Narayanan, Chandramohan, Venkatesan, Chen, Liu, and Jaiswal, 2017). Graph embedding algorithms, like DeepWalk (Perozzi et al., 2014), utilize techniques such as random walks to "read" sequences of neighboring nodes, resembling reading word sequences in a sentence, and encode those graph data in an unsupervised fashion.

While these algorithms have proven to be rather powerful and demonstrate competitive results in tasks like classification problems, they do come with disadvantages that can be limiting for use in engineering problems. Graph representation algorithms work very well on encoding the training dataset. However, they can be difficult to generalize and cannot accommodate dynamic data structures. This can prove problematic for mechanics problems, where we expect a model to generalize as much as possible in terms of material structure variations (e.g. polycrystals, granular assemblies). Furthermore, representation learning algorithms can be difficult to combine with another neural network architecture for a supervised learning task in a sequential manner. In particular, when the representation learning is performed separately and independently from the supervised learning task that generates the energy functional approximation, there is no guarantee that the clustering or classifications obtained from the representation learning are physically meaningful. Hence, the representation learning may not be capable of generating features that facilitate the energy functional prediction task in a completely unsupervised setting.

For the above reasons, we have opted for a hybrid neural network architecture that combines an unsupervised graph convolutional neural network with a multilayer perceptron to perform the regression task of predicting an energy functional. Both branches of our suggested hybrid architecture learn simultaneously from the same back-propagation process, with a common loss function tailored to the approximated function. The graph encoder part - borrowing its name from the popular autoencoder architecture (Ranzato et al., 2007; Vincent et al., 2008) - learns and adjusts its weights to encode input graphs in a manner


that serves the approximation task at hand. Thus, it eliminates the obstacle of trying to coordinate the asynchronous steps of graph embedding and approximator training, by fitting both the graph encoder and the energy functional approximator in parallel with a common training goal (loss function).

As for notations and symbols in this current work, bold-faced letters denote tensors (including vectors, which are rank-one tensors); the symbol '·' denotes a single contraction of adjacent indices of two tensors (e.g. a · b = a_i b_i or c · d = c_ij d_jk); the symbol ':' denotes a double contraction of adjacent indices of tensors of rank two or higher (e.g. C : ε^e = C_ijkl ε^e_kl); the symbol '⊗' denotes a juxtaposition of two vectors (e.g. (a ⊗ b)_ij = a_i b_j) or two symmetric second-order tensors (e.g. (α ⊗ β)_ijkl = α_ij β_kl). Moreover, (α ⊕ β)_ijkl = α_jl β_ik and (α ⊖ β)_ijkl = α_il β_jk. We also define identity tensors (I)_ij = δ_ij, (I⁴)_ijkl = δ_ik δ_jl, and (I⁴_sym)_ijkl = (1/2)(δ_ik δ_jl + δ_il δ_kj), where δ_ij is the Kronecker delta. As for sign conventions, unless specified otherwise, we consider the direction of the tensile stress and dilative pressure as positive.

2 Graphs as non-Euclidean descriptors for micro-structures

This section provides a detailed account of how to incorporate microstructural data represented by weighted graphs as descriptors for modeling hyperelastic responses of different microstructures. In particular, we describe how topological information of an assembly composed of grains with different properties can be effectively represented as a node-weighted graph (Section 2.1). A brief review of some basic concepts of graph theory is included in Appendix A. An illustrative example of the graph representation for a polycrystal structure in Figure 2 is included in Appendix B.

2.1 Polycrystals represented as node-weighted undirected graphs

Inferring microstructural data as weighted graphs from field data may require pooling, a down-sampling procedure to convert field data of a specified domain into low-dimensional features that preserve topological information. Examples of applications of pooling include inferring the grain connectivity graph from micro-CT images of assemblies (Jaquet et al., 2013; Wang et al., 2016a) or from realizations of micro-structures generated from software packages such as Neper or DREAM.3D (Quey et al., 2011; Groeber and Jackson, 2014). In these cases, the node and edge sets can be defined in a rather intuitive manner, as the micro-structures are formed by assemblies consisting of parts (grains) connected in a specific way represented by the edge set, as illustrated in Figure 2. In this work, we treat each crystal grain as a node or vertex in a graph and create an edge for each in-contact grain pair. Attributes of each grain are represented in a collection of node weights. The features of the edges (such as the contact surface areas or roughness) are neglected to simplify the learning procedures but will be considered in the future. For simplicity, we also assume that the polycrystal contains no voids and that the contacts remain intact.

Given a polycrystal microstructure consisting of a finite number of crystal grains N, we define the graph representation used in this work. The shapes of the grains are idealized as polyhedrons such that each face of a grain may be in contact with at most one face of another grain. As such, an edge is assigned between each in-contact (adjacent) grain pair such that there exist E edges in the graph. The collection of the grains is then represented as a vertex set V = {v_1, ..., v_N}, and the collection of edges as an edge set E ⊆ V × V. There can only be one unique edge defined between two vertices, and the order of the vertices of an edge does not matter - i.e. the pairs are unordered or undirected. As a result, the connectivity of the polycrystal can then be represented by an undirected graph G = (V, E) where V = {v_1, ..., v_N} (cf. Def. 1).

Note that G alone only provides the connectivity information. To predict the elasticity of the polycrystals, more information about the geometrical features and mechanical properties of grains must be extracted in the machine learning process. In our design, these features and properties are stored as weights assigned to each vertex, and the purpose of the geometric learning is to find a set of descriptors of lower dimensions than the weighted graph such that they can be directly incorporated into the energy functional calculations. For each vertex v_i in the graph, we define a feature vector f^i = {f^i_1, ..., f^i_D}, where D is the number of node weights that represent the geometrical features (e.g. size, number of faces, aspect ratios) and mechanical properties (e.g. elastic moduli, crystal orientation) of the grain v_i. In this work, the feature


Fig. 2: An illustrative example of a polycrystal (a) represented as an undirected weighted graph (b). If two crystals in the formation share a face, their nodes are also connected in the graph. Each node is weighted by two features f_A and f_B in this illustrative example.

vectors store information about the volume, the crystal orientation (in Euler angles), the total surface area, the number of faces, the number of neighbors, as well as other shape descriptors (e.g. the equivalent diameter) for every crystal grain in the polycrystals. The set of node weights for the entire graph reads

F = {f^1, f^2, ..., f^N}.  (2)
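As a small illustrative sketch (the grain attribute values and the choice of D = 8 weights below are hypothetical, not taken from the paper), per-grain feature vectors can be collected into the set of Eq. (2) and stacked row-wise:

```python
import numpy as np

# Hypothetical per-grain feature vectors f^i; the D = 8 entries mimic the
# weights listed in the text: [volume, Euler angle 1, Euler angle 2,
# Euler angle 3, total surface area, n_faces, n_neighbors, equiv. diameter]
f1 = [1.20, 0.1, 0.5, 0.3, 6.1, 12, 5, 1.32]
f2 = [0.80, 1.4, 0.2, 0.9, 4.7, 10, 4, 1.15]
f3 = [1.00, 0.7, 1.1, 0.2, 5.5, 11, 6, 1.24]

# The set F = {f^1, ..., f^N}; stacking it row-wise gives the N x D node
# feature matrix X introduced in Section 2.2.
F = [f1, f2, f3]
X = np.vstack(F)
assert X.shape == (3, 8)   # N = 3 grains, D = 8 node weights
```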

Remark 1 Note that including the edge weights in the learning problems is likely to provide a richer source of data for the learning problems. Information such as the attributes of each contact between each grain pair - including the contact surface areas and the angle of contact - could be used as weights for the edges of the graph. While this current work is solely focused on node-weighted graphs, future work will examine an efficient way to generate energy functionals from a node-weighted/edge-weighted graph.

2.2 Adjacency matrix, graph Laplacian and feature matrix

In order to prepare the available data for the geometric learning, it is often more convenient to adopt a matrix representation of a graph. In this work, the geometric learning algorithm used requires the normalized symmetric Laplacian matrix L_sym and the node feature matrix X as inputs (see Appendix A).

First of all, we define the node feature matrix X by simply stacking the feature vectors f^i together. As such, the dimension of the node feature matrix X is N × D. The symmetric normalized Laplacian is obtained from the Laplacian matrix L = D − A, where A and D are the adjacency and degree matrices (cf. Def. 9 and Def. 10). The adjacency matrix A is a symmetric matrix of dimensions N × N representing the connectivity of the microstructure. The entries α_ij of the matrix A are defined as,

α_ij = 1 if v_i is adjacent to v_j; 0 otherwise.  (3)

The degree d_i of a vertex v_i is defined as the total number of adjacent (neighboring) vertices of v_i or, equivalently, the number of crystal grains in contact with the crystal grain v_i. The matrix D is an N × N diagonal matrix. The entries r_ij of the diagonal matrix D are defined as,

r_ij = d_i if i = j; 0 otherwise.  (4)

The symmetric normalized Laplacian matrix L_sym = D^(−1/2) L D^(−1/2) represents the graph connectivity structure and is one of the inputs for the geometric learning algorithm described in Section 3.2. The matrix L_sym is of N × N dimensions. The entries l^sym_ij of the matrix L_sym read,


l^sym_ij = 1 if i = j and d_i ≠ 0; −(d_i d_j)^(−1/2) if i ≠ j and v_i is adjacent to v_j; 0 otherwise.  (5)

Note that while the Laplacian matrix L and the symmetric normalized Laplacian matrix L_sym both represent the connectivity of the grains in the polycrystals, the normalized L_sym is often a more popular choice for spectral-based graph convolutions due to its symmetric and positive semi-definite properties.
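The construction of A, D, L, and L_sym in Eqs. (3)-(5) can be sketched for a toy polycrystal as follows (the 4-grain edge list is illustrative, and the sketch assumes no isolated grains so that every d_i > 0):

```python
import numpy as np

N = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # unordered in-contact grain pairs

# Adjacency matrix A, Eq. (3): symmetric because the graph is undirected.
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)                       # degree d_i = number of neighbors
D = np.diag(d)                          # degree matrix, Eq. (4)
L = D - A                               # graph Laplacian L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # assumes d_i > 0 for all grains
L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # symmetric normalized Laplacian

# Spot-check against the entries given in Eq. (5).
assert np.allclose(np.diag(L_sym), 1.0)                    # i = j, d_i != 0
assert np.isclose(L_sym[0, 1], -1.0 / np.sqrt(d[0] * d[1]))  # adjacent pair
assert np.isclose(L_sym[0, 3], 0.0)                        # non-adjacent pair
```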

3 Deep learning on graphs

Machine learning often involves algorithms designed to generate functions to represent the available data. Some common applications in machine learning are those of regression and classification. A regression algorithm attempts to make predictions of a numerical value provided with input data. A classification algorithm attempts to assign a label to an input and place it in one or multiple classes/categories that it belongs to. Classification tasks can be supervised, if information for the true labels of the inputs is available during the learning process. Classification tasks can also be unsupervised, if the algorithm is not exposed to the true labels of the input during the learning process but attempts to infer labels for the input by learning properties of the input dataset structure. The hybrid geometric learning neural network introduced in this work simultaneously performs an unsupervised classification of polycrystal graph structures and the regression of an anisotropic elastic energy potential functional. This combination of unsupervised learning classification and supervised learning regression was first adopted for solid mechanics problems in Frankel et al. (2019), where a convolutional neural network is used to generate features of 3D voxel images to aid predictions of elasto-plastic responses under monotonically increasing strain. The only feature of the grain incorporated in that learning problem is the crystal orientation. In our design, a 3D voxel image is first converted into a lower-dimensional weighted graph that contains only connectivity information stored in the symmetric normalized graph Laplacian L_sym, while we consider multiple effective properties of each grain stored in the matrix X. Then, a geometric learning encoder is trained to provide an even lower-dimensional latent representation of the weighted graph to aid the predictions of the hyperelastic energy functional.

In this section, we provide a brief description of the supervised and unsupervised components of the hybrid architecture. The supervised learning component, conducted via regression with the multilayer perceptron (MLP), is reviewed in Section 3.1, while the graph convolution technique that will carry out the unsupervised classification of the polycrystals is described in Section 3.2. Finally, in Section 3.3, we introduce our hybrid architecture that combines these two architectures to perform their tasks simultaneously.

3.1 Deep learning for supervised regression

The architecture described in this section constitutes the energy functional regression branch of the hybrid architecture described in Section 3.3. The regression task is completed via training an artificial neural network (ANN) with multiple layers. While there are other feasible options, such as support vector regression machines (Drucker et al., 1997) or Gaussian process regression (Quiñonero-Candela and Rasmussen, 2005), we choose to train a multilayer perceptron (MLP), often called a feed-forward neural network, due to the ease of implementation via various existing libraries and the fact that it is a universal function approximator.

The formulation for the two-layer perceptron in Fig. 3, which will also be used in this work, is presented below as a series of matrix multiplications:


Fig. 3: A two-layer perceptron. The input vector x has d features; each of the two hidden layers h_l has m neurons.

z_1 = x W_1 + b_1  (6)
h_1 = σ(z_1)  (7)
z_2 = h_1 W_2 + b_2  (8)
h_2 = σ(z_2)  (9)
z_out = h_2 W_3 + b_3  (10)
ŷ = σ_out(z_out).  (11)

In the above formulation, the input vector x contains the features of a sample, the weight matrix W_l contains the weights - the parameters of the network - and b_l is the bias vector for every layer. The function σ is the chosen activation function for the hidden layers. In the current work, the ELU function is used as the activation function for the MLP hidden layers, defined as:

ELU(α) = e^α − 1 if α < 0; α if α ≥ 0.  (12)

The vector h_l contains the activation function values for every neuron in the hidden layer. The vector ŷ is the output vector of the network with linear activation σ_out(•) = (•).
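The forward pass of Eqs. (6)-(12) can be sketched in plain NumPy (layer sizes and random weights below are illustrative; the paper's actual models are trained in Keras):

```python
import numpy as np

def elu(z):
    # ELU activation, Eq. (12): e^a - 1 for a < 0, identity for a >= 0
    return np.where(z < 0, np.exp(z) - 1.0, z)

rng = np.random.default_rng(0)
d, m = 6, 200  # d input features, m = 200 neurons per hidden layer
W1, b1 = rng.normal(size=(d, m)) * 0.1, np.zeros(m)
W2, b2 = rng.normal(size=(m, m)) * 0.1, np.zeros(m)
W3, b3 = rng.normal(size=(m, 1)) * 0.1, np.zeros(1)

x = rng.random((1, d))      # one input sample as a row vector
h1 = elu(x @ W1 + b1)       # Eqs. (6)-(7)
h2 = elu(h1 @ W2 + b2)      # Eqs. (8)-(9)
y_hat = h2 @ W3 + b3        # Eqs. (10)-(11), linear output activation

assert y_hat.shape == (1, 1)
assert np.all(elu(np.array([-50.0, 2.0])) >= -1.0)  # ELU is bounded below by -1
```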

Defining y as the true function values corresponding to the inputs x, the MLP architecture can be simplified as an approximator function ŷ = ŷ(x|W, b) of the true function y with inputs x, parametrized by W and b, such that:

W′, b′ = argmin_{W,b} ℓ(ŷ(x|W, b), y),  (13)

where W′ and b′ are the optimal weights and biases of the neural network that result from the optimization (training) process such that a defined loss function ℓ is minimized. The loss functions used in this work are discussed in Section 4.

The fully-connected (Dense) layer that is used as the hidden layer for a standard MLP architecture has the following general formulation:

h^(l+1)_dense = σ(h^(l) W^(l) + b^(l)).  (14)

In the supervised learning branch, the neural network consists of two layers and each layer contains 200 neurons. The number of layers and the number of neurons per layer are hyperparameters. The optimal combination of hyperparameters can be estimated through repeated trial and error or sometimes through Bayesian optimization (Gardner et al., 2014). To examine whether overfitting occurs, we use a k-fold validation


to split the training and testing data and measure the differences in performance when the trained neural network is used to make predictions within and outside the training data. A brief review of this issue can be found in Wang and Sun (2018).

3.2 Graph convolution network for unsupervised classification of polycrystals

Geometric learning refers to the extension of previously established neural network techniques to graph structures and manifold-structured data. Graph Neural Networks (GNN) refer to a specific type of neural network architecture that operates directly on graph structures. An extensive summary of the different graph neural network architectures currently developed can be found in Wu et al. (2019). Graph convolution networks (GCN) (Defferrard, Bresson, and Vandergheynst, 2016; Kipf and Welling, 2017) are variations of graph neural networks that bear similarities with the highly popular convolutional neural network (CNN) algorithms commonly used in image processing (Lecun, Bottou, Bengio, and Haffner, 1998; Krizhevsky, Sutskever, and Hinton, 2012). The mutual term convolutional refers to the use of filter parameters that are shared over all locations in the graph, similar to image processing. Graph convolution networks are designed to learn a function of features or signals in graphs G = (V, E), and they have demonstrated competitive scores at classification tasks (Kipf and Welling, 2017; Simonovsky and Komodakis, 2017; Kearnes, McCloskey, Berndl, Pande, and Riley, 2016; Altae-Tran, Ramsundar, Pappu, and Pande, 2017).

In this current work, we utilize a GCN layer implementation similar to that introduced in Kipf and Welling (2017). The implementation is based on the open-source neural network library Keras (Chollet et al., 2015) and the open-source library Spektral (Grattarola, 2019) on graph neural networks. The GCN layers will be the ones that learn from the polycrystal connectivity graph information. A GCN layer accepts two inputs, a symmetric normalized graph Laplacian matrix L_sym and a node feature matrix X, as described in Section 2.1. The matrix L_sym holds information about the graph structure. The matrix X holds information about the features of every node in the graph - every crystal in the polycrystal. In matrix form, the GCN layer has the following structure:

h^(l+1)_GCN = σ(L_sym h^(l) W^(l) + b^(l)).  (15)

In the above formulation, h^(l) is the output of a layer l. For l = 0, the first GCN layer of the network accepts the graph features as input such that h^(0) = X. For l ≥ 1, h^(l) represents a higher-dimensional representation of the graph features that is produced from the convolution function, similar to a CNN layer. The function σ is a non-linear activation function. In this work, the GCN layers use the Rectified Linear Unit activation function, defined as ReLU(•) = max(0, •). The weight matrix W^(l) and bias vector b^(l) are the parameters of the layer that will be optimized during training.
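The propagation rule of Eq. (15) can be sketched numerically with a toy graph and toy layer sizes (the actual implementation uses Keras and Spektral layers; note also that the paper's L_sym is built from an adjacency matrix with self-loops, whereas this sketch uses the plain L_sym of Eq. (5)):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)     # the layer activation, ReLU

rng = np.random.default_rng(0)
N, D, F_out = 4, 8, 16                  # grains, node weights, output channels

# A toy symmetric normalized Laplacian for a 4-grain connectivity graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L_sym = d_inv_sqrt @ (np.diag(A.sum(axis=1)) - A) @ d_inv_sqrt

X = rng.random((N, D))                  # node feature matrix, h^(0) = X
W0 = rng.normal(size=(D, F_out)) * 0.1  # trainable layer weights W^(0)
b0 = np.zeros(F_out)                    # trainable layer bias b^(0)

# h^(1) = ReLU(L_sym h^(0) W^(0) + b^(0)): row i of L_sym mixes grain i's
# features with those of its neighbors before the learned linear map.
h1 = relu(L_sym @ X @ W0 + b0)
assert h1.shape == (N, F_out)           # one F_out-channel row per grain
```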

The matrix L_sym acts as an operator on the node feature matrix X so that, for every node, the sum of the features of every neighboring node and of the node itself is accounted for. The i-th row of the feature matrix h^(l) represents the weights for the i-th node/crystal in the graph. The output of the multiplication with the i-th row of the feature matrix, controlled by the i-th row of the L_sym matrix, corresponds to a weighted aggregation of the features of the i-th node and all its neighbors. In order to include the features of the node itself, the matrix L_sym is derived, as defined in Section 2.1 and demonstrated in Appendix B, from the binary adjacency matrix Â allowing self-loops and the equivalent degree matrix D. Using the normalized Laplacian matrix L_sym, instead of the adjacency matrix Â, for feature filtering remedies possible numerical and vanishing/exploding gradient issues when using the GCN layer in deep neural networks.

This type of spatial filtering can be of great use in constitutive modeling of microstructures, where both the statistics and the topology of attributes may significantly affect the macroscopic outcome. In the case of the polycrystals, for example, the neural network model does not solely learn from the features of every crystal separately. It also learns by aggregating the features of the neighboring crystals in the graph to potentially reveal a behavior that stems from the feature correlation among different nodes. This property makes this filtering function a strong candidate for learning on spatially heterogeneous material structures.


Fig. 4: Hybrid neural network architecture. The network is comprised of two branches - a graph convolutional encoder and a multilayer perceptron. The first branch accepts the graph structure (normalized Laplacian L_sym) and graph weights (feature matrix X) (Input A) as inputs and outputs an encoded feature vector. The second branch accepts the concatenated encoded feature vector and the right Cauchy–Green deformation tensor C in Voigt notation (Input B) as inputs and outputs the energy functional prediction ψ̂.

Remark 2 It is noted that the GCN method can use unweighted graphs as input - in that case, the feature matrix is a vector of length N with every component equal to unity, as suggested in Kipf and Welling (2017). However, due to the significantly smaller amount of information represented by unweighted graphs, we speculate that the performance of a neural network trained with unweighted graphs is likely to be inferior to that trained on the weighted graph counterparts. The effects of incorporating different combinations of node features on the performance of predictions are examined in the numerical experiments performed in Section 7.

3.3 Hybrid neural network architecture for simultaneous unsupervised classification and regression

The hybrid network architecture employed in this current work is designed to perform two tasks simultaneously, guided by a common objective function. This hybrid design was first applied to mechanics problems by Frankel et al. (2019), who combine a spatial convolutional network with a recurrent neural network to predict constitutive responses. In this work, we adopt the hybrid design whereby a graph convolutional neural network is combined with a feed-forward neural network to generate the elastic stored energy that leads to constitutive responses. While both approaches generate feature vectors as additional inputs for the mechanical predictions, the feature vectors generated from data stored in voxels (i.e. Euclidean data) in Frankel et al. (2019) and the feature vectors generated from the weighted graph (i.e. non-Euclidean data) are fundamentally different. This is because the graph convolutional approach requires only grain-scale data, where all the features of a crystal grain are stored as weights at each node, whereas the spatial convolutional approach uses information stored at each voxel, which is hence potentially much larger, especially for higher voxel grid resolutions.

The first task is the unsupervised classification of the connectivity graphs of the polycrystals. This is carried out by the first branch of the hybrid architecture, which resembles a convolutional encoder, commonly used in image classification (Lecun, Bottou, Bengio, and Haffner, 1998; Krizhevsky, Sutskever, and Hinton, 2012) and autoencoders (Ranzato et al., 2007; Vincent et al., 2008). However, the convolutional layers now follow the aforementioned GCN layer formulation. A convolutional encoder passes a complex structure (i.e. images, graphs) through a series of filters to generate a higher-level representation and encode (compress) the information in a structure of lower dimension (i.e. a vector). It is common practice, for example, in image classification (Krizhevsky et al., 2012), to pass an image through a series of

10 Nikolaos N. Vlassis et al.

stacked convolutional layers that increase the feature space dimensionality, and then to encode the information in a vector through a multilayer perceptron - a series of stacked fully connected layers. The weights of every layer in the network are optimized using a loss function so that the output vector matches the classification labels of the input image.

A similar concept is employed for the geometric learning encoder branch of the hybrid architecture. This branch accepts as inputs the normalized graph Laplacian and the node feature matrices. The graph convolutional layers read the graph features and increase the dimensionality of the node features. These features are flattened and then fed into fully connected layers that encode the graph information in a feature vector.
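The encoder branch just described can be sketched in a few lines of numpy. The layer sizes, the random weights, and the identity matrix standing in for the normalized graph operator are illustrative assumptions only, not the paper's trained network:

```python
import numpy as np

def gcn_layer(L_sym, X, W):
    """One graph convolution: neighborhood aggregation through the
    normalized operator L_sym, channel mixing by W, then ReLU.
    Shapes: L_sym (N, N), X (N, F_in), W (F_in, F_out)."""
    return np.maximum(L_sym @ X @ W, 0.0)

def encode(L_sym, X, W1, W2, W_dense):
    """Sketch of the geometric encoder branch: two stacked GCN layers
    widen the node features, which are then flattened and compressed by
    a fully connected layer into the encoded feature vector."""
    H = gcn_layer(L_sym, X, W1)
    H = gcn_layer(L_sym, H, W2)
    z = H.reshape(-1)                      # flatten node features
    return np.maximum(W_dense @ z, 0.0)    # encoded feature vector

rng = np.random.default_rng(0)
N, F = 4, 2                                # 4 grains, 2 node features each
L_sym = np.eye(N)                          # placeholder graph operator
X = rng.random((N, F))
z = encode(L_sym, X, rng.random((F, 8)), rng.random((8, 8)),
           rng.random((9, N * 8)))         # 9-neuron encoded vector
```

In the actual architecture this encoded vector is then concatenated with the strain input of the second branch.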

The second task performed by the hybrid network is a supervised regression task - the prediction of the energy functional. The architecture of this branch of the network follows that of a simple feed-forward network with fully connected layers, similar to the one described in Section 3.1. The input of this branch is the encoded feature vector, arriving from the geometric learning encoder branch, concatenated with the second-order right Cauchy–Green deformation tensor C in Voigt vector notation. The output of this branch is the predicted energy functional ψ̂. It is noted that, in this current work, an elastic energy functional is predicted; this history-independent behavior can be adequately mapped with feed-forward architectures. Applications of geometric learning to plastic behavior will be the object of future work and will require recurrent network architectures that can capture the history of the material's behavior, similar to Wang and Sun (2018).

The layer weights of these two branches are updated in tandem with a common back-propagation algorithm and an objective function that rewards better energy functional and stress field predictions, using the Sobolev training procedure described in Section 4.

Simultaneously, we implement regularization on the graph encoder branch of the hybrid architecture in the form of Dropout layers (Srivastava et al., 2014). We have found that regularization techniques provide a competent method for combating the overfitting issues addressed later in this work. This work is a first attempt at utilizing geometric learning in material mechanics, and model refinement will be considered when approaching more complex problems in the future (e.g. history-dependent plasticity problems).

4 Sobolev training for hyperelastic energy functional predictions

In principle, forecast engines for elastic constitutive responses are trained by (1) an energy-conjugate pair of stress and strain measures (Ghaboussi, Garrett Jr, and Wu, 1991; Wang and Sun, 2018; Lefik, Boso, and Schrefler, 2009), (2) a power-conjugate pair of stress and strain rates (Liu et al., 2019), or (3) a pair of strain measure and Helmholtz stored energy (Lu et al., 2019; Huang et al., 2019). While options (1) and (2) can both be simple and easy to train once the proper configuration of the neural network is determined, one critical drawback is that the resultant model may predict non-convex energy responses and exhibit ad hoc path-dependence (Zytynski et al., 1978; Borja et al., 1997).

An alternative is to introduce supervised learning that takes a strain measure as input and outputs the stored energy functional. This formulation leads to the so-called hyperelastic or Green-elastic material, which postulates the existence of a Helmholtz free-energy function (Holzapfel et al., 2000). The concept of learning a free energy function as a means to describe multi-scale materials has been previously explored (Le, Yvonnet, and He, 2015; Teichert, Natarajan, Van der Ven, and Garikipati, 2019). However, without direct control of the gradient of the energy functional, the predicted stress and elastic tangent operator may not be sufficiently smooth unless the activation functions and the architecture of the neural network are carefully designed. To rectify the drawbacks of these existing options, we leverage the recent work on Sobolev training (Czarnecki et al., 2017), in which we incorporate both the stored elastic energy functional and its derivatives (i.e. the conjugate stress tensor) into the loss function, such that the objective of the training is not solely minimizing the errors of the energy predictions but also the discrepancy of the stress response.

Traditional deep learning regression algorithms aim to train a neural network to approximate a function by minimizing the discrepancy between the predicted values and the benchmark data. However, the metric or norm used to measure discrepancy is often the L2 norm, which does not regularize the derivatives or gradients of the learned function. When combined with activation functions that include a high-frequency basis, the learned function may exhibit spurious oscillations and, hence, be unsuitable for training a hyperelastic energy function that requires convexity.

The Sobolev training method that we adopt from Czarnecki et al. (2017) is designed to maximize the utilization of data by leveraging available higher-order data in the form of higher-order constraints in the training objective function. In Sobolev training, objective functions are constructed to minimize the H^K Sobolev norms of the corresponding Sobolev space. Recall that a Sobolev space refers to the space of functions equipped with a norm comprised of the L^p norms of the functions and their derivatives up to a certain order K.

Since it has been shown that neural networks with the ReLU activation function (as well as similar functions) can be universal approximators for C^1 functions in a Sobolev space (Sonoda and Murata, 2017), our goal here is to directly predict the elastic energy functional by using the Sobolev norm as the loss function to train the hybrid neural network models.

This current work focuses on the prediction of an elastic stored energy functional listed in Eq. 1; thus, for simplicity, the superscript e (denoting elastic behavior) will be omitted for all energy, strain, stress, and stiffness scalar and tensor values herein. In the case of the simple MLP feed-forward network, the network can be seen as an approximator function ψ̂ = ψ̂(C | W, b) of the true energy functional ψ with input the right Cauchy–Green deformation tensor C, parametrized by weights W and biases b. In the case of the hybrid neural network architecture, the network can be seen as an approximator function ψ̂ = ψ̂(C, G | W, b) of the true energy functional ψ with inputs the polycrystal connectivity graph information (as described in Fig. 4) and the tensor C, parametrized by weights W and biases b. The first training objective in Equation 16 for the training samples i ∈ [1, ..., N] is modeled after an L2 norm, constraining only ψ:

\[
\mathbf{W}_0, \mathbf{b}_0 = \operatorname*{arg\,min}_{\mathbf{W},\, \mathbf{b}} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \psi_i - \hat{\psi}_i \right\|_{2}^{2} \right). \tag{16}
\]

The second training objective in Equation 17 for the training samples i ∈ [1, ..., N] is modeled after an H1 norm, constraining both ψ and its first derivative with respect to C - i.e. one half of the second Piola-Kirchhoff stress tensor S:

\[
\mathbf{W}_0, \mathbf{b}_0 = \operatorname*{arg\,min}_{\mathbf{W},\, \mathbf{b}} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \psi_i - \hat{\psi}_i \right\|_{2}^{2} + \left\| \frac{\partial \psi_i}{\partial \mathbf{C}_i} - \frac{\partial \hat{\psi}_i}{\partial \mathbf{C}_i} \right\|_{2}^{2} \right), \tag{17}
\]

where in the above:

\[
\mathbf{S} = 2 \frac{\partial \psi}{\partial \mathbf{C}}. \tag{18}
\]
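As a minimal sketch of the H1 objective of Eq. 17, the loss below combines an energy-error term with a derivative-error term. A scalar toy energy with an analytic derivative stands in for the trained network; all names are hypothetical:

```python
import numpy as np

def h1_loss(psi_true, dpsi_true, psi_pred, dpsi_pred):
    """Sobolev (H1) training objective of Eq. 17: mean squared error on
    the energy plus mean squared error on its strain derivative (the
    stress-conjugate term). Arguments are arrays over N training samples;
    the derivative arrays have one trailing axis of strain components."""
    energy_term = np.mean((psi_true - psi_pred) ** 2)
    stress_term = np.mean(np.sum((dpsi_true - dpsi_pred) ** 2, axis=-1))
    return energy_term + stress_term

# toy check with a scalar quadratic energy psi = c^2, so dpsi/dc = 2c
c = np.linspace(0.0, 1.0, 5)
loss_exact = h1_loss(c**2, (2 * c)[:, None], c**2, (2 * c)[:, None])
```

An exact surrogate drives both terms to zero, while a model that matches the energy but not its gradient is still penalized, which is the point of the Sobolev constraint.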

It is noted that higher-order objective functions can be constructed as well, such as an H2 norm constraining the predicted ψ̂, stress, and stiffness values. This would be expected to procure even more accurate ψ̂ results, smoother stress predictions, and more accurate stiffness predictions. However, since a ReLU network is a composition of piecewise linear functions, the second-order derivative of the ReLU and its adjacent activation functions is zero; it is therefore innately difficult to control the second-order derivative during training, and in this work we mainly focus on the first-order Sobolev method. In case it is desired to control the behavior of the stiffness tensor, a first-order Sobolev training scheme can be designed with strain as input and stress as output. The gradient of this approximated relationship would be the stiffness tensor. This experiment would also be meaningful and useful in finite element simulations.

It is noted that, in this current work, the Sobolev training is implemented using the available stress information as the higher-order constraint, ensuring that the predicted stress tensors are accurate component-wise. In simpler terms, the H1 norm constrains every single component of the second-order stress tensor. It is expected that this could be handled more efficiently and elegantly by constraining the spectral decomposition of the stress tensor - the principal values and directions. It has been shown in Heider et al. (2020) that loss functions structured to constrain tensorial values in such a manner can be beneficial in mechanics-oriented problems; this will be investigated in future work.


Fig. 5: Schematic of the training procedure of a hyperelastic material surrogate model with the right Cauchy–Green deformation tensor C as input and the energy functional ψ̂ as output. A Sobolev-trained surrogate model will output smooth ψ̂ predictions, and the gradient of the model with respect to C will be a valid stress tensor Ŝ.

5 Verification exercises for checking compatibility with physical constraints

While data-driven techniques, such as the neural network architectures discussed in this work, have provided unprecedented efficiency in generating constitutive laws, the consistency of these laws with well-known principles of mechanical theory can be rather dubious. Generating black-box constitutive models by blindly learning from the available data is considered to be one of the pitfalls of data-driven methods. If the necessary precautions are not taken, a data-driven model, while appearing to be highly accurate in replicating the behaviors discerned from the available database, may lack the utility of a mechanically consistent law and thus be inappropriate for describing physical phenomena. In this work, we leverage mechanical knowledge of the fundamental properties of hyperelastic constitutive laws to check and - if necessary - enforce the consistency of the approximated material models with said properties. In particular for this work, the generated neural network energy functional models are tested for their objectivity, isotropy (or lack thereof), and convexity properties. A brief discussion of these desired properties is presented in this section.

5.1 Objectivity

Objectivity requires that the energy and stress response of a deformed elastic body remain unchanged when rigid body motion takes place. The trained models are expected to meet the objectivity condition - i.e. the material response should not depend on the choice of the reference frame. While translation invariance is automatically ensured by describing the material response as a function of the deformation, invariance under rigid body rotations is not necessarily imposed and must be checked. For a given microstructure represented by a graph G, the definition of objectivity for an elastic energy functional ψ is as follows (Borja, 2013; Kirchdoerfer and Ortiz, 2016):

\[
\psi(\mathbf{Q}\mathbf{F}, G) = \psi(\mathbf{F}, G) \quad \text{for all } \mathbf{F} \in GL^{+}(3, \mathbb{R}),\ \mathbf{Q} \in SO(3), \tag{19}
\]

where Q is a rotation tensor. The above definition can be proven to extend to the equivalent stress and stiffness measures:

\[
\mathbf{P}(\mathbf{Q}\mathbf{F}, G) = \mathbf{Q}\, \mathbf{P}(\mathbf{F}, G) \quad \text{for all } \mathbf{F} \in GL^{+}(3, \mathbb{R}),\ \mathbf{Q} \in SO(3), \tag{20}
\]

\[
c_{iJkL}(\mathbf{Q}\mathbf{F}, G) = Q_{im} Q_{kn}\, c_{mJnL}(\mathbf{F}, G) \quad \text{for all } \mathbf{F} \in GL^{+}(3, \mathbb{R}),\ \mathbf{Q} \in SO(3). \tag{21}
\]


Thus, a constitutive law is frame-indifferent if the energy, stress, and stiffness responses are left rotationally invariant. This is automatically satisfied when the response is modeled as an equivalent function of the right Cauchy–Green deformation tensor C, since:

\[
\mathbf{C}^{+} = (\mathbf{F}^{+})^{T} \mathbf{F}^{+} = \mathbf{F}^{T} \mathbf{Q}^{T} \mathbf{Q} \mathbf{F} = \mathbf{F}^{T} \mathbf{F} \equiv \mathbf{C} \quad \text{for all } \mathbf{F}^{+} = \mathbf{Q}\mathbf{F}. \tag{22}
\]

By training all the models in this work as functions of the right Cauchy–Green deformation tensor C, this condition is automatically satisfied and, thus, will not be checked further.
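The invariance argument of Eq. 22 can also be verified numerically. The rotation-sampling scheme below (QR factorization of a random matrix with a determinant flip) is one convenient choice, not the paper's:

```python
import numpy as np

def right_cauchy_green(F):
    """C = F^T F, the right Cauchy-Green deformation tensor."""
    return F.T @ F

rng = np.random.default_rng(0)

# draw a rotation Q in SO(3) via QR; flip a column if det(Q) = -1
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0

# a small random deformation gradient near the identity
F = np.eye(3) + 0.05 * rng.standard_normal((3, 3))

C1 = right_cauchy_green(F)       # original configuration
C2 = right_cauchy_green(Q @ F)   # superposed rigid rotation, Eq. 22
```

Since C is unchanged by the superposed rotation, any energy expressed as a function of C inherits objectivity automatically.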

5.2 Isotropy

For a given microstructure represented by a graph G, the material response described by a constitutive law is expected to be isotropic if the following is valid:

\[
\psi(\mathbf{F}\mathbf{Q}, G) = \psi(\mathbf{F}, G) \quad \text{for all } \mathbf{F} \in GL^{+}(3, \mathbb{R}),\ \mathbf{Q} \in SO(3). \tag{23}
\]

This extends to the stress and stiffness response of the material:

\[
\mathbf{P}(\mathbf{F}\mathbf{Q}, G) = \mathbf{P}(\mathbf{F}, G)\, \mathbf{Q} \quad \text{for all } \mathbf{F} \in GL^{+}(3, \mathbb{R}),\ \mathbf{Q} \in SO(3), \tag{24}
\]

\[
c_{iJkL}(\mathbf{F}\mathbf{Q}, G) = c_{iMkN}(\mathbf{F}, G)\, Q_{MJ} Q_{NL} \quad \text{for all } \mathbf{F} \in GL^{+}(3, \mathbb{R}),\ \mathbf{Q} \in SO(3). \tag{25}
\]

Thus, for a material to be isotropic, its response must be right rotationally invariant. In the case that the response is anisotropic, as for the inherently anisotropic material studied in this work, the above should not hold.

5.3 Convexity

To ensure the thermodynamical consistency of the trained neural network models, the predicted energy functional must be convex. Testing the convexity of a black-box data-driven function without an explicitly stated equation is not necessarily a straightforward process. Certain algorithms have been developed to estimate the convexity of black-box functions (Tamura and Gallagher, 2019); however, this is outside the scope of this work and will be considered in the future. While convexity would be straightforward to check visually for a low-dimensional function, this is not necessarily true for the high-dimensional functions described by the hybrid models.

A function f : R^n → R is convex over a compact domain D if, for all x, y ∈ D and all λ ∈ [0, 1]:

\[
f(\lambda \mathbf{x} + (1 - \lambda) \mathbf{y}) \leq \lambda f(\mathbf{x}) + (1 - \lambda) f(\mathbf{y}). \tag{26}
\]

For a twice differentiable function f : R^n → R over a compact domain D, the definition of convexity can be proven to be equivalent to the following statement:

\[
f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^{T} (\mathbf{y} - \mathbf{x}) \quad \text{for all } \mathbf{x}, \mathbf{y} \in D. \tag{27}
\]

The above can be interpreted as requiring the first-order Taylor expansion at any point of the domain to be a global under-estimator of the function f. In terms of the approximated black-box function ψ̂(C, G) used in the current work, the inequality 27 can be rewritten as:

\[
\hat{\psi}(\mathbf{C}_{\alpha}, G) \geq \hat{\psi}(\mathbf{C}_{\beta}, G) + \frac{\partial \hat{\psi}}{\partial \mathbf{C}}(\mathbf{C}_{\beta}, G) : (\mathbf{C}_{\alpha} - \mathbf{C}_{\beta}) \quad \text{for all } \mathbf{C}_{\alpha}, \mathbf{C}_{\beta} \in D. \tag{28}
\]

The above constitutes a necessary condition for the approximated energy functional of a specific polycrystal (represented by the connectivity graph G) to be convex: it must hold for any pair of right Cauchy–Green deformation tensors C_α and C_β in a compact domain D. This check is shown to be satisfied in Section 7.3.4.
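A sketch of how the necessary condition of Eq. 28 can be checked by sampling pairs of deformation tensors follows; the toy analytic energy and its gradient stand in for the trained ψ̂ and its derivative:

```python
import numpy as np

def first_order_convexity_check(psi, dpsi, samples, tol=1e-10):
    """Necessary-condition check of Eq. 28: for every ordered pair
    (Ca, Cb) of sampled tensors, verify
    psi(Ca) >= psi(Cb) + dpsi(Cb) : (Ca - Cb)."""
    for Ca in samples:
        for Cb in samples:
            # full double contraction dpsi(Cb) : (Ca - Cb)
            lower_bound = psi(Cb) + np.tensordot(dpsi(Cb), Ca - Cb)
            if psi(Ca) < lower_bound - tol:
                return False
    return True

rng = np.random.default_rng(1)
samples = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(5)]

# toy convex energy psi(C) = ||C||^2 with analytic gradient 2C
convex = first_order_convexity_check(lambda C: np.sum(C * C),
                                     lambda C: 2.0 * C, samples)
```

Passing this sampled test does not prove convexity (it remains a necessary condition only), but a single violating pair is enough to disprove it.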


Remark 3 The trained neural network models in this work will be shown in Section 7 to satisfy the checks and necessary conditions for consistency with the expected objectivity, anisotropy, and convexity principles. However, in the case where one or more of these properties appears to be absent, it is noted that the property can be enforced during the optimization procedure by modifying the loss function. Additional weighted penalty terms could be added to the loss function to promote consistency with required mechanical principles. For example, in the case of objectivity, the additional training objective, parallel to those expressed in Eq. 16 and 17, could be expressed as:

\[
\mathbf{W}_0, \mathbf{b}_0 = \operatorname*{arg\,min}_{\mathbf{W},\, \mathbf{b}} \left( \frac{1}{N} \sum_{i=1}^{N} \lambda \left\| \hat{\psi}(\mathbf{Q}\mathbf{F}, G \mid \mathbf{W}, \mathbf{b}) - \hat{\psi}(\mathbf{F}, G \mid \mathbf{W}, \mathbf{b}) \right\|_{2}^{2} \right), \quad \mathbf{Q} \in SO(3), \tag{29}
\]

where λ is a weight variable, chosen in [0, 1], setting the importance of this objective in the now multi-objective loss function, and the Q are randomly sampled rigid rotations from the SO(3) group. Constraints of this kind were not deemed necessary in the current paper and will be investigated in future work.

6 FFT offline database generation

This section first introduces the fast Fourier transform (FFT) based method for the mesoscale homogenization problem, which was chosen to efficiently provide the database of graph structures and material responses used in geometric learning. Following that, the anisotropic Fung hyperelastic model is briefly summarized as the constitutive relation at the basis of the simulations. Finally, the numerical setup is introduced, focusing on the numerical discretization, grain structure generation, and initial orientation of the structures in question.

6.1 FFT based method with periodic boundary condition

This section deals with solving the mesoscale homogenization problem using an FFT-based method. Supposing that the mesoscale problem is defined in a 3D periodic domain, where the displacement field is periodic while the surface traction is anti-periodic, the homogenized deformation gradient F̄ and first Piola-Kirchhoff (P-K) stress P̄ can be defined as:

\[
\bar{\mathbf{F}} = \langle \mathbf{F} \rangle, \qquad \bar{\mathbf{P}} = \langle \mathbf{P} \rangle, \tag{30}
\]

where ⟨·⟩ denotes the volume average operation.
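On a uniform grid, the volume average ⟨·⟩ of Eq. 30 reduces to an arithmetic mean over the grid points. A minimal sketch, with the array layout assumed for illustration:

```python
import numpy as np

def volume_average(field):
    """Homogenized quantity of Eq. 30: volume average <.> of a periodic
    tensor field sampled on a uniform grid.
    Assumed layout: field has shape (nx, ny, nz, 3, 3)."""
    return field.mean(axis=(0, 1, 2))

# a spatially uniform deformation gradient averages to itself
F = np.tile(np.eye(3), (4, 4, 4, 1, 1))
F_bar = volume_average(F)
```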

Within a time step, when the average deformation gradient increment ∆F̄ is prescribed, the local stress P within the periodic domain can be computed by solving the Lippmann-Schwinger equation:

\[
\mathbf{F} + \Gamma^{0} \ast \left( \mathbf{P}(\mathbf{F}) - \mathbb{C}^{0} : \mathbf{F} \right) = \bar{\mathbf{F}}, \tag{31}
\]

where ∗ denotes a convolution operation, Γ⁰ is the Green's operator, and C⁰ is the homogeneous stiffness of the reference material. The convolution operation can be conveniently performed in the Fourier domain, so the Lippmann-Schwinger equation is usually solved by the FFT-based spectral method (Ma and Sun, 2019). Note that due to the periodicity of the trigonometric basis functions, the displacement field and the strain field are always periodic.

6.2 Anisotropic Fung elasticity

An anisotropic elasticity model at the mesoscale level is utilized to generate the homogenized response database for training the graph-based model at the macroscale. In this section, a generalized Fung elasticity model is utilized as the mesoscale constitutive relation due to its frame-invariance and convenient implementation (Fung, 1965; Ateshian and Costa, 2009).


In the generalized Fung elasticity model, the strain energy density function W is written as:

\[
W = \frac{1}{2} c \left[ \exp(Q) - 1 \right], \qquad Q = \frac{1}{2}\, \mathbf{E} : \mathsf{a} : \mathbf{E}, \tag{32}
\]

where c is a scalar material constant, E is the Green strain tensor, and a is the fourth-order stiffness tensor. The material anisotropy is reflected in the stiffness tensor a, which is a function of the spatial orientation and the material symmetry type.

For a material with orthotropic symmetry, the strain energy density can be written in a simpler form as:

\[
Q = c^{-1} \sum_{a=1}^{3} \left[ 2 \mu_a\, \mathbf{A}^{0}_{a} : \mathbf{E}^{2} + \sum_{b=1}^{3} \lambda_{ab} \left( \mathbf{A}^{0}_{a} : \mathbf{E} \right) \left( \mathbf{A}^{0}_{b} : \mathbf{E} \right) \right], \qquad \mathbf{A}^{0}_{a} = \mathbf{a}^{0}_{a} \otimes \mathbf{a}^{0}_{a}, \tag{33}
\]

where μ_a and λ_ab are anisotropic Lamé constants, and a⁰_a is the unit vector of the orthotropic plane normal, which represents the orientation of the material point in the reference configuration. Note that λ_ab is a symmetric second-order tensor, and the material symmetry type becomes cubic symmetry when certain values of λ and μ are adopted.

The elastic constants take the values:

\[
c = 2\ \text{MPa}, \qquad \lambda = \begin{pmatrix} 0.6 & 0.7 & 0.6 \\ 0.7 & 1.4 & 0.7 \\ 0.6 & 0.7 & 0.5 \end{pmatrix} \text{MPa}, \qquad \mu = \begin{pmatrix} 0.1 \\ 0.7 \\ 0.5 \end{pmatrix} \text{MPa}, \tag{34}
\]

and remain constant across all the mesoscale simulations. The only changing variables are the grain structure and the initial orientation of the representative volume element (RVE), which are introduced in the following section.
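The Fung energy of Eqs. 32-33 with the constants of Eq. 34 can be sketched directly. The orthotropic directions a⁰_a are taken here as the Cartesian axes purely for illustration; in the simulations they follow the randomly generated grain orientations:

```python
import numpy as np

def fung_energy(E, a0, c, lam, mu):
    """Generalized Fung strain energy for orthotropic symmetry:
    W = c/2 [exp(Q) - 1] with
    Q = (1/c) sum_a [ 2 mu_a A_a : E^2 + sum_b lam_ab (A_a : E)(A_b : E) ].
    E: Green strain (3x3); a0: three unit direction vectors;
    c: scalar; lam: symmetric 3x3 matrix; mu: length-3 vector."""
    A = [np.outer(v, v) for v in a0]        # structural tensors A^0_a
    Q = 0.0
    for a in range(3):
        Q += 2.0 * mu[a] * np.tensordot(A[a], E @ E)
        for b in range(3):
            Q += lam[a, b] * np.tensordot(A[a], E) * np.tensordot(A[b], E)
    Q /= c
    return 0.5 * c * (np.exp(Q) - 1.0)

# constants of Eq. 34; grain directions assumed axis-aligned for illustration
c = 2.0
lam = np.array([[0.6, 0.7, 0.6],
                [0.7, 1.4, 0.7],
                [0.6, 0.7, 0.5]])
mu = np.array([0.1, 0.7, 0.5])
axes = list(np.eye(3))

W0 = fung_energy(np.zeros((3, 3)), axes, c, lam, mu)
```

By construction W vanishes in the undeformed state and grows exponentially with the quadratic form Q, which is what gives the Fung model its characteristic stiffening.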

6.3 Numerical aspects of the database generation

The grain structures and initial orientations of the mesoscale simulations are randomly sampled in the parameter space to generate the database. The mesoscale RVE is equally divided into 49 × 49 × 49 grid points to maintain a sufficiently high resolution at an acceptable computational cost. The grain structures are generated by the open-source software NEPER (Quey et al., 2011). An equiaxed grain structure is randomly generated with 40 to 50 grains. A sample RVE is shown in Figure 6.

The initial orientations are generated using the open-source software MTEX (Bachmann et al., 2010). The orientation distribution function (ODF) is randomly generated by combining a uniform orientation and a unimodal orientation:

\[
f(x; g) = w + (1 - w)\, \psi(x, g), \qquad x \in SO(3), \tag{35}
\]

where w ∈ [0, 1] is a random weight value, g ∈ SO(3) is a random modal orientation, and ψ(x, g) is the von Mises–Fisher distribution function considering cubic symmetry. The half width of the unimodal texture ψ(x, g) is 10°, and the preferential orientation g of the unimodal texture is also randomly generated. A sample initial ODF is shown in Figure 6(b).

The average strain is randomly generated from an acceptable strain space, and simulations are performed for each RVE with 200 average strains. Note that the constitutive relation is hyperelastic, so the simulation result is path-independent. To avoid numerical convergence issues, the range of each strain component (F̄ − I) is between 0.0 and 0.1 in the RVE coordinate system.

7 Numerical Experiments

One major advantage of the hybrid (Frankel et al., 2019) or graph-based training (Wang and Sun, 2018) is that the resultant neural network is suitable as a surrogate model not only for one RVE but for a family of RVEs with different microstructures. In this section, we present the results of 13 sets of numerical experiments, grouped in four subsections, to examine and demonstrate the performance of the neural network models we trained. In Section 7.1, we conduct a numerical experiment to examine the neural network trained by the Sobolev method and compare its predictions with those obtained from the classical loss function that employs the L2 norm. In Section 7.2, we include 4 sets of numerical experiments (a k-fold validation of a Sobolev-trained MLP on data for a single polycrystal, a number-of-input-features test for the hybrid architecture, an overfitting check, and a comparison of the k-fold validation results from both the hybrid and the MLP models). In Section 7.3, we include 5 additional verification tests (homogenization experiments, blind predictions for microstructures in the training set, blind predictions for microstructures in the testing set, an isomorphism check, and a model convexity check) to further examine whether the predictions violate any necessary conditions that are not explicitly enforced in the objective functions but are crucial for forward predictions. In Section 7.4, we introduce 3 sets of tests (dynamic simulations of a single polycrystal, mesh refinement simulations, and a comparison of crack patterns for different polycrystal inputs) to demonstrate the potential applications of the hyperelastic energy functional predictions to brittle fracture problems using the hybrid architecture. Each of these experiments includes multiple calculations that involve both predictions within the calibration range and forward predictions for unseen data previously unused in the training process. The abbreviations used for all the model architectures we implemented and tested are summarized in Table 1.

Fig. 6: Sample of the randomly generated initial microstructure: (a) Initial RVE with 50 equiaxed grains, equally discretized by 49 × 49 × 49 grid points; (b) Pole figure plot of the initial orientation distribution function (ODF) combining uniform and unimodal ODFs. The Euler angles of the unimodal direction are (304°, 61°, 211°) in Bunge notation, and the half width of the unimodal ODF is 10°. The weight value is 0.50 for the uniform ODF and 0.50 for the unimodal ODF.

Table 1: Summary of the considered model and training algorithm combinations.

Model          Description
M^L2_mlp       Multilayer perceptron feed-forward architecture. The loss function used is the L2 norm (Eq. 16).
M^H1_mlp       Multilayer perceptron feed-forward architecture. The loss function used is the H1 norm (Eq. 17).
M^H1_hybrid    Hybrid architecture described in Fig. 4. The loss function used is the H1 norm (Eq. 17).
M^H1_reg       Hybrid architecture described in Fig. 4. The loss function used is the H1 norm (Eq. 17). The geometric learning branch of the network is regularized against overfitting.


To compare the performance of the training and testing results, the scaled MSE performances of different models are represented using non-parametric, empirical cumulative distribution functions (eCDFs), as in (Kendall et al., 1946; Gentle, 2009). The results are plotted as scaled MSE vs eCDF curves on a semilogarithmic scale for the training and testing partitions of the dataset. The distance between these curves can serve as a qualitative metric for the performance of the various models on various datasets - e.g. the distance between the eCDF curves of a model for the training and testing datasets is a qualitative metric of the overfitting phenomenon. For a dataset of size M with the errors MSE_i sorted in ascending order, the eCDF can be computed as follows:

\[
F_M(\mathrm{MSE}) =
\begin{cases}
0, & \mathrm{MSE} < \mathrm{MSE}_1, \\[4pt]
\dfrac{r}{M}, & \mathrm{MSE}_r \leq \mathrm{MSE} < \mathrm{MSE}_{r+1}, \quad r = 1, \ldots, M-1, \\[4pt]
1, & \mathrm{MSE}_M \leq \mathrm{MSE}.
\end{cases} \tag{36}
\]
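A minimal sketch of Eq. 36, evaluating the eCDF at the sample points themselves (the standard right-continuous convention):

```python
import numpy as np

def ecdf(mse_values):
    """Empirical CDF of Eq. 36: returns the sorted MSE values and F_M
    evaluated at each of them, i.e. the fraction of samples with an
    error at or below that value."""
    x = np.sort(np.asarray(mse_values))
    M = x.size
    F = np.arange(1, M + 1) / M    # r/M at the r-th sorted value
    return x, F

x, F = ecdf([0.3, 0.1, 0.2, 0.4])
```

Plotting x against F on a semilogarithmic scale reproduces the scaled MSE vs eCDF curves described above.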

To compare all the models on equal terms, the neural network training hyperparameters were kept identical throughout all the experiments wherever possible. All the strain and node weight inputs, as well as the energy and stress outputs, were scaled to the range [0, 1] during the neural network training. The learning capacity of the models (i.e. layer depth and layer dimensions) for the multilayer perceptrons M^L2_mlp and M^H1_mlp, as well as the multilayer perceptron branch of M^H1_hybrid and M^H1_reg, were kept identical. The multilayer perceptron branch in all the networks consists of two Dense layers (200 neurons each) with ELU activation functions. The geometric learning branch consists of two GCN layers (64 filters each with ReLU activation functions) followed by two Dense layers (100 neurons each with ReLU activation functions). The encoded feature vector layer was chosen to have 9 neurons. For the M^H1_reg model, Dropout layers (dropout rate of 0.2) are defined between every GCN and Dense layer in the geometric learning branch. The optimizer used for the training of the neural networks was Nadam, and all the networks were trained for 1000 epochs, utilizing an early stopping algorithm to terminate training when the performance stopped improving. The hyperparameter space of the neural network architecture was deemed too large for a comprehensive parameter search study, and the network was tuned through consecutive trial-and-error iterations. An illustrative example of this trial-and-error process, tuning the number of neurons of the encoded feature vector, is demonstrated in Appendix D. In this current work, the values used for the hyperparameters were deemed adequate to provide as accurate results as possible for all methods while maintaining fair comparison terms. The optimization of these hyperparameters to achieve the maximum possible accuracy will be the objective of future work.

Remark 4 Since the energy functional ψ and the stress values are on different scales of magnitude, the prediction errors are demonstrated using a common scaled metric. For all the numerical experiments in this current work, to demonstrate the discrepancy between the predicted energy values (ψ_pred) and the equivalent true energy values (ψ_true), as well as between the predicted principal values of the second Piola-Kirchhoff stress tensor (S_A,pred) and the equivalent true principal values (S_A,true) for A = 1, 2, 3, the following scaled mean squared error (scaled MSE) metrics are defined respectively for a sample of size M:

\[
\text{scaled MSE}_{\psi} = \frac{1}{M} \sum_{i=1}^{M} \left[ (\bar{\psi}_{\text{true}})_i - (\bar{\psi}_{\text{pred}})_i \right]^2 \quad \text{with} \quad \bar{\psi} := \frac{\psi - \psi_{\min}}{\psi_{\max} - \psi_{\min}}, \tag{37}
\]

\[
\text{scaled MSE}_{S_A} = \frac{1}{3M} \sum_{i=1}^{M} \sum_{A=1}^{3} \left[ (\bar{S}_{A,\text{true}})_i - (\bar{S}_{A,\text{pred}})_i \right]^2 \quad \text{with} \quad \bar{S}_A := \frac{S_A - S_{A,\min}}{S_{A,\max} - S_{A,\min}}. \tag{38}
\]

The functions above scale the values ψ_pred, ψ_true, S_A,pred, and S_A,true to the feature range [0, 1]. It is noted that the scaling functions are defined on the training data set - i.e. the values ψ_min, ψ_max, S_A,min, and S_A,max are derived from the true values of the training data.
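The scaled metric of Eqs. 37-38 can be sketched as follows; the min-max bounds are passed in explicitly to emphasize that they come from the training set, not from the evaluated sample:

```python
import numpy as np

def scaled_mse(true_vals, pred_vals, train_min, train_max):
    """Scaled MSE of Eqs. 37-38: both arrays are min-max scaled with
    bounds derived from the training set before the squared error is
    averaged, so that energies and stresses share a common [0, 1] scale."""
    t = (true_vals - train_min) / (train_max - train_min)
    p = (pred_vals - train_min) / (train_max - train_min)
    return np.mean((t - p) ** 2)

# a perfect prediction has zero scaled error
err = scaled_mse(np.array([0.0, 1.0, 2.0]),
                 np.array([0.0, 1.0, 2.0]), 0.0, 2.0)
```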

Remark 5 When comparing the performance of models in predicting the directions of a second-order stress tensor, we utilize a distance function between two rotation tensors R₁, R₂ belonging to the special orthogonal group SO(3). The rotation tensors are constructed by concatenating the orthogonal, normalized eigenvectors of the stress tensors. The Euclidean distance measure φ_Eu, discussed in detail in (Huynh, 2009; Heider, Wang, and Sun, 2020), can be expressed as:

18 Nikolaos N. Vlassis et al.

φEu =qd(¯

φ1,¯

φ2)2+d¯

θ1,¯

θ22+d(¯

ψ1,¯

ψ2)2. (39)

In the above, {φ̄_i, θ̄_i, ψ̄_i} ∈ E ⊆ R+ are the sets of Euler angles associated with R_i, and the Euclidean distance d between two scalar-valued quantities α1, α2 is expressed as d(α1, α2) = min{|α1 − α2|, 2π − |α1 − α2|} ∈ [0, π]. The distance measure φ_Eu belongs to the range [0, π√3], and the results used in the figures in this work are presented normalized to the range [0, 1]. For this distance measure, the statement φ_Eu(R1, R2) = 0 is equivalent to R1 = R2.
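A minimal sketch of this distance measure (our naming; angles in radians):

```python
import math

def angle_dist(a1, a2):
    """Wrap-around distance between two angles: d(a1, a2) ∈ [0, π] (Remark 5)."""
    return min(abs(a1 - a2), 2.0 * math.pi - abs(a1 - a2))

def phi_eu(euler_1, euler_2):
    """Euclidean distance of Eq. (39) between two Euler-angle triplets;
    lies in [0, π√3] and vanishes iff the two rotations coincide."""
    return math.sqrt(sum(angle_dist(a, b) ** 2 for a, b in zip(euler_1, euler_2)))
```

The wrap-around in `angle_dist` is what keeps nearly identical rotations whose Euler angles sit on opposite sides of the 2π branch cut from registering as far apart.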

7.1 Numerical Experiment 1: Generating an isotropic hyperelastic energy functional with Sobolev training

In this section, a numerical experiment is presented to demonstrate the benefits of training a neural network on hyperelastic energy functional data in the Sobolev training framework. The experiment was performed on synthetic data generated from a small-strain hyperelastic energy functional designed for the Modified Cam-Clay plasticity model (Roscoe and Burland, 1968; Houlsby, 1985; Borja et al., 2001). The hyperelastic stored energy functional is described in a strain invariant space (volumetric strain ε_v, deviatoric strain ε_s). The strain invariants are defined as:

$$\epsilon_v = \operatorname{tr}(\boldsymbol{\epsilon}), \quad \epsilon_s = \sqrt{\tfrac{2}{3}}\,\|\boldsymbol{e}\|, \quad \boldsymbol{e} = \boldsymbol{\epsilon} - \tfrac{1}{3}\epsilon_v \boldsymbol{1}, \qquad (40)$$

where ε is the small strain tensor and e the deviatoric part of the small strain tensor. Using the chain rule, the Cauchy stress tensor can be described in the invariant space as follows:

$$\boldsymbol{\sigma} = \frac{\partial \psi}{\partial \epsilon_v}\frac{\partial \epsilon_v}{\partial \boldsymbol{\epsilon}} + \frac{\partial \psi}{\partial \epsilon_s}\frac{\partial \epsilon_s}{\partial \boldsymbol{\epsilon}}. \qquad (41)$$
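The invariants of Eq. (40) are straightforward to evaluate numerically; a small check of ours, with a hypothetical strain tensor:

```python
import numpy as np

eps = np.array([[0.010, 0.002, 0.000],
                [0.002, -0.004, 0.000],
                [0.000, 0.000, 0.003]])  # hypothetical small-strain tensor

eps_v = float(np.trace(eps))                    # volumetric strain, Eq. (40)
e_dev = eps - eps_v / 3.0 * np.eye(3)           # deviatoric strain tensor
eps_s = float(np.sqrt(2.0 / 3.0) * np.linalg.norm(e_dev))  # deviatoric invariant
```

By construction the deviatoric tensor is traceless, so ε_s isolates the shape-changing part of the strain while ε_v carries the volume change.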

In the above, the mean pressure p and deviatoric (von Mises) stress q can be defined as:

$$p = \frac{\partial \psi}{\partial \epsilon_v} \equiv \frac{1}{3}\operatorname{tr}(\boldsymbol{\sigma}), \quad q = \frac{\partial \psi}{\partial \epsilon_s} \equiv \sqrt{\tfrac{3}{2}}\,\|\boldsymbol{s}\|, \qquad (42)$$

where s is the deviatoric part of the Cauchy stress tensor. Thus, the Cauchy stress tensor can be expressed by the stress invariants as:

$$\boldsymbol{\sigma} = p\,\boldsymbol{1} + \sqrt{\tfrac{2}{3}}\,q\,\hat{\boldsymbol{n}}, \qquad (43)$$

$$\text{where} \quad \hat{\boldsymbol{n}} = \boldsymbol{e}/\|\boldsymbol{e}\| = \sqrt{\tfrac{2}{3}}\,\boldsymbol{e}/\epsilon_s. \qquad (44)$$

The hyperelastic energy functional allows full coupling between the elastic volumetric and deviatoric responses and is described as:

$$\psi(\epsilon_v, \epsilon_s) = -p_0\,\kappa \exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right) - \frac{3}{2}\,c_\mu\,p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right)\epsilon_s^2, \qquad (45)$$

where ε_v0 is the initial volumetric strain, p0 is the initial mean pressure when ε_v = ε_v0, κ > 0 is the elastic compressibility index, and c_µ > 0 is a constant. The hyperelastic energy functional is designed to describe an elastic compression law where the equivalent elastic bulk modulus and the equivalent shear modulus vary linearly with −p, while the mean pressure p varies exponentially with the change of the volumetric strain ∆ε_v = ε_v0 − ε_v. The specifics and the utility of this hyperelastic law are outside the scope of this current work and will be omitted. The numerical parameters of this model were chosen as ε_v0 = 0, p0 = −100 kPa, c_µ = 5.4, and κ = 0.018. By taking the partial derivatives of the energy functional with respect to the strain invariants, the stress invariants are derived as:

$$p = \frac{\partial \psi}{\partial \epsilon_v} = p_0\left[1 + \frac{3 c_\mu}{2\kappa}\,\epsilon_s^2\right]\exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right), \qquad (46)$$

$$q = \frac{\partial \psi}{\partial \epsilon_s} = -3\,c_\mu\,p_0 \exp\left(\frac{\epsilon_{v0} - \epsilon_v}{\kappa}\right)\epsilon_s. \qquad (47)$$
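As a sanity check, the closed forms (46)-(47) can be verified against central finite differences of the energy (45), using the parameter values quoted above. This is a sketch of ours; the evaluation point is arbitrary:

```python
import math

p0, c_mu, kappa, ev0 = -100.0, 5.4, 0.018, 0.0  # parameters quoted in the text (kPa)

def psi(ev, es):
    """Stored energy functional of Eq. (45)."""
    g = math.exp((ev0 - ev) / kappa)
    return -p0 * kappa * g - 1.5 * c_mu * p0 * g * es ** 2

def p_exact(ev, es):
    """Mean pressure, Eq. (46)."""
    return p0 * (1.0 + 3.0 * c_mu / (2.0 * kappa) * es ** 2) * math.exp((ev0 - ev) / kappa)

def q_exact(ev, es):
    """Deviatoric stress, Eq. (47)."""
    return -3.0 * c_mu * p0 * math.exp((ev0 - ev) / kappa) * es

ev, es, h = -0.005, 0.010, 1e-7
p_fd = (psi(ev + h, es) - psi(ev - h, es)) / (2.0 * h)  # numerical ∂ψ/∂ε_v
q_fd = (psi(ev, es + h) - psi(ev, es - h)) / (2.0 * h)  # numerical ∂ψ/∂ε_s
```

The finite-difference derivatives agree with Eqs. (46)-(47) to well below the stress scale of the problem, which is exactly the consistency that Sobolev training exploits when it supervises both ψ and its derivatives.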

To compare the performance of the Sobolev training method, a two-layer feed-forward neural network is trained on synthetic data generated for the above hyperelastic law. The training data set includes 225 data points, sampled as shown in Fig. 7, 25 of which are randomly selected to be used as a validation set during training. The testing is performed on 1,000 data points. The inputs of the neural network are the two strain invariants ε_v, ε_s and the output is the predicted energy ψ. The network has two hidden Dense layers (100 neurons each) with ELU activation functions and an output Dense layer with a linear activation function. The training experiment is performed with an L2 norm loss function (constraining only the predicted ψ values) and with an H1 norm loss function (constraining the predicted ψ, p, and q values).
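Schematically, the two loss functions differ only in an extra derivative term. In this sketch of ours the function names are hypothetical and the predicted gradients are supplied as arrays; in practice they would come from automatic differentiation of the network output with respect to its strain inputs:

```python
import numpy as np

def l2_loss(psi_true, psi_pred):
    """L2 loss: constrains only the predicted energy values."""
    return float(np.mean((np.asarray(psi_true) - np.asarray(psi_pred)) ** 2))

def h1_loss(psi_true, psi_pred, dpsi_true, dpsi_pred, weight=1.0):
    """H1 (Sobolev) loss: additionally constrains the energy gradients,
    here the stress invariants p = ∂ψ/∂ε_v and q = ∂ψ/∂ε_s."""
    grad_term = np.mean(np.sum((np.asarray(dpsi_true)
                                - np.asarray(dpsi_pred)) ** 2, axis=-1))
    return l2_loss(psi_true, psi_pred) + weight * float(grad_term)
```

With matching gradients the two losses coincide; any mismatch in p or q penalizes the H1 objective, which is what suppresses the spurious stress oscillations discussed below.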

Fig. 7: Comparison of L2 and H1 norm training performance for a hyperelastic energy functional used for the Modified Cam-Clay plasticity model (Borja et al., 2001).

The results of the two training experiments are shown in Fig. 7 and Fig. 8. Both training algorithms seem able to capture the energy functional ψ values well, with the H1 trained model demonstrating slightly higher accuracy. However, closer examination of the results shown in Fig. 8 reveals that the neural network trained with the H1 norm performs better both in predicting the energy functional and the first derivatives that lead to the stress invariants p and q. In particular, the neural network trained with the L2 norm generates a mean pressure and deviatoric stress response that oscillates spuriously with respect to the strain, whereas the H1 counterpart produces results that exhibit no oscillation. Such oscillation is not desirable, particularly if the neural network predictions were to be incorporated into an implicit finite element model.

7.2 Numerical Experiment 2: Training an anisotropic hyperelastic model for polycrystals with non-Euclidean data

To determine whether the incorporation of graph data improves the accuracy and robustness of the forward prediction, we conduct both the hybrid learning and the classical supervised machine learning.

Fig. 8: Comparison of L2 and H1 predictions for the energy functional ψ, the stress invariant p, and the stress invariant q.

The latter is used as a control experiment. The ability to capture the elastic stored energy functional of a single polycrystal is initially tested on that MLP model. A two-hidden-layer feed-forward neural network is trained and tested on 200 sample points: 200 different, randomly generated deformation tensors with their equivalent elastic stored energy and stress measures for only one of the generated microstructures. A Sobolev trained model as described in Section 4 (model type M^{H1}_{mlp}) was utilized. This architecture will also constitute the multilayer perceptron branch of the hybrid network described previously in Fig. 4. To eliminate, as much as possible, any bias in the dataset of the experiment, the network's capability is tested with a K-fold cross-validation algorithm (cf. Bengio and Grandvalet (2004)). The 200 sample points are separated into 10 different groups (folds) of 20 sample points each and, recursively, a fold is selected as a testing set while the rest are selected as a training set for the network.
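The splitting scheme can be sketched as a generic K-fold partition (the function name and seed are ours):

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Shuffle the sample indices and partition them into k folds; each fold
    serves once as the testing set while the remaining folds form the
    training set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 200 sample points into 10 folds of 20, as in the experiment above
splits = list(k_fold_splits(200, 10))
```

Every sample point appears in exactly one testing fold, so each datum is evaluated as blind data exactly once over the 10 rounds.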

The K-fold testing results can be seen in Fig. 9, where the model can predict the data for a single RVE formation adequately, as well as interpolate smoothly between the data points to generate the response surface estimations for the energy and the stress field (Fig. 10). A good performance for both training and testing on a single polycrystal structure was expected, as no additional input is necessary other than the strain tensor. Any additional input (i.e. structural information) would be redundant in the training since it would be constant for the specific RVE.

In this current work, we generalize the learning problem by introducing the polycrystal weighted connectivity graph as the additional input data. This connectivity graph is inferred directly from the microstructure by assigning each grain in the polycrystal as a vertex (node) and assigning an edge on each grain contact pair. As discussed in Section 2.1, the nodes of the graphs can have weights carrying information about the crystals they represent in the form of a feature matrix X. The available node features in the data set are described in Appendix C. A model of type M^{H1}_{hybrid} is tested and trained on 150 polycrystals (100 RVEs in the training set and 50 RVEs in the testing set) with different sets of features to evaluate the effect of the node features on the model's performance. Four combinations of data were tested: a model M^{1}_{hybrid} with the crystal volume as the only feature, a model M^{4}_{hybrid} with the crystal volume and the crystal Euler angles as features, a model M^{8}_{hybrid} that utilizes the crystal volumes, the Euler angles, the equivalent diameter, the number of faces, the total area of the faces, and the number of neighbors, and, finally, a model M^{11}_{hybrid} that utilizes all the available features. The abbreviations of the model names are also described in Table 2.

Fig. 9: Correlation plot for true vs predicted K-fold testing results for the energy functional ψ (left) and the first component of the 2nd Piola-Kirchhoff stress tensor (right) by a surrogate neural network model M^{H1}_{mlp} trained on data for a single RVE.

Fig. 10: Estimated ψ energy functional surface (left) and the first component of the 2nd Piola-Kirchhoff stress tensor (right) generated by a surrogate neural network model (M^{H1}_{mlp}) trained on data for a single RVE.

The results of the training experiment are demonstrated in Figure 11. Increasing the available node features during training seems to generally increase the model's performance in training and testing. The model that uses the crystal volumes as node features demonstrates the lowest performance. The largest improvement in performance is observed when the crystal Euler angles are included in the feature matrix.
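A minimal sketch of this graph construction follows. The 4-grain ring, the feature choice, and the symmetric normalization are ours for illustration; the paper's actual feature sets are those listed in Table 2 and Appendix C:

```python
import numpy as np

def connectivity_graph(n_grains, contact_pairs, node_features):
    """Polycrystal connectivity graph: grains are nodes, grain contacts are
    edges. Returns the adjacency matrix A, the node feature matrix X, and
    the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    A = np.zeros((n_grains, n_grains))
    for i, j in contact_pairs:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12)))
    L = np.eye(n_grains) - d_inv_sqrt @ A @ d_inv_sqrt
    X = np.asarray(node_features, dtype=float)  # one row of node weights per grain
    return A, X, L

# hypothetical 4-grain assembly in a ring, crystal volume as the only feature
A, X, L = connectivity_graph(4, [(0, 1), (1, 2), (2, 3), (3, 0)],
                             [[0.20], [0.30], [0.25], [0.25]])
```

The Laplacian and the feature matrix together form the non-Euclidean input that the graph convolutional branch consumes; richer feature sets simply widen X without changing the graph topology.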

As previously mentioned in Section 3.3, the proposed hybrid architecture can be prone to overfitting. To avoid that, we utilize additional Dropout layers in the geometric learning branch of the network as a regularization method during training. The models representing the hybrid architecture without and with regularization are of the types M^{H1}_{hybrid} and M^{H1}_{reg} respectively. To demonstrate that, the two models are tested and trained on 150 polycrystals (100 RVEs in the training set and 50 RVEs in the testing set). The comparison results are shown in Fig. 12. While M^{H1}_{hybrid} appears to be prone to overfitting (the training error is lower than the blind prediction error), this issue can be alleviated with regularization techniques that promote the model's robustness in blind predictions. This can be qualitatively seen in the scaled MSE vs eCDF plot for the M^{H1}_{reg} model: the distance between the training and testing curves closes for the regularized model. Since the M^{H1}_{reg} model appears to procure superior results in blind prediction accuracy compared to the non-regularized model M^{H1}_{hybrid}, from this point on, we will be working and comparing with M^{H1}_{reg} and omitting M^{H1}_{hybrid} for simplicity.

Table 2: Abbreviations of hybrid model names with different numbers of node weight features.

Model | Node weight features
M^{1}_{hybrid} | crystal volume
M^{4}_{hybrid} | crystal volume and three Euler angles
M^{8}_{hybrid} | crystal volume, three Euler angles, equivalent diameter, number of faces, total area of faces, and number of neighbors
M^{11}_{hybrid} | crystal volume, three Euler angles, equivalent diameter, number of faces, total area of faces, number of neighbors, and centroid position vector

Fig. 11: Comparison of the model's performance for different numbers of node weight features (M^{1}_{hybrid}, M^{4}_{hybrid}, M^{8}_{hybrid}, and M^{11}_{hybrid}) for the second Piola-Kirchhoff stress tensor S principal values (scaled MSE) and direction predictions (φ_Eu). The abbreviations of the model names are described in Table 2.

The ability of the hybrid architecture proposed in Fig. 4 to leverage the information from a weighted connectivity graph to expand learning over multiple polycrystals, in comparison with the classical multilayer perceptron methods, is tested in the following experiment. A K-fold validation algorithm is performed on 100 generated polycrystal RVEs. The 100 RVEs are separated into 5 folds of 20 RVEs each. In doing so, every polycrystal RVE will be considered as blind data for the model at least once. The K-fold cross-validation algorithm is repeated for the model architectures and training algorithms M^{L2}_{mlp}, M^{H1}_{mlp}, and M^{H1}_{reg}. The results are presented in Fig. 13 as scaled MSE vs eCDF curves for the energy functional ψ and the second Piola-Kirchhoff stress tensor S principal values, and as φ_Eu vs eCDF curves for the principal direction predictions. It can be seen that using the Sobolev training method greatly reduces the blind prediction errors: both the M^{L2}_{mlp} energy and stress prediction errors are higher than those of the M^{H1}_{mlp} and M^{H1}_{reg}

Fig. 12: Scaled mean squared error comparison for the second Piola-Kirchhoff stress tensor S principal values and φ_Eu error for the S direction predictions for the models M^{H1}_{hybrid} and M^{H1}_{reg}.

models. The M^{H1}_{reg} model demonstrates superior predictive results compared to the M^{H1}_{mlp} model, as it can distinguish between different RVE behaviors.

In addition to the performance measured by this quantitative metric, enabling the weighted graph as an additional input for the hybrid network also provides the opportunity to further generalize the learning process. In Figure 14, the energy potential surface estimations are shown for the simple multilayer perceptron and the hybrid architecture for two different polycrystals. Without the graph as input, the network cannot distinguish behaviors, while the hybrid architecture estimates two distinctive elastic stored energy surfaces. The weighted connectivity graph of each polycrystal is encoded in a perceivably different feature vector that aids the downstream multilayer perceptron to identify and interpolate among different behaviors for the RVEs. Furthermore, this hybrid strategy, if trained successfully and carefully validated, is also potentially more efficient than a black-box surrogate model, as the hybrid model does not require a new training process when encountering a new RVE that can be sufficiently described by a weighted graph.

7.3 Numerical Experiment 3: Verification tests on unseen data

To ensure that the constitutive response predicted by the trained neural network is consistent with the known mechanics principles, we introduce numerical experiments to verify our ML model and assess the accuracy and robustness of the predictions made by the graph-based model. The predictive capacity of the neural network is tested for blind predictions against homogeneous RVE simulations and new FFT simulations performed on microstructures within and out of the range of the training dataset. The hybrid architecture is also tested on whether it can satisfy the isomorphism condition of the graph inputs and on whether it produces convex energy functionals.

7.3.1 Verification Test 1: Responses of unseen homogeneous anisotropic RVEs

In this blind verification, our goal is to check whether the machine learning model predicts the right anisotropic responses of a homogeneous RVE. We use the model trained with the training data described in Appendix C to make a forward prediction on 5 RVEs with all grains of the same crystalline orientation. Since there is no cohesive zone model used in the grain boundary, setting the crystal orientation identical for all the grains essentially makes the RVEs homogeneous.

The ML model then takes the weighted graph that represents the topology of the microstructures as additional input and is used to predict the constitutive responses of these 5 extra microstructures unseen during the training. The results of uniaxial unconfined tensile tests performed on the 5 RVEs of two different orientations, (0°, 0°, 0°) and (45°, 45°, 45°), are compared with the benchmark solution as shown in Fig. 15. Meanwhile, we also applied pure shear loading on all three pairs of orthogonal planes of these 5 RVEs with (0°, 0°, 0°) orientation. The comparison with the benchmark solution is shown in Fig. 16.

Fig. 13: Scaled MSE vs eCDF curves for the ψ energy functional (top left), scaled MSE vs eCDF curves for the second Piola-Kirchhoff stress tensor S principal values (top right), and φ_Eu vs eCDF curves for the second Piola-Kirchhoff stress tensor S principal direction predictions (bottom) for the models M^{L2}_{mlp}, M^{H1}_{mlp}, and M^{H1}_{reg}. The models' performance is tested with a K-fold algorithm on a dataset of 100 RVEs; only the blind prediction results are shown.

7.3.2 Verification Test 2: Blind test of RVEs with unseen FFT simulations

In this verification test, we test the model's blind prediction capabilities against unseen data generated by FFT simulations. The hybrid architecture model is tested against uniaxial unconfined tension tests and pure shear tests conducted using the FFT solver. The hybrid architecture used in this test was trained on data from 100 RVEs. It is noted that the model was trained solely on randomly generated deformation gradients and, thus, the strain paths prescribed for these tests are unseen. The results of these tests for three RVEs sampled from the training dataset can be seen in Fig. 17, while the results for three unseen microstructures sampled from the testing dataset can be seen in Fig. 18.

7.3.3 Verification Test 3: Test of graph isomorphism responses

In this test, we check whether the trained geometric learning models procure the same predictions for isomorphic graphs. The definition of graph isomorphism is provided in Appendix A. With the original graph structure of the input known, we can generate any number of isomorphic graphs by applying the same random permutation to the rows and the columns of the original normalized Laplacian matrix of


Fig. 14: Without any additional input (other than the strain tensor), the neural network cannot differentiate between these two polycrystals. The two anisotropic behaviors can be distinguished when the weighted connectivity graph is also provided as input. Through the unsupervised encoding branch of the hybrid architecture, each polycrystal is mapped on an encoded feature vector. The feature vector is fed to the multilayer perceptron branch and procures a unique energy prediction.

Fig. 15: Estimated ψ and S11 responses for 5 unseen RVEs homogenized at Euler angles (0°, 0°, 0°) (top) and (45°, 45°, 45°) (bottom) of crystal orientations in Bunge notation under uniaxial unconfined tensile loading.

the graph. The same random permutation is applied to the rows of the feature matrix of the input as well. The permuted isomorphic graphs and the equivalent feature matrices carry the same information for the microstructure, and the predictions of the hybrid architecture should be consistent under any permutation of the graph input. To test this hypothesis, 10 iterations of permutations were performed on the graph Laplacian matrix and feature matrix of a microstructure to produce isomorphic graph representations of that microstructure. These isomorphic graph inputs were then used to make predictions against unseen FFT simulation data. The results of this experiment can be seen in Fig. 19, where the isomorphic graph inputs procure consistent responses.

Fig. 16: Estimated S12, S23, and S13 responses for 5 unseen RVEs homogenized at Euler angles (0°, 0°, 0°) of crystal orientations in Bunge notation under pure shear loading in 3 directions.
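The permutation construction can be sketched as follows. Since the trained network itself is not reproduced here, a permutation-invariant readout (sorted Laplacian eigenvalues and summed node features) stands in for the model prediction; the 5-node graph and feature sizes are ours:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical normalized Laplacian (complete graph K5) and node features
n = 5
L = np.eye(n) - (np.ones((n, n)) - np.eye(n)) / (n - 1)
X = rng.random((n, 3))

P = np.eye(n)[rng.permutation(n)]   # random permutation matrix
L_iso = P @ L @ P.T                 # permute rows and columns of the Laplacian
X_iso = P @ X                       # permute the rows of the feature matrix

def invariant_readout(lap, feats):
    """A stand-in, permutation-invariant 'prediction'."""
    return np.concatenate([np.sort(np.linalg.eigvalsh(lap)), feats.sum(axis=0)])
```

A model that respects graph isomorphism must return the same output for (L, X) and (L_iso, X_iso), which is precisely what the 10 permutation iterations in this test probe.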

7.3.4 Verification Test 4: Convexity check

To check the convexity of the trained hybrid models, a numerical check was conducted on the trained hybrid architecture models. The models were tested with the check described in Eq. 28. The Cα and Cβ were chosen to be right Cauchy deformation tensors sampled from the training and testing sets of deformations. The input G was checked for all the 150 RVEs the hybrid architecture was trained and tested on. For every graph input, the approximated energy functional must be convex. Thus, to verify that for all the polycrystal formations, the convexity check is repeated for every RVE in the dataset. It is noted that, while these checks describe a necessary condition for convexity, they do not describe a sufficient condition, and more robust methods of checking convexity will be considered in the future. For a specific polycrystal graph input, the network has six independent variables: the deformation tensor C components. To check the convexity, for every RVE in the dataset, deformation tensors C are sampled in a grid and are checked pairwise (approximately 265,000 combinations of points/checks per RVE) and are found to satisfy the inequality 28. In Figure 20, a sub-sample of 100 convexity checks for three RVEs is demonstrated.
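The pairwise check has the familiar midpoint-convexity form. A sketch of ours with a convex stand-in energy over two of the six C-components (the stand-in functions and grid are hypothetical, not the trained network):

```python
import itertools
import numpy as np

def midpoint_convexity_violations(psi, points, tol=1e-12):
    """Count sampled pairs violating psi((x + y)/2) <= (psi(x) + psi(y))/2,
    a necessary (but not sufficient) condition for convexity."""
    bad = 0
    for x, y in itertools.combinations(points, 2):
        if psi(0.5 * (x + y)) > 0.5 * (psi(x) + psi(y)) + tol:
            bad += 1
    return bad

# grid of sampled deformation components around the undeformed state
grid = [np.array(p) for p in itertools.product([0.90, 1.00, 1.10], repeat=2)]
convex_psi = lambda c: float(c @ c)      # convex stand-in energy
concave_psi = lambda c: -float(c @ c)    # deliberately non-convex stand-in
```

A convex functional produces zero violations over the grid, while the non-convex stand-in fails on every distinct pair, which is the discriminating behavior the per-RVE checks rely on.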


Fig. 17: Comparison of hybrid model predictions with FFT simulation data for 3 RVEs from the training data set. The tests conducted are uniaxial unconfined tension (left and middle columns) and pure shear (right column).

7.4 Numerical Experiment 4: Parametric studies on anisotropic responses of polycrystals in phase-field fracture

The anisotropic elastic responses predicted using the hybrid neural network, trained by both non-Euclidean descriptors and FFT simulations performed on polycrystals, are further examined in phase-field fracture simulations in which the stored energy functional generated from the hybrid learning model is degraded according to a driving force. In this series of parametric studies, the Kalthoff-Winkler experiment is numerically simulated via a phase-field model in which the elasticity is predicted by the hybrid neural network (Kalthoff and Winkler, 1988; Kalthoff, 2000). We assume the effective stress theory (Simo and Ju, 1987) is valid such that the stored energy can be written in terms of the product of a degradation function and the stored elastic energy. The degradation function and the driving force are both pre-defined in this study. The training of an incremental functional for the path-dependent constitutive responses will be considered in the second part of this series of work.

In the first numerical experiment, we conduct a parametric study by varying the orientation of the RVE to analyze how the elastic anisotropy predicted by the graph-dependent energy functional affects the nucleation and propagation of cracks. In the second numerical experiment, the hybrid neural network is given new microstructures. Forward predictions of the elasticity of the two new RVEs are made by the hybrid neural network without further calibration. We then compare the crack patterns for the two RVEs

Fig. 18: Comparison of hybrid model predictions with FFT simulation data for 3 RVEs from the testing data set. The tests conducted are uniaxial unconfined tension (left and middle columns) and pure shear (right column).

and compare the predictions made without the graph input to analyze the impact of the incorporation of non-Euclidean descriptors on the quality of the predictions of crack growth.

While previous work, such as Kochmann et al. (2018), has utilized FFT simulations to generate incremental constitutive updates, the efficiency of the FFT-FEM model may highly depend on the complexity of the microstructures and the existence of a sharp gradient of material properties of the RVEs. In this work, the FFT simulations are not performed during the multiscale simulations. Instead, they are used as the training and validation data to generate a ML surrogate model following the treatment in Wang and Sun (2018) and Wang and Sun (2019b).

For brevity, we omit the detailed description of the phase-field model for brittle fracture. Interested readers may refer to, for instance, Bourdin et al. (2008) and Borden et al. (2012a). In this work, we adopt the viscous regularized version of the phase-field brittle fracture model in Miehe et al. (2010b), in which the degradation function and the critical energy release rate are pre-defined. The equations solved are the balance of linear momentum and the rate-dependent phase-field governing equation:

$$\nabla_X \cdot \boldsymbol{P} + \boldsymbol{B} = \rho\,\ddot{\boldsymbol{U}}, \qquad (48)$$

$$\frac{g_c}{l_0}\left(d - l_0^2\,\nabla_X \cdot \left[\partial_{\nabla d}\,\gamma\right]\right) + \eta\,\dot{d} = 2(1 - d)H, \qquad (49)$$


Fig. 19: Hybrid model prediction for isomorphic graph inputs. The mean value and the range of the predictions are compared to the FFT simulation benchmark results for an unconfined uniaxial tension test (top row) and pure shear test (bottom row).

Fig. 20: Approximated energy functional convexity check results for three different polycrystals. Each point represents a convexity check and must be above the [LHS - RHS = 0] line so that the inequality 28 is satisfied.

where γ is the crack density function that represents the diffusive fracture, i.e.,

$$\gamma(d, \nabla d) = \frac{1}{2l}\,d^2 + \frac{l}{2}\,|\nabla d|^2. \qquad (50)$$

The problem is solved following a standard staggered time discretization (Borden et al., 2012b) such that the balance of linear momentum and the phase-field governing equations are updated sequentially.


In the above Eq. (48), P is the first Piola-Kirchhoff stress tensor, B is the body force, and Ü is the second time derivative of the displacement U. In Eq. (49), following (Miehe et al., 2010a), d refers to the phase-field variable, with d = 0 signifying the undamaged and d = 1 the fully damaged material. The variable l0 refers to the length scale parameter used to approximate the sharp crack topology as a diffusive crack profile, such that as l0 → 0 the sharp crack is recovered. The parameter gc is the critical energy release rate from the Griffith crack theory. The parameter η refers to an artificial viscosity term used to regularize the crack propagation by giving it a viscous resistance. The term H is the force driving the crack propagation and, in order to have an irreversible crack propagation in tension, it is defined as the maximum tensile ("positive") elastic energy that a material point has experienced up to the current time step t_n, formulated as:

$$H(\boldsymbol{F}_{t_n}, G) = \max_{t \leq t_n} \psi^+(\boldsymbol{F}_t, G). \qquad (51)$$

The degradation of the energy due to fracture should take place only under tension and can be linked to that of the undamaged elastic solid as:

$$\psi(\boldsymbol{F}, d, G) = \left(g(d) + r\right)\psi^+(\boldsymbol{F}, G) + \psi^-(\boldsymbol{F}, G). \qquad (52)$$

The parameter r refers to a residual energy remaining even in the fully damaged material, and it is set to r ≈ 0 for these experiments. For these numerical experiments, the degradation function used was the commonly employed quadratic (Miehe, Hofacker, and Welschinger, 2010a):

$$g(d) = (1 - d)^2 \quad \text{with} \quad g(0) = 1 \ \text{and} \ g(1) = 0. \qquad (53)$$

In order to perform a tensile-compressive split, the deformation gradient is split into a volumetric and an isochoric part. The energy and the stress response of the material should not be degraded under compression. The split of the deformation gradient, following (de Souza Neto et al., 2011), is performed as follows:

$$\boldsymbol{F} = \boldsymbol{F}_{\text{iso}}\boldsymbol{F}_{\text{vol}} = \boldsymbol{F}_{\text{vol}}\boldsymbol{F}_{\text{iso}}, \qquad (54)$$

where the volumetric component of F is defined as

$$\boldsymbol{F}_{\text{vol}} = (\det \boldsymbol{F})^{1/3}\,\boldsymbol{I}, \qquad (55)$$

and the volume-preserving isochoric component as

$$\boldsymbol{F}_{\text{iso}} = (\det \boldsymbol{F})^{-1/3}\,\boldsymbol{F}. \qquad (56)$$

The strain energy is thus split into a "tensile" and a "compressive" part, such that:

$$\psi^+ = \begin{cases} \psi(\boldsymbol{F}, G) & J \geq 1 \\ \psi(\boldsymbol{F}, G) - \psi(\boldsymbol{F}_{\text{vol}}, G) & J < 1, \end{cases} \qquad (57)$$

$$\psi^- = \begin{cases} 0 & J \geq 1 \\ \psi(\boldsymbol{F}_{\text{vol}}, G) & J < 1, \end{cases} \qquad (58)$$
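The split of Eqs. (54)-(56) is a one-liner to verify numerically (the sample deformation gradient is ours):

```python
import numpy as np

def vol_iso_split(F):
    """Multiplicative volumetric/isochoric split, Eqs. (54)-(56):
    F = F_iso @ F_vol = F_vol @ F_iso, with det(F_iso) = 1."""
    J = np.linalg.det(F)
    F_vol = J ** (1.0 / 3.0) * np.eye(3)
    F_iso = J ** (-1.0 / 3.0) * F
    return F_vol, F_iso

F = np.array([[1.10, 0.05, 0.00],
              [0.00, 0.95, 0.00],
              [0.00, 0.00, 1.02]])  # hypothetical deformation gradient
F_vol, F_iso = vol_iso_split(F)
```

Because F_vol is spherical, the two components commute, and det(F_iso) = 1 confirms that all of the volume change has been assigned to F_vol, which is what allows ψ(F_vol, G) to serve as the "compressive" energy in Eqs. (57)-(58).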

where J = det(F). In these examples, the energy values are calculated using the hybrid architecture neural network model, whose derivatives with respect to the strain input yield the stress. Since the model's input is in terms of the right Cauchy-Green deformation tensor, the degraded stress is calculated as:

$$\boldsymbol{P}(\boldsymbol{F}, d, G) = 2\boldsymbol{F}\left[g(d)\,\frac{\partial \hat{\psi}^+(\boldsymbol{C}, G)}{\partial \boldsymbol{C}} + \frac{\partial \hat{\psi}^-(\boldsymbol{C}, G)}{\partial \boldsymbol{C}}\right]. \qquad (59)$$

The experiment in question studies the crack propagation due to the high-velocity impact of a projectile. The geometry and boundary conditions of the domain, as well as the configuration of the pre-existing crack, are shown in Fig. 21. It is noted that, while only half of the domain of the problem is studied in this work, the rest of the domain would not necessarily demonstrate a symmetric response due to the material's anisotropic behavior. In this preliminary study, it was deemed that the half domain would be adequate to illustrate and compare the different anisotropic responses of the model in question. Kalthoff and Winkler (1988) and Kalthoff (2000) have observed the crack to propagate at 70° for an isotropic material, results that have previously been reproduced with numerical simulations in other studies (Belytschko et al., 2003; Song et al., 2008; Borden et al., 2012b). The experiment is conducted for two impact velocities (v0 = 16.5 m/s and v0 = 33.0 m/s) to test the crack branching phenomenon expected for higher impact velocities.

Fig. 21: The geometry and boundary conditions of the domain for the dynamic shear loading experiment. The velocity is prescribed progressively at the bottom left corner of the domain. The mesh is designed to have a pre-existing crack of 50.0 mm.

The experiment lasts for 80 µs and the prescribed velocity is applied progressively following the scheme below for t0 = 1 µs:

$$v = \begin{cases} \dfrac{t}{t_0}\,v_0 & t \leq t_0 \\ v_0 & t > t_0. \end{cases} \qquad (60)$$

The domain is meshed uniformly with 20,000 triangular elements and the length scale is chosen to be l0 = 1.2 × 10^-3 m. While this mesh is rather coarse compared to previous studies of the same problem, it was deemed adequate to simulate the problem at hand with acceptable accuracy and to qualitatively demonstrate the anisotropic model behavior. The time step used for the explicit method was chosen to be ∆t = 5 × 10^-8 s to provide stable results. Changing the time step of the explicit method did not appear to affect the phase-field solution, as long as the explicit solver for the momentum equation was stable.

For the first numerical example, the experiment is initially conducted on an isotropic material with parameters commonly used in the literature (E = 190 GPa, ν = 0.3, l0 = 1.2 × 10^-3 m) to verify the formulation and compare with the anisotropic results. It can be seen that, with the current formulation, the isotropic model can recover the approximately 70° angle previously reported in the experiments and numerical simulations. Following that, the behavior of a single polycrystal was tested; in other words, the graph input of the hybrid architecture material model remained constant for all the simulations. The material model is a trained neural network of type M^{H1}_{reg} with the graph input set constant. All the neural networks used in this section were trained on a dataset of 100 RVEs with 200 sample points each. The purpose of this experiment is to show that, by rotating the highly anisotropic RVE under the same boundary conditions, different wave propagation and crack nucleation patterns can be observed. This experiment could be paralleled to rotating a transversely isotropic material: different fiber orientations should procure different results under identical boundary conditions. In Fig. 22, it is demonstrated that the neural network material model is indeed anisotropic, showing varying behaviors while rotating the RVE by 0°,


isotropic (a) (b)

φ=0◦(c) (d)

φ=30◦(e) (f)

φ=60◦(g) (h)

Fig. 22: Crack patterns at 65 µsfor the dynamic shear loading experiment for the isotropic material and the

anisotropic material for a constant graph, rotated at various angles. The left column shows the experiments

for v=16.5 m/sand the right column for v=33.0 m/s.

30◦, and 60◦. The nature of the anisotropy becomes more apparent when the impact velocity is doubled851

and the crack branching is more prevalent.852

Dynamic simulations can be prone to numerical instabilities that may affect the predicted crack propagation patterns (Wei and Chen, 2018). To ensure that the crack propagation patterns demonstrated in this experiment are not artifacts of mesh-dependent numerical instabilities, the simulations were repeated on different meshes and at different levels of mesh refinement. The simulation shown in Fig. 22 (d) (v = 33.0 m/s, φ = 0°) was repeated on a mesh with 11,450 quadrilateral elements, as well as on a mesh with 80,000 triangular elements. For the quadrilateral elements and the refined triangular mesh, the simulation time steps were chosen to be ∆t = 5 × 10⁻⁸ s and ∆t = 2.5 × 10⁻⁸ s, respectively. The simulation shown in Fig. 22 (h) (v = 33.0 m/s, φ = 60°) was also repeated on a mesh with 80,000 triangular elements to investigate whether the mesh would affect the crack propagation patterns of the RVEs under rotation. The comparison of the crack patterns is demonstrated in Fig. 23 and Fig. 24. Neither the selection of element shape nor the level of refinement appears to greatly affect the propagated cracks. The main cracks for all the simulations at different levels of refinement appear close to identical. The secondary cracks in Fig. 23 (h), as well as in Fig. 23 (b), appear more defined, which is expected due to the higher resolution of the mesh.

Fig. 23: Crack patterns at 65 µs (left column) and 85 µs (middle column) (v = 33.0 m/s) for the dynamic shear loading experiment on three different meshes: 11,450 quadrilateral elements (top row), 20,000 triangular elements (middle row), and 80,000 triangular elements (bottom row). The area around the branching at 85 µs is zoomed in to demonstrate the mesh resolution (right column).

Fig. 24: Crack patterns at 70 µs (v = 33.0 m/s) for the dynamic shear loading experiment for the RVE rotated at φ = 60° on two different meshes: 20,000 triangular elements (left) and 80,000 triangular elements (right).

Fig. 25: Crack patterns at 30 µs, 50 µs, 65 µs, and 85 µs for the dynamic shear loading experiment with an impact velocity of v = 33.0 m/s for a model without a graph input (a, b, c, d) and two different polycrystals, RVE A (e, f, g, h) and RVE B (i, j, k, l). It is noted that all the parameters are identical for all the simulations except the graph input.

For the second numerical experiment, the material response was tested for different polycrystals (model type M^{H1}_{reg}) as well as for a model without any graph input (type M^{H1}_{mlp}). The aim of this experiment was to verify that the hybrid architecture and the graph input can capture the anisotropy of the polycrystal material that originates from the interactions between crystals, as expressed by the connectivity graph. The above experiment was repeated for different graph inputs and the results are demonstrated in Fig. 25. In the absence of a graph input, while there is crack propagation, the results look noisy and the direction of the propagation does not resemble that of specific RVEs, which could potentially be attributed to the model being trained on multiple polycrystal behaviors. For the models with a graph input, the difference in behaviors appears to become more apparent in the areas where branching is more prevalent, with the polycrystal affecting the crack branching phenomena. No additional anisotropy measures or crack branching criteria were utilized for these simulations. The sole additional information in the input of the material model is the weighted connectivity graph.

8 Conclusion

We introduce a machine learning method that incorporates geometric learning to extract low-dimensional descriptors from microstructures represented by weighted graphs and uses these low-dimensional descriptors to enhance a supervised learning model that predicts the stored elastic energy functional via Sobolev training. By utilizing non-Euclidean data structures, we introduce these weighted graphs as new descriptors for geometric learning such that the hybrid deep learning can produce an energy functional that leverages the rich microstructural information not describable by classical Euclidean descriptors, such as porosity and density. To overcome the potential spurious oscillations of the learned functions due to the lack of constraints on their derivatives, we adopt Sobolev training, and the resultant hyperelastic energy functional is more accurate and smoother compared to those obtained with classical machine learning techniques. This work also lays the foundation for several new potential research directions. For instance, the energy functional approach can be extended to a variational constitutive update framework where a discrete Lagrangian can be constructed incrementally to predict path-dependent behaviors. Furthermore, more sophisticated geometric learning methods that involve directed graphs (for representing hierarchical information), edge-weighted graphs (for representing attributes of grain contacts), and the evolution of the graphs (for path-dependent behaviors) will provide a fuller picture of the relationships between the topology of microstructures and the resultant macroscopic responses.

9 Acknowledgments

The authors would like to thank the two anonymous reviewers for their insightful feedback and their many suggestions that helped improve the quality of this paper. The authors are supported by the NSF CAREER grant from the Mechanics of Materials and Structures program at the National Science Foundation under grant contracts CMMI-1846875 and OAC-1940203, and by the Dynamic Materials and Interactions Program from the Air Force Office of Scientific Research under grant contracts FA9550-17-1-0169 and FA9550-19-1-0318. This support is gratefully acknowledged. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsors, including the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

A Appendix: Graph theory terminologies and definitions

In this section, a brief review of several terms of graph theory is provided to facilitate the illustration of the concepts in this current work. More elaborate descriptions can be found in Graham et al. (1989); West et al. (2001); Bang-Jensen and Gutin (2008).

Definition 1 A graph is a two-tuple G = (V, E) where V = {v1, ..., vN} is a non-empty vertex set (also referred to as nodes) and E ⊆ V × V is an edge set. To define a graph, there exists a relation that associates each edge with two vertices (not necessarily distinct). These two vertices are called the edge's endpoints. The pair of endpoints can either be unordered or ordered.

Definition 2 An undirected graph is a graph whose edge set E ⊆ V × V connects unordered pairs of vertices.

Fig. 26: Different types of graphs: (a) undirected (simple) binary graph, (b) directed binary graph, (c) edge-weighted undirected graph, (d) node-weighted undirected graph.

Definition 3 A loop is an edge whose endpoint vertices are the same. When all the nodes in the graph are in a loop with themselves, the graph is referred to as allowing self-loops.

Definition 4 Multiple edges are edges having the same pair of endpoint vertices.

Definition 5 A simple graph is a graph that does not have loops or multiple edges.

Definition 6 Two vertices that are connected by an edge are referred to as adjacent or as neighbors.

Definition 7 The term weighted graph traditionally refers to a graph whose edges are associated with an edge-weight function w_ij : E → R^n with (i, j) ∈ E that maps all edges in E onto a set of real numbers. Here, n is the total number of edge weights, and each set of edge weights can be represented by a matrix W with components w_ij.

In this current work, unless otherwise stated, we will be referring to weighted graphs as graphs weighted at the vertices - each node carries information as a set of weights that quantify features of microstructures. All vertices are associated with a vertex-weight function f_v : V → R^D with v ∈ V that maps all vertices in V onto a set of real numbers, where D is the number of weights (features). The node weights can be represented by an N × D matrix X with components x_ik, where the index i ∈ [1, ..., N] represents the node and the index k ∈ [1, ..., D] represents the type of node weight (feature).

Definition 8 A graph whose edges are unweighted (w_e = 1 ∀ e ∈ E) can be called a binary graph.

To facilitate the description of graph structures, several terms for representing graphs are introduced:

Definition 9 The adjacency matrix A of a graph G is the N × N matrix whose entry α_ij is the number of edges in G with endpoints {v_i, v_j}, as shown in Eq. 3.

Definition 10 If the vertex v is an endpoint of edge e, then v and e are incident. The degree d of a vertex v is the number of incident edges. The degree matrix D of a graph G is the N × N diagonal matrix with diagonal entries d_i equal to the degree of vertex v_i, as shown in Eq. 4.

Definition 11 An isomorphism from a graph G to another graph H is a bijection m that maps V(G) to V(H) and E(G) to E(H) such that each edge of G with endpoints u and v is mapped to an edge with endpoints m(u) and m(v). Applying the same permutation to both the rows and the columns of the adjacency matrix of graph G results in the adjacency matrix of an isomorphic graph H.
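The permutation property stated in Definition 11 can be checked numerically. The sketch below uses a hypothetical 3-node star graph and NumPy; the graph and the relabeling are illustrative assumptions, not part of the example in this paper.

```python
import numpy as np

# Adjacency matrix of a small star graph G: edges (v1, v2) and (v1, v3)
A_G = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])

# A bijection m on the vertices, encoded as a permutation matrix P.
# Here m swaps v1 and v2 and fixes v3.
perm = [1, 0, 2]
P = np.eye(3, dtype=int)[perm]

# Applying the same permutation to rows and columns of A_G yields the
# adjacency matrix of an isomorphic graph H (cf. Definition 11).
A_H = P @ A_G @ P.T
print(A_H)

# Isomorphic graphs share the same multiset of vertex degrees.
assert sorted(A_G.sum(axis=0).tolist()) == sorted(A_H.sum(axis=0).tolist())
```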

Definition 12 The unnormalized Laplacian operator ∆ is defined such that:

\[
(\Delta f)_i = \sum_{j:(i,j)\in E} w_{ij}\,(f_i - f_j) \qquad (61)
\]
\[
= f_i \sum_{j:(i,j)\in E} w_{ij} - \sum_{j:(i,j)\in E} w_{ij}\, f_j. \qquad (62)
\]

By writing the equation above in matrix form, the unnormalized Laplacian matrix ∆ of a graph G is the N × N positive semi-definite matrix defined as ∆ = D − W.

In this current work, binary graphs will be used; thus, the equivalent expression is used for the unnormalized Laplacian matrix L, defined as L = D − A, with the entries l_ij calculated as:

\[
l_{ij} =
\begin{cases}
d_i, & i = j\\
-1, & i \neq j \text{ and } v_i \text{ is adjacent to } v_j\\
0, & \text{otherwise.}
\end{cases} \qquad (63)
\]

Definition 13 For binary graphs, the symmetric normalized Laplacian matrix L^{sym} of a graph G is the N × N matrix defined as:

\[
L^{sym} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}. \qquad (64)
\]

The entries l^{sym}_{ij} of the matrix L^{sym} are shown in Eq. 5.
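Definitions 12 and 13 can be verified on a small example: the entry-wise sum in Eqs. (61)-(62) agrees with the matrix form L = D − A, and L^{sym} matches both expressions in Eq. (64). The 4-node toy graph below is an illustrative assumption, not an example from this paper.

```python
import numpy as np

# A small undirected binary graph on 4 nodes (assumed toy example)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))

# Unnormalized Laplacian L = D - A (Definition 12, binary weights)
L = D - A

# Entry-wise action (Delta f)_i = sum_j w_ij (f_i - f_j) agrees with L @ f
f = np.array([1.0, 2.0, 3.0, 4.0])
Delta_f = np.array([sum(A[i, j] * (f[i] - f[j]) for j in range(4))
                    for i in range(4)])
assert np.allclose(Delta_f, L @ f)

# Symmetric normalized Laplacian (Definition 13): both forms of Eq. (64)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt
assert np.allclose(L_sym, np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt)

# L is positive semi-definite: all eigenvalues are non-negative
assert np.all(np.linalg.eigvalsh(L) >= -1e-12)
```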

B Appendix: Sample problem of graph representation of polycrystal microstructures

To demonstrate how the graphs used to represent a polycrystalline assembly are generated, we introduce a simple example where an assembly consisting of 5 crystals, shown in Fig. 2(a), is converted into a node-weighted graph. Each node of the graph represents a crystal. An edge is defined between two nodes if they are connected, i.e., share a surface. The graph is undirected, meaning that no direction is specified for the edges. The vertex set V and edge set E for this specific graph are V = {v1, v2, v3, v4, v5} and E = {e12, e23, e34, e35, e45}, respectively.

An undirected graph can be represented by an adjacency matrix A (cf. Def. 9) that holds information on the connectivity of the nodes. The entries of the adjacency matrix A, in this case, are binary: each entry of the matrix is 0 if an edge does not exist between two nodes and 1 if it does. Thus, for the example in Fig. 2, crystals 1 and 2 are connected, so the entries (1, 2) and (2, 1) of the matrix A are 1, while crystals 1 and 3 are not, so the entries (1, 3) and (3, 1) are 0, and so on. If the graph allows self-loops, then the entries on the diagonal of the matrix are equal to 1 and the adjacency matrix with self-loops is defined as Â = A + I. The complete symmetric matrices A and Â for this example are:

\[
A =
\begin{bmatrix}
0 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 1\\
0 & 0 & 1 & 1 & 0
\end{bmatrix}, \qquad
\hat{A} = A + I =
\begin{bmatrix}
1 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 0 & 0\\
0 & 1 & 1 & 1 & 1\\
0 & 0 & 1 & 1 & 1\\
0 & 0 & 1 & 1 & 1
\end{bmatrix}. \qquad (65)
\]

A diagonal degree matrix D can also be useful to describe a graph representation. The degree matrix D only has diagonal terms, each equal to the number of neighbors of the node represented in that row. The diagonal terms can simply be calculated by summing all the entries in each row of the adjacency matrix. It is noted that, when self-loops are allowed, a node is a neighbor of itself and thus must be added to the total number of neighbors for each node. The degree matrix D for the example graph in Fig. 2 would be:

\[
D =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 2 & 0 & 0 & 0\\
0 & 0 & 3 & 0 & 0\\
0 & 0 & 0 & 2 & 0\\
0 & 0 & 0 & 0 & 2
\end{bmatrix}. \qquad (66)
\]

The polycrystal connectivity graph can be represented by its graph Laplacian matrix L, defined as L = D − A, as well as by the normalized symmetric graph Laplacian matrix L^{sym} = D^{−1/2} L D^{−1/2}. The two matrices for the example of Fig. 2 are calculated below:

\[
L =
\begin{bmatrix}
1 & -1 & 0 & 0 & 0\\
-1 & 2 & -1 & 0 & 0\\
0 & -1 & 3 & -1 & -1\\
0 & 0 & -1 & 2 & -1\\
0 & 0 & -1 & -1 & 2
\end{bmatrix}, \qquad
L^{sym} =
\begin{bmatrix}
1 & -\frac{\sqrt{2}}{2} & 0 & 0 & 0\\
-\frac{\sqrt{2}}{2} & 1 & -\frac{\sqrt{6}}{6} & 0 & 0\\
0 & -\frac{\sqrt{6}}{6} & 1 & -\frac{\sqrt{6}}{6} & -\frac{\sqrt{6}}{6}\\
0 & 0 & -\frac{\sqrt{6}}{6} & 1 & -\frac{1}{2}\\
0 & 0 & -\frac{\sqrt{6}}{6} & -\frac{1}{2} & 1
\end{bmatrix}. \qquad (67)
\]
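The matrices of Eqs. (65)-(67) can be reproduced programmatically from the edge set alone. The following NumPy sketch rebuilds A, Â, D, L, and L^{sym} for the 5-crystal example; NumPy itself is an implementation choice, not something prescribed by the paper.

```python
import numpy as np

# Edge set of the 5-crystal example: E = {e12, e23, e34, e35, e45}
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
N = 5

# Binary adjacency matrix A (symmetric, no self-loops) of Eq. (65)
A = np.zeros((N, N))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

A_hat = A + np.eye(N)          # adjacency with self-loops, A_hat = A + I
D = np.diag(A.sum(axis=1))     # degree matrix of Eq. (66)
L = D - A                      # graph Laplacian of Eq. (67)

# Symmetric normalized Laplacian L_sym = D^{-1/2} L D^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt

print(np.diag(D))   # degrees of the 5 crystals
print(L_sym[0, 1])  # entry (1,2): -sqrt(2)/2
```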

Assume that, for the example in Fig. 2, there is information available for two features A and B for each crystal in the graph that will be used as node weights - this could be the volume of each crystal, the orientations, and so on. The node weights for each crystal represented by a vertex v_i can be described as a vector f_i = (f_A, f_B), such that each component of the vector corresponds to a feature of the i-th node. The node features can all be represented in a feature matrix X where each row corresponds to a node and each column corresponds to a feature. For the example in question, the feature matrix would be:

\[
X =
\begin{bmatrix}
f_{A1} & f_{B1}\\
f_{A2} & f_{B2}\\
f_{A3} & f_{B3}\\
f_{A4} & f_{B4}\\
f_{A5} & f_{B5}
\end{bmatrix}. \qquad (68)
\]
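Assembling the feature matrix X of Eq. (68) amounts to stacking the per-node feature vectors row by row. In the sketch below, the numeric feature values are made up purely for illustration; only the shape and layout follow Eq. (68).

```python
import numpy as np

# Hypothetical per-crystal features (f_A, f_B) for the 5 nodes; the
# numeric values are invented for illustration only.
node_features = {1: (0.8, 30.0), 2: (1.1, 45.0), 3: (0.9, 10.0),
                 4: (1.3, 75.0), 5: (0.7, 60.0)}

# N x D feature matrix X of Eq. (68): row i holds the feature vector
# f_i = (f_A, f_B) of crystal i.
X = np.array([node_features[i] for i in sorted(node_features)])
print(X.shape)  # (5, 2)
```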

C Appendix: Database statistics

This section describes the statistics of the generated microstructures that were used for training the geometric learning based neural network model. The database consists of 150 polycrystal RVEs generated as described in Section 6.3. For every polycrystal in the database, the grain connectivity information is available in the form of adjacency matrices. For every crystal in a polycrystal RVE, information is available on the crystal features that will be used as weights for the undirected graph input. The available node features in the data set contain information on the volume, the three Euler angles (in Bunge notation), the equivalent diameter (the diameter of the sphere with the same volume as the crystal), the number of faces, the total area of the faces, the number of neighbors, and the centroid position vector of each crystal. More elaborate descriptions of the crystal features can be found in the documentation of the open-source software NEPER (Quey et al., 2011). The distribution of the polycrystal features, separated into 100 training and 50 testing cases, is demonstrated in Fig. 27.

Fig. 27: Feature distributions for the 150 polycrystals in the database, separated into 100 polycrystals used for training and 50 polycrystals used for testing.

D Appendix: Encoded feature vector dimension

The hyperparameter space of a complex architecture is too large to conduct a comprehensive hyperparameter search and provide a confident explanation of how the dimensions of the hybrid architecture affect the model's performance. The number of neurons of the encoded feature vector layer was one of the tuned hyperparameters. To provide insight into how the dimension of the encoded feature vector layer was chosen, we provide the results from three training experiments that we conducted while testing various architectures for the neural network through trial and error. We noticed that, in iterations of the network with fewer than 9 neurons, the predictions were less accurate. For feature dimensions much higher than 9 neurons, the performance did not seem to drastically improve, and, thus, they were not chosen, in order to reduce the training time of the network. In Fig. 28, we show the performance results of training the hybrid neural network architecture with encoded feature vector dimensions of 1, 9, and 32 neurons.

Fig. 28: Comparison of hybrid neural network performance for architectures with encoded feature vector

dimensions of 1, 9, and 32 neurons.

References

Han Altae-Tran, Bharath Ramsundar, Aneesh S Pappu, and Vijay Pande. Low data drug discovery with one-shot learning. ACS Central Science, 3(4):283–293, 2017.

L Anand and M Kothari. A computational procedure for rate-independent crystal plasticity. Journal of the Mechanics and Physics of Solids, 44(4):525–558, 1996.

Gerard A Ateshian and Kevin D Costa. A frame-invariant formulation of Fung elasticity. Journal of Biomechanics, 42(6):781–785, 2009.

F. Bachmann, Ralf Hielscher, and Helmut Schaeben. Texture analysis with MTEX – free and open source software toolbox. 2010. doi: 10.4028/www.scientific.net/SSP.160.63.

Jørgen Bang-Jensen and Gregory Z Gutin. Digraphs: Theory, Algorithms and Applications. Springer Science & Business Media, 2008.

Ted Belytschko, Hao Chen, Jingxiao Xu, and Goangseup Zi. Dynamic crack propagation based on loss of hyperbolicity and a new discontinuous enrichment. International Journal for Numerical Methods in Engineering, 58(12):1873–1905, 2003. ISSN 1097-0207. doi: 10.1002/nme.941.

Yoshua Bengio and Yves Grandvalet. No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research, 5(Sep):1089–1105, 2004.

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

MA Bessa, R Bostanabad, Z Liu, A Hu, Daniel W Apley, C Brinson, Wei Chen, and Wing Kam Liu. A framework for data-driven analysis of materials under uncertainty: Countering the curse of dimensionality. Computer Methods in Applied Mechanics and Engineering, 320:633–667, 2017.

Michael J Borden, Clemens V Verhoosel, Michael A Scott, Thomas JR Hughes, and Chad M Landis. A phase-field description of dynamic brittle fracture. Computer Methods in Applied Mechanics and Engineering, 217:77–95, 2012a.

M.J. Borden, C.V. Verhoosel, M.A. Scott, T.J.R. Hughes, and C.M. Landis. A phase-field description of dynamic brittle fracture. Computer Methods in Applied Mechanics and Engineering, 217-220:77–95, 2012b. ISSN 00457825. doi: 10.1016/j.cma.2012.01.008.

Ronaldo I Borja. Plasticity. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. ISBN 978-3-642-38546-9. doi: 10.1007/978-3-642-38547-6.

Ronaldo I Borja and Seung R Lee. Cam-Clay plasticity, part 1: Implicit integration of elasto-plastic constitutive relations. Computer Methods in Applied Mechanics and Engineering, 78(1):49–72, 1990.

Ronaldo I Borja, Claudio Tamagnini, and Angelo Amorosi. Coupling plasticity and energy-conserving elasticity models for clays. Journal of Geotechnical and Geoenvironmental Engineering, 123(10):948–957, 1997.

Ronaldo I Borja, Chao-Hua Lin, and Francisco J Montáns. Cam-Clay plasticity, part IV: Implicit integration of anisotropic bounding surface model with nonlinear hyperelasticity and ellipsoidal loading function. Computer Methods in Applied Mechanics and Engineering, 190(26-27):3293–3323, 2001.

Blaise Bourdin, Gilles A Francfort, and Jean-Jacques Marigo. The variational approach to fracture. Journal of Elasticity, 91(1-3):5–148, 2008.

François Chollet et al. Keras. https://keras.io, 2015.

Wojciech M Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Swirszcz, and Razvan Pascanu. Sobolev training for neural networks. In Advances in Neural Information Processing Systems, pages 4278–4287, 2017.

Eduardo A de Souza Neto, Djordje Peric, and David RJ Owen. Computational Methods for Plasticity: Theory and Applications. John Wiley & Sons, 2011.

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional Neural Networks on Graphs