Predicting the effective mechanical property of heterogeneous materials by image based modeling and deep learning

Comput. Methods Appl. Mech. Engrg. 347 (2019) 735–753
Xiang Li a, Zhanli Liu a,*, Shaoqing Cui b, Chengcheng Luo a, Chenfeng Li b, Zhuo Zhuang a
a Applied Mechanics Lab., Department of Engineering Mechanics, School of Aerospace, Tsinghua University, Beijing 100084, China
b Zienkiewicz Centre for Computational Engineering, Swansea University, Swansea, United Kingdom
Received 27 September 2018; received in revised form 22 December 2018; accepted 2 January 2019
Available online 14 January 2019
Highlights
Predict the mechanical properties of heterogeneous materials by deep learning.
Generate numerous training samples based on stochastic reconstruction.
Transform sample images to finite element models by image processing.
An artificial neural network to predict mechanical properties by material structure.
The novel method is accurate and efficient in predicting mechanical properties.
Abstract
In contrast to the composition uniformity of homogeneous materials, heterogeneous materials are normally composed of
two or more distinctive constituents. It is usually recognized that the effective material property of a heterogeneous material
is related to the mechanical property and the distribution pattern of each forming constituent. However, to establish an explicit
relationship between the macroscale mechanical property and the microstructure appears to be complicated. On the other hand,
machine learning methods are broadly employed to excavate inherent rules and correlations based on a significant amount of data
samples. Specifically, deep neural networks are established to deal with situations where input–output mappings are extensively
complex. In this paper, a method is proposed to establish the implicit mapping between the effective mechanical property and the
mesoscale structure of heterogeneous materials. Shale is employed in this paper as an example to illustrate the method. At the
mesoscale, a shale sample is a complex heterogeneous composite that consists of multiple mineral constituents. The mechanical
properties of each mineral constituent vary significantly, and mineral constituents are distributed in an utterly random manner
within shale samples. Large quantities of shale samples are generated based on mesoscale scanning electron microscopy images
using a stochastic reconstruction algorithm. Image processing techniques are employed to transform the shale sample images to
finite element models. Finite element analysis is utilized to evaluate the effective mechanical properties of the shale samples. A
convolutional neural network is trained based on the images of stochastic shale samples and their effective moduli. The trained
network is validated to be able to predict the effective moduli of real shale samples accurately and efficiently. Not limited to shale,
the proposed method can be further extended to predict effective mechanical properties of other heterogeneous materials.
© 2019 Elsevier B.V. All rights reserved.
Keywords: Heterogeneous materials; Shale; Deep learning; Stochastic reconstruction
* Corresponding author.
E-mail address: liuzhanli@tsinghua.edu.cn (Z.L. Liu).
https://doi.org/10.1016/j.cma.2019.01.005
0045-7825/© 2019 Elsevier B.V. All rights reserved.
736 X. Li, Z.L. Liu, S. Cui et al. / Computer Methods in Applied Mechanics and Engineering 347 (2019) 735–753
1. Introduction
The complex microstructure and various forming constituents of heterogeneous materials have long posed
difficulties to the study of their effective mechanical properties. Researchers have studied the effective mechanical
properties, such as elastic modulus and thermal conductivity, based on analytical approaches [1–5]. These approaches
are generally developed based on simplified models and statistical data. In this paper, a new method is proposed
to predict the effective mechanical properties of heterogeneous materials. Unlike the studies mentioned above,
the proposed method takes advantage of computational mechanics and deep learning methods. A representative
heterogeneous material, shale, is employed in this paper for illustration.
Shale is a multi-phase, multi-scale fine grained sedimentary rock. Shale makes up around 75% of sedimentary
basins of the earth and is critical to petroleum and natural gas exploitation [6]. Shale gas is normally adsorbed onto
organic kerogen of shale rock. The recent development of horizontal drilling and hydraulic fracturing techniques
has made the large-scale production of shale gas possible. These techniques are closely related to the shale’s macroscopic
mechanical properties such as effective modulus, hardness, and strength. However, experimental research reveals that
the macroscopic mechanical properties of different shale samples vary drastically [7]. For this reason, the research
on mechanical properties of shales is sometimes conducted from a mesoscale point of view [8].
At the mesoscale, shale is considered a complex heterogeneous composite that consists of multiple
mineral constituents [9]. These mineral constituents include quartz, illite, feldspar, calcite, kaolinite, pyrite,
dolomite, kerogen, etc. [10,11]. Nanoindentation experiments show that the moduli of the mineral constituents differ
widely [8]. As an example, the moduli of quartz and feldspar are relatively high, and the
moduli of kaolinite and illite are considerably lower in comparison. Understanding the correlation between the shale
mesoscale structure and the macroscopic mechanical properties is vital for the engineering design of shale
gas extraction. Because of the complex mesoscale structure of shale, revealing this implicit correlation by
analytical means seems impractical. However, with the advancement of computer technology, this
problem becomes much more accessible with the help of computational mechanics approaches.
The development of machine learning has been strongly driven by advances in computer science. Some
of the early machine learning methods include the perceptron [12], genetic programming [13], and the Monte Carlo
method [14]. After the 1990s, the proper orthogonal decomposition [15], adaptive boosting [16], support vector machine
[17,18], and particle swarm optimization [19] methods were established and applied to various engineering fields.
Besides the previously mentioned machine learning methods, the artificial neural network (ANN) has become a
significant branch of machine learning. In 1943, McCulloch and Pitts [20] proposed the mathematical framework of
the artificial neuron inspired by the characteristics of nervous activity. Rosenblatt [12] established the perceptron in
1958, which is generally recognized as the predecessor of the modern artificial neural network. The prototypes of the
artificial neural network were the models named “ADALINE” and “MADALINE”, developed at Stanford University.
The limitation of the early network models is the difficulty of solving nonlinear problems due to their simple linear
architectures. Ivakhnenko [21] made the earliest efforts to develop deep learning models. Rumelhart et al. [22] and
other researchers came up with the back propagation (BP) algorithm, which later became the backbone of deep
neural networks. In 2006, Hinton and Salakhutdinov [23] proposed the framework for deep learning based on the
concept of Deep Belief Networks. Due to the nonlinear activation function and hidden neurons, deep artificial neural
networks are able to extract implicit and complex data mappings based on numerous training data. For this reason,
deep artificial neural networks have been adopted in various applications of mechanics and engineering.
Ghaboussi et al. [24], Jung and Ghaboussi [25], Ji et al. [26], Furukawa and Yagawa [27], Hashash et al. [28],
Sun et al. [29] implemented artificial neural networks to study the constitutive models of solid materials. Faller and
Schreck [30], Wang and Liao [31], Yuhong and Wenxin [32], Butz and Von Stryk [33], Beigzadeh and Rahimi [34],
Mi et al. [35] studied fluid characteristics using numerical approaches assisted by artificial neural networks.
In recent years, researchers have employed machine learning models in the study of heterogeneous materials.
Sundararaghavan and Zabaras [36] develop a framework to classify and reconstruct 3-D heterogeneous material
based on support vector machines. Liu et al. [37] proposed approaches based on machine learning to predict elastic
strain fields in a 3-D microstructure volume element of heterogeneous composite materials. Kondo et al. [38] employ
convolutional neural networks to establish the mapping between the microstructure and the ionic conductivity of
ceramic. The networks are trained by supervised learning based on cropped microscope scanning images. The data
labels are the macroscopic ionic conductivities measured by impedance spectroscopy. Cang et al. [39] propose a
method mainly to generate stochastic microstructure samples based on variational auto-encoder. A predictive model
Fig. 1. The workflow of establishing a deep neural network for calculating the shale modulus.
based on convolutional neural networks is also proposed to reveal the data mapping between microstructure and
effective properties. The network is trained by microscale two-phase sandstone samples. The labels of the samples
are effective material properties calculated based on analytical approximations. Bessa et al. [40] propose a data-
driven computational framework to design structures and materials. Sample data that represent microstructures,
material properties, and boundary conditions are extracted; a database of material responses is established based on
computational analyses; augmented by machine learning algorithms, the mapping between descriptors of sample data
and the concerned material properties is constructed, and new designs or response models can be further obtained.
In this paper, a framework for predicting the effective material properties of multi-phase heterogeneous materials is
proposed. For a demonstration of this framework, a convolutional neural network is established to exploit the implicit
mappings between the mesoscale structures and the effective moduli of shale samples. Scanning electron microscope
(SEM) is employed to obtain mesoscale structure images of shale. A simplified model is introduced to transform SEM
images to 5-phase heterogeneous shale samples. Large quantities of shale samples are generated based on the 5-phase
samples using a stochastic reconstruction algorithm. Finite element method is utilized to calculate the stochastic shale
samples’ effective moduli, which are further used as labels of training samples. A deep convolutional neural network
is trained based on the images of stochastic shale samples and their effective moduli. The trained network is further
employed to predict the effective moduli of real shale samples. The workflow of this process is depicted in Fig. 1.
Each portion presented in Fig. 1 will be illustrated in detail in the following sections.
This paper is organized as follows. The mesoscale structure of shale and the mechanical properties
of its forming constituents are illustrated in Section 2. The stochastic reconstruction method and the finite element
analysis to calculate the effective moduli of shale samples are discussed in Section 3. In Section 4, the principal theory
of deep neural networks is demonstrated. The artificial neural network architecture used for modulus prediction is
elaborated. The prediction accuracy of the deep neural network is discussed, and several conclusions are drawn based
on the prediction result.
2. Shale mesoscale structure and mechanical properties of constituents
As mentioned in the previous section, the objective of this research is to develop a new approach to predict the
moduli of mesoscale shale samples. In this section, we first discuss the mineral constituents and the structures of
Fig. 2. The macroscopic presentation of shale. (a) A laboratory shale sample. (b) The structure of a typical shale formation laminate.
mesoscale shale. Then, a simplified mesoscale shale model that includes five main constituents is explained. The
nanoindentation test, which is utilized to measure the modulus of each main constituent, is introduced in the final
section.
2.1. Shale mesoscale structure and a simplified mesoscale model
Shale is one of the most commonly found sedimentary rocks and accounts for about 50% of all sedimentary rocks on
earth [41,42]. From the macroscale point of view, a shale sample normally appears to be in dark gray color. Fig. 2(a)
shows a cylindrical shale sample for laboratory research. The length of the sample is about 5 cm, and the radius is
about 1.2 cm. Fig. 2(b) shows the typical laminated structure of shale formation. The laminated pattern is formed due
to the sedimentation process.
From a mesoscale point of view, a shale sample is usually considered as a complex heterogeneous material that
consists of multiple mineral constituents. It is normally comprised of quartz, calcite, smectite, pyrite, clay, organic
matter and other minerals [7]. Energy dispersive X-ray spectroscopy (EDX), focused ion beam milling (FIB) and
scanning electron microscope (SEM) are often utilized to identify the mineral constituents and characterize the
heterogeneity of shale samples [43–47].
Studies reveal that the deformation and damage characteristics of shale are related to the mechanical properties
and the distributions of its forming constituents. Therefore, it is important to study the distributions and small-scale
mechanical properties of the forming constituents to understand the upscaling mechanical properties of macroscopic
shale samples [48–50].
In this research, scanning electron microscope (SEM) is employed to investigate the distribution of each mineral
constituent of a shale sample. The sample is scanned for 13.5 h to generate an SEM image. The scanning voltage is
15 kV. The scanning resolution is 1 µm.
The SEM image is shown in Fig. 3(a). It can be observed that the forming constituents of the sample include quartz,
feldspar, pyrite, calcite, dolomite, kaolinite, illite, kerogen, etc. The distribution manner of the constituents appears to
be random, which brings in strong heterogeneity and anisotropy to the shale samples.
In engineering and geomechanics applications, mineral constituents with similar material properties are usually
grouped for simplicity. A widely accepted shale model classifies various mineral constituents into four categories
[51–53]. The first category is abbreviated as QFP. It contains quartz, feldspar, and pyrite, and they are the most
commonly found silicate minerals in shale [54]. The modulus and hardness of QFP are generally highest among all
constituents. The second category is clay, which contains kaolinite, illite, chlorite, and montmorillonite. The third
category is the organic matter, which is also known as kerogen. The modulus and hardness of kerogen are the lowest.
Shale gas is normally adsorbed on kerogen, and the proportion of kerogen is relatively low. The final category contains
all other mineral constituents that are not mentioned above, and it is sometimes referred to as the matrix phase.
Fig. 3. (a) The SEM image of a mesoscale shale sample. (b) The corresponding simplified 5-phase model.
Table 1
The forming constituents of each phase of the mesoscale shale model.
Phase       Forming constituents
Silicate    Quartz, feldspar, pyrite
Carbonate   Calcite, dolomite
Clay        Kaolinite, illite, chlorite, montmorillonite
Kerogen     Organic matter
Others      Other mineral constituents
In this research, we take the characteristics of the shale formation in southwest China into account. An additional
phase, carbonate, is introduced to the aforementioned mesoscale shale model. The carbonate phase contains calcite
and dolomite. Hence, a 5-phase mesoscale shale model is employed in this paper. The 5 phases are silicate, carbonate,
clay, kerogen, and matrix (listed as “Others” in Table 1). An SEM image of a shale sample and the corresponding simplified 5-phase
model are shown in Fig. 3(a) and (b), respectively. The forming constituents of each phase are listed in Table 1.
2.2. Nanoindentation and modulus measurement
As previously mentioned, the deformation and damage characteristics of shale are related to the mechanical
properties of its forming constituents. Therefore, studying the mechanical properties of each primary constituent of
shale is important for understanding the macroscopic mechanical properties of shale. Instrumented nanoindentation is
usually employed to investigate shale’s basic mechanical properties, such as modulus and hardness [55–57].
In this research, a series of nanoindentation tests are conducted on different locations of shale samples. The probe
indents shale samples at locations where quartz, calcite, clay and organic matter are concentrated, respectively. The
indentation depths range from 0.5 to 5 µm.
The measured modulus of each constituent is given in Fig. 4. It can be observed that the measured moduli of all
these constituents converge towards an intermediate level as the indentation depth increases. It is mainly because
the surrounding constituents and the supporting matrix tend to disturb the measurement as the indentation depth
increases [8]. For this reason, we select the measured modulus when the indentation is initially applied as the effective
modulus of each constituent. For simplicity, we use the modulus of quartz and calcite to represent the modulus of
silicate and carbonate phases, respectively. The moduli of silicate, carbonate, clay, organic matter and matrix are
summarized in Table 2. It can be concluded that silicate is the stiffest constituent, and the organic matter is the softest.
It is in agreement with the results from other researchers’ work [8,56].
3. Stochastic reconstruction method and finite element analysis
The objective of this research is to establish an artificial neural network to predict the effective moduli of shale
samples. A large number of shale samples are required to train the network. A stochastic reconstruction algorithm is
Fig. 4. The moduli of shale constituents measured based on nanoindentation tests.
Table 2
The measured moduli of the primary constituents in shale.
                 Silicate   Carbonate   Clay   Kerogen   Matrix
Modulus (GPa)    89.6       65.8        22.3   9.2       12.392
employed to generate these shale samples. The stochastic sample images are then transformed into finite element models.
The effective moduli of shale samples are evaluated based on finite element analysis.
3.1. Stochastic reconstruction method
Because each scan is time-consuming, it is not practical to obtain a large number of shale samples by SEM scanning.
Hence, the stochastic reconstruction technique is employed to generate stochastic shale samples in this study. The
basic idea of the stochastic reconstruction technique is to rapidly reproduce the reference sample based on the
statistical information of morphology. In this way, the effective statistical characteristics of the stochastic samples
and the reference sample are matched.
Various statistical reconstruction methods have been proposed with different performances and applicabilities. The
stochastic reconstruction technique adopted in this study is the stochastic optimization reconstruction algorithm [52],
which minimizes the objective function
$$E = \sum_i w_i \sum_x \left[ D_0^i(x) - D_s^i(x) \right]^2 \tag{1}$$
In the equation, $D_0^i(x)$ is the $i$th statistical descriptor measured from the reference sample. $D_s^i(x)$ is the same
descriptor measured from the stochastic sample. $w_i$ is the weight parameter of the $i$th descriptor. Various descriptors
have been developed to capture different microscale morphological characteristics. Some of the descriptors are two-
point correlation function [53], two-point cluster correlation function [58], lineal-path function [59,60], etc. In this
study, the two-point correlation function and lineal-path function are considered as the target descriptors to capture
the basic statistical information from the reference samples.
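As a rough illustration of Eq. (1), the sketch below evaluates the objective using only the two-point correlation function as the descriptor, one per phase. It assumes periodic boundaries (so that an FFT autocorrelation can be used) and integer phase labels; the function names and the omission of the lineal-path function are simplifications for illustration, not the implementation used in this study.

```python
import numpy as np

def two_point_correlation(phase_mask):
    """Two-point correlation S2 of a 2-D binary mask, assuming periodic boundaries.

    Entry [dy, dx] is the probability that two pixels separated by (dy, dx)
    both belong to the phase. Entry [0, 0] equals the phase fraction.
    """
    f = np.fft.fft2(phase_mask.astype(float))
    autocorr = np.fft.ifft2(f * np.conj(f)).real
    return autocorr / phase_mask.size

def reconstruction_energy(reference, sample, phases, weights):
    """Eq. (1) with one S2 descriptor per phase label (illustrative choice)."""
    energy = 0.0
    for phase, w in zip(phases, weights):
        d0 = two_point_correlation(reference == phase)  # reference descriptor
        ds = two_point_correlation(sample == phase)     # stochastic-sample descriptor
        energy += w * np.sum((d0 - ds) ** 2)
    return energy
```

A sample that reproduces the reference exactly gives an energy of zero, and any mismatch in the correlation functions increases it.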
By optimization, the stochastic sample evolves in a way that its statistical characteristics gradually approach
those of the reference sample. The simulated annealing algorithm is adopted in this study to optimize the stochastic
samples. The simulated annealing algorithm was introduced into the stochastic reconstruction technique by Yeong
and Torquato [52]. Firstly, a random guess is conducted to generate an initial stochastic sample. The initial sample
is then iteratively evolved based on a spin-exchange strategy. In other words, two image pixels that represent two
different material phases are swapped in each iterative step. After that, the new sample is accepted with a probability
of $\min\{\exp(-\Delta E/T),\, 1\}$. In the formulation, $\Delta E$ is the difference of the value of objective function $E$ between the
old and the new sample. $T$ represents the temperature of the simulated annealing process. By gradually decreasing
the temperature $T$ in the prescribed annealing algorithm, the evolvement iterations are repeated until the termination
criterion is met. Based on the proposed stochastic reconstruction method, a large number of stochastic shale samples
are generated.
Fig. 5. The workflow to generate the finite element model based on a mesoscale shale sample image.
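The annealing loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the descriptor is the periodic two-point correlation only, the cooling schedule is geometric, and the termination criterion is a fixed step count. The names `anneal`, `t0` and `cooling` are hypothetical, and the full energy is recomputed after every swap, which is simple but slow.

```python
import numpy as np

def s2(mask):
    """Periodic two-point correlation of a binary mask via FFT."""
    f = np.fft.fft2(mask.astype(float))
    return np.fft.ifft2(f * np.conj(f)).real / mask.size

def energy(reference, sample, phases):
    """Objective E with unit weights and one S2 descriptor per phase."""
    return sum(np.sum((s2(reference == p) - s2(sample == p)) ** 2) for p in phases)

def anneal(reference, n_steps=2000, t0=1e-3, cooling=0.995, seed=0):
    """Return (sample, initial energy, final energy)."""
    rng = np.random.default_rng(seed)
    phases = np.unique(reference)
    # Random initial guess: shuffling pixels keeps the phase fractions fixed.
    flat = rng.permutation(reference.ravel())
    sample = flat.reshape(reference.shape)  # view sharing memory with flat
    e_old = energy(reference, sample, phases)
    e_init = e_old
    temperature = t0
    for _ in range(n_steps):
        i, j = rng.integers(0, flat.size, size=2)
        if flat[i] == flat[j]:
            continue  # a swap is only meaningful between different phases
        flat[i], flat[j] = flat[j], flat[i]
        e_new = energy(reference, sample, phases)
        delta = e_new - e_old
        # Accept with probability min{exp(-delta / T), 1}.
        if delta <= 0 or rng.random() < np.exp(-delta / temperature):
            e_old = e_new
        else:
            flat[i], flat[j] = flat[j], flat[i]  # reject: undo the swap
        temperature *= cooling
    return sample, e_init, e_old
```

Because pixels are only exchanged, the phase fractions of the initial guess are preserved throughout; a practical implementation would update the descriptors incrementally instead of recomputing them.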
3.2. Creating finite element models based on mesoscale shale images
The method to generate stochastic shale samples is illustrated in the previous section. Numerous stochastic
shale samples are generated based on the proposed method. These stochastic samples are further transformed to
corresponding finite element models to calculate the effective moduli. In this section, we briefly discuss the procedure
to generate a finite element model based on a stochastic shale sample.
In this research, an application is developed to generate finite element models based on stochastic shale samples. The
workflow to generate the finite element model is shown in Fig. 5. A stochastic shale sample image is composed
of pixels, and each color in the image represents a different mineral constituent. The application first generates
a finite element model that has the same number of elements as there are pixels in the shale image. Quadrilateral plane
strain elements are employed in this study. Then, the application scans the shale sample image pixel by pixel, and it
keeps track of the constituent type represented by each pixel. Each pixel is transformed into an element in the finite
element model. The location of each pixel of the shale image is utilized to generate the coordinates of each node.
Besides, five different node and element sets are generated and updated based on the color of the pixels. These sets are
referred to as nodes and elements of silicate, carbonate, clay, kerogen, and matrix phase, respectively. Each element
set is assigned with a distinctive material property that corresponds to the constituent type that it represents. Finally,
initial and boundary conditions are prescribed, and the finite element model of a shale sample is generated.
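The pixel-to-element mapping described in this section can be sketched as below. The sketch assumes the image has already been converted to an integer label array with codes 0 to 4 for the five phases (this coding, the function name `image_to_mesh` and the unit pixel size are illustrative assumptions). The moduli are taken from Table 2. Only element sets are built here; node sets, boundary conditions and the solver input format are left out.

```python
import numpy as np

# Phase names and moduli follow Table 2; the integer coding 0..4 is an assumption.
PHASES = ("silicate", "carbonate", "clay", "kerogen", "matrix")
MODULI_GPA = {"silicate": 89.6, "carbonate": 65.8, "clay": 22.3,
              "kerogen": 9.2, "matrix": 12.392}

def image_to_mesh(labels, pixel_size=1.0):
    """Turn a 2-D label image into nodes, 4-node quadrilaterals and per-phase element sets."""
    n_rows, n_cols = labels.shape
    # One node at every pixel corner: (n_rows + 1) x (n_cols + 1) grid.
    rows, cols = np.meshgrid(np.arange(n_rows + 1), np.arange(n_cols + 1), indexing="ij")
    nodes = np.column_stack([cols.ravel(), rows.ravel()]).astype(float) * pixel_size

    def node_id(r, c):
        return r * (n_cols + 1) + c

    # One element per pixel, nodes listed counter-clockwise in (x, y) = (column, row).
    elements = np.empty((n_rows * n_cols, 4), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            elements[r * n_cols + c] = (node_id(r, c), node_id(r, c + 1),
                                        node_id(r + 1, c + 1), node_id(r + 1, c))

    # Group element indices by the phase of the pixel they came from.
    flat = labels.ravel()
    element_sets = {name: np.flatnonzero(flat == k) for k, name in enumerate(PHASES)}
    return nodes, elements, element_sets
```

Each element set can then be assigned the modulus `MODULI_GPA[name]` when the model is written out for the finite element solver.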
Finite element method is then employed to calculate the effective modulus of the sample. The scheme of the
compression test is depicted in Fig. 6(a). The left boundary is fixed in the horizontal direction. A compressional
displacement loading is applied along the horizontal direction to the right boundary. As is known, constituents are
distributed in a complex manner in a mesoscale shale model. During the loading process, the stress at each constituent
is different because of the heterogeneous material property distribution. The contour of the stress component S11 is
shown in Fig. 6(b).
After the finite element calculation, the total reaction force on the right boundary is summed. The reaction force is
then divided by the area of the right boundary to obtain the average stress, and the ratio of this stress to the applied strain gives the effective modulus. The moduli are further utilized as the
labels of training samples of an artificial neural network. It is worth mentioning that it takes about 20 s to conduct the
finite element analysis for each sample. As will be discussed in Section 4, 12,500 shale samples are used in this study.
Fig. 6. (a) The computational model of the finite element method. (b) The stress component S11 contour on the model.
The finite element analysis of all these samples is distributed to 3 desktop computers in parallel. It takes about 23 h to
calculate the modulus of all the training and testing samples.
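As a rough consistency check on these figures, assuming the 12,500 analyses are split evenly across the 3 computers with no scheduling overhead (an assumption, not stated above), the wall-clock time is approximately

```latex
\frac{12{,}500 \times 20\ \text{s}}{3} \approx 8.33 \times 10^{4}\ \text{s} \approx 23.1\ \text{h}
```

which agrees with the reported 23 h.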
4. Training and testing of an artificial neural network
In this section, we briefly talk about the basic theory of the artificial neural network. Then, the characteristics of
a convolutional neural network are discussed. The training and prediction processes in this research are explained in
the final section.
4.1. The basic theory of artificial neural network
The fundamental concept of the artificial neural network is inspired by neuroscience. The typical architecture of a
multi-layer artificial neural network is shown in Fig. 7. The first layer of the network is the input layer, and the last
layer is the output layer. The layers between the input and output layers are hidden layers [61,62].
Each circle in the figure represents a neuron. The value of each neuron is known as the activation, which is
normally represented by a real number $\sigma$ that ranges between 0 and 1. The superscript $l$ of neuron $\sigma_j^l$ represents
the layer number; the subscript $j$ represents the neuron number. In a standard fully-connected network, the lines that
interconnect between neurons are weights $w$. $w_{jk}^l$ represents the weight that connects the $k$th neuron in the $(l-1)$th
layer and the $j$th neuron in the $l$th layer.
Fig. 8 shows the process to calculate the activation of neuron $\sigma_j^l$. To obtain the value of neuron $\sigma_j^l$, a function $z$ is
first defined as a linear combination of the weights and biases, as given in Eq. (2) [63].
$$z_j^l = \sum_{i=1}^{n_{l-1}} w_{ji}^l \, \sigma_i^{l-1} + b_j^l \tag{2}$$
As previously mentioned, a disadvantage of the early network model is the difficulty to extract complex mappings
due to their simple linear architectures. In contrast, nonlinearity is introduced to overcome this disadvantage. A
nonlinear function $f$ is applied over function $z$ to calculate the activation of that specific neuron. $f$ is also known
as the activation function. Some broadly chosen activation functions include the sigmoid function and the rectified
linear unit (ReLU) function [64]. The equation of sigmoid function is given by
$$f(z) = \frac{1}{1 + e^{-z}} \tag{3}$$
The equation of rectified linear unit function is given by the maximum of 0 and $z$ as
$$f(z) = \max(0, z) \tag{4}$$
Fig. 7. The architecture of an artificial neural network.
Fig. 8. The scheme to calculate the activation value of a neuron.
Hence, the activation of the $j$th neuron in layer $l$ is calculated in Eq. (5).
$$\sigma_j^l = f\left(z_j^l\right) = f\left( \sum_{i=1}^{n_{l-1}} w_{ji}^l \, \sigma_i^{l-1} + b_j^l \right) \tag{5}$$
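Eqs. (2) to (5) amount to a matrix-vector product followed by an elementwise nonlinearity. The NumPy sketch below illustrates this for one layer and for a stack of layers; the function names and the `(weights, biases)` pair layout are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def sigmoid(z):
    """Eq. (3): sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Eq. (4): rectified linear unit."""
    return np.maximum(0.0, z)

def layer_forward(weights, biases, prev_activations, activation=sigmoid):
    """Eqs. (2) and (5) for one layer.

    weights has shape (n_l, n_{l-1}), so row j holds the weights w_ji
    feeding neuron j of the current layer.
    """
    z = weights @ prev_activations + biases
    return activation(z)

def network_forward(params, x, activation=sigmoid):
    """Apply layer_forward through a list of (weights, biases) pairs."""
    a = x
    for weights, biases in params:
        a = layer_forward(weights, biases, a, activation)
    return a
```

With the sigmoid, every activation lies strictly between 0 and 1, which matches the range quoted for $\sigma$ above; with ReLU the activations are only bounded below by 0.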
Based on the mapping scheme given in Eq. (5), a neural network is able to provide the estimated output from the
input. Since the initial values of all the weights and biases are chosen randomly, the output value is initially different
from the desired output. The difference is usually referred to as the error or cost of a neural network. The desired
output is also known as the label of training data. As an example, a broadly adopted cost function is the mean squared
error function [61,63].
$$C = \frac{1}{2n} \sum_{i=1}^{n} \left\| o_i - l_i \right\|^2 \tag{6}$$
In the equation, $n$ is the number of training samples. $o_i$ is the output value of the $i$th training sample, while $l_i$ is the
label of the $i$th training sample. In this case, the notation $\|\cdot\|$ is the L2 norm that measures the distance between vector
$o_i$ and $l_i$. As the labels of samples are fixed, the cost $C$ in Eq. (6) is a function of outputs $o$. The objective of training
an artificial neural network is to determine all the network parameters, in other words, the values of all the weights $w$
and biases $b$. The parameters are determined in a way so that the cost $C$ should be minimized.
After the cost function is given, the gradient descent algorithm is implemented to approximate the minimum of the
cost function C. For simplicity, Cis assumed to be a function of tensor v, which represents all the parameters of the
artificial neural network. The variation of Ccan be approximated by the slight variation of vas [65,66]
C C·v(7)
In the equation above, Cis the gradient vector of the cost function C.vis the slight variation of v. It is chosen
as the format in Eq. (8) so that the cost function Cis descended and approximated to 0 step by step.
v= ηC(8)
ηin Eq. (8) is known as the learning rate. The value of ηshould be cautiously selected so that the approximation
in Eq. (7) holds. The values of all the parameters of the network can be iteratively updated by
vupda te d =vηC(9)
Therefore, each component of the weights and biases of the network is updated by Eq. (10).
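$$w_i^{\mathrm{updated}} = w_i - \eta \frac{\partial C}{\partial w_i}, \qquad b_j^{\mathrm{updated}} = b_j - \eta \frac{\partial C}{\partial b_j} \tag{10}$$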
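The update rule of Eqs. (9) and (10) can be sketched in a few lines. The quadratic cost, the target values, and the names `gradient_descent` and `grad_c` are assumptions made for illustration; in a real network the gradient would come from back-propagation rather than from a closed-form expression.

```python
def gradient_descent(grad, v, eta, steps):
    """Repeatedly apply Eq. (9): v <- v - eta * grad(v)."""
    for _ in range(steps):
        g = grad(v)
        v = [vk - eta * gk for vk, gk in zip(v, g)]
    return v


# Hypothetical cost C(v) = 0.5 * sum_k (v_k - t_k)^2, whose gradient is v - t
target = [1.0, -2.0]


def grad_c(v):
    return [vk - tk for vk, tk in zip(v, target)]


v_final = gradient_descent(grad_c, [0.0, 0.0], eta=0.1, steps=200)
```

For this cost, each step shrinks the distance to the minimum by the factor (1 - eta), so the iteration converges for any learning rate between 0 and 2 and diverges beyond that, which illustrates why the learning rate has to be chosen with care.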
4.2. Convolutional neural network
The principal theory of artificial neural networks is discussed in the previous section. The specific artificial neural
network used in this research is a convolutional neural network. The architecture of the convolutional neural network
was first introduced by Fukushima and Miyake [67]. The convolutional neural network shares many similarities
with the artificial neural network described in the previous section. However, convolutional neural networks employ
some unique features to specialize in image classification applications [68–72]. In this section, we discuss
some of these features and introduce the network architecture used in this research.
One feature of the convolutional neural network is called the local receptive field. As discussed in the previous
sections, in an artificial neural network, the neurons of two adjacent layers are normally fully connected. That is,
every neuron in one layer is connected to every neuron in the adjacent layer, as shown in Fig. 9(a). In contrast, it is
helpful to visualize the neurons of a convolutional neural network as placed in a square pattern, as shown in Fig. 9(b).
In the figure, only a small, localized region of neurons is connected to each neuron in the next layer [65].
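The local receptive field can be illustrated with a small sliding-window operation. This is a sketch under stated assumptions: the function `conv2d_valid`, the 4x4 input, and the 2x2 kernel are hypothetical, no padding is used, the stride is 1, and the kernel is not flipped (the cross-correlation convention common in convolutional networks).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1). Each output neuron
    only sees the patch of the image that lies under the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out


image = [[1, 2, 0, 1],
         [0, 1, 3, 2],
         [2, 0, 1, 1],
         [1, 2, 0, 3]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)  # 3x3 map, one value per 2x2 patch
```

Because the same four kernel weights are reused at every position, the layer needs far fewer parameters than a fully-connected layer acting on the same 16 input neurons.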
Another feature is known as the pooling layer. In convolutional neural networks, pooling layers are normally
applied after convolutional layers. Maximum pooling and average pooling are widely used to simplify the information
of the output neurons from a convolutional layer [65].
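Maximum and average pooling can be sketched as follows. The function `pool2d`, the window size, and the sample feature map are hypothetical; the sketch assumes non-overlapping square windows.

```python
def pool2d(feature_map, size, mode="max"):
    """Summarize each non-overlapping size x size window by its maximum
    (mode="max") or by its mean (any other mode)."""
    out = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out


fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 0, 5, 6],
      [1, 3, 7, 2]]
max_pooled = pool2d(fm, 2, "max")
avg_pooled = pool2d(fm, 2, "average")
```

With a 2x2 window, the 4x4 feature map is reduced to 2x2, so the following layer receives a quarter of the values.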
With the features mentioned above, the architecture of convolutional neural networks takes into account the spatial
structure of images [65]. In recent years, large convolutional networks have demonstrated outstanding performance in
image classification [71]. To understand the underlying mechanism, an approach named the deconvolutional network
has been implemented [69,73,74]. Using this technique, image patterns that stimulate high activations in a given feature
map are reconstructed. In this way, researchers are able to study what information is visualized and learned by
various convolutional layers. A convolutional neural network for facial recognition is given here for illustration.
The convolutional neural network in Fig. 10 contains several hidden layers. Researchers found that neurons in the
lower-level convolutional layers are able to “witness” very detailed facial features, such as an edge or a dot on the
face. The intermediate layers can “visualize” relatively larger features like an eye, a nose or a mouth. The high-level
layers are able to “see” the overall facial features.
Fig. 9. (a) The fully-connected architecture of an artificial neural network. (b) Local receptive field of a convolutional neural network.
Fig. 10. A convolutional neural network for a facial recognition application.
Inspired by the mechanism mentioned above, we establish a multilayer convolutional neural network to
predict the modulus of a shale sample, as given in Fig. 11. The lower-level layers are expected to extract the features
of tiny pieces of the shale sample. The intermediate layers are expected to extract the features of medium-size shale pieces.
The high-level layers are expected to grasp the features of the overall shale sample. Several fully-connected layers are
added after the last convolutional layer to obtain the effective modulus. In this way, the network establishes the implicit
mapping between the mesoscale structure of a shale sample and its effective modulus. In the following section, we
discuss the training process of this convolutional neural network.
Fig. 11. A convolutional neural network to establish the implicit mapping between the mesoscale structure of a shale sample and its effective
modulus.
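The idea of mapping an image to a single modulus value can be sketched as a forward pass through one convolutional stage and one fully-connected output neuron. This is not the architecture of Fig. 11: the layer sizes, the ReLU activation, the random untrained weights, the 8x8 phase map, and all function names are assumptions chosen to keep the example small.

```python
import random


def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]


def relu(fm):
    return [[max(0.0, x) for x in row] for row in fm]


def max_pool(fm, size=2):
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]


def dense(x, weights, bias):
    """One fully-connected output neuron without activation (regression)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_modulus(image, kernels, fc_weights, fc_bias):
    """Convolution -> ReLU -> max pooling for each kernel, followed by a
    fully-connected neuron that returns one scalar."""
    features = []
    for k in kernels:
        fm = max_pool(relu(conv2d(image, k)))
        features.extend(x for row in fm for x in row)
    return dense(features, fc_weights, fc_bias)


random.seed(0)
# Hypothetical 8x8 phase map: each pixel holds a constituent index (0, 1 or 2)
image = [[random.choice([0, 1, 2]) for _ in range(8)] for _ in range(8)]
# Two untrained 3x3 kernels -> two 6x6 feature maps -> two 3x3 pooled maps
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]
n_features = 2 * 3 * 3
fc_weights = [random.uniform(-1, 1) for _ in range(n_features)]
prediction = predict_modulus(image, kernels, fc_weights, fc_bias=0.0)
```

With untrained weights the returned number is meaningless; training adjusts the kernels and the fully-connected weights so that the output approaches the finite element label of each sample.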
4.3. Training the convolutional neural network
In this research, 10,000 stochastic mesoscale shale samples are generated to train a convolutional neural network.
The training process is conducted on a desktop computer with an Intel i7-8700 CPU, 32 GB of RAM, and an Nvidia GTX 1080 Ti GPU.
The training is iterated for 100 cycles and takes about 43 min.
As previously mentioned, the training process is to determine all the weights and biases of the convolutional neural
network. The cost function used for training is the mean squared error function.
$$C = \frac{1}{2n} \sum_{i=1}^{n} \left\| o_i - l_i \right\|^2 \tag{11}$$
The stochastic gradient descent algorithm is employed to update all the weights and biases iteratively. The relation
between the cost and the training iteration is plotted in Fig. 12(a). It can be observed from the figure that the cost at
the first iteration is large because the initial values of the weights and biases are randomly assigned. The weights and
biases are updated based on the algorithm illustrated in Section 4.1. After several iterations, the cost rapidly descends.
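Stochastic gradient descent differs from plain gradient descent in that each update uses the gradient of the cost over a small random subset of the training samples. The sketch below applies it to a one-weight, one-bias linear model rather than to a convolutional network; the model, the data, the batch size, and the name `sgd_linear_fit` are assumptions for illustration.

```python
import random


def sgd_linear_fit(xs, ys, eta=0.05, epochs=2000, batch_size=2, seed=0):
    """Fit y ~ w*x + b by mini-batch stochastic gradient descent on the
    mean squared error cost, using the update rule of Eq. (10)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the samples in a new random order
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of 1/(2m) * sum (w*x + b - y)^2 over the mini-batch
            gw = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= eta * gw
            b -= eta * gb
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = sgd_linear_fit(xs, ys)
```

Each update touches only two samples, which is what keeps the cost per step low when the training set contains thousands of images.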
After the training is finished, 2000 stochastic shale samples are used for cross validation. The moduli distribution
of these stochastic samples is shown in Fig. 13. The minimum and maximum moduli of these samples are 51.98 GPa
and 69.26 GPa, respectively.
The 2000 samples are used as input data of the trained convolutional neural network. The network processes the
input data and outputs the predicted effective moduli of the 2000 samples. The predicted effective moduli are then
compared with the moduli calculated by the finite element method. The relation between the cross validation error
and the training iteration is shown in Fig. 12(b). From the figure, it can be observed that the cross validation error
converges after about 50 iterations. Fig. 14 depicts the distribution of the cross validation errors after 100 training
iterations. The majority of the cross validation errors for the 2000 samples are under 2%. The average cross validation
error is 0.55%. Hence, no obvious over-fitting is observed in the cross validation, and the network can be further
employed to predict the effective moduli of real shale samples.
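The comparison between predicted and finite element moduli can be sketched as below, assuming that the reported error is the absolute relative difference expressed in percent. The paper's exact error definition is not restated here, and the moduli in the example are invented values, not data from the study.

```python
def relative_errors_percent(predicted, reference):
    """Absolute relative error of each prediction, in percent."""
    return [abs(p - r) / abs(r) * 100.0 for p, r in zip(predicted, reference)]


# Hypothetical moduli in GPa (not data from the paper)
fem_moduli = [52.0, 60.0, 69.0, 64.0]
cnn_moduli = [52.26, 59.7, 69.69, 64.0]

errors = relative_errors_percent(cnn_moduli, fem_moduli)
mean_error = sum(errors) / len(errors)
share_below_2 = sum(e < 2.0 for e in errors) / len(errors)
```

Reporting both the mean error and the share of samples below a threshold mirrors the two statistics quoted in the text: an average value and a statement about the majority of the samples.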
4.4. Predicting the moduli of real shale samples
The objective of establishing this network is to predict the effective moduli of real shale samples. In the final step,
we employ the trained convolutional neural network to predict the effective moduli of 500 real shale samples. Some of
these real shale samples are shown in Fig. 15. It can be observed that the percentages and distributions of the forming
constituents appear to be extensively random on these images.
Fig. 12. (a) The training error descends as the training iteration increases. (b) The cross validation error descends as the training iteration increases.
Fig. 13. The moduli distribution of the 2000 stochastic shale samples.
The moduli distribution of these real samples is shown in Fig. 16. The moduli of the real samples vary from
56.26 GPa to 82.40 GPa.
The 500 real shale samples are used as input data of the trained convolutional neural network. It should be
mentioned that none of the 500 real samples has been used in the training process. The network outputs the effective
moduli of these 500 samples. The predicted moduli are compared with the labels of these samples to evaluate the
prediction errors. The prediction results are given in Fig. 17. From the figure, it can be observed that most of the
prediction errors are below 3%. The average prediction error is 0.97%. We conclude that, although it is trained on only 10,000
samples, the convolutional neural network exhibits promising performance in predicting the effective
moduli of real shale samples.
Fig. 14. The distribution of the cross validation errors.
5. Conclusion
A new method to predict the effective mechanical properties of heterogeneous materials is presented in this paper.
In this method, numerous stochastic mesoscale material samples are generated based on scanning images of material
samples. The stochastic samples are transformed into corresponding finite element models. The effective mechanical
properties of the finite element models are calculated by finite element analysis. The effective mechanical
properties are regarded as labels of the samples. The mesoscale structures of the stochastic samples and their labels are
combined as training data to train a convolutional neural network. The proposed method takes advantage of the advanced
fitting capability of the deep learning algorithm, and a multiple-layer convolutional neural network is trained to uncover
the implicit mapping between the mesoscale structure of material samples and their effective mechanical properties.
The network is validated by cross validation and then employed to predict the effective mechanical properties of real
samples. The prediction accuracy and efficiency of the method are promising.
In this paper, the proposed method combines image processing techniques, stochastic reconstruction approaches,
finite element analysis, and deep learning methods to predict the effective moduli of mesoscale shale samples. It should
be noted that the prediction of the effective moduli of shale samples is used as an example to illustrate the method.
The method can be further applied to predict the effective mechanical properties of other heterogeneous materials and
even be integrated into the design of new composites with anticipated effective properties.
Acknowledgments
This work is supported by the Science Challenge Project, China, No. TZ2018001; the National Natural Science
Foundation of China, under Grant Nos. 11722218, 11302115 and 11532008; and the Tsinghua University Initiative Scientific
Research Program, China.
Appendix. Linear elastic finite element method
In this section, we briefly discuss the linear-elastic finite element model used in this paper [75,76]. In the context
of small strain, the strain tensor is defined by the displacement gradient as
$$\varepsilon^e = \frac{1}{2}\left[\nabla u + (\nabla u)^T\right] \tag{12}$$
Fig. 15. Several real mesoscale shale samples employed for modulus prediction.
Fig. 16. The moduli distribution of the 500 real mesoscale shale samples.
Fig. 17. The prediction errors distribution of the 500 real shale samples.
For the linear elastic constitutive model, the stress and strain tensors obey the following linear
relation,
$$\sigma = C : \varepsilon^e \tag{13}$$
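Eqs. (12) and (13) can be evaluated numerically for a single material point. The sketch below assumes a two-dimensional, isotropic material under plane strain, for which the double contraction in Eq. (13) reduces to the Lamé form; the isotropy, the plane strain condition, the material constants, and the displacement gradient are all assumptions made for illustration.

```python
def small_strain(grad_u):
    """Eq. (12): symmetric part of the 2x2 displacement gradient."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(2)]
            for i in range(2)]


def isotropic_stress(strain, young, poisson):
    """Eq. (13) for an isotropic material under plane strain:
    sigma = lambda * tr(eps) * I + 2 * mu * eps (in-plane components)."""
    lam = young * poisson / ((1 + poisson) * (1 - 2 * poisson))
    mu = young / (2 * (1 + poisson))
    trace = strain[0][0] + strain[1][1]
    return [[lam * trace * (i == j) + 2 * mu * strain[i][j] for j in range(2)]
            for i in range(2)]


# Hypothetical displacement gradient and material constants (modulus in GPa)
grad_u = [[1.0e-3, 4.0e-4],
          [0.0, -2.0e-4]]
eps = small_strain(grad_u)
sigma = isotropic_stress(eps, young=60.0, poisson=0.25)
```

Only the symmetric part of the displacement gradient produces stress, so the shear strain is the average of the two off-diagonal gradient entries.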
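In a quasi-static system, the energy balance of an elastic solid is expressed as
$$W^{ext} = W^{int} \tag{14}$$
$W^{int}$ is the internal energy defined by the elastic deformation of the solid body.
$$W^{int}(\varepsilon^e) = \int_\Omega \frac{1}{2}\, \varepsilon^e : C : \varepsilon^e \, d\Omega \tag{15}$$
$C$ is the stiffness tensor of the solid body, and $\varepsilon^e$ is the strain tensor. $W^{ext}$ is the external energy contributed by the body
force $b$ and the boundary traction $h$.
$$W^{ext} = \int_\Omega b \cdot u \, d\Omega + \int_\Gamma h \cdot u \, d\Gamma \tag{16}$$
The energy balance should also hold for the variations of the internal and external energy,
$$\delta W^{ext} = \delta W^{int} \tag{17}$$
The variations of the internal and external energy are respectively given as
$$\delta W^{int}(\varepsilon^e) = \frac{\partial W^{int}}{\partial \varepsilon}\, \delta\varepsilon = \int_\Omega \varepsilon^e : C : \delta\varepsilon^e \, d\Omega \tag{18}$$
$$\delta W^{ext} = \int_\Omega b \cdot \delta u \, d\Omega + \int_\Gamma h \cdot \delta u \, d\Gamma \tag{19}$$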
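Eqs. (18) and (19) are substituted into Eq. (17) to give the strong form of the equilibrium equation that describes solid
deformation.
$$\begin{aligned} \nabla \cdot \sigma + b &= 0 && \text{in } \Omega, \\ u &= \bar{u} && \text{on } \bar{\Gamma}_u, \\ \sigma \cdot n &= \bar{h} && \text{on } \bar{\Gamma}_h, \end{aligned} \tag{20}$$
The Galerkin weak form is derived from the strong-form boundary value problem in Eq. (20).
$$\int_\Omega \left[\sigma : \delta\varepsilon^e\right] d\Omega - \int_{\bar{\Gamma}_h} \bar{h} \cdot \delta u \, d\Gamma = 0 \tag{21}$$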
In the above equations, $\delta u$ is the variational test function of the displacement. The displacement is expressed by
interpolation of the nodal variables,
$$u = \sum_{I=1}^{4} \mathbf{N}_I u_I \tag{22}$$
In Eq. (22), $\mathbf{N}_I = \begin{bmatrix} N_I & 0 \\ 0 & N_I \end{bmatrix}$ is the shape function matrix of the displacement field in Voigt notation. The
gradient matrix of the shape function matrix $\mathbf{N}$ can be defined correspondingly as
$$B_I = \begin{bmatrix} N_{I,x} & 0 \\ 0 & N_{I,y} \\ N_{I,y} & N_{I,x} \end{bmatrix} \tag{23}$$
The gradient of the displacement can be discretized using the $B$ matrices as
$$\varepsilon^e = \sum_{I=1}^{4} B_I u_I \tag{24}$$
By inserting the discretized expressions of the primary variables and their gradients into Eq. (21), the discretized form
of the residual of the weak form equation is obtained.
$$R_I = \int_\Omega \left[\sigma B_I\right] d\Omega - \int_{\bar{\Gamma}_h} \mathbf{N}_I \bar{h} \, d\Gamma \tag{25}$$
The corresponding tangent stiffness matrices of the above equations are
$$K_{IJ} = \frac{\partial R_I}{\partial u_J} = \int_\Omega (B_I)^T C B_J \, d\Omega \tag{26}$$
These discretized residual equations are solved implicitly using a linear solver, and the displacements of all nodes
are calculated.
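As a sketch of how Eqs. (23) and (26) can be evaluated for one four-node element, the code below integrates the element stiffness with 2x2 Gauss quadrature. Several choices are assumptions that the appendix does not specify: bilinear shape functions on a quadrilateral, a plane strain material matrix in Voigt form, unit thickness, and the names `plane_strain_matrix`, `b_matrix`, and `element_stiffness`.

```python
import math

# Natural coordinates of the four nodes, counter-clockwise
NODES_NAT = [(-1, -1), (1, -1), (1, 1), (-1, 1)]


def plane_strain_matrix(young, poisson):
    """Isotropic stiffness in Voigt form (assumed plane strain)."""
    f = young / ((1 + poisson) * (1 - 2 * poisson))
    return [[f * (1 - poisson), f * poisson, 0.0],
            [f * poisson, f * (1 - poisson), 0.0],
            [0.0, 0.0, f * (1 - 2 * poisson) / 2]]


def b_matrix(coords, xi, eta):
    """Return the 3x8 matrix [B_1 ... B_4] of Eq. (23) and det(J) at (xi, eta)."""
    dn_dxi = [0.25 * a * (1 + eta * b) for a, b in NODES_NAT]
    dn_deta = [0.25 * b * (1 + xi * a) for a, b in NODES_NAT]
    j11 = sum(d * x for d, (x, _) in zip(dn_dxi, coords))
    j12 = sum(d * y for d, (_, y) in zip(dn_dxi, coords))
    j21 = sum(d * x for d, (x, _) in zip(dn_deta, coords))
    j22 = sum(d * y for d, (_, y) in zip(dn_deta, coords))
    det = j11 * j22 - j12 * j21
    b = [[0.0] * 8 for _ in range(3)]
    for i in range(4):
        # Shape function derivatives with respect to x and y
        nx = (j22 * dn_dxi[i] - j12 * dn_deta[i]) / det
        ny = (-j21 * dn_dxi[i] + j11 * dn_deta[i]) / det
        b[0][2 * i] = nx
        b[1][2 * i + 1] = ny
        b[2][2 * i] = ny
        b[2][2 * i + 1] = nx
    return b, det


def element_stiffness(coords, d):
    """Eq. (26) by 2x2 Gauss quadrature: K = sum over points of B^T D B det(J)."""
    g = 1 / math.sqrt(3)
    k = [[0.0] * 8 for _ in range(8)]
    for xi, eta in [(-g, -g), (g, -g), (g, g), (-g, g)]:
        b, det = b_matrix(coords, xi, eta)
        db = [[sum(d[r][m] * b[m][c] for m in range(3)) for c in range(8)]
              for r in range(3)]
        for r in range(8):
            for c in range(8):
                k[r][c] += sum(b[m][r] * db[m][c] for m in range(3)) * det
    return k


coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # unit square element
k_e = element_stiffness(coords, plane_strain_matrix(young=60.0, poisson=0.25))
```

In a full analysis, the element matrices of all pixels or cells would be assembled into a global system, boundary conditions applied, and the resulting linear equations solved for the nodal displacements.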
References
[1] Z. Hashin, The elastic moduli of heterogeneous materials, J. Appl. Mech. 29 (1) (1962) 143–150.
[2] B. Budiansky, On the elastic moduli of some heterogeneous materials, J. Mech. Phys. Solids 13 (4) (1965) 223–227.
[3] M. Hori, F. Yonezawa, Statistical theory of effective electrical, thermal, and magnetic properties of random heterogeneous materials. IV.
effective- medium theory and cumulant expansion method, J. Math. Phys. 16 (2) (1975) 352–364.
[4] E.J. Garboczi, A. Day, An algorithm for computing the effective linear elastic properties of heterogeneous materials: three-dimensional
results for composites with equal phase Poisson ratios, J. Mech. Phys. Solids 43 (9) (1995) 1349–1362.
[5] J. Wang, J.K. Carson, M.F. North, D.J. Cleland, A new approach to modelling the effective thermal conductivity of heterogeneous materials,
Int. J. Heat Mass Transfer 49 (17–18) (2006) 3075–3083.
[6] C.P. Bobko, Assessing the mechanical microstructure of shale by nanoindentation: The link between mineral composition and mechanical
properties, Mass. Inst. Technol. (2008).
[7] S. Lee, L. Hyder, P. Alley, Microstructural and Mineralogical Characterization of Selected Shales in Support of Nuclear Waste Respository
Studies, Microstructure of Fine-grained Sediments, Springer, 1991, pp. 545–560.
[8] K.C. Bennett, L.A. Berla, W.D. Nix, R.I. Borja, Instrumented nanoindentation and 3D mechanistic modeling of a shale at multiple scales,
Acta Geotech. 10 (1) (2015) 1–14.
[9] V. Kumar, C.H. Sondergeld, C.S. Rai, Nano to macro mechanical characterization of shale, SPE Annual Technical Conference and Exhibition,
Soc. Pet. Eng. (2012).
[10] C.H. Sondergeld, R.J. Ambrose, C.S. Rai, J. Moncrieff, Micro-structural studies of gas shales, SPE Unconventional Gas Conference, Soc. Pet.
Eng. (2010).
[11] M.E. Curtis, R.J. Ambrose, C.H. Sondergeld, Structural characterization of gas shales on the micro- and nano-scales, Canadian Unconventional
Resources and International Petroleum Conference, Soc. Pet. Eng. (2010).
[12] F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychol. Rev. 65 (6) (1958) 386.
[13] W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Genetic Programming: An Introduction, Morgan Kaufmann San Francisco, 1998.
[14] N.M. Nasrabadi, Pattern recognition and machine learning, J. Electron. Imaging 16 (4) (2007) 049901.
[15] Y. Liang, H. Lee, S. Lim, W. Lin, K. Lee, C. Wu, Proper orthogonal decomposition and its applications—part I: Theory, J. Sound Vib. 252
(3) (2002) 527–544.
[16] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1)
(1997) 119–139.
[17] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on
Computational Learning Theory, ACM, 1992, pp. 144–152.
[18] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[19] J. Kennedy, Particle Swarm Optimization, Encyclopedia of Machine Learning, Springer, 2011, pp. 760–766.
[20] W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys. 5 (4) (1943) 115–133.
[21] A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Trans. Syst. Man Cybern. (4) (1971) 364–378.
[22] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533.
[23] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[24] J. Ghaboussi, D.A. Pecknold, M. Zhang, R.M. Haj-Ali, Autoprogressive training of neural network constitutive models, Internat. J. Numer.
Methods Engrg. 42 (1) (1998) 105–126.
[25] S. Jung, J. Ghaboussi, Neural network constitutive model for rate-dependent materials, Comput. Struct. 84 (15–16) (2006) 955–963.
[26] G. Ji, F. Li, Q. Li, H. Li, Z. Li, A comparative study on Arrhenius-type constitutive model and artificial neural network model to predict
high-temperature deformation behaviour in Aermet100 steel, Mater. Sci. Eng. A 528 (13–14) (2011) 4774–4782.
[27] T. Furukawa, G. Yagawa, Implicit constitutive modelling for viscoplasticity using neural networks, Internat. J. Numer. Methods Engrg. 43
(2) (1998) 195–219.
[28] Y. Hashash, S. Jung, J. Ghaboussi, Numerical implementation of a neural network based material model in finite element analysis, Int. J.
Numer. Methods Eng. 59 (7) (2004) 989–1005.
[29] Y. Sun, W. Zeng, Y. Zhao, Y. Qi, X. Ma, Y. Han, Development of constitutive relationship model of Ti600 alloy using artificial neural network,
Comput. Mater. Sci. 48 (3) (2010) 686–691.
[30] W.E. Faller, S.J. Schreck, Unsteady fluid mechanics applications of neural networks, J. Aircr. 34 (1) (1997) 48–55.
[31] D. Wang, W. Liao, Modeling and control of magnetorheological fluid dampers using neural networks, Smart Mater. Struct. 14 (1) (2004)
111.
[32] Z. Yuhong, H. Wenxin, Application of artificial neural network to predict the friction factor of open channel flow, Commun. Nonlinear Sci.
Numer. Simul. 14 (5) (2009) 2373–2378.
[33] T. Butz, O. Von Stryk, Modelling and simulation of electro-and magnetorheological fluid dampers, ZAMM-J. Appl. Math. Mech./Z. Angew.
Math. Mech: Appl. Math. Mech. 82 (1) (2002) 3–20.
[34] R. Beigzadeh, M. Rahimi, Prediction of heat transfer and flow characteristics in helically coiled tubes using artificial neural networks, Int.
Commun. Heat Mass Transfer 39 (8) (2012) 1279–1285.
[35] Y. Mi, M. Ishii, L. Tsoukalas, Flow regime identification methodology with neural networks and two-phase flow models, Nucl. Eng. Des.
204 (1–3) (2001) 87–100.
[36] V. Sundararaghavan, N. Zabaras, Classification and reconstruction of three-dimensional microstructures using support vector machines,
Comput. Mater. Sci. 32 (2) (2005) 223–239.
[37] R. Liu, Y.C. Yabansu, A. Agrawal, S.R. Kalidindi, A.N. Choudhary, Machine learning approaches for elastic localization linkages in high-
contrast composite materials, Integr. Mater. Manuf. Innov. 4 (1) (2015) 13.
[38] R. Kondo, S. Yamakawa, Y. Masuoka, S. Tajima, R. Asahi, Microstructure recognition using convolutional neural networks for prediction of
ionic conductivity in ceramics, Acta Mater. 141 (2017) 29–38.
[39] R. Cang, H. Li, H. Yao, Y. Jiao, Y. Ren, Improving direct physical properties prediction of heterogeneous materials from imaging data via
convolutional neural network and a morphology-aware generative model, Comput. Mater. Sci. 150 (2018) 212–221.
[40] M. Bessa, R. Bostanabad, Z. Liu, A. Hu, D.W. Apley, C. Brinson, W. Chen, W.K. Liu, A framework for data-driven analysis of materials
under uncertainty: Countering the curse of dimensionality, Comput. Methods Appl. Mech. Engrg. 320 (2017) 633–667.
[41] F.J. Pettijohn, Sedimentary rocks, 1957.
[42] S. Boggs, Petrology of Sedimentary Rocks, Cambridge University Press, 2009.
[43] S. Bernard, R. Wirth, A. Schreiber, L. Bowen, A. Aplin, E. Mathia, H. Schulz, B. Horsfield, FIB-SEM and TEM
investigations of an organic-rich shale maturation series from the Lower Toarcian Posidonia Shale, Ger.: Nanoscale Pore Syst. Fluid-rock
Interact., Electron Microsc. Shale Hydrocarbon Reserv.: AAPG Memoir 102 (2013) 53–66.
[44] N. Ohkouchi, J.i. Kuroda, M. Okada, H. Tokuyama, Why Cretaceous black shales have high C/N ratios: Implications from SEM-EDX
observations for Livello Bonarelli black shales at the Cenomanian-Turonian boundary, Front. Res. Earth Evol. 1 (2003) 239–241.
[45] S. Abedi, M. Slim, R. Hofmann, T. Bryndzia, F.-J. Ulm, Nanochemo-mechanical signature of organic-rich shales: a coupled indentation–EDX
analysis, Acta Geotech. 11 (3) (2016) 559–572.
[46] S. Kelly, H. El-Sobky, C. Torres-Verdín, M.T. Balhoff, Assessing the utility of FIB-SEM images for shale digital rock physics, Adv. Water
Resour. 95 (2016) 302–316.
[47] P. Tahmasebi, F. Javadpour, M. Sahimi, Three-dimensional stochastic characterization of shale SEM images, Transp. Porous Media 110 (3)
(2015) 521–531.
[48] C.D. Foster, T.M. Nejad, Embedded discontinuity finite element modeling of fluid flow in fractured porous media, Acta Geotech. 8 (1) (2013)
49–57.
[49] J.F. Barthélémy, C. Souque, J.M. Daniel, Nonlinear homogenization approach to the friction coefficient of a quartz-clay fault gouge, Int. J.
Numer. Anal. Methods Geomech. 37 (13) (2013) 1948–1968.
[50] J.A. White, Anisotropic damage of rock joints during cyclic loading: constitutive framework and numerical integration, Int. J. Numer. Anal.
Methods Geomech. 38 (10) (2014) 1036–1057.
[51] A.H. Kohli, M.D. Zoback, Frictional properties of shale reservoir rocks, J. Geophys. Res. Solid Earth 118 (9) (2013) 5109–5125.
[52] C. Yeong, S. Torquato, Reconstructing random media, Phys. Rev. E 57 (1) (1998) 495.
[53] S. Torquato, Statistical description of microstructures, Annu. Rev. Mater. Sci. 32 (1) (2002) 77–111.
[54] D.B. Shaw, C.E. Weaver, The mineralogical composition of shales, J. Sediment. Res. 35 (1) (1965).
[55] A. Deirieh, J. Ortega, F.-J. Ulm, Y. Abousleiman, Nanochemomechanical assessment of shale: a coupled WDS-indentation analysis, Acta
Geotech. 7 (4) (2012) 271–295.
[56] V. Kumar, Geomechanical Characterization of Shale Using Nano-Indentation, University of Oklahoma Norman, OK, USA, 2012.
[57] B. Gathier, Multiscale strength homogenization: application to shale nanoindentation, Mass. Inst. Technol. (2008).
[58] D. Cule, S. Torquato, Generating random media from limited microstructural information via stochastic optimization, J. Appl. Phys. 86 (6)
(1999) 3428–3437.
[59] B. Lu, S. Torquato, Lineal-path function for random heterogeneous materials, Phys. Rev. A 45 (2) (1992) 922.
[60] B. Lu, S. Torquato, Lineal-path function for random heterogeneous materials. II. effect of polydispersivity, Phys. Rev. A 45 (10) (1992) 7292.
[61] B. Yegnanarayana, Artificial Neural Networks, PHI Learning Pvt. Ltd., 2009.
[62] R. Lippmann, An introduction to computing with neural nets, IEEE ASSP Mag. 4 (2) (1987) 4–22.
[63] R.J. Schalkoff, Artificial Neural Networks, McGraw-Hill New York, 1997.
[64] A.K. Jain, J. Mao, K.M. Mohiuddin, Artificial neural networks: A tutorial, Computer 29 (3) (1996) 31–44.
[65] M.A. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
[66] M.H. Hassoun, Fundamentals of Artificial Neural Networks, MIT press, 1995.
[67] K. Fukushima, S. Miyake, Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition,
Competition and Cooperation in Neural Nets, Springer, 1982, pp. 267–285.
[68] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.
[69] M.D. Zeiler, R. Fergus, Visualizing and Understanding Convolutional Networks, European Conference on Computer Vision, Springer, 2014,
pp. 818–833.
[70] S. Lawrence, C.L. Giles, A.C. Tsoi, A.D. Back, Face recognition: A convolutional neural-network approach, IEEE Trans. Neural Netw. 8 (1)
(1997) 98–113.
[71] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst.
(2012) 1097–1105.
[72] D. Ciregan, U. Meier, J. Schmidhuber, Multi-column deep neural networks for image classification, in: Computer Vision and Pattern
Recognition (CVPR), 2012 IEEE conference on, IEEE, 2012, pp. 3642–3649.
[73] M.D. Zeiler, G.W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: Computer Vision (ICCV),
2012 IEEE International Conference on, IEEE, 2011, pp. 2018–2025.
[74] M.D. Zeiler, D. Krishnan, G.W. Taylor, R. Fergus, Deconvolutional Networks, Computer Vision and Pattern Recognition (CVPR), 2010
IEEE Conference on, IEEE, 2010, pp. 2528–2535.
[75] K.J. Bathe, Finite Element Method, Wiley Online Library, 2008.
[76] X.-c. Wang, Finite Element Method, Tsinghua University Press, Beijing, 2003.
... To address these challenges, three-dimensional convolutional neural network (3D-CNN) have been adopted for directly processing voxelized microstructural data [15][16][17]. Cecen et al. [18] employed 3D-CNN to establish direct microstructure-property linkages, achieving substantial improvements in predictive accuracy. ...
Article
Full-text available
This study introduces a polycrystalline graph convolutional network (PGCNN) to predict the mechanical properties of Ti-6Al-4V alloy’s dual-phase polycrystalline microstructure. The model captures complex inter-grain interactions. It integrates node features and graph structural information to map microstructures to macroscopic mechanical properties. The PGCNN model demonstrated exceptional predictive performance (mean absolute relative error, MARE = 0.369%). It remained robust in handling nonlinear relationships and capturing high-order inter-grain interactions, even with limited datasets (MARE = 1.985%). We evaluated the interpretability of the PGCNN model through analyses at the node, edge, and graph structure levels, offering comprehensive insights. At the node level, the influence of each grain (node) on the output was quantified, clarifying the direct link between individual grains and macroscopic performance. Edge level analysis emphasized the importance of inter-grain interactions. It laid the groundwork for identifying grain boundaries that significantly affect mechanical properties. Graph level analysis quantified the overall impact of microstructural features on macroscopic performance. This provided insights into the complex “microstructure–mechanical property” relationship in dual-phase polycrystals.
... The number of hidden layer nodes ranged from 4 to 11 to obtain the optimal training network. The number of neurons in the hidden layer was determined by trial and error [69]. The data were normalized between 0 and 1 to eliminate unit problems. ...
Article
Full-text available
Background The time a patient spends in the hospital from admission to discharge is known as the length of stay (LOS). Predicting LOS is crucial for enhancing patient care, managing hospital resources, and optimizing the use of patient beds. Therefore, this study aimed to predict the LOS for patients hospitalized in various clinics using different artificial intelligence (AI) models. Methods The study analyzed 162,140 hospitalized patients aged 18 and older at various clinics of a university hospital in northern Türkiye from 2012 to 2020. Three soft computing methods—Artificial Neural Networks (ANN), Adaptive Neuro-Fuzzy Inference Systems (ANFIS), and Multiple Linear Regression Analysis (MLR)—were employed to estimate LOS using inputs such as medical and imaging services (number of CT, USG, ECG, hemogram tests, medical biochemistry, and number of direct x-rays), demographic, and diagnostic data (patients’ age, sex, season of hospitalization, type of hospitalization, diagnosis, and second diagnosis). The LOS predictions utilized single and double-hidden layer ANNs with various training algorithms (Levenberg-Marquardt-LM, Bayesian Regularization-BR and Scaled Conjugate Gradient-SCG) and activation functions (tangent-sigmoid, purelin), ANFIS with Grid Partitioning (ANFIS-GP), and MLR. Model performance was evaluated using the Coefficient of Determination (R²), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Results Of the patients, 54% were male and 43.5% were treated in surgical clinics. The mean age was 55.1 years, with 32.9% of participants aged 65 years or older. Hospital stays were 2–7 days for 39.7% of patients, over 7 days for 30.9%, and 1 day for 29.4%. Neoplasm-related diagnoses (ICD codes) accounted for 25.1% of admissions. Variables influencing LOS were identified through feature selection from patients in various hospital wards. 
The most significant factors affecting LOS include second diagnosis, the number of hemogram tests, computerized tomography scans (CT), ultrasonography (USG), and direct X-rays. Utilizing these factors, 12 models with varied input variables were developed and analyzed. The double hidden layer ANN model with the Levenberg-Marquardt (LM) training algorithm outperformed the others, achieving R² values of 0.854 for training and 0.807 for the test dataset, with RMSE values of 2.397 days and 2.774 days and MAE values of 1.787 days and 1.994 days, respectively. Following ANN-LM, the best results were obtained with ANFIS-GP, while MLR exhibited the lowest performance. Conclusions Various AI models can effectively predict LOS for patients in different hospital units. Accurate LOS predictions can help health managers allocate resources more equitably across units.
... Tang et al. (2022) applied grain-based modeling with microscale Young's modulus to derive the macroscale mechanical properties of granite samples. Li et al. (2019) predicted the meso-scale mechanical properties of shale rocks using the FEM method and deep learning. Saxena and Mavko (2016) compared the estimated 2D and 3D elastic moduli of rocks from 2D thin-section images. ...
Article
Full-text available
The mechanical properties of shale rocks are essential for effective extraction of unconventional shale gas and oil. Digital rock imaging through scanning electron microscopy (SEM) plays a pivotal role in characterizing these properties. However, accurately segmenting clay from grain minerals in SEM images is challenging due to overlapping grayscale values. This study introduces a workflow that employs a deep learning algorithm for SEM image segmentation, coupled with finite element method (FEM) simulations to model shale elastic modulus. The accuracy of these simulations is validated against meso-scale laboratory microindentation tests. The deep learning algorithm, U-Net, was utilized to effectively segment shale SEM images into four phases including mineral grains, clays, organic matter, and pyrite, achieving a mean accuracy of 0.91 and an Intersection over Union (IoU) of 0.73. The model demonstrated robust performance in segmenting unseen images, particularly excelling in organic matter, followed by pyrite, mineral grains, and clay. Finite element method (FEM) simulations using the 2D lattice approach estimated Young’s modulus of shale at 43.7 GPa, contrasting with the 38.2 GPa observed in micro-scale microindentation tests. Discrepancies between these simulation and experimental outcomes were thoroughly analyzed. This research, integrating deep learning with 2D finite element method (FEM) simulations, offers a novel approach for directly modeling the mechanical behavior of heterogeneous shales from micro-scale SEM imaging, and provides insights into the microstructure-mechanical relationships in rock physics.
Article
High-throughput experiments (HTE) aim to acquire extensive chemical or physical properties in a single experiment, thereby enhancing testing efficiency. To simplify the extraction of diverse properties from one specimen, samples have moved from “discrete” arrays to “continuous” gradient ones. Despite this, complex responses of “continuous” gradient samples have impeded the development of continuous HTE. Full-field data, which can be obtained with Digital Image Correlation (DIC), is nec-essary for mechanical property characterizations. Traditional inversion methods for calculating property distributions from this data are slow and error-prone. Deep learning (DL) offers a faster and more accurate alternative for characterizing prop-erties. Therefore, based on convolutional neural networks (CNNs), this article estab-lishes a mapping model to obtain the modulus distribution directly from the full-field displacement. In view of the cost of time, simulation data are used to re-place DIC data. However, fine mesh must be used to obtain the precise responses of gradient samples which unfortunately making the DL model face the challenge of time-consuming dataset generation and high-dimensional data mapping. To alleviate the difficulties, the isoparametric graded finite element (IGFE) formulation is in-troduced in this article, which offers an efficient way to generate datasets with low-dimension but high-fidelity. Results show that our framework not only has high prediction accuracy (with the L1-error of 1.38%) but also enables fast characteri-zation (within 12ms), providing methodological support for high-throughput char-acterization based on gradient samples.
Article
With the development of the economy and society, porous materials have been widely used in various fields due to their unique structure and function. Therefore, it is of great significance to analyze the mechanical properties of porous materials. Traditional analysis relies mainly on experiments and numerical simulation. Finite element simulation of porous materials requires the domain to be divided into a large number of fine elements, and the calculation process is time-consuming and laborious. This article randomly generates porous microstructure models through algorithms and uses an efficient quadtree algorithm to calculate their mechanical properties, thereby obtaining a large amount of machine-learning sample data. Furthermore, a neural network-based machine learning algorithm is established to predict the mechanical properties of porous materials. By using microstructure images as the input layer of the model, the mechanical properties under corresponding conditions can be directly predicted. This study provides a new method for predicting mechanical properties based on microstructure images. It has been verified that the mechanical properties directly predicted by the network are similar to the actual ones, with high accuracy and computational efficiency.
Article
With technological advancement, demand for smart materials has increased tremendously because they respond to external stimuli. Moreover, the time‐dependent response of smart materials provides insight into the fabrication of these materials using 4D printing (4DP) techniques. Hence, this study presents a comprehensive review of the 4DP of smart materials. The review covers different aspects of smart materials, from design and optimization to printing. Herein, smart materials are discussed in detail according to their responsiveness to physical, biological, and chemical stimuli and the behavior of their subtypes. Tools for designing smart materials, such as new design software, finite element analysis, and machine learning, are also discussed. The design of smart materials is challenging because of their varied responsive natures and the complexity of their design and mechanisms. Hence, a detailed review of present 3D printing techniques, the use of 4DP, and how future applications can be incorporated with smart materials and 4DP is presented. The fabrication of smart materials using 4DP with the help of machine learning is also discussed. The review provides future directions for fabricating smart materials using 4DP. The design and printing challenges of smart materials for future utilization have also been comprehensively covered.
Article
Foam ceramics are widely used in industrial applications due to their unique properties, including high porosity, lightweight, and high-temperature resistance. However, their complex microstructure presents significant challenges for image analysis. Traditional machine learning methods often fall short in capturing both global feature dependencies and detailed representations. To address this, a novel artificial intelligence recognition model, FD-Conv, is proposed, which combines the global information processing capabilities of Transformers with the local feature extraction strengths of convolutional neural networks. Additionally, a frequency domain block detail enhancement mechanism is introduced to improve recognition accuracy. Experimental results demonstrate that the FD-Conv model enhances recognition accuracy by at least 7.6% compared to state-of-the-art methods. Furthermore, the model effectively identifies foam ceramics with varying compositions and formulations and quantifies their microstructural phase characteristics. This research aims to advance the application of foam ceramic microstructure image analysis by improving recognition accuracy, particularly in multi-source microscopic image feature learning and pattern recognition.
Article
Direct prediction of material properties from microstructures through statistical models has shown to be a potential approach to accelerating computational material design with large design spaces. However, statistical modeling of highly nonlinear mappings defined on high-dimensional microstructure spaces is known to be data-demanding. Thus, the added value of such predictive models diminishes in common cases where material samples (in forms of 2D or 3D microstructures) become costly to acquire either experimentally or computationally. To this end, we propose a generative machine learning model that creates an arbitrary amount of artificial material samples with negligible computation cost, when trained on only a limited amount of authentic samples. The key contribution of this work is the introduction of a morphology constraint to the training of the generative model, that enforces the resultant artificial material samples to have the same morphology distribution as the authentic ones. We show empirically that the proposed model creates artificial samples that better match with the authentic ones in material property distributions than those generated from a state-of-the-art Markov Random Field model, and thus is more effective at improving the prediction performance of a predictive structure-property model.
Article
The organic–inorganic nature of organic-rich source rocks poses several challenges for the development of functional relations that link mechanical properties with geochemical composition. With this focus in mind, we herein propose a method that enables chemo-mechanical characterization of this highly heterogeneous source rock at the micron and submicron length scale through a statistical analysis of a large array of energy-dispersive X-ray spectroscopy (EDX) data coupled with nanoindentation data. The ability to include elemental composition to the indentation probe via EDX is shown to provide a means to identify pure material phases, mixture phases, and interfaces between different phases. Employed over a large array, the statistical clustering of this set of chemo-mechanical data provides access to the properties of the fundamental building blocks of clay-dominated organic-rich source rocks. The versatility of the approach is illustrated through the application to a large number of source rocks of different origin, chemical composition, and organic content. We find that the identified properties exhibit a unique scaling relation between stiffness and hardness. This suggests that organic-rich shale properties can be reduced to their elementary constituents, with several implications for the development of predictive functional relations between chemical composition and mechanical properties of organic-rich source rocks such as the intimate interplay between clay-packing, organic maturity, and mechanical properties of porous clay/organic phase.
Article
There has been a growing recognition of the opportunities afforded by advanced data science and informatics approaches in addressing the computational demands of modeling and simulation of multiscale materials science phenomena. More specifically, the mining of microstructure–property relationships by various methods in machine learning and data mining opens exciting new opportunities that can potentially result in a fast and efficient material design. This work explores and presents multiple viable approaches for computationally efficient predictions of the microscale elastic strain fields in a three-dimensional (3-D) voxel-based microstructure volume element (MVE). Advanced concepts in machine learning and data mining, including feature extraction, feature ranking and selection, and regression modeling, are explored as data experiments. Improvements are demonstrated in a gradually escalated fashion achieved by (1) feature descriptors introduced to represent voxel neighborhood characteristics, (2) a reduced set of descriptors with top importance, and (3) an ensemble-based regression technique.
Article
Complexity in shale-gas reservoirs lies in the presence of multiscale networks of pores that vary from nanometer to micrometer scale. Scanning electron microscope (SEM) and atomic force microscope imaging are promising tools for a better understanding of such complex microstructures. Obtaining 3D shale images using focused ion beam-SEM for accurate reservoir forecasting and petrophysical assessment is not, however, currently economically feasible. On the other hand, high-quality 2D shale images are widely available. In this paper, a new method based on higher-order statistics of a porous medium (as opposed to the traditional two-point statistics) is proposed in which a single 2D image of a shale sample is used to reconstruct stochastically equiprobable 3D models of the sample. Because some pores may remain undetected in the SEM images, data from other sources, such as the pore-size distribution obtained from nitrogen adsorption data, are integrated with the overall pore network using an object-based technique. The method benefits from a recent algorithm, the cross- correlation-based simulation, by which high-quality, unconditional/conditional realizations of a given sample porous medium are produced. To improve the ultimate 3D model, a novel iterative algorithm is proposed that refines the quality of the realizations significantly. Furthermore, a new histogram matching, which deals with multimodal continuous properties in shale samples, is also proposed. Finally, quantitative comparison is made by computing various statistical and petrophysical properties for the original samples, as well as the reconstructed model.
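For context, the "traditional two-point statistics" that this abstract contrasts with its higher-order approach can be sketched as follows. This is only an illustration of the baseline descriptor, not the reconstruction method of the paper. It assumes a binary image with the pore phase coded as 1 and periodic boundaries; the function name `two_point_probability` and the random test image are hypothetical.

```python
import numpy as np

def two_point_probability(binary_image):
    """Periodic two-point probability S2(r) of phase 1 in a 2D binary image."""
    phase = np.asarray(binary_image, dtype=float)
    spectrum = np.fft.fft2(phase)
    # Circular autocorrelation via the Wiener-Khinchin theorem,
    # normalized by the number of pixels.
    s2 = np.fft.ifft2(spectrum * np.conj(spectrum)).real / phase.size
    return s2

# Uncorrelated random image with roughly 30% of pixels in phase 1 (toy data).
rng = np.random.default_rng(0)
image = (rng.random((64, 64)) < 0.3).astype(int)
s2 = two_point_probability(image)
phase_fraction = image.mean()
```

At zero lag, `s2[0, 0]` equals the phase-1 area fraction, which gives a quick sanity check on the normalization. Two images can share the same S2 while differing in connectivity, which is one motivation for higher-order descriptors.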
Article
A new method, termed autoprogressive training, for training neural networks to learn complex stress–strain behaviour of materials using global load–deflection response measured in a structural test is described. The richness of the constitutive information that is generally implicitly contained in the results of structural tests may in many cases make it possible to train a neural network material model from only a small number of such tests, thus overcoming one of the perceived limitations of a neural network approach to modelling of material behaviour; namely, that a voluminous amount of material test data is required. The method uses the partially‐trained neural network in a central way in an iterative non‐linear finite element analysis of the test specimen in order to extract approximate, but gradually improving, stress–strain information with which to train the neural network. An example is presented in which a simple neural network constitutive model of a T300/976 graphite/epoxy unidirectional lamina is trained, using the load–deflection response recorded during a destructive compressive test of a [(±45)6]S laminated structural plate containing an open hole. The results of a subsequent forward analysis are also presented, in which the trained material model is used to simulate the response of a compressively loaded [(±30)6]S structural laminate containing an open hole. Avenues for further improvement of the neural network model are also suggested. The proposed autoprogressive algorithm appears to have wide application in the general area of Non‐Destructive Evaluation (NDE) and damage detection. Most NDE experiments can be viewed as structural tests and the proposed methodology can be used to determine certain damage indices, similar to the way in which constitutive models are determined. © 1998 John Wiley & Sons, Ltd.
Article
Convolutional neural networks (CNNs) have recently exhibited state-of-the-art performance with respect to image recognition tasks. In the present study, we adopt CNNs to link experimental microstructures with corresponding ionic conductivities. The results reveal that CNNs can be trained using only seven micrographs, and their performance exceeds the conventional scheme using hand-crafted features. While the main drawback in the use of CNNs is poor interpretability of their highly abstracted features, we propose a feature visualization method that is suitable for the proposed training scheme, assuming that all of the cropped images from a macroscopic image have the representative macroscopic property. The visualization results showed that the present CNNs automatically extract semantic features having a large correlation with macroscopic properties, such as the number of voids and the area without voids. By analyzing these features, we find an optimized size of the representative volume element to ensure the prediction accuracy of the CNNs, providing useful guidance in preparation for the training set.
Article
A new data-driven computational framework is developed to assist in the design and modeling of new material systems and structures. The proposed framework integrates three general steps: (1) design of experiments, where the input variables describing material geometry (microstructure), phase properties and external conditions are sampled; (2) efficient computational analyses of each design sample, leading to the creation of a material response database; and (3) machine learning applied to this database to obtain a new design or response model. In addition, the authors address the longstanding challenge of developing a data-driven approach applicable to problems that involve unacceptable computational expense when solved by standard analysis methods – e.g. finite element analysis of representative volume elements involving plasticity and damage. In these cases the framework includes the recently developed “self-consistent clustering analysis” method in order to build large databases suitable for machine learning. The authors believe that this will open new avenues to finding innovative materials with new capabilities in an era of high-throughput computing (“big-data”).
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Article
Quantitative X-ray diffraction analysis gives data on quartz, feldspar, and carbonate in shales with a precision of ±10 percent; clay mineral content is determined by difference. Obviously silty and calcareous shales were not included among the 400 samples from Tertiary rocks of the Gulf Coast, Mesozoic rocks from the western interior of the United States and Canada, and Paleozoic rocks from the Marathon Uplift, Big Bend Arch, Ouachita, and Appalachian areas, and 57 samples identified only as miscellaneous. Means (percent) and standard deviations for about 300 of these samples are quartz, 33.6, 15.2; feldspar, 3.6, 4.8; carbonate, 2.7, 8.4, and clay minerals, 64.1, 9.4. The inclusion of an additional 100 unidentified samples changes the means to quartz, 30.8; feldspar, 4.5; clay minerals, 60.9; carbonates, 3.6; iron oxides, >0.5; organic matter, 1; and other minerals, 2 percent.
Article
Shales and other unconventional or low permeability (tight) reservoirs house vast quantities of hydrocarbons, often demonstrate considerable water uptake, and are potential repositories for fluid sequestration. The pore-scale topology and fluid transport mechanisms within these nanoporous sedimentary rocks remain to be fully understood. Image-informed pore-scale models are useful tools for studying porous media: a debated question in shale pore-scale petrophysics is whether there is a representative elementary volume (REV) for shale models? Furthermore, if an REV exists, how does it differ among petrophysical properties? We obtain three dimensional (3D) models of the topology of microscale shale volumes from image analysis of focused ion beam-scanning electron microscope (FIB-SEM) image stacks and investigate the utility of these models as a potential REV for shale. The scope of data used in this work includes multiple local groups of neighboring FIB-SEM images of different microscale sizes, corresponding core-scale (milli- and centimeters) laboratory data, and, for comparison, series of two-dimensional (2D) cross sections from broad ion beam SEM images (BIB-SEM), which capture a larger microscale field of view than the FIB-SEM images; this array of data is larger than the majority of investigations with FIB-SEM-derived microscale models of shale. Properties such as porosity, organic matter content, and pore connectivity are extracted from each model. Assessments of permeability with single phase, pressure-driven flow simulations are performed in the connected pore space of the models using the lattice-Boltzmann method. Calculated petrophysical properties are compared to those of neighboring FIB-SEM images and to core-scale measurements of the sample associated with the FIB-SEM sites. Results indicate that FIB-SEM images below ∼5000 µm³ volume (the largest volume analyzed) are not a suitable REV for shale permeability and pore-scale networks; i.e. field of view is compromised at the expense of detailed, but often unconnected, nanopore morphology. Further, we find that it is necessary to acquire several local FIB-SEM or BIB-SEM images and correlate their extracted geometric properties to improve the likelihood of achieving representative values of porosity and organic matter volume. Our work indicates that FIB-SEM images of microscale volumes of shale are a qualitative tool for petrophysical and transport analysis. Finally, we offer alternatives for quantitative pore-scale assessments of shale.