Citation: Cheloee Darabi, A.; Rastgordani, S.; Khoshbin, M.; Guski, V.; Schmauder, S. Hybrid Data-Driven Deep Learning Framework for Material Mechanical Properties Prediction with the Focus on Dual-Phase Steel Microstructures. Materials 2023, 16, 447. https://doi.org/10.3390/ma16010447
Academic Editor: Lijun Zhang
Received: 22 November 2022; Revised: 12 December 2022; Accepted: 14 December 2022; Published: 3 January 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Hybrid Data-Driven Deep Learning Framework for Material
Mechanical Properties Prediction with the Focus on Dual-Phase
Steel Microstructures
Ali Cheloee Darabi 1, Shima Rastgordani 1, Mohammadreza Khoshbin 2, Vinzenz Guski 1 and Siegfried Schmauder 1,*
1 Institute for Materials Testing, Materials Science and Strength of Materials, University of Stuttgart, Pfaffenwaldring 32, 70569 Stuttgart, Germany
2 Department of Mechanical Engineering, Shahid Rajaee Teacher Training University, Lavizan, Tehran 1678815811, Iran
*Correspondence: siegfried.schmauder@imwf.uni-stuttgart.de
Abstract: A comprehensive approach to understand the mechanical behavior of materials involves
costly and time-consuming experiments. Recent advances in machine learning and in the field of
computational material science could significantly reduce the need for experiments by enabling the
prediction of a material’s mechanical behavior. In this paper, a reliable data pipeline consisting of
experimentally validated phase field simulations and finite element analysis was created to generate a
dataset of dual-phase steel microstructures and mechanical behaviors under different heat treatment
conditions. Afterwards, a deep learning-based method was presented, which was the hybridization of
two well-known transfer-learning approaches, ResNet50 and VGG16. Hyper parameter optimization
(HPO) and fine-tuning were also implemented to train and boost both methods for the hybrid
network. By fusing the hybrid model and the feature extractor, the dual-phase steels’ yield stress,
ultimate stress, and fracture strain under new treatment conditions were predicted with an error of
less than 1%.
Keywords: deep learning; material properties; dual-phase steel; micromechanical modeling; phase field simulation
1. Introduction
Dual-phase (DP) steels are a family of high-strength low-alloy steels that exhibit high
strength and good formability. They have, therefore, found extensive use in the automotive
industry [1]. Their promising properties can be attributed to their microstructure, which
consists of hard martensite islands and a soft ferrite matrix. This microstructure leads to
high formability, continuous yielding behavior, high strength, high strain hardening rate,
and low yield stress-to-tensile strength ratio [2].
One of the fundamental objectives of materials science and engineering is the devel-
opment of reliable working models which connect process parameters, microstructures,
and material properties. Many models have been developed for analyzing each individual
domain. For example, phase field modeling (PFM) can simulate the phase transformations
during heat treatment [3,4], and finite element analysis (FEA) can be used to obtain the mechanical response of a microstructure [5]. These have also been combined [6,7]. This generally takes the form of a PFM analysis obtaining a representative volume element (RVE) of a multiphase material that has undergone heat treatment and the resulting microstructure being loaded with specific boundary conditions to obtain its fracture stress using FEA.
These models have an inherent deficiency in that each only works on a limited part of the problem, and connecting all the effects can be very challenging. Furthermore, they can only be used to analyze a particular configuration after it has been conceived. They do not have
any predictive power and must be run many times to obtain a suitable model. Currently,
using these approaches for designing new materials is very costly, time-consuming, and
requires substantial lab work.
These problems can be avoided by leveraging modern advancements in machine learning methods [8]. Machine learning and deep learning, and especially their subcategories, such as artificial neural networks (ANNs) and convolutional neural networks (CNNs), are being introduced in materials science and engineering because they can accelerate these processes and, in some cases, reduce the required physical experiments [9–11]. These models can also automate different layers of material characteristic investigations [12]. Microstructure studies at different scales, from the macro and continuum levels to the atomic and micro scales, could benefit from the recent developments in ANN techniques [13,14]. Additionally, methods such as phase field modeling can assist researchers with 2D and 3D simulations, enriching the dataset for subsequent ANN modeling [15,16]. These new tools bring the final aim of tailoring material features within reach.
The classic paradigm of microstructural behavior studies needs to be revised. Recent
material informatics developments could magnify machine learning approaches’ vital role
in quantitative microstructural subjects [17–19]. Thus, the need to expand the knowledge
of neural network applications in materials science and engineering is evident. In the last
decade, various methods have been implemented to predict the characteristics of different
materials [20].
This work represents a timely, advanced computational methodology with a wide range of implications to help the materials community [21]. The novelty of this work is twofold: we validate and utilize simulations of heat treatment to generate microstructures, which reduces the cost associated with creating a machine learning dataset. Additionally, we introduce a hybrid machine learning model and apply it to a materials science problem. In the first step of this study, since having an extensive dataset for training is the prerequisite of a deep neural network, about 1000 images were generated with a phase field model. About 10 percent of the whole dataset was randomly chosen for the testing set. For this study, different models, including simple CNNs and transfer learning methods, were investigated, and two algorithms with faster optimization behavior, VGG16 [22] and ResNet [23], were combined in parallel and named the "Hybrid Model". Not every model showed promising results regarding the prediction of tensile stress and fracture strain. However, with an error of less than 1% for the prediction of ultimate stress and yield stress on the testing data, and about 0.5% on the training set, this model performed ideally. This fast and accurate technique could be applied to different alloy datasets, giving scientists a better overview of metal characteristics.
2. Data Generation
2.1. Overview
In this study, a large number of phase field (PF) heat treatment simulations were
performed to generate artificial DP steel microstructures. These microstructures were then
analyzed using finite element analysis (FEA) to obtain the mechanical response of those
steels. Consequently, a dataset containing process parameters, resulting microstructure,
and mechanical properties was created, which was then used in Section 3 to train a machine
learning system. A high-level illustration of the process is shown in Figure 1.
The following sections describe and validate the PF and FEA models and then explain
how the two data pipelines work together to create the final dataset.
Figure 1. Workflow of the microstructure data generation with different heat treatment conditions.
2.2. Multiphase Field Simulation
2.2.1. Basic Theory
The phase field equation was implemented by Steinbach et al. [24,25] to predict microstructure evolution. In this approach, a phase field parameter ϕ is defined for each phase, which changes between 0 and 1 during the process. Parameter ϕα indicates the local fraction of phase α in a grain, which means the sum of the local fractions of the phases is equal to 1 (∑ϕα = 1). In this paper, MICRESS® software, version 7, was used for the phase field simulation, and the rate of change of parameter ϕ during the process is given by Equation (1) [26]:

$$\dot{\phi}_{\alpha} = \sum_{\alpha \neq \beta}^{n} M^{\phi}_{\alpha\beta}\left[b\,\Delta G_{\alpha\beta} - \sigma_{\alpha\beta}\left(K_{\alpha\beta} + A_{\alpha\beta}\right) + \sum_{\alpha \neq \beta \neq \gamma}^{\nu} J_{\alpha\beta\gamma}\right], \tag{1}$$
where the parameters α, β, and γ denote the different phases, and n is the number of phases in the simulation. Parameter $M^{\phi}_{\alpha\beta}$, given in Equation (2), is related to the interface mobility between phases α and β, which is a function of the kinetic coefficient in the Gibbs–Thomson equation:

$$M^{\phi}_{\alpha\beta} = \frac{\mu^{G}_{\alpha\beta}}{1 + \mu^{G}_{\alpha\beta}\,\dfrac{\eta\,\Delta s_{\alpha\beta}}{8}\,\sum_{i} m^{l}_{i} \sum_{j}\left[\left(D^{ij}_{\alpha}\right)^{-1}\left(1 - k^{j}\right)c^{j}_{\alpha}\right]}, \tag{2}$$
where η and $\Delta s_{\alpha\beta}$ are the thickness of the interface and the entropy of fusion between the phases, respectively. Additionally, the parameters $m^{l}_{i}$ and $D^{ij}_{\alpha}$ represent the liquidus line slope for component i and the diffusion matrix, respectively, and $k^{j}$ is related to the partition coefficient.
The expression inside the brackets represents the required force for moving the interface between phases α and β. Parameter b is a pre-factor and is calculated using Equation (3). The parameters $\Delta G_{\alpha\beta}$ and $K_{\alpha\beta}$ denote the difference in Gibbs energy and the pairwise curvature between the two phases, as given in Equations (4) and (5), respectively. $J_{\alpha\beta\gamma}$ is related to the triple junction between three phases through Equation (6):

$$b = \frac{\pi}{\eta}\left(\phi_{\alpha} + \phi_{\beta}\right)\sqrt{\phi_{\alpha}\phi_{\beta}}, \tag{3}$$

$$\Delta G_{\alpha\beta} = \frac{1}{\nu_{m}}\left(\mu^{0}_{\beta} - \mu^{0}_{\alpha}\right), \tag{4}$$

$$K_{\alpha\beta} = \frac{\pi^{2}}{2\eta^{2}}\left(\phi_{\beta} - \phi_{\alpha}\right) + \frac{1}{2}\left(\nabla^{2}\phi_{\beta} - \nabla^{2}\phi_{\alpha}\right), \tag{5}$$

$$J_{\alpha\beta\gamma} = \frac{1}{2}\left(\sigma_{\beta\gamma} - \sigma_{\alpha\gamma}\right)\left(\frac{\pi^{2}}{\eta^{2}}\phi_{\gamma} + \nabla^{2}\phi_{\gamma}\right). \tag{6}$$
2.2.2. Validation of PF Simulations
Before using the PF model for generating microstructures under different heat treat-
ment conditions, the model’s accuracy for simulating the basic heat treatment must be
validated against experiments. Here, the step quenching heat treatment process routine for
the production of DP steel from low carbon steel, shown in Figure 2, is simulated using
phase field simulation in MICRESS software. Afterwards, the same heat treatment proce-
dure is also carried out experimentally, and the resulting microstructures are compared.
Figure 2. Schematic view of the step-quenching heat treatment process routine.
The base material used in the PF simulations was a ferritic–pearlitic steel with the chemical composition given in Table 1. To reduce computational costs, the heat treatment simulations started from the fully austenitic microstructure, and the morphology for this state was calculated using MatCalc software, version 6.03 [27]. Afterwards, the step quenching heat treatment was simulated, resulting in the formation of the ferrite and martensite phases. It was assumed that the remaining austenite phase is wholly transformed into martensite below the martensite temperature. Additionally, based on the chemical composition given in Table 1 and using the equation in the study [28], the martensite starting temperature (Ms) was calculated to be 417.34 °C. For this particular heat treatment based on the step quenching shown in Figure 2, first the fully austenitic microstructure was cooled from 1100 °C to 770 °C, then held for 10 min, and finally quenched in water to room temperature.
Table 1. Chemical composition of the low-carbon steel used for validating the PF model.
Element C Mn Si P S Cr Mo V Cu Co
wt% 0.2 1.1 0.22 0.004 0.02 0.157 0.04 0.008 0.121 0.019
In this study, a binary phase diagram was implemented for the simulation. Table 2 provides information on the carbon and manganese concentrations and the proportionality driving pressure (Lij) at T1, which were calculated using MatCalc. Some other phase interaction properties, such as the interface kinetic coefficient and interface mobility, were extracted from the literature and are shown in Table 3. For carbon's diffusion properties, the maximal diffusion coefficients (D0) in ferrite and austenite were set to 2.20 × 10⁻⁴ and 0.15 × 10⁻⁴ m²/s, and the activation energies for diffusion (Q) were set to 122.5 and 142.1 kJ/mol, respectively [29–31]. The diffusion of manganese was ignored in this study, and the "phase concentration" model in MICRESS and periodic boundary conditions (named PPPP in MICRESS) were used. Figure 3a–e illustrates the sample progression of the heat treatment.
Figure 3. Progression of the results of the PF simulation: (a) initial state, (b) 15 s, (c) 1 min, (d) 10 min, and (e) after quenching; and (f) SEM image of a sample undergoing the same heat treatment.
Table 2. Linearized data for the phase diagram at T1 = 1043 K.

Phase boundary                                 α/(γ + α)      γ/(α + γ)
Carbon (Cij)      Concentration (wt%)          0.0048         0.365
                  Slope (K/wt%)                −13,972.00     −188.80
Manganese (Mnij)  Concentration (wt%)          1.58           3.78
                  Slope (K/wt%)                −100.03        −23.55
Lij (J cm⁻³)                                   0.238
Table 3. Interfacial parameters between ferrite (α) and austenite (γ) [3,32].

Interface                      α/α            α/γ            γ/γ
Interfacial energy (J cm⁻²)    7.60 × 10⁻⁵    7.20 × 10⁻⁵    7.60 × 10⁻⁵
Mobility (cm⁴ J⁻¹ s⁻¹)         5.00 × 10⁻⁶    2.40 × 10⁻⁴    3.50 × 10⁻⁶
The only output taken from the PF models for the FEA is the final microstructure geometry. This means that to validate the PF models, it is only necessary to make sure they predict the martensite volume fraction, average phase size, and morphology (banded or equiaxed) correctly. Figure 3e,f shows the simulated and experimental microstructures resulting from the described heat treatment. There is good agreement between the results, as both microstructures have the same martensite volume fraction (34%), average phase size (15 µm), and morphology (equiaxed). This means that the utilized multiphase model can accurately predict the experimental results. Therefore, this validated model is used for simulating the final microstructure after undergoing heat treatment under different conditions.
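Since these validation checks are simple image statistics, they can be scripted. The following is a minimal sketch, assuming a 2-D integer phase map with a hypothetical label encoding (0 = ferrite, 1 = martensite) and an assumed pixel size; it is not the evaluation code used in this work:

```python
import numpy as np
from scipy import ndimage

FERRITE, MARTENSITE = 0, 1        # hypothetical label encoding
PIXEL_SIZE_UM = 0.5               # assumed pixel edge length in micrometers

phase = np.load("phase_map.npy")  # 2-D integer phase labels (assumed file)

# Martensite volume (area) fraction.
mvf = (phase == MARTENSITE).mean()

# Average martensite island size: label connected regions and convert the
# mean pixel count per island into an equivalent-circle diameter.
labels, n_islands = ndimage.label(phase == MARTENSITE)
mean_area_px = (labels > 0).sum() / max(n_islands, 1)
mean_diameter_um = 2 * np.sqrt(mean_area_px / np.pi) * PIXEL_SIZE_UM

print(f"martensite fraction: {mvf:.2f}, mean island size: {mean_diameter_um:.1f} um")
```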
2.3. FEM Simulation
2.3.1. FEA Parameters
This section describes the process of creating, analyzing, and validating microme-
chanical FEA models based on the PF simulations. After a microstructure is generated
using PFM, it can be used as a representative volume element (RVE). The parameters for a
single simulation are explained here, and the next section explains how a large number of
simulations is performed.
The material properties of the ferrite and martensite phases are essential factors to consider. It is well known that they change with the process parameters [33,34], but to simplify the process, the flow curves were taken from DP600 steel, as shown in Figure 4, which was reported in a previous study [2]. Damage in the martensite phase was ignored, and the Johnson–Cook damage model was used for the ferrite phase. Since the test was executed at room temperature with constant strain rates, D4 and D5 were ignored, and the local fracture strain under uniaxial loading was predicted by [2] to be 0.4. Finally, D1, D2, and D3 were found to be 0.17, 0.80, and −0.7, respectively. Darabi et al. [35] showed that there is no difference in the stress–strain curves of RVEs loaded under periodic and symmetric boundary conditions. Therefore, symmetric (linear displacement) boundary conditions were applied to the RVE.
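For context, the Johnson–Cook damage model in its standard form expresses the equivalent fracture strain in terms of the stress triaxiality σ*, the dimensionless strain rate, and the homologous temperature; dropping D4 and D5 as stated above leaves only the triaxiality term:

$$\varepsilon_{f} = \left[D_{1} + D_{2}\exp\left(D_{3}\,\sigma^{*}\right)\right]\left[1 + D_{4}\ln\dot{\varepsilon}^{*}\right]\left[1 + D_{5}\,T^{*}\right] \;\;\xrightarrow{\;D_{4}=D_{5}=0\;}\;\; \varepsilon_{f} = D_{1} + D_{2}\exp\left(D_{3}\,\sigma^{*}\right).$$

With the values above, the local fracture strain reduces to εf = 0.17 + 0.80 exp(−0.7 σ*).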
Figure 4. Flow curves of the ferrite and martensite phases used in micromechanical FEA [2].
2.3.2. Validation of the FE Simulation
The main outputs of the analysis were the yield strength, UTS, and fracture points. To obtain the mentioned properties, the model's equivalent plastic strain and von Mises stress were homogenized using the method described in our previous work [2] to obtain the stress–strain curve. Afterward, the model's Young's modulus was calculated based on the 2% offset method, and finally, the yield strength, UTS, and fracture points were found based on the curve.
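As a rough illustration of this post-processing step (not the authors' actual script), the key points can be read off a homogenized curve as follows; the offset fraction and the crude initial-slope fit are assumptions:

```python
import numpy as np

def key_points(strain, stress, offset=0.02):
    """Extract (yield stress, UTS, fracture strain) from a homogenized curve.

    Assumes monotonically increasing strain arrays; the slope estimate from
    the first few points is a simplification of a proper elastic-modulus fit.
    """
    E = np.polyfit(strain[:5], stress[:5], 1)[0]          # initial elastic slope
    offset_line = E * (strain - offset)                   # offset elastic line
    yield_idx = np.argmin(np.abs(stress - offset_line))   # intersection point
    return stress[yield_idx], stress.max(), strain[-1]
```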
Table 4 compares the experimental and numerical results, showing that the numerical
model can predict the mechanical behavior of the simulated microstructure. Therefore, this
micromechanical model can predict the mechanical behavior of microstructures generated
using PF simulations.
Table 4. Comparison of numerical and experimental results.

                 Yield Stress (MPa)    Ultimate Stress (MPa)    Fracture Strain (−)
Numerical        314.36                517.9                    0.127
Experimental     323.7                 530.1                    0.131
2.4. Data Pipelines
The main goal of PFM and FEA is to generate a large amount of reliable data for training and testing the machine learning models. Now that the model parameters have been determined and the models have been validated, the PFM and FEA data pipelines can each be automated and connected together to create the full dataset.
2.4.1. PFM Data Pipeline
The PFM parameters are based on the various points in Figure 2. Table 5 shows them with a short description and the selected values. There are a total of four variable heat treatment parameters, which result in 1188 different data points. To automate such a large number of PFM analyses, a base MICRESS® driving (.dri) file was created and extensively tested. Afterwards, scripts were written that changed the parameters and saved new .dri files (a sketch of this step is given after Table 5). Additionally, the PFM process was divided into two steps to reduce computational time. The first step was the heat treatment until the time t2 in Figure 2 was reached, and the second step restarted the PFM analysis and quenched the microstructure. This procedure greatly reduces computational time because, although the second step had to be performed 1188 times, the first step was performed only 396 times. In the end, the microstructures were saved as VTK files, which were used as input for creating the FEA models in Section 2.4.2. They were also saved as images to be directly used by the machine learning model. The PFM data pipeline is shown in the red section of Figure 1.
Table 5. Heat treatment parameters and their values. The units for temperatures, times, and cooling rates are Kelvin, seconds, and K/s, respectively.

Parameter   Description                                                          Values
T0          Initial temperature of the microstructure.                           1250
CR01        Cooling rate between points 0 and 1. Not used directly.              −10, −5, −1
t01         Number of seconds it takes to cool down from point 0 to point 1.     Calculated based on CR01
T1          Temperature of the microstructure at point 1.                        1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110
t12         Holding time between points 1 and 2, in seconds.                     10, 20, 30, 60, 300, 600, 900, 1800, 3600, 7200, 10,800
T2          Temperature of the microstructure at point 2.                        Equal to T1
CR23        Cooling rate between points 2 and 3, based on the quench medium.     Brine = −220, Water = −130, Oil = −50
            Not used directly.
t23         Number of seconds it takes to cool down from point 2 to point 3.     Calculated based on CR23
T3          Room temperature.                                                    298
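The sweep over Table 5 is straightforward to script. The sketch below only illustrates the bookkeeping: the placeholder tokens and file layout are hypothetical, not the real MICRESS .dri syntax, but the parameter counts reproduce the 396 first-step and 1188 total runs:

```python
"""Sketch of the PFM parameter sweep: 3 x 12 x 11 = 396 first-step runs,
each restarted with 3 quench media for 1188 runs in total."""
import itertools
from pathlib import Path

COOLING_RATES = [-10, -5, -1]                        # CR01, K/s
T1_VALUES = range(1000, 1120, 10)                    # 12 temperatures, K
HOLD_TIMES = [10, 20, 30, 60, 300, 600, 900, 1800, 3600, 7200, 10800]  # t12, s
QUENCH_MEDIA = {"brine": -220, "water": -130, "oil": -50}              # CR23, K/s

base = Path("base.dri").read_text()                  # hypothetical template file

for cr01, t1, t12 in itertools.product(COOLING_RATES, T1_VALUES, HOLD_TIMES):
    # Step 1: heat treatment up to time t2 ({...} tokens are hypothetical).
    step1 = (base.replace("{CR01}", str(cr01))
                 .replace("{T1}", str(t1))
                 .replace("{T12}", str(t12)))
    Path(f"run_{cr01}_{t1}_{t12}.dri").write_text(step1)
    # Step 2: one restart file per quench medium.
    for medium, cr23 in QUENCH_MEDIA.items():
        Path(f"run_{cr01}_{t1}_{t12}_{medium}.dri").write_text(
            step1.replace("{CR23}", str(cr23)))
```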
2.4.2. FEA Data Pipeline
The MICRESS® PFM software allows output to a number of different file formats. To enable the easy creation of FEA models, output is requested in The Visualization Toolkit (VTK) file format, which can be read using readily available software libraries. The output used for modeling was the "phas" variable, which shows the phase of each element, i.e., each element was either ferrite, martensite, or part of the phase interface. Interface elements were converted to ferrite in subsequent operations.
A Python script was written that extracted the phase distribution from the VTK file and passed it to the open-source VCAMS library, which created an Abaqus® input file containing the elements with the proper phase labels, as well as linear displacement boundary conditions. Another script was written for the Abaqus Python environment that opened the input file and defined the rest of the simulation parameters, such as the material assignment, described in Section 2.3.1.
The main script then submitted the analysis and passed the final ODB to an Abaqus Python script that post-processed the results. This included homogenizing the stresses and strains using the method described in Ref. [2] to obtain the stress–strain curve, determining the elastic modulus based on the 2% offset method, and finding the yield strength, UTS, and fracture strains. These were then written to a file that mapped each model to its output. Pictures of the microstructure and the stress–strain curve were also saved so they can be audited if necessary. The FEA data pipeline is illustrated in the blue section of Figure 1.
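A minimal sketch of the VTK-reading step is shown below. Only the "phas" variable name and the interface-to-ferrite mapping come from the text; the phase label encoding and the VCAMS call are assumptions, and the real VCAMS API should be taken from its documentation:

```python
"""Sketch of the VTK -> Abaqus step of the FEA pipeline (assumptions noted)."""
import numpy as np
import pyvista as pv  # one common reader for VTK files (assumption)

FERRITE, MARTENSITE, INTERFACE = 0, 1, 2       # hypothetical label encoding

grid = pv.read("rve_0001.vtk")                 # MICRESS VTK output
phases = np.asarray(grid.cell_data["phas"])    # phase label per element

# Interface elements carry no bulk material, so map them to ferrite,
# as described in Section 2.4.2.
phases[phases == INTERFACE] = FERRITE

# Hypothetical VCAMS call: write an Abaqus .inp with element sets per phase
# and linear-displacement boundary conditions (check the VCAMS docs).
# import vcams
# vcams.write_abaqus_input(phases, shape=grid.dimensions, path="rve_0001.inp")
```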
3. Deep Learning Approaches
3.1. Introduction and Overview
Inspired by brain biology, artificial neural networks (ANNs) allow for the modeling of complex patterns. Nowadays, various methods have been applied to compare the performance of different ANNs in computational mechanics [36–38]. The attention paid to ANNs in recent years has led to the flourishing of methods such as transfer learning, which allows for loading new datasets onto pre-trained models, greatly reducing the effort required for training the neural network [39].
In order to design safe and functional parts, we need information about the material’s
mechanical properties, such as ultimate stress (UTS), stiffness (E), yield stress (Y), fracture
strain (F), elongation, fatigue life, etc. The same properties are also expected when we are
designing a new material. Naturally, experimental tests are the gold standard for obtaining
this information, but they are costly and time-consuming. The field of material informatics
presents an excellent alternative, offering to learn and then predict these properties based
on suitable datasets. It is worth mentioning that when mechanical properties are discussed, researchers are dealing with numeric values, which leads us to treat mechanical property prediction as a regression problem. In recent years, applying machine learning approaches to predict material behavior has attracted great attention [40–44].
The prerequisite of material informatics is a trustworthy dataset used as the input for
the next steps, such as the one that has been thoroughly explained in the previous sections.
This dataset can then be used for quantitative predictions. The next step is identifying features and labels in the dataset. In the context of machine learning, a feature refers to a parameter used as the input, and a label refers to the output corresponding to a set of features [45]. In neural networks, both features and labels must be numeric, meaning that even images are represented by numbers. The act of mapping specific features to labels is called learning, and the choice of how to map these relationships opens the door to learning algorithms [36]. This paper aims to predict three mechanical properties of DP steels, namely UTS, Y, and F, based on PFM-generated microstructures, making them the labels and the feature, respectively.
This research trains a hybrid deep-learning model for predicting these mechanical properties based on 1188 PFM-generated microstructure images of DP steel. An overview of the deep learning model is as follows. In this study, the input parameters defined as "labels", used to train a network for the prediction of mechanical properties, are the ultimate stress, yield stress, and fracture strain of each microstructure. After an extensive study of different transfer learning architectures, such as LeNet, Xception, and InceptionV3 [22], for materials informatics, and having in mind the resemblance of medical images to microstructure images [46], two transfer learning models, ResNet50 and VGG16, were trained, and their output was used independently in conjunction with the microstructure images to perform deep feature extraction. The Adam optimizer was implemented as one of the best adaptive gradient-based methods for optimization with a stochastic objective [47]. In order to use it for future estimations, this method saves an exponentially decaying average of previously squared gradients [20]. What makes the Adam optimizer remarkable is its ability to keep the momentum of previous gradients, resulting in a better estimation of subsequent behavior [48]. In addition, Adam adapts well to different learning rates, and its stability in the convergence process cannot be disputed. This process resulted in two feature vectors for each microstructure image, which were then merged to form a stacked feature matrix, which was finally used as the input for the Adaboost and Random Forest (RF) algorithms. Figure 5 illustrates this hybrid deep learning model.
All implementations were performed in Python via the Google Colaboratory platform
utilizing an NVIDIA Tesla K80. The Keras, Tensorflow, and SKlearn packages were used to
build the final deep network. Training the whole grid with the feature extraction section
takes about 2 h, and with access to more advanced hardware and clusters, this could
decrease to below one hour.
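To make the data flow concrete, the following is a minimal sketch of the feature-extraction and regression stages under stated assumptions (ImageNet weights, average-pooling heads, an images array of shape (N, 224, 224, 3), and illustrative regressor settings); the fine-tuning and HPO steps of the actual model are omitted:

```python
import numpy as np
from tensorflow.keras.applications import VGG16, ResNet50
from tensorflow.keras.applications.vgg16 import preprocess_input as prep_vgg
from tensorflow.keras.applications.resnet50 import preprocess_input as prep_res
from sklearn.ensemble import AdaBoostRegressor

# images: (N, 224, 224, 3) float array of microstructure images (assumed),
# y_uts: (N,) array of ultimate stress labels (assumed).
vgg = VGG16(weights="imagenet", include_top=False, pooling="avg")
res = ResNet50(weights="imagenet", include_top=False, pooling="avg")

f_vgg = vgg.predict(prep_vgg(images.copy()))   # (N, 512) feature vectors
f_res = res.predict(prep_res(images.copy()))   # (N, 2048) feature vectors
features = np.hstack([f_vgg, f_res])           # stacked feature matrix

# Ensemble regressor on the stacked features, one model per property.
reg_uts = AdaBoostRegressor(n_estimators=100).fit(features, y_uts)
```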
Figure 5. General framework of the hybrid model.
3.2. VGG16
The model was proposed by Karen Simonyan and Andrew Zisserman at the Oxford Visual Geometry Group [49]. Compared to most convolutional neural networks (CNNs), the network is simple and works with stacked 3 × 3 convolutional layers. VGG is a promising CNN model pretrained on the ImageNet dataset, which contains over 14 million images in nearly 22,000 categories. To train the model, all images were downsized to 256 × 256, and RGB images with a size of 224 × 224 were the inputs for the VGG16 model. Then, the convolutional layers were applied to the images. The whole setup can differ, although the stride, padding, and down-sampling layers can be distinguished. Five max-pooling layers were applied following some of the CNN layers in the very first architecture of the model [50]. The last layer was also equipped with a soft-max layer. An optimum batch size of 16 was selected for the model. It is worth mentioning that the Rectified Linear Unit (ReLU) function, given in Equation (7), was used in all hidden layers. Other activation functions were also considered, but since we only deal with positive values, ReLU showed the best performance in terms of the speed of convergence [51]:

$$R(z) = \begin{cases} z, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{7}$$
3.3. ResNet50 (Deep Residual Learning)
Another transfer learning architecture used in the current study was initially designed to address the degradation problem, whereby accuracy is lost as more layers are added. The model is called ResNet since it deals with residual learning. The model's algorithm could justify the superb performance of ResNet in that, instead of modeling the intermediate output, it tries to model the residual of the output of each layer [50]. ResNet50, whose structure is displayed in Figure 6, is the enhanced model with 48 CNN layers, a max pool, and an average pool. Similar to the VGG16 model, all layers are passed through a ReLU activation function. What matters here are the shortcut connections, which skip every three layers in the new ResNet50 model. In comparison to the classic ResNet, in which the shortcuts skip every two layers [52], each block of 2 CNN layers in the 34-layer network was replaced with a bottleneck block of 3 layers. Despite the fact that the ResNet model could be time-consuming, it showed promising performance on the microstructure images.
Figure 6. (a) ResNet50 model vs. (b) ResNet classic model.
3.4. Study of Hyper Parameters
Several hyper parameters affect the network in different ways, such as the learning rate, the number of nodes, the dense layers, the batch size, and even the iteration number. To enhance each model, a set of three hyper parameters, as listed below, was optimized in the deep network with the help of the Keras Tuner and then fixed for the other trials. This process continues until all hyper parameters in a model are tuned. The effects of the learning rate, the dense layers, and the number of nodes have been investigated and will be discussed in the next section.
The global optimization framework called Bayesian Optimization [10] was applied to select optimal values. The posterior distribution of this function provides insights into the reliability of the function's values in the hyper parameter space [53]. Using the values tested in previous iterations, this approach exploits the variance effect of every defined hyper parameter.
Building a search space [54] for each effective parameter is the main idea behind the Bayesian formulation. With the help of the Keras Tuner in this study, it could be detected how the performance varies as the value of each hyper parameter is altered. Before applying an automated approach for tuning, a manual grid search was also investigated; since that process was costly both time- and budget-wise, the Keras Tuner was the better alternative, with the possibility of creating a search space. The same values for the three categories of hyper parameters were considered for both models.
3.4.1. Learning Rate (lr)
The learning rate is among the top three most significant hyper parameters in stochastic gradient descent. This factor controls how much the model is altered every time the weights are updated according to the calculated error of each iteration [55]. Higher learning rates were chosen to accelerate the training in the initial step. Then, lower values were applied to avoid any sudden cross-domain fluctuations, especially in the optimal neighborhood. The values lr = (1e−2, 1e−3, 1e−4, 1e−5) were selected to test the performance of each model, and the best performance was detected with the implementation of an optimizer. This will be discussed in the Results and Discussion section.
3.4.2. Dense Layers
The most common layer in ANNs is the dense layer, where the matrix–vector multiplication occurs. One, two, and three layers were implemented for both the VGG16 and ResNet50 networks. Dense units, defined as the output size of each dense layer, were also considered as a hyper parameter, and all three models were tested with varying dense unit numbers. For this study, the range of 16 to 2048 with a step of 32 was considered for tuning the units of the dense layer for both models. The results will be reported in the next part. While discussing the effect of dense layers on the network, the number of layers, one of the most influential parameters in the whole network, was also studied; one to three layers, the most common numbers of dense layers [56], were simulated. The ReLU function, which plays the role of the neuron transformation for each layer, was designated as the activation function. A sketch of this search space is given below.
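A compact sketch of this search, using the Keras Tuner Bayesian optimizer with the ranges stated in Sections 3.4.1 and 3.4.2, is given below; the model body is a stand-in, not the paper's hybrid network:

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Placeholder backbone; the real study tunes VGG16/ResNet50 heads.
    model = tf.keras.Sequential(
        [tf.keras.layers.Flatten(input_shape=(224, 224, 3))])
    # 1-3 dense layers, 16-2048 units in steps of 32 (ranges from the text).
    for i in range(hp.Int("dense_layers", 1, 3)):
        units = hp.Int(f"units_{i}", min_value=16, max_value=2048, step=32)
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(1))          # regression output
    lr = hp.Choice("lr", [1e-2, 1e-3, 1e-4, 1e-5])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse")
    return model

tuner = kt.BayesianOptimization(build_model, objective="val_loss", max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))  # assumed data
```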
3.4.3. Regression Ensemble Learning Method
Keeping in mind that the regression part of the model could also be a turning point in the simulation, two main methods based on the decision tree algorithm were nominated for the learning method in the last part of the model to predict the mechanical properties; both architectures are illustrated in Figure 7. In the first, well-known method, called Random Forest (RF), every decision tree takes a random selection of features for its optimal splits according to the bagging method, meaning that each tree is trained individually on a random subset of the data but with equal weights. In the second method, on which we are focused, called Adaboost, each tree takes its own weight by analyzing the mistakes of the previous one and increasing the weight of misclassified data points, which is called the boosting method. The ordering of the trees in the Adaboost method can affect the subsequent layers, whereas in RF each tree performs independently.
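In scikit-learn terms, the two regressors can be compared directly on the stacked feature matrix; the following sketch assumes the features and y arrays from Section 3.1 and uses illustrative hyper parameters:

```python
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rf = RandomForestRegressor(n_estimators=200)   # bagging: independent trees, equal weights
ada = AdaBoostRegressor(n_estimators=200)      # boosting: trees reweight previous errors

for name, model in [("RF", rf), ("AdaBoost", ada)]:
    scores = cross_val_score(model, features, y, cv=5, scoring="r2")
    print(name, scores.mean())
```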