Prediction for Manufacturing Factors in a
Steel Plate Rolling Smart Factory using
Data Clustering based Machine Learning
CHEOL YOUNG PARK1, (Member, IEEE), JIN WOOG KIM2, BOSUNG KIM3, and
JOONGYOON LEE4
1Bayesian AI Lab, BAIES, Fairfax, VA, USA (e-mail: cparkf@gmu.edu)
2Deep Learning Lab, DEEP-IN, Gangnam-gu, Seoul, South Korea (e-mail: jeenwook.kim@gmail.com)
3POSCO, Pohang-si, Gyeongsangbuk-do, South Korea (e-mail: kbs9065@posco.com)
4GIFT, POSTECH University, Pohang-si, Gyeongbuk, South Korea (e-mail: jlee2012@postech.ac.kr)
Corresponding author: Joong Yoon Lee (e-mail: jlee2012@postech.ac.kr).
This work was supported in part by POSCO under Grant #: 2019Y048.
ABSTRACT A Steel Plate Rolling Mill (SPM) is a milling machine that uses rollers to press hot slab
inputs to produce ferrous or non-ferrous metal plates. To produce high-quality steel plates, it is important
to precisely detect and sense values of manufacturing factors including plate thickness and roll force
in each rolling pass. For example, the estimation or prediction of the in-process thickness is utilized
to select the control values (e.g., roll gap) in the next pass of rolling. However, adverse manufacturing
conditions can interfere with accurate detection for such manufacturing factors. Although the state-of-the-
art gamma-ray camera can be used for measuring the thickness, the outputs from it are influenced by adverse
manufacturing conditions such as the high temperature of plates, followed by the evaporation of lubricant
water. Thus, it is inevitable that there is noise in the thickness estimation. Furthermore, installing such
thickness measurements for each passing step is costly. The precision of the thickness estimation, therefore,
significantly affects the cost and quality of the final product. In this paper, we present machine learning
(ML) technologies and models that can be used to predict the in-process thickness in the SPM operation, so
that the measurement cost for the in-process thickness can be significantly reduced and high-quality steel
plate production becomes possible. To do so, we investigate the most widely known technologies for this application.
In particular, Data Clustering based Machine Learning (DC-ML), combining clustering algorithms and
supervised learning algorithms, is introduced. To evaluate DC-ML, two experiments are conducted and
show that DC-ML is well suited to the prediction problems in the SPM operation. In addition, the source
code of DC-ML is provided for the future study of machine learning researchers.
INDEX TERMS Intelligent manufacturing systems, Machine Learning, Regression analysis, Steel industry,
Thickness control.
I. INTRODUCTION
As the fourth industrial revolution, called Industry 4.0, be-
comes more pervasive, contemporary manufacturing also
becomes smarter using state-of-the-art technologies such as
artificial intelligence, cloud computing, internet of things,
cyber-physical systems, and big data. These technologies
make smart manufacturing [1]–[12] radically feasible. In this
paper, we introduce an application of ML technologies in a
steel plate production smart factory.
In a steel plate factory line, the input of the line is a slab
made by continuous casting of molten steel and the output
of the line is a steel plate, produced by a special facility, a Steel Plate Rolling Mill (SPM). The
rolling process is a metal forming process in which a slab is
passed through a set of rolls in order to uniformly reduce
the thickness of the slab by adjusting the roll gap. To
produce high-quality steel plates, it is important to precisely
detect and sense values of manufacturing factors such as
roll gap, roll force, and temperature. However, environmental
factors such as high temperature can hinder accurate value
detection for manufacturing factors (e.g., the thickness of a
steel plate when passing through the SPM). In a steel plate
FIGURE 1. Actual Thickness vs Predicted Thickness
factory line, the estimation of the in-process thickness is
utilized to select the control values (e.g., roll gap) for the next
pass. The precision of the thickness estimation, therefore, can
significantly affect the final product. Although a gamma-ray
camera can be used, its outputs can be influenced by
adverse manufacturing conditions such as the high temperature
of the plates and the subsequent evaporation of lubricant water. Thus,
noise in the thickness estimation using such measuring sensors is inevitable.
Furthermore, the use of such sensors adds production cost.
In this paper, we introduce machine learning approaches
predicting manufacturing factors to support existing SPM
control systems. Figure 1 shows one illustrative example of
the problem in this paper. Over the successive rolling passes
(x-axis), the SPM control systems require high-precision
value estimates of the manufacturing factors to produce the
required thickness (y-axis) of the plate in the next rolling
step. To do so, the measured manufacturing data from sensors
are used to predict the next thickness of the plate. For high-
quality production, the prediction for the next thickness (the
dashed line) should be as close as possible to the actual value
(the solid line).
Specifically, this paper introduces four existing machine
learning approaches and one novel machine learning algo-
rithm in order to support the SPM control systems. One of
the traditional SPM control systems is Automatic Gauge Control
(AGC) in steel plate production. AGC has been successfully
applied to commercial rolling mill systems to select the required
control values. Usually, the conventional AGC systems are
based on Proportional Integral Derivative (PID) controller
[13]–[15], a feedback-loop mechanism adjusting control val-
ues to address target values. Such PID controllers are widely
used in industrial control systems (e.g., temperature control,
flow control, pneumatic control, and compressor control),
including AGC systems. For example, Zhang et al. [16] in-
troduced a generalized predictive control algorithm, evolved
from the existing control algorithms for hydraulic AGC. They
used a simulation for evaluation and showed an improved
thickness precision of strips. Karandaev et al. [17] applied a
transfer function to an AGC control model in order to address
the control error of the existing AGC, so that they could
reduce gauge deviations. Zhang and Ding [18] introduced a
strategy of the AGC control to improve the final product qual-
ity. The control limitations of the conventional AGC control
under compound disturbance were addressed by using such a
control strategy that could remove rolling uncertainty in the
AGC operation. However, to develop a PID-based AGC, mathematical
models must be designed by subject-matter experts (SMEs). Furthermore,
designing such mathematical models is not practicable when a large number
of manufacturing factors must be considered; consequently, only simple PID
models have been developed. Another drawback of PID controllers
is that they cannot easily handle complex non-linearity
[19]. To overcome these limitations of the conventional AGC
controllers, self-adjusting AGC systems, developing control
models automatically, have been researched and developed.
For example, Fuzzy Logic [20] based control systems were
presented. The fuzzy control systems have several advantages
such as human understandable model, fast and easy imple-
mentation, ability to deal with non-linearity, and so on. Wang
et al. [21] utilized a fuzzy control system to realize a self-adjusting
PID controller in an AGC system. The simulation
results of the paper showed that the proposed fuzzy system
outperformed conventional PID systems. However, because
such fuzzy systems are based on various domain assumptions
and human interventions, the reasoning results can be inac-
curate. In addition, it is not trivial to design fuzzy rules by
SME (i.e., dependent on the domain knowledge level).
As another example of the self-adjusting AGC systems,
some researchers have focused on the prediction of a roll
force value. One critical factor of designing a conventional
control model of the AGC systems is the roll force. Selecting
a precise roll force value for each rolling process affects the
quality of thickness reduction of a steel plate [22]. In this re-
search domain, Artificial Neural Networks (ANN) were used
to predict the roll force value [23]–[29]. Lee and Choi [23]
applied ANN to roll force prediction. Their results showed
a 30% improvement in final product quality. Zhang et
al. [24] combined differential evolution with ANN. The pre-
diction error of the proposed approach was less than 5%. Rath
et al. [25] applied ANN for prediction of the roll force. They used
a feed-forward network as the ANN architecture and a back-propagation
algorithm, with conjugate gradient optimization of
the loss function used for network training. The prediction
accuracy of the trained model was an R-squared value of
about 0.94. Bagheripoor and Bisadi [26] applied ANN and
used the similar feed forward network and back propagation
algorithm. The prediction accuracy of the trained model was
the R-squared value of about 0.979. Wang et al. [27] used an
ANN for the bending force prediction in a hot strip rolling.
They suggested the ANN architecture which was optimized
by a genetic algorithm and Bayesian regulation. The predic-
tion accuracy of the proposed architecture was the R-squared
value of 0.956. Liu et al. [28] applied a genetic algorithm
(GA), particle swarm optimization algorithm (PSO), and
multiple hidden layer extreme learning machine (MELM)
for their model. They used GA to determine the optimal
number of hidden layers and the optimal number of hidden
nodes. PSO was used to search for the optimal input weights
and biases. Esendag et al. [29] used ANN and conventional
regression models (e.g., Support Vector Machines) to predict
reversible cold rolling process parameters. They reported the
roll force prediction accuracy for the ANN as the R-squared
value of 0.939 and the regression model as the R-squared
value of 0.947.
In this paper, several ML algorithms are used to predict
two main parameters of the SPM control systems. One is
the roll force, because the plate thickness can be directly
calculated by using the roll force value. Another is the plate
thickness at each rolling pass, so that we can find the best
control conditions without an expensive sensor (e.g., the
gamma-ray camera) and its operational cost. Furthermore,
high-quality plate production of the SPM control systems can
be achieved. Specifically, four well-known ML regression
models are utilized for these two predictions.
(1) Random Forest Regression (RF)
(2) Gradient Boosting Regression (GB)
(3) Gaussian Process Regression (GP)
(4) Conditional Linear Gaussian (CG)
In addition, this paper introduces Data Clustering based
Machine Learning (DC-ML). DC-ML is based on the idea
that the training data are first partitioned into clustered data sets
by a clustering algorithm, and each clustered data set is then learned
by supervised learning, including regression and classification.
There are similar studies regarding data clustering based
machine learning. Wang et al. [30] introduced clustering-
based Kriging, or Gaussian Process Regression [31], to solve
the problem of Efficient Global Optimization (EGO). Kriging
has the advantage of learning a complex function. However,
when it is required to be processed with large data, a problem
arises in computing large matrix multiplication. To solve such
a big data problem, the paper introduced how to combine a
clustering algorithm with Kriging for EGO. Qiang et al. [32]
presented an algorithm regarding a clustering-based artificial
neural network. In the initial step of the algorithm, many
neural networks are trained. These networks are then
divided into clusters using K-means clustering [33] according
to the output results of each network, and the most accurate
network in each cluster is selected to be used for inference.
These previous studies focused on the specific clustering
and classification algorithm (i.e., Kriging and artificial neural
network). In this paper, we introduce a general algorithm
in terms of data clustering based machine learning. The
presented algorithm utilizes existing clustering and super-
vised learning algorithms to make a group of clustering and
supervised learning models. For a performance analysis, two
experiments are conducted and show that the presented DC-
ML is well suited to the prediction problems in the SPM
control systems and outperforms the above four regression
models (RF, GB, GP, and CG).
This paper contributes to three research agendas: (1) sug-
gest DC-ML in the application of SPM, (2) provide the
source code for DC-ML, and (3) introduce the experiment
results using the real-world data from a steel plate rolling
smart factory.
The remainder of the paper is organized as follows. Section
2 introduces background knowledge on the concept of SPM,
the basic theory of thickness reduction of SPM, and the
machine learning algorithms used in this paper. Section 3
suggests the algorithm of DC-ML. Section 4 presents the ex-
periments regarding roll force and plate thickness prediction
in SPM. Section 5 discusses the experiment results in terms
of prediction accuracy. The final section presents conclusions
and future research directions.
II. BACKGROUND
In this section, we introduce the concept of the rolling mill
process, the basic theory of thickness reduction, and machine
learning technologies regarding regression. This prerequisite
knowledge will be the basis of the methodology introduced
in Section 3.
A. RESEARCH TARGET SYSTEM
The steel plate factory process usually contains seven steps
to produce a steel plate using a slab: (1) Reheating Furnace,
(2) Hot Scale Breaker, (3) Input Size Measure, (4) Rolling
Mill Stand, (5) Output Size Measure, (6) Cooling, and (7) Hot
Leveler. The steel plate smart factory in this paper has only
one rolling mill stand, which performs multiple reciprocating
pass operations to enlarge the width and/or length of the steel
plate, and reduce the thickness of it to achieve the desired
target size. In this paper, the target rolling mill system is a
four-high reciprocating rolling mill stand. The specification
of this machine includes 8,000 tons of rolling capacity, 4
meters of rolling width, and 5 m/sec of rolling speed. It is
equipped with the pair-cross automatic gauge control system.
B. BASIC THEORY OF THICKNESS REDUCTION BY
ROLLING MILL OPERATION
The rolling process is a metal forming process in which a
slab is passed through a set of rolls to uniformly reduce the
thickness of the slab by adjusting the roll gap. Equation 1
represents the relation between the output thickness Th and the roll
gap SD under ideal conditions:

Th(i+1) = SD,   (1)

where SD denotes the Screw Down of the mill (simply, the Roll Gap)
and i denotes the rolling pass number.
That is, when a thick plate Th(i) is input to the roll from
the left side (Figure 2), the plate with thickness reduced to
Th(i+1) by the roll gap SD is output to the right side.
FIGURE 2. Thickness Reduction Concept by Rolling
1) Problem of Thickness Control in Rolling
The concept of the rolling process is simple; however, precise
thickness control is not trivial because of various noise
factors like vertical expansion movement of roll, shape defor-
mation (roll crown), temperature interaction factor, and so on.
The following introduces some explanation of major noise
factors which should be considered in the rolling process.
Vertical Expansion Movement of Roll
As shown in Figure 2, the vertical expansion (VE)
movement of roll occurs due to the material repulsion
force (Roll Force) for the thickness reduction in the
rolling process. The vertical expansion should be re-
flected when setting the roll gap in order to meet the
target thickness of the output plate. The value of VE can
be obtained by dividing the roll force by the Mill
Modulus (MM). The roll force, in turn, can be estimated
from the high-temperature strength, the thickness
reduction rate, the width of the rolling plate, and the rolling
speed. In addition, when setting the roll gap, it should
be considered that the value of MM changes slightly
with the width of the plate.
Shape Deformation of Roll
The original convex cylinder form of the roll (roll
crown) can be flattened due to abrasion, as rolling quan-
tities increase. Such a shape deformation also should be
reflected, when setting the roll gap.
Rolling Temperature
During the rolling mill operation, the part of the roll in
direct contact with the plate heats up and the roll expands
due to the heat of the rolling plate, while the plate shrinks
as it cools down from the high rolling temperature.
This thermal expansion of the roll and cooling shrinkage
of the plate should be reflected when setting the roll gap.
Plate Dimension
While rolling the input plate, the thickness and width
vary for each rolling pass. The difference in these
dimensions causes a different mill modulus and roll
force, and eventually leads to a different roll gap.
Other Noise Factors
In addition to the above major noise factors, Roberts
[34] introduced more factors like the coefficient of fric-
tion, work-roll diameter, and rolling speed related to the
mathematical models for predicting the roll force. Such
factors associated with the roll force are also related
to the thickness of the plate. Ginzburg [35] suggested
that the disturbances, affecting gauge performance in
rolling mills, can be caused by various sources. Table
1 summarizes these noise factors.
2) Equation for the Output Thickness
In the previous subsection, the basic Equation 1 was only associated with the roll gap SD. Equation 2 represents the relationship between the output thickness and the roll gap under the various noise conditions [36] [37] [38]:

Th(i+1) = (SD + RLF/MM - S) × TF,   (2)

where RLF denotes the roll force, MM denotes the mill modulus (which includes compensation for the plate width variation), S denotes the adjustment of the roll gap (which includes compensation for thickness variation, strength variation, and others), and TF denotes the thermal shrinkage compensation factor. Details can be found in [36] [37] [38]. Equation 2 is used as the basis for developing the causal model introduced in Subsection 4.B.
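For illustration, the relation in Equation 2 can be evaluated directly. The following minimal Python sketch does so; the function name and the numeric values are hypothetical and are not taken from the factory data or the authors' code.

```python
# Minimal sketch of Equation 2 (illustrative values, not factory data):
# Th(i+1) = (SD + RLF/MM - S) * TF
def output_thickness(roll_gap_sd, roll_force_rlf, mill_modulus_mm,
                     gap_adjustment_s, thermal_factor_tf):
    """Predicted output thickness Th(i+1) for one rolling pass."""
    return (roll_gap_sd + roll_force_rlf / mill_modulus_mm
            - gap_adjustment_s) * thermal_factor_tf

# Hypothetical values: 17 mm roll gap, 60,000 roll-force units, mill modulus 8,000.
print(output_thickness(17.0, 60000.0, 8000.0, 0.3, 0.995))  # ~24.08 mm
```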
TABLE 1. Main Factors Affecting Gauge Performance in Rolling Mills [35]

Source of disturbances: Disturbances from mill mechanical and hydraulic equipment
Factor group: No-load roll gap
Factors: Roll Bearing Oil Film Thickness, Roll Ovality, Mill Chatter, Roll Balance Force, Roll Bite Lubricant Film Thickness, Roll Expansion or Contraction, Roll Wear, and Roll Eccentricity

Source of disturbances: Disturbances from mill mechanical and hydraulic equipment
Factor group: Main factors affecting mill stiffness
Factors: Roll Flattening, Roll Crown, Hydraulic Cylinder Extension, Roll Bite Lubricant Film Thickness, Rolled Material Width, Bearing Oil Film Thickness, Screw Down Extension, and Roll Diameter

Source of disturbances: Disturbances from mill control systems
Factor group: Mill control systems affecting gauge performance
Factors: Mill Speed Control, Roll Force Control, Roll Balance Control, Strip Tension Control, Gauge Monitor Control, Roll Coolant and Lubrication Control, Roll Bending Control, and Roll Gap Control

Source of disturbances: Disturbance from incoming rolled product
Factor group: Geometry variations of incoming product
Factors: Gauge Variation, Hardness Variation, Width Variation, Profile Variation, and Flatness Variation
C. REGRESSION IN MACHINE LEARNING
A regression model is used to predict a continuous response
f(X), or target variable, using predictor variables X =
{x_1, x_2, ..., x_n}. This paper uses four well-known regres-
sion models: (1) Random Forest Regression, (2) Gradient
Boosting Regression, (3) Gaussian Process Regression, and
(4) Conditional Linear Gaussian. Random Forest Regression
can handle large data, missing data, and many variables.
However, for unseen data, it cannot precisely predict a continuous
change. Also, it can be over-fitted on noisy data,
and the model learned by Random Forest Regression is
difficult to interpret. Gradient Boosting Regression is
prone to over-fitting, so it requires careful hyperparameter
tuning, when performing machine learning. Gaussian Process
Regression is a promising model in regression. It can predict
continuous change in nonlinear regression. However, it is not
suitable for large data [39]. Conditional Linear Gaussian is a
simple and human editable model that allows subject-matter
experts to modify it. However, it is a linear model. In this
subsection, these four models are briefly introduced.
1) Random Forest Regression
A set of ML models can often have a better performance than
the use of a single ML model. Such an integration of ML
models is called ensemble learning. Random Forest [40] uses
the ensemble learning by forming a set of decision trees (e.g.,
Classification and Regression Tree, CART [41]) and resulting
in an output which is averaged over outputs from the decision
trees. Random Forest draws random samples from training
data and creates a decision tree model from the sample data,
so that it can have a set of decision trees (i.e., forest). After
machine learning, in the prediction or application stage, the
mean value of the outputs of all decision trees is yielded
as the final result. Equation 3 shows how the outputs from
the set of learned decision trees are averaged:

\hat{y} = Mean{a_1(x), a_2(x), ..., a_n(x)},   (3)

where a_i(x) is a single decision tree and the function Mean(.) yields the average value of the outputs from the set of decision trees.
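As a concrete illustration of the averaging in Equation 3 (not the configuration used in this paper), the following scikit-learn sketch trains a small forest on synthetic data and verifies that the forest prediction equals the mean of the individual trees' outputs.

```python
# Illustrative sketch on synthetic data: Random Forest averaging per Equation 3.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))                 # synthetic predictor variables
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

x_new = np.array([[0.5, 0.2, 0.7]])
per_tree = [tree.predict(x_new)[0] for tree in forest.estimators_]
# The forest output is the mean over the individual trees (Equation 3).
print(np.mean(per_tree), forest.predict(x_new)[0])
```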
2) Gradient Boosting Regression
Gradient Boosting [42] uses an ensemble model consisting
of a set of simple models (e.g., a decision tree stump, a tree
containing only one root and its immediately connected leaf
nodes). By adding such simple models, the resulting ensemble
model is sequentially improved and finally fitted to the data.
In other words, after a simple model is applied, the samples it
has processed are reused to train another simple model, and
this process is repeated until convergence (or until better
predictive performance is achieved). Gradient Boosting generalizes
boosting (e.g., [43], [44]) by using the gradient
of a loss function.
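The following scikit-learn sketch (again on synthetic data, not the paper's setup) uses decision-tree stumps as the simple models and shows the sequential improvement of the ensemble as trees are added.

```python
# Illustrative gradient boosting sketch: each added stump fits the gradient of
# the squared-error loss, so the training error decreases stage by stage.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=500)

# max_depth=1 gives decision-tree stumps; staged_predict exposes the sequential fit.
gb = GradientBoostingRegressor(n_estimators=200, max_depth=1, learning_rate=0.1).fit(X, y)
for i, y_hat in enumerate(gb.staged_predict(X)):
    if i in (0, 49, 199):
        print(f"after {i + 1:3d} trees, training MSE = {np.mean((y - y_hat) ** 2):.4f}")
```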
3) Gaussian Process Regression
A Gaussian Process is composed of a set of Gaussian random variables, specified by a mean and a covariance (or kernel) function. Equation 4 formally shows the Gaussian Process [45]:

P(F(x) | D, x) = N(\mu(x), \sigma^2(x)),   (4)

where D denotes the observed data {x_{1:n}, F(x_{1:n})}, x denotes an independent value for F(.), N(., .) denotes a normal distribution, \mu(.) denotes a mean function of x, and \sigma^2(.) denotes a variance function of x. These \mu(.) and \sigma^2(.) are shown in Equations 5 and 6, respectively:

\mu(x) = k^T K^{-1} F(x_{1:n})   (5)

and

\sigma^2(x) = k(x, x) - k^T K^{-1} k,   (6)

where k = [k(x, x_1), k(x, x_2), ..., k(x, x_n)] denotes a vector of kernel function values k(., .) and K denotes the kernel matrix shown in Equation 7:

K = [[k(x_1, x_1), ..., k(x_1, x_n)]; ...; [k(x_n, x_1), ..., k(x_n, x_n)]]   (7)

Using Equations 5 and 6, it is straightforward to compute the Gaussian Process in Equation 4.
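The posterior mean and variance in Equations 5 to 7 can be computed directly. The sketch below assumes an RBF kernel, a common choice but not necessarily the kernel used in this paper.

```python
# Minimal sketch of Equations 4-7 with an assumed RBF kernel (noise-free data).
import numpy as np

def rbf(a, b, length_scale=1.0):
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

x_train = np.array([0.0, 1.0, 2.0, 3.0])
f_train = np.sin(x_train)                       # observed F(x_1:n)

K = rbf(x_train[:, None], x_train[None, :])     # kernel matrix (Equation 7)
K += 1e-8 * np.eye(len(x_train))                # jitter for numerical stability

x = 1.5                                         # query point
k = rbf(x, x_train)                             # k = [k(x, x_1), ..., k(x, x_n)]

K_inv = np.linalg.inv(K)
mu = k @ K_inv @ f_train                        # Equation 5
var = rbf(x, x) - k @ K_inv @ k                 # Equation 6
print(mu, var)
```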
4) Conditional Linear Gaussian Bayesian Network for Regression
A Conditional Linear Gaussian (CLG) Bayesian Network (BN), or CLG-BN [46], can be used for the regression problem in this paper. Also, CLG-BN can be used to estimate the posterior probability distribution for the target variable using various reasoning algorithms [47]–[49]. The parameters of a conditional linear Gaussian distribution can be estimated by using an extension of multiple regression.
In a CLG-BN, we assume that X is a continuous node with n continuous parents U_1, ..., U_n and m discrete parents A_1, ..., A_m. The conditional distribution p(X | u, a), given parent states U = u and A = a, then has the following form:

p(X | u, a) = N(L_a(u), \sigma_a),   (8)

where L_a(u) = m_a + b_{1,a} u_1 + ... + b_{n,a} u_n is a linear function of the continuous parents, with intercept m_a, coefficients b_{i,a}, and standard deviation \sigma_a that depend on the state a of the discrete parents.
Given a discrete parent state a_j, the parameters (i.e., the intercept m_{a_j}, the coefficients b_{i,a_j}, and the standard deviation \sigma_{a_j}) must be estimated. Equation 9 shows the multiple linear regression, modified from [50]. L_a(u) can be rewritten if we suppose that there are k observations (or data); in the following we omit the state a, because it is known:

L_i(u) = m + b_1 u_{i1} + ... + b_n u_{in} + \sigma_i,   i = 1, ..., k,   (9)

where i indexes the observations. For convenience, we can write Equation 9 more compactly using matrix notation:

l = U b + \sigma,   (10)

where l denotes a vector of instances for the observations, U denotes a matrix containing all continuous parents in the observations, b denotes a vector containing an intercept m and a set of coefficients b_i, and \sigma denotes a vector of regression residuals. Equation 11 shows these variables in vector and matrix form:

l = [L_1(u), L_2(u), ..., L_k(u)]^T,
U = [[1, u_{11}, ..., u_{1n}]; [1, u_{21}, ..., u_{2n}]; ...; [1, u_{k1}, ..., u_{kn}]],
b = [m, b_1, ..., b_n]^T,
\sigma = [\sigma_1, \sigma_2, ..., \sigma_k]^T.   (11)

From the above setting, we can derive an optimal vector \hat{b} for the intercept and the set of coefficients:

\hat{b} = (U^T U)^{-1} U^T l.   (12)

Also, we can derive the optimal standard deviation \hat{\sigma} from the above linear algebra terms [50]:

\hat{\sigma} = \sqrt{ (l - U\hat{b})^T (l - U\hat{b}) / (k - n - 1) }.   (13)

In summary, using the observations (or data) U, Equation 12, and Equation 13, we can simply form Equation 10 and Equation 9. In this paper, we used a probabilistic graphical modeling package, called UnBBayes [51], which contains a CLG-BN machine learning algorithm [52].
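The parameter estimation in Equations 12 and 13 is ordinary least squares per discrete parent state. The following NumPy sketch illustrates it on synthetic data; it is not the UnBBayes implementation.

```python
# Least-squares estimation of the CLG parameters (Equations 12 and 13),
# shown for a single discrete parent state on synthetic data.
import numpy as np

def fit_clg_parameters(U_cont, l):
    """U_cont: (k, n) continuous-parent observations; l: (k,) target values."""
    k, n = U_cont.shape
    U = np.hstack([np.ones((k, 1)), U_cont])               # prepend intercept column
    b_hat = np.linalg.inv(U.T @ U) @ U.T @ l               # Equation 12
    resid = l - U @ b_hat
    sigma_hat = np.sqrt(resid @ resid / (k - n - 1))       # Equation 13
    return b_hat, sigma_hat

rng = np.random.default_rng(2)
U_cont = rng.normal(size=(200, 2))
l = 1.0 + 2.0 * U_cont[:, 0] - 0.5 * U_cont[:, 1] + 0.1 * rng.normal(size=200)
b_hat, sigma_hat = fit_clg_parameters(U_cont, l)
print(b_hat, sigma_hat)   # approximately [1.0, 2.0, -0.5] and 0.1
```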
III. DATA CLUSTERING BASED MACHINE LEARNING
In this section, we introduce Data Clustering based Machine
Learning (DC-ML). In supervised learning, the training data
consist of data for predictor variables (e.g., X variables) and
data for a target variable (e.g., a Y variable). The data for the
predictor variables may or may not fall into several
clusters. If data clusters exist, we can infer that there are
corresponding underlying forces producing them, and these
forces may influence the target variable differently. If that
is the case, separating the data according to the clusters is
better than using all the data for a single supervised learning model.
In this case, each clustered data set is used to learn a corresponding
supervised learning model. Consequently, a machine learning
model family or ML model family, containing a set of ML
models, is constructed (Figure 3).
FIGURE 3. A Machine Learning Model Family
Figure 3 shows an illustrative example of an ML model
family. The ML model family contains a high-scored clustering
model consisting of M clusters. The high-scored clustering
or supervised learning model used herein refers to
the model selected by having the highest score among the
other candidate models. The score (e.g., R-Squared Score
or Mean Absolute Error) is determined by the analysis
goal. Each cluster is associated with a corresponding high-
scored supervised learning model (a regression model or
classification model). Such an ML model family is learned
by DC-ML as shown by Figure III. Figure III illustrates three
main functions: (1) Perform DC-ML, (2) Perform Clustering
(CL) and Split Data according to Clusters, and (3) Perform
Supervised Learning (SL). The first step of DC-ML starts
with training data. Several clustering algorithms are indepen-
dently used to data clustering and generate clustering models
from 1 to L. Each clustering model contains clusters by which
the training data are split into clustered data from 1 to M.
Each clustered data set is used to perform supervised learning,
which outputs SL models 1 to N.
Several ML model families containing clustering models and
SL models are generated by this process and then one high-
scored ML model family is selected as the output of DC-
ML. After the high-scored ML model family is learned by
DC-ML, it can be used for prediction. Figure 5 illustrates
two main functions of the DC-ML prediction: (1) Select
an SL Model according to Data Clusters and (2) Perform
Prediction. The first step of prediction starts with data. The
clustering model in the ML model family is utilized to select
an SL model using the given data. The given data are reused
to predict a target value by using the selected SL model.
In Algorithm 1, DC-ML is described in more detail. DC-ML has five inputs. The first input D_X is the training data set for the predictor variables. The second input D_Y is the training data set for the target variable. The third input C is the set of clustering algorithms (e.g., Gaussian Mixture [53], Birch [54], and Mini Batch K-Means [33]). The input for each clustering algorithm contains a set of candidate hyperparameters (e.g., Gaussian Mixture algorithms associated with 2, 3, 4, and 5 clusters, respectively). The fourth input S is the set of supervised learning algorithms (e.g., Random Forest Regression, Gradient Boosting Regression, and Gaussian Process Regression). Each supervised learning algorithm can also take candidate hyperparameters. The fifth input V is the set of clustering variables; it is not necessary that all the variables in the training data be used for clustering, and the clustering variables are those selected to be used for clustering. Given these inputs, Algorithm 1 proceeds as follows:
Line 1 The algorithm starts with the function Run(.).
Line 2 The function Run(.) iterates the function Perform Clustering(.) in parallel. To do that, an index i is taken from 1 to the number of clustering algorithms in C.
Line 3 The i-th clustering algorithm C_i is taken from the set of clustering algorithms C.
Line 4 The i-th ML model family F_i is created to be used as a result repository. For example, clustering models and supervised learning models are stored in the i-th ML model family.
Line 5 The function Perform Clustering(.) is executed. Note that Line 6 is explained after the explanation of the sub-functions in Algorithm 1.
Line 8 The function Perform Clustering(.) aims to set a clustering hyperparameter (i.e., the number of clusters) for each clustering algorithm.
Line 9 This function iterates the function Perform Clustering (CL) Algorithm (Alg)(.) in parallel. To do that, an index j is taken from 1 to the number of hyperparameters H in the i-th clustering algorithm C_i. The index j denotes a hyperparameter used in the CL algorithm.
Line 10 The i-th CL algorithm is set with the hyperparameter H_j.
Line 11 The function Perform CL Alg(.) is executed.
Line 14 The function Perform CL Alg(.) aims to execute each clustering algorithm and prepare for the supervised learning algorithms.
Line 15 This function executes the clustering algorithm C_{i,j} using the training data D_X corresponding to the clustering variables V. Note that training data not included in the clustering variables are ignored. The clustering model CM_{i,j} then results from this step.
Line 16 The clustering model CM_{i,j} is assigned to the ML model family F_{i,j}.
Line 17 The clustered data CD_XY are taken from D_X and D_Y using the clustering model CM_{i,j}.
Line 18 This function iterates the function Perform Supervised Learning(.) in parallel. To do that, an index k is taken from 1 to the number of clustered data sets in CD_XY. The index k denotes the k-th clustered data in CD_XY.
Line 19 The k-th clustered data are taken from the clustered data CD_XY.
Line 20 The function Supervised Learning(.) is executed.
Line 23 The function Perform Supervised Learning(.) aims to execute each supervised learning (SL) algorithm and return the evaluation score of a learned SL model.
Line 24 This function iterates in parallel from 1 to the number of supervised learning algorithms in S. The index l denotes the l-th SL algorithm.
Line 25 The l-th SL algorithm is taken from the set of supervised learning algorithms S.
Line 26 The k-th clustered data are split into the training data TD_{XY,k} and the validation data VD_{XY,k} using K-Fold Cross-Validation (e.g., K = 5). The training data are used for machine learning, while the validation data are used for evaluation of the learned machine learning model.
Line 27 The l-th SL algorithm is executed using the training data TD_{XY,k}. The SL model SM_l is then generated.
Line 28 The SL model SM_l is used for prediction on the validation data VD_{XY,k}. The l-th average prediction score from the cross-validation is stored. For this validation, various performance evaluation metrics (e.g., R-Squared Score and Mean Square Error) can be used.
Line 29 The high-scored SL model F_{i,j,k*} is selected using the set of the prediction average scores.
Line 30 The high-scored SL model F_{i,j,k*} is stored in the set of the high-scored SL models F_{i,j,K*}.
Line 22 The average score for the high-scored SL models in F_{i,j,K*} is calculated. It is then assigned to F_{i,j,avg}.
Line 23 The average score F_{i,j,avg} is stored in the set of the average scores F_{i,J,avg}.
Line 12 The high-scored clustering model which has the j*-th hyperparameter is selected and assigned to F_{i,j*}.
Line 13 The average score F_{i,j*,avg} of the high-scored clustering model F_{i,j*} is stored in the set of the average scores F_{I,j*,avg}.
Line 6 The high-scored i-th clustering model is selected using the set of the average scores F_{I,j*,avg}.
Line 7 This algorithm outputs the high-scored ML model family F_{i*,j*,K*} containing the high-scored i*-th clustering model with the j*-th hyperparameter and the set of high-scored SL models K*.
FIGURE 4. Concept of Data Clustering based Machine Learning (DC-ML)
FIGURE 5. Prediction using an ML Model Family
We consider the time complexity of this algorithm in terms of Big O notation. In this analysis, the time complexity of each machine learning algorithm is excluded, because it is beyond the scope of this research. The algorithm contains four nested iterations (i.e., Lines 2, 9, 18, and 24), so the time complexity is O(|C| × |C_i.H| × |CD_XY| × |S|), where C denotes the set of clustering algorithms, C_i.H denotes the set of hyperparameters of the i-th clustering algorithm, CD_XY denotes the clustered data, and S denotes the set of supervised learning algorithms. This may seem computationally expensive. For example, for three clustering algorithms, three hyperparameters for each clustering algorithm, two clustered data sets, and three supervised learning algorithms, 54 processing tasks in total are required. However, these iterations are parallelizable, so in practice the actual running time can be significantly reduced by using multithreading and/or multiprocessing. For example, if there are 54 processors, the total computing time can be as low as the sum of the maximum processing times of a clustering algorithm and a supervised learning algorithm.
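As a rough illustration of this parallelization (hypothetical task names and a placeholder scoring function, not the released code), the independent clustering/hyperparameter/supervised-learning combinations can be dispatched to a process pool:

```python
# Sketch only: each (clustering algorithm, cluster count, SL algorithm) combination
# is independent, so the combinations can be trained in separate worker processes.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_one(task):
    clustering_alg, n_clusters, sl_alg = task
    # ... train one clustering + supervised-learning combination here ...
    return clustering_alg, n_clusters, sl_alg, 0.0   # placeholder score

if __name__ == "__main__":
    # Hypothetical configuration: 3 clustering algorithms x 6 cluster counts x 4 SL
    # algorithms = 72 independent tasks.
    tasks = list(product(["gmm", "birch", "kmeans"], range(2, 8), ["rf", "gb", "gp", "clg"]))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_one, tasks))
    print(len(results))
```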
In addition, this paper presents a DC-ML software
that was implemented in the Python programming lan-
guage. The most recent version of the DC-ML soft-
ware is available online at the DC-ML GitHub repository
(https://github.com/pcyoung75/DC-ML).
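For readers who want a compact picture of the idea before reading the repository, the following is a much-simplified sketch of DC-ML: one clustering algorithm with a fixed cluster count, per-cluster model selection by cross-validation, and cluster-routed prediction. It omits the search over clustering algorithms and hyperparameters in Algorithm 1 and assumes each cluster has enough samples for 5-fold cross-validation.

```python
# Simplified DC-ML sketch (not the full released implementation).
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

def dcml_fit(X, y, cluster_cols, n_clusters=4):
    """Cluster on the chosen columns, then pick the best regressor per cluster."""
    clusterer = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=0)
    labels = clusterer.fit_predict(X[:, cluster_cols])
    models = {}
    for c in range(n_clusters):
        Xc, yc = X[labels == c], y[labels == c]
        candidates = [RandomForestRegressor(random_state=0),
                      GradientBoostingRegressor(random_state=0)]
        scores = [cross_val_score(m, Xc, yc, cv=5, scoring="r2").mean()
                  for m in candidates]
        models[c] = candidates[int(np.argmax(scores))].fit(Xc, yc)
    return clusterer, models

def dcml_predict(clusterer, models, cluster_cols, X_new):
    """Route each row to its cluster's model and predict."""
    labels = clusterer.predict(X_new[:, cluster_cols])
    return np.array([models[c].predict(row[None, :])[0]
                     for c, row in zip(labels, X_new)])

# Usage (synthetic stand-in; column 4 plays the role of the clustering variable):
# clusterer, models = dcml_fit(X_train, y_train, cluster_cols=[4], n_clusters=4)
# y_hat = dcml_predict(clusterer, models, [4], X_test)
```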
IV. EXPERIMENTS IN THE SPM
In this section, we introduce two experiments to evaluate the
predictive accuracy of the DC-ML algorithm. In this paper,
predictive accuracy means how closely the models
learned by the ML algorithms fit a test data set.
Algorithm 1: Data Clustering based ML
Input: A training data set for predictor variables D_X
Input: A training data set for a target variable D_Y
Input: A set of clustering algorithms C
Input: A set of supervised learning algorithms S
Input: A set of clustering variables V
Output: A high-scored ML model family F_{i*,j*,K*}

 1 Function Run(D_X, D_Y, C, S, V)
 2   do in parallel for i ← 1 to |C|
 3     C_i ← the i-th clustering algorithm from C;
 4     F_i ← create an empty i-th ML model family;
 5     Perform Clustering(D_X, D_Y, C_i, F_i, S, V);
 6   F_{i*} ← select the high-scored i-th clustering model using the average scores F_{I,j*,avg} (see Line 13);
 7   return F_{i*,j*,K*}

 8 Function Perform Clustering(D_X, D_Y, C_i, F_i, S, V)
 9   do in parallel for j ← 1 to |C_i.H|
10     C_{i,j} ← set the i-th clustering algorithm C_i with a candidate hyperparameter H_j;
11     Perform CL Alg(D_X, D_Y, C_{i,j}, F_{i,j}, S);
12   F_{i,j*} ← select the high-scored i-th clustering model with the j* hyperparameter using the average scores F_{i,J,avg} (see Line 23);
13   F_{I,j*,avg} ← F_{I,j*,avg} ∪ {F_{i,j*,avg}};

14 Function Perform CL Alg(D_X, D_Y, C_{i,j}, F_{i,j}, S, V)
15   CM_{i,j} ← execute the clustering algorithm C_{i,j} using the training data D_X associated with the variables V to get the clustering model CM_{i,j};
16   F_{i,j} ← CM_{i,j};
17   CD_{XY} ← get the clustered data CD_{XY} from D_X and D_Y using the clustering model CM_{i,j};
18   do in parallel for k ← 1 to |CD_{XY}|
19     CD_{XY,k} ← the k-th clustered data from CD_{XY};
20     Perform Supervised Learning(CD_{XY,k}, F_{i,j,k}, S);
21   F_{i,j,avg} ← calculate the average score for the SL models in F_{i,j,K*} and store it into F_{i,j,avg};
22   F_{i,J,avg} ← F_{i,J,avg} ∪ {F_{i,j,avg}};

23 Function Perform Supervised Learning(CD_{XY,k}, F_{i,j,k}, S)
24   do in parallel for l ← 1 to |S|
25     S_l ← the l-th SL algorithm from S;
26     TD_{XY,k}, VD_{XY,k} ← perform the K-Fold split to get the training data TD_{XY,k} and the validation data VD_{XY,k} from CD_{XY,k};
27     SM_l ← perform the SL algorithm S_l using TD_{XY,k} to get the SL model SM_l;
28     avgScore_l ← perform the prediction using SM_l and VD_{XY,k} to get an l-th average score;
29   F_{i,j,k*} ← find the high-scored SL model using the set of scores avgScore and put it into F_{i,j,k*};
30   F_{i,j,K*} ← F_{i,j,K*} ∪ {F_{i,j,k*}};
Specifically, a coefficient of determination (see Equation 14)
is used for comparison between ML models. The experiments
aim to find high-scored ML models for roll force and plate
thickness predictions in each rolling pass using four existing
ML algorithms (Gradient Boosting Regression (GB), Ran-
dom Forest Regression (RF), Gaussian Process Regression
(GP) and, CLG-BN (CG)) and the DC-ML algorithm.
For these two experiments, we performed four steps: (1)
acquiring real data, (2) developing a causal model, (3)
performing machine learning, and (4) testing the prediction.
In the acquiring real data step, the real data for machine
learning are collected from a target factory. In the causal
model development step, a causal model, representing causal
relationships between variables, is defined regarding the steel
plate rolling factory. Such a causal model enables machine
learning engineers to select a best structure (including fea-
tures) of ML models. In the machine learning step, candidate
ML models are trained using ML algorithms (including DC-
ML) and the training data set from the target factory. In the
test step, the learned ML models are evaluated using the test
data set. Specifically, the roll force and plate thickness in each
rolling pass (e.g., PS_1, PS_2, ..., PS_N) are predicted and then
evaluated in terms of the accuracy.
The following subsection introduces each step of the ex-
periment in detail. This experiment was performed on a
3.50GHz Intel Core i7-5930K processor with a 96 GB mem-
ory. Through these experiments, we determined two high-
scored ML models that can be utilized in the operation of
the SPM control systems.
A. ACQUIRING REAL DATA
The target factory contains several sensors and actuators to
operate the rolling mill and other facilities (e.g., reheating
furnaces and hot levelers). Factory data from these facilities
are stored in real time on a main computer. For this research,
some sample data, containing 4334 pass data cases, were
used. Each pass data contained several sensor and actuator
parameters (e.g., roll force, roll gap, and temperature) and
their values. These parameters can be found in Table 2. For
example, a plate production is scheduled with 18 rolling
passes in which each rolling pass data are generated in
the rolling mill operation (i.e., 18 pass data for one plate
production). The last pass data contain the specification of
the final results (e.g., the final production thickness of the
plate). For each rolling pass, the values of these parameters
were distributed in various ranges. For example, the input
thickness of a plate before the rolling mill operation was
around 272 millimeters, while the output thickness after the
operation was around 17 millimeters.
B. DEVELOPING A CAUSAL MODEL FOR THE STEEL
PLATE ROLLING FACTORY
Based on theoretical analysis of the thickness reduction pro-
cess by rolling mills (see Subsections 2.A and 2.B), a causal
model was developed by subject-matter experts in terms of
a SPM control system, managing control values for a SPM.
The causal model was used to identify main features and
relationships between such features, so that the machine learning
engineers in this research could comprehensively understand
the domain problem and situation, and could seamlessly
develop machine learning models. Also, for machine learning,
the causal model was used for feature engineering, in which
features of data were selected. Usually, machine learning
engineers lack the domain knowledge to which they are
assigned. The subject-matter experts also do not have much
knowledge about machine learning algorithms. The causal
model could help both experts to understand the target sit-
uation and develop the ML models.
There were a number of factors (i.e., predictor variables)
that might have affected the plate thickness and the roll force
(i.e., the target variables). However, some predictor variables are
negligible because of redundancy or small influence on
the target variables. For this, we selected candidate control
and noise factors of the SPM control system, and determined
the relationships between these factors using the theory of the
rolling process in Subsection 2.B.
Figure 6 shows the causal model in this paper. This causal
model was developed in terms of Plate Thickness and its
causal factors (e.g., Mill Modulus and Temperature). For
the SPM control system, the first-order control factors are
Roll Gap and Roll Gap Adjustment, while the first-order
noise factors are Mill Modulus and Roll Force. The causal
model shows also the second and third-order noise factors for
the SPM control system. In addition, there are two factors,
represented by the dashed boxes (i.e., Material Strength at
rolling temperature and Quantity of material deformation by
rolling), for which corresponding data do not exist. These two
factors are included in the causal model, because by doing
this, hidden factors can be displayed more explicitly.
FIGURE 6. Causal Model for Steel Plate Rolling Factory

Table 2 shows all the features (or variables) used in this paper. In total, 16 features were identified through this step. For example, Plate Thickness in Table 2 is the thickness of a plate measured by a laser, and Planned Plate Thickness is the planned target thickness of a plate after each rolling pass.

TABLE 2. Selected Variables in the Real Data
Plate Thickness: The thickness of the plate measured by a laser
Mill Modulus: The mill stand coefficient of vertical expansion due to the repulsive force of the material during rolling
Roll Force: The measure of the repulsive force of the material during rolling
Roll Gap: The set value of the roll gap by screw down of the mill
Roll Gap Adjustment: The total adjustment value of the roll gap for various inaccurate numbers of material size, material strength, material temperature, roll crown, etc.
Type of Temperature Controlled Rolling: The type of rolling method to adjust temperature for proper metallurgical transformation
Material Strength at Rolling Temperature: This item is not measured in the factory
Rolling Speed: The speed of the work roll surface
Quantity of Material Deformation by Rolling: This item is not measured in the factory
Material Strength: The mechanical yield strength for the composition of the material
Material Temperature: The temperature of the material at the time of rolling
Thickness Reduction: The quantity of thickness reduction during a pass of rolling
Width Reduction: The quantity of width reduction during a pass of rolling
Planned Plate Thickness: The planned target thickness of the plate after rolling
Plate Width: The calculated width of the plate after rolling
Roll Crown: The measure of the convex contour of the roll
C. PERFORMING MACHINE LEARNING
Initially, we considered various machine learning algorithms
(e.g., decision tree, support vector machine, and deep learn-
ing), however since they did not result in any noticeably
better performance compared to the results from the four
algorithms in Subsection 2.C, we did not include them in this
experiment. For the roll force and plate thickness predictions,
the four algorithms in Subsection 2.C and the DC-ML algo-
rithm in Subsection 3 were used to learn each ML model of
the corresponding ML algorithm.
To perform these ML algorithms, identifying predictor
variables and target variables was required. From the causal
model in Figure 6, the predictor variables and the target
variables were identified by the subject-matter experts. Table
3 shows the variables for the roll force prediction, while Table
4 shows the variables for the plate thickness prediction.
For DC-ML, three clustering algorithms (Gaussian Mixture [53], Birch [54], and Mini Batch K-Means [33]) were
used as input. For each clustering algorithm, cluster numbers from 2 to 7
were set as the candidate hyperparameters. Note that
eight or more clusters can also be set, but the experiment then
takes much longer. Furthermore, as we will see in Section 5,
a four-cluster model yields the best result.
The data in Subsection 4.A were randomly divided into 90%
training data and 10% test data. Each ML
algorithm test was repeated up to 20 times. When performing
DC-ML, the training and validation data were randomly
selected using the 5-fold cross-validation and Mean Absolute
Error was used for the validation of candidate models.
TABLE 3. Selected Variables for Roll Force Prediction
Predictor Variables: Mill Modulus, Material Temperature, Thickness Reduction, Width Reduction, Material Strength, Roll Gap, Planned Plate Thickness
Target Variable: Roll Force
TABLE 4. Selected Variables for Plate Thickness Prediction
Predictor Variables: Mill Modulus, Material Temperature, Thickness Reduction, Width Reduction, Material Strength, Rolling Speed, Roll Force, Roll Gap, Roll Gap Adjustment
Target Variable: Plate Thickness
The following steps summarize the experiment process
for the roll force and plate thickness predictions in detail.
Step 1. The training data (90%) and test data (10%) were
randomly taken from the real data set (4334 cases) according
to the experiment type (the roll force prediction
or the plate thickness prediction).
Step 2. The DC-ML algorithm was used to learn an ML
model using four inputs: (1) the training data, (2) the
set of clustering algorithms (Gaussian Mixture, Birch,
and Mini Batch K Means), (3) The set of supervised
learning algorithms (Random Forest Regression (RF),
Gradient Boosting Regression (GB), Gaussian Process
Regression (GP), and CLG-BN (CG)), and (4) the one
clustering variable (Material Strength). Each input clus-
tering algorithm was set with 2 to 7 cluster numbers as
hyperparameters. In this setting, the DC-ML algorithm
was performed 20 times for each cluster number.
Step 3. The training data (90%) were reused to learn each
of the four ML models (RF, GB, GP, and CG).
Step 4. After machine learning, the DC-ML model from Step
2 and the four learned models from Step 3 were evaluated
using the test data (10%).
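The following sketch mirrors this protocol on placeholder data (the real factory data set is not reproduced here): a repeated 90/10 split, model training, and scoring with the coefficient of determination described in the next subsection.

```python
# Illustrative evaluation loop with placeholder arrays standing in for the
# variables of Table 3 or Table 4 (not the factory data set).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = np.random.rand(4334, 7)
y = np.random.rand(4334)

scores = []
for seed in range(20):                                    # 20 repeated tests
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=seed)
    model = RandomForestRegressor(random_state=seed).fit(X_tr, y_tr)
    scores.append(r2_score(y_te, model.predict(X_te)))

print(np.mean(scores), np.std(scores))                    # overall average R2 and spread
```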
D. TESTING PREDICTION
To evaluate the five ML models from the previous subsection, the coefficient of determination, or R2 score (Equation 14), was used. Note that an R2 score of 1 means that the model predicted the results perfectly, without error, and a negative R2 score can occur when the model predicts the results poorly.

R^2 = 1 - SSE / SST,   (14)

where SSE denotes the sum of squared errors

SSE = \sum_{i=1}^{n} (y^{(i)} - \hat{y}^{(i)})^2   (15)

and SST denotes the total sum of squares

SST = \sum_{i=1}^{n} (y^{(i)} - \mu_y)^2,   (16)

in which n denotes the total number of data cases, y denotes the actual target value of a case, \hat{y} denotes the prediction, and \mu_y denotes the average of the actual target values.
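For completeness, Equations 14 to 16 can be computed directly; the numbers below are illustrative only.

```python
# Direct computation of Equations 14-16 on illustrative values.
import numpy as np

y_true = np.array([10.2, 9.8, 10.5, 11.0])    # illustrative actual values
y_pred = np.array([10.0, 9.9, 10.4, 11.2])    # illustrative predictions

sse = np.sum((y_true - y_pred) ** 2)           # Equation 15
sst = np.sum((y_true - y_true.mean()) ** 2)    # Equation 16
print(1.0 - sse / sst)                         # Equation 14: R2 score
```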
V. RESULTS AND DISCUSSION
In this section, we evaluate the five machine learning algo-
rithms and present the lessons learned regarding the applica-
tion of machine learning in SPM.
A. EVALUATION FOR MACHINE LEARNING
ALGORITHMS
For the two experiments (the roll force and plate thickness
predictions in SPM), two high-scored ML model families
were selected. Both models contained the same number of clusters
(four); such an ML model family can be called a four-cluster
ML model family. In the following two subsections, the DC-ML
models refer to these four-cluster ML model families.
1) Evaluation for Roll Force Prediction
In the roll force prediction, the four-cluster ML model family
showed better results than the four regression models (GB,
RF, GP, and CG). Table 5 shows the overall average R2
score of each of the five ML algorithms. The R2 score
denotes the prediction accuracy (Equation 14), evaluated by
comparing the actual values and the predicted values. The
overall average R2 score is the average of the R2 scores
from the 20 tests.
In the prediction results, the ML algorithms Gradient
Boosting Regression, Random Forest Regression, and CLG-
BN resulted in relatively lower scores than the ML algo-
rithm DC-ML. DC-ML predicted the roll force with the
highest accuracy (0.8828) and precision (0.0117). Among the
four algorithms except DC-ML, Conditional Linear Gaussian
showed the highest result (0.8632), while Gaussian Process
Regression showed the lowest result (0.2066).
TABLE 5. Overall Average R² Score in Roll Force Prediction
ML Average Standard Deviation
Gradient Boosting Regression 0.7381 0.0297
Random Forest Regression 0.8475 0.0193
Gaussian Process Regression 0.2066 0.0333
Conditional Linear Gaussian BN 0.8632 0.0143
DC-ML 0.8828 0.0117
Figure 7 shows a box-plot chart corresponding to the data in Table 5. In the figure, Gaussian Process Regression was excluded so that the results from the other algorithms could be examined more closely.
FIGURE 7. Overall Average R² Score for Roll Force Prediction
Across the 20 test runs, 20 high-scoring ML model families were learned using DC-ML, and each run's high-scoring family contained a different clustering model. The candidate clustering algorithms were Gaussian Mixture, Birch, and Mini Batch K-Means. Table 6 shows the percentage of selected clustering algorithms in the 20 tests: Gaussian Mixture, Birch, and Mini Batch K-Means were selected 25 percent, 10 percent, and 65 percent of the time, respectively.
TABLE 6. Percentage of Selected Clustering Algorithms in the 20 Tests for
Roll Force Prediction
Gaussian Mixture Birch Mini Batch K-Means
25% 10% 65%
For each of the four clusters in the 20 high-scoring ML model families, one of the four supervised learning models was selected. Table 7 shows the percentage of selected supervised learning algorithms in these ML model families: Random Forest Regression was selected 25 percent of the time, while CLG-BN was selected 75 percent of the time. Among the four supervised learning algorithms, CLG-BN was the best, which is consistent with the results in Table 5.
TABLE 7. Percentage of Selected Supervised Learning Algorithms in the ML
Model Families for Roll Force Prediction
Gradient Boosting Regression: 0%
Random Forest Regression: 25%
Gaussian Process Regression: 0%
Conditional Linear Gaussian BN: 75%
2) Evaluation for Plate Thickness Prediction
As in the previous subsection, high-scoring ML model families for the plate thickness prediction were learned using DC-ML. Table 8 shows the overall average R² score of each of the five ML algorithms.
TABLE 8. Overall Average R² Scores in Plate Thickness Prediction
ML Average Standard Deviation
Gradient Boosting Regression 0.9996499 0.0000551
Random Forest Regression 0.9999032 0.0001174
Gaussian Process Regression 0.9999957 0.0000009
Conditional Linear Gaussian BN 0.9999957 0.0000009
DC-ML 0.9999959 0.0000008
FIGURE 8. Overall Average R² Score for Plate Thickness Prediction
For the five ML algorithms, the prediction results look similar, all around 0.999 in overall average R² score. However, the SPM control systems require a high level of accuracy, because it directly influences the quality of the final product (i.e., a steel plate), so even small gains in prediction accuracy are significant in this domain. Gradient Boosting Regression and Random Forest Regression resulted in relatively lower scores than Gaussian Process Regression, Conditional Linear Gaussian BN, and DC-ML. DC-ML predicted the plate thickness with slightly higher accuracy (0.9999959) and precision (standard deviation of 0.0000008).
Figure 8 shows a box-plot chart corresponding to the data in Table 8. In Figure 8, Gradient Boosting Regression and Random Forest Regression were excluded so that the results from Gaussian Process Regression (GP), CLG-BN (CG), and DC-ML could be examined more closely.
Table 9 shows the percentage of selected clustering algorithms in the high-scoring ML model families. Of the clustering algorithms Gaussian Mixture, Birch, and Mini Batch K-Means, Gaussian Mixture was selected 20 percent of the time, Birch 55 percent, and Mini Batch K-Means 25 percent.
TABLE 9. Percentage of Selected Clustering Algorithms in the ML Model
Families for Plate Thickness Prediction
Gaussian Mixture Birch Mini Batch K-Means
20% 55% 25%
In addition, Table 10 shows the percentage of selected supervised learning algorithms in the ML model families. This result is consistent with the results in Table 8: Gaussian Process Regression and Conditional Linear Gaussian BN were selected 67 and 33 percent of the time, respectively, while Gradient Boosting Regression and Random Forest Regression were never selected, in line with their lower scores in Table 8.
TABLE 10. Percentage of Selected Supervised Learning Algorithms in the ML
Model Families for Plate Thickness Prediction
Gradient Boosting Regression: 0%
Random Forest Regression: 0%
Gaussian Process Regression: 67%
Conditional Linear Gaussian BN: 33%
B. LESSONS LEARNED
This subsection presents the lessons learned from this research to help researchers working on smart factories make better decisions when applying machine learning.
Data Clustering for Smart Manufacturing
Smart manufacturing aims at small-quantity batch production of a wide variety of products. The wide range of products generates a variety of data, and in such a case a single ML model may not achieve effective results. Instead, using multiple ML models can provide better performance, because the data contain separable sub-data. For example, in this paper, the four-cluster ML model family (i.e., the multiple ML model approach) showed better results than using a single ML model.
Cluster Numbers and Data Size
The performance of DC-ML is mainly influenced by the quality of the clusters. For data of a fixed size, as the number of clusters increases, the amount of data available for supervised learning in each cluster decreases, and the amount of data influences the quality of the supervised learning model. Therefore, an appropriate number of clusters must be found. Figure 9 depicts the overall average R² scores for the roll force prediction over 2 to 7 clusters in the experiment of Subsection 5.A.
FIGURE 9. Overall Average R² Scores for Roll Force Prediction over Clusters
As the number of clusters increases from 2 to 7, the score increases up to four clusters and decreases afterwards. The figure represents a typical relationship between the number of clusters (or the data size per cluster) and model quality. To improve the performance of DC-ML, a method for recommending the appropriate number of clusters is required. To address this issue, a simple grid search can be used; however, as the number of clustering variables increases, the size of the search space can grow exponentially. We leave this for future research, but a basic version of such a grid search is sketched below.
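As a hedged illustration of the simple grid search mentioned above, the sketch below tries each candidate number of clusters, fits one regressor per cluster, and keeps the number of clusters with the best held-out R² score. The synthetic data, the choice of Mini Batch K-Means and Random Forest, and the validation split are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (assumption); in practice the SPM factor table is used.
rng = np.random.RandomState(1)
X = rng.rand(1500, 6)
y = np.sin(6 * X[:, 0]) + X[:, 1] + 0.05 * rng.randn(1500)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=1)

# Fallback regressor for the (unlikely) case of a validation point in an empty cluster.
fallback = RandomForestRegressor(n_estimators=50, random_state=1).fit(X_tr, y_tr)

def family_score(k):
    """Cluster into k groups, fit one regressor per cluster, score on held-out data."""
    km = MiniBatchKMeans(n_clusters=k, random_state=1).fit(X_tr)
    models = {
        c: RandomForestRegressor(n_estimators=50, random_state=1).fit(
            X_tr[km.labels_ == c], y_tr[km.labels_ == c])
        for c in np.unique(km.labels_)
    }
    labels_va = km.predict(X_va)
    y_hat = np.array([models.get(c, fallback).predict(x.reshape(1, -1))[0]
                      for c, x in zip(labels_va, X_va)])
    return r2_score(y_va, y_hat)

scores = {k: family_score(k) for k in range(2, 8)}
best_k = max(scores, key=scores.get)
print(scores, "best number of clusters:", best_k)
```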
Usefulness of Causal Models
Although it is not trivial to derive a causal model (e.g., Figure 6) from the target domain, it can help one understand aspects of that field, find weak and/or strong influencing factors, and utilize existing domain knowledge (e.g., physical and chemical characteristics) to construct ML models. Understanding the target domain through the causal model enables us to determine suitable candidates for machine learning models and algorithms in advance, so that domain knowledge can be delivered to ML engineers efficiently.
Static and Dynamic ML Models
If data are sequential in nature, a dynamic ML model (e.g., a Recurrent Neural Network (RNN) [55], [56] or Long Short-Term Memory (LSTM) [57]) is usually required. However, by converting dynamic data into static data, a static ML model, representing just one snapshot in time, can be applied. In this research, we found that the current manufacturing factors are influenced only by factors in the previous pass (i.e., a first-order Markov assumption: a factor at time n depends only on a factor at time n-1). In this case, simply combining the current pass data with the previous pass data is sufficient to train a static ML model, as sketched below.
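The following is a minimal sketch, with a hypothetical column layout that is not the factory schema, of how sequential pass data can be converted into static rows by attaching the previous pass's factors to each pass under the first-order Markov assumption.

```python
import pandas as pd

# Hypothetical per-pass records; column names are illustrative assumptions.
passes = pd.DataFrame({
    "plate_id":   [1, 1, 1, 2, 2],
    "pass_no":    [1, 2, 3, 1, 2],
    "roll_gap":   [40.0, 32.0, 25.0, 41.0, 33.0],
    "roll_force": [18.2, 21.5, 23.0, 17.9, 21.1],
    "thickness":  [38.5, 30.7, 24.1, 39.0, 31.2],
})

# Shift each plate's factors by one pass to obtain "previous pass" features,
# then attach them to the current pass (first-order Markov assumption).
prev = passes.groupby("plate_id").shift(1).add_prefix("prev_")
static_rows = pd.concat([passes, prev], axis=1).dropna()
print(static_rows)
```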
Missing Data and Data Precision
In our experience of applying machine learning to smart factories, we have often encountered situations in which the proper data are missing or the precision of the acquired data is too low for machine learning. Collecting the right data is the most imperative task in the data acquisition phase. Collecting high-precision data is highly recommended but can be costly; therefore, finding the right level of data precision for the analysis goals is a critical task.
VI. CONCLUSION
In this paper, we presented ML technologies for a steel plate production line. We focused on finding high-scoring ML algorithms that can be used for roll force and plate thickness prediction at each rolling pass, so that one can find the best control conditions to produce high-quality steel plate products. In addition, the ML approach in this paper can reduce sensor cost as well as the associated operational cost. In our experiments, DC-ML showed acceptable results for the roll force and plate thickness predictions.
The idea behind this paper can also be applied to other operations in a smart factory. In the era of Big Data, large amounts of unused data in manufacturing lines lie dormant. The prediction capability of machine learning with such data can be utilized to replace existing facilities, devices, and sensors in the manufacturing lines, so that the operational cost can be significantly reduced. In particular, DC-ML has characteristics suitable for smart manufacturing, which aims at small-quantity batch production of various products, because it can provide multiple ML models for different kinds of products in the same category. In this paper, we focused only on the operation of a steel plate rolling smart factory. Future work will consider applying the approach in this paper to other facilities and other smart factories.
ACKNOWLEDGMENT
The authors would like to thank Dr. Sung Tae Kim for his
statistical analysis in this research that provided insight into
the understanding of the target system. The authors also
appreciate Dr. Shou Matsumoto, Mr. Hang Seok Choi, and
Mr. Dong Jin Lee for their insightful comments on this
research, and Mr. JuByung Ha for his contributions in data
engineering.
REFERENCES
[1] S. M. L. Coalition, “Implementing 21st century smart manufacturing,” in
Workshop summary report, 2011.
[2] J. Lee, H.-A. Kao, and S. Yang, “Service innovation and smart analytics
for industry 4.0 and big data environment,” Procedia Cirp, vol. 16, pp. 3–8,
2014.
[3] Y. Lu, K. C. Morris, and S. Frechette, “Current standards landscape
for smart manufacturing systems,” National Institute of Standards and
Technology, NISTIR, vol. 8107, p. 39, 2016.
[4] C. Y. Park, K. B. Laskey, S. Salim, and J. Y. Lee, “Predictive situation
awareness model for smart manufacturing,” in 2017 20th International
Conference on Information Fusion (Fusion). IEEE, 2017, pp. 1–8.
[5] J. Li, H. Deng, and W. Jiang, “Secure vibration control of flexible arms
based on operators’ behaviors,” in International Conference on Security,
Privacy and Anonymity in Computation, Communication and Storage.
Springer, 2017, pp. 420–431.
[6] H. Lee and J. Lee, “Development concepts of smart service system-based
smart factory (4sf),” in INCOSE International Symposium, vol. 28, no. 1.
Wiley Online Library, 2018, pp. 1153–1169.
[7] G. Qiao and B. A. Weiss, "Quick health assessment for industrial robot health degradation and the supporting advanced sensing development," Journal of Manufacturing Systems, vol. 48, pp. 51–59, 2018.
[8] K. S. Kiangala and Z. Wang, “Initiating predictive maintenance for a
conveyor motor in a bottling plant using industry 4.0 concepts,” The
International Journal of Advanced Manufacturing Technology, vol. 97, no.
9-12, pp. 3251–3271, 2018.
[9] Y. Qu, X. Ming, Z. Liu, X. Zhang, and Z. Hou, “Smart manufacturing
systems: state of the art and future trends,” The International Journal of
Advanced Manufacturing Technology, pp. 1–18, 2019.
[10] A. D. Landmark, E. Arica, B. Kløve, P. F. Kamsvåg, E. A. Seim, and
M. Oliveira, “Situation awareness for effective production control,” in
IFIP International Conference on Advances in Production Management
Systems. Springer, 2019, pp. 690–698.
[11] T. Nkonyana, Y. Sun, B. Twala, and E. Dogo, “Performance evaluation
of data mining techniques in steel manufacturing industry,” Procedia
Manufacturing, vol. 35, pp. 623–628, 2019.
[12] S. Guo, J. Yu, X. Liu, C. Wang, and Q. Jiang, “A predicting model for
properties of steel using the industrial big data based on machine learning,”
Computational Materials Science, vol. 160, pp. 95–104, 2019.
[13] K. J. Åström, T. Hägglund, C. C. Hang, and W. K. Ho, “Automatic tuning
and adaptation for pid controllers-a survey,” Control Engineering Practice,
vol. 1, no. 4, pp. 699–714, 1993.
[14] S. Bennett, “The past of pid controllers,” Annual Reviews in Control,
vol. 25, pp. 43–53, 2001.
[15] K. J. Åström and T. Hägglund, Advanced PID Control. Research Triangle Park, NC: ISA-The Instrumentation, Systems, and Automation Society, 2006, vol. 461.
[16] X. Zhang, X. Yao, Q. Wu, and D. Li, “The application of generalized
predictive control to the hagc,” in 2008 Fifth International Conference on
Fuzzy Systems and Knowledge Discovery, vol. 1. IEEE, 2008, pp. 444–
447.
[17] A. Karandaev, A. Radionov, V. Khramshin, I. Y. Andryushin, and A. Shu-
bin, “Automatic gauge control system with combined control of the screw-
down arrangement position,” in 2014 12th International Conference on
Actual Problems of Electronics Instrument Engineering (APEIE). IEEE,
2014, pp. 88–94.
[18] Z. Zhang and W. Ding, “A new anti-disturbance strategy of automatic
gauge control for small workroll cold reversing mill,” in 2016 IEEE
Advanced Information Management, Communicates, Electronic and Au-
tomation Control Conference (IMCEC). IEEE, 2016, pp. 2004–2008.
[19] S. Wang, “Real-time neurofuzzy control for rolling mills,” 1999.
[20] L. A. Zadeh, “Fuzzy sets,” Information and control, vol. 8, no. 3, pp. 338–
353, 1965.
[21] X. Wang, Y. Xiao, and D. Zhang, "The design of mill automatic gauge control system based on the fuzzy proportion integral differential controller," in 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, vol. 3. IEEE, 2008, pp. 249–253.
[22] V. B. Ginzburg, Steel-Rolling Technology: Theory and Practice. New York: Marcel Dekker, 1989.
[23] D. M. Lee and S. Choi, “Application of on-line adaptable neural network
for the rolling force set-up of a plate mill,” Engineering applications of
artificial intelligence, vol. 17, no. 5, pp. 557–565, 2004.
[24] F. Zhang, Y. Zhao, and J. Shao, “Rolling force prediction in heavy plate
rolling based on uniform differential neural network,” Journal of Control
Science and Engineering, vol. 2016, 2016.
[25] S. Rath, A. Singh, U. Bhaskar, B. Krishna, B. Santra, D. Rai, and N. Neogi,
“Artificial neural network modeling for prediction of roll force during plate
rolling process,” Materials and Manufacturing Processes, vol. 25, no. 1-3,
pp. 149–153, 2010.
[26] M. Bagheripoor and H. Bisadi, “Application of artificial neural networks
for the prediction of roll force and roll torque in hot strip rolling process,”
Applied Mathematical Modelling, vol. 37, no. 7, pp. 4593–4607, 2013.
[27] Z.-H. Wang, D.-Y. Gong, X. Li, G.-T. Li, and D.-H. Zhang, “Prediction of
bending force in the hot strip rolling process using artificial neural network
and genetic algorithm (ann-ga),” The International Journal of Advanced
Manufacturing Technology, vol. 93, no. 9-12, pp. 3325–3338, 2017.
[28] J. Liu, X. Liu, and B. T. Le, “Rolling force prediction of hot rolling based
on ga-melm,” Complexity, vol. 2019, 2019.
[29] K. Esendağ, A. H. Orta, İ. Kayabaşı, and S. İlker, "Prediction of reversible cold rolling process parameters with artificial neural network and regression models for industrial applications: A case study," Procedia CIRP, vol. 79, pp. 644–648, 2019.
[30] H. Wang, B. van Stein, M. Emmerich, and T. Bäck, “Time complexity
reduction in efficient global optimization using cluster kriging,” in Pro-
ceedings of the Genetic and Evolutionary Computation Conference, 2017,
pp. 889–896.
[31] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine
learning. MIT press Cambridge, MA, 2006, vol. 2, no. 3.
[32] F. Qiang, H. Shang-Xu, and Z. Sheng-Ying, “Clustering-based selective
neural network ensemble,” Journal of Zhejiang University-Science A,
vol. 6, no. 5, pp. 387–392, 2005.
[33] J. A. Hartigan, “Clustering algorithms,” 1975.
[34] W. L. Roberts, Flat processing of steel. M. Dekker, 1988.
[35] V. B. Ginzburg and R. Ballas, Flat rolling fundamentals. CRC Press,
2000.
[36] H. Yim, B. Joo, G. Lee, J. Seo, and Y. Moon, “A study on the roll gap set-up
to compensate thickness variation at top-end in plate rolling,” Transactions
of Materials Processing, vol. 18, no. 4, pp. 290–295, 2009.
[37] Y.-H. Moon and J.-J. Yi, “Improvement of roll-gap set-up accuracy using
a modified mill stiffness from gaugemeter diagrams,” Journal of materials
processing technology, vol. 70, no. 1-3, pp. 194–197, 1997.
[38] Y. Hwang and H. Hsu, “An investigation into the plastic deformation be-
havior at the roll gap during plate rolling,” Journal ofMaterials Processing
Technology, vol. 88, no. 1-3, pp. 97–104, 1999.
[39] M. Deisenroth and J. W. Ng, “Distributed gaussian processes,” in
Proceedings of the 32nd International Conference on Machine Learning,
ser. Proceedings of Machine Learning Research, F. Bach and D. Blei,
Eds., vol. 37. Lille, France: PMLR, 07–09 Jul 2015, pp. 1481–1490.
[Online]. Available: http://proceedings.mlr.press/v37/deisenroth15.html
[40] T. K. Ho, “Random decision forests,” in Proceedings of 3rd international
conference on document analysis and recognition, vol. 1. IEEE, 1995,
pp. 278–282.
[41] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and
regression trees,” 1984.
[42] L. Breiman, “Arcing classifiers,” Annals of Statistics, vol. 26, pp. 123–40,
1996.
[43] R. E. Schapire, “The strength of weak learnability,” Machine learning,
vol. 5, no. 2, pp. 197–227, 1990.
[44] Y. Freund, R. E. Schapire et al., “Experiments with a new boosting
algorithm,” in icml, vol. 96. Citeseer, 1996, pp. 148–156.
[45] C. E. Rasmussen, “Gaussian processes in machine learning,” in Summer
School on Machine Learning. Springer, 2003, pp. 63–71.
[46] S. L. Lauritzen and N. Wermuth, “Graphical models for associations
between variables, some of which are qualitative and some quantitative,”
The annals of Statistics, pp. 31–57, 1989.
[47] W. Sun, C. Y. Park, and R. Carvalho, “A new research tool for hybrid
bayesian networks using script language,” in Signal Processing, Sensor
Fusion, and Target Recognition XX, vol. 8050. International Society for
Optics and Photonics, 2011, p. 80501Q.
[48] C. Y. Park, K. B. Laskey, P. C. G. Costa, and S. Matsumoto, “Message
passing for hybrid bayesian networks using gaussian mixture reduction,” in
2015 Tenth International Conference on Digital Information Management
(ICDIM). IEEE, 2015, pp. 210–216.
[49] C. Y. Park, K. B. Laskey, P. C. Costa, and S. Matsumoto, “Gaussian
mixture reduction for time-constrained approximate inference in hybrid
bayesian networks,” Applied Sciences, vol. 9, no. 10, p. 2055, 2019.
[50] A. Rencher, Methods of Multivariate Analysis, vol. 492. Hoboken, NJ: Wiley, 2003.
[51] S. Matsumoto, R. N. Carvalho, M. Ladeira, P. C. G. da Costa, L. L.
Santos, D. Silva, M. Onishi, E. Machado, and K. Cai, “Unbbayes: a java
framework for probabilistic models in ai,” Java in academia and research,
p. 34, 2011.
[52] C. Y. Park, K. B. Laskey, P. C. G. Costa, and S. Matsumoto, “Multi-entity
bayesian networks learning for hybrid variables in situation awareness,” in
Proceedings of the 16th International Conference on Information Fusion.
IEEE, 2013, pp. 1894–1901.
[53] G. Celeux and G. Govaert, “A classification em algorithm for clustering
and two stochastic versions,” Computational statistics & Data analysis,
vol. 14, no. 3, pp. 315–332, 1992.
[54] T. Zhang, R. Ramakrishnan, and M. Livny, “Birch: an efficient data
clustering method for very large databases,” ACM Sigmod Record, vol. 25,
no. 2, pp. 103–114, 1996.
[55] B. A. Pearlmutter, “Learning state space trajectories in recurrent neural
networks,” Neural Computation, vol. 1, no. 2, pp. 263–269, 1989.
[56] C. L. Giles, G. M. Kuhn, and R. J. Williams, “Dynamic recurrent neural
networks: Theory and applications,” IEEE Transactions on Neural Net-
works, vol. 5, no. 2, pp. 153–156, 1994.
[57] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
computation, vol. 9, no. 8, pp. 1735–1780, 1997.
CHEOL YOUNG PARK has researched and de-
veloped machine learning algorithms for Multi-
Entity Bayesian Networks to support predictive
situation awareness systems (e.g., a MSAW sys-
tem for smart manufacturing, a PROGNOS system
for maritime situation awareness, and a HERALD
system for critical infrastructure defense). He previously worked for the C4I and Cyber Center at GMU as a research associate. Currently, he is working for BAIES, LLC as a machine learning research engineer. At BAIES, he conducted a project on a collective intelligence multi-model integration platform called Bayes Cloud. His research has been supported by funds from the Office of Naval Research, KEIT, POSCO, etc.
He volunteers to teach artificial intelligence and software programming to
high school students in Northern Virginia, USA.
JIN WOOG KIM is exploring, in reinforcement
learning, how to discard existing historical data
and learn new data effectively with only a small
amount of computation, while data is constantly
being input in real time. In particular, he is researching neural network ensembles that use Bayesian reasoning to improve learning performance in transfer learning and to share weights between models trained with Dropout. He currently runs DEEP-IN, a company in South Korea, and operates an auto-trading system for cryptocurrencies and FX based on an AI engine using Bayesian deep learning. His PhD research focuses on
the initial weighting of neural networks using prior probabilities and the
relative performance improvement of deep learning models.
BOSUNG KIM has pursued a research agenda
focusing on the Steel Plate Rolling Engineering
Technology in steel making industry. He has tried
to improve accuracy and precision in the Rolling
control system. He has researched and developed various micro control application systems in rolling mills for 7 years. He received a Bachelor's degree in Materials Engineering from PNU. He is currently working as a Junior Manager at POSCO Corp.
JOONGYOON LEE has focused his research on the application of systems engineering technology to various industrial areas. He has researched
and developed architectures of smart manufac-
turing systems, railway systems, plant systems,
and various military systems. He had worked for
DAEWOO Motor Corp. as a researcher and for SE
Technology Corp. as a chief architect and CEO.
He has been working at POSTECH University as a professor of systems engineering since 2012.
He is serving INCOSE as a representative of the Korean Chapter. He is a
member of the ISO/IEC JTC1 SC7 for Software and systems engineering.
His Ph.D. research subject was a study on the process and tool for system
requirements definition.