
Dazhong Wu¹

Department of Industrial and

Manufacturing Engineering,

National Science Foundation

Center for e-Design,

Pennsylvania State University,

University Park, PA 16802

e-mail: dxw279@psu.edu

Connor Jennings

Department of Industrial and

Manufacturing Engineering,

National Science Foundation

Center for e-Design,

Pennsylvania State University,

University Park, PA 16802

e-mail: connor@psu.edu

Janis Terpenny

Department of Industrial and

Manufacturing Engineering,

National Science Foundation

Center for e-Design,

Pennsylvania State University,

University Park, PA 16802

e-mail: jpt5311@psu.edu

Robert X. Gao

Department of Mechanical and

Aerospace Engineering,

Case Western Reserve University,

Cleveland, OH 44106

e-mail: robert.gao@case.edu

Soundar Kumara

Department of Industrial and

Manufacturing Engineering,

Pennsylvania State University,

University Park, PA 16802

e-mail: skumara@psu.edu

A Comparative Study on Machine Learning Algorithms for Smart Manufacturing: Tool Wear Prediction Using Random Forests

Manufacturers have faced an increasing need for the development of predictive models

that predict mechanical failures and the remaining useful life (RUL) of manufacturing

systems or components. Classical model-based or physics-based prognostics often

require an in-depth physical understanding of the system of interest to develop closed-

form mathematical models. However, prior knowledge of system behavior is not always

available, especially for complex manufacturing systems and processes. To complement

model-based prognostics, data-driven methods have been increasingly applied to machin-

ery prognostics and maintenance management, transforming legacy manufacturing sys-

tems into smart manufacturing systems with artiﬁcial intelligence. While previous

research has demonstrated the effectiveness of data-driven methods, most of these prog-

nostic methods are based on classical machine learning techniques, such as artiﬁcial

neural networks (ANNs) and support vector regression (SVR). With the rapid advance-

ment in artiﬁcial intelligence, various machine learning algorithms have been developed

and widely applied in many engineering ﬁelds. The objective of this research is to intro-

duce a random forests (RFs)-based prognostic method for tool wear prediction as well as

compare the performance of RFs with feed-forward back propagation (FFBP) ANNs and

SVR. Specifically, the performance of FFBP ANNs, SVR, and RFs is compared using

experimental data collected from 315 milling tests. Experimental results have shown that

RFs can generate more accurate predictions than FFBP ANNs with a single hidden layer

and SVR. [DOI: 10.1115/1.4036350]

Keywords: tool wear prediction, predictive modeling, machine learning, random forests

(RFs), support vector machines (SVMs), artiﬁcial neural networks (ANNs), prognostics

and health management (PHM)

1 Introduction

Smart manufacturing aims to integrate big data, advanced ana-

lytics, high-performance computing, and Industrial Internet of

Things (IIoT) into traditional manufacturing systems and proc-

esses to create highly customizable products with higher quality at

lower costs. As opposed to traditional factories, a smart factory

utilizes interoperable information and communications technolo-

gies (ICT), intelligent automation systems, and sensor networks to

monitor machinery conditions, diagnose the root cause of failures,

and predict the remaining useful life (RUL) of mechanical sys-

tems or components. For example, almost all engineering systems

(e.g., aerospace systems, nuclear power plants, and machine tools)

are subject to mechanical failures resulting from deterioration

with usage and age or abnormal operating conditions [1–3].

Some of the typical failure modes include excessive load, over-

heating, deﬂection, fracture, fatigue, corrosion, and wear. The

degradation and failures of engineering systems or components

will often incur higher costs and lower productivity due to unex-

pected machine downtime. In order to increase manufacturing

productivity while reducing maintenance costs, it is crucial to

develop and implement an intelligent maintenance strategy that

allows manufacturers to determine the condition of in-service sys-

tems in order to predict when maintenance should be performed.

Conventional maintenance strategies include reactive, preven-

tive, and proactive maintenance [4–6]. The most basic approach

to maintenance is reactive, also known as run-to-failure mainte-

nance planning. In the reactive maintenance strategy, assets are

deliberately allowed to operate until failures actually occur. The

assets are maintained on an as-needed basis. One of the disadvan-

tages of reactive maintenance is that it is difﬁcult to anticipate the

maintenance resources (e.g., manpower, tools, and replacement

parts) that will be required for repairs. Preventive maintenance is

often referred to as use-based maintenance. In preventive mainte-

nance, maintenance activities are performed after a speciﬁed

period of time or amount of use based on the estimated probability

that the systems or components will fail in the specified time interval. Although preventive maintenance allows for more consistent and predictable maintenance schedules, more maintenance activities are needed as opposed to reactive maintenance. To improve the efficiency and effectiveness of preventive maintenance, predictive maintenance is an alternative strategy in which maintenance actions are scheduled based on equipment performance or conditions instead of time. The objective of proactive maintenance is to determine the condition of in-service equipment and ultimately to predict the time at which a system or a component will no longer meet desired functional requirements.

¹Corresponding author.

Manuscript received October 25, 2016; final manuscript received March 13, 2017; published online April 18, 2017. Assoc. Editor: Laine Mears.

Journal of Manufacturing Science and Engineering, JULY 2017, Vol. 139 / 071018-1

Copyright © 2017 by ASME

The discipline that predicts health condition and remaining use-

ful life (RUL) based on previous and current operating conditions

is often referred to as prognostics and health management (PHM).

Prognostic approaches fall into two categories: model-based and

data-driven prognostics [7–12]. Model-based prognostics refer to

approaches based on mathematical models of system behavior

derived from physical laws or probability distribution. For exam-

ple, model-based prognostics include methods based on Wiener

and Gamma processes [13], hidden Markov models (HMMs) [14],

Kalman ﬁlters [15,16], and particle ﬁlters [17–20]. One of the

limitations of model-based prognostics is that an in-depth under-

standing of the underlying physical processes that lead to system

failures is required. Another limitation is that it is assumed that

underlying processes follow certain probability distributions, such

as gamma or normal distributions. While probability density func-

tions enable uncertainty quantiﬁcation, distributional assumptions

may not hold true in practice.

To complement model-based prognostics, data-driven prognos-

tics refer to approaches that build predictive models using learn-

ing algorithms and large volumes of training data. For example,

classical data-driven prognostics are based on autoregressive

(AR) models, multivariate adaptive regression, fuzzy set theory,

ANNs, and SVR. The unique beneﬁt of data-driven methods is

that an in-depth understanding of system physical behaviors is not

a prerequisite. In addition, data-driven methods do not assume

any underlying probability distributions which may not be practi-

cal for real-world applications. While ANNs and SVR have been

applied in the area of data-driven prognostics, little research has

been conducted to evaluate the performance of other machine

learning algorithms [21]. Because RFs have the potential to han-

dle a large number of input variables without variable selection

and they do not overﬁt [22–24], we investigate the ability of RFs

for the prediction of tool wear using an experimental dataset.

Further, the performance of RFs is compared with that of FFBP

ANNs and SVR using accuracy and training time.

The main contributions of this paper include the following:

(1) Tool wear in milling operations is predicted using RFs along with cutting force, vibration, and acoustic emission (AE) signals. Experimental results have shown that the predictive model trained by RFs is very accurate. The mean squared error (MSE) on the test tool wear data is as low as 7.67. The coefficient of determination ($R^2$) on the test tool wear data is as high as 0.992. To the best of our knowledge, the random forest algorithm is applied to predict tool wear for the first time.

(2) The performances of ANNs, support vector machines (SVMs), and RFs are compared using an experimental dataset with respect to the accuracy of regression (e.g., MSE and $R^2$) and training time. While the training time for RFs is longer than that of ANNs and SVMs, the predictive model built by RFs is the most accurate for the application example.

The remainder of the paper is organized as follows: Section 2 reviews the related literature on data-driven methods for tool wear prediction. Section 3 presents the methodology for tool wear prediction using ANNs, SVMs, and RFs. Section 4 presents an experimental setup and the experimental dataset acquired from different types of sensors (e.g., cutting force sensor, vibration sensor, acoustic emission sensor) on a computer numerical control (CNC) milling machine. Section 5 presents experimental results, demonstrates the effectiveness of the three machine learning algorithms, and compares the performance of each. Section 6 provides conclusions that include a discussion of research contribution and future work.

2 Data-Driven Methods for Tool Wear Prediction

Tool wear is the most commonly observed and unavoidable

phenomenon in manufacturing processes, such as drilling, milling,

and turning [25–27]. The rate of tool wear is typically affected by

process parameters (e.g., cutting speed and feed rate), cutting tool

geometry, and properties of workpiece and tool materials. Tay-

lor’s equation for tool life expectancy [28] provides an approxi-

mation of tool wear. However, with the rapid advancement of

sensing technology and increasing number of sensors equipped on

modern CNC machines, it is possible to predict tool wear more

accurately using various measurement data. This section presents

a review of data-driven methods for tool wear prediction.
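As a point of reference for the approximation mentioned above, Taylor's tool life equation is commonly written as $V T^n = C$, where $V$ is the cutting speed, $T$ is the tool life, and $n$ and $C$ are empirical constants. The following is a minimal sketch of that relation; the function name and the numeric constants are hypothetical and chosen only to keep the arithmetic easy to follow, not taken from the experiments in this paper.

```python
def taylor_tool_life(cutting_speed, n, c):
    """Solve Taylor's equation V * T**n = C for the tool life T."""
    if cutting_speed <= 0 or n <= 0 or c <= 0:
        raise ValueError("cutting_speed, n, and c must be positive")
    return (c / cutting_speed) ** (1.0 / n)


# Hypothetical constants for illustration only (not from the paper).
n_exponent = 0.5
c_constant = 400.0

for speed in (100.0, 200.0):
    # Doubling the cutting speed shortens the estimated tool life.
    print(speed, taylor_tool_life(speed, n_exponent, c_constant))
```

Because the relation depends only on cutting speed and two fitted constants, it cannot use the sensor signals that the data-driven methods reviewed in this section rely on.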

Schwabacher and Goebel [29] conducted a review of data-

driven methods for prognostics. The most popular data-driven

approaches to prognostics include ANNs, decision trees, and

SVMs in the context of systems health management. ANNs are a

family of computational models based on biological neural networks which are used to estimate complex relationships between inputs and outputs. Bukkapatnam et al. [30–32] developed effective tool wear monitoring techniques using ANNs based on features extracted from the principles of nonlinear dynamics. Özel and Karpat [33] presented a predictive modeling approach for surface roughness and tool wear for hard turning processes using

ANNs. The inputs of the ANN model include workpiece hardness,

cutting speed, feed rate, axial cutting length, and mean values of

three force components. Experimental results have shown that the

model trained by ANNs provides accurate predictions of surface

roughness and tool ﬂank wear. Palanisamy et al. [34] developed a

predictive model for predicting tool ﬂank wear in end milling

operations using feed-forward back propagation (FFBP) ANNs.

Experimental results have shown that the predictive model based

on ANNs can make accurate predictions of tool ﬂank wear using

cutting speeds, feed rates, and depth of cut. Sanjay et al. [35]

developed a model for predicting tool ﬂank wear in drilling using

ANNs. The feed rates, spindle speeds, torques, machining times,

and thrust forces are used to train the ANN model. The experi-

mental results have demonstrated that ANNs can predict tool wear

accurately. Chungchoo and Saini [36] developed an online fuzzy

neural network (FNN) algorithm that estimates the average width

of ﬂank wear and maximum depth of crater wear. A modiﬁed

least-square backpropagation neural network was built to estimate

ﬂank and crater wear based on cutting force and acoustic emission

signals. Chen and Chen [37] developed an in-process tool wear

prediction system using ANNs for milling operations. A total of

100 experimental data were used for training the ANN model.

The input variables include feed rate, depth of cut, and average

peak cutting forces. The ANN model can predict tool wear with

an error of 0.037 mm on average. Paul and Varadarajan [38] intro-

duced a multisensor fusion model to predict tool wear in turning

processes using ANNs. A regression model and an ANN were

developed to fuse the cutting force, cutting temperature, and

vibration signals. Experimental results showed that the coefﬁcient

of determination was 0.956 for the regression model trained by

the ANN. Karayel [39] presented a neural network approach

for the prediction of surface roughness in turning operations. A

feed-forward back-propagation multilayer neural network was

developed to train a predictive model using the data collected

from 49 cutting tests. Experimental results showed that the predic-

tive model has an average absolute error of 2.29%.

Cho et al. [40] developed an intelligent tool breakage detection

system with the SVM algorithm by monitoring cutting forces and

power consumption in end milling processes. Linear and polyno-

mial kernel functions were applied in the SVM algorithm. It has

been demonstrated that the predictive model built by SVMs can

recognize process abnormalities in milling. Benkedjouh et al. [41]

presented a method for tool wear assessment and remaining useful

life prediction using SVMs. The features were extracted from

cutting force, vibration, and acoustic emission signals. The experi-

mental results have shown that SVMs can be used to estimate the


wear progression and predict RUL of cutting tools effectively. Shi

and Gindy [42] introduced a predictive modeling method by com-

bining least squares SVMs and principal component analysis

(PCA). PCA was used to extract statistical features from multiple

sensor signals acquired from broaching processes. Experimental

results showed that the predictive model trained by SVMs was

effective to predict tool wear using the features extracted by PCA.

Another data-driven method for prognostics is based on deci-

sion trees. Decision trees are a nonparametric supervised learning

method used for classiﬁcation and regression. The goal of deci-

sion tree learning is to create a model that predicts the value of a

target variable by learning decision rules inferred from data fea-

tures. A decision tree is a ﬂowchart-like structure in which each

internal node denotes a test on an attribute, each branch represents

the outcome of a test, and each leaf node holds a class label. Jiaa

and Dornfeld [43] proposed a decision tree-based method for the

prediction of tool ﬂank wear in a turning operation using acoustic

emission and cutting force signals. The features characterizing the

AE root-mean-square and cutting force signals were extracted from

both time and frequency domains. The decision tree approach was

demonstrated to be able to make reliable inferences and decisions

on tool wear classiﬁcation. Elangovan et al. [44] developed a deci-

sion tree-based algorithm for tool wear prediction using vibration

signals. Ten-fold cross-validation was used to evaluate the accuracy

of the predictive model created by the decision tree algorithm. The

maximum classification accuracy was 87.5%. Arisoy and Özel [45]

investigated the effects of machining parameters on surface micro-

hardness and microstructure such as grain size and fractions using a

random forests-based predictive modeling method along with

ﬁnite element simulations. Predicted microhardness proﬁles and

grain sizes were used to understand the effects of cutting speed,

tool coating, and edge radius on the surface integrity.

In summary, the related work presented in this section builds

on previous research to explore how the conditions of tool wear

can be monitored as well as how tool wear can be predicted using

predictive modeling. While earlier work focused on prediction of

tool wear using ANNs, SVMs, and decision trees, this paper

explores the potential of a new method, random forests, for tool

wear prediction. Further, the performance of RFs is compared

with that of ANNs and SVMs. Because RFs are an extension of

decision trees, the performance of RFs is not compared with that

of decision trees.

3 Methodology

This section presents the methodology for data-driven prognos-

tics for tool wear prediction using ANNs, SVR, and RFs. The

input of ANNs, SVR, and RFs is the following labeled training

data:

$$D = (x_i, y_i)$$

where $x_i = (F_X, F_Y, F_Z, V_X, V_Y, V_Z, AE)$ and $y_i \in \mathbb{R}$. The description of these input data can be found in Table 1.

3.1 Tool Wear Prediction Using ANNs. ANNs are a family

of models inspired by biological neural networks. An ANN is

deﬁned by three types of parameters: (1) the interconnection pat-

tern between different layers of neurons, (2) the learning process

for updating the weights of the interconnections, and (3) the acti-

vation function that converts a neuron’s weighted input to its out-

put activation. Among many types of ANNs, the feed-forward

neural network is the ﬁrst and the most popular ANN. Back-

propagation is a learning algorithm for training ANNs in conjunc-

tion with an optimization method such as gradient descent.

Figure 1 illustrates the architecture of the FFBP ANN with a

single hidden layer. In this research, the ANN has three layers,

including input layer i, hidden layer j, and output layer k. Each

layer consists of one or more neurons or units, represented by

the circles. The ﬂow of information is represented by the lines

between the units. The first layer has input neurons which act as buffers for distributing the extracted features (i.e., $F_i$) from the input data (i.e., $x_i$). The number of the neurons in the input layer

is the same as that of extracted features from input variables.

Each value from the input layer is duplicated and sent to all

neurons in the hidden layer. The hidden layer is used to process

and connect the information from the input layer to the output

layer in a forward direction. Specifically, the values entering a neuron in the hidden layer are multiplied by weights $w_{ij}$. Initial weights are randomly selected between 0 and 1. A neuron in the hidden layer sums up the weighted inputs and generates a single output. This value is the input of an activation function (sigmoid function) in the hidden layer $f_h$ that converts the weighted input to the output of the neuron. Similarly, the outputs of all the neurons in the hidden layer are multiplied by weights $w_{jk}$. A neuron in the output layer sums up the weighted inputs and generates a single value. An activation function in the output layer $f_o$ converts the weighted input to the predicted output $y_k$ of the ANN, which is the predicted flank wear VB. The output layer has only

one neuron because there is only one response variable. The per-

formance of ANNs depends on the topology or architecture of

ANNs (i.e., the number of layers) and the number of neurons in

each layer. However, there are no standard or well-accepted

methods or rules for determining the number of hidden layers

and neurons in each hidden layer. In this research, single-hidden-layer ANNs with 2, 4, 8, 16, and 32 neurons in the hidden layer are selected. The termination criterion of the training algorithm is that training stops if the fit criterion (i.e., least squares) falls below $1.0 \times 10^{-4}$.
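The forward pass described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, bias terms are omitted because the text does not mention them, and the output activation $f_o$ is assumed to be the identity, which the paper does not specify.

```python
import math
import random


def sigmoid(z):
    """Hidden-layer activation f_h."""
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(n_inputs, n_hidden, seed=0):
    """Draw initial weights w_ij and w_jk uniformly between 0 and 1."""
    rng = random.Random(seed)
    w_ij = [[rng.random() for _ in range(n_hidden)] for _ in range(n_inputs)]
    w_jk = [rng.random() for _ in range(n_hidden)]
    return w_ij, w_jk


def forward(features, w_ij, w_jk):
    """Propagate one feature vector through a single-hidden-layer network."""
    hidden = []
    for j in range(len(w_jk)):
        # Each hidden neuron sums its weighted inputs, then applies f_h.
        z = sum(features[i] * w_ij[i][j] for i in range(len(features)))
        hidden.append(sigmoid(z))
    # The single output neuron sums the weighted hidden outputs.
    # Assumption: f_o is the identity function.
    return sum(h * w for h, w in zip(hidden, w_jk))


# Example: 28 extracted features and 16 hidden neurons.
w_ij, w_jk = init_weights(n_inputs=28, n_hidden=16)
print(forward([0.1] * 28, w_ij, w_jk))
```

Training by back-propagation would then adjust `w_ij` and `w_jk` by gradient descent until the least-squares criterion falls below the stated tolerance; that step is left out of the sketch.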

Table 1 Signal channel and data description

Signal channel    Data description
Channel 1         FX: force (N) in X dimension
Channel 2         FY: force (N) in Y dimension
Channel 3         FZ: force (N) in Z dimension
Channel 4         VX: vibration (g) in X dimension
Channel 5         VY: vibration (g) in Y dimension
Channel 6         VZ: vibration (g) in Z dimension
Channel 7         AE: acoustic emission (V)

Fig. 1 Tool wear prediction using a feed-forward back-propagation (FFBP) ANN


3.2 Tool Wear Prediction Using SVR. The original SVM

for regression was developed by Vapnik and coworkers [46,47]. An

SVM constructs a hyperplane or set of hyperplanes in a high- or

inﬁnite-dimensional space, which can be used for classiﬁcation

and regression.

The framework of SVR for linear cases is illustrated in Fig. 2.

Formally, SVR can be formulated as a convex optimization

problem

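$$
\begin{aligned}
\text{Minimize} \quad & \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) \\
\text{Subject to} \quad &
\begin{cases}
y_i - \langle \omega, x_i \rangle - b \le \varepsilon + \xi_i \\
\langle \omega, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i, \xi_i^* \ge 0
\end{cases}
\end{aligned}
\tag{3.1}
$$

where $\omega \in \chi$, $C = 1$, $\varepsilon = 0.1$, and $\xi_i, \xi_i^* = 0.001$. $b$ can be computed as follows:

$$
\begin{aligned}
b &= y_i - \langle \omega, x_i \rangle - \varepsilon \quad \text{for } \alpha_i \in [0, C] \\
b &= y_i - \langle \omega, x_i \rangle + \varepsilon \quad \text{for } \alpha_i^* \in [0, C]
\end{aligned}
\tag{3.2}
$$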
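To make the role of the slack variables in Eq. (3.1) concrete, the sketch below evaluates the objective for a given linear model. It assumes the standard reading of the constraints, in which each slack takes the smallest value that satisfies them, so a slack is zero whenever the prediction lies within $\varepsilon$ of the observation. The function name and the sample numbers are hypothetical.

```python
def svr_objective(w, b, xs, ys, c=1.0, eps=0.1):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i + xi_i^*) for a linear model.

    Each slack is set to the smallest value allowed by the constraints.
    """
    total_slack = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        slack_upper = max(0.0, y - pred - eps)  # xi_i: observation above the tube
        slack_lower = max(0.0, pred - y - eps)  # xi_i^*: observation below the tube
        total_slack += slack_upper + slack_lower
    return 0.5 * sum(wi * wi for wi in w) + c * total_slack


# Hypothetical data: the first point lies inside the eps-tube, the second does not.
print(svr_objective(w=[1.0], b=0.0, xs=[[1.0], [2.0]], ys=[1.05, 3.0]))
```

Only points outside the $\varepsilon$-tube contribute to the penalty term, which is why SVR solutions typically depend on a subset of the training data.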

For nonlinear SVR, the training patterns $x_i$ can be preprocessed by a nonlinear kernel function $k(x, x') := \langle \Phi(x), \Phi(x') \rangle$, where $\Phi(x)$ is a transformation that maps $x$ to a high-dimensional space. These kernel functions need to satisfy Mercer's theorem.

Many kernels have been developed for various applications. The

most popular kernels include polynomial, Gaussian radial basis

function (RBF), and sigmoid. In many applications, a nonlinear

kernel function provides better accuracy. According to the litera-

ture [32,33], the Gaussian RBF kernel is one of the most effective

kernel functions used in tool wear prediction. In this research, the

Gaussian RBF kernel is used to transform the input dataset $D = (x_i, y_i)$, where $x_i$ is the input vector and $y_i$ is the response variable (i.e., flank wear), into a new dataset in a high-dimensional space. The new dataset is linearly separable by a hyperplane in a higher-dimensional Euclidean space as illustrated in Fig. 2. The slack variables $\xi_i$ and $\xi_i^*$ are introduced in the instances where the constraints are infeasible. The slack variables denote the deviation from predicted values with the error of $\varepsilon = 0.1$. The RBF kernel is $k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$, where $\sigma^2 = 0.5$. At the optimal solution, we obtain

$$
\omega = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\,\Phi(x_i) \quad \text{and} \quad f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\,k(x_i, x) + b \tag{3.3}
$$
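The kernel and the prediction function in Eq. (3.3) can be sketched as follows. The support vectors, the coefficients $(\alpha_i - \alpha_i^*)$, and the offset $b$ used in the example are hypothetical placeholders; in practice they come from solving the optimization problem in Eq. (3.1).

```python
import math


def rbf_kernel(xi, xj, sigma_sq=0.5):
    """Gaussian RBF kernel k(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma_sq))


def svr_predict(x, support_vectors, dual_coefs, b, sigma_sq=0.5):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i^*) * k(x_i, x) + b.

    dual_coefs holds the differences (alpha_i - alpha_i^*).
    """
    weighted = sum(
        coef * rbf_kernel(sv, x, sigma_sq)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + b


# Hypothetical two-point model for illustration only.
vectors = [[0.0, 0.0], [1.0, 0.0]]
coefs = [2.0, -1.0]
print(svr_predict([0.0, 0.0], vectors, coefs, b=0.5))
```

With $\sigma^2 = 0.5$ the exponent reduces to $-\|x_i - x_j\|^2$, so the kernel value decays quickly as two feature vectors move apart.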

3.3 Tool Wear Prediction Using RFs. The random forest

algorithm, developed by Breiman [22,48], is an ensemble learning

method that constructs a forest of decision trees from bootstrap

samples of a training dataset. Each decision tree produces a

response, given a set of predictor values. In a decision tree, each

internal node represents a test on an attribute, each branch repre-

sents the outcome of the test, and each leaf node represents a class

label for classiﬁcation or a response for regression. A decision

tree in which the response is continuous is also referred to as a

regression tree. In the context of tool wear prediction, each indi-

vidual decision tree in a random forest is a regression tree because

tool wear describes the gradual failure of cutting tools. A comprehensive tutorial on RFs can be found in Refs. [22,48,49]. Some of the important concepts related to RFs, including bootstrap aggregating or bagging, splitting, and stopping criterion, are introduced in Secs. 3.3.1–3.3.4.

3.3.1 Bootstrap Aggregating or Bagging. Given a training dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, bootstrap aggregating or bagging generates $B$ new training datasets $D_i$ of size $N$ by sampling from the original training dataset $D$ with replacement. $D_i$ is referred to as a bootstrap sample. By sampling with replacement or bootstrapping, some observations may be repeated in each $D_i$. Bagging helps reduce variance and avoid overfitting. The number of regression trees $B$ is a parameter specified by users. Typically, a few hundred to several thousand trees are used in the random forest algorithm.
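Bootstrap sampling as described above amounts to drawing $N$ records with replacement, once per tree. A minimal sketch, with a hypothetical function name and toy data:

```python
import random


def bootstrap_samples(dataset, n_trees, seed=0):
    """Return n_trees bootstrap samples, each the same size as dataset."""
    rng = random.Random(seed)
    n = len(dataset)
    # random.choices samples with replacement, so records may repeat.
    return [rng.choices(dataset, k=n) for _ in range(n_trees)]


# Toy dataset of (x, y) pairs for illustration only.
data = [(i, 2.0 * i) for i in range(10)]
samples = bootstrap_samples(data, n_trees=3)
for sample in samples:
    print(len(sample), len(set(sample)))  # sample size and number of distinct records
```

Each sample usually contains fewer distinct records than the original dataset because some records are drawn more than once and others not at all.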

3.3.2 Choosing Variables to Split On. For each of the bootstrap samples, grow an un-pruned regression tree with the following procedure: At each node, randomly sample $m$ variables and choose the best split among those variables rather than choosing the best split among all predictors. This process is sometimes called "feature bagging." A random subset of the predictors or features is selected because doing so reduces the correlation among the trees grown from ordinary bootstrap samples. For regression, the default is $m = p/3$, where $p$ is the number of predictors.
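The random choice of split candidates can be sketched as below. The function name is hypothetical, and rounding $p/3$ down to an integer is an assumption; it is consistent with 9 candidate variables out of 28 features.

```python
import random


def sample_split_candidates(n_features, rng):
    """Pick m = floor(p / 3) distinct feature indices to consider at a node."""
    m = max(1, n_features // 3)
    return rng.sample(range(n_features), m)


rng = random.Random(0)
candidates = sample_split_candidates(28, rng)
print(len(candidates), sorted(candidates))
```

Calling the function again at the next node draws a fresh subset, so different nodes and different trees tend to split on different variables.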

3.3.3 Splitting Criterion. Suppose that a partition is divided into $M$ regions $R_1, R_2, \ldots, R_M$. The response is modeled as a constant $c_m$ in each region

$$f(x) = \sum_{m=1}^{M} c_m I(x \in R_m) \tag{3.4}$$

The splitting criterion at each node is to minimize the sum of squares. Therefore, the best $\hat{c}_m$ is the average of $y_i$ in region $R_m$

$$\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m) \tag{3.5}$$

Consider a splitting variable $j$ and split point $s$, and define the pair of half-planes

$$R_1(j, s) = \{X \mid X_j \le s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j > s\} \tag{3.6}$$

The splitting variable $j$ and split point $s$ should satisfy

$$\min_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right] \tag{3.7}$$

For any $j$ and $s$, the inner minimization is solved by

$$\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s)) \quad \text{and} \quad \hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s)) \tag{3.8}$$

Having found the best split, the dataset is partitioned into the two resulting regions, and the splitting process is repeated on each of the two regions. This splitting process is repeated until a predefined stopping criterion is satisfied.
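The search in Eqs. (3.7) and (3.8) can be sketched as an exhaustive scan over candidate variables and observed values. This is an illustrative brute-force version with hypothetical names; using the observed feature values as candidate split points is an assumption, and production implementations use faster sorted scans.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean, i.e. the inner minimum in Eq. (3.7)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, candidate_features):
    """Return (cost, j, s) minimizing the two-region sum of squares, or None."""
    best = None
    for j in candidate_features:
        for s in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[j] <= s]   # R1(j, s)
            right = [y for row, y in zip(rows, targets) if row[j] > s]   # R2(j, s)
            if not left or not right:
                continue  # a split must produce two non-empty regions
            cost = sum_squared_error(left) + sum_squared_error(right)
            if best is None or cost < best[0]:
                best = (cost, j, s)
    return best


# Toy data: the response jumps between x = 3 and x = 10.
rows = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
targets = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_split(rows, targets, candidate_features=[0]))
```

Applying `best_split` recursively to the two resulting regions grows the regression tree until the stopping criterion is met.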

3.3.4 Stopping Criterion. Tree size is a tuning parameter gov-

erning the complexity of a model. The stopping criterion is that

the splitting process proceeds until the number of records in $D_i$

falls below a threshold, and five is used as the threshold.

Fig. 2 Tool wear prediction using SVR


After $B$ such trees $\{T_b\}_1^B$ are constructed, a prediction at a new point $x$ can be made by averaging the predictions from all the individual $B$ regression trees on $x$

$$\hat{f}_{\mathrm{rf}}^{B}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x) \tag{3.9}$$

The random forest algorithm [48,49] for regression is as follows:

(1) Draw a bootstrap sample $Z$ of size $N$ from the training data.

(2) For each bootstrap sample, construct a regression tree by splitting a node into two children nodes until the stopping criterion is satisfied.

(3) Output the ensemble of trees $\{T_b\}_1^B$.

(4) Make a prediction at a new point $x$ by aggregating the predictions of the $B$ trees.

The framework of predicting flank wear using an RF is illustrated in Fig. 3. In this research, a random forest is constructed using $B = 500$ regression trees. Given the labeled training dataset $D = (x_i, y_i)$, a bootstrap sample of size $N = 630$ is drawn from the training dataset. For each regression tree, $m = 9$ ($m = p/3$, $p = 28$) variables are selected at random from the 28 variables/features. The best variable/split-point is selected among the nine variables. A regression tree progressively splits the training dataset into two child nodes: left node (with samples $< z$) and right node (with samples $\ge z$). A splitting variable and split point are selected by solving Eqs. (3.7) and (3.8). The process is applied recursively on the dataset in each child node. The splitting process stops if the number of records in a node is less than 5. An individual regression tree is built by starting at the root node of the tree, performing a sequence of tests about the predictors, and organizing the tests in a hierarchical binary tree structure as shown in Fig. 4. After 500 regression trees are constructed, a prediction at a new point can be made by averaging the predictions from all the individual binary regression trees on this point.
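The full procedure can be put together in a compact, dependency-free sketch. It is illustrative only and is not the implementation used to produce the reported results: all function names are hypothetical, candidate variables are resampled at every node (following Sec. 3.3.2), split points are taken from observed feature values, and the toy data at the end stand in for the real feature table.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def grow_tree(rows, ys, m, min_records, rng):
    """Grow an un-pruned regression tree; a leaf is a float, a node a tuple."""
    if len(ys) < min_records or len(set(ys)) == 1:
        return mean(ys)  # stopping criterion reached
    best = None
    for j in rng.sample(range(len(rows[0])), m):  # random candidate variables
        # Dropping the largest value keeps the right-hand region non-empty.
        for s in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= s]
            right = [y for r, y in zip(rows, ys) if r[j] > s]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, s)
    if best is None:
        return mean(ys)  # no candidate variable could separate the records
    _, j, s = best
    left_idx = [i for i, r in enumerate(rows) if r[j] <= s]
    right_idx = [i for i, r in enumerate(rows) if r[j] > s]
    return (
        j,
        s,
        grow_tree([rows[i] for i in left_idx], [ys[i] for i in left_idx], m, min_records, rng),
        grow_tree([rows[i] for i in right_idx], [ys[i] for i in right_idx], m, min_records, rng),
    )


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, s, left, right = tree
        tree = left if x[j] <= s else right
    return tree


def fit_forest(rows, ys, n_trees=500, min_records=5, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    m = max(1, p // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of size N
        forest.append(grow_tree([rows[i] for i in idx], [ys[i] for i in idx], m, min_records, rng))
    return forest


def predict_forest(forest, x):
    """Average the individual tree predictions, as in Eq. (3.9)."""
    return mean([predict_tree(tree, x) for tree in forest])


# Toy data for illustration only: three features, response driven by the first.
toy_rows = [[float(i), float(i % 3), 0.0] for i in range(30)]
toy_ys = [2.0 * r[0] for r in toy_rows]
toy_forest = fit_forest(toy_rows, toy_ys, n_trees=20, seed=1)
print(predict_forest(toy_forest, toy_rows[15]))
```

Because every leaf stores an average of training responses, a forest prediction always lies between the smallest and largest response seen during training.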

4 Experimental Setup

The data used in this paper were obtained from Li et al. [50].

Some details of the experiment are presented in this section. The

experimental setup is shown in Fig. 5.

The cutter material and workpiece material used in the experi-

ment are high-speed steel and stainless steel, respectively. The

detailed description of the operating conditions in the dry milling

operation can be found in Table 2. The spindle speed of the cutter was 10,400 RPM. The feed rate was 1555 mm/min. The Y depth of cut (radial) was 0.125 mm. The Z depth of cut (axial) was 0.2 mm.

A total of 315 cutting tests were conducted on a three-axis high-speed CNC machine (Röders Tech RFM 760). During each cutting test,

seven signal channels, including cutting force, vibration, and

acoustic emission data, were monitored in real-time. The sampling

rate was 50 kHz/channel. Each cutting test took about 15 s. A sta-

tionary dynamometer, mounted on the table of the CNC machine,

was used to measure cutting forces in three mutually perpendicular axes (x, y, and z dimensions). Three piezo accelerometers, mounted on the workpiece, were used to measure vibration in three mutually perpendicular axes (x, y, and z dimensions). An

acoustic emission (AE) sensor, mounted on the workpiece, was used to monitor a high-frequency oscillation that occurs spontaneously within metals due to crack formation or plastic deformation. Acoustic emission is caused by the release of strain energy as the microstructure of the material is rearranged. After each cutting test, the value of tool wear was measured off-line using a microscope (Leica MZ12). The total size of the condition monitoring data is about 8.67 GB.

Fig. 3 Tool wear prediction using an RF

Fig. 4 Binary regression tree growing process

Fig. 5 Experimental setup

5 Results and Discussion

In machine learning, feature extraction is an essential prepro-

cessing step in which raw data collected from various signal chan-

nels are converted into a set of statistical features in a format

supported by machine learning algorithms. The statistical features

are then given as an input to a machine learning algorithm. In this

experiment, the condition monitoring data were collected from (1)

cutting force, (2) vibration, and (3) acoustic emission signal chan-

nels. A set of statistical features (28 features) was extracted from

these signals, including maximum, median, mean, and standard

deviation as listed in Table 3.
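The feature extraction step can be sketched as follows: four summary statistics for each of the seven signal channels give 28 features per cutting test. The function name, the dictionary layout, and the use of the sample (rather than population) standard deviation are assumptions made for illustration.

```python
import statistics

# Channel names follow Table 1.
CHANNELS = ("FX", "FY", "FZ", "VX", "VY", "VZ", "AE")


def extract_features(signals):
    """Map raw channel samples to max, median, mean, and standard deviation."""
    features = {}
    for name in CHANNELS:
        samples = signals[name]
        features[f"{name}_max"] = max(samples)
        features[f"{name}_median"] = statistics.median(samples)
        features[f"{name}_mean"] = statistics.mean(samples)
        # Assumption: sample standard deviation (n - 1 in the denominator).
        features[f"{name}_std"] = statistics.stdev(samples)
    return features


# Tiny synthetic signals for illustration only; real tests record about 15 s at 50 kHz.
signals = {name: [1.0, 2.0, 3.0, 4.0] for name in CHANNELS}
features = extract_features(signals)
print(len(features))
```

Reducing each cutting test to a fixed-length feature vector is what allows the same table to be passed to ANNs, SVR, and RFs alike.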

Three predictive models were developed using ANNs, SVR,

and RFs, respectively. Two-thirds (2/3) of the input data (i.e.,

three datasets) were selected at random for model development

(training). The remainder (1/3) of the input data was used for

model validation (testing). Figures 6–8 show the predicted against observed tool wear values with the test dataset using ANNs, SVR, and RFs, respectively. Figure 9 shows the tool wear against time with RFs.
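The random two-thirds/one-third partition described above can be sketched as below. The function name is hypothetical, and the record count of 945 is an inference rather than a figure stated in the text: it is the total for which two-thirds equals the bootstrap sample size of 630 quoted in Sec. 3.3.

```python
import random


def train_test_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle the records and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Record indices stand in for (feature vector, tool wear) pairs.
records = list(range(945))
train, test = train_test_split(records)
print(len(train), len(test))
```

Fixing the random seed makes the partition repeatable, which matters when the same split is reused to compare several algorithms.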

Table 2 Operating conditions

Parameter        Value
Spindle speed    10,400 RPM
Feed rate        1555 mm/min
Y depth of cut   0.125 mm
Z depth of cut   0.2 mm
Sampling rate    50 kHz/channel
Material         Stainless steel

Table 3 List of extracted features

Cutting force (X, Y, Z dimensions)   Vibration (X, Y, Z dimensions)   Acoustic emission
Max                                  Max                              Max
Median                               Median                           Median
Mean                                 Mean                             Mean
Standard deviation                   Standard deviation               Standard deviation

Fig. 6 Comparison of observed and predicted tool wear using an ANN with 16 neurons in the hidden layer (termination criterion: tolerance is equal to $1.0 \times 10^{-4}$)

Fig. 7 Comparison of observed and predicted tool wear using SVR (termination criterion: slack variable or tolerance $\xi$ is equal to 0.001)

Fig. 8 Comparison of observed and predicted tool wear using RFs (termination criterion: minimum number of samples in each node is equal to 5)

071018-6 / Vol. 139, JULY 2017 Transactions of the ASME

In addition, the performance of the three algorithms was evaluated on the test dataset using accuracy and training time. Accuracy is measured using the $R^2$ statistic, also referred to as the coefficient of determination, and the mean squared error (MSE). In statistics, the coefficient of determination is defined as $R^2 = 1 - (\mathrm{SSE}/\mathrm{SST})$, where SSE is the sum of the squares of residuals and SST is the total sum of squares. The coefficient of determination is a measure that indicates the percentage of the response variable variation that is explained by a regression model. A higher R-squared indicates that more variability is explained by the regression model. For example, an $R^2$ of 100% indicates that the regression model explains all the variability of the response data around its mean. In general, the higher the R-squared, the better the regression model fits the data. The MSE of an estimator measures the average of the squares of the errors. The MSE is defined as $\mathrm{MSE} = (1/n)\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$, where $\hat{y}_i$ is a predicted value, $y_i$ is an observed value, and $n$ is the sample size.

The ANN, SVR, and RF algorithms use between 50% and 90% of the input data for model development (training) and use the remainder for model validation (testing). Because the performance of ANNs depends on the hidden layer configuration, five ANNs with a single hidden layer but different numbers of neurons were tested on the training dataset. Tables 4–8 list the MSE, R-squared, and training time for the ANNs with 2, 4, 8, 16, and 32 neurons. With respect to the performance of the ANN, the training time increases as the number of neurons increases. However, the increases in training time are not significant, as shown in Fig. 10. In addition, while the prediction accuracy increases as the number of neurons increases, the performance is not significantly improved by adding more than eight neurons in the hidden layer, as shown in Figs. 11 and 12. Tables 9 and 10 list the MSE, R-squared, and training time for SVR and RFs. While the training time for RFs is longer than that of ANNs and SVR, the predictive model built by RFs is the most accurate (test MSE of 7.674 to 14.170, versus 23.997 or higher for SVR and the ANNs), as shown in Figs. 10–12.
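The two accuracy measures defined above can be computed directly from paired observed and predicted values. The sketch below follows those definitions; the function names and the four sample values are made up for illustration and are not data from the experiment.

```python
def mean_squared_error(observed, predicted):
    """MSE = (1/n) * sum((y_hat_i - y_i)^2)."""
    n = len(observed)
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, with SST taken about the mean of the observed values."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Illustrative values only (not measurements from the milling tests).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")  # MSE = 4.500, R^2 = 0.964
```

Here SSE = 18 and SST = 500, so $R^2 = 1 - 18/500 = 0.964$, and MSE = 18/4 = 4.5.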

Table 5 Accuracy on the test data and training time for the FFBP ANN with four neurons in the hidden layer

ANN (number of neurons = 4)

Training size (%)   MSE      R^2     Training time (s)
50                  43.428   0.958   0.122
60                  51.001   0.951   0.084
70                  43.645   0.958   0.093
80                  45.661   0.955   0.103
90                  45.058   0.958   0.118

Table 6 Accuracy on the test data and training time for the FFBP ANN with eight neurons in the hidden layer

ANN (number of neurons = 8)

Training size (%)   MSE      R^2     Training time (s)
50                  36.810   0.964   0.167
60                  34.168   0.968   0.186
70                  39.795   0.961   0.202
80                  44.175   0.957   0.197
90                  46.634   0.954   0.234

Table 7 Accuracy on the test data and training time for the FFBP ANN with 16 neurons in the hidden layer

ANN (number of neurons = 16)

Training size (%)   MSE      R^2     Training time (s)
50                  36.337   0.964   0.394
60                  41.420   0.959   0.412
70                  40.138   0.960   0.468
80                  42.486   0.957   0.506
90                  44.056   0.957   0.566

Fig. 9 Tool wear against time (cut) using RFs

Table 4 Accuracy on the test data and training time for the FFBP ANN with two neurons in the hidden layer

ANN (number of neurons = 2)

Training size (%)   MSE      R^2     Training time (s)
50                  49.790   0.951   0.049
60                  45.072   0.955   0.054
70                  45.626   0.956   0.055
80                  47.966   0.953   0.062
90                  48.743   0.955   0.056

Table 8 Accuracy on the test data and training time for the FFBP ANN with 32 neurons in the hidden layer

ANN (number of neurons = 32)

Training size (%)   MSE      R^2     Training time (s)
50                  35.305   0.965   1.165
60                  38.612   0.963   1.301
70                  38.824   0.963   1.498
80                  42.469   0.959   1.496
90                  48.138   0.953   1.633
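One way to read Tables 4–8 together is to average each ANN configuration over the five training sizes. The sketch below does this with the values copied from the tables; the averaging itself is an illustrative summary chosen here, not an analysis reported by the authors.

```python
# Test-set MSE and training time (s) for training sizes 50, 60, 70, 80, 90%,
# copied from Tables 4-8 (keys are the number of hidden-layer neurons).
ann_results = {
    2: {"mse": [49.790, 45.072, 45.626, 47.966, 48.743],
        "time": [0.049, 0.054, 0.055, 0.062, 0.056]},
    4: {"mse": [43.428, 51.001, 43.645, 45.661, 45.058],
        "time": [0.122, 0.084, 0.093, 0.103, 0.118]},
    8: {"mse": [36.810, 34.168, 39.795, 44.175, 46.634],
        "time": [0.167, 0.186, 0.202, 0.197, 0.234]},
    16: {"mse": [36.337, 41.420, 40.138, 42.486, 44.056],
         "time": [0.394, 0.412, 0.468, 0.506, 0.566]},
    32: {"mse": [35.305, 38.612, 38.824, 42.469, 48.138],
         "time": [1.165, 1.301, 1.498, 1.496, 1.633]},
}


def mean(values):
    return sum(values) / len(values)


# (mean MSE, mean training time) per hidden-layer size.
summary = {n: (mean(r["mse"]), mean(r["time"])) for n, r in ann_results.items()}
for neurons, (avg_mse, avg_time) in summary.items():
    print(f"{neurons:>2} neurons: mean MSE {avg_mse:.2f}, mean training time {avg_time:.3f} s")
```

Averaged this way, the mean test MSE falls from about 47.4 with 2 neurons to about 40.3 with 8 neurons and then stays between roughly 40.3 and 40.9 for 16 and 32 neurons, while the mean training time grows from about 0.06 s to about 1.42 s.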

Fig. 10 Comparison of training times


6 Conclusions and Future Work

In this paper, the prediction of tool wear in milling operations was conducted using three popular machine learning algorithms: ANNs, SVR, and RFs. The performance of these algorithms was evaluated on the dataset collected from 315 milling tests. The performance measures include mean squared error, R-squared, and training time. A set of statistical features was extracted from cutting forces, vibrations, and acoustic emissions. The experimental results have shown that, while the training time on this particular dataset using RFs is longer than that of the FFBP ANNs with a single hidden layer and SVR, RFs generate more accurate predictions than either. The main contribution of this paper is twofold: (1) we demonstrated, for the first time to the best of our knowledge, that the predictive model trained by RFs can predict tool wear in milling processes very accurately and (2) we compared the performance of RFs with that of FFBP ANNs and SVR and observed that RFs outperform FFBP ANNs and SVR for this particular application example.

In the future, a comparison of the performance of SVR and RFs with that of other types of ANNs, such as recurrent neural networks and dynamic neural networks, will be conducted. In addition, our future work will focus on designing parallel implementations of machine learning algorithms that can be applied to large-scale and real-time prognosis.

Acknowledgment

The research reported in this paper is partially supported by NSF under Grant Nos. IIP-1238335 and DMDII-15-14-01. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation and the Digital Manufacturing and Design Innovation Institute.

References

[1] Swanson, L., 2001, “Linking Maintenance Strategies to Performance,” Int. J. Prod. Econ., 70(3), pp. 237–244.

[2] Valdez-Flores, C., and Feldman, R. M., 1989, “A Survey of Preventive Maintenance Models for Stochastically Deteriorating Single-Unit Systems,” Nav. Res. Logist., 36(4), pp. 419–446.

[3] Wu, D., Terpenny, J., Zhang, L., Gao, R., and Kurfess, T., 2016, “Fog-Enabled Architecture for Data-Driven Cyber-Manufacturing Systems,” ASME Paper No. MSEC2016-8559.

[4] Lee, J., 1995, “Machine Performance Monitoring and Proactive Maintenance in Computer-Integrated Manufacturing: Review and Perspective,” Int. J. Comput. Integr. Manuf., 8(5), pp. 370–380.

[5] Bevilacqua, M., and Braglia, M., 2000, “The Analytic Hierarchy Process Applied to Maintenance Strategy Selection,” Reliab. Eng. Syst. Saf., 70(1), pp. 71–83.

[6] Suh, J. H., Kumara, S. R., and Mysore, S. P., 1999, “Machinery Fault Diagnosis and Prognosis: Application of Advanced Signal Processing Techniques,” CIRP Ann.-Manuf. Technol., 48(1), pp. 317–320.

[7] Hu, C., Youn, B. D., and Kim, T., 2012, “Semi-Supervised Learning With Co-Training for Data-Driven Prognostics,” IEEE Conference on Prognostics and Health Management (PHM), Denver, CO, June 18–21, pp. 1–10.

[8] Schwabacher, M., 2005, “A Survey of Data-Driven Prognostics,” AIAA Paper No. 2005-7002.

[9] Byrne, G., Dornfeld, D., Inasaki, I., Ketteler, G., König, W., and Teti, R., 1995,

“Tool Condition Monitoring (TCM)—The Status of Research and Industrial

Application,” CIRP Ann.-Manuf. Technol.,44(2), pp. 541–567.

[10] Teti, R., Jemielniak, K., O’Donnell, G., and Dornfeld, D., 2010, “Advanced Monitoring of Machining Operations,” CIRP Ann.-Manuf. Technol., 59(2), pp. 717–739.

[11] Gao, R., Wang, L., Teti, R., Dornfeld, D., Kumara, S., Mori, M., and Helu, M., 2015, “Cloud-Enabled Prognosis for Manufacturing,” CIRP Ann.-Manuf. Technol., 64(2), pp. 749–772.

[12] Daigle, M. J., and Goebel, K., 2013, “Model-Based Prognostics With Concurrent Damage Progression Processes,” IEEE Trans. Syst. Man Cybernetics: Syst., 43(3), pp. 535–546.

[13] Si, X.-S., Wang, W., Hu, C.-H., Chen, M.-Y., and Zhou, D.-H., 2013, “A Wiener-Process-Based Degradation Model With a Recursive Filter Algorithm for Remaining Useful Life Estimation,” Mech. Syst. Signal Process., 35(1), pp. 219–237.

[14] Dong, M., and He, D., 2007, “Hidden Semi-Markov Model-Based Methodology for Multi-Sensor Equipment Health Diagnosis and Prognosis,” Eur. J. Oper. Res., 178(3), pp. 858–878.

Fig. 11 Comparison of MSEs

Fig. 12 Comparison of R-squared errors

Table 9 Accuracy on the test data and training time for SVR with radial basis kernel

SVR

Training size (%)   MSE      R^2     Training time (s)
50                  54.993   0.946   0.060
60                  49.868   0.952   0.073
70                  41.072   0.959   0.088
80                  31.958   0.969   0.107
90                  23.997   0.975   0.126

Table 10 Accuracy on the test data and training time for RFs

RFs (500 trees)

Training size (%)   MSE      R^2     Training time (s)
50                  14.170   0.986   1.079
60                  11.053   0.989   1.386
70                  10.156   0.990   1.700
80                  8.633    0.991   2.003
90                  7.674    0.992   2.325
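The accuracy and training-time trade-off between SVR and RFs can be made concrete by taking ratios of the values in Tables 9 and 10 at each training size. The ratio calculation below is an illustrative summary added here, not a computation reported in the paper.

```python
training_sizes = [50, 60, 70, 80, 90]

# Values copied from Table 9 (SVR) and Table 10 (RFs, 500 trees).
svr = {"mse": [54.993, 49.868, 41.072, 31.958, 23.997],
       "time": [0.060, 0.073, 0.088, 0.107, 0.126]}
rf = {"mse": [14.170, 11.053, 10.156, 8.633, 7.674],
      "time": [1.079, 1.386, 1.700, 2.003, 2.325]}

# How many times larger the SVR error is, and how many times longer RF training takes.
mse_ratio = [s / r for s, r in zip(svr["mse"], rf["mse"])]
time_ratio = [r / s for s, r in zip(svr["time"], rf["time"])]

for size, m, t in zip(training_sizes, mse_ratio, time_ratio):
    print(f"{size}%: SVR/RF MSE ratio {m:.2f}, RF/SVR training-time ratio {t:.1f}")
```

On these values the RF test MSE is roughly 3.1 to 4.5 times lower than that of SVR, while RF training takes roughly 18 to 19 times longer, although it remains under 2.4 s in every case.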


[15] Saha, B., Goebel, K., and Christophersen, J., 2009, “Comparison of Prognostic Algorithms for Estimating Remaining Useful Life of Batteries,” Trans. Inst. Meas. Control, 31(3–4), pp. 293–308.

[16] Niaki, F. A., Michel, M., and Mears, L., 2016, “State of Health Monitoring in Machining: Extended Kalman Filter for Tool Wear Assessment in Turning of IN718 Hard-to-Machine Alloy,” J. Manuf. Processes, 24(Part 2), pp. 361–369.

[17] Orchard, M. E., and Vachtsevanos, G. J., 2009, “A Particle-Filtering Approach for On-Line Fault Diagnosis and Failure Prognosis,” Trans. Inst. Meas. Control, 31(3–4), pp. 221–246.

[18] Wang, P., and Gao, R. X., 2015, “Adaptive Resampling-Based Particle Filtering for Tool Life Prediction,” J. Manuf. Syst., 37(Part 2), pp. 528–534.

[19] Niaki, F. A., Ulutan, D., and Mears, L., 2015, “Stochastic Tool Wear Assessment in Milling Difficult to Machine Alloys,” Int. J. Mechatronics Manuf. Syst., 8(3–4), pp. 134–159.

[20] Wang, P., and Gao, R. X., 2016, “Stochastic Tool Wear Prediction for Sustainable Manufacturing,” Proc. CIRP, 48, pp. 236–241.

[21] Sick, B., 2002, “On-Line and Indirect Tool Wear Monitoring in Turning With Artificial Neural Networks: A Review of More Than a Decade of Research,” Mech. Syst. Signal Process., 16(4), pp. 487–546.

[22] Breiman, L., 2001, “Random Forests,” Mach. Learn., 45(1), pp. 5–32.

[23] Biau, G., 2012, “Analysis of a Random Forests Model,” J. Mach. Learn. Res., 13, pp. 1063–1095.

[24] Verikas, A., Gelzinis, A., and Bacauskiene, M., 2011, “Mining Data With Random Forests: A Survey and Results of New Tests,” Pattern Recognit., 44(2), pp. 330–349.

[25] Kamarthi, S., Kumara, S., and Cohen, P., 2000, “Flank Wear Estimation in Turning Through Wavelet Representation of Acoustic Emission Signals,” ASME J. Manuf. Sci. Eng., 122(1), pp. 12–19.

[26] Liang, S., and Dornfeld, D., 1989, “Tool Wear Detection Using Time Series Analysis of Acoustic Emission,” J. Eng. Ind., 111(3), pp. 199–205.

[27] Huang, Y., and Liang, S. Y., 2004, “Modeling of CBN Tool Flank Wear Progression in Finish Hard Turning,” ASME J. Manuf. Sci. Eng., 126(1), pp. 98–106.

[28] Taylor, F. W., 1907, On the Art of Cutting Metals, ASME, New York.

[29] Schwabacher, M., and Goebel, K., 2007, “A Survey of Artificial Intelligence for Prognostics,” AAAI Fall Symposium, Arlington, VA, Nov. 9–11, pp. 107–114.

[30] Bukkapatnam, S. T., Lakhtakia, A., and Kumara, S. R., 1995, “Analysis of Sensor Signals Shows Turning on a Lathe Exhibits Low-Dimensional Chaos,” Phys. Rev. E, 52(3), p. 2375.

[31] Bukkapatnam, S. T., Kumara, S. R., and Lakhtakia, A., 2000, “Fractal Estimation of Flank Wear in Turning,” ASME J. Dyn. Syst. Meas. Control, 122(1), pp. 89–94.

[32] Bukkapatnam, S., Kumara, S., and Lakhtakia, A., 1999, “Analysis of Acoustic Emission Signals in Machining,” ASME J. Manuf. Sci. Eng., 121(4), pp. 568–576.

[33] Özel, T., and Karpat, Y., 2005, “Predictive Modeling of Surface Roughness and Tool Wear in Hard Turning Using Regression and Neural Networks,” Int. J. Mach. Tools Manuf., 45(4), pp. 467–479.

[34] Palanisamy, P., Rajendran, I., and Shanmugasundaram, S., 2008, “Prediction of Tool Wear Using Regression and ANN Models in End-Milling Operation,” Int. J. Adv. Manuf. Technol., 37(1–2), pp. 29–41.

[35] Sanjay, C., Neema, M., and Chin, C., 2005, “Modeling of Tool Wear in Drilling by Statistical Analysis and Artificial Neural Network,” J. Mater. Process. Technol., 170(3), pp. 494–500.

[36] Chungchoo, C., and Saini, D., 2002, “On-Line Tool Wear Estimation in CNC Turning Operations Using Fuzzy Neural Network Model,” Int. J. Mach. Tools Manuf., 42(1), pp. 29–40.

[37] Chen, J. C., and Chen, J. C., 2005, “An Artificial-Neural-Networks-Based In-Process Tool Wear Prediction System in Milling Operations,” Int. J. Adv. Manuf. Technol., 25(5–6), pp. 427–434.

[38] Paul, P. S., and Varadarajan, A., 2012, “A Multi-Sensor Fusion Model Based on Artificial Neural Network to Predict Tool Wear During Hard Turning,” Proc. Inst. Mech. Eng., Part B, 226(5), pp. 853–860.

[39] Karayel, D., 2009, “Prediction and Control of Surface Roughness in CNC Lathe Using Artificial Neural Network,” J. Mater. Process. Technol., 209(7), pp. 3125–3137.

[40] Cho, S., Asfour, S., Onar, A., and Kaundinya, N., 2005, “Tool Breakage Detection Using Support Vector Machine Learning in a Milling Process,” Int. J. Mach. Tools Manuf., 45(3), pp. 241–249.

[41] Benkedjouh, T., Medjaher, K., Zerhouni, N., and Rechak, S., 2015, “Health Assessment and Life Prediction of Cutting Tools Based on Support Vector Regression,” J. Intell. Manuf., 26(2), pp. 213–223.

[42] Shi, D., and Gindy, N. N., 2007, “Tool Wear Predictive Model Based on Least Squares Support Vector Machines,” Mech. Syst. Signal Process., 21(4), pp. 1799–1814.

[43] Jiaa, C. L., and Dornfeld, D. A., 1998, “A Self-Organizing Approach to the Prediction and Detection of Tool Wear,” ISA Trans., 37(4), pp. 239–255.

[44] Elangovan, M., Devasenapati, S. B., Sakthivel, N., and Ramachandran, K., 2011, “Evaluation of Expert System for Condition Monitoring of a Single Point Cutting Tool Using Principle Component Analysis and Decision Tree Algorithm,” Expert Syst. Appl., 38(4), pp. 4450–4459.

[45] Arisoy, Y. M., and Özel, T., 2015, “Machine Learning Based Predictive Modeling of Machining Induced Microhardness and Grain Size in Ti–6Al–4V Alloy,” Mater. Manuf. Process., 30(4), pp. 425–433.

[46] Cortes, C., and Vapnik, V., 1995, “Support-Vector Networks,” Mach. Learn., 20(3), pp. 273–297.

[47] Drucker, H., Burges, C. J., Kaufman, L., Smola, A., and Vapnik, V., 1997, “Support Vector Regression Machines,” Advances in Neural Information Processing Systems, Vol. 9, pp. 155–161.

[48] Liaw, A., and Wiener, M., 2002, “Classification and Regression by Random Forest,” R News, 2(3), pp. 18–22.

[49] Friedman, J., Hastie, T., and Tibshirani, R., 2001, The Elements of Statistical Learning, Springer Series in Statistics, Springer, Berlin.

[50] Li, X., Lim, B., Zhou, J., Huang, S., Phua, S., Shaw, K., and Er, M., 2009, “Fuzzy Neural Network Modelling for Tool Wear Estimation in Dry Milling Operation,” Annual Conference of the Prognostics and Health Management Society (PHM), San Diego, CA, Sept. 27–Oct. 1, pp. 1–11.
