
This document contains a draft version of the following paper:

W. Guo and A. G. Banerjee. Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs. Journal of Manufacturing Systems, 43(2), 225-234, 2017.

Readers are encouraged to get the official copy from the publisher or by contacting Dr. Ashis G. Banerjee at ashisb@uw.edu.

Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs

Wei Guo^a, Ashis G. Banerjee^{a,b,*}

^a Department of Industrial & Systems Engineering, University of Washington, Seattle, WA 98195, USA
^b Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA

Abstract

Topological data analysis (TDA) has emerged as one of the most promising approaches to extract insights from high-dimensional data of varying types such as images, point clouds, and meshes, in an unsupervised manner. To the best of our knowledge, here, we provide the first successful application of TDA in the manufacturing systems domain. We apply a widely used TDA method, known as the Mapper algorithm, on two benchmark data sets for chemical process yield prediction and semiconductor wafer fault detection, respectively. The algorithm yields topological networks that capture the intrinsic clusters and connections among the clusters present in the data sets, which are difficult to detect using traditional methods. We select key process variables or features that impact the system outcomes by analyzing the network shapes. We then use predictive models to evaluate the impact of the selected features. Results show that the models achieve at least the same level of high prediction accuracy as with all the process variables, thereby providing a way to carry out process monitoring and control in a more cost-effective manner.

Keywords: Topological data analysis, Feature selection, Yield prediction, Fault detection

* Corresponding author
Email addresses: weig@uw.edu (Wei Guo), ashisb@uw.edu (Ashis G. Banerjee)

Preprint submitted to Journal of Manufacturing Systems, March 3, 2017

1. Introduction

Sensors play an essential role in carrying out product feasibility assessment, yield enhancement, and quality control in modern manufacturing systems such as vehicle assembly, microprocessor fabrication, and pharmaceuticals development [1]. A large number of sensors of many different types are typically employed in such systems to measure a variety of process variables ranging from operating conditions and equipment states to material compositions and processing defects over extended time periods. Thus, the volume of acquired data is so vast and heterogeneous that the contribution of individual sensor measurements in predicting the overall system outputs gets obscured. This prediction is made more challenging by the fact that the measurements are often noisy and replete with missing or outlier values. Furthermore, there is significant redundancy among the sensor measurements, leading to the presence of numerous false correlations in the recorded data. It is, therefore, necessary to perform an analysis using statistical methods that are specifically suited to identifying and filtering out existing correlations in erroneous, heterogeneous, and high-dimensional data sets.

Historically, multivariate statistical process control (MSPC) methods, such as principal component analysis (PCA) and partial least-squares (PLS), have served as the dominant mode of addressing this problem [2]. The common idea behind these methods is to define a new set of variables (known as latent variables) through linear combinations of the original variables that describe the sensor measurements. The set of latent variables may be reduced in some cases by applying subsequent dimensionality reduction techniques. However, these methods do not work particularly well when there are a large number of input process variables that share highly non-linear relationships with the system outputs, relationships that cannot be modeled using Gaussian distributions. The methods also encounter difficulties in removing the false correlations among the measurements, particularly when the measurements are erroneous. More recently, several non-linear prediction methods have been developed based on response surface fitting as well as kernelized and robust variants of the MSPC techniques [3, 4]. While these methods may achieve high prediction accuracy, they do not provide any direct way of quantifying the contribution or impact of the individual process variables.

Here, we present an alternative method that leverages the emerging topic area of topological data analysis (TDA) [5] to select the important variables that are subsequently used in both linear and non-linear prediction models. More specifically, we employ a well-established TDA method known as the Mapper algorithm, developed by Singh et al. [6]. It is based on the core idea of understanding the unknown topology of the high-dimensional manifold in which the data resides in order to extract hidden patterns. In particular, it clusters all the level sets of the data (defined using a projection of the high-dimensional data to a lower-dimensional space) to generate a topological network that represents the inherent clusters and connections among the clusters in the actual data.

The Mapper algorithm has already enjoyed immense popularity in fields such as bioinformatics and machine vision. For example, it has been used to reveal unique and subtle aspects of the folding patterns of RNA [7] and to unlock previously unidentified relationships in immune cell reactivity between patients with type-1 and type-2 diabetes [8]. Another influential example occurs in personalized breast cancer diagnosis, in which a novel subgroup of tumors with a unique mutational profile and 100% survival rate has been discovered [9]. Additionally, its deformation invariant property has been used to detect 3D objects with intrinsically different shapes from point cloud data [6].

Despite the potential of TDA in general and the Mapper algorithm in particular, there has been no prior application in the manufacturing domain to the best of our knowledge. Inspired by the success in biomedical and vision problems, we employ the Mapper algorithm and show that it facilitates the analysis of the impact of each process variable on system outputs through direct visualization. It also determines whether particular subgroups of the data are selectively responsive to different process variables, which helps to monitor and diagnose processes effectively.

We first apply the Mapper algorithm on a benchmark chemical processing data set to predict product yield [10]. Specifically, the shape of the generated topological network is used to select key features that explain the observed differences in the process measurements in a statistically significant manner. Second, we investigate the role of individual process variables in causing wafer failures in another publicly available semiconductor manufacturing data set. Although it has been recognized that k-nearest neighbor methods can identify faulty wafers effectively [11, 12, 13, 14], the actual process variables that result in the wafer anomalies have never been identified. To this end, we demonstrate how the Mapper algorithm rapidly traces the causality hidden in this high-dimensional data set.

The rest of the paper is organized as follows. Section 2 gives an overview of the general characteristics of manufacturing data and the types of predictor (feature) and response variables that are of interest to us. In Section 3, we review the Mapper algorithm and its application in feature selection. We demonstrate the applicability of the Mapper algorithm for feature selection on two benchmark manufacturing data sets in Section 4. The effectiveness of the selected features is further assessed through predictive models. We conclude the paper with remarks and future research topics.

2. Problem formulation

In real-world manufacturing systems, data is collected using a large number of sensors that are affixed to or embedded within different machines and equipment, resulting in a high-dimensional body of heterogeneous data. The data is usually in the form of time series measurements of different process variables such as temperature, pressure, density, humidity, voltage, chemical or material composition (including the relative proportions of various constituents of alloys or mixtures), material removal or deposition rate, number and severity of processed part flaws and defects, and so on. The sensors, thus, come in myriad forms ranging from thermocouples, pressure gauges, hydrometers, hygrometers, and voltmeters to optical cameras, spectrometers, laser scanners, and ultrasonic transducers.

Consequently, manufacturing sensor data is prone to noise, missing values, and outliers. These measurement errors depend on the sensitivity of the sensors to the operating conditions based on their underlying physical principles of action. For example, it is not at all uncommon for temporary sensor hardware malfunctions to result in missing values. A further problem is that of co-linearity, which is usually caused by partial redundancy in the sensor arrangement, such as the placement of multiple sensors in close proximity to one another. The net result of these complications is that manufacturing systems are often "data-rich but information-poor".

There is, therefore, a strong need to effectively select a minimal number of process variables that primarily affect the output variables of interest, such as the product quality and yield, of a manufacturing system comprising several processes of varying types. As discussed earlier in Section 1, this form of selection facilitates process monitoring and diagnostics through targeted sensor data acquisition, storage, and processing. Even if it is cheap or convenient to manage data from all the sensors, knowing which measurements of what variables matter the most makes it feasible to rapidly regulate out-of-control processes or adapt them to manufacture high quality products at desired rates.

To formulate the problem mathematically, we suppose there are $m$ process variables (features) and $N$ sensor measurements recorded at different time instants. Each measurement is, thus, represented by an $m$-dimensional vector $\mathbf{x}_i \in \mathbb{R}^m$, $i = 1, 2, \ldots, N$. The data is then assembled into a matrix $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^T \in \mathbb{R}^{N \times m}$. Each column denotes a process variable, which is measured by one sensor operating alone or by the concurrent operation of several sensors that function in unison. The latter case is known as data fusion [15], which provides a wide range of sensed parameters and is, hence, more reliable for data analysis.

In a batch process with batch length $L$, a 3-D data array $\bar{X} \in \mathbb{R}^{N \times m \times L}$ is often unfolded batch-wise into a 2-D matrix $X \in \mathbb{R}^{N \times mL}$. In this case, each measurement $\mathbf{x}_i \in \mathbb{R}^{mL}$ is a batch, and each process variable is measured $L$ times throughout the batch, hence corresponding to $L$ columns. For each row, the measurement is either spatially sampled or temporally sampled. For instance, in the semiconductor manufacturing environment, electronic wafer map data collected from in-line measurements are sampled spatially across the surface of the wafer for defect inspection [16]. Usually, there will also be one or more response variables to reflect the output quality or quantity. We write the output with $r$ response variables into a matrix $Y = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N]^T \in \mathbb{R}^{N \times r}$, where each response variable is represented by one column. Response variables are commonly continuous variables denoting production yields or binary variables indicating pass or fail outcomes.
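As a concrete illustration of the unfolding step, the following minimal sketch (with placeholder dimensions; the variable-major column ordering is our assumption) reshapes a 3-D batch array into the 2-D matrix form described above:

```python
import numpy as np

# Placeholder dimensions: N batches, m process variables, L records per batch.
N, m, L = 128, 17, 85
X_bar = np.random.rand(N, m, L)  # stand-in for the 3-D array in R^{N x m x L}

# Batch-wise unfolding: each batch becomes one row with m*L columns, so every
# process variable contributes L consecutive columns (variable-major order).
X = X_bar.reshape(N, m * L)      # 2-D matrix in R^{N x mL}
```

Under this ordering, column $j$ corresponds to process variable $j \,//\, L$ at batch record $j \bmod L$, which is the (variable, record) pairing used for the semiconductor study in Section 4.2.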

3. Technical approach

We now present the framework of the Mapper algorithm and outline the typical pipeline of feature selection using the Mapper algorithm. For more details about the Mapper algorithm and concrete examples of real applications, we refer the reader to [6, 17].

3.1. Mapper algorithm

The Mapper algorithm can be considered as a partial clustering algorithm inspired by classical discrete Morse theory [6]. In topology, discrete Morse theory enables one to characterize the topology of high-dimensional data via functional level sets [18]. More specifically, given a topological space $X$, when $h: X \to \mathbb{R}$ is a smooth real-valued function (Morse function), topological information of $X$ is inferred from the level sets $h^{-1}(c)$ for some real $c$.

The Mapper algorithm extends this inference to incorporate standard clustering methods for the analysis of high-dimensional data sets. Given a data matrix $X$, the setup of the Mapper algorithm includes:

1. Set the resolution parameters: a number of intervals $l$ and an overlap percentage $p$, where $p \in (0, 100)$.
2. Compute the pairwise distance matrix $D = [d(\mathbf{x}_i, \mathbf{x}_j)] \in \mathbb{R}^{N \times N}$ based on the chosen distance metric.
3. Select a filter function $f: X \to \mathbb{R}^n$ to stratify the data.

The most crucial step in the Mapper algorithm is the selection of the filter function to "guide" a clustering algorithm on the high-dimensional data. A few common filter functions include the Gaussian kernel density estimator, the eccentricity filter, the principal metric SVD filter, and eigenvectors of graph Laplacians. Moreover, we can take the projection found by dimensionality reduction/manifold learning techniques that map the high-dimensional data to a low-dimensional space as the filter function. For example, in the chemical manufacturing process study, our choice of the filter function is the 2-D projection found by the multidimensional scaling (MDS) method. MDS in this case attempts to embed the data such that the pairwise distances in the high-dimensional space are preserved in the 2-D Euclidean space. Accordingly, the 2-D embedding coordinates, denoted by $\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \ldots, \hat{\mathbf{x}}_N$, are the minimizers of a loss function, $\sigma$, defined as

$$\sigma(\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \ldots, \hat{\mathbf{x}}_N) = \sum_{j=2}^{N} \sum_{i=1}^{j-1} \left( \|\hat{\mathbf{x}}_i - \hat{\mathbf{x}}_j\|_2 - d(\mathbf{x}_i, \mathbf{x}_j) \right)^2. \quad (1)$$

Therefore, the filter function is specified as

$$f: X \to f_1 \times f_2, \quad (2)$$

where $f_1$ and $f_2$ are the coordinates of $\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \ldots, \hat{\mathbf{x}}_N$ along the first and second dimensions, respectively.

For the study of fault detection in the semiconductor manufacturing processes, we employ the 2-D projection found by the t-distributed stochastic neighbor embedding (t-SNE) algorithm as the filter function [19]. t-SNE aims to preserve the joint probabilities $p_{ij}$ that measure similarities between $\mathbf{x}_i$ and $\mathbf{x}_j$, $i, j = 1, 2, \ldots, N$, as much as possible in the 2-D space. Specifically, $p_{ij}$ is defined as

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}, \quad (3)$$

where the conditional probability $p_{j|i}$, which represents the similarity of $\mathbf{x}_j$ to $\mathbf{x}_i$, is given by

$$p_{j|i} = \frac{\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|\mathbf{x}_i - \mathbf{x}_k\|^2 / 2\sigma_i^2)}. \quad (4)$$

Herein, the variance $\sigma_i$ of the Gaussian centered at $\mathbf{x}_i$ is determined by a predefined perplexity. On the other hand, the joint probability $q_{ij}$ that reflects the similarity between the 2-D embedding coordinates $\tilde{\mathbf{x}}_i$ and $\tilde{\mathbf{x}}_j$ is defined based on a heavy-tailed Student's t-distribution with one degree of freedom:

$$q_{ij} = \frac{(1 + \|\tilde{\mathbf{x}}_i - \tilde{\mathbf{x}}_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|\tilde{\mathbf{x}}_k - \tilde{\mathbf{x}}_l\|^2)^{-1}}, \quad (5)$$

such that dissimilar measurements in the $m$-D space are mapped far apart in the 2-D space. The embedding coordinates $\tilde{\mathbf{x}}_i$, $i = 1, 2, \ldots, N$, are then determined by minimizing the Kullback-Leibler divergence between the joint probability distribution $P$ in the $m$-D space and the joint probability distribution $Q$ in the 2-D space,

$$D_{KL}(P \| Q) = \sum_{j=2}^{N} \sum_{i=1}^{j-1} p_{ij} \log \frac{p_{ij}}{q_{ij}}. \quad (6)$$

Likewise, the filter function in this case is given by

$$f: X \to g_1 \times g_2, \quad (7)$$

where $g_1$ and $g_2$ are the coordinates of $\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \ldots, \tilde{\mathbf{x}}_N$ along the first and second dimensions, respectively. In addition, it should be noted that when $N$ is too large, numerical optimization techniques are used.
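As a minimal sketch (assuming a perplexity of 30, an illustrative choice), the corresponding lens can be computed with scikit-learn, which forms the joint probabilities of Eqs. (3)-(4) internally and minimizes the Kullback-Leibler divergence of Eq. (6) by gradient descent:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(127, 1445)   # placeholder for the unfolded semiconductor data

# The two embedding coordinates serve as the filter values g1 and g2 of Eq. (7).
tsne = TSNE(n_components=2, perplexity=30.0, random_state=0)
lens = tsne.fit_transform(X)
```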

The algorithm is summarized as a flow chart in Fig. 1. After setup, the first step is to divide the filter range and cover it with overlapping intervals so that the clustering algorithm in the ensuing step focuses on the local information of the data that is likely to be ignored by clustering over the entire data set. The second step is to cluster the data in the original high-dimensional space for every level set (subset). The Mapper algorithm is not tied to any particular clustering algorithm. However, it is always required to estimate certain parameters (thresholds) in order to determine the number of clusters in every level set. The last step of the algorithm is to link any two clusters from neighboring level sets together if they have one or more common data points.

[Figure 1 shows a flow chart: Setup (Input Data; Choose Distance Metric; Choose Filter Function; Choose Resolution Parameters) → Cover Filter Range with Subsets → Cluster Subsets of Original Data → Construct Vertices and Edges → Output Simplicial Complex.]

Figure 1: Framework of the Mapper algorithm for generating topological networks.

In the 1-D Mapper case, the output is a 1-D simplicial complex that comprises only vertices (0-simplices) and edges (1-simplices). More generally, if the target space is $\mathbb{R}^n$, higher simplices may appear in the output simplicial complex, such as triangular faces (2-simplices) whenever three clusters from neighboring level sets have nonempty intersections. The compressed representation of the simplicial complex allows us to obtain a qualitative understanding of how the data are organized on a large scale through direct visualization. Additionally, the resolution of the complex changes from coarse to fine as the number of intervals $l$ increases. This change of resolution reflects the change in topology of the data set.

It is worth mentioning that the filter range is not necessarily covered by $l$ overlapping intervals of equal length. In fact, the Mapper algorithm is highly parallelizable. To improve the efficiency of parallel computation, it is more convenient to decompose the filter range into $l$ overlapping intervals wherein each interval contains the same number of points, so that the running times are similar for all the level sets.
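To make the pipeline concrete, the following self-contained toy sketch implements a minimal 1-D Mapper along the lines just described (an illustration under simplifying assumptions, not our production code): it covers the filter range with overlapping equal-length intervals, clusters each level set with DBSCAN in the original space, and links clusters that share measurements.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def mapper_1d(X, lens, n_intervals=10, overlap=0.5, eps=0.5, min_samples=3):
    """Minimal 1-D Mapper: returns clusters (nodes) and their linking edges."""
    lo, hi = lens.min(), lens.max()
    # Interval length chosen so that n_intervals intervals with the given
    # fractional overlap exactly cover [lo, hi].
    length = (hi - lo) / (n_intervals * (1 - overlap) + overlap)
    step = length * (1 - overlap)
    nodes = []
    for i in range(n_intervals):
        a = lo + i * step
        members = np.where((lens >= a) & (lens <= a + length))[0]
        if len(members) == 0:
            continue
        # Cluster the level set in the original high-dimensional space;
        # DBSCAN label -1 marks noise points, which are discarded.
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[members])
        nodes += [set(members[labels == c]) for c in set(labels) - {-1}]
    # Link any two clusters that share one or more data points (for a 1-D
    # cover, only clusters from overlapping neighboring intervals can intersect).
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

X = np.random.rand(300, 5)                                  # toy data
nodes, edges = mapper_1d(X, X[:, 0], n_intervals=14, overlap=0.8, eps=0.6)
```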

3.2. Application of the Mapper algorithm to feature selection

The output graph of the Mapper algorithm contains the information of clusters in the data at the local level, as well as their positions relative to one another and to the remainder of the data set. Therefore, the principle of applying the Mapper algorithm to feature selection is to recognize shapes in the resulting graph that encode the essential structural information of the data. Typical shapes of interest found in a graph are subgroups of clusters that display distinct patterns such as "loops" (continuous circular segments) and "flares" (long linear segments), as opposed to portions of the graph within which the local environment of each cluster is roughly identical.

Aside from shapes of interest, we also discern the trends in the output values associated with each cluster in the graph rendered by the Mapper algorithm, such as which clusters contain several measurements from faulty samples in the case of anomaly detection. Furthermore, we are able to distinguish the fundamental subgroups from artifacts by observing whether the shapes of the given subgroups remain consistent when the resolution parameters are varied over a wide range of values. After the fundamental subgroups of interest are detected, standard statistical tests, such as the Kolmogorov-Smirnov test and Student's t-test, are performed to identify the features that best distinguish the subgroups from one another. The final set of features thus selected is then fed into classification or regression models to perform a desired prediction task.
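As an illustrative sketch of this testing step (with hypothetical subgroup index sets; the 0.9 and 0.05 thresholds follow the ones used later in Section 4.1), the two-sample KS test and the p-value adjustment discussed in Section 4.1.2 are available in SciPy and statsmodels:

```python
import numpy as np
from scipy.stats import ks_2samp
from statsmodels.stats.multitest import multipletests

X = np.random.rand(176, 56)        # placeholder data matrix
group_a = [0, 4, 9, 12, 25]        # hypothetical subgroup index sets
group_c = [3, 7, 15, 21, 40]       # taken from the Mapper network

scores, pvals = [], []
for j in range(X.shape[1]):        # one two-sample test per feature (column)
    res = ks_2samp(X[group_a, j], X[group_c, j])
    scores.append(res.statistic)
    pvals.append(res.pvalue)

# Benjamini-Hochberg adjustment across all features (see Section 4.1.2).
_, pvals_adj, _, _ = multipletests(pvals, method="fdr_bh")
selected = [j for j in range(X.shape[1])
            if scores[j] > 0.9 and pvals_adj[j] < 0.05]
```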

Thus, we end up addressing two main challenges in applying the Mapper algorithm to identify key features from manufacturing data. The first one pertains to a suitable selection of the filter function so as to map the high-dimensional data to a low-dimensional space where the data can be conveniently stratified. Unlike in the case of point clouds, meshes, or images, there is no well-established function, and we select the MDS projection method based on final output prediction quality. The second challenge is on varying the resolution parameters appropriately so that the fundamental subgroups are correctly distinguished from artifacts in the generated topological networks. Choice of a coarse granularity of variation leads to the appearance and disappearance of subgroups, whereas the use of very fine granularity makes the process time-consuming. We vary the parameters in a simple way such that a majority of the subgroups, which are identified at a particular resolution, remain intact as the parameters change (the other subgroups appear and disappear, enabling us to characterize them as artifacts).

4. Results

In this section, we conduct two studies to show how to achieve feature selection using the Mapper algorithm. With the selected features, the first study obtains accurate predictions of productivity for a chemical processing benchmark, and the second study reaches a high accuracy in fault classification for a semiconductor etch process.

4.1. Prediction of manufacturing productivity

The data is for a chemical process plant that is described in [20] and can be obtained from the R package "AppliedPredictiveModeling". The data set contains 176 measurements of biological materials for which 57 variables are measured; there are 12 biological starting materials and 45 manufacturing process parameters (predictors). The starting material is generated from a biological unit and has a wide range of quality and characteristics. The manufacturing process parameters include temperature, drying time, washing time, and concentrations of by-products at various steps. The biological variables are used to gauge the quality of the raw material before processing but cannot be changed, whereas the manufacturing process parameters can be changed during operations. The measurements are not independent since some of them are produced from the same batch of biological starting materials. We aim to investigate the relationships between the predictors and the final pharmaceutical product yield, and to develop a model to estimate the percentage yield of the manufacturing process.

4.1.1. Data preprocessing

As we want to maximize the level of automation in predicting manufacturing productivity for industrial applications, the data is preprocessed with a minimum amount of work. First, the outliers in the data set are marked as missing values and the features with near-zero variances are discarded. During this step, BiologicalMaterial07 is removed. Second, we apply the Box-Cox transformation to the data to eliminate distributional skewness, and scale each column of the data to zero mean and unit variance. The last step is to impute the missing values by the k-NN method with k = 5. Note that all of these steps can be handled automatically in the production environment.
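A sketch of such a pipeline with scikit-learn is given below; note that, unlike the ordering in the text, imputation is performed before the Box-Cox step here because scikit-learn's PowerTransformer does not accept missing values, and the variance threshold and synthetic data are assumptions for illustration:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import PowerTransformer

X = np.random.rand(176, 57) + 0.1            # placeholder; Box-Cox needs positive data
X[np.random.rand(*X.shape) < 0.01] = np.nan  # pretend 1% of entries are marked outliers

# Discard features with near-zero variance (BiologicalMaterial07 falls out here).
X = X[:, np.nanvar(X, axis=0) > 1e-8]
# Impute missing values by k-NN with k = 5, then remove skewness and
# standardize each column to zero mean and unit variance.
X = KNNImputer(n_neighbors=5).fit_transform(X)
X = PowerTransformer(method="box-cox", standardize=True).fit_transform(X)
```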

4.1.2. Feature selection

To begin with, we choose the Euclidean distance as the metric to represent the similarity between the measurements. In this work, much effort is spent on the suitable selection of the filter function due to the complex underlying structure of the data. Commonly considered filter functions include the eccentricity function and linear and nonlinear projections such as PCA and Isomap. Given the quantity of interest and the purpose of the filter function, we use the response variable to "supervise" the stratification of the data. The output of the MDS method that reduces the data set to 2 dimensions is shown to provide the smoothest variations of the response values over the embedding coordinates, and is eventually chosen as the filter function.

In the next stage, each dimension is covered by 14 intervals of equal length with 80% overlap between any two successive intervals, leading to the filter range being divided into 196 level sets in all. The density-based spatial clustering of applications with noise (DBSCAN) method is subsequently employed for clustering in each level set, where the number of clusters is determined by the minimum number of measurements in a cluster and the maximum distance between two measurements in the same cluster [21].
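This construction (an MDS lens, 14 intervals per dimension with 80% overlap, and DBSCAN within each level set) might be sketched with the kepler-mapper package roughly as follows; the DBSCAN parameter values are illustrative assumptions, and the Cover interface varies somewhat across kepler-mapper versions:

```python
import numpy as np
import kmapper as km
from sklearn.cluster import DBSCAN
from sklearn.manifold import MDS

X = np.random.rand(176, 56)                   # placeholder for the preprocessed data
lens = MDS(n_components=2).fit_transform(X)   # 2-D MDS filter values

mapper = km.KeplerMapper()
graph = mapper.map(lens, X,
                   cover=km.Cover(n_cubes=14, perc_overlap=0.8),  # 196 level sets
                   clusterer=DBSCAN(eps=3.0, min_samples=3))      # illustrative values
mapper.visualize(graph, path_html="chemical_mapper.html")
```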

We implement the steps above in Python¹ and obtain a topological network in the form of a simplicial complex, as shown in Fig. 2. Each cluster is represented by a node, and the node size is proportional to the number of measurements in the cluster on a logarithmic scale. An edge is generated between any two nodes from neighboring level sets that have at least one measurement in common. We normalize the value of the product yield within the range 0-1 and color each node based on the average normalized yield value for the measurements in the node. As seen in Fig. 2, the shape of the data is captured by the generated topological network after iterating multiple times at various resolution scales. The resolution is set at a large number of intervals and a high overlap percentage. A large number of intervals helps to uncover subtle aspects of the shape of the data rather than a blob, and a high overlap percentage seeks to have all the nodes connected as a single network if possible. Thus, we are able to find the subgroups of interest and acquire an overall view of how the data is distributed within the network. In this problem, we are interested in the difference in patterns between the measurements with high and low yields. Notice that the high yields are separated into two subgroups, and the low yields are also bifurcated into two subgroups with different patterns. Therefore, two subgroups of measurements with high yield and two subgroups of measurements with low yield are extracted from the data, as encircled in Fig. 2.

¹ Code adapted from https://github.com/MLWave/kepler-mapper

11

[Figure 2: topological network with encircled subgroups A, B, C, and D and a 0-1 color bar.]

Figure 2: Topological network derived from the chemical processing data at a specified resolution. Each node is colored based on the average normalized yield value for the measurements in the node, where the normalized yield varies between 0 and 1. High and low yield subgroups are isolated from the rest of the network, where A and C are extracted as outer flares and B and D are extracted from the periphery of the network as suggested in [17].

The two-sample Kolmogorov-Smirnov (KS) test, which is sensitive to differences in both the location and the shape of the empirical cumulative distributions of two groups, is then performed between subgroups A and C, A and D, B and C, and B and D over all the columns in the data matrix to identify the features that best discriminate between them. We record the largest KS-score and the associated p-value, as well as the adjusted p-value, among the four tests for each feature. The results are presented in Table 1 and further visualized in Fig. 3. The p-values are adjusted using the well-established Benjamini-Hochberg (B-H) procedure [22, 23] that is commonly used to reduce the false discovery rate (FDR) when multiple features or variables are evaluated for statistical significance. The B-H adjustment provides greater flexibility at the expense of a somewhat higher FDR as compared to the traditional Bonferroni correction method. This adjustment is, thus, better suited for our purpose, as we want to identify all the process variables that may have an impact on the manufacturing system outputs. The most salient features are selected based on high KS-scores (>0.9) and low p-values (<0.05); 11 of them are measurements of manufacturing processes that can be controlled. Thus, the product yield should be improved by altering these steps in the process to have higher or lower values.

Table 1: Kolmogorov-Smirnov test to identify features that best differentiate between the subgroups.

Feature  KS-score  p-value  Adj. p-value  |  Feature  KS-score  p-value  Adj. p-value
B01      0.882     5.53e-7  2.21e-6       |  M18      0.882     1.93e-7  7.20e-7
B02*     1         7.57e-8  1.06e-6       |  M19*     1         1.95e-9  2.18e-8
B03*     1         7.57e-8  1.06e-6       |  M20      0.778     1.12e-4  3.49e-4
B04*     0.917     1.16e-6  9.28e-6       |  M21      0.598     0.002    0.004
B05      0.739     2.36e-5  6.01e-5       |  M22      0.203     0.821    0.901
B06*     1         7.57e-8  1.06e-6       |  M23      0.369     0.142    0.204
B08*     1         7.55e-9  8.46e-8       |  M24      0.539     0.007    0.012
B09      0.417     0.054    0.082         |  M25      0.787     5.22e-6  1.39e-5
B10      0.728     3.32e-5  8.09e-5       |  M26*     0.941     2.08e-8  1.17e-7
B11      0.886     4.95e-7  2.22e-6       |  M27      0.717     4.64e-5  1.04e-4
B12*     0.952     1.34e-8  9.39e-8       |  M28*     1         1.95e-9  2.18e-8
M01      0.533     0.008    0.013         |  M29*     1         1.95e-9  2.18e-8
M02*     1         7.55e-9  8.46e-8       |  M30      0.768     2.15e-5  6.35e-5
M03      0.650     0.001    1.23e-3       |  M31*     0.944     6.14e-8  3.82e-7
M04*     1         1.88e-7  1.75e-6       |  M32*     0.941     2.08e-8  1.17e-7
M05      0.647     5.95e-4  1.28e-3       |  M33      0.894     1.28e-7  5.50e-7
M06      0.722     4.32e-4  1.15e-3       |  M34      0.238     0.718    0.855
M07      0.261     0.521    0.635         |  M35      0.501     0.011    0.017
M08      0.314     0.259    0.345         |  M36      0.787     5.22e-6  1.39e-5
M09*     0.944     1.08e-6  6.05e-6       |  M37      0.317     0.284    0.379
M10      0.833     2.64e-5  1.06e-4       |  M38      0.381     0.167    0.253
M11      0.886     4.95e-7  2.22e-6       |  M39      0.294     0.371    0.472
M12      0.667     0.001    0.003         |  M40      0.278     0.560    0.713
M13*     1         1.88e-7  1.75e-6       |  M41      0.262     0.601    0.783
M14      0.692     9.71e-5  2.01e-4       |  M42      0.488     0.034    0.064
M15*     0.905     8.40e-8  4.28e-7       |  M43      0.846     7.12e-7  2.21e-6
M16      0.690     0.001    0.002         |  M44      0.291     0.342    0.426
M17      0.833     2.64e-5  1.06e-4       |  M45      0.222     0.819    0.936

a B: BiologicalMaterial; M: ManufacturingProcess.
b Key features characterized by a high KS-score (>0.9) and a low adjusted p-value (<0.05) are marked with an asterisk.

[Figure 3: plots of KS-score and of p-value (with and without B-H adjustment, log scale) for the process variables B02, B03, B04, B06, B08, B12, M02, M04, M09, M13, M15, M19, M26, M28, M29, M31, and M32, with threshold lines at KS-score = 0.9 and p = 0.05.]

Figure 3: Key features (marked by x-axis tick labels) that best differentiate between the subgroups are identified by Kolmogorov-Smirnov tests as those which yield a high KS-score (>0.9) and a low corresponding adjusted p-value (<0.05).

We also note that the selection of the most salient features is not affected by the B-H procedure in this case.

Fig. 4 examines the effects of the features on the product yield and probes the relationships between them. We color the same network nodes based on normalized feature values. The color of each node encodes the normalized feature value averaged across all the measurements in the node, with blue denoting a low value and red indicating a high value. We see that significant differences between the subgroups exist for both BiologicalMaterial06 and ManufacturingProcess13, both of which are selected in Table 1. In contrast to Fig. 4(a)(b), the unselected feature ManufacturingProcess22 shows no significant difference between any of the subgroups in Fig. 4(c). Meanwhile, on comparing with Fig. 2, BiologicalMaterial06 shows a positive relationship with the yield, whereas ManufacturingProcess13 displays a negative relationship with the yield.
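In kepler-mapper, this kind of feature-based coloring can be produced by re-rendering the same graph with per-measurement color values (continuing the sketch from Section 4.1.2; the color-related argument names differ across package versions, so this is an assumption rather than a fixed API):

```python
# Continues the kepler-mapper sketch in Section 4.1.2 (`mapper`, `graph`, `X`).
feature = X[:, 5]                                         # hypothetical feature column
norm = (feature - feature.min()) / (feature.max() - feature.min())
mapper.visualize(graph, path_html="feature_colored.html",
                 color_values=norm,                       # recent releases; older ones
                 color_function_name="normalized value")  # use `color_function` instead
```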

[Figure 4 panels: (a) BiologicalMaterial06, showing significant differences between subgroups A/B and C/D and a positive relationship with yield; (b) ManufacturingProcess13, showing a significant difference between subgroups B and D and a negative relationship with yield; (c) ManufacturingProcess22, showing no significant difference between subgroups.]

Figure 4: Topological networks colored based on different selections of features at the same resolution as in Fig. 2. For every network, each node is colored based on the average normalized feature value of all the measurements included in the node, where the normalized feature value varies between 0 and 1.

4.1.3. Predictive modeling

Four regression models, PLS, random forest (RF), Cubist, and Gaussian process with a Gaussian kernel (kGP), are chosen to predict the yield of

the chemical manufacturing process. These models represent a linear model, a tree-based model, a rule-based model, and a kernelized technique, respectively. We randomly split the entire data set into a training set and a testing set in a 7:3 ratio. The parameters of each model are tuned to be optimal using 25 iterations of a 10-fold cross-validated search over the parameter set. The trained models are then used to predict the percentage yield for the testing set.
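A sketch of this protocol for one of the models (random forest, via scikit-learn; the hyperparameter grid and synthetic data are placeholders, not the settings used to produce Table 2) is:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X = np.random.rand(176, 11)   # placeholder: the selected features
y = np.random.rand(176)       # placeholder: percentage yield

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# 25 candidate settings, each scored by 10-fold cross-validation on the training set.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [100, 200, 300, 400, 500],
     "max_features": [2, 3, 5, 7, 11]},
    n_iter=25, cv=10, scoring="neg_root_mean_squared_error", random_state=0)
search.fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, search.predict(X_te)))  # testing RMSE
```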

Table 2: Estimation errors and computation times for different models with all features and with selected features.

                            Errors (RMSE)               Computation Times (s)
Method                      Training      Testing       Training      Testing
All features
  PLS                       1.20±0.05     1.29±0.10     1.33±0.30     0.005±0.001
  RF                        1.13±0.06     1.15±0.15     130±2.40      0.006±0.001
  Cubist                    1.00±0.07     1.15±0.13     58.5±4.11     0.025±0.004
  kGP                       1.21±0.04     1.25±0.11     8.14±0.43     0.002±9.4e-4
Selected features
  PLS                       1.13±0.05     1.25±0.09     1.02±0.16     0.002±3.8e-4
  RF                        1.11±0.06     1.13±0.15     91.2±2.53     0.005±8.9e-4
  Cubist                    1.05±0.10     1.20±0.08     24.7±1.54     0.008±0.002
  kGP                       1.19±0.05     1.22±0.11     6.24±0.33     0.001±6.3e-4

Table 2 compares the prediction results and computation times between all the features and just the selected features for the models, based on 30 runs. The prediction accuracy is evaluated by the root mean squared error (RMSE), and computation times are measured on a laptop with an Intel Core i5 (1.7 GHz) CPU and 4 GB RAM. We find that the models with selected features achieve performance comparable to the models with all the features. In particular, for the PLS, RF, and kGP models, the selected features outperform all the features in terms of both training and testing errors, which highlights the efficacy of the features selected based on the Mapper algorithm. Meanwhile, the training times are reduced by about 30%-60% for the RF and Cubist models using the selected features.

Table 3 compares the top features identified by different methods. Since there is almost no dominant feature due to the complexity of the data, the features identified by the methods vary from one another. For the Mapper algorithm, the features that overlap with the features identified by at least one of the other methods are highlighted. In fact, even though the other four methods have the ability to detect significant features, it is difficult for them to interpret how the yield is affected by these features. In contrast, the Mapper algorithm is well capable of unraveling the relationships between the features and the product yield through easy and rapid visualization, as shown in Fig. 4.

4.2. Fault detection in a semiconductor manufacturing process

In this study, the data set² is collected from an Al stack etch process performed on a commercial-scale Lam 9600 plasma etch tool at Texas Instruments Inc. [24]. The data consists of 108 normal wafers and 21 faulty wafers from three separate experiments (denoted as experiment numbers 29, 31, and 33) with 19 process variables for monitoring. Since two of the process variables, RF reflected power and TCP reflected power, remain almost zero during the entire batch, only 17 process variables are used for fault detection and diagnosis, as tabulated in Table 4. Moreover, one normal wafer and one faulty wafer are removed from the data set due to a large number of missing values. Finally, because the experiments were run several weeks apart from one another, process shift and drift lead to different means and covariance structures in the data gathered in each of the three experiments.

² Available at http://software.eigenvector.com/Data/Etch/index.html.

Table 3: Top 17 important features identified by different methods.

PLS    RF     Cubist   kGP    TDA
M32    M32    M32      M32    B02
M36    B06    M17      B06    B03
M17    M17    M31      M13    B06
M13    M31    B06      M17    B08
M09    B03    M13      M36    M02
M33    M13    M04      B03    M04
M06    M01    M21      M31    M13
B06    B08    B03      M33    M19
M12    B11    M09      M09    M28
B03    M39    M01      B04    M29
B04    B04    M20      M06    B12
B08    M20    M39      M29    M09
B01    B09    B04      M04    M31
B11    M06    M33      B11    M26
M31    M18    M02      M02    M32
M04    M11    M05      B01    B04
M28    M33    B10      M27    M15

a B02, B12, M30, and M40 are excluded from the PLS, RF, Cubist, or kGP model since these features are removed before the models are trained due to their high correlation with other features.
b The important features given by PLS, RF, Cubist, kGP, and the TDA method are ranked based on the weighted sums of the absolute regression coefficients, average impurity reduction, usage in the rule conditions, and KS-score in Table 1, respectively. Features with the same KS-score are ordered by their feature names. For the kGP method, a LOESS smoother is fitted to assess the relationship between each feature and the outcome. The importance of the features is ranked by their R² statistics.
c The ranking of feature importance varies somewhat with the training samples, and the results in Table 3 are reported based on a certain choice of the samples.

Table 4: Process variables for semiconductor wafer fault detection.

No.  Variable                   No.  Variable
1    BCl3 flow                  10   RF power
2    Cl2 flow                   11   RF impedance
3    RF bottom power            12   TCP tuner
4    Endpoint A detector        13   TCP phase error
5    Helium chuck pressure      14   TCP impedance
6    Pressure                   15   TCP top power
7    RF tuner                   16   TCP load
8    RF load                    17   Vat valve
9    RF phase error

The faulty wafers were intentionally induced through the modification of several of the process variables: TCP power, RF power, pressure, BCl3 or Cl2 flow rate, and helium chuck pressure. To simulate an actual sensor failure, readings from the corresponding sensor were intentionally adjusted using a bias term so that the mean value was equal to the original baseline value of the relevant process variable. For example, if the TCP power was modified from its normal baseline value of 350 W to a value of 400 W, the values of TCP power in the data set would be reset to a mean of 350 W by adding a constant bias of -50 W. Table 5 lists the induced faults associated with each faulty wafer in the three experiments. In general, the modification of any one of the process variables may be expected to result in changes to the remainder of them because of correlations which may exist between the process variables. In this work, our goal is to determine the process variables which are most affected by the induced faults and use this information to construct a classification model for fault detection.

4.2.1. Data preprocessing

We follow a data preprocessing procedure similar to that of the previous study. First, we remove the first five records of each batch to eliminate effects due to initial fluctuations. To accommodate shorter batches, we retain 85 records in each batch so that every batch record is of equal length. Next, the 3-D data array is unfolded batch-wise into a 2-D matrix, resulting in a total of 1445 features; that is, each feature is a pairwise combination of a process variable and a batch record. Finally, each column of the 2-D matrix is scaled to zero mean and unit variance.

Table 5: Induced faults and experiments associated with each faulty wafer.

No.  Exp.  Fault name               No.  Exp.  Fault name
1    29    TCP power +50^a          11   31    Cl2 flow +5
2    29    RF power -12             12   31    BCl3 flow -5
3    29    RF power +10             13   31    Pressure +2
4    29    Pressure +3              14   31    TCP power -20
5    29    TCP power +10            15   33    TCP power -15
6    29    BCl3 flow +5             16   33    Cl2 flow -10
7    29    Pressure -2              17   33    RF power -12
8    29    Cl2 flow -5              18   33    BCl3 flow +10
9    29    Helium chuck pressure    19   33    Pressure +1
10   31    TCP power +30            20   33    TCP power +20

a The addition term in each fault name represents an offset of the process variable from its normal baseline value during the batch. For example, "TCP power +50" means that the induced fault is an increase of 50 units in the TCP power.

4.2.2. Feature selection

The etching process reflected in our data set is a typical nonlinear, multimodal process. For this reason, the filter function used to identify a 2-D embedding of the data is taken to be that of t-SNE, a nonlinear dimensionality reduction method which, as previously mentioned, tends to map dissimilar measurements far apart in the low-dimensional space. The distance metric between a given pair of 1445-dimensional measurements is, therefore, taken to be the joint probability between the two, as defined in Eq. (3). The resolution is 24 intervals per dimension with 80% overlap between adjacent intervals, and the DBSCAN method is once again used for clustering within each level set. Fig. 5 shows that the generated topological network of the semiconductor data is separated into three subnetworks. This is consistent with the fact that the data sets collected from the three experiments have different means and somewhat different covariance structures. It is worth noting that faulty wafers 7 and 13 are two exceptions in the sense that each one is grouped with wafers which originated from a different experiment.

[Figure 5: topological network with three subnetworks and a 0-1 color bar.]

Figure 5: Topological network derived from the semiconductor data at a specified resolution. Each node is colored based on the average output for the measurements included in the node, where the output of a faulty wafer is 1 while the output of a normal wafer is 0. Subgroups that consist of nodes containing measurements of faulty wafers are extracted from each subnetwork.

We color each node based on the average output value across all the measurements in the node. The output is either 0 or 1, representing a normal or a faulty wafer, respectively. As expected, measurements representing faulty wafers are positioned at the boundary regions of each subnetwork. We conjecture that this is because each faulty wafer was induced differently, giving rise to different behaviors in the wafer processing. We further identify subgroups consisting of nodes containing measurements of faulty wafers in Fig. 5, as indicated by closed elliptical paths. Since the subgroups for faulty wafers 10, 12, 14, and 20 have extremely small sample sizes, they are excluded from the statistical tests for feature selection. For the rest of the subgroups, Wilcoxon rank-sum tests are performed across all of the process variables throughout the batch. As a non-parametric alternative to the two-sample Student's t-test, the Wilcoxon rank-sum test is able to handle small sample sizes for non-normal distributions. These tests are conducted between each subgroup of faulty wafers and the nodes corresponding to normal wafers in the rest of its subnetwork, excluding those which belong to other subgroups of faulty wafers. The results of these tests are shown in Fig. 6, where they are organized by process variable in subfigure (a) and by batch record in subfigure (b).

[Figure 6 panels: indicators of statistical significance for the faulty-wafer subgroups No. 1/5/9, No. 2, No. 3/6/8/4/13, No. 7/11, No. 15/16, No. 17, No. 18, and No. 19, plotted against (a) process variables 01-17 and (b) batch records 05-85.]

Figure 6: Wilcoxon rank-sum test to identify the features that best differentiate between faulty wafers and normal wafers. The features are ordered by (a) process variables and (b) batch records, respectively. Statistically significant features (p < 0.05) have values of 1, as represented by the blue lines.

By comparing the two orderings of the features, we find that statistically significant features (p < 0.05) are more concentrated within individual process variables than within individual batch records. For example, it is evident that process variable 17 (Vat valve) is strongly correlated with faulty wafers, while process variable 5 (Helium chuck pressure) has little impact on wafer failure. As in Section 4.1, we perform the B-H procedure to adjust the p-values and count the occurrences of each statistically significant feature throughout the batch for every process variable. The results for both raw and adjusted p-values are shown in Fig. 7. It is seen that the relative importance of the process variables remains more or less the same after the B-H adjustment, especially for the first eight process variables. Hence, we select only the first eight process variables for fault classification.
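A sketch of this test-and-count step (with hypothetical subgroup indices and the variable-major column ordering from Section 2; synthetic data stands in for the real measurements) using SciPy and statsmodels:

```python
import numpy as np
from scipy.stats import ranksums
from statsmodels.stats.multitest import multipletests

m, L = 17, 85
X = np.random.rand(127, m * L)             # placeholder unfolded batch data
faulty = [3, 17, 42]                       # hypothetical subgroup of faulty wafers
normal = [i for i in range(127) if i not in faulty]

# One Wilcoxon rank-sum test per (variable, record) feature.
pvals = np.array([ranksums(X[faulty, j], X[normal, j]).pvalue
                  for j in range(m * L)])
_, pvals_adj, _, _ = multipletests(pvals, method="fdr_bh")
# Count significant features for each process variable across the batch.
counts = (pvals_adj.reshape(m, L) < 0.05).sum(axis=1)
```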

[Figure 7: bar chart of counts per process variable, ordered 17, 04, 16, 12, 08, 10, 06, 07, 11, 14, 09, 13, 02, 01, 03, 15, 05, with and without adjustment of p-values.]

Figure 7: Counts of statistically significant features in terms of differentiating between faulty and normal wafers for each process variable.

4.2.3. Predictive modeling

To build a fault detection classifier, we first compute the column means throughout the batch for each variable and use them as the new feature values. The transformed data is then randomly split into a training set and

a testing set in the ratio of 7:3, where each set maintains the same proportion of normal and faulty wafers. The standard soft margin C-support vector machine (SVM) classifier with a Gaussian kernel, as implemented in LIBSVM [25], is employed for fault classification. The cost factor $C$ and the variance $\sigma$ of the Gaussian kernel are tuned using 10-fold cross-validation on the training set with an iterative grid search. We start with a coarse grid search over exponentially growing sequences of $C$ and the kernel parameter $\gamma$, thereafter proceeding with finer grid searches in the vicinity of the optimal region yielded by the previous grid search. Each grid search includes a total of 50 pairs of $(C, \gamma)$ values which are used to apply the training model. To illustrate the performance of the fault classifiers, receiver operating characteristic (ROC) curves for the testing set with all process variables and with the selected variables are reported in Fig. 8. As seen in Fig. 8, the fault classifier with the eight selected process variables outperforms the classifier which uses all the process variables, indicating the effectiveness of the former variables in predicting wafer failure. Meanwhile, about an 18% reduction in computational time is achieved, from ~1.1 s to ~0.9 s per run.
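A rough equivalent of this classifier and its evaluation, using scikit-learn's C-SVM (which wraps LIBSVM) with a single coarse grid pass for brevity, is sketched below; the data, grid ranges, and class proportions are placeholders rather than the settings behind Fig. 8:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X = np.random.rand(127, 8)                 # placeholder: per-batch means of 8 variables
y = np.array([1] * 20 + [0] * 107)         # placeholder labels: 1 = faulty wafer

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
# Coarse 10-fold cross-validated grid over exponentially growing C and gamma;
# finer grids around the best region would follow in further passes.
grid = GridSearchCV(SVC(kernel="rbf", probability=True),
                    {"C": 2.0 ** np.arange(-2, 9, 2),
                     "gamma": 2.0 ** np.arange(-9, 2, 2)},
                    cv=10)
grid.fit(X_tr, y_tr)

scores = grid.predict_proba(X_te)[:, 1]    # decision scores for the ROC curve
fpr, tpr, _ = roc_curve(y_te, scores)
print("AUC:", auc(fpr, tpr))
```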

[Figure 8: ROC curves (true positive rate vs. false positive rate) for the classifiers with all process variables and with the selected process variables, together with the random-guess diagonal.]

Figure 8: ROC curves of Gaussian kernel SVM classifiers on the data with all process variables and with selected process variables.

5. Conclusion

In this paper, we apply a powerful TDA tool, the Mapper algorithm, for predictive analysis of a chemical manufacturing process data set for yield prediction and a semiconductor etch process data set for fault detection. We show that the Mapper algorithm adds a new perspective to the traditional means of feature selection and provides critical insights hidden in the complex data. Through direct visualization, we generate an abstract view of the data to facilitate a better understanding of the causal relationships between the features and the manufacturing system outputs. The contributions of the work are summarized below:

• To the best of our knowledge, we successfully demonstrate the value of a TDA method in the manufacturing systems domain for the first time.

• We effectively detect structural information present in manufacturing systems data, which is highly valuable as it allows identification of subgroups of interest for targeted hypothesis testing with respect to the differences in the observed patterns.

• We show that using just the identified features with the most significant causal relationships provides a similarly high level of prediction accuracy as achieved with the complete set of features, but with substantially reduced training times.

Thus, our results open a feasible path for efficient manufacturing process monitoring and control, especially in complex systems with a large number of process variables. In the future, we plan to embed the Mapper algorithm in a sparse sensing framework to further reduce the need for measurements in an optimal manner. We further aim to combine the Mapper algorithm with existing machine learning techniques to increase the robustness of our approach and yield a practical method which is more suitable to the context of high-dimensional, heterogeneous manufacturing data in general.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments.

References

[1] J. Tlusty, Manufacturing Processes and Equipment, Prentice Hall, 2000.

[2] J. F. MacGregor, T. Kourti, Statistical process control of multivariate processes, Control Engineering Practice 3 (1995) 403-414.

[3] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (1998) 1299-1319.

[4] R. Rosipal, L. J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research 2 (2001) 97-123.

[5] G. Carlsson, Topology and data, Bulletin of the American Mathematical Society 46 (2009) 255-308.

[6] G. Singh, F. Mémoli, G. E. Carlsson, Topological methods for the analysis of high dimensional data sets and 3D object recognition, in: Eurographics Symposium on Point-Based Graphics, pp. 91-100.

[7] Y. Yao, J. Sun, X. Huang, G. R. Bowman, G. Singh, M. Lesnick, L. J. Guibas, V. S. Pande, G. Carlsson, Topological methods for exploring low-density states in biomolecular folding pathways, The Journal of Chemical Physics 130 (2009) 144115.

[8] G. Sarikonda, et al., CD8 T-cell reactivity to islet antigens is unique to type 1 while CD4 T-cell reactivity exists in both type 1 and type 2 diabetes, Journal of Autoimmunity 50 (2014) 77-82.

[9] M. Nicolau, A. J. Levine, G. Carlsson, Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival, Proceedings of the National Academy of Sciences 108 (2011) 7265-7270.

[10] W. Guo, A. G. Banerjee, Toward automated prediction of manufacturing productivity based on feature selection using topological data analysis, in: Proceedings of the IEEE International Symposium on Assembly and Manufacturing, pp. 31-36.

[11] Q. P. He, J. Wang, Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing 20 (2007) 345-354.

[12] Q. P. He, J. Wang, Large-scale semiconductor process fault detection using a fast pattern recognition-based method, IEEE Transactions on Semiconductor Manufacturing 23 (2010) 194-200.

[13] Y. Li, X. Zhang, Diffusion maps based k-nearest-neighbor rule technique for semiconductor manufacturing process fault detection, Chemometrics and Intelligent Laboratory Systems 136 (2014) 47-57.

[14] Z. Zhou, C. Wen, C. Yang, Fault detection using random projections and k-nearest neighbor rule for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing 28 (2015) 70-79.

[15] F. Famili, W.-M. Shen, R. Weber, E. Simoudis, Data pre-processing and intelligent data analysis, International Journal on Intelligent Data Analysis 1 (1997).

[16] C.-T. Su, T. Yang, C.-M. Ke, A neural-network approach for semiconductor wafer post-sawing inspection, IEEE Transactions on Semiconductor Manufacturing 15 (2002) 260-266.

[17] P. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson, Extracting insights from the shape of complex data using topology, Scientific Reports 3 (2013).

[18] J. W. Milnor, Morse Theory, Annals of Mathematics Studies 51, Princeton University Press, 1963.

[19] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605.

[20] M. Kuhn, K. Johnson, Applied Predictive Modeling, Springer, 2013.

[21] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, volume 96, pp. 226-231.

[22] Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society, Series B 57 (1995) 289-300.

[23] D. Yekutieli, Y. Benjamini, Discovering the false discovery rate, Journal of Statistical Planning and Inference 82 (1999) 171-196.

[24] B. M. Wise, et al., A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process, Journal of Chemometrics 13 (1999) 379-396.

[25] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST) 2 (2011) 27.