Combination of statistical process control (SPC) methods and classification strategies for situation assessment of batch process

Magda Ruiz, Joan Colomer, Joaquim Meléndez
Dept. de Electrònica, Informàtica i Automàtica
Universitat de Girona
Campus Montilivi, Edifici PIV
17071 Girona - Spain
{mlruizo,colomer,quimmel}@eia.udg.es
Abstract
The paper focuses on the development of a classification strategy to identify critical situations in batch process control. Data acquired from a batch execution are reduced by means of multiway principal component analysis so that they can be assessed against the statistical model of the process. Multiple situations are then categorized by a classification algorithm applied to the principal components in order to identify the causes of misbehaviour.
Keywords: Multiway Principal Component Analysis (MPCA), situation assessment, batch processes.
1. Introduction
Many strategies for fault detection and diagnosis are reported in the literature. According to [18], fault diagnosis methods can be classified into three general categories: quantitative model-based methods, qualitative model-based methods and process history based methods, as illustrated in figure 1.
Figure 1. Classification of diagnostic algorithms, according to [18]
The solution proposed in this work falls in the third category, and in particular in the subgroup of statistical methods. A biological batch process for the treatment of wastewater has been used to develop and test the supervision method.
Multivariate Statistical Process Control (MSPC) methods have been shown to be effective in detecting and diagnosing events that cause a significant change in the dynamic correlation structure among the process variables [3]. Examples include a polymerization reactor process [12], a pharmaceutical process [7], the industrial-scale production of polypropylene oxide [20] and a wastewater treatment plant [6], among others. These methods use the measured data directly to characterize, in a systematic way, the normal operating behavior of the process. Different applications based on this principle have been proposed in the literature. In [2] a strategy to isolate sensors affected by nonconforming operation is described; it makes it possible to distinguish between failed sensors and process upsets. In [4] MSPC is combined with wavelets to create an adaptive multiscale MPCA that detects abnormal behavior and identifies the major sources of process disturbances.
Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial. No 29 (2006), pp. 99-107. ISSN: 1137-3601. © AEPIA (http://www.aepia.org/revista)
In this work a combination of MSPC and a classification tool is proposed. The combination improves on the results obtained using MSPC alone. Section 2 describes the operation of the SBR process. Section 3 focuses on the MSPC extensions used for process monitoring. Section 4 presents the classification method. Finally, from section 5 onwards, an example is presented and evaluated using data acquired from the real plant.
2. Biological batch process
A wastewater treatment pilot plant has been used in this work. The plant operates as a Sequencing Batch Reactor (SBR), as figure 2 depicts. In an SBR wastewater treatment plant, sludge is responsible for the degradation of organic matter and the removal of nitrogen. The SBR pilot plant is composed of a square metal reactor with a capacity of 200 liters of water to process. Wastewater is taken directly from a real treatment station located in Girona (Spain) and pumped to the reactor, where the treatment is performed.
Figure 2. Real SBR pilot plant
In the SBR pilot plant, nitrogen and organic matter are removed over an 8-hour cycle in which anoxic and aerobic stages alternate. In an aerobic stage ammonia is converted to nitrate, and under anoxic conditions nitrate is converted to nitrogen gas. Four process variables are monitored: pH, oxidation-reduction potential (ORP), dissolved oxygen (DO) and temperature. The process is highly nonlinear, time-varying and subject to significant disturbances such as atmospheric changes and variations in the composition of the influent. The process has been characterized statistically by its covariance matrix in order to study the correlation structure between the variables and their data streams.
3. MSPC for batch processes
MSPC is a dimensionality reduction technique based on classical statistical process control (SPC) theory, extended to operate with multiple variables. It has also been adapted to characterise batch processes by treating the batches (executions of a process according to a recipe) as an additional dimension, assuming that all batches have the same length (the same number of samples). MSPC for batch processes is based on extensions of Principal Component Analysis (PCA) and Partial Least Squares (PLS) [11][6][10][5]. The extensions of principal component analysis are described in the following subsections.
3.1. Multiway principal component analysis (MPCA)
Consider a typical batch run in which j = 1, 2, ..., J variables are measured at k = 1, 2, ..., K time instants throughout the batch. Similar data exist for a number of such batch runs i = 1, 2, ..., I. All the data are collected in the X (I x J x K) array illustrated in figure 3, where the different batches are organized along the vertical side, the measured variables along the horizontal side, and their time evolution occupies the third dimension. Each horizontal slice through this array is a (J x K) data matrix representing the time histories or trajectories of all variables for a single batch (i). Each vertical slice is an (I x J) matrix representing the values of all the variables for all batches at a common time instant (k) [10][19].
Figure 3. Arrangement of a three-way array X
MPCA is equivalent to performing ordinary PCA on a large two-dimensional (2-D) matrix constructed by unfolding the three-way array. Six ways of unfolding the three-way data array X are possible [20]. In this work the unfolding (IK x J) in the variable direction and the unfolding (I x KJ) in the batch direction are used. Undey and Cinar [16], inspired by Wold, use the (IK x J) type (figure 4), motivated by on-line monitoring of the batch process. The (I x KJ) type is used by Nomikos and MacGregor [10] (figure 5). This unfolding is particularly meaningful because subtracting the mean of each column of this matrix X amounts to subtracting the mean trajectory of each variable, thereby removing the main nonlinear and dynamic components in the data [9].
Figure 4. Decomposition of X to 2-D (IK x J)
Figure 5. Decomposition of X to 2-D (I x KJ)
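The two unfoldings described above can be sketched with NumPy. This is only an illustration under stated assumptions: the array is stored with axes ordered (I, J, K), the function names `unfold_batchwise` and `unfold_variablewise` are hypothetical, and the data are random numbers used only to show the shapes.

```python
import numpy as np

def unfold_batchwise(X):
    """Unfold an (I, J, K) array to (I, K*J): one row per batch,
    columns grouped by time instant (J variables for k=1, then k=2, ...)."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

def unfold_variablewise(X):
    """Unfold an (I, J, K) array to (I*K, J): one row per (batch, time) pair."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I * K, J)

# Toy data (random, for shape illustration only): 5 batches, 4 variables, 6 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4, 6))

Xb = unfold_batchwise(X)       # shape (5, 24)
Xv = unfold_variablewise(X)    # shape (30, 4)

# Column-centering the batch-wise unfolding subtracts the mean trajectory
# of every variable, as described in the text.
Xb_centered = Xb - Xb.mean(axis=0)
```

In the batch-wise layout each column is one variable at one time instant, so the column means form the mean trajectories that the centering step removes.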
The objective of MPCA is to decompose the three-way array X through its unfolding into a large two-dimensional matrix X. This decomposition follows the principles of PCA and separates the data, in an optimal way, into two parts: the noise or residual part (E), which is as small as possible in a least-squares sense, and the systematic part ($\sum_{r=1}^{R} t_r \otimes P_r$), which expresses the data as one fraction (t) related only to batches and a second fraction (P) related to the variables and their time variation [10]. The MPCA algorithm derives directly from the NIPALS algorithm and yields the product of score vectors $t_r$ and loading matrices $P_r$, plus a residual matrix E that is minimized in a least-squares sense:
$$X = \sum_{r=1}^{R} t_r \otimes P_r + E \tag{1}$$
$$X = \sum_{r=1}^{R} t_r P_r^T + E = \hat{X} + E \tag{2}$$
MPCA decomposes the three-way array X, where $\otimes$ denotes the Kronecker product ($X = t \otimes P$ means $X(i, j, k) = t(i)P(j, k)$) and R denotes the number of principal components retained. Equation (1) is the 3-D decomposition, while equation (2) displays the more common 2-D decomposition [16].
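The 2-D decomposition of equation (2) can be illustrated numerically. The sketch below is not the NIPALS procedure mentioned in the text: it uses a singular value decomposition instead, which is a common alternative route to PCA. The function name `mpca_fit` and the random input are assumptions made for illustration.

```python
import numpy as np

def mpca_fit(X_unfolded, n_components):
    """PCA of a batch-wise unfolded matrix via SVD (a stand-in for NIPALS).

    Returns the column means, scores T (I x R), loadings P (KJ x R)
    and the residual matrix E, so that X_centered = T @ P.T + E.
    """
    mean = X_unfolded.mean(axis=0)
    Xc = X_unfolded - mean                 # remove the mean trajectories
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                # loadings (orthonormal columns)
    T = Xc @ P                             # scores, one row per batch
    E = Xc - T @ P.T                       # residual part
    return mean, T, P, E

# Synthetic stand-in data: 20 batches, K*J = 12 unfolded columns.
rng = np.random.default_rng(1)
X_unfolded = rng.normal(size=(20, 12))
mean, T, P, E = mpca_fit(X_unfolded, n_components=3)
X_hat = T @ P.T                            # systematic part of equation (2)
```

Here `X_hat + E` reproduces the centered data exactly, and the residual shrinks as more components are retained.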
3.2. Multiblock multiway principal component analysis (MMPCA)
In this case the data matrix X (I x KJ) is divided into K blocks (X1, X2, ..., XK) in such a way that the variables from each time instant are grouped in the same block (see figure 6) [6][16]. This approach has significant benefits because the latent variable structure is allowed to change at each phase of the batch process. In the lower layer of the model, each data block is considered as a separate source of information and the details of each block are modelled by the corresponding block model. In the upper layer, information from all blocks of the lower level is combined and the relative importance of the different blocks, $X_b$, for each dimension is obtained. In the upper layer, information from the previous block, the block scores $t_b(k-1)$, is combined with the block score vector from the lower layer [17][15].
Figure 6. Dividing batch data into different phases
3.3. Control charts
Abnormal behavior of a batch can be identified by projecting the batch onto the model. The control charts used in monitoring batch processes are generally based on the Q-statistic and the D-statistic, for which control limits are used to determine whether the process is in control or not. The Q-statistic is a measure of the lack of fit with respect to the established model. For batch number i, $Q_i$ is calculated as:
$$Q_i = \sum_{j=1}^{J} \sum_{k=1}^{K} (e_{jk})^2 \sim g\chi^2_{(h)} \tag{3}$$
where $e_{jk}$ are the elements of E. $Q_i$ indicates the distance between the actual values of the batch and their projection onto the reduced space. The D-statistic, or Hotelling $T^2$ statistic, measures the degree to which the data fit the calibration model:
$$D_i = t_i^T S^{-1} t_i \, \frac{I(I-R)}{R(I^2-1)} \sim F_{R,\,I-R} \tag{4}$$
where S is the estimated covariance matrix of the scores. The D-statistic gives a measure of the Mahalanobis distance, in the reduced space, between the position of a batch and the origin, which designates the point of average batch process behavior.
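The left-hand sides of equations (3) and (4) can be computed per batch as follows. This is a minimal sketch on synthetic data; the helper name `q_and_d_statistics` is hypothetical, and the control limits themselves (quantiles of the weighted chi-square and F distributions, available for instance from `scipy.stats`) are deliberately left out.

```python
import numpy as np

def q_and_d_statistics(T, E):
    """Per-batch Q (squared prediction error) and scaled D (Hotelling T^2).

    T: scores, shape (I, R); E: residuals of the unfolded data, shape (I, K*J).
    """
    I, R = T.shape
    Q = np.sum(E ** 2, axis=1)                      # equation (3), left side
    S = np.atleast_2d(np.cov(T, rowvar=False))      # covariance of the scores
    S_inv = np.linalg.inv(S)
    t2 = np.einsum("ir,rs,is->i", T, S_inv, T)      # t_i^T S^-1 t_i
    D = t2 * I * (I - R) / (R * (I ** 2 - 1))       # equation (4), left side
    return Q, D

# Synthetic, column-centered data: 30 batches, 10 unfolded columns, R = 2.
rng = np.random.default_rng(2)
Xc = rng.normal(size=(30, 10))
Xc -= Xc.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T
T = Xc @ P
E = Xc - T @ P.T
Q, D = q_and_d_statistics(T, E)
```

A batch would be flagged when its Q or D value exceeds the corresponding limit; both statistics are needed because a batch can fit the model plane poorly (large Q) while staying close to the origin within it (small D), or the reverse.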
4. Classification method
For classification, the Learning Algorithm for Multivariate Data Analysis (LAMDA) has been used [1]. This method takes advantage of hybrid logical connectives to perform a soft-bounded classification.
LAMDA is proposed as a classification technique to be applied to the principal components selected for monitoring. The goal is to assess the current situation according to previously learned profiles [1][8].
Figure 7. Basic LAMDA recognition methodology
Input data are presented to LAMDA as a set of observations or individuals, each characterized by its descriptors or attributes and recorded as a row. The principal components obtained in the MPCA step are used as the input variables to be classified. Once the descriptors are loaded, every individual is processed separately according to the desired goal [1]:
1. To classify the individuals according to a known and fixed set of classes.
2. To learn and adapt from a previously given set of classes, which can be modified according to the new individuals.
3. To discover and learn representative partitions in the training set.
The basic assignment of an individual to a class follows the procedure represented in figure 7. MAD and GAD stand for Marginal Adequacy Degree (which takes into account only one attribute) and Global Adequacy Degree (obtained from the hybrid logical combination of the previously obtained MADs) of an individual to a given class. Equations (5) and (6) are used to calculate them. This classifying structure resembles that of a single-neuron ANN [1].
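$$MAD(d_{ixj} / \rho_{i/k}) = \rho_{i/k}^{\,d_{ixj}} \, (1 - \rho_{i/k})^{1 - d_{ixj}} \tag{5}$$
where $d_{ixj}$ is descriptor i of the object j and $\rho_{i/k}$ is the parameter $\rho$ of descriptor i for class k.
$$GAD = \beta \, T(MAD) + (1 - \beta) \, S(MAD) \tag{6}$$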
Formalizing the description of LAMDA, an individual can be defined as a series of descriptor values d1, ..., dn such that each dj takes values from a set Dj, which may be finite or infinite. The Cartesian product U = D1 x D2 x ... x Dn is called the universe or context. Thus, any object or individual is represented as a vector x = (x1, ..., xn) from U, such that each component xj expresses the value of the descriptor dj in the object x. The subset of U gathering all these vectors is called the data base or population. To assign individuals to classes, the MAD is calculated for each individual, every class and each descriptor, and these partial results are aggregated to obtain the GAD of an individual to a class. The simplest way to build this system would be to use probability distribution functions and aggregate them by a simple product, but that would impose a series of hypotheses on the data distribution and independence that are too arbitrary. Instead, MAD and GAD are used according to the definitions of equations (5) and (6) respectively [1]. The hybrid connective used for the GAD combines a t-norm (T) and a t-conorm (S) by means of the β parameter. In equation (6), β = 1 reduces the GAD to the t-norm (intersection) and β = 0 reduces it to the t-conorm (union). This parameter therefore determines the exigency level of the classification, so it can be identified as a tolerance or exigency parameter.
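Equations (5) and (6) and the class assignment can be sketched as follows. The text does not say which t-norm and t-conorm are used, so this sketch assumes the min/max pair. It also assumes descriptors normalised to [0, 1]; the function names and the class parameters `rho` are hypothetical values chosen for illustration.

```python
import numpy as np

def mad(d, rho):
    """Marginal adequacy degrees, equation (5).

    d:   descriptors of one individual, normalised to [0, 1], shape (n,)
    rho: class parameters, shape (n_classes, n), each strictly in (0, 1)
    """
    return rho ** d * (1.0 - rho) ** (1.0 - d)

def gad(mads, beta):
    """Global adequacy degree, equation (6), assuming T = min and S = max."""
    return beta * mads.min(axis=-1) + (1.0 - beta) * mads.max(axis=-1)

def classify(d, rho, beta):
    """Return the index of the class with the highest GAD, and all GADs."""
    g = gad(mad(d, rho), beta)
    return int(np.argmax(g)), g

# Hypothetical parameters for two classes and three descriptors.
rho = np.array([[0.2, 0.3, 0.1],
                [0.8, 0.7, 0.9]])
d = np.array([0.9, 0.8, 0.7])
label, g = classify(d, rho, beta=0.5)
```

With equation (6) as written, `beta=1.0` keeps only the t-norm term, so a class must match every descriptor well, while `beta=0.0` keeps only the t-conorm term, so a single well-matched descriptor is enough.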
5. Results
5.1. Types of batch process
The data obtained from the SBR process were analyzed from two points of view. The first was based on the analytical methods proposed in [13], where the sludge reaction is explained. The second was a preliminary MSPC analysis in which some batches were detected to be outside the control limits. This study yielded five types of SBR batch: electrical fault, variation in the composition, atmospheric changes (corresponding to rain), equipment defects and normal behavior. With this classification it is possible to quantify the number of batches in each group; Table 1 summarises all batches of the SBR process. There are 60 batches (33.5%) with abnormal behavior. Normal behavior was the most common type (66.5%), with a nitrogen removal efficiency above the legally required effluent standards, which are classified according to the final quality of the wastewater.
Table 1. Batch classification by group
5.1.1. MPCA: batch direction
Each batch lasts 8 hours (5760 samples for each variable, sampled every 5 seconds). Only 392 samples from each of the four acquired variables have been used in order to reduce the computational cost, resulting in an I x J x K = 179 x 4 x 392 array (X) for the collection of 179 available batches. The MPCA algorithm was applied to the three-way data array X unfolded in the batch direction (I x KJ), resulting in 8 principal components, so the new dimensionality becomes (179 x 8). The statistical model was created with these eight components, which explain 92.79% of the total variability. To examine the process data in the reduced projection space (defined by a small number of latent variables), a variable contribution analysis is made. As shown in Figure 8, the temperature variable is positively correlated with the loadings of the first component, where an increase in temperature can be seen at sample 1113. Figure 9 shows the loadings of the second component, which represents the ORP variable.
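The text reports the number of components and the variability they explain, but not how that number was chosen. One common criterion, shown here purely as an assumption, is to keep the smallest number of components whose cumulative explained variance reaches a target fraction. The function name `components_for_variance` and the synthetic data are hypothetical.

```python
import numpy as np

def components_for_variance(X_unfolded, target):
    """Smallest number of principal components whose cumulative
    explained variance reaches `target` (a fraction in (0, 1])."""
    Xc = X_unfolded - X_unfolded.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)         # singular values, descending
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)      # cumulative explained variance
    n = int(np.searchsorted(ratio, target) + 1)     # first index reaching target
    return min(n, len(s)), ratio

# Synthetic data with decreasing variance per column (illustration only).
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(50, 10)) @ np.diag(np.linspace(3.0, 0.2, 10))
n_components, ratio = components_for_variance(X_demo, target=0.90)
```

The returned `ratio` array gives the explained fraction for every possible number of components, which is the quantity the percentages in this section refer to.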
Figure 8. Variable loadings for the first principal component
Figure 9. Variable loadings for the second component
Figure 10. Multiway PCA. Q and $T^2$ charts with 92.79% confidence limits
Figure 10 shows the Q and $T^2$ charts for all process batches. In the Q chart, some batches exceed the limit; these batches show several types of behavior. In the $T^2$ chart, two batches are outside the limit; these batches had an electrical fault (EF).
Table 2 summarises the batches outside the model. In the Q chart, only a third of the total abnormal behavior is detected, and there are 8 false alarms. The $T^2$ chart has 20 batches with abnormal behavior (without false alarms). In total, 39 of the 60 batches with abnormal behavior can be detected; 9 batches appear in both charts.
Table 2. MPCA classification
5.1.2. MPCA: variable direction
The three-way data array X has also been unfolded in the variable direction (IK x J). The model was developed with dimensions (70168 x 4), and MPCA compresses the data into 3 principal components explaining 95.18% of the total variability. Figure 11 shows a projection of the statistical model onto the plane of the first and second components.
Figure 11. Variable weights and model in the variable direction
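As a consistency check on the sizes quoted in this section, the dimensions of both unfoldings follow directly from I = 179 batches, J = 4 variables and K = 392 retained samples. The symbol $K_{full}$ for the unreduced number of samples per batch is introduced here only for illustration.

```latex
\begin{aligned}
K_{full} &= \frac{8\,\text{h} \times 3600\,\text{s/h}}{5\,\text{s}} = 5760 \\
I \times KJ &= 179 \times (392 \cdot 4) = 179 \times 1568 \\
IK \times J &= (179 \cdot 392) \times 4 = 70168 \times 4
\end{aligned}
```

The last line matches the (70168 x 4) model size stated for the variable direction.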
The batches are sequentially ordered and there are 3 sections in the model. Each section corresponds to batches gathered during a specific period: Test corresponds to the first month of monitoring of the SBR process, Spring to the batches run in the spring season, and Summer to the cycles run in the summer season. The temperature contribution was shown to be less important than that of the other variables (0.25 of the first component); consequently, it was omitted and a new model was built using only 3 variables. Figure 12 shows that the new model is equally representative.
Figure 12. Variable weights and model in the variable direction without the Temperature variable
5.1.3. Multiblock MPCA
The SBR pilot plant operates in 6 stages in which the latent variable structure can change due to different environments. Applying Multiblock MPCA, the data array X can be split by stage. In this way, it is possible to work with the complete three-way data array X, with dimensions 179 x 4 x 5760. The data arrays for each stage are: cycle 1 (179 x 4 x 780); cycle 2 (179 x 4 x 780); cycle 3 (179 x 4 x 780); cycle 4 (179 x 4 x 780); cycle 5 (179 x 4 x 780); cycle 6 (179 x 4 x 804); purge cycle (179 x 4 x 36); settling cycle (179 x 4 x 720); draw cycle (179 x 4 x 300). Using the control charts for each stage, the following can be observed: batches 11 to 17 have a variation in the composition and are identified by the Q and $T^2$ control charts. The alarms for each stage are summarized in Table 3 (common batches are discounted). Purge, settling and draw are stages without nitrogen removal, and they have more false alarms than the other stages.
Table 3. Alarms by each stage
These multiblock charts supply knowledge by stage, which can help in fault location and diagnosis. Furthermore, the data interpretation is easier. Some batches with long-duration faults have been found to be present in the models of all 6 stages, for example batches 10 to 16.
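The split of the full array into stage blocks can be sketched as below, using the stage lengths listed in this section, which add up to the 5760 samples of a batch. The sketch assumes that the stages occur in the listed order and are contiguous in time; the helper name `split_by_stage` is hypothetical and the data are random stand-ins with only 3 batches.

```python
import numpy as np

# Stage lengths in samples, as listed in the text (they sum to 5760).
STAGES = {
    "cycle 1": 780, "cycle 2": 780, "cycle 3": 780, "cycle 4": 780,
    "cycle 5": 780, "cycle 6": 804, "purge": 36, "settling": 720, "draw": 300,
}

def split_by_stage(X, stages=STAGES):
    """Split an (I, J, K) array along time into one block per stage."""
    lengths = list(stages.values())
    if sum(lengths) != X.shape[2]:
        raise ValueError("stage lengths must add up to the number of samples")
    cuts = np.cumsum(lengths)[:-1]          # indices where a new stage starts
    return dict(zip(stages.keys(), np.split(X, cuts, axis=2)))

# Small random stand-in for the real data (3 batches instead of 179).
X = np.random.default_rng(3).normal(size=(3, 4, 5760))
blocks = split_by_stage(X)
```

Each block can then be unfolded and modelled separately, which is what allows the latent structure and the control charts to differ from stage to stage.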
5.1.4. Conclusions of MSPC
Initially, the combination of MPCA and analytical methods made it possible to classify all batches. Individually, the model developed with MPCA in the batch direction produced satisfactory results because it provided knowledge of the process, the model developed with MPCA in the variable direction makes it possible to detect the relationship between process behavior and the environment (rain period, summer, among others), and Multiblock MPCA gives detail of the process. In general, MSPC has been used to detect abnormal behavior in the SBR process by projecting the data into a lower-dimensional space that accurately characterizes the state of the process. Applying a classification tool to the new variables allows a simple identification and grouping of similar situations according to a matching criterion.
5.2. Classification for situation assessment
Initially, multiway partial least squares (MPLS) was used. This technique is a dimensionality reduction method that maximises the relation between the matrix X (I x JK) and the predicted matrix Y [14] (179 x 5), where 179 is the number of historical batches and 5 is the number of types of batch. The resulting model did not describe the process because matrix Y was created from the results of the preliminary MSPC analysis. Matrix Y should be constructed with quality variables, but these are obtained only every three days, which introduces a missing-data problem. Thus, a classification tool is used for situation assessment: MPCA + classification tool.
5.3. MPCA classification
$\hat{X}$ contains the principal components for each batch, with dimensions 8 x 179. These were used as descriptors to feed the LAMDA algorithm in order to discover relevant classes under an unsupervised scheme. The tool automatically classified the data into eleven classes. Table 4 compares the classes with the types of batch. According to these results, it is possible to identify classes that contain only batches with equipment defects, electrical faults, atmospheric changes or variation in the composition. Classes 1, 9 and 10 correspond to normal behavior. Class 6 is associated with atmospheric changes. Classes 3 and 11 represent variations in the composition, while classes 7 and 8 include electrical faults. Finally, classes 2, 4 and 5 group different types of batches. The predominant class (class 1) holds 48.04% of the total historical data and represents normal behavior. Class 5 is abnormal behavior due to atmospheric changes and equipment defects.
Table 4. Composition by class
The relationship between the classes and the principal components is another observation. The 8th component is the least relevant because it does not change. This indicates that $\hat{X}$ can be computed using only seven descriptors, in which case the total explained variability is 90.54%. Consequently, only 7 principal components are used in the Multiblock MPCA analysis.
5.4. MMPCA classification
According to the previous analysis, there are seven principal components (seven descriptors for each stage). The classification tool is applied individually at every stage, taking the whole set of batches. The number of classes turned out to differ considerably between stages. As in the MPCA classification, the classes were labelled (Table 5). Table 6 summarises the error for this classification. Another observation is that the electrical fault is present in cycles 2 and 6 because one batch experienced a sag in two cycles (see Table 1). The classes of normal behavior are the most populated.
Table 5. Classes by each cycle
Table 6. Classes by each cycle
6. Conclusions
Multivariate Statistical Process Control has been used to detect abnormal behavior in the SBR process by projecting the data into a lower-dimensional space that accurately characterizes the state of the process. The new variable matrix is therefore smaller. A classification tool has been tested with previously known data to verify its usefulness for discovering clusters of data in the historical records that are useful for further situation assessment. With MSPC combined with the classification tool, splitting the data into meaningful groups allows faster location and identification of faults by reporting similar past experiences.
In order to improve the results and to process the data faster, it is necessary to develop a technique that combines dimensionality reduction and nonlinear classification instead of the classical strategy. Applying a classification tool to the new variables allows a simple identification and grouping of similar situations according to a matching criterion.
Acknowledgement
This work is part of the research project Development of an intelligent control system applied to a Sequencing Batch Reactor (SBR) for the elimination of organic matter, nitrogen and phosphorus, DPI2005-08922-C02-02, supported by the Spanish Government and FEDER funds.
References
[1] J. Aguilar-Martin and R. Lopez. The process of classification and learning the meaning of linguistic descriptors of concepts. Approximate Reasoning in Decision Analysis, pages 165–175, 1982.
[2] Fuat Doymaz, Jose Romagnoli, and Ahmet Palazoglu. A strategy for detection and isolation of sensor failures and process upsets. Chemometrics and Intelligent Laboratory Systems, 55:109–123, 2000.
[3] Alberto Ferrer, editor. Control Estadistico MegaVariante para los Procesos del Siglo XXI. 27 Congreso Nacional de Estadistica e Investigacion Operativa (Spain), 2003.
[4] Dae Sung Lee, Jong Moon Park, and Peter Vanrolleghem. Adaptive multiscale principal component analysis for on-line monitoring of a sequencing batch reactor. Journal of Biotechnology, 116:195–210, 2005.
[5] Dae Sung Lee and Peter A. Vanrolleghem. Adaptive consensus principal component analysis for on-line batch process monitoring. Technical report, Fund for Scientific Research - Flanders (F.W.O.) and the Ghent University Research Fund, Coupure Links 653, B-9000 Gent, Belgium, 2003.
[6] Dae Sung Lee and Peter A. Vanrolleghem. Monitoring of a sequencing batch reactor using adaptive multiblock principal component analysis. Biotechnology and Bioengineering, 82(4):489–497, May 2003.
[7] J.A. Lopes, J.C. Menezes, J.A. Westerhuis, and A.K. Smilde. Multiblock PLS analysis of an industrial pharmaceutical process. Biotechnology and Bioengineering, 80:419–427, 2002.
[8] K. Moore. Using neural nets to analyse qualitative data. A Marketing Research, 7(1):35–39, 1995.
[9] Paul Nomikos and John MacGregor. Multivariate SPC charts for monitoring batch process. Technometrics, 37(1):41–59, Feb 1995.
[10] Paul Nomikos and John F. MacGregor. Monitoring batch processes using multiway principal component analysis. AIChE Journal, 40(8):1361–1375, Aug 1994.
[11] Paul Nomikos and John F. MacGregor. Multi-way partial least squares in monitoring batch processes. First International Chemometrics InterNet Conference, 1994.
[12] Aras Norvilas, Eric Tatara, Antoine Negiz, Jeffrey DeCicco, and Ali Cinar, editors. Monitoring and fault diagnosis of a polymerization reactor by interfacing knowledge based and multivariate SPM tools, number 0-7803-453. American Control Conference, 1998.
[13] S. Puig, M.T. Vives, Ll. Corominas, M.D. Balaguer, and J. Colprim, editors. Wastewater nitrogen removal in SBRs, applying a step-feed strategy: From Lab-Scale to pilot plant operation. 3rd IWA Specialised Conference on Sequencing Batch Reactor, Australia, Feb 2004.
[14] Evan L. Russell, Leo H. Chiang, and Richard D. Braatz. Data-driven techniques for fault detection and diagnosis in chemical processes. "Advances in Industrial Control". ISBN 1-85233-258-1, London, 2000.
[15] Age K. Smilde, Johan A. Westerhuis, and Ricard Boqué. Multiway multiblock component and covariates regression models. Journal of Chemometrics, 14:301–331, 2000.
[16] Cenk Undey and Ali Cinar. Statistical monitoring of multistage, multiphase batch processes. IEEE Control Systems Magazine, 22(5):40–52, Oct 2002.
[17] Cenk Undey, Bruce A. Williams, and Ali Cinar, editors. Monitoring of Batch Pharmaceutical Fermentations: Data Synchronization, Landmark Alignment, and Real-Time Monitoring. 15th Triennial World Congress, Barcelona Spain 2002 IFAC, 2002.
[18] V. Venkatasubramanian, R. Rengaswamy, K. Yin, and S. Kavuri. A review of process fault detection and diagnosis. Part I: quantitative model-based methods. Computers and Chemical Engineering, 27:293–311, 2003.
[19] Johan A. Westerhuis, Theodora Kourti, and John F. MacGregor. Analysis of multiblock and hierarchical PCA and PLS models. Journal of Chemometrics, 12:301–321, 1998.
[20] Manuel Zarzo and Alberto Ferrer. Batch process diagnosis: PLS with variable selection versus block-wise PCR. Chemometrics and Intelligent Laboratory Systems, 73:15–27, Jun 2004.
Inteligencia Artificial Vol. 10 No 29 (2006) 107
Article
Background The main goal of wastewater treatment is to obtain high quality effluent. This study proposes a methodology to estimate in real-time the effluent quality in a biological nutrient removal (BNR) sequencing batch reactor (SBR) process. ResultsThis is achieved by: (i) detecting the batch quality; and (ii) predicting the classification of the release according to different effluent characteristics. A principal component analysis (PCA) model is built to discern normal or abnormal behavior of the batch release. An index is given to every phase of the process by means of contribution analysis, and a fault signature (FS) is created. The FS in a classification model is associated with a biological removal quality. Conclusion The model is applied as a soft-sensor in real-time to new batch releases to obtain a qualitative estimate of the effluent. A correct estimation for the qualitative variables, of above 95%, would provide a reliable tool to estimate BNR performances.© 2012 Society of Chemical Industry
Article
In this paper the general theory of multiway multiblock component and covariates regression models is explained. Unlike in existing methods such as multiblock PLS and multiblock PCA, in the new proposed method a different number of components can be selected for each block. Furthermore, the method can be generalized to incorporate multiway blocks to which any multiway model can be applied. The method is a direct extension of principal covariates regression and therefore works in a simultaneous fashion in which a clearly defined objective criterion is minimized. It can be tuned to fulfil the requirements of the user. Algorithms to calculate the components will be presented. The method will be illustrated with two three-block examples and compared to existing approaches. The first example is with two-way data and the second example is with a three-way array. It will be shown that predictions are as good as with the existing methods, but because for most blocks fewer components are required, diagnostic properties of the method are improved. Copyright (C) 2000 John Wiley & Sons, Ltd.
Article
Fault detection and diagnosis is an important problem in process engineering. It is the central component of abnormal event management (AEM) which has attracted a lot of attention recently. AEM deals with the timely detection, diagnosis and correction of abnormal conditions of faults in a process. Early detection and diagnosis of process faults while the plant is still operating in a controllable region can help avoid abnormal event progression and reduce productivity loss. Since the petrochemical industries lose an estimated 20 billion dollars every year, they have rated AEM as their number one problem that needs to be solved. Hence, there is considerable interest in this field now from industrial practitioners as well as academic researchers, as opposed to a decade or so ago. There is an abundance of literature on process fault diagnosis ranging from analytical methods to artificial intelligence and statistical approaches. From a modelling perspective, there are methods that require accurate process models, semi-quantitative models, or qualitative models. At the other end of the spectrum, there are methods that do not assume any form of model information and rely only on historic process data. In addition, given the process knowledge, there are different search techniques that can be applied to perform diagnosis. Such a collection of bewildering array of methodologies and alternatives often poses a difficult challenge to any aspirant who is not a specialist in these techniques. Some of these ideas seem so far apart from one another that a non-expert researcher or practitioner is often left wondering about the suitability of a method for his or her diagnostic situation. While there have been some excellent reviews in this field in the past, they often focused on a particular branch, such as analytical models, of this broad discipline. 
The basic aim of this three part series of papers is to provide a systematic and comparative study of various diagnostic methods from different perspectives. We broadly classify fault diagnosis methods into three general categories and review them in three parts. They are quantitative model-based methods, qualitative model-based methods, and process history based methods. In the first part of the series, the problem of fault diagnosis is introduced and approaches based on quantitative models are reviewed. In the remaining two parts, methods based on qualitative models and process history data are reviewed. Furthermore, these disparate methods will be compared and evaluated based on a common set of criteria introduced in the first part of the series. We conclude the series with a discussion on the relationship of fault diagnosis to other process operations and on emerging trends such as hybrid blackboard-based frameworks for fault diagnosis.
Data from a batch chemical process have been analysed in order to diagnose the causes of variability of a final quality parameter. The trajectories of 47 process variables from 37 batches have been arranged in a matrix by using alignment methods. Two different approaches are compared to diagnose the key process variables: PLS with variable selection and block-wise PCR. The application of Unfold Partial Least Squares Regression (U-PLS) leads to one significant component. By means of weight plots, the variables most correlated with the final quality are identified. Nevertheless, with observed data, it is not possible to know if correlation is due to causality (and hence related to a critical point) or is due to other causes. Pruning PLS models by using variable selection methods and technical information of the process has allowed the process variables most correlated with the final quality to be revealed. The application of Principal Component Regression to the trajectories of the process variables (block-wise PCR) has given straightforward results without requiring a deep knowledge of the process. The results obtained have been used to propose several hypotheses about the likely key process variables that require a better control, as a previous step to conducting further studies for process diagnosis and optimisation, like experimental designs.
In this paper the general theory of multiway multiblock component and covariates regression models is explained. Unlike in existing methods such as multiblock PLS and multiblock PCA, in the new proposed method a different number of components can be selected for each block. Furthermore, the method can be generalized to incorporate multiway blocks to which any multiway model can be applied. The method is a direct extension of principal covariates regression and therefore works in a simultaneous fashion in which a clearly defined objective criterion is minimized. It can be tuned to fulfil the requirements of the user. Algorithms to calculate the components will be presented. The method will be illustrated with two three-block examples and compared to existing approaches. The first example is with two-way data and the second example is with a three-way array. It will be shown that predictions are as good as with the existing methods, but because for most blocks fewer components are required, diagnostic properties of the method are improved. Copyright © 2000 John Wiley & Sons, Ltd.
The problem of using time-varying trajectory data measured on many process variables over the finite duration of a batch process is considered. Multiway principal-component analysis is used to compress the information contained in the data trajectories into low-dimensional spaces that describe the operation of past batches. This approach facilitates the analysis of operational and quality-control problems in past batches and allows for the development of multivariate statistical process control charts for on-line monitoring of the progress of new batches. Control limits for the proposed charts are developed using information from the historical reference distribution of past successful batches. The method is applied to data collected from an industrial batch polymerization reactor.
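The compression step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the batch data are stored as a three-way NumPy array of shape (batches, variables, time samples) with equal batch lengths, and that each unfolded column is autoscaled before PCA. The function names `unfold_batchwise`, `fit_mpca` and `project` are hypothetical and are not taken from the paper.

```python
import numpy as np


def unfold_batchwise(X):
    """Unfold an (I batches, J variables, K times) array into an I x (K*J) matrix.

    Columns are ordered time slice by time slice: all J variables at time 0,
    then all J variables at time 1, and so on.
    """
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)


def fit_mpca(X, n_components):
    """Fit a PCA model on the autoscaled, batch-wise unfolded data (assumed preprocessing)."""
    unfolded = unfold_batchwise(X)
    mean = unfolded.mean(axis=0)
    std = unfolded.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid division by zero
    Z = (unfolded - mean) / std
    # The right singular vectors of Z are the PCA loading vectors.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T                      # shape (K*J, A)
    eigvals = s[:n_components] ** 2 / (Z.shape[0] - 1)  # variance of each score
    return {"mean": mean, "std": std, "loadings": loadings,
            "eigvals": eigvals, "scores": Z @ loadings}


def project(model, X_new):
    """Project new, complete batches onto the reference model; returns their scores."""
    Z_new = (unfold_batchwise(X_new) - model["mean"]) / model["std"]
    return Z_new @ model["loadings"]


# Synthetic stand-in for historical data: 20 batches, 3 variables, 15 samples.
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 3, 15))
model = fit_mpca(reference, n_components=2)
print(model["scores"].shape)
```

Each batch is reduced to one row of scores, so batches can be compared in a low-dimensional space. On-line monitoring of an unfinished batch additionally needs a way to fill in the not-yet-measured future samples, which this sketch does not cover.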
Multiblock and hierarchical PCA and PLS methods have been proposed in the recent literature in order to improve the interpretability of multivariate models. They have been used in cases where the number of variables is large and additional information is available for blocking the variables into conceptually meaningful blocks. In this paper we compare these methods from a theoretical or algorithmic viewpoint using a common notation and illustrate their differences with several case studies. Undesirable properties of some of these methods, such as convergence problems or loss of data information due to deflation procedures, are pointed out and corrected where possible. It is shown that the objective function of the hierarchical PCA and hierarchical PLS methods is not clear and the corresponding algorithms may converge to different solutions depending on the initial guess of the super score. It is also shown that the results of consensus PCA (CPCA) and multiblock PLS (MBPLS) can be calculated from the standard PCA and PLS methods when the same variable scalings are applied for these methods. The standard PCA and PLS methods require less computation and give better estimation of the scores in the case of missing data. It is therefore recommended that in cases where the variables can be separated into meaningful blocks, the standard PCA and PLS methods be used to build the models and then the weights and loadings of the individual blocks and super block and the percentage variation explained in each block be calculated from the results. © 1998 John Wiley & Sons, Ltd.
Multivariate statistical methods are used to analyze data from an industrial batch drying process. The objective of the study was to uncover possible reasons for major problems occurring in the quality of the product produced in the process. Partial least-squares (PLS) methods were able to isolate which group of variables in the chemistry, in the timing of the various stages of the batch, and in the shape of the time-varying trajectories of the process variables were related to a poor-quality product. The industrial study illustrates the approach and the power of these multivariate methods for troubleshooting problems occurring in complex batch processes. Several new variations in the multivariate PLS methodology for the analysis of batch data are also implemented. In particular, an application utilizing a novel approach to the time warping of the trajectories for batches, and the subsequent use of the time-warping information, is presented. The use of the time history of the PLS weights of the process variable trajectories to diagnose problems in the dynamic operation of the batches is also clearly illustrated, as is the use of contribution plots for finding features which distinguished between the operating histories of good and bad batches.
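The abstract does not describe its time-warping method, so the sketch below shows classic dynamic time warping (DTW) for two univariate trajectories as a stand-in, not the novel approach the authors refer to. The function name `dtw_path` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_path(ref, query):
    """Classic DTW between two 1-D sequences.

    Returns the accumulated distance and the warping path as a list of
    (ref_index, query_index) pairs.
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - query[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],  # both advance
                                     cost[i - 1][j],      # only ref advances
                                     cost[i][j - 1])      # only query advances
    # Backtrack from the end to recover the optimal alignment.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# A query batch that lingers one extra sample at the start of the trajectory.
distance, path = dtw_path([0, 1, 2, 3], [0, 0, 1, 2, 3])
print(distance, path)
```

The warping path records how much each batch had to be stretched or compressed to match the reference. That is the kind of time-warping information the abstract says was used in the subsequent analysis.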
A novel approach is proposed to isolate sensors that are affected by the root cause of nonconforming operation and to distinguish between failed sensors and process upsets. Multivariate systems can be monitored by building a principal component analysis (PCA) model from historical data. The T² statistic and the sum of squared prediction errors (SPE) of the calibration model facilitate on-line fault detection and isolation. The two measures are complementary: one explains the events captured by the model and the other the events not captured by it. In this paper, we emphasize the importance of using T² and SPE together for fault detection and identification. A correlation coefficient criterion was used to infer the state of the correlation structure between one sensor and its closest neighbor, in order to distinguish between sensor failures and process upsets. Faulty measurements were reconstructed from the available sensors using the calibration model and an optimization algorithm, which in turn unveiled more process upsets. The strategy is illustrated on a benchmark industrial liquid-fed ceramic melter.
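For reference, the two statistics are usually defined as follows. The notation is an assumption, since the abstract introduces none: $\mathbf{x}$ is a scaled observation, $\mathbf{P}$ is the loading matrix with $A$ retained components, and $\lambda_a$ is the variance of the $a$-th score in the calibration data.

$$
\mathbf{t} = \mathbf{P}^{\top}\mathbf{x}, \qquad
T^{2} = \sum_{a=1}^{A} \frac{t_a^{2}}{\lambda_a}, \qquad
\mathbf{e} = \mathbf{x} - \mathbf{P}\mathbf{t}, \qquad
\mathrm{SPE} = \mathbf{e}^{\top}\mathbf{e} = \lVert (\mathbf{I} - \mathbf{P}\mathbf{P}^{\top})\,\mathbf{x} \rVert^{2}
$$

$T^{2}$ measures variation inside the model subspace, while SPE measures the part of $\mathbf{x}$ that the model leaves unexplained. This is why the two are described as complementary.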