Article

Multivariate SPC Methods for Process and Product Monitoring

Authors: Kourti and MacGregor

Abstract

Statistical process control methods for monitoring processes with multivariate measurements in both the product quality variable space and the process variable space are considered. Traditional multivariate control charts based on the χ² and T² statistics are shown to be very effective for detecting events when the multivariate space is not too large or ill-conditioned. Methods for detecting the variable(s) contributing to the out-of-control signal of the multivariate chart are suggested. Newer approaches based on principal component analysis and partial least squares are able to handle large, ill-conditioned measurement spaces; they also provide diagnostics which can point to possible assignable causes for the event. The methods are illustrated on a simulated process of a high-pressure low-density polyethylene reactor, and examples of their application to a variety of industrial processes are referenced.


... PCA is an MSPC technique that has been used as a monitoring approach in many industrial processes (Kourti and MacGregor, 1996; Shlens, 2005; Ferrer, 2007). PCA finds the principal sources of variability in the space of the measured variables, thus allowing the dimensionality of the original space to be reduced to a new one with the minimum number of uncorrelated variables (known as latent variables or components) required to explain the process trends (Camacho et al., 2009; Banguero et al., 2020). ...
... Although PCA does not provide much information for developing a fault isolation approach, Contribution Analysis (Kourti and MacGregor, 1996; Ferrer, 2007) has been proposed as a first attempt at fault isolation. It computes the influence of each variable on the computed value of the T² and SPE statistics. ...
... where p_{j,a} are elements of P_{1:A}. Overall average variable contributions: because it is very common that more than one score has a high value when a fault is detected, it is very useful to compute the overall average contribution per variable, instead of drawing a bar plot for every score with a high value (Kourti and MacGregor, 1996). ...
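As an illustration of the averaged-contribution idea described in this snippet, the following is a minimal sketch (not the authors' reference implementation): the data, the number of retained components, and the simplification of averaging over all retained scores are assumptions.

```python
import numpy as np

def average_contributions(X_ref, x_new, n_comp):
    """Overall average contribution of each variable over the retained PCA scores."""
    mu, sd = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
    Z = (X_ref - mu) / sd                       # autoscaled reference (in-control) data
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_comp].T                           # loadings, J variables x A components
    lam = s[:n_comp] ** 2 / (Z.shape[0] - 1)    # variance of each score
    z = (x_new - mu) / sd
    t = z @ P                                   # scores of the new observation
    # contribution of variable j to score a: (t_a / lam_a) * p_{j,a} * z_j
    cont = (t / lam) * (P * z[:, None])         # J x A contribution matrix
    return cont.mean(axis=1)                    # overall average contribution per variable

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))   # toy correlated data
x_new = X_ref.mean(axis=0) + np.array([0, 0, 4, 0, 0, 0])     # simulated fault on variable 3
print(average_contributions(X_ref, x_new, n_comp=2))
```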
... In order to evaluate the model performance when projecting the n-th observation z_n onto it, the Hotelling's T² in the latent space, T²_n, and the Squared Prediction Error, SPE_n, are calculated [12]: ...
... PLS models provide a great capability for diagnosing assignable causes [26]. By using contribution plots [12], the underlying PLS model can be interrogated to reveal the group of regressor variables making the greatest contributions to the deviations in the SPE and/or the scores. Although these plots will not unequivocally diagnose the root causes of the deviations, they provide great insight for finding them. ...
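A minimal sketch of the two statistics described in these snippets, computed for a new observation projected onto a PLS model; the toy data, the scaling choice, and the use of scikit-learn's PLSRegression are assumptions, not the cited papers' exact implementation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X_ref = rng.normal(size=(200, 8))                          # illustrative process data
Y_ref = X_ref[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(200, 2))
x_new = rng.normal(size=8)

scaler = StandardScaler().fit(X_ref)
Z_ref = scaler.transform(X_ref)
pls = PLSRegression(n_components=3, scale=False).fit(Z_ref, Y_ref)

score_var = pls.transform(Z_ref).var(axis=0, ddof=1)       # variance of each latent score

z_n = scaler.transform(x_new.reshape(1, -1))
t_n = pls.transform(z_n)                                    # scores of the new observation
T2_n = (t_n ** 2 / score_var).sum()                         # Hotelling's T² in the latent space
SPE_n = ((z_n - t_n @ pls.x_loadings_.T) ** 2).sum()        # squared prediction error on X
print(f"T2_n = {T2_n:.2f}, SPE_n = {SPE_n:.2f}")
```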
Article
Full-text available
We present a novel Latent Space-based Multivariate Capability Index (LSb-MCpk) aligned with the Quality by Design initiative and used as a criterion for ranking and selecting suppliers for a particular raw material used in a manufacturing process. The novelty of this new index is that, contrary to other multivariate capability indexes that are defined either in the raw material space or in the Critical Quality Attributes (CQAs) space of the product manufactured, this new LSb-MCpk is defined in the latent space connecting both spaces. This endows the new index with a clear advantage over classical ones as it quantifies the capacity of each raw material supplier of providing assurance of quality with a certain confidence level for the CQAs of the manufactured product before manufacturing a single unit of the product. All we need is a rich database with historical information of several raw material properties along with the CQAs. Besides, we present a novel methodology to carry out the diagnosis for assignable causes when a supplier does not score a good capability index. The proposed LSb-MCpk is based on Partial Least Squares (PLS) regression, and it is illustrated using data from both an industrial and a simulation study.
... As for outlier detection, principal component analysis (PCA), traditionally best known for its use in dimensionality reduction, was used (Kourti & MacGregor, 1996). To ensure data integrity, preprocessing consisted of mitigating the issue of missing values in our dataset. ...
... To detect outliers, the Kourti and MacGregor (1996) methodology employs PCA to ascertain the optimal number of components that sufficiently capture the underlying structure of the data. Fig. 1 shows the relationship between the explained variance percentage and the ranking of principal components within the PCA model. ...
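For illustration only, a small sketch (with assumed toy data and an assumed 90% variance threshold) of inspecting the explained-variance profile to choose the number of components, as described in this snippet.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 10)) @ rng.normal(size=(10, 10))   # toy correlated data

pca = PCA().fit(StandardScaler().fit_transform(X))
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_comp = int(np.searchsorted(cum_var, 0.90) + 1)             # smallest A explaining >= 90%
print(f"Retain {n_comp} components ({cum_var[n_comp - 1]:.1%} of the variance)")
```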
Article
Full-text available
Advances in computation and web-scraping techniques allow us to use real-time information from company websites and social media platforms, namely digital footprint indicators. Additionally to their real-time availability, these indicators are easily accessible, making them a potentially practical tool for monitoring a company’s level of competitiveness. Therefore, this article aims to obtain a multivariate analysis method to accurately predict the wineries’ competitiveness group, applying the methodology to a sample of Valencian wineries to explore the association between digital footprint indicators and competitiveness. Unsupervised learning techniques were implemented to detect outliers and clusters in observations using financial variables obtained from the Sistema de Análisis de Balances Ibéricos (SABI). Thus, clustering was used to identify groups of wine companies differentiating the sample of companies according to their competitiveness characteristics. Subsequently, footprint indicators were used to create multivariate models to predict the above classification of companies. Also, this methodology permits the study of which digital indicators are essential in this prediction, specifically the presence of words on the web and others regarding online company activities on social networks associated with competitiveness. This research provides practical guidance for developing and incorporating the essential digital indicators, which could be applied in wineries and any study of companies’ competitiveness.
... The monitoring of high-dimensional databases arising from multivariate batch processes through control charts based on principal components has been the subject of extensive investigation [Jackson & Mudholkar, 1979; Kourti & MacGregor, 1996; MacGregor, 1997; Nomikos & MacGregor, 1994, 1995]. These schemes use multiway principal component analysis (MPCA) to reduce the dimensionality of the process databases. ...
... The objective of this study is to verify how different alignment and synchronization methods affect the ability to correctly classify a set of milk chocolate conching batches as conforming or non-conforming. Most research on batch process monitoring focuses on the process monitoring phase (Phase II of Statistical Process Control, SPC), in which the control chart parameters are already available after analysis of the reference batches [Kourti, 2003; Kourti & MacGregor, 1996; MacGregor, 1997; Nomikos & MacGregor, 1995]. The present study innovates by proposing the development of an in-control monitoring model (Phase I of SPC), analyzing the impact that different alignment and synchronization methods have on the determination of the reference sets of conforming batches. ...
... This fault isolation strategy was first introduced by Miller et al. (1993) and MacGregor and Kourti (1995) in order to ease the diagnostic task. It was later applied by Kourti and MacGregor (1996) to isolate the faulty variables of a high-pressure low-density polyethylene reactor. ...
... This contribution formula has been used by Kourti and MacGregor (1996) and Westerhuis et al. (2000) for the T² statistic; for the SPE index it can be written as, ...
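For reference, one commonly used form of these per-variable contributions (written here in the PCA notation used above; the exact expressions in the cited works may differ in detail) is:

```latex
% Contributions of variable x_j for an autoscaled observation x with scores t_a,
% score variances s_a^2, loadings p_{j,a} and residuals e_j = x_j - \hat{x}_j:
\begin{aligned}
\mathrm{cont}^{T^2}_{a,j} = \frac{t_a}{s_a^2}\, p_{j,a}\, x_j ,
  \qquad T^2 = \sum_{a=1}^{A} \frac{t_a^2}{s_a^2},\\[4pt]
\mathrm{cont}^{\mathrm{SPE}}_{j} = e_j^{\,2} = (x_j - \hat{x}_j)^2 ,
  \qquad \mathrm{SPE} = \sum_{j=1}^{J} e_j^{\,2}.
\end{aligned}
```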
Article
Full-text available
Data driven methods have been recognized as an efficient tool of multivariate statistical process control (MSPC). Contribution plots are also well known as the popular tool of principal components analysis (PCA), which is used for isolating sensor fault without need of any priori information. However, studies carried out in the literature unified contribution plots in three general approaches. Furthermore, they demonstrated that correct diagnosis based on contribution plots is not guaranteed for both single and multiple sensor faults. Therefore, to deal with this issue, the present paper highlights a new formula of contribution called relative variation of contribution (rVOC). Simulation results show that the proposed method of contribution can successfully perform the fault isolation task, in comparison with partial decomposition contribution (PDC) and its relative version (rPDC) based on their fault isolation rate (FIR).
... In order to evaluate the model performance of an observation, the Hotelling T 2 in the latent space and the Squared Prediction Error SPE are calculated [15]. The Hotelling T 2 statistic of an observation is the estimated squared Mahalanobis distance from the center of the latent subspace to the projection of such observation onto this subspace. ...
... Firstly, since the SMB-PLS also can model the orthogonal variations in process conditions by the second block of latent variables, it provides a great capability for diagnosing assignable causes of such variations. In fact, by interrogating the underlying SMB-PLS model, one can extract diagnostic or contribution plots which reveal the group of process conditions making the greatest contributions to the deviations in the squared prediction errors, and the scores [15,25]. In addition to that, the second block of latent variables provides a better understanding of the response variability with respect to both PLS and SMB-PLS for only [Z X corr ] (this increases the response variance percentage up to 95.87%). ...
Article
Full-text available
The Sequential Multi-Block Partial Least Squares (SMB-PLS) model inversion is applied to define analytically the multivariate raw material region providing assurance of quality with a certain confidence level for the critical to quality attributes (CQA). The SMB-PLS algorithm identifies the variation in process conditions uncorrelated with raw material properties and known disturbances, which is crucial to implement an effective process control system attenuating most raw material variations. This allows expanding the specification region and, hence, one may potentially be able to accept lower-cost raw materials that will yield products with perfectly satisfactory quality properties. The methodology can be used with historical/happenstance data, typical in Industry 4.0. This is illustrated using simulated data from an industrial case study.
... SPC is a well-known concept to monitor the condition of a process over time with the objective of detecting any anomalous process behavior that affects the performance of a process (Ferrer, 2007; Kourti & MacGregor, 1996, 2015). Within the discrete manufacturing domain, the practical application of SPC is typically based on univariate measurements of predefined quality characteristics of manufactured parts that are sampled in equidistant time intervals from the process (Montgomery, 2009). ...
... Within the discrete manufacturing domain, the practical application of SPC is typically based on univariate measurements of predefined quality characteristics of manufactured parts that are sampled in equidistant time intervals from the process (Montgomery, 2009). The literature agrees that this univariate post-process SPC scheme is outdated, since it ignores the large amount of available process data in today's data-rich manufacturing environments (Ferrer, 2014; Kourti & MacGregor, 1996; MacGregor, 1997; Woodall, 2017). Thus, in recent years, researchers in discrete manufacturing have encouraged a shift in this paradigm, moving from univariate post-process SPC to MSPC by using sensor data collected from the machining process that are analyzed with machine learning (ML) methods to evaluate the process condition (Biegel et al., 2022a, 2022b; Li et al., 2020; Qiu & Xie, 2021). ...
Article
Full-text available
Self-supervised learning has demonstrated state-of-the-art performance on various anomaly detection tasks. Learning effective representations by solving a supervised pretext task with pseudo-labels generated from unlabeled data provides a promising concept for industrial downstream tasks such as process monitoring. In this paper, we present SSMSPC, a novel approach for multivariate statistical in-process control (MSPC) based on self-supervised learning. Our motivation for SSMSPC is to leverage the potential of unsupervised representation learning by incorporating self-supervised learning into the general statistical process control (SPC) framework to develop a holistic approach for the detection and localization of anomalous process behavior in discrete manufacturing processes. We propose a pretext task called Location + Transformation prediction, where the objective is to classify both the type and the location of a randomly applied augmentation on a given time series input. In the downstream task, we follow the one-class classification setting and apply the Hotelling's T² statistic on the learned representations. We further propose an extension to the control chart view that combines metadata with the learned representations to visualize the anomalous time steps in the process data, which supports a machine operator in the root cause analysis. We evaluate the effectiveness of SSMSPC with two real-world CNC-milling datasets and show that it outperforms state-of-the-art anomaly detection approaches, achieving 100% and 99.6% AUROC, respectively. Lastly, we deploy SSMSPC at a CNC-milling machine to demonstrate its practical applicability when used as a process monitoring tool in a running process.
... Historical databases in industry do not often contain a sufficient number of abnormal samples for building supervised models; hence, a larger portion of the MSPM literature consists of unsupervised monitoring methods. Principal component analysis (PCA) is the most widely used unsupervised dimension reduction method employed in MSPM [2] ; however, PCA is known to possess two main drawbacks. First, the highest variance components derived by PCA might differ from the non-Gaussian latent components in the process; second, the linear dimension reduction offered by PCA may yield a poor representation of a non-linear process. ...
... In the presence of highly collinear or missing data, computation of T² may be problematic [2]. Consequently, after dimension reduction via PCA, T² may be computed using the first d PCs. ...
Article
Full-text available
Dimension reduction is an essential method used in multivariate statistical process monitoring for fault detection and diagnosis. Principal component analysis (PCA) and independent component analysis (ICA) are the most frequently used linear dimensional reduction tools, and the contribution plot is the most popular fault isolation method in the absence of any prior information on the faults. These methods, however, come with their shortcomings. The fault detection capability of linear methods may not be sufficient for non‐linear processes, and smearing effect is known to deteriorate the diagnostics obtained from contribution plots. While the fault detection rate may be increased by kernelized methods or deep artificial neural network models, tuning data‐dependent hyperparameter(s) and network structure with limited historical data is not an easy task. Furthermore, the resulting non‐linear models often do not directly possess fault isolation capability. In the current study, we aim to devise a novel method named ICApIso‐PCA, which offers non‐linear fault detection and isolation in a rather straightforward manner. The rationale of ICApIso‐PCA mainly involves building a non‐linear scores matrix, composed of principal component scores and high‐order polynomial approximated isomap embeddings, followed by implementation of the ICA‐PCA algorithm on this matrix. Applications on a toy dataset and the Tennessee Eastman plant show that the I² index from ICApIso‐PCA yields a high fault detection rate and offers accurate contribution plots with diminished smearing effects compared to those from traditional monitoring methods. Easy implementation and the potential for future research are further advantages of the proposed method.
... In order to evaluate the model performance when projecting the n-th observation z_n onto it, the Hotelling T² in the latent space, T²_n, and the Squared Prediction Error, SPE_zn, are calculated [20]: ...
... In the case of having one specification limit (i.e., the third scenario), Eq. (19) or Eq. (20), as appropriate, would be used at the α significance level. ...
Article
Full-text available
A novel methodology is proposed for defining multivariate raw material specifications providing assurance of quality with a certain confidence level for the critical to quality attributes (CQA) of the manufactured product. The capability of the raw material batches to produce a final product with CQAs within specifications is estimated before producing a single unit of the product and, therefore, can be used as a decision-making tool to accept or reject any new supplier raw material batch. The method is based on Partial Least Squares (PLS) model inversion, takes into account the prediction uncertainty, and can be used with historical/happenstance data, typical in Industry 4.0. The methodology is illustrated using data from three real industrial processes.
... Once a fault has been detected, a diagnosis is carried out to determine where in the installation the problem has appeared. For this, techniques such as Fisher discriminant analysis [5,6], neural networks, contribution plots [11], etc. can be applied. ...
... When these alarms occur, it is of interest to know which variable has contributed most to the fault. This can be done by applying the contribution plot [5,11]. ...
Conference Paper
This work aims to design a fault-tolerant control, that is, a system capable of detecting changes in the system that may cause the process to leave its normal operating mode, and of reconfiguring the controller so that the system keeps operating despite the faults. The system consists of a base controller, in this case a nonlinear model predictive controller, a fault detection and identification module based on principal component analysis (PCA), and a controller reconfiguration method, which in this case consists of replacing the failed sensors with a software sensor computed from information on the remaining system variables. The system has been tested on a wastewater treatment plant, and the results obtained show adequate operation despite the faults.
... In this way, the AMFCC method can adapt to OC conditions that are characterized by different optimal parameters for testing. Furthermore, a diagnostic procedure based on the contribution plot approach (Kourti and MacGregor, 1996;Kourti, 2005) is developed to identify the set of functional variables responsible for the OC condition. ...
Preprint
Full-text available
New data acquisition technologies allow one to gather huge amounts of data that are best represented as functional data. In this setting, profile monitoring assesses the stability over time of both univariate and multivariate functional quality characteristics. The detection power of profile monitoring methods could heavily depend on parameter selection criteria, which usually do not take into account any information from the out-of-control (OC) state. This work proposes a new framework, referred to as adaptive multivariate functional control chart (AMFCC), capable of adapting the monitoring of a multivariate functional quality characteristic to the unknown OC distribution, by combining p-values of the partial tests corresponding to Hotelling T²-type statistics calculated at different parameter combinations. Through an extensive Monte Carlo simulation study, the performance of AMFCC is compared with methods that have already appeared in the literature. Finally, a case study is presented in which the proposed framework is used to monitor a resistance spot welding process in the automotive industry. AMFCC is implemented in the R package funcharts, available on CRAN.
... Also, CM-based weak classifiers have been recently proposed [97]. Still, further un- or partly-explored research lines can easily be envisioned in this context: as examples, adapting algorithms like SIMCA, OC-PLS and PLS-DM for the analysis of non-linear data structures (relying, e.g., on the principle of non-linear kernel transformations [98]) or designing tools for the visualization of the importance or relevance of the recorded variables in SIMCA and UNEQ models [99] (exploiting, for instance, the ideas behind the well-established contribution plots [100] and/or the projection of pseudo-samples [101,102]) may represent intriguing subjects of study. Finally, another aspect that would be worth investigating is the possibility of evaluating the uncertainty associated with the classification of individual samples. ...
... Multivariate Statistical Process Control (MSPC) (Jackson & Mudholkar, 1979), (Kourti & MacGregor, 1996), (Kresta, MacGregor, & Marlin, 1991), (Westerhuis, Gurden, & Smilde, 2000), (Wise & Gallagher, 1996) is an attractive data-driven approach for such purposes, which is suitable to monitor complex processes with a high-dimensional data structure. The key idea of MSPC is subspace orthogonalization, where Principal Component Analysis (PCA) is often utilized. ...
Article
This paper presents a new process monitoring and fault diagnosis approach based on a modified Multivariate Statistical Process Control (MSPC) and evaluates its applicability to municipal wastewater treatment process monitoring. Firstly, a conventional MSPC, based on Principal Component Analysis (PCA), is adjusted to provide an easy-to-understand user interface and then a new yet simplified reconfigurable diagnostic model is introduced. The user interface that has been developed is designed to integrate MSPC seamlessly with existing process monitoring systems that use the so-called trend graphs. The proposed diagnostic model is constructed by aggregating small models with either one or two inputs, which enhances the tractability of the diagnostic model. The effectiveness of the modified MSPC is demonstrated through a series of offline and online experiments, using a set of real multivariate process data from a municipal wastewater treatment plant.
... latent) common cause events that are driving the process. Conventional univariate and multivariate SPC charts are not suitable to be used in these environments and, hence, several authors advocate the use of multivariate statistical process control based on latent variable models such as Principal Component Analysis (PCA) and Partial Least Squares (PLS) [1][2][3][5][6][7]. The major advantage of these models is their ability to utilize the information contained in all the measured variables simultaneously, resulting in much more powerful monitoring schemes for detecting deviations from normal operating conditions. ...
Article
Full-text available
The sequential multi-block partial least squares (SMB-PLS) is proposed for implementing a multivariate statistical process control scheme. This is of interest when the system is composed of several blocks following a sequential order and presenting correlated information, for instance, a raw material properties block followed by a process variables block that is manipulated according to raw material properties. The SMB-PLS uses orthogonalization to separate correlated information between blocks from orthogonal variations. This allows monitoring the system in different stages considering only the remaining orthogonal part in each block. Thus, the SMB-PLS increases interpretability and process understanding in model building (Phase I), since it provides deep insight into the nature of the system variations. Besides, it prevents any special cause from propagating to subsequent blocks, enabling their use in model exploitation (Phase II). The methodology is applied to a real case study from a food manufacturing process.
... Contribution plots are normally used for this purpose, as they provide a clear graphical representation of each variable's contribution to the anomaly score. In PCA, different methods exist to calculate the contributions when the process is monitored through Hotelling's T² (Miller et al., 1998; Westerhuis et al., 2000; Kourti & MacGregor, 1996). The main difference among those techniques lies in which terms of the linear combination defining a score are considered as contributions. ...
Preprint
Full-text available
This work introduces a novel control-aware distributed process monitoring methodology based on modules comprised of clusters of interacting measurements. The methodology relies on the process flow diagram (PFD) and control system structure without requiring cross-correlation data to create monitoring modules. The methodology is validated on the Tennessee Eastman Process benchmark using full Principal Component Analysis (f-PCA) in the monitoring modules. The results are comparable to nonlinear techniques implemented in a centralized manner such as Kernel PCA (KPCA), Autoencoders (AE), and Recurrent Neural Networks (RNN), or distributed techniques like the Distributed Canonical Correlation Analysis (DCCA). Temporal plots of fault detection by different modules show clearly the magnitude and propagation of the fault through each module, pinpointing the module where the fault originates, and separating controllable faults from other faults. This information, combined with PCA contribution plots, helps detection and identification as effectively as more complex nonlinear centralized or distributed methods.
... As prevalent as the unsupervised methods might be in industrial applications, still a sizable portion of the data analytics applications rely heavily on supervised methods. Besides the predictive and prescriptive models, even the schemes in SPC can be enhanced using supervised learning methods such as Partial Least Squares (PLS) (Kourti and MacGregor 1996). Yet, once again, the critical issue remains the availability of the output data. ...
... MSPM, nowadays, leans more towards data-based methods due to (i) the difficulty of deriving accurate mechanistic and/or knowledge-based models for nonlinear chemical processes, (ii) the recent developments in machine learning, and (iii) the existence of large historical databases of operational data in industry (Park et al. 2020a; Khalid et al. 2023). The traditional approach in MSPM consists of employing dimension reduction techniques, such as Principal Components Analysis (PCA) and Projection to Latent Structures (PLS) (Kourti and MacGregor 1996), on historical data in order to determine a latent variable subspace of "in-control" variation, and a residual subspace comprising noisy fluctuations along physical and/or operational constraints. Usually, each subspace is represented by a single surrogate variable (Hotelling's T² and SPE), which is monitored via a Shewhart-type univariate chart. ...
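As a rough illustration of how control limits are commonly set for these two surrogate statistics, the following is a generic textbook-style sketch under the usual normality assumptions, not the cited works' specific procedure; the reference-sample size, component count, and α level are assumptions.

```python
import numpy as np
from scipy import stats

def t2_limit(n_ref, n_comp, alpha=0.01):
    """F-distribution based upper control limit for Hotelling's T² (new observations)."""
    return (n_comp * (n_ref ** 2 - 1) / (n_ref * (n_ref - n_comp))) \
        * stats.f.ppf(1 - alpha, n_comp, n_ref - n_comp)

def spe_limit(spe_ref, alpha=0.01):
    """Chi-square (Box) approximation for the SPE upper control limit."""
    m, v = spe_ref.mean(), spe_ref.var(ddof=1)
    g, h = v / (2 * m), 2 * m ** 2 / v
    return g * stats.chi2.ppf(1 - alpha, h)

spe_ref = np.random.default_rng(3).chisquare(df=5, size=200)   # illustrative reference SPE values
print(t2_limit(n_ref=200, n_comp=3), spe_limit(spe_ref))
```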
Article
Current multivariate statistical process monitoring is mostly based on data-based models with the principal aim of detecting faults promptly. To increase fault detection performance, various methods, such as novel learners, sliding window-based methods, subspaces based on query point estimation residuals, and feature/component selection methods have been proposed. On the other hand, hierarchical and combined modeling have only been recently considered; furthermore, the online sampled observations, once assessed by the monitoring scheme, are not usually used again for fault detection. In the current study, we show how to obtain valuable information on faults via re-examining the recently sampled points in a conveniently built hierarchical monitoring scheme. The top level consists of a combination of a novel query point estimation method based on multilinear principal component analysis (PCA) and PCA model of the estimation residuals. Upon a warning signal from the upper level, the bottom level is implemented, that consists of retrospective PCA monitoring of the recently sampled observations, scaled with respect to estimation residuals. Implementation of the proposed scheme on a demonstrative process and Tennessee Eastman Plant data exhibits decrease both in fault detection delay and missed detection rate compared to both traditional and the recently proposed methods.
... In the multivariate setting, Nomikos and MacGregor [45,46] and Kourti and MacGregor [39] introduced methods to improve SPM of batch processes by monitoring the influence on quality characteristics of several covariates observed over a discrete time domain. Moreover, in their works, the authors addressed the problem of dimensionality reduction by projection methods of a multivariate domain, such as principal component analysis (PCA), and the joint monitoring of the covariates alone. ...
... Extending the framework to lower levels of granularity, i.e., to each independent record, is another clear path for future work, enabling the versioning of streaming datasets with single-record creation events. On this matter, the PCA framework already presents established schemes from Statistical Process Control for the online detection of unsupervised anomalies [18,28]. Finally, future research should also explore alternative data drift metrics, especially for datasets of reduced dimensions, given the results obtained with dataset DS 04 (Figs. 2, 3, 6, and 5). ...
Article
Full-text available
This paper presents a standardised dataset versioning framework for improved reusability, recognition and data version tracking, facilitating comparisons and informed decision-making for data usability and workflow integration. The framework adopts a software engineering-like data versioning nomenclature (“major.minor.patch”) and incorporates data schema principles to promote reproducibility and collaboration. To quantify changes in statistical properties over time, the concept of data drift metrics (d) is introduced. Three metrics (dP, dE,PCA, and dE,AE) based on unsupervised Machine Learning techniques (Principal Component Analysis and Autoencoders) are evaluated for dataset creation, update, and deletion. The optimal choice is the dE,PCA metric, combining PCA models with splines. It exhibits efficient computational time, with values below 50 for new dataset batches and values consistent with seasonal or trend variations. Major updates (i.e., values of 100) occur when scaling transformations are applied to over 30% of variables while efficiently handling information loss, yielding values close to 0. This metric achieved a favourable trade-off between interpretability, robustness against information loss, and computation time.
... Some other control charts designed for monitoring high-dimensional processes are based on the maximum, summation, or other summaries of the CUSUM charting statistics constructed for monitoring individual quality variables (cf., Mei (2010), Tartakovsky et al. (2006), Zou et al. (2015)). In addition, some PCA-based control charts have been developed for monitoring high-dimensional processes, where the PCA technique is used for reducing the dimensionality of quality variables (cf., Ferrer (2007), Jackson (1991), Kourti and MacGregor (1996)). ...
... Methods derived from contribution analysis, including complete decomposition contribution (CDC), partial decomposition contribution (PDC), and diagonal contribution (DC), have been proposed [15]. Kourti [16] conducted a contribution analysis for a high-pressure, low-density polyethylene reactor. Liu [17] proposed a modified contribution plot-based approach to reduce the back-burying effect of non-fault variables. ...
Article
Full-text available
Accurate and timely fault detection and isolation (FDI) improve the availability, safety, and reliability of target systems and enable cost-effective operations. In this study, a shared nearest neighbor (SNN)-based method is proposed to identify the fault variables of a circulating fluidized bed boiler. SNN is a derivative method of the k-nearest neighbor (kNN), which utilizes shared neighbor information. The distance information between these neighbors can be applied to FDI. In particular, the proposed method can effectively detect faults by weighing the distance values based on the number of neighbors they share, thereby readjusting the distance values based on the shared neighbors. Moreover, the data distribution is not constrained; therefore, it can be applied to various processes. Unlike principal component analysis and independent component analysis, which are widely used to identify fault variables, the main advantage of SNN is that it does not suffer from smearing effects, because it calculates the contributions from the original input space. The proposed method is applied to two case studies and to the failure case of a real circulating fluidized bed boiler to confirm its effectiveness. The results show that the proposed method can detect faults earlier (1 h 39 min 46 s) and identify fault variables more effectively than conventional methods.
... To further understand the dependency of the "Temp" response on the "Month, Year" factors, a model-driven multivariate T-square control chart [Mason (2002), Tracy (1992), Kourti (1996), Nomikos (1995)] is shown in Fig. 8. From the left T-square chart, the biggest outlier occurred on 12/1989. ...
... Concerning this aspect, it must be noticed that, in Ref. [15], Wold and Sjöström had already defined measures for the discriminant and modelling power of a variable, but both these measures have to be somehow adapted before they can be exploited in the framework of the most recent SIMCA variants. Alternatively, graphical representations such as the well-established contribution plots [73] could be resorted to, but, to the best of the authors' knowledge, they are not readily suitable for dealing with joint distance indices like d, c or f. ...
Article
This article contains a comprehensive tutorial on classification by means of Soft Independent Modelling of Class Analogy (SIMCA). Such a tutorial was conceived in an attempt to offer pragmatic guidelines for a sensible and correct utilisation of this tool as well as answers to three basic questions: "why employ SIMCA?", "when to employ SIMCA?" and "how to employ/not employ SIMCA?". With this purpose in mind, the following points are here addressed: i) the mathematical and statistical fundamentals of the SIMCA approach are presented; ii) distinct variants of the original SIMCA algorithm are thoroughly described and compared in two different case-studies; iii) a flowchart outlining how to fine-tune the parameters of a SIMCA model for achieving an optimal performance is provided; iv) figures of merit and graphical tools for SIMCA model assessment are illustrated and v) computational details and rational suggestions about SIMCA model validation are given. Moreover, a novel Matlab toolbox, which encompasses routines and functions for running and contrasting all the aforementioned SIMCA versions, is also made available.
... In 2016, Camacho et al. proposed the use of the Multivariate Statistical Network Monitoring (MSNM) framework [9] [15] as an improvement to previous PCA proposals. In essence, MSNM is an adaptation from a sibling framework traditionally used in the field of industrial process control, known as MSPC (Multivariate Statistical Process Control) [24] [25] [26]. In order to face the particularities of the networking field, MSNM adapted the MSPC methodology to introduce new data pre-processing strategies and processing steps, like the deparsing of network traces [11]. ...
Preprint
Full-text available
Network anomaly detection is a very relevant research area nowadays, especially due to its multiple applications in the field of network security. The boost of new models based on variational autoencoders and generative adversarial networks has motivated a reevaluation of traditional techniques for anomaly detection. It is, however, essential to be able to understand these new models from the perspective of the experience attained from years of evaluating network security data for anomaly detection. In this paper, we revisit anomaly detection techniques based on PCA from a probabilistic generative model point of view, and contribute a mathematical model that relates them. Specifically, we start with the probabilistic PCA model and explain its connection to the Multivariate Statistical Network Monitoring (MSNM) framework. MSNM was recently successfully proposed as a means of incorporating industrial process anomaly detection experience into the field of networking. We have evaluated the mathematical model using two different datasets. The first, a synthetic dataset created to better understand the analysis proposed, and the second, UGR'16, is a specifically designed real-traffic dataset for network security anomaly detection. We have drawn conclusions that we consider to be useful when applying generative models to network security detection.
... The CD index decomposes a particular statistic into its contributing components, such that the sum of all variable contributions yields the value of the detection statistic itself. The application of contribution plots for statistical process control (SPC) was introduced for batch processes by the authors in [33,34] . It has since been successfully implemented in many industrial applications. ...
Article
Full-text available
Multiscale PCA (MSPCA) is a well-established fault-detection and isolation (FDI) technique. It utilizes wavelet analysis and PCA to extract important features from process data. This study demonstrates limitations in the conventional MSPCA fault detection algorithm, thereby proposing an enhanced MSPCA (EMSPCA) FDI algorithm that uses a new wavelet thresholding criterion. As such, it improves the projection of faults in the residual space and the threshold estimation of the fault detection statistic. When tested with a synthetic model, EMSPCA resulted in a 30% improvement in detection rate with equal false alarm rates. The EMSPCA algorithm also relies on the novel application of reconstruction-based fault isolation at multiple scales. The proposed algorithm reduces fault smearing and consequently improves fault isolation performance. The paper will further investigate the use of soft vs. hard wavelet thresholding, decimated vs. undecimated wavelet transforms, the choice of wavelet decomposition depth, and their implications on FDI performance. The FDI performance of the developed EMSPCA method was illustrated for sensor faults. This undertaking considered synthetic data, the simulated data of a continuous stirred-tank reactor (CSTR), and experimental data from a packed-bed pilot plant. The results of these examples show the advantages of EMSPCA over existing techniques.
... With respect to this objective, stable and controlled processes embody a key ingredient in the avoidance of unplanned downtimes and quality issues, and are thus mandatory to remain competitive [1]. Statistical Process Control (SPC) is a well-known framework for eliminating process variability and detecting abnormal process behaviour [2,3]. The common procedure in SPC involves the application of univariate control charts, e.g. ...
Conference Paper
Full-text available
Detecting abnormal conditions in manufacturing processes is a crucial task to avoid unplanned downtimes and prevent quality issues. The increasing amount of available high-frequency process data combined with advances in the field of deep autoencoder-based monitoring offers huge potential in enhancing the performance of existing Multivariate Statistical Process Control approaches. We investigate the application of deep autoencoder-based monitoring approaches and experiment with the reconstruction error and the latent representation of the input data to compute Hotelling's T² and Squared Prediction Error monitoring statistics. The investigated approaches are validated using a real-world sheet metal forming process and show promising results.
... With respect to this objective, stable and controlled processes embody a key ingredient in the avoidance of unplanned downtimes and quality issues, and are thus mandatory to remain competitive [1]. Statistical Process Control (SPC) is a well-known framework for eliminating process variability and detecting abnormal process behaviour [2,3]. The common procedure in SPC involves the application of univariate control charts, e.g. ...
Article
Detecting abnormal conditions in manufacturing processes is a crucial task to avoid unplanned downtimes and prevent quality issues. The increasing amount of available high-frequency process data combined with advances in the field of deep autoencoder-based monitoring offers huge potential in enhancing the performance of existing Multivariate Statistical Process Control approaches. We investigate the application of deep autoencoder-based monitoring approaches and experiment with the reconstruction error and the latent representation of the input data to compute Hotelling's T² and Squared Prediction Error monitoring statistics. The investigated approaches are validated using a real-world sheet metal forming process and show promising results.
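A minimal sketch of the idea summarized above, using a small scikit-learn MLP as a stand-in autoencoder (the data, network size, and bottleneck choice are assumptions, not the authors' architecture): Hotelling's T² is computed on the bottleneck activations and an SPE-type statistic on the reconstruction error.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X_ref = rng.normal(size=(500, 12))                        # illustrative in-control data
scaler = StandardScaler().fit(X_ref)
Z = scaler.transform(X_ref)

# Simple autoencoder: train the network to reconstruct its own (scaled) input
ae = MLPRegressor(hidden_layer_sizes=(8, 3, 8), activation="tanh",
                  max_iter=3000, random_state=0).fit(Z, Z)

def latent_codes(model, Z):
    """Forward pass up to the bottleneck (second hidden) layer."""
    a = Z
    for W, b in list(zip(model.coefs_, model.intercepts_))[:2]:
        a = np.tanh(a @ W + b)
    return a

T_ref = latent_codes(ae, Z)
mu = T_ref.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(T_ref, rowvar=False))

def monitor(x_new):
    z = scaler.transform(x_new.reshape(1, -1))
    t = latent_codes(ae, z) - mu
    T2 = (t @ cov_inv @ t.T).item()                       # T² on the latent representation
    SPE = ((z - ae.predict(z)) ** 2).sum()                # reconstruction-error statistic
    return T2, SPE

print(monitor(rng.normal(size=12)))
```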
... Two statistical metrics, the Hotelling T² and the Q-statistic, are used for process monitoring. T² measures the variability of each sample (Kourti and MacGregor [1996]) and is calculated as follows: ...
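The snippet is cut off before its equation; a standard form of the two statistics (the usual textbook definitions, with notation assumed here) is, for an observation with retained scores t_a, score variances λ_a and residual vector e:

```latex
T^2 = \sum_{a=1}^{A} \frac{t_a^{2}}{\lambda_a},
\qquad
Q \;(=\mathrm{SPE}) = \mathbf{e}^{\mathsf{T}}\mathbf{e} = \sum_{j=1}^{J} (x_j - \hat{x}_j)^2 .
```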
Thesis
Full-text available
The last decade has seen remarkable advances in speech, image, and language recognition tools that have been made available to the public through computer and mobile devices’ applications. Most of these significant improvements were achieved by Artificial Intelligence (AI)/ deep learning (DL) algorithms (Hinton et al., 2006) that generally refers to a set of novel neural network architectures and algorithms such as long-short term memory (LSTM) units, convolutional networks (CNN), autoencoders (AE), t-distributed stochastic embedding (TSNE), etc. Although neural networks are not new, due to a combination of relatively novel improvements in methods for training the networks and the availability of increasingly powerful computers, one can now model much more complex nonlinear dynamic behaviour by using complex structures of neurons, i.e. more layers of neurons, than ever before (Goodfellow et al., 2016). However, it is recognized that the training of neural nets of such complex structures requires a vast amount of data. In this sense manufacturing processes are good candidates for deep learning applications since they utilize computers and information systems for monitoring and control thus generating a massive amount of data. This is especially true in pharmaceutical companies such as Sanofi Pasteur, the industrial collaborator for the current study, where large data sets are routinely stored for monitoring and regulatory purposes. Although novel DL algorithms have been applied with great success in image analysis, speech recognition, and language translation, their applications to chemical processes and pharmaceutical processes, in particular, are scarce. The current work deals with the investigation of deep learning in process systems engineering for three main areas of application: (i) Developing a deep learning classification model for profit-based operating regions. (ii) Developing both supervised and unsupervised process monitoring algorithms. (iii) Observability Analysis It is recognized that most empirical or black-box models, including DL models, have good generalization capabilities but are difficult to interpret. For example, using these methods it is difficult to understand how a particular decision is made, which input variable/feature is greatly influencing the decision made by the DL models etc. This understanding is expected to shed light on why biased results can be obtained or why a wrong class is predicted with a higher probability in classification problems. Hence, a key goal of the current work is on deriving process insights from DL models. To this end, the work proposes both supervised and unsupervised learning approaches to identify regions of process inputs that result in corresponding regions, i.e. ranges of values, of process profit. Furthermore, it will be shown that the ability to better interpret the model by identifying inputs that are most informative can be used to reduce over-fitting. To this end, a neural network (NN) pruning algorithm is developed that provides important physical insights on the system regarding the inputs that have positive and negative effect on profit function and to detect significant changes in process phenomenon. It is shown that pruning of input variables significantly reduces the number of parameters to be estimated and improves the classification test accuracy for both case studies: the Tennessee Eastman Process (TEP) and an industrial vaccine manufacturing process. 
The ability to store a large amount of data has permitted the use of deep learning (DL) and optimization algorithms for the process industries. In order to meet high levels of product quality, efficiency, and reliability, a process monitoring system is needed. The two aspects of Statistical Process Control (SPC) are fault detection and diagnosis (FDD). Many multivariate statistical methods like PCA and PLS and their dynamic variants have been extensively used for FD. However, the inherent non-linearities in the process pose challenges while using these linear models. Numerous deep learning FDD approaches have also been developed in the literature. However, the contribution plots for identifying the root cause of the fault have not been derived from Deep Neural Networks (DNNs). To this end, the supervised fault detection problem in the current work is formulated as a binary classification problem while the supervised fault diagnosis problem is formulated as a multi-class classification problem to identify the type of fault. Then, the application of the concept of explainability of DNNs is explored with its particular application in FDD problem. The developed methodology is demonstrated on TEP with non-incipient faults. Incipient faults are faulty conditions where signal to noise ratio is small and have not been widely studied in the literature. To address the same, a hierarchical dynamic deep learning algorithm is developed specifically to address the issue of fault detection and diagnosis of incipient faults. One of the major drawbacks of both the methods described above is the availability of labeled data i.e. normal operation and faulty operation data. From an industrial point of view, most data in an industrial setting, especially for biochemical processes, is obtained during normal operation and faulty data may not be available or may be insufficient. Hence, we also develop an unsupervised DL approach for process monitoring. It involves a novel objective function and a NN architecture that is tailored to detect the faults effectively. The idea is to learn the distribution of normal operation data to differentiate among the fault conditions. In order to demonstrate the advantages of the proposed methodology for fault detection, systematic comparisons are conducted with Multiway Principal Component Analysis (MPCA) and Multiway Partial Least Squares (MPLS) on an industrial scale Penicillin Simulator. Past investigations reported that the variability in productivity in the Sanofi's Pertussis Vaccine Manufacturing process may be highly correlated to biological phenomena, i.e. oxidative stresses, that are not routinely monitored by the company. While the company monitors and stores a large amount of fermentation data it may not be sufficiently informative about the underlying phenomena affecting the level of productivity. Furthermore, since the addition of new sensors in pharmaceutical processes requires extensive and expensive validation and certification procedures, it is very important to assess the potential ability of a sensor to observe relevant phenomena before its actual adoption in the manufacturing environment. This motivates the study of the observability of the phenomena from available data. An algorithm is proposed to check the observability for the classification task from the observed data (measurements). The proposed methodology makes use of a Supervised AE to reduce the dimensionality of the inputs. 
Thereafter, a criterion on the distance between the samples is used to calculate the percentage of overlap between the defined classes. The proposed algorithm is tested on the benchmark Tennessee Eastman process and then applied to the industrial vaccine manufacturing process.
Chapter
A product, from demand to delivery, must go through two stages: design and manufacturing. At present, the more mature reliability concepts in academia are aimed at the design stage, whereas research on the reliability of the manufacturing process is still insufficient. Manufacturing process reliability can be defined as the ability of equipment (production lines, process systems, etc.) to maintain the indicators that guarantee product quality under a specified time and environment. Its basic connotation is to control and improve certain process indicators in the equipment (production lines, process systems, etc.), so that the product can meet the specified reliability requirements.
Article
The study introduces three novel strategies for incorporating capabilities for dynamic modelling into multiblock regression methods by integrating sequentially orthogonalised partial least squares (SO‐PLS) with different dynamic modelling techniques. The study evaluates these strategies using synthetic datasets and an industrial example, comparing their performance in predictive ability, identification of process dynamics, and quantification of block contributions. Results suggest that these approaches can effectively model the dynamics with performance comparable to state‐of‐the‐art methods, providing, at the same time, insight into the dynamic order and block contributions. One of the strategies, sequentially orthogonalised dynamic augmented (SODA)–PLS, shows promise by ensuring that redundant information in the time dimension is not included, resulting in simpler and more easily interpretable dynamic models. These multiblock dynamic regression strategies have potential applications for improved process understanding in industrial settings, especially where multiple data sources and inherent time dynamics are present.
Article
To address the problem that dynamic factor analysis (DFA) misses tiny faults, a novel fault detection and diagnosis method based on DFA and a sliding window combined with the mean square error (DFA-SWMSE) is proposed. Firstly, the data matrix is augmented by introducing time-lag shifts. Secondly, factor analysis (FA) is applied to the augmented data matrix, achieving dimensionality reduction and feature extraction while retaining most of the original data's information. Then, the sliding window technique is applied to calculate the mean square error of the dimensionally reduced data, allowing for the monitoring of the system's current state and the detection of tiny faults. Finally, effective fault diagnosis is achieved through the analysis of fault factors and variable contributions. The proposed method is validated using a complex dynamic numerical example and a three-tank system process named Sim3Tanks. This system has gained widespread application in the field of process fault detection due to its ability to simulate and generate various types of faults. The proposed method is compared with principal component analysis (PCA), dynamic principal component analysis (DPCA), PCA similarity factor (SPCA), FA, and DFA. The experimental results thoroughly validate the effectiveness of the proposed method in detecting and diagnosing tiny faults in dynamic processes.
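As a rough sketch of two of the steps described above, namely time-lag augmentation of the data matrix and a sliding-window mean-square-error series, using assumed toy data and scikit-learn's FactorAnalysis as a stand-in for the paper's DFA step (the residual-based MSE shown here is one plausible reading, not the authors' exact DFA-SWMSE procedure):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def lag_augment(X, lags=2):
    """Stack X(t), X(t-1), ..., X(t-lags) column-wise (dynamic augmentation)."""
    n = X.shape[0] - lags
    return np.hstack([X[lags - k: lags - k + n] for k in range(lags + 1)])

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))                     # illustrative process data
Xa = lag_augment(X, lags=2)                       # (298, 15) augmented matrix

fa = FactorAnalysis(n_components=4).fit(Xa)
Xa_hat = fa.transform(Xa) @ fa.components_ + fa.mean_   # reconstruction from factors
mse = ((Xa - Xa_hat) ** 2).mean(axis=1)                 # per-sample mean square error

window = 20
sw_mse = np.convolve(mse, np.ones(window) / window, mode="valid")  # sliding-window MSE
print(sw_mse[:5])
```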
Chapter
Full-text available
In recent years, the wastewater treatment field has undergone an instrumentation revolution. Thanks to increased efficiency of communication networks and extreme reductions in data storage costs, wastewater plants have entered the era of big data. Meanwhile, artificial intelligence and machine learning tools have enabled the extraction of valuable information from large-scale datasets. Despite this potential, the successful deployment of AI and automation depends on the quality of the data produced and the ability to analyze it usefully in large quantities. Metadata, including a quantification of the data quality, is often missing, so vast amounts of collected data quickly become useless. Ultimately, data-dependent decisions supported by machine learning and AI will not be possible without data readiness skills accounting for all the Vs of big data: volume, velocity, variety, and veracity. Metadata Collection and Organization in Wastewater Treatment and Wastewater Resource Recovery Systems provides recommendations to handle these challenges, and aims to clarify metadata concepts and provide advice on their practical implementation in water resource recovery facilities. This includes guidance on the best practices to collect, organize, and assess data and metadata, based on existing standards and state-of-the-art algorithmic tools. This Scientific and Technical Report offers a great starting point for improved data management and decision making, and will be of interest to a wide audience, including sensor technicians, operational staff, data management specialists, and plant managers. ISBN: 9781789061147 (Paperback) ISBN: 9781789061154 (eBook) ISBN: 9781789061161 (ePub)
Article
The service performance of an industrial bearing may deteriorate sharply after a sudden transition that is very difficult to capture before it appears. It is accordingly valuable to estimate this critical transition to avoid serious catastrophes caused by the faulty bearing. To address this issue, an asynchronous gated recurrent network (AGRN) is proposed to estimate the critical transition of the bearing deterioration. A gated recurrent unit (GRU) is first developed to extract prognostic data as features using a few early samples. Chi-square distributions of the squared prediction error of the residuals and the variation of principal components calculated from the features are fused into a statistic series, from which the critical threshold is calculated adaptively. Another GRU is proposed to forecast the deterioration process using the fused statistic series. The forthcoming transition can be estimated from this forecast together with the critical threshold at the initial operation. The present AGRN is evaluated in lifecycle experiments on three bearings, all from public benchmark datasets. Results show that the proposed method is robust in providing maintenance response time for bearings before they transition to critical failures.
Chapter
Rapid advances in imaging technology have made it possible to collect large amounts of image data in a cost‐effective manner. As a result, images are widely adopted for quality control purposes in the manufacturing industry. In image‐based quality control, images from a production process are collected over time, and the information such as product geometry or surface finish extracted from these images is used to determine whether the manufactured products satisfy the quality requirements. This is a challenging high‐velocity high‐volume big data problem. First, image streams normally generate image data at a high rate, so it is imperative to process each image quickly. Second, images often have complicated spatial structures such as edges and singularities, which render many traditional process monitoring methods inapplicable. Third, a typical image contains tens of thousands of pixels, so the data is high‐dimensional. It has been shown in the literature that conventional multivariate control charts have limited power of detecting process shifts when the data dimension is high. In this expository article, we divide the image monitoring applications into two categories: (i) images with deterministic features and (ii) images with stochastic features. We introduce representative methods in the two categories and discuss their potential to solve the problems in image monitoring. Some recent research in color image monitoring is discussed as well. Suggestions for future research and possible applications of image monitoring methods beyond industrial quality control are given in the end.
Article
Supervised learning methods, commonly used for process monitoring, require labeled historical datasets for the normal condition as well as for each faulty condition, which demands significant data-mining effort. This article proposes a methodology combining principal component analysis (PCA) with the k-means clustering algorithm to automate fault detection and diagnosis from unlabeled data. The k-means algorithm is used for fault detection and diagnosis, with PCA used for data mining; PCA is able to precisely detect and diagnose faults from a large set of unlabeled historical data. The proposed method improves online diagnosis by using the clustering algorithm in the monitoring stage. Based on the Euclidean distance between each observation and the cluster centroids of the training data, the k-means clustering algorithm decides whether the process is in a normal state or belongs to a particular faulty state. To illustrate the effectiveness of the methodology, the proposed method is applied to two industrial processes: (i) a separator unit from an offshore gas processing platform and (ii) a distillation column of a crude refining unit. The results show that the proposed method avoids the data-labeling exercise and is effective in detecting and diagnosing faults in large-scale industrial processes.
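As a rough illustration of the pipeline described above (not the authors' implementation), the following sketch projects unlabeled historical data onto a PCA subspace, clusters the scores with k-means, and assigns a new observation to the nearest cluster centroid; the data, the retained variance and the number of clusters are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Historical (unlabeled) data: rows = observations, columns = process variables.
X_hist = np.random.default_rng(0).normal(size=(500, 12))   # placeholder data

# 1) Compress the data with PCA (retain components explaining ~90% of the variance).
scaler = StandardScaler().fit(X_hist)
pca = PCA(n_components=0.90).fit(scaler.transform(X_hist))
T_hist = pca.transform(scaler.transform(X_hist))

# 2) Cluster the scores; each cluster is a candidate normal or faulty state.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(T_hist)

# 3) Online monitoring: assign a new observation to the nearest centroid.
def diagnose(x_new):
    t_new = pca.transform(scaler.transform(x_new.reshape(1, -1)))
    distances = np.linalg.norm(kmeans.cluster_centers_ - t_new, axis=1)
    return int(np.argmin(distances)), distances.min()

state, dist = diagnose(X_hist[0])
```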
Article
Although Phase I analysis of multivariate processes has been extensively discussed, techniques for Phase I monitoring of high-dimensional processes remain limited. In high-dimensional applications it is common that, although a large number of components are observed, only a limited number of them change at the same time. The shifted components are often sparse and unknown a priori in practice. Motivated by this, this article studies Phase I monitoring of high-dimensional process mean vectors under an unknown sparsity level of shifts. The basic idea of the proposed monitoring scheme is to first employ the false discovery rate procedure to estimate the sparsity level of mean shifts, and then to monitor the mean changes based on the maximum of the directional likelihood ratio statistics over all possible shift directions. Comparison results based on extensive simulations favor the proposed monitoring scheme. A real example is presented to illustrate the implementation of the new monitoring scheme.
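The false discovery rate step mentioned here is commonly implemented with a Benjamini–Hochberg-type procedure. The sketch below shows that generic procedure, assuming one p-value per monitored component; it is not the authors' exact scheme.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of rejected hypotheses (candidate shifted components)."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m       # k * alpha / m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True                         # reject the k smallest p-values
    return rejected

# Example with placeholder p-values: the number of True entries estimates the sparsity level.
mask = benjamini_hochberg([0.001, 0.40, 0.03, 0.76, 0.002])
```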
Article
Latent variable approaches are among the state-of-the-art methodologies for industrial process monitoring. Their robustness, ease of implementation, theoretical maturity, and computational efficiency make them privileged candidates for industrial adoption when compared to other alternatives. As a ubiquitous feature of industrial processes, system dynamics is incorporated in latent variable frameworks in two distinct ways: explicitly, in terms of the observed variables, or implicitly, in the latent variable domain. This modeling aspect has been absent from discussion in the technical literature, but ends up having an impact on monitoring performance. In this work, we focus our analysis on two state-of-the-art classes of dynamic latent variable models that typify each modeling perspective: dynamic principal component analysis with decorrelated residuals (DPCA-DR) and Dynamic-Inner Canonical Correlation Analysis (DiCCA). The benchmark system consists of a biodiesel production unit, where process degradation also takes place and several types of faults are realistically simulated (code publicly available). We have identified some poorly known limitations of these state-of-the-art methods, such as reduced sensitivity due to fault adaptation and their total inability to handle integrating systems. Furthermore, the results obtained, complemented with a theoretical analysis of the two methods, robustly point to an advantage of DPCA-DR for detecting sensor faults.
Article
Full-text available
The boiler is an essential energy conversion facility in a thermal power plant. One small malfunction or abnormal event can bring huge economic losses and casualties. Accurate and timely detection of abnormal events in boilers is therefore crucial for the safe and economical operation of complex thermal power plants. Data-driven fault diagnosis methods based on statistical process monitoring technology have prevailed in thermal power plants, but their false alarm rates are relatively high. To work around this, this paper proposes a novel fault detection and identification method for the furnace negative pressure system based on canonical variable analysis (CVA) and eXtreme Gradient Boosting improved by genetic algorithms (GA-XGBoost). First, CVA is used to reduce data redundancy and to construct the canonical residuals that measure the prediction ability of the state variables. Then, the fault detection model based on GA-XGBoost is built using the constructed canonical residual variables; in particular, GA is introduced to determine the optimal hyperparameters of XGBoost and to speed up convergence. Next, the paper presents a novel fault identification method based on reconstructed contribution statistics, considering the contributions of the state space, the residual space and the canonical residual space. The proposed statistic assigns different weights to the state vectors, the residual vectors and the canonical residual vectors to improve sensitivity to faulty variables. Finally, real industrial data from the boiler furnace negative pressure system of a thermal power plant are used to demonstrate the ability of the proposed method. The results show that the method is accurate and efficient in detecting and identifying the faults of a real boiler.
Article
A novel process monitoring method based on a convolutional neural network (CNN) is proposed and applied to detect faults in industrial processes. By utilizing the CNN algorithm, cross-correlation and autocorrelation among variables are captured to establish a prediction model for each process variable, approximating the first-principles physical/chemical relationships among variables under normal operating conditions. When the process is operated under the pre-set operating conditions, the prediction residuals can be treated as noise if a proper model is employed. Once process faults occur, the residuals increase because the correlations among variables change. A principal component analysis (PCA) model built on the residuals is then used for process monitoring. By monitoring changes in the main features of the prediction residuals, faults can be promptly detected. Case studies on a numerical nonlinear example and data from two industrial processes are presented to validate the performance of CNN-based process monitoring.
Article
Full-text available
Polyethylene (PE) is the most widespread polymer and also the most studied by macromolecular scientists. In 1990, world polyethylene production was estimated at approximately 25 × 10⁶ tonnes per year: 65% of this was low-density PE, made in high-pressure reactors, and 35% was high-density homopolymer and linear low-density polyethylene produced in low-pressure reactors.
Article
Full-text available
This work considers the application of several related multivariate data analysis techniques to the monitoring and modeling of dynamic processes. Included are the method of Principal Components Analysis (PCA) and the regression technique Continuum Regression (CR), which encompasses Principal Components Regression (PCR), Partial Least Squares (PLS) and Multiple Linear Regression (MLR), all of which are based on eigenvector decompositions. It is shown that proper application of PCA to the measurements from multivariate processes can facilitate the detection of failed sensors and process upsets. The relationship between PCA and the state-space process model form is shown, providing a theoretical basis for the use of PCA in dynamic systems. For processes with more measurements than states ...
Article
A very common problem in the mining industry today is obtaining frequent and reliable on-line measurements of process variables, i.e. information that to some extent expresses the quality of the material in the different process streams. Paradoxically, modern control systems deliver raw data, often several hundred values every minute, which form the basis for the operator's decisions. As a consequence, the operator is missing some important real-time information while being overloaded with other data. This makes it impossible for the operator to run the plant at an optimal point, and in the economic cut and thrust of today's mineral industry the necessity to be competitive has never been greater.
Article
Multivariate process control problems are inherently more difficult than univariate problems. It is not always clear what type of multivariate statistic should be used, and the most statistically powerful techniques do not indicate the cause(s) of a signal. On the other hand, separate controls on the individual variables are more easily interpretable but may be substantially less powerful, particularly in the face of appreciable correlation between the measures. Previous research has demonstrated the effectiveness of methods that capitalize on the likely nature of a departure from control. If only one variable is likely to undergo a shift in mean or variance then charting of each variable adjusted by regression for all others is particularly effective.
Article
When p correlated process characteristics are being measured simultaneously, often individual observations are initially collected. The process data are monitored and special causes of variation are identified in order to establish control and to obtain a “clean” reference sample to use as a basis in determining the control limits for future observations. One common method of constructing multivariate control charts is based on Hotelling's T² statistic. Currently, when a process is in the start-up stage and only individual observations are available, approximate F and chi-square distributions are used to construct the necessary multivariate control limits. These approximations are conservative in this situation. This article presents an exact method, based on the beta distribution, for constructing multivariate control limits at the start-up stage. An example from the chemical industry illustrates that this procedure is an improvement over the approximate techniques, especially when the number of subgroups is small.
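For orientation, the exact beta-distribution limit for individual observations at the start-up (Phase I) stage is usually quoted in the following form, where m is the number of individual observations, p the number of characteristics and B_{1-α}(a, b) the 1−α quantile of a beta distribution with parameters a and b; the notation is generic and should be checked against the paper itself.

\[
\mathrm{UCL} \;=\; \frac{(m-1)^2}{m}\, B_{1-\alpha}\!\left(\frac{p}{2},\; \frac{m-p-1}{2}\right)
\]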
Article
A new type of control chart, which can automatically select certain assignable causes to control, thereby narrowing the search and achieving much better results, is proposed here. Based on these newer charts, a theory of diagnosis with control charts is also presented; it has been used successfully in various organizations in China to diagnose concretely what is responsible for nonconforming parts, reducing production costs and improving the quality of products or services.
Article
Multivariate control charts using Hotelling's T2 statistic are popular and easy to use but interpreting their signals can be a problem. Identifying which characteristic or group of characteristics is out of control when the chart signals often necessitates an examination of the univariate charts for each variable. It is shown in this paper that the interpretation of a signal from a T2 statistic is greatly aided if the corresponding value is partitioned into independent parts. Information on which characteristic is significantly contributing to the signal is readily available from this decomposition.
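The partitioning referred to can be written, in one standard form, as a sum of p independent terms, where the first term is the usual statistic for a single characteristic and each subsequent conditional term measures the contribution of a characteristic adjusted for those preceding it; this is the generic decomposition rather than a quotation from the paper.

\[
T^2 \;=\; T_1^2 + T_{2\cdot 1}^2 + T_{3\cdot 1,2}^2 + \cdots + T_{p\cdot 1,2,\ldots,p-1}^2
\]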
Article
Principal Components Analysis (PCA) and Partial Least Squares (PLS) are two multivariate techniques that can be used to analyze data from processes with many sensors. PCA can be used to determine groups of variables that are descriptive of common process variations and upsets. Multivariate Statistical Process Control (MSPC) charts can be produced using the combinations of variables determined by PCA. PLS can be used to obtain calibrations between process variables that can be developed into process state predictors and sensor fault detectors. The authors demonstrate these techniques using data from a Liquid Fed Ceramic Melter process and show how they can be used to enhance the feedback control of processes with many sensors.
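A minimal sketch of such PCA-based MSPC charting, assuming standardized data and an arbitrary number of retained components (all names and data below are placeholders, not the Liquid Fed Ceramic Melter variables): compute the scores, Hotelling's T² in the score space, and the squared prediction error (SPE, also called Q) for each new observation.

```python
import numpy as np
from numpy.linalg import svd

def fit_pca_model(X, n_components):
    """Fit a PCA monitoring model on normal operating data X (n x p)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std
    U, s, Vt = svd(Xs, full_matrices=False)
    P = Vt[:n_components].T                            # loadings (p x A)
    lam = (s[:n_components] ** 2) / (X.shape[0] - 1)   # variances of the scores
    return mean, std, P, lam

def monitor(x_new, mean, std, P, lam):
    """Return Hotelling's T^2 (score space) and SPE/Q for one observation."""
    xs = (x_new - mean) / std
    t = P.T @ xs                                       # scores
    t2 = np.sum(t ** 2 / lam)                          # T^2 on the scores
    residual = xs - P @ t                              # part not explained by the model
    spe = residual @ residual                          # squared prediction error
    return t2, spe

rng = np.random.default_rng(1)
X_noc = rng.normal(size=(200, 8))                      # placeholder "in-control" data
model = fit_pca_model(X_noc, n_components=3)
t2, spe = monitor(X_noc[0], *model)
```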
Article
Due to the lack of on-line sensors many important quality variables in pulp and paper manufacture are only measured infrequently and off-line in a quality control laboratory. Hence, there is a great incentive to build inferential models from plant data that are capable of predicting these quality variables on a more frequent basis. Such models can be used to monitor the process operation or, with suitable precautions, to build inferential controllers for these variables. In certain situations these models can also be used to improve our understanding of the effect of various process variables. Although the preferable way to collect process data for building such models is through statistically designed plant experiments, normal process operating records provide a good historical data base. In this paper we investigate the use of artificial neural networks and partial least squares regression to build empirical models for Kappa number using historical data from a continuous Kamyr digester. The basic ideas behind the two approaches will be presented and their advantages and disadvantages discussed. The predictive abilities of the resulting models and their limitations are evaluated using additional data from the digester.
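A minimal soft-sensor sketch in the spirit of the PLS approach described above; the data, variable names and number of latent variables are placeholders, and the paper's digester data are not reproduced here.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Inferential ("soft") sensor sketch: predict an infrequently measured quality
# variable (e.g. a laboratory Kappa number) from routinely measured process variables.
rng = np.random.default_rng(2)
X_process = rng.normal(size=(300, 15))                 # frequent process measurements
y_lab = X_process[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=300)

pls = PLSRegression(n_components=4).fit(X_process, y_lab)
y_hat = pls.predict(X_process[-5:])                    # frequent, model-based estimates
```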
Article
Although quality is often measured by the joint level of several correlated random variables, it is common to find in practice control decisions based on individual Shewhart charts. Hotelling (1947) proposed the use of a multivariate control chart based on his T² statistic, assuming the underlying distribution of the random variables is multivariate normal. However, the lack of a simple method of 'out of control' variable selection has until now made the routine use of T² charts impractical. In this paper we provide a simple test for selecting out-of-control variables and an interpretation of T² values. The ideas are motivated by regarding quality control as an example of Discriminant Analysis. These ideas are illustrated on a bivariate example. Then an example of variable selection with four response variables and two methods is given, and we conclude by making some recommendations for their routine use.
Article
In this study a comprehensive mathematical model for high pressure tubular ethylene/vinyl acetate copolymerization reactors is developed. A fairly general reaction mechanism is employed to describe the complex free radical kinetics of copolymerization. Using the method of moments, a system of differential equations is derived to describe the conservation of total mass, energy, momentum and the development of molecular weight and compositional changes in a two-zone jacketed tubular reactor. In addition, the model includes a number of correlations describing the variation of physical, thermodynamic and transport properties of the reaction mixture as a function of temperature, pressure, composition and molecular weight distribution of the polymer. Numerical solution of the reactor model equations permits a realistic calculation of monomer and initiator concentrations, temperature and pressure profiles, number and weight average molecular weights, copolymer composition as well as the number of short and long chain branches per 1000 carbon atoms under typical industrial operating conditions. Simulation results are presented showing the effects of ethylene, vinyl acetate, initiator and chain transfer agent on the polymer quality and reactor operation. The results of this investigation show that, in principle, we can obtain a copolymer product of desired molecular weight and composition by controlling the process variables. The procedure developed in this work is general and can lead to a more systematic design, optimization and control of industrial high pressure ethylene copolymerization reactors.
Article
Cause-selecting control charts use incoming quality measurements and outgoing quality measurements in an attempt to distinguish between incoming quality problems and problems in the current operation of a manufacturing process. We examine the assumptions underlying this useful type of chart and its relationship with the multivariate T2 chart. We propose using prediction limits with cause-selecting charts to improve their statistical performance.
Article
The methods described in an earlier article devoted to control methods for two related variables are extended to the case of more than two related variables. The concept of matrix notation is introduced because of the resultant simplification in multivariate analysis, and the original two-variable problem is restated in matrix form. The method of principal components is introduced both as a method of characterizing a multivariate process and as a control tool associated with control procedures. These methods are illustrated with a numerical example from the field of ballistic missiles. Approximate multivariate techniques, designed to simplify the administration of these control programs, are also discussed.
Article
The multivariate profile (MP) chart is a new control chart for simultaneous display of univariate and multivariate statistics. It is designed to analyze and display extended structures of statistical process control data for various cases of grouping, reference distribution, and use of nominal specifications. For each group of observations, the scaled deviations from reference values are portrayed together as a modified profile plot symbol. The vertical location of the symbol is determined by the multivariate distance of the vector of means from the reference values. The graphical display in the MP chart enjoys improved visual characteristics as compared with previously suggested methods. Moreover, the perceptual tasks required by the use of the MP chart provide higher accuracy in retrieving the quantitative information. This graphical display is also used to present other combined univariate and multivariate statistics, such as measures of dispersion, principal components, and cumulative sums.
Article
Principal components and factor analysis are two techniques which are finding increasing application to quality engineers who are concerned with processes with more than one response variable. In this, the first of a three-part series, the concept of principal components is introduced. Estimations, significance tests, and residual analysis are presented along with two numerical examples. Parts two and three will be found in succeeding issues.
Article
By means of factor analysis (FA) or principal components analysis (PCA) a matrix Y with elements y_ik is approximated by the model

\[
y_{ik} = \alpha_i + \sum_{a=1}^{A} \beta_{ia}\,\theta_{ak} + \epsilon_{ik} \qquad \mathrm{(I)}
\]

Here the parameters α, β and θ express the systematic part of the data y_ik, the "signal," and the residuals ε_ik express the "random" part, the "noise." When applying FA or PCA to a matrix of real data obtained, for example, by characterizing N chemical mixtures by M measured variables, one major problem is the estimation of the rank A of the matrix Y, i.e. the estimation of how much of the data y_ik is "signal" and how much is "noise." Cross-validation can be used to approach this problem. The matrix Y is partitioned and the rank A is determined so as to maximize the predictive properties of model (I) when the parameters are estimated on one part of the matrix Y and the prediction is tested on another part of the matrix Y.
Article
That we cannot make all pieces of a given kind of product identically alike is accepted as a general truth. It follows that the qualities of pieces of the same kind of product differ among themselves, or, in other words, the quality of product must be expected to vary. The causes of this variability are, in general, unknown. The present paper presents a scientific basis for determining when we have gone as far as it is economically feasible to go in eliminating these unknown or chance causes of variability in the quality of a product. When this state has been reached, the product is said to be controlled because it is then possible to set up limits within which the quality may be expected to remain in the future. By securing control, we attain the five economic advantages discussed in Part III.
Article
The problem of using time-varying trajectory data measured on many process variables over the finite duration of a batch process is considered. Multiway principal-component analysis is used to compress the information contained in the data trajectories into low-dimensional spaces that describe the operation of past batches. This approach facilitates the analysis of operational and quality-control problems in past batches and allows for the development of multivariate statistical process control charts for on-line monitoring of the progress of new batches. Control limits for the proposed charts are developed using information from the historical reference distribution of past successful batches. The method is applied to data collected from an industrial batch polymerization reactor.
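A minimal sketch of the batch-wise unfolding step that makes ordinary PCA applicable to three-way batch trajectory data; the dimensions and data are placeholders, not the industrial reactor data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Multiway PCA sketch: a historical batch data set is a 3-way array
# (batches x variables x time). Batch-wise unfolding places all the trajectory
# data of one batch in a single row, after which ordinary PCA applies.
rng = np.random.default_rng(3)
I, J, K = 40, 10, 100                      # batches, variables, time points
X = rng.normal(size=(I, J, K))             # historical reference batches

X_unfolded = X.reshape(I, J * K)           # batch-wise unfolding (I x JK)
X_centered = X_unfolded - X_unfolded.mean(axis=0)   # removes the mean trajectories

mpca = PCA(n_components=3).fit(X_centered)
scores = mpca.transform(X_centered)        # one point per batch on the monitoring charts
```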
Article
The details of a general multiblock partial least squares (PLS) algorithm based on one originally presented by Wold et al. have been developed and are completely presented. The algorithm can handle most types of relationships between the blocks and constitutes a significant advancement in the modeling of complex chemical systems. The algorithm has been programmed in FORTRAN and has been tested on two simulated multiblock problems, a three-block and a five-block problem. The algorithm combines the score vectors for all blocks predicting a particular block into a new block. This new block is used to predict the predicted block in a manner analogous to the two-block PLS. In a similar manner if one block predicts more than one other block, the score vectors of all predicted blocks are combined to form a new block, which is then predicted by the predictor block as in the two-block PLS. Blocks that both predict and are predicted are treated in such a way that both of these roles can be taken into account when calculating interblock relationships. The results of numerical simulations indicate that the computer program is operating properly and that the multiblock PLS produces meaningful and consistent results.
Article
Process computers routinely collect hundreds to thousands of pieces of data from a multitude of plant sensors every few seconds. This has caused a 'data overload', and due to the lack of appropriate analyses very little is currently being done to utilize this wealth of information. Operating personnel typically use only a few variables to monitor the plant's performance. However, multivariate statistical methods such as PLS (Partial Least Squares or Projection to Latent Structures) and PCA (Principal Component Analysis) are capable of compressing the information down into low-dimensional spaces which retain most of the information. Using this method of statistical data compression, a multivariate monitoring procedure analogous to the univariate Shewhart chart has been developed to efficiently monitor the performance of large processes, and to rapidly detect and identify important process changes. This procedure is demonstrated using simulations of two processes, a fluidized bed reactor and an extractive distillation column.
Article
The performance of a product often depends on several quality characteristics. These characteristics may have interactions. In answering the question "Is the process in control?", multivariate statistical process control methods take these interactions into account. In this paper, we review several of these multivariate methods and point out where gaps in the theory remain to be filled. The review includes multivariate control charts, multivariate CUSUM charts, a multivariate MMA chart, and multivariate process capability indices. The most important open question from a practical point of view is how to detect the variables that caused an out-of-control signal. Theoretically, the statistical properties of the methods should be investigated more profoundly.
Article
Shewhart charts are direct plots of the data and they have the potential to detect departures from statistical stability of unanticipated kinds. However, when one can identify in advance a kind of departure specifically feared, then a more sensitive detection statistic can be developed for that specific possibility. In this paper Cuscore statistics are developed for this purpose which can be used as an adjunct to the Shewhart chart. These statistics use an idea due to Box and Jenkins which is in turn an application of Fisher's score statistic. This article shows how the resulting procedures relate to Wald-Barnard sequential tests and to Cusum statistics which are special cases of Cuscore statistics. The ideas are illustrated by a number of examples. These concern the detection in a noisy environment of (a) an intermittent sine wave, (b) a change in slope of a line, (c) a change in an exponential smoothing constant and (d) a change from a stationary to a non-stationary state in a process record.
Article
Multivariate statistical procedures for monitoring the progress of batch processes are developed. The only information needed to exploit the procedures is a historical database of past successful batches. Multiway principal component analysis is used to extract the information in the multivariate trajectory data by projecting them onto low-dimensional spaces defined by the latent variables or principal components. This leads to simple monitoring charts, consistent with the philosophy of statistical process control, which are capable of tracking the progress of new batch runs and detecting the occurrence of observable upsets. The approach is contrasted with other approaches which use theoretical or knowledge-based models, and its potential is illustrated using a detailed simulation study of a semibatch reactor for the production of styrene-butadiene latex.
Article
Schemes for monitoring the operating performance of large continuous processes using multivariate statistical projection methods such as principal component analysis (PCA) and projection to latent structures (PLS) are extended to situations where the processes can be naturally blocked into subsections. The multiblock projection methods allow one to establish monitoring charts for the individual process subsections as well as for the entire process. When a special event or fault occurs in a subsection of the process, these multiblock methods can generally detect the event earlier and reveal the subsection within which the event has occurred. More detailed diagnostic methods based on interrogating the underlying PCA/PLS models are also developed. These methods show those process variables which are the main contributors to any deviations that have occurred, thereby allowing one to diagnose the cause of the event more easily. These ideas are demonstrated using detailed simulation studies on a multisection tubular reactor for the production of low-density polyethylene.
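A minimal sketch of the contribution idea for the residual (SPE) part of a PCA model, assuming an already-fitted model with mean mu, scale sigma and loading matrix P; all names and data are placeholders rather than the tubular-reactor variables.

```python
import numpy as np

def spe_contributions(x_new, mu, sigma, P):
    """Per-variable squared residuals: which variables contribute most to the SPE."""
    xs = (x_new - mu) / sigma
    x_hat = P @ (P.T @ xs)                 # reconstruction from the latent model
    return (xs - x_hat) ** 2               # one contribution per process variable

rng = np.random.default_rng(6)
P = np.linalg.qr(rng.normal(size=(10, 3)))[0]    # orthonormal loadings (10 vars, 3 PCs)
mu, sigma = np.zeros(10), np.ones(10)
contrib = spe_contributions(rng.normal(size=10), mu, sigma, P)
suspect = int(np.argmax(contrib))                # variable with the largest contribution
```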
Article
In this paper we develop the mathematical and statistical structure of PLS regression. We show the PLS regression algorithm and how it can be interpreted in model building. The basic mathematical principles that lie behind two block PLS are depicted. We also show the statistical aspects of the PLS method when it is used for model building. Finally we show the structure of the PLS decompositions of the data matrices involved.
Article
Multivariate statistical procedures for the analysis and monitoring of batch processes have recently been proposed. These methods are based on multiway principal component analysis (PCA) and partial least squares (PLS), and the only information needed to exploit them is a historical database of past batches. In this paper, these procedures are extended to allow one to use not only the measured trajectory data on all the process variables and information on measured final quality variables but also information on initial conditions for the batch such as raw material properties, initial ingredient charges and discrete operating conditions. Multiblock multiway projection methods are used to extract the information in the batch set-up data and in the multivariate trajectory data, by projecting them onto low dimensional spaces defined by the latent variables or principal components. This leads to simple monitoring charts, consistent with the philosophy of SPC, which are capable of tracking the progress of new batch runs and detecting the occurrence of observable upsets. Powerful procedures for diagnosing assignable causes for the occurrence of a fault by interrogating the underlying latent variable model for the contributions of the variables to the observed deviation are also presented. The approach is illustrated with databases from two industrial batch polymerization processes.
Article
With process computers routinely collecting measurements on large numbers of process variables, multivariate statistical methods for the analysis, monitoring and diagnosis of process operating performance have received increasing attention. Extensions of traditional univariate Shewhart, CUSUM and EWMA control charts to multivariate quality control situations are based on Hotelling's T2 statistic. Recent approaches to multivariate statistical process control which utilize not only product quality data (Y), but also all of the available process variable data (X) are based on multivariate statistical projection methods (Principal Component Analysis (PCA) and Partial Least Squares (PLS)). This paper gives an overview of these methods, and their use for the statistical process control of both continuous and batch multivariate processes. Examples are provided of their use for analysing the operations of a mineral processing plant, for on-line monitoring and fault diagnosis of a continuous polymerization process and for the on-line monitoring of an industrial batch polymerization reactor.
Article
A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.
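One of the listed approaches, projection to the model plane, amounts to a least-squares fit of the observed part of the new observation onto the corresponding rows of the loading matrix. The sketch below assumes an already-fitted PCA loading matrix and is illustrative only.

```python
import numpy as np

def scores_projection_to_model_plane(x, P, observed):
    """Estimate PCA scores for an observation x with missing entries.

    x        : 1-D array of length p (missing entries may hold NaN)
    P        : (p x A) loading matrix of an existing PCA model
    observed : boolean mask of length p marking the available measurements
    """
    P_obs = P[observed, :]                    # rows of P for the observed variables
    x_obs = x[observed]
    # Least-squares solution  t = (P_obs' P_obs)^(-1) P_obs' x_obs
    t, *_ = np.linalg.lstsq(P_obs, x_obs, rcond=None)
    return t

# Example with placeholder numbers
rng = np.random.default_rng(4)
P = np.linalg.qr(rng.normal(size=(8, 3)))[0]          # orthonormal loadings (8 vars, 3 PCs)
x = rng.normal(size=8)
mask = np.ones(8, dtype=bool)
mask[[2, 5]] = False                                   # two missing measurements
t_hat = scores_projection_to_model_plane(x, P, mask)
```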
Article
Skagerberg, B., MacGregor, J.F. and Kiparissides, C., 1992. Multivariate data analysis applied to low density polyethylene reactors. Chemometrics and Intelligent Laboratory Systems, 14: 341–356. In this paper we discuss how partial least squares regression (PLS) can be applied to the analysis of complex process data. PLS models are here used to: (i) accomplish a better understanding of the underlying relations of the process; (ii) monitor the performance of the process by means of multivariate control charts; and (iii) build predictive models for inferential control. The strategies for applying PLS to process data are described in detail and illustrated by an example in which low-density polyethylene production is simulated.
Article
With process computers routinely collecting measurements on large numbers of process variables, multivariate statistical methods for the analysis, monitoring and diagnosis of process operating performance have received increasing attention. Recent approaches to multivariate statistical process control, which utilize not only the product quality data (as traditional approaches have done) but also the available process data, are based on multivariate projection methods (Principal Component Analysis, PCA, and Partial Least Squares, PLS). These methods have been rapidly accepted and utilized by industry. This paper gives a brief overview of these methods and illustrates their use for process monitoring and fault diagnosis with applications to a wide range of industrial batch and continuous processes. Emphasis is placed on the practical issues that arise when dealing with process data. Several of these issues are discussed and solutions are suggested for a successful outcome of the application of these methods in an industrial setting.
Article
A tutorial on the partial least-squares (PLS) regression method is provided. Weak points in some other regression methods are outlined and PLS is developed as a remedy for those weaknesses. An algorithm for a predictive PLS and some practical hints for its use are given.
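To make the algorithm concrete, here is a bare-bones NIPALS-style sketch of two-block PLS with a single response (PLS1); the variable names, data and deflation conventions are illustrative, and details vary between presentations of the algorithm.

```python
import numpy as np

def nipals_pls1(X, y, n_components):
    """Bare-bones NIPALS sketch for two-block PLS with a single response (PLS1).

    For a single y, each component is obtained in one pass (no inner iteration).
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, Q, T = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)       # weight vector
        t = X @ w                       # X-block scores
        p = X.T @ t / (t @ t)           # X-block loadings
        q = (y @ t) / (t @ t)           # y loading (regression of y on t)
        X = X - np.outer(t, p)          # deflate X
        y = y - q * t                   # deflate y
        W.append(w); P.append(p); Q.append(q); T.append(t)
    return np.column_stack(W), np.column_stack(P), np.array(Q), np.column_stack(T)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)
W, P, Q, T = nipals_pls1(X, y, n_components=2)
```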
Article
Step-down testing procedures for the Hotelling T² problem are considered. It is shown that the classical step-down procedure proposed by S. N. Roy, R. E. Bargmann and J. Roy is inadmissible among invariant procedures (for a suitable invariance group) in many situations, including all those for which more than two steps are contemplated. It is also noted that in most cases, the power of the step-down procedure decreases in at least one of the noncentrality parameters over part of the parameter space. Finally, several alternative admissible step-down procedures are proposed.
Article
Project (M. Eng.)--McMaster University, 1995. Includes bibliographical references (leaves 62-64).
  • Wierda, S.J. Multivariate Statistical Process Control. Groningen Theses in Economics, Management and Organization. Wolters-Noordhoff.
  • Nomikos, P. Private communication.
  • Tano, K., Samskog, P.-O., Gärde, J.-C. and Skagerberg, B. Partial Least Squares Modelling of Process Data at LKAB: Predicting Chemical Assays in Iron Ore for Process Control.
  • Vaculik, V. Multivariate Data Analysis Using PCA/PLS with Extensions to Performance Monitoring at Dofasco. Presented at the Advanced Modelling and Control Seminar of the Association of Iron and Steel Engineers.
  • Slama, C.F. Analysis of Industrial FCCU data using PCA and PLS.