Article

Statistical Monitoring of Multivariable Dynamic Processes with State-Space Models

AIChE Journal (Wiley)

Abstract

Industrial continuous processes may have a large number of process variables and are usually operated for extended periods at fixed operating points under closed-loop control, yielding process measurements that are autocorrelated, cross-correlated, and collinear. A statistical process monitoring (SPM) method based on multivariate statistics and system theory is introduced to monitor the variability of such processes. The statistical model that describes the in-control variability is based on a canonical-variate (CV) state-space model that is an equivalent representation of a vector autoregressive moving-average time-series model. The CV state variables obtained from the state-space model are the linear combinations of past process measurements that best explain the variability of the future measurements. Because of this distinctive feature, the CV state variables are regarded as the principal dynamic directions. A T2 statistic based on the CV state variables is used for developing an SPM procedure. Simple examples based on simulated data and an experimental application based on a high-temperature short-time milk pasteurization process illustrate the advantages of the proposed SPM method.
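The CV-state construction described in the abstract can be sketched numerically: stack past and future measurement windows, take the SVD of the scaled past-future cross-covariance, and project the past onto the leading directions. This is a minimal illustration of the generic CVA recipe, not the paper's exact algorithm (the weighting and order-selection details may differ):

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-12, None))) @ V.T

def cva_states(Y, lags=3, n_states=2):
    """Estimate canonical-variate states from measurements Y (T x m).

    Past/future vectors are stacked over `lags` samples; the SVD of the
    scaled past-future cross-covariance yields the projection J whose rows
    play the role of the 'principal dynamic directions'.
    """
    T = len(Y)
    idx = range(lags, T - lags + 1)
    P = np.array([Y[t - lags:t][::-1].ravel() for t in idx])  # past stack
    F = np.array([Y[t:t + lags].ravel() for t in idx])        # future stack
    P, F = P - P.mean(0), F - F.mean(0)
    Spp, Sff = P.T @ P / len(P), F.T @ F / len(F)
    Spf = P.T @ F / len(P)
    U, s, _ = np.linalg.svd(_inv_sqrt(Spp) @ Spf @ _inv_sqrt(Sff))
    J = U[:, :n_states].T @ _inv_sqrt(Spp)
    return P @ J.T, s  # CV state sequence, sample canonical correlations

def t2_statistic(X):
    """Hotelling T2 of each state vector against the in-control covariance."""
    Si = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', X, Si, X)
```

In an SPM setting, `t2_statistic` would be computed on the in-control training states to set a control limit, and then on new data for monitoring.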


... Besides, v is generally set to be a larger number when the tested system is nonlinear, so that a better linear approximation of the actual nonlinear relationship can be obtained. Similar to many existing methods [27], [44], v is determined using the autocorrelation analysis as described in [44]. ...
... Specifically, a total of 12 features are retained, and the CPV of these features is about 80%. According to the autocorrelation analysis [44], the numbers of time lags in DPCA and CVDA are set to 3. In the dynamic distributed monitoring strategy [38], the process variables are divided into three blocks using sparse CA, and the global monitoring indices are adopted to provide a comparison. Based on the experimental results in Table I, the numbers of retained matrices in each convolutional layer are set to 12 for both M-PCA and M-CVA. ...
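The autocorrelation analysis these excerpts rely on for choosing the number of time lags is often implemented as a cut-off test on the sample ACF. A rough sketch follows; the 2/√N band and the "all variables inside the band" rule are one common heuristic, not necessarily the cited papers' exact procedure:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 1..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def select_lags(Y, max_lag=20):
    """Pick the number of time lags as the first lag at which every
    variable's sample ACF falls inside the ~95% band 2/sqrt(N)."""
    N = Y.shape[0]
    band = 2.0 / np.sqrt(N)
    A = np.abs(np.column_stack([acf(Y[:, j], max_lag)
                                for j in range(Y.shape[1])]))
    for k in range(max_lag):
        if np.all(A[k] < band):
            return k + 1
    return max_lag  # autocorrelation persists beyond max_lag
```

Strongly autocorrelated data yield a larger lag count than near-white data, which matches the excerpts' remark that nonlinear (hence harder-to-approximate) systems call for larger lag windows.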
Article
Modern industrial plants generally consist of multiple manufacturing units, and the local correlation within each unit can be used to effectively alleviate the effect of spurious correlation and meticulously reflect the operation status of the process system. Therefore, the local correlation, which is called spatial information here, should also be taken into consideration when developing the monitoring model. In this study, a cascaded monitoring network (MoniNet) method is proposed to develop the monitoring model with concurrent analytics of temporal and spatial information. By implementing convolutional operation to each variable, the temporal information that reveals dynamic correlation of process data and spatial information that reflects local characteristics within individual operation unit can be extracted simultaneously. For each convolutional feature, a submodel is developed and then all the submodels are integrated to generate a final monitoring model. Based on the developed model, the operation status of the newly collected sample can be identified by comparing the calculated statistics with their corresponding control limits. Similar to the convolutional neural network (CNN), the MoniNet can also expand its receptive field and capture deeper information by adding more convolutional layers. Besides, the filter selection and submodel development in MoniNet can be replaced to generalize the proposed network to many existing monitoring strategies. The performance of the proposed method is validated using two real industrial processes. The illustration results show that the proposed method can effectively detect process anomalies by concurrent analytics of temporal and spatial information.
... Here, Σ ZZ and Σ YY are self-covariance matrices of input and output matrices, Σ ZY is a cross-covariance matrix between input and output matrices, and k represents the number of retained singular values; its value can be determined using the Cumulative Percentage Value (CPV) method [33]. Once projection matrices J and L are obtained, they are transferred for backup to the online monitoring stage. ...
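The CPV rule mentioned in this excerpt for picking the number of retained singular values is simple to state in code. A sketch, assuming the common convention of accumulating squared singular values (variance shares); some authors accumulate the raw singular values or eigenvalues instead:

```python
import numpy as np

def cpv_order(singular_values, threshold=0.90):
    """Smallest k whose cumulative percentage value (CPV) of the squared
    singular values reaches `threshold`."""
    v = np.asarray(singular_values, float) ** 2
    cpv = np.cumsum(v) / v.sum()
    # first index where the cumulative share reaches the threshold
    return int(np.searchsorted(cpv, threshold) + 1)
```

For example, singular values (3, 2, 1) give cumulative shares (0.64, 0.93, 1.0), so a 90% threshold retains k = 2.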
... [reconstructed table of Tennessee Eastman process variables]
…         Reactor pressure, kPa
XMEAS(8)  Reactor level, %
XMEAS(9)  Reactor temperature, °C
XMEAS(10) Discharge rate (stream 9), km³/h
XMEAS(11) Product separator temperature, °C
XMEAS(12) Product separator level, %
XMEAS(13) Product separator pressure, kPa
XMEAS(14) Product separator bottom flow rate (stream 10), m³/h
XMEAS(15) Stripper tower level, %
XMEAS(16) Stripper tower pressure, kPa
XMEAS(17) Stripper tower bottom flow rate (stream 11), m³/h
XMEAS(18) Stripper tower temperature, °C
XMEAS(19) Stripper tower flow rate, kg/h
XMEAS(20) Compressor power, kW
XMEAS(21) Reactor cooling water outlet temperature, °C
XMEAS(22) Separator cooling water outlet temperature, °C
XMEAS(23)–XMEAS(28) Components A–F (stream 6), mol%
XMEAS(29)–XMEAS(36) Components A–H (stream 9), mol%
XMEAS(37) Component …
XMV(5)  Compressor recirculation valve, %
XMV(6)  Discharge valve, %
XMV(7)  Separator tank liquid flow rate, m³/h
XMV(8)  Stripper tower liquid product flow rate, m³/h
XMV(9)  Stripper tower water flow valve, %
XMV(10) Reactor cooling water flow rate, m³/h
XMV(11) Condenser cooling water flow rate, m³/h ...
Article
Full-text available
In intelligent process monitoring and fault detection of the modern process industry, conventional methods mostly consider singular characteristics of systems. To tackle the problem of suboptimal incipient fault detection in nonlinear dynamic systems with non-Gaussian distributed data, this paper proposes a methodology named Gap-Mixed Kernel-Dynamic Canonical Correlation Analysis. Initially, the Gap metric is employed for data preprocessing, followed by fault detection utilizing the Mixed Kernel-Dynamic Canonical Correlation Analysis. Ultimately, fault identification is conducted through a contribution method based on the T2 statistic. Furthermore, a comparative analysis was conducted using Canonical Variate Analysis, Dynamic Canonical Correlation Analysis, and Mixed Kernel-Dynamic Canonical Correlation Analysis on the Tennessee Eastman process. Experimental results indicate varying degrees of improvements in the detection rate, false alarm rate, missed detection rate, and detection time compared to the comparative methods, demonstrating the industrial value and academic significance of the method.
... Besides, the value of parameter k is positively associated with the sampling rate. In the proposed method, the autocorrelation analysis is adopted for determining the number of time lags k, and the related information can be found in Ref. [32]. ...
... Specifically, they are set to 10, 16, and 16 in this experiment according to Ref. [23]. As for the proposed PM-MVS method, the number of time lags is set to 10, which is determined using autocorrelation analysis in Ref. [32]. Besides, the number of retained source signals is identified using the ratio of simplex volume, whose detailed information is presented in Fig. 7. ...
... Canonical correlation analysis (CCA) [35] is a widely used latent variable model for process monitoring. Several CCA-based dynamic process monitoring methods are proposed [36,37]. The potential theoretical difference between these CCA-based dynamic methods and LTSFA is discussed. ...
... The revised version is used in this study [46]. It contains 12 manipulated variables (XMV(1)–(12)) and 41 measurement variables, which consist of 22 process variables (XMEAS(1)–(22)) and 19 quality variables (XMEAS(23)–(41)). In this paper, we use 11 manipulated variables (XMV(1)–(11)) and all 22 process variables. ...
Article
Modern industrial processes are large-scale, highly complex systems with many units and much equipment. The complex flow of mass and energy, as well as the compensation effects of closed-loop control systems, causes significant cross-correlation and autocorrelation between process variables. To operate process systems stably and efficiently, it is crucial to uncover the inherent characteristics of both the variance structure and the dynamic relationships. Compared with the original slow feature analysis (SFA), which can only model one-step time dependence, the long-term dependency slow feature analysis (LTSFA) proposed in this paper can capture longer-term dynamics through an explicit expression of the latent states of the process. An iterative algorithm is developed for the model parameter optimization and its convergence is proven. The model properties and a theoretical comparison with existing dynamic models are presented. A process monitoring strategy is designed based on LTSFA. The results of two simulation case studies show that LTSFA has better system dynamics extraction capability, reducing the violation rate of the residual for the 95% confidence interval from 40.4% to 3.2% compared to the original SFA, and can disentangle quickly- and slowly-varying features. Several typical disturbances can be correctly identified by LTSFA. The monitoring results on the Tennessee Eastman process benchmark show the overall advantages of the proposed method in dynamic and nominal deviation detection and in monitoring accuracy.
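For contrast with LTSFA's explicit latent states, the original one-step linear SFA that the abstract refers to can be sketched as whitening followed by a rotation that minimizes the variance of the one-step time difference. This is a standard linear-SFA formulation, not the paper's LTSFA algorithm:

```python
import numpy as np

def sfa(Y, n_slow=2):
    """Linear slow feature analysis: whiten the data, then find the
    directions in which the one-step time difference has minimal variance."""
    Y = Y - Y.mean(0)
    # whitening transform from the eigendecomposition of the covariance
    w, V = np.linalg.eigh(np.cov(Y, rowvar=False))
    W = V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-12, None))) @ V.T
    Z = Y @ W
    dZ = np.diff(Z, axis=0)
    # eigenvectors of the difference covariance, ascending slowness
    lam, U = np.linalg.eigh(np.cov(dZ, rowvar=False))
    S = Z @ U[:, :n_slow]        # slowest features first
    return S, lam[:n_slow]
```

On a mixture of a slow sinusoid and fast noise, the first extracted feature recovers the slow source; LTSFA's contribution, per the abstract, is extending this one-step dependence to longer horizons.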
... Nevertheless, given that the above T2 metric includes considerable noise and may cause poor robustness, Negiz and Cinar [34] put forward a new statistical metric. The new T2 metric is divided into inside-state statistics and out-of-state statistics for the milk pasteurization process. ...
... Setting (33) equal to (34), the following equation can be obtained ...
Article
It is crucial to adopt an efficient process monitoring technique that ensures process operation safety and improves product quality. Toward this endeavor, a modified canonical variate analysis based on dynamic kernel decomposition (DKDCVA) approach is proposed for dynamic nonlinear process quality monitoring. Different from traditional canonical variate analysis and its kernel extensions, the chief intention of our proposed method is to establish a partial-correlation nonlinear model between input dynamic kernel latent variables and output variables, and to ensure that the extracted feature information is maximized. More specifically, the dynamic nonlinear model is orthogonally decomposed by singular value decomposition to obtain quality-related and independent subspaces. From the perspective of quality monitoring, Hankel matrices of past and future vectors of the quality-related subspace are derived in detail, and the corresponding statistical metrics are constructed. Furthermore, given the existence of non-Gaussian process variables, kernel density estimation is used to evaluate the upper control limit instead of traditional control limits. Finally, experimental results on a simple numerical example, the Tennessee Eastman process, and the hot strip mill process indicate that the DKDCVA approach is preferable for monitoring abnormal operation of dynamic nonlinear processes.
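The kernel-density-estimated control limit mentioned in the abstract above can be sketched generically: fit a Gaussian KDE to the in-control monitoring statistic and read off a high quantile. The Silverman bandwidth and the 99% level below are illustrative choices, not the paper's:

```python
import numpy as np

def kde_control_limit(stat, alpha=0.99, grid=2048):
    """Upper control limit as the alpha-quantile of a Gaussian kernel
    density estimate of the in-control monitoring statistic."""
    stat = np.asarray(stat, float)
    n = stat.size
    h = 1.06 * stat.std() * n ** (-1 / 5)       # Silverman's rule of thumb
    xs = np.linspace(stat.min() - 3 * h, stat.max() + 3 * h, grid)
    # sum of Gaussian kernels centred at each observation
    dens = np.exp(-0.5 * ((xs[:, None] - stat[None, :]) / h) ** 2).sum(1)
    dens /= dens.sum()
    cdf = np.cumsum(dens)
    return xs[np.searchsorted(cdf, alpha)]
```

Unlike a chi-square limit, this makes no Gaussianity assumption on the statistic, which is the motivation the abstract gives for using KDE.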
... Venkatasubramanian [2] reviewed many statistical process control methods based on PCA and PLS and summarised their applications in various fields such as smelting, electrical engineering and aerospace. However, the use of traditional PCA and PLS methods rests on the assumption that the manufacturing process follows a Gaussian distribution, and Negiz and Cinar [3] found that these PCA and PLS methods fail when the manufacturing process data are not Gaussian distributed. In reality, however, much manufacturing process data does not obey the Gaussian distribution, and these data often exhibit nonlinear correlation, process dynamics and other characteristics, which in many cases impair the performance of PCA and other statistical process methods or even render them inapplicable. ...
Article
Full-text available
Nowadays, scholars have explored many methods for manufacturing process quality control. This paper surveys current research progress in manufacturing process quality control across two areas, process monitoring and rule extraction, and analyzes the strengths and weaknesses of the current research. In the review of process monitoring, the paper introduces a variety of research methods based on the different data characteristics of the manufacturing process, making the distinctions between data characteristics and the applicability of the methods more apparent. In the review of rule extraction, the paper introduces the characteristics of good rule extraction algorithms, together with various rule extraction algorithms and their applications. Finally, the paper combines the two areas to put forward suggestions for future research directions.
... , λ_n). The cumulative percentage value (CPV) method can be utilized to determine the system order n [31]. It is important to highlight that ...
Article
Full-text available
Incipient fault diagnosis is particularly important in process industrial systems, as its early detection helps to prevent major accidents. Against this background, this study proposes a combined method of mixed kernel principal components analysis and dynamic canonical correlation analysis (MK-DCCA). The robust generalization performance of this approach is demonstrated through experimental validation on a randomly generated dataset. Furthermore, comparative experiments were conducted on a CSTR Simulink model, comparing the MK-DCCA method with DCCA and DCVA methods, demonstrating its excellent detection performance for incipient faults in nonlinear and dynamic systems. Meanwhile, fault identification experiments were conducted, validating the high accuracy of the fault identification method based on contribution. The experimental findings demonstrate that the method possesses a certain industrial significance and academic relevance.
... Then, statistics are employed as the dissimilarity measure in each subspace to determine a control limit for normal variations. The most classical methods include principal component analysis (PCA), partial least squares [8], canonical variate analysis [9], and independent component analysis [10], which are applicable to monitor multivariate linear processes. To handle nonlinear processes, numerous variants of these MSPM methods, such as kernel PCA [11], have been developed by mapping original data into a higher-dimensional linearly separable space. ...
Article
Full-text available
The increasing scale of industrial processes has significantly motivated the development of data-driven fault detection and diagnosis techniques. The selection of representative fault-free modeling data from operation history is an important prerequisite to establishing a long-term effective process monitoring model. However, industrial data are characterized by a high dimension and multimode, and are also contaminated with both outliers and frequent random disturbances, making automatic modeling data selection a great challenge in industrial applications. In this work, an information entropy-based automatic selection strategy for modeling data is proposed, based on which a general real-time process monitoring framework is developed for a large-scale industrial methanol to olefin unit with multiple operating conditions. Modeling data representing normal operating conditions are automatically selected with only a few manually defined normal samples. A long-term effective process monitoring model is then established based on a multi-layer autoencoder, through which unexpected disturbances in real-time operation can be detected early and the root cause can be preliminarily diagnosed by contribution plots. The adjustment of operating conditions has also been considered through a model update strategy. Details of the proposed data selection strategy and modeling process have been provided to facilitate the industrial application of process monitoring systems by other researchers or companies.
... ; a and b denote the time windows of past and future data, which can be determined by auto-correlation analysis [27]; I_k ∈ R denotes the input current of the battery. For a training set with L observations, the past and future matrices can be constructed as: ...
Article
Full-text available
In this research, a spatio-temporal inference system is proposed to detect and locate thermal abnormalities of battery systems. The proposed spatio-temporal inference system consists of three modules: spatio-temporal processing module, abnormality inference module, and spatial inference module. Based on the distributed temperatures on the battery system, the monitoring statistic can be developed in the spatio-temporal processing module. The abnormality inference module is constructed to detect the abnormality based on the derived statistic index. Then, the spatial Bayes model is designed to estimate the abnormality location. The Bayes risk analysis indicates that the proposed method has a bounded error. Experiments on a lithium-ion (Li-ion) battery cell and a battery pack demonstrate that the proposed spatio-temporal inference system can detect and locate the internal short circuit (ISC) fault before it develops into a thermal runaway.
... Even for traditional industrial processes, routine operation data are collected with multiple sensors for multivariable control and monitoring, [1][2][3][4] but the dimension of the dynamics in the data is low since the dynamics are far from being fully excited. Unfortunately, traditional data analytics such as principal component analysis (PCA) have largely ignored the dynamics in the data. ...
Article
Full-text available
In this article, a novel latent vector autoregressive (LaVAR) modeling algorithm with a canonical correlation analysis (CCA) objective is proposed to estimate a fully‐interacting reduced‐dimensional dynamic model. This algorithm is an advancement of the dynamic inner canonical correlation analysis (DiCCA) algorithm, which builds univariate latent autoregressive models that are noninteracting. The dynamic latent variable scores of the proposed algorithm are guaranteed to be orthogonal with a descending order of predictability, retaining the properties of DiCCA. Further, the LaVAR‐CCA algorithm solves multiple latent variables simultaneously with a statistical interpretation of the profile likelihood. The Lorenz oscillator with noisy measurements and an application case study on an industrial dataset are used to illustrate the superiority of the proposed algorithm. The reduced‐dimensional latent dynamic model has numerous potential applications for prediction, feature analysis, and diagnosis of systems with rich measurements.
... Kano et al. proposed moving window PCA and applied it to online process monitoring by monitoring changes in the correlation structure of process variables [22]. Considering that significant serial correlation contained in industrial process data cannot be extracted by PCA and PLS, canonical variate analysis (CVA) was employed to generate accurate state-space models from serially correlated data [39][40][41]. In addition, industrial data usually do not conform to multivariate normal distribution, which will influence the determination of the control limits of the monitoring statistics in PCA and PLS. ...
Article
Full-text available
Safe and stable operation plays an important role in the chemical industry. Fault detection and diagnosis (FDD) make it possible to identify abnormal process deviations early and assist operators in taking proper action against fault propagation. After decades of development, data-driven process monitoring technologies have gradually attracted attention from process industries. Although many promising FDD methods have been proposed from both academia and industry, challenges remain due to the complex characteristics of industrial data. In this work, classical and recent research on data-driven process monitoring methods is reviewed from the perspective of characterizing and mining industrial data. The implementation framework of data-driven process monitoring methods is first introduced. The state of the art of process monitoring methods corresponding to common industrial data characteristics is then reviewed. Finally, the challenges and possible solutions for actual industrial applications are discussed.
... In real industrial applications, it is generally determined by the number of variables collected from the same unit. q can be regarded as the number of time lags which is commonly used in dynamic PCA and dynamic ICA [38]. Specifically, it is determined using the autocorrelation analysis as described in Ref. [39]. ...
Article
Many conventional quality prediction models are directly developed based on the easy-to-measure variables, and thus the local information within individual unit may be buried by information of other units. In this study, a cascaded regression network (RegNet) is proposed to solve the aforementioned issue. Specifically, the features which are adopted to develop RegNet model are extracted in two steps, including variable-wise and unit-wise feature extractions. In variable-wise feature extraction, several adjacent variables and their corresponding time lags are integrated using convolutional filter. By this means, both local correlation and temporal information within each unit can be preserved. In the unit-wise feature extraction, the local information of each unit is adopted to further explore the global correlation between different operation units. Based on the obtained global features, a fully connected layer is designed to calculate the regression weight of the quality prediction model. It is noted that the architecture of RegNet can be readily generalized to many existing methods by replacing the convolutional filter and fully connected layer. The performance of the proposed method is illustrated using a simulated process and two real industrial processes, and the experimental results show that it can provide reliable prediction results for industrial applications.
... Canonical variable analysis is a linear dimensionality reduction technique widely used in multivariate statistical methods. It was first applied to fault detection by Negiz and Cinar in 1997 [9]. In chiller fault diagnosis, the input and output matrices of the chiller are expanded in dimension to cover the historical and future data sets of the process, the correlation between the historical and future data sets is maximized, and the main information of the process is extracted. ...
Article
Full-text available
The chiller plays an important role in providing a comfortable environment. Once incipient faults are missed, they may develop into fatal faults and further lead to equipment damage and casualties. Nevertheless, incipient faults in the running process of the chiller are easily masked by noise. Moreover, the running variables of the chiller have dynamic characteristics: the process variables are correlated with one another within each process, and each variable is correlated with itself at different times. To tackle these problems, we develop an improved canonical variable analysis (ICVA) method to detect incipient faults in chiller units with significant dynamic characteristics. In the proposed method, the exponentially weighted moving average (EWMA) is first applied to filter the data. Then canonical variable analysis is used to detect the fault. In this paper, ASHRAE RP-1043 experimental data are used to verify the proposed method. Simulation results show that, compared with the traditional CVA method, the ICVA method has a higher fault detection rate for incipient faults.
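The EWMA pre-filtering step that ICVA applies before canonical variable analysis is straightforward. A sketch with an illustrative smoothing constant (the paper's value of λ is not stated here):

```python
import numpy as np

def ewma(Y, lam=0.2):
    """Exponentially weighted moving average filter applied column-wise:
    z_t = lam * y_t + (1 - lam) * z_{t-1}, initialized at the first sample."""
    Y = np.asarray(Y, float)
    Z = np.empty_like(Y)
    Z[0] = Y[0]
    for t in range(1, len(Z)):
        Z[t] = lam * Y[t] + (1 - lam) * Z[t - 1]
    return Z
```

The filter attenuates high-frequency noise (for white noise the output variance is roughly λ/(2−λ) of the input), which is what makes small incipient faults easier to see in the subsequent CVA step.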
... Based on extra instrument variables, Li and Qin [12] proposed indirect DPCA (IDPCA) to diminish the effect of noise. Negiz and Cinar [13] built a state-space model for process monitoring. The objective of their work is to search for a dynamic relationship between a future data set and a past data set. ...
Article
Full-text available
Dynamic principal component analysis (DPCA) and its nonlinear extension, dynamic kernel principal component analysis (DKPCA), are widely used in the monitoring of dynamic multivariate processes. In traditional DPCA and DKPCA, extended vectors through concatenating current process data point and a certain number of previous process data points are utilized for feature extraction. The dynamic relations among different variables are fixed in the extended vectors, i.e. the adoption of the dynamic information is not adaptively learned from raw process data. Although DKPCA utilizes a kernel function to handle dynamic and (or) nonlinear information, the prefixed kernel function and the associated parameters cannot be most effective for characterizing the dynamic relations among different process variables. To address these problems, this paper proposes a novel nonlinear dynamic method, called dynamic neural orthogonal mapping (DNOM), which consists of data dynamic extension, a nonlinear feedforward neural network, and an orthogonal mapping matrix. Through backpropagation and Eigen decomposition (ED) technique, DNOM can be optimized to extract key low-dimensional features from original high-dimensional data. The advantages of DNOM are demonstrated by both theoretical analysis and extensive experimental results on the Tennessee Eastman (TE) benchmark process. The results on the TE benchmark process show the superiority of DNOM in terms of missed detection rate and false alarm rate. The source codes of DNOM can be found in https://github.com/htz-ecust/DNOM.
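The fixed time-lag extension that the abstract above attributes to traditional DPCA, and that DNOM replaces with a learned mapping, amounts to concatenating each sample with a fixed number of predecessors before applying PCA. A minimal sketch:

```python
import numpy as np

def dpca_extend(Y, lags=2):
    """Build the time-lag-extended matrix used by DPCA: each row concatenates
    the current sample with its `lags` predecessors (newest first)."""
    T, m = Y.shape
    return np.hstack([Y[lags - k : T - k] for k in range(lags + 1)])

def dpca(Y, lags=2, n_comp=3):
    """PCA on the lag-extended matrix; returns scores and singular values."""
    X = dpca_extend(np.asarray(Y, float), lags)
    X = X - X.mean(0)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_comp].T, s
```

The dynamic relations are frozen into the extended vectors at model-building time, which is exactly the limitation the DNOM paper targets with its adaptively learned nonlinear mapping.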
... Larimore et al. (1993) proposed a state-space method using canonical variable states for modelling linear and nonlinear time series. Negiz and Cinar (1997) used a CVA-based subspace-identification approach to describe a high-temperature short-time milk-pasteurization process. CVA was utilized to predict performance deterioration and estimate the behaviour of a system under faulty operating conditions. ...
Chapter
Full-text available
The participation of multiple stakeholders in the innovation process is one of the assumptions of Responsible Innovation (RI). This partnership aims to broaden visions, in order to generate debate and engagement. The present study’s aim, based on a meta-synthesis, is to evaluate how stakeholder participation in RI takes place. Thus, qualitative case studies were identified that investigated the participation of stakeholders in responsible innovation. Those studies have shown that, although participation is achieved when innovation is already in the process of being implemented or already inserted in the market, it serves as a basis for modifications, both in the developed product and in the paradigm of innovation. Based on the concept of Responsible Innovation and its dimensions, the role of stakeholders in the context of innovation is restricted to consultative participation. The agents that stimulate their participation are academic researchers and researchers linked to multi-institutional projects. We have noticed that the studies favour the participation of multiple stakeholders like policymakers (including funding agencies, regulators and executives), business/industry representatives (internal or outsourced innovation departments and/or some R & D base), civil society organizations (such as foundations, associations, social movements, community organizations, charities, media), as well as researchers and innovators (affiliates of various institutions and organizations at different levels). One point that stands out is the change of vision of one stakeholder over the other. Although the difficulty is pointed out in the dialogue, it is possible, by inserting them collectively into the discussion, that the different stakeholders will develop a better understanding of the different points of view. The present study has discovered that RI is treated as a result and not as a process.
Article
Full-text available
Rapid advancements in technology and Artificial Intelligence have increased the volume of scientific research, making it challenging for researchers and scholars to keep pace with the evolving literature and state‐of‐the‐art techniques and methods. Traditional review papers offer a way to mitigate these difficulties but are often time‐consuming and labor‐intensive. This article introduces a novel AI‐assisted narrative review methodology that integrates advanced text retrieval and visualization techniques, enhanced with geometric features, to address this. The proposed approach relies on the automatic identification of research topics/clusters within a large document corpus spanning different time periods. This approach not only facilitates the systematic exploration of trends over time but also serves as a valuable adjunct, enabling experts to focus on specific, homogeneous areas within scientific fields/clusters. Initially, the methodology is described in its generality, and the mapping of the evolution of emerging topics is presented, revealing the temporal dynamics and interconnections within the literature on time series anomalies. Subsequently, the proposed method is applied to time series data and an in‐depth exploration of the identified dominant cluster is presented. The cluster involves advanced techniques and models for anomaly detection in time series analysis. Focusing on such a homogeneous subfield enables the derivation of a wealth of characteristics and outcomes regarding the evolution of this topic, revealing its temporal dynamics and trends. The review process demonstrates the effectiveness of the proposed AI‐driven approach in literature reviews and provides researchers with a powerful tool to synthesize and interpret complex, dynamically changing, advanced scientific fields.
Article
Due to various reasons, outliers, ambient noise and missing data inevitably exist in industrial processes, and thus robustness is important when establishing monitoring models. In this study, a robust dissimilarity analytics model (RDAM) is established with the Laplace distribution to detect process anomalies in noisy environments. Because of the heavy-tailed characteristic of the Laplace distribution, the proposed RDAM method is more robust to ambient noise and outliers than Gaussian distribution-based models. Besides, the missing data problem is also considered and solved in the model development procedure. Using variational Bayesian inference, the model parameters and latent variables of the RDAM model can be estimated. After that, a monitoring strategy is designed based on the obtained results with both static and dynamic statistics. By this means, both the static deviation of the current sample and the temporal correlation within the process data can be effectively revealed. A simulated example and a real low-pressure heater process are adopted to illustrate the performance of the proposed RDAM method. Specifically, the proposed RDAM method is robust to ambient noise and missing values, and it has better detection sensitivity for process anomalies than the selected comparison methods.
Article
Online monitoring is essential for the safety of Industrial IoT (IIoT). Most existing methods seek low-dimensional representations to assess the overall operation status. However, we reveal that the existing methods face some unsolved and interrelated limitations, including coarse granularity, tight boundary, and weak extrapolation. This article proposes a federated episodic learning method for IIoT monitoring that simultaneously enhances interpretability, robustness, and extrapolation. The method centers on a dual-level normality bank with a normality contrastive separation network and an episodic training strategy, designed within a cloud-edge collaborative manner. To solve the coarse granularity issue, we propose a dual-level normality bank from both condition-level and variable-level perspectives, which facilitates fine-grained pattern matching and improves interpretability. To address the tight boundary issue, we propose a normality contrastive separation network, which utilizes prior fault knowledge to construct negative samples and encourages models to focus on fault-related representations, thus improving robustness. To tackle the weak extrapolation issue, we design an episodic training strategy, which develops a client alternation policy to construct refining sets and makes inferences using patterns from adjacent working conditions. It fully exploits the relation of adjacent working conditions and improves extrapolation for unseen conditions with theoretical guarantees. Extensive experiments on two clusters validate the method’s superior interpretability, robustness, and extrapolation.
Article
Gas–liquid two-phase flow is a complex dynamic and nonlinear process that is widely encountered in many process industries. Accurate flow state identification is crucial for ensuring operation safety and economic benefits. However, obtaining training samples for certain flow states can be difficult due to safety requirements and high costs. Therefore, a zero-shot learning (ZSL) based flow state identification strategy is proposed from the perspective of attribute description and attribute transfer, in which the common attribute space is constructed from the semantic description of flow state categories. The attribute-relevant features are extracted by the proposed supervised deep slow and steady feature analysis (SD-S²FA) under the supervision of attributes. In SD-S²FA, an extended Siamese network is designed to extract slow and steady features (S²Fs), in which three 1D convolutional neural networks (1D-CNN) represent the nonlinear feature embedding function, and the Siamese architecture can capture the long-term temporal coherence. Since the state attributes are shared by all the flow states, identification of unseen flow states can be realized by attribute prediction and attribute transfer. The effectiveness and superiority of the proposed method are demonstrated through the gas–liquid two-phase flow experiment.
Article
In this work, fault detection and isolation (FDI) of industrial automation systems with a closed-loop configuration is under consideration. Specifically, the mean of the input and output vectors is time-varying with the variation of the reference vectors. This brings a great challenge to the existing multivariate analysis-based methods, which lack consideration of closed-loop dynamics. To this end, a stable image representation (SIR)-aided dynamic canonical correlation analysis (SD-CCA)-based FDI method is proposed. In this method, residual generation is performed in two steps. Residual vectors of the closed-loop dynamics are first generated based on the identified data-driven SIR to remove the time-varying mean. Then, an SD-CCA-based residual generator is established, which enhances fault detectability by considering the correlation between the zero-mean input and output. Finally, by maximizing the fault direction angle, an optimal fault isolation method based on the fault direction angle of SD-CCA is proposed. A sensitivity analysis of the proposed method follows, and its performance is evaluated by comparison with several state-of-the-art methods on a numerical simulation and a real chiller system. Results show that the proposed method has better FDI performance than the compared methods.
Chapter
Incipient fault detection is particularly important in process industrial systems, as its early detection helps to prevent major accidents. Against this background, this study proposes a combined method of Mixed Kernel Principal Component Analysis and Dynamic Canonical Correlation Analysis (MK-DCCA). Comparative experiments were conducted on a CSTR Simulink model, comparing the MK-DCCA method with DCCA and DCVA methods, demonstrating its excellent monitoring performance in detecting incipient faults in nonlinear dynamic systems. Furthermore, fault identification experiments were conducted, validating the high accuracy of the accompanying contribution graph method.
Article
SPC with positive autocorrelation is well known to result in frequent false alarms if the autocorrelation is ignored. The autocorrelation is a nuisance and not a feature that merits modeling and understanding. This paper proposes exhaustive systematic sampling, which is similar to Bayesian thinning except that no observations are dropped, to create a pooled variance estimator that can be used in Shewhart control charts with competitive performance. The expected value and variance are derived using quadratic forms and are nonparametric in the sense that no distribution or time-series model is assumed. Practical guidance is offered for choosing a systematic sampling interval large enough that the estimator is approximately unbiased but not so large that the variance is inflated. The proposed control charts are compared to time-series residual control charts in a simulation study that validates the use of empirical reference distribution control limits to preserve the stated in-control false alarm probability and demonstrates similar performance.
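The pooled-variance idea can be sketched in a few lines (an illustrative reading of the approach, assuming an AR(1) series; the function and variable names are ours, not the paper's):

```python
import numpy as np

def pooled_systematic_variance(x, d):
    """Pool the sample variances of the d interleaved systematic
    subsamples x[0::d], x[1::d], ..., x[d-1::d] (no observations dropped).
    For a large interval d the within-subsample autocorrelation is
    negligible, so the pooled estimate approximates the marginal variance."""
    subs = [x[i::d] for i in range(d)]
    num = sum((len(s) - 1) * np.var(s, ddof=1) for s in subs)
    den = sum(len(s) - 1 for s in subs)
    return num / den

# AR(1) series with positive autocorrelation rho = 0.8
rng = np.random.default_rng(0)
rho, n = 0.8, 20000
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

sigma2 = pooled_systematic_variance(x, d=20)
ucl = x.mean() + 3 * np.sqrt(sigma2)   # Shewhart-style control limits
lcl = x.mean() - 3 * np.sqrt(sigma2)
```

With d = 20 the within-subsample lag-one correlation is 0.8**20, so the pooled estimate lands near the true marginal variance 1/(1 - 0.8**2) rather than the much smaller residual variance a naive estimator on differenced data would give.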
Article
We propose a new systematic approach for conducting decentralized SPM based on the functional decomposition of the system’s causal network. The methodology consists of first inferring the causal network from normal operating data of the system under study, after which the functional modules are identified by exploring the graph topology and finding the strongly connected “communities”. The interaction between functional modules is also taken into account (macro-causality), by extending the original communities with the Markov-blankets of the connection nodes, giving rise to “extended communities”. Two hierarchical monitoring schemes are proposed for distributed monitoring: CNET-C (Causal Network-Centralized) and CNET-D (Causal Network-Distributed). Results demonstrate the increased sensitivity in fault detection of the proposed methodologies compared to conventional non-causal methods and centralized causal methods that monitor the complete network. The proposed approaches also lead to a more effective, unambiguous, and conclusive fault diagnosis activity.
Article
Recently, data-driven fault diagnosis techniques, especially multivariate statistical process monitoring methods, have been extensively investigated and widely applied to industries. Whether the monitored system is under closed-loop or open-loop control has a strong impact on the development of fault diagnosis strategies, but this point has not been taken seriously into consideration. This work aims to provide an effective data-driven method for sensor fault diagnosis under closed-loop control with modified slow feature analysis (SFA). The SFA is first revisited and compared with the well-known principal component analysis approach. Through theoretical analysis and an intuitive example, the influence of feedback control on sensor fault diagnosis is demonstrated, and a strategy of incorporating more variables for monitoring is suggested to achieve successful detection. Then, based on SFA, a new sensor fault detection and classification method is developed. Its detectability analysis is carried out, based on which improved methods are proposed for efficient detection of incipient sensor faults. Finally, case studies on the continuous stirred tank reactor benchmark process demonstrate the effectiveness of the proposed method.
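Since the work builds on SFA, a generic linear SFA sketch may help fix ideas (this is the textbook algorithm, not the paper's modified variant; all names are illustrative):

```python
import numpy as np

def slow_feature_analysis(X):
    """Linear SFA: whiten X, then find directions whose outputs change
    most slowly in time (smallest variance of the first differences).
    Returns the projection matrix W, slowest features first."""
    Xc = X - X.mean(axis=0)
    d, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Wwhite = U / np.sqrt(d)                 # whitening transform
    Z = Xc @ Wwhite                         # cov(Z) = I
    dd, P = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    return Wwhite @ P                       # eigh sorts ascending: slow first

# Demo: recover a slow sinusoid mixed with fast white noise
rng = np.random.default_rng(1)
t = np.linspace(0, 8 * np.pi, 2000)
S = np.column_stack([np.sin(t), rng.normal(size=t.size)])
X = S @ np.array([[1.0, 0.5], [0.3, 1.0]]).T    # linear mixing
W = slow_feature_analysis(X)
slowest = (X - X.mean(axis=0)) @ W[:, 0]
```

The slowest extracted feature essentially reproduces the sinusoid, which is why slowly drifting sensor biases tend to surface in the slow features.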
Article
System dynamics are inevitable in industrial processes due to factors such as ambient disturbances and controller tuning. Accurate modeling of these dynamics is of key importance for subsequent process analysis and anomaly detection, and dynamic latent variable methods are widely adopted since they retain good interpretability. However, only dynamic cross-correlations are modeled in existing methods, leaving a large portion of quality information unexploited. In this work, an efficient dynamic auto-regressive canonical correlation analysis (EDACCA) method is proposed with a modified auto-regressive exogenous model to extract dynamics in both auto-correlations and cross-correlations. The flexibility and efficiency of EDACCA are improved with the design of weighting parameters and the economic singular value decomposition. EDACCA is further adapted for multi-step ahead (MS) prediction and missing data imputation. Two industrial processes are employed to evaluate the prediction performance and imputation performance of EDACCA. Note to Practitioners —Different sampling rates are usually set for process and quality variables in industrial processes, which leads to fewer quality samples. Meanwhile, system dynamics are not fully exploited for dynamic predictive modeling in most existing algorithms. The focus of this study is to develop a customized data imputation method for the differing data volumes of process and quality data. An efficient dynamic auto-regressive canonical correlation analysis (EDACCA) is designed to extract temporal relations between process and quality variables, which is also adapted for multi-step-ahead prediction. An EDACCA-based data imputation method is also proposed to impute incomplete data caused by irregular sampling rates.
Article
The development of the Internet of Things, cloud computing, and artificial intelligence has given birth to industrial artificial intelligence (IAI) technology, which enables us to obtain fine perception and in-depth understanding capabilities for the operating conditions of industrial processes, and promotes the intelligent transformation of modern industrial production processes. At the same time, modern industry is facing diversified market demand instead of ultra-large-scale demand, resulting in typical variable conditions, which enhances the nonstationary characteristics of modern industry and brings great challenges to the monitoring of industrial processes. In this regard, this paper analyzes the complex characteristics of nonstationary industrial operation, reveals their effects on operating condition monitoring, and summarizes the difficulties faced in varying-condition monitoring. Furthermore, by reviewing the past 30 years of development of data-driven methods for industrial process monitoring, we trace the evolution of nonstationary monitoring methods and analyze the features, advantages, and disadvantages of the methods at different stages. In addition, by summarizing the existing related research methods by category, we hope to provide a reference for monitoring methods for nonstationary processes. Finally, combined with the development trend of industrial artificial intelligence technologies, some promising research directions are given in the field of nonstationary process monitoring.
Article
Full-text available
By considering autocorrelation among process data, canonical variate analysis (CVA) can noticeably enhance fault detection performance. To monitor nonlinear dynamic processes, a kernel CVA (KCVA) model was developed by performing CVA in the kernel space generated by kernel principal component analysis (KPCA). The Gaussian kernel is widely adopted in KPCA for nonlinear process monitoring. In Gaussian kernel-based process monitoring, a single learner is represented by a certain selected kernel bandwidth. However, the selection of kernel bandwidth plays a pivotal role in the performance of process monitoring. Usually, the kernel bandwidth is determined manually. In this paper, a novel ensemble kernel canonical variate analysis (EKCVA) method is developed by integrating ensemble learning and kernel canonical variate analysis. Compared to a single learner, the ensemble learning method usually achieves greatly superior generalization performance through the combination of multiple base learners. Inspired by the ensemble learning method, KCVA models are established by using different kernel bandwidths. Further, the two widely used T² and Q monitoring statistics are constructed for each model. To improve process monitoring performance, these statistics are combined through Bayesian inference. A numerical example and two industrial benchmarks, the continuous stirred-tank reactor process and the Tennessee Eastman process, are used to demonstrate the superiority of the proposed method.
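The plain (linear, single-model) CVA construction underlying this family of methods can be sketched as follows; the lag, state count, and all names are illustrative assumptions, and no kernels or ensembling are included:

```python
import numpy as np

def cva_states(Y, lag, n_states):
    """Canonical variate states of a multivariate series Y (n x m):
    linear combinations of the stacked past that best predict the
    stacked future. The returned states have identity sample
    covariance, so T2 is simply the row-wise sum of squares."""
    n, m = Y.shape
    Yc = Y - Y.mean(axis=0)
    rows = n - 2 * lag + 1
    # Past block [y_{t-1},...,y_{t-lag}], future block [y_t,...,y_{t+lag-1}]
    P = np.hstack([Yc[lag - 1 - j: lag - 1 - j + rows] for j in range(lag)])
    F = np.hstack([Yc[lag + j: lag + j + rows] for j in range(lag)])
    Spp = P.T @ P / (rows - 1)
    Sff = F.T @ F / (rows - 1)
    Sfp = F.T @ P / (rows - 1)
    def inv_sqrt(S):
        d, U = np.linalg.eigh(S)
        return U @ np.diag(1.0 / np.sqrt(d)) @ U.T
    H = inv_sqrt(Sff) @ Sfp @ inv_sqrt(Spp)     # scaled past/future covariance
    _, _, Vt = np.linalg.svd(H)
    J = Vt[:n_states] @ inv_sqrt(Spp)           # state-extraction matrix
    return P @ J.T, J

# Demo: stable 3-variable VAR(1) process
rng = np.random.default_rng(2)
A = np.array([[0.7, 0.1, 0.0], [0.0, 0.5, 0.2], [0.1, 0.0, 0.6]])
Y = np.zeros((3000, 3))
for t in range(1, 3000):
    Y[t] = Y[t - 1] @ A.T + rng.normal(size=3)
Z, J = cva_states(Y, lag=3, n_states=3)
t2 = np.sum(Z ** 2, axis=1)     # T2 monitoring statistic on the states
```

Because the canonical variates come out with unit variance and zero mutual correlation by construction, the T² statistic needs no further covariance inversion.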
Article
Chemical industrial processes are always accompanied by multiple operating conditions, which brings great challenges for multivariate statistical process monitoring methods to extract general characteristics from multimode data, especially time-varying characteristics in transitions between two modes. In this work, a novel statistical process monitoring method based on the dissimilarity of process variable correlation (DISS-PVC) is proposed. The proposed method aims to monitor multiple stable modes and between-mode transitions simultaneously with no prior knowledge of the number of operating modes. Unlike traditional methods oriented to monitoring process variables, the proposed method monitors the correlation of process variables, based on the idea that variable correlation should always conform to a certain process internal mechanism, no matter in which stable or transition mode. Mutual information is first employed to quantify variable correlation with a moving-window approach. Cosine similarity between eigenvalues of mutual information matrices is selected as a dissimilarity index to evaluate the difference in variable correlation between two data sets and perform fault detection. The effectiveness of the proposed method is verified on the benchmark Tennessee Eastman (TE) process and an industrial continuous catalytic reforming heat exchange unit.
Article
The number and diversity of Process Analytics applications is growing fast, impacting areas ranging from process operations to strategic planning or supply chain management. However, this field has not yet reached a maturity level characterized by a stable, organized and consolidated body of knowledge for handling the main classes of problems that need to be faced. Data-Driven Process Systems Engineering and Process Analytics only recently received wider recognition, becoming a regular presence in journals and conferences. As a tribute to the groundbreaking Process Analytics contributions of George Stephanopoulos, namely through his academic tree, to which we are proud to belong, this article aims to contribute to the systematization and consolidation of this field in the broad PSE scope, starting from a fundamental understanding of the key challenges facing it, and constructing from them a workflow that can flexibly be adapted to handle different problems, aimed at supporting value creation through good decision-making. In this path, we base our foresight and conceptual framework on the authors' experience, as well as on contributions from other researchers that, across the world, have been collectively pushing forward Data-Driven Process Systems Engineering.
Article
Fault diagnosis is essential for troubleshooting and maintenance of industrial processes that operate dynamically. Traditional reconstruction-based fault diagnosis methods, however, are mostly developed for static processes and are ineffective for faults with similar directions. In this paper, a new hierarchical fault diagnosis strategy that incorporates reconstruction and dynamic time warping is proposed for the feeding anomaly diagnosis of an industrial cone crusher. A novel fault-magnitude-estimation method for dynamic processes is proposed based on the dynamic relations captured by dynamic latent variable (DLV) predictions. A combined index is developed based on the prediction residuals which exclude normal and predictable variations to improve sensitivity to faults. Fault-magnitude-estimation-based dynamic time warping is proposed to evaluate the shape similarity of faults in order to further isolate the fault candidates with similar directions. The reconstructed magnitude is utilized to extract shape features of the faults. The advantages are demonstrated using a Monte-Carlo simulation example of a dynamic process. Finally, the proposed method is applied successfully to diagnose feeding anomalies of an industrial cone crusher.
Article
Process dynamic behaviors resulting from closed-loop control and inherent process characteristics are ubiquitous in industrial processes and bring a considerable challenge for process monitoring. Many methods have been developed for dynamic process monitoring, of which the dynamic latent variable (DLV) model is one of the most practical and promising branches. This paper provides a timely retrospective study of typical methods to fill the void in the systematic analysis of DLV methods for dynamic process monitoring. First, several classical DLV methods are briefly reviewed from three aspects: original ideas, the determination of parameters, and offline statistics design. Second, a discussion of the relationships among the discussed methods is established to clarify how each method explains process dynamics. Third, five cases of a three-phase flow process are provided to illustrate the effectiveness of the methods from the application viewpoint. Finally, future research directions on dynamic process monitoring are also provided. The primary objective of this paper is to summarize the prevalent DLV methods for dynamic process monitoring and thus highlight a valuable reference for further improvement of DLV models and the selection of algorithms in practical applications.
Article
Missing data widely exist in industrial processes and lead to difficulties in modelling, monitoring, fault diagnosis, and control. In this paper, we propose a nonlinear method to handle the missing data problem in the offline modelling stage and/or the online monitoring stage of statistical process monitoring. We provide a fast incremental nonlinear matrix completion (FINLMC) method for missing data imputation, which enables us to use kernel methods such as kernel principal component analysis (KPCA) to monitor nonlinear multivariate processes even when there are missing data. We also provide theoretical analysis for the effectiveness of the proposed method. Experiments show that the proposed method can reduce the false alarm rate and improve the fault detection rate in nonlinear process monitoring with missing data. The proposed FINLMC method can also be used to handle missing data in other problems such as classification and process control.
Article
A model predictive control (MPC) formulation for mammalian cell fed-batch bioreactor processes is developed. A nonlinear fundamental model for the bioreactor is used to generate a database of historical runs comprising the measurement variables and the manipulated input feed flow rate to the bioreactor. The database is used with subspace identification methods to develop a state-space model of the process. The identified model is used to design various MPC formulations with different objective criteria, including the conventional trajectory-tracking objective function and a novel terminal objective for maximizing the product yield at completion of a run. Case studies involving the simulated bioreactor process demonstrate the efficacy of the MPC algorithms subject to unknown disturbances, random variations in the inlet feed glucose and glutamine concentrations, and measurement noise. Compared to the traditional proportional-integral control algorithm, the trajectory-tracking predictive control algorithm is able to better track the reference glucose concentration set-point with an improvement of 5.1% in the tracking error. The critical quality attribute predictive control algorithm designed to maximize the product yield results in a 3.9% increase in the product concentration at the completion of the run.
Article
A model predictive control (MPC) system based on a latent variable (LV) model generated using the partial least squares (PLS) method is developed. The difference in the performance of MPCs that use recursively updated LV models based on autoregressive time-series modeling with exogenous inputs (ARX) and on PLS is studied. The effect of signal noise on MPC performance is also investigated for both types of models. MPC performance is evaluated by regulating the blood glucose concentration (BGC) of people with Type 1 diabetes mellitus (T1DM) in simulation studies. Signal noise in glucose concentration sensor data, delays caused by insulin absorption and action, and disturbances caused by consumption of meals make the regulation of BGC difficult. The proposed controller is evaluated with 10 in-silico adult subjects of the UVa/Padova simulator with different levels of signal noise. The results illustrate the effectiveness of the MPC based on the LV model. The average time for BGC in the safe range (70-180 mg/dL) for the LV-based MPC is 83.23% compared to 79.68% for the MPC based on the ARX model when intravenous BGC values are used. The average time in the safe range decreases to 76.04% and 71.92%, respectively, when using the generic CGM sensor of the simulator. It is reduced further to 71.93% and 67.20% when additional noise is added to CGM readings.
Article
Process Systems Engineering (PSE) is now a mature field with a well-established body of knowledge, computational-oriented frameworks and methodologies designed and implemented for addressing chemical processes related problems spanning a wide range of scales in time and space. A common feature of many PSE approaches lies in their mostly deductive nature, based on a deep understanding of the underlying Chemical Engineering Science. Given the current data-intensive industrial and societal contexts, new sources of process or product information are now easily made available and should be exploited to complement and expand the classical PSE paradigm with inductive data-driven reasoning and knowledge discovery methodologies. In this article, based upon our over 25 years of research and teaching experience in the field, we discuss the scope and trends of this PSE evolution, refer to several relevant Data-Centric PSE approaches, and identify the main components, applications and future opportunities of this PSE 4.0 perspective.
Article
Due to the interaction of process variables, process data is in essence graph-structured with non-Euclidean nature. Hence, learning the graph representation in a low-dimensional Euclidean space will be helpful for gaining the true insights underlying the industrial process. In this study, a meticulous process monitoring method (PM-MCF) is proposed based on multiscale convolutional feature extraction. For the proposed method, the interactions between different process variables are identified using causality inference. According to the obtained graph structure, convolutional filters are specifically designed for each process variable. In this way, the local correlation within directly related variables and their corresponding dynamic information can be effectively extracted. Besides, as the number of convolutional layers increases, more variables can be involved through the interaction relationships to explore a larger receptive field. Based on the obtained feature matrices, sub-models are developed to calculate the monitoring statistics and their corresponding control limits. Finally, the decisions of all the sub-models are integrated to identify the operation status of the process system. It is noted that the proposed PM-MCF can be readily generalized to other existing methods by replacing the selected filter and the developed sub-models. The monitoring performance of the proposed method is illustrated using process data collected from a thermal power plant. Experimental results show that the proposed method can accurately detect process anomalies using the extracted causal relationships.
Chapter
Canonical variate analysis is a family of multivariate statistical process monitoring tools for the analysis of paired sets of variables. It has been employed to extract relations between two sets of variables when the relations are considered non-linear, when the process is non-stationary, and when the dimensionality needs to be reduced to benefit human interpretation. This tutorial provides the theoretical background of canonical variate analysis. Together with industrial examples, this study discusses the applicability of extensions of canonical variate analysis to diagnosis and prognosis. We hope that this overview can serve as a hands-on tool for applying canonical variate analysis in condition monitoring of industrial processes.
Article
As a highly complex and time-varying process, gas-water two-phase flow is commonly encountered in industries. It has a variety of typical flow states and transition flow states. Accurate identification and monitoring of flow states is not only beneficial to further study of two-phase flow but also helpful for stable operation and economic efficiency of the process industry. Combining canonical variate analysis (CVA) and the Gaussian mixture model (GMM), a strategy called multi-CVA-GMM is proposed for flow state monitoring in gas-water two-phase flow. CVA is used to extract flow state features from the perspective of correlation between historical data and future data, which addresses the cross-correlation and temporal correlation of multi-sensor measurement data. GMM calculates the probability that the current flow state belongs to each typical flow pattern and judges the current flow state by probability indicators. This facilitates the follow-up use of the Bayesian inference probability and Mahalanobis distance-based (BID) indicator for flow state monitoring, which avoids repeated traversal of multiple CVA-GMM models and improves the efficiency of the monitoring process. The probability indicators can also be used to analyze transition flow states. The method, combining the probabilistic idea of GMM with the deterministic idea of multimodal modeling, can accurately identify the current flow state and effectively monitor the evolution of the flow state. The multi-CVA-GMM method is validated using measured data from the horizontal flow loop of a gas-water two-phase flow experimental facility, and its effectiveness is demonstrated.
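The mode-probability step can be illustrated with a minimal Bayes-rule computation over already-fitted Gaussian modes (a sketch only; the paper's CVA feature extraction, model fitting, and BID indicator are not reproduced, and all names are ours):

```python
import numpy as np

def mode_posteriors(x, means, covs, priors):
    """Posterior probability (Bayes' rule) that observation x belongs to
    each Gaussian mode of an already-fitted mixture model."""
    p = x.size
    post = []
    for mu, S, w in zip(means, covs, priors):
        d = x - mu
        lik = np.exp(-0.5 * d @ np.linalg.solve(S, d))
        lik /= np.sqrt((2 * np.pi) ** p * np.linalg.det(S))
        post.append(w * lik)
    post = np.asarray(post)
    return post / post.sum()

# Demo: two well-separated "flow states" with equal priors
means = [np.zeros(2), np.full(2, 5.0)]
covs = [np.eye(2), np.eye(2)]
probs = mode_posteriors(np.array([0.2, -0.1]), means, covs,
                        np.array([0.5, 0.5]))
```

A sample between two modes yields intermediate posteriors, which is exactly the property the abstract exploits to follow transition flow states.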
Article
This paper is concerned with data science and analytics as applied to data from dynamic systems for the purpose of monitoring, prediction, and inference. Collinearity is inevitable in industrial operation data. Therefore, we focus on latent variable methods that achieve dimension reduction and collinearity removal. We present a new dimension-reduction expression of the state-space framework to unify dynamic latent variable analytics for process data, dynamic factor models for econometrics, subspace identification of multivariate dynamic systems, and machine learning algorithms for dynamic feature analysis. We unify or differentiate them in terms of model structure, objectives with constraints, and parsimony of parameterization. The Kalman filter theory in the latent space is used to give a system-theory foundation to some empirical treatments in data analytics. We provide a unifying review of the connections among the dynamic latent variable methods, dynamic factor models, subspace identification methods, dynamic feature extractions, and their uses for prediction and process monitoring. Both unsupervised dynamic latent variable analytics and the supervised counterparts are reviewed. Illustrative examples are presented to show the similarities and differences among the analytics in extracting features for prediction and monitoring.
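The latent-space Kalman filter that the review uses as its unifying thread is the standard one; a textbook sketch (matrices and names are illustrative, not from the paper):

```python
import numpy as np

def kalman_filter(Y, A, C, Q, R, x0, P0):
    """Kalman filter for x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t.
    Returns the filtered latent-state means, one row per measurement."""
    x, P = x0, P0
    I = np.eye(x0.size)
    out = []
    for y in Y:
        x, P = A @ x, A @ P @ A.T + Q            # predict
        S = C @ P @ C.T + R                      # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)           # Kalman gain
        x = x + K @ (y - C @ x)                  # update with innovation
        P = (I - K @ C) @ P
        out.append(x)
    return np.array(out)

# Demo: scalar latent state observed in heavy noise
rng = np.random.default_rng(5)
n, a = 2000, 0.9
xs = np.zeros(n)
ys = np.zeros(n)
for t in range(1, n):
    xs[t] = a * xs[t - 1] + rng.normal(scale=0.3)
    ys[t] = xs[t] + rng.normal()
est = kalman_filter(ys.reshape(-1, 1), np.array([[a]]), np.array([[1.0]]),
                    np.array([[0.09]]), np.array([[1.0]]),
                    np.zeros(1), np.eye(1))
```

The filtered estimate tracks the latent state with a markedly lower error than the raw measurements, which is the sense in which filtering in the latent space grounds the empirical data-analytic treatments.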
Chapter
An adaptive multivariate process modelling approach is developed to improve the accuracy of traditional canonical variate analysis (CVA) in predicting the performance of industrial rotating machines under faulty operating conditions. An adaptive forgetting factor is adopted to update the covariance and cross-covariance matrices of past and future measurements. The forgetting factor is adjusted according to the Euclidean norm of the residual between the predicted model outputs and the actual measurements. The approach was evaluated using condition monitoring data obtained from an operational industrial gas compressor. The results show that the proposed method can be effectively used to predict the performance of industrial rotating machines under faulty operating conditions.
Article
Full-text available
This paper is concerned with the treatment of residuals associated with principal component analysis. These residuals are the difference between the original observations and the predictions of them using less than a full set of principal components. Specifically, procedures are proposed for testing the residuals associated with a single observation vector and for an overall test for a group of observations. In this development, it is assumed that the underlying covariance matrix is known; this is reasonable for many quality control applications where the proposed procedures may be quite useful in detecting outliers in the data. A numerical example is included.
Book
Full-text available
Practically all modern control systems are based upon microprocessors and complex microcontrollers that yield high performance and functionality. This volume focuses on the design of computer-controlled systems, featuring computational tools that can be applied directly and are explained with simple paper-and-pencil calculations. The use of computational tools is balanced by a strong emphasis on control system principles and ideas. Extensive pedagogical aids include worked examples, MATLAB macros, and a solutions manual (see inside for details). The initial chapter presents a broad outline of computer-controlled systems, followed by a computer-oriented view based on the behavior of the system at sampling instants. An introduction to the design of control systems leads to a process-related view and coverage of methods of translating analog designs to digital control. Concluding chapters explore implementation issues and advanced design methods.
Article
When p correlated process characteristics are being measured simultaneously, often individual observations are initially collected. The process data are monitored and special causes of variation are identified in order to establish control and to obtain a “clean” reference sample to use as a basis in determining the control limits for future observations. One common method of constructing multivariate control charts is based on Hotelling's T² statistic. Currently, when a process is in the start-up stage and only individual observations are available, approximate F and chi-square distributions are used to construct the necessary multivariate control limits. These approximations are conservative in this situation. This article presents an exact method, based on the beta distribution, for constructing multivariate control limits at the start-up stage. An example from the chemical industry illustrates that this procedure is an improvement over the approximate techniques, especially when the number of subgroups is small.
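The beta-based limit for individual observations in the start-up (Phase I) stage is commonly written as ((m-1)^2/m) times a Beta(p/2, (m-p-1)/2) quantile; a sketch of that form using SciPy (variable names and the simulated data are ours):

```python
import numpy as np
from scipy import stats

def phase1_t2_limit(m, p, alpha=0.0027):
    """Exact Phase-I upper control limit for Hotelling's T2 of individual
    observations: ((m-1)^2 / m) * Beta_{1-alpha}(p/2, (m-p-1)/2)."""
    return (m - 1) ** 2 / m * stats.beta.ppf(1 - alpha, p / 2, (m - p - 1) / 2)

def hotelling_t2(X):
    """T2 of each row, with mean and covariance estimated from X itself."""
    Xc = X - X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', Xc, Sinv, Xc)

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))       # in-control start-up data, m=100, p=3
ucl = phase1_t2_limit(m=100, p=3)
t2 = hotelling_t2(X)
```

For small m this exact limit sits below the chi-square approximation, which is why the approximate limits are described as conservative at start-up.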
Book
model's predictive capability? These are some of the questions that need to be answered in proposing any time series model construction method. This book addresses these questions in Part II. Briefly, the covariance matrices between past data and future realizations of time series are used to build a matrix called the Hankel matrix. Information needed for constructing models is extracted from the Hankel matrix. For example, its numerically determined rank will be the dimension of the state model. Thus the model dimension is determined by the data, after balancing several sources of error for such model construction. The covariance matrix of the model forecasting error vector is determined by solving a certain matrix Riccati equation. This matrix is also the covariance matrix of the innovation process which drives the model in generating model forecasts. In these model construction steps, a particular model representation, here referred to as balanced, is used extensively. This mode of model representation facilitates error analysis, such as assessing the error of using a lower dimensional model than that indicated by the rank of the Hankel matrix. The well-known Akaike's canonical correlation method for model construction is similar to the one used in this book. There are some important differences, however. Akaike uses the normalized Hankel matrix to extract canonical vectors, while the method used in this book does not normalize the Hankel matrix.
Chapter
Any data table produced in a chemical investigation can be analysed by bilinear projection methods, i.e. principal components and factor analysis and their extensions. Representing the table rows (objects) as points in a p-dimensional space, these methods project the point swarm of the data set or parts of it down on an F-dimensional subspace (plane or hyperplane). Different questions put to the data table correspond to different projections. This provides an efficient way to convert a data table to a few informative pictures showing the relations between objects (table rows) and variables (table columns). The methods are presented geometrically and mathematically in parallel with chemical illustrations. […] more dangerous in the long run than methods that are conservative with respect to the amount of extracted information.
Article
In this article it is shown that the class of all realizations possessing the same power spectral density can be uniquely characterized by giving an invariant system description. A new transformation group is introduced and shown to leave the spectral density unchanged. The action of this group must be considered when attempting to specify a stochastic realization from spectral densities or equivalently covariance sequences.
Article
When performing quality control in a situation in which measures are made of several possibly related variables, it is desirable to use methods that capitalize on the relationship between the variables to provide controls more sensitive than those that may be made on the variables individually. The most common methods of multivariate quality control that assess the vector of variables as a whole are those based on the Hotelling T² between the variables and the specification vector. Although T² is the optimal single-test statistic for a general multivariate shift in the mean vector, it is not optimal for more structured mean shifts, for example, shifts in only some of the variables. Measures based on quadratic forms (like T²) also confound mean shifts with variance shifts and require quite extensive analysis following a signal to determine the nature of the shift. This article proposes Shewhart and cumulative sum (CUSUM) controls based on the vector Z of scaled residuals from the regression of each variable on all others. Each component of Z is the (Neyman-Pearson) optimal single-test statistic for testing whether that variable is shifted in mean. The Shewhart charts plot the components of Z, and the CUSUM charts are based on accumulation of components of Z, leading one to anticipate good performance by the charts. This is verified by some average run length calculations. The vector Z also has the valuable interpretive property that signals given are for shifts in the mean, or shifts in the variance, of particular variables rather than global signals indicating some unspecified departure from control.
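The scaled-residual vector Z described above can be sketched with the standard precision-matrix identity (the function name and simulated data are mine): the residual of variable j regressed on all the others is (S⁻¹x)ⱼ / (S⁻¹)ⱼⱼ with variance 1/(S⁻¹)ⱼⱼ, so one matrix inverse yields all p adjusted variables at once.

```python
import numpy as np

def regression_adjusted(X):
    """Scaled residuals from regressing each variable on all the others.

    Uses the precision-matrix identity: dividing the columns of
    Xc @ S^{-1} by sqrt(diag(S^{-1})) gives unit-variance adjusted
    variables, one per original variable.
    """
    Xc = X - X.mean(axis=0)
    P = np.linalg.inv(np.cov(X, rowvar=False))  # precision matrix
    return (Xc @ P) / np.sqrt(np.diag(P))

# Demo on strongly collinear synthetic data
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
X = rng.normal(size=(500, 4)) @ A
Z = regression_adjusted(X)
```

Each column of Z is then charted individually (Shewhart) or accumulated (CUSUM), giving variable-specific signals instead of a single global T² alarm.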
Article
Tukey introduced a family of random variables defined by the transformation Z = [U^λ − (1 − U)^λ]/λ, where U is uniformly distributed on [0, 1]. Some of its properties are described with emphasis on properties of the sample range. The rectangular and logistic distributions are members of this family, and distributions corresponding to certain values of λ give useful approximations to the normal and t distributions. Closed-form expressions are given for the expectation and coefficient of variation of the range, and numerical values are computed for n = 2(1)6(2)12, 15, 20 for several values of λ. It is observed that Plackett's upper bound on the expectation of the range for samples of size n is attained for a λ distribution with λ = n − 1.
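The quantile transform above is short enough to sketch directly (the function name is mine; λ = 1 recovers the rectangular member, λ → 0 the logistic, and λ near 0.14 is a common normal approximation):

```python
import numpy as np

def tukey_lambda(u, lam):
    """Tukey's lambda quantile transform Z = [U^lam - (1-U)^lam] / lam."""
    u = np.asarray(u, dtype=float)
    if lam == 0:
        return np.log(u / (1.0 - u))  # logistic limiting case
    return (u ** lam - (1.0 - u) ** lam) / lam

# Demo: lam = 1 maps the uniform U onto a uniform on [-1, 1]
u = np.linspace(0.05, 0.95, 19)
z_uniform = tukey_lambda(u, 1.0)
```

Feeding uniform pseudo-random numbers through this transform is a cheap way to generate samples from the whole family for power studies like the one described above.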
Article
The use of partial least squares (PLS) for handling collinearities among the independent variables X in multiple regression is discussed. Consecutive estimates (rank 1, 2, …) are obtained using the residuals from the previous rank as a new dependent variable y. The PLS method is equivalent to the conjugate gradient method used in numerical analysis for related problems. To estimate the “optimal” rank, cross validation is used. Jackknife estimates of the standard errors are thereby obtained with no extra computation. The PLS method is compared with ridge regression and principal components regression on a chemical example of modelling the relation between the measured biological activity and variables describing the chemical structure of a set of substituted phenethylamines.
Article
Traditionally, control charts are developed assuming that the sequence of process observations to which they are applied is uncorrelated. Unfortunately, this assumption is frequently violated in practice. The presence of autocorrelation has a serious impact on the performance of control charts, causing a dramatic increase in the frequency of false alarms. This paper presents methods for applying statistical control charts to autocorrelated data. The primary method is based on modeling the autocorrelative structure in the original data and applying control charts to the residuals. We show that the exponentially weighted moving average (EWMA) statistic provides the basis of an approximate procedure that can be useful for autocorrelated data. Illustrations are provided using real process data.
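The residuals-charting idea above can be sketched with a simple AR(1) disturbance model (the data, seed, and parameter values are illustrative only): fit the time-series model, then apply an ordinary 3-sigma chart to the one-step-ahead residuals, which are approximately uncorrelated when the model is adequate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated autocorrelated process: AR(1) with phi = 0.8 (synthetic stand-in)
phi, n = 0.8, 1000
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# Fit the AR(1) coefficient by least squares on the lag-1 regression
phi_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])

# Chart the residuals with conventional 3-sigma limits
resid = x[1:] - phi_hat * x[:-1]
sigma = resid.std(ddof=1)
ucl, lcl = 3 * sigma, -3 * sigma
alarms = int(np.sum((resid > ucl) | (resid < lcl)))
```

Charting `x` directly with 3-sigma limits computed from its own standard deviation would, by contrast, produce runs and drifts that trigger frequent false alarms, which is exactly the failure mode the abstract describes.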
Article
Cascade and multivariable control of a high temperature short time (HTST) pasteurization system were tested and compared with the performance of single-loop feedback control. Multivariable control was implemented on the basis of computations of product temperatures that yield equivalent lethality at a residence time of 15 s at 161 °F in the holding tube. Both cascade and multivariable controllers reduced product temperature fluctuations and overshoot compared to single-loop feedback control. Multivariable control was based on on-line computation of equivalent total lethality and it permitted operation at variable flow rates or at the most desirable temperatures for product quality and functionality.
Article
The problem of discrete-time stochastic model reduction (approximation) is considered. Using the canonical correlation analysis approach of Akaike (1975), a new order-reduction algorithm is developed. Furthermore, it is shown that the inverse of the reduced-order realization is asymptotically stable. Next, an explicit relationship between canonical variables and the linear least-squares estimate of the state vector is established. Using this, a more direct approach for order reduction is presented, and also a new design for reduced-order Kalman filters is developed. Finally, the uniqueness and symmetry properties for the new realization—the balanced stochastic realization—along with a simulation result, are presented.
Article
The second-order moment structure of time series models is used to derive a canonical analysis in time series modelling. Consistency properties of certain canonical correlations and the corresponding eigenvectors are shown. Based on these properties, a canonical correlation approach for tentative order determination in building autoregressive-moving average models is proposed. This approach can handle directly nonstationary as well as stationary processes and it also provides consistent estimates of the auto-regressive parameters involved. The asymptotic distribution of the identification statistic is discussed.
Article
The vectors p, q and w of the partial least squares (PLS) model are shown to be computable from the input covariance matrix and the input/output covariance matrix of a system without reference to any actual data. This approach is given in terms of the singular value decompositions of the input/output covariance matrix and its residual matrices for each component. Such an approach may be useful for studying the properties of the PLS model.
Article
The problem of using time-varying trajectory data measured on many process variables over the finite duration of a batch process is considered. Multiway principal-component analysis is used to compress the information contained in the data trajectories into low-dimensional spaces that describe the operation of past batches. This approach facilitates the analysis of operational and quality-control problems in past batches and allows for the development of multivariate statistical process control charts for on-line monitoring of the progress of new batches. Control limits for the proposed charts are developed using information from the historical reference distribution of past successful batches. The method is applied to data collected from an industrial batch polymerization reactor.
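Batch-wise unfolding, the core data manipulation behind the multiway PCA described above, can be sketched in a few lines (the array sizes and data are arbitrary illustrations):

```python
import numpy as np

# Batch data array: I batches x J variables x K time points
rng = np.random.default_rng(3)
I, J, K = 25, 4, 60
batches = rng.normal(size=(I, J, K))

# Batch-wise unfolding: each batch's full trajectory becomes one row.
# Centering then removes the average trajectory, so the PCA model
# describes deviations from normal batch behavior.
unfolded = batches.reshape(I, J * K)
centered = unfolded - unfolded.mean(axis=0)

# Ordinary PCA on the unfolded matrix via SVD
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * s  # batch scores in the reduced space
```

Control limits for new batches are then set from the historical distribution of these scores (and of the residuals), which is the "historical reference distribution" the abstract mentions.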
Article
This paper introduces the normal probability plot correlation coefficient as a test statistic in complete samples for the composite hypothesis of normality. The proposed test statistic is conceptually simple, is computationally convenient, and is readily extendible to testing non-normal distributional hypotheses. An empirical power study shows that the normal probability plot correlation coefficient compares favorably with 7 other normal test statistics. Percent points are tabulated for n = 3(1)50(5)100.
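A minimal sketch of the statistic, assuming Filliben-style uniform-median plotting positions (one common choice; the helper name is mine): sort the sample, compute approximate normal order-statistic medians, and correlate the two.

```python
import numpy as np
from statistics import NormalDist

def normal_ppcc(x):
    """Normal probability plot correlation coefficient.

    Correlation between the ordered sample and approximate normal
    order-statistic medians (uniform medians mapped through the
    standard normal inverse CDF).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    m = np.empty(n)
    m[-1] = 0.5 ** (1.0 / n)           # median of the largest order statistic
    m[0] = 1.0 - m[-1]                 # median of the smallest
    m[1:-1] = (np.arange(2, n) - 0.3175) / (n + 0.365)
    q = np.array([NormalDist().inv_cdf(p) for p in m])
    return np.corrcoef(x, q)[0, 1]

# Demo: a normal sample should score higher than a skewed one
rng = np.random.default_rng(4)
r_norm = normal_ppcc(rng.normal(size=300))
r_exp = normal_ppcc(rng.exponential(size=300))
```

Values near 1 support normality; the tabulated percent points mentioned above give the rejection threshold for a chosen significance level and sample size.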
Article
Process computers routinely collect hundreds to thousands of pieces of data from a multitude of plant sensors every few seconds. This has caused a 'data overload', and due to the lack of appropriate analyses very little is currently being done to utilize this wealth of information. Operating personnel typically use only a few variables to monitor the plant's performance. However, multivariate statistical methods such as PLS (Partial Least Squares or Projection to Latent Structures) and PCA (Principal Component Analysis) are capable of compressing the information down into low-dimensional spaces which retain most of the information. Using this method of statistical data compression, a multivariate monitoring procedure analogous to the univariate Shewhart chart has been developed to efficiently monitor the performance of large processes, and to rapidly detect and identify important process changes. This procedure is demonstrated using simulations of two processes, a fluidized bed reactor and an extractive distillation column.
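A hedged sketch of the PCA-based compression and monitoring idea, not the authors' exact procedure (the synthetic data, seed, and number of retained components are assumptions): project scaled measurements onto a few principal components, then track the in-plane T² statistic and the off-plane squared prediction error (SPE).

```python
import numpy as np

rng = np.random.default_rng(5)

# In-control training data: 6 measured variables driven by 2 latent factors
latent = rng.normal(size=(400, 2))
loadings_true = rng.normal(size=(2, 6))
X = latent @ loadings_true + 0.1 * rng.normal(size=(400, 6))

# Scale to zero mean / unit variance, then PCA via the SVD
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:2].T                                   # retain 2 components
scores = Xs @ P
explained = (s[:2] ** 2).sum() / (s ** 2).sum()

# Monitoring statistics: T^2 within the model plane, SPE off the plane
t2 = ((scores / scores.std(axis=0, ddof=1)) ** 2).sum(axis=1)
spe = ((Xs - scores @ P.T) ** 2).sum(axis=1)
```

For new observations, the same `mu`, `sd`, and `P` from the training phase are reused; exceeding the control limit on `t2` flags unusual movement inside the model plane, while exceeding it on `spe` flags a break in the correlation structure itself.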
Article
Detecting out-of-control status and diagnosing disturbances leading to the abnormal process operation early are crucial in minimizing product quality variations. Multivariate statistical techniques are used to develop detection methodology for abnormal process behavior and diagnosis of disturbances causing poor process performance. Principal components and discriminant analysis are applied to quantitatively describe and interpret step, ramp and random-variation disturbances. All disturbances require high-dimensional models for accurate description and cannot be discriminated by biplots. Diagnosis of simultaneous multiple faults is addressed by building quantitative measures of overlap between models of single faults and their combinations. These measures are used to identify the existence of secondary disturbances and distinguish their components. The methodology is illustrated by monitoring the Tennessee Eastman plant simulation benchmark problem subjected to different disturbances. Most of the disturbances can be diagnosed correctly, the success rate being higher for step and ramp disturbances than random-variation disturbances.
Article
In this paper, we present two novel algorithms to realize a finite dimensional, linear time-invariant state-space model from input-output data. The algorithms have a number of common features. They are classified as one of the subspace model identification schemes, in that a major part of the identification problem consists of calculating specially structured subspaces of spaces defined by the input-output data. This structure is then exploited in the calculation of a realization. Another common feature is their algorithmic organization: an RQ factorization followed by a singular value decomposition and the solution of an overdetermined set (or sets) of equations. The schemes assume that the underlying system has an output-error structure and that a measurable input sequence is available. The latter characteristic indicates that both schemes are versions of the MIMO Output-Error State Space model identification (MOESP) approach. The first algorithm is denoted in particular as the elementary MOESP scheme. The subspace approximation step requires, in addition to input-output data, knowledge of a restricted set of Markov parameters. The second algorithm, referred to as the ordinary MOESP scheme, solely relies on input-output data. A compact implementation is presented of both schemes. Although we restrict our presentation here to error-free input-output data, a framework is set up in an identification context. The identification aspects of the presented realization schemes are treated in the forthcoming Parts 2 and 3.
Article
Multivariate statistical procedures for monitoring the progress of batch processes are developed. The only information needed to exploit the procedures is a historical database of past successful batches. Multiway principal component analysis is used to extract the information in the multivariate trajectory data by projecting them onto low-dimensional spaces defined by the latent variables or principal components. This leads to simple monitoring charts, consistent with the philosophy of statistical process control, which are capable of tracking the progress of new batch runs and detecting the occurrence of observable upsets. The approach is contrasted with other approaches which use theoretical or knowledge-based models, and its potential is illustrated using a detailed simulation study of a semibatch reactor for the production of styrene-butadiene latex.
Article
Schemes for monitoring the operating performance of large continuous processes using multivariate statistical projection methods such as principal component analysis (PCA) and projection to latent structures (PLS) are extended to situations where the processes can be naturally blocked into subsections. The multiblock projection methods allow one to establish monitoring charts for the individual process subsections as well as for the entire process. When a special event or fault occurs in a subsection of the process, these multiblock methods can generally detect the event earlier and reveal the subsection within which the event has occurred. More detailed diagnostic methods based on interrogating the underlying PCA/PLS models are also developed. These methods show those process variables which are the main contributors to any deviations that have occurred, thereby allowing one to diagnose the cause of the event more easily. These ideas are demonstrated using detailed simulation studies on a multisection tubular reactor for the production of low-density polyethylene.
Article
In this paper we develop the mathematical and statistical structure of PLS regression. We show the PLS regression algorithm and how it can be interpreted in model building. The basic mathematical principles that lie behind two block PLS are depicted. We also show the statistical aspects of the PLS method when it is used for model building. Finally we show the structure of the PLS decompositions of the data matrices involved.
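The two-block decomposition discussed above can be illustrated with a minimal single-response PLS (the function name and simulated data are mine, and this deflation-based sketch is not a full library implementation): each weight vector is the normalized covariance X'y of the currently deflated data.

```python
import numpy as np

def pls1(X, y, n_comp):
    """Minimal single-y PLS with deflation.

    Each weight vector is the normalized covariance X'y of the deflated
    data; X and y are deflated by the extracted score after each step.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, T = [], []
    for _ in range(n_comp):
        w = X.T @ y
        w = w / np.linalg.norm(w)       # weight vector
        t = X @ w                       # score vector
        p = X.T @ t / (t @ t)           # loading vector
        X = X - np.outer(t, p)          # deflate X
        y = y - t * ((y @ t) / (t @ t)) # deflate y
        W.append(w)
        T.append(t)
    return np.column_stack(W), np.column_stack(T)

# Demo on synthetic collinear regression data
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)
W, T = pls1(X, y, n_comp=3)
```

A useful structural property, which follows directly from the deflation step, is that successive score vectors are exactly orthogonal, so the inner regression on the scores decouples component by component.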
Article
Measurements from industrial processes are often serially correlated. The impact of this correlation on the performance of the cumulative sum and exponentially weighted moving average charting techniques is investigated in this paper. It is shown that serious errors concerning the “state of statistical process control” may result if the correlation structure of the observations is not taken into account. The use of time series methods for coping with serially correlated observations is outlined. Paper basis weight measurements are used to illustrate the time series methodology.
Chapter
New results are developed that give accuracy confidence bands based upon the parameter estimation error of system identification including the bias of model under-fitting. In this context, bias means the underestimation of model accuracy due to selection of the model state order less than the true system order. It is shown that the optimal accuracy model is given by the model minimizing the Akaike information criterion (AIC), and that the bias is indicated by the behavior of the AIC as a function of model order beyond the minimum order. The resulting confidence bands are simultaneous so that they guarantee the system frequency response function lies entirely within the band for all frequencies with the stated probability. Such bands are then appropriate for guaranteeing robust stability and performance of robust controllers.
Conference Paper
The canonical variate analysis (CVA) approach for system identification, filtering, and adaptive control is developed. The past/future Markov property provides a starting point for defining a reduced-order prediction problem. The solution is a canonical variate analysis that is characterized by a generalized singular value decomposition. State-space model estimation requires only simple regression, and state order selection involves the optimal Akaike information criterion procedure. The CVA method extends to time-varying and abruptly changing systems. A reduced-rank stochastic model predictive control problem is shown to be equivalent to the CVA problem. Also discussed are computational aspects, applications, and an example illustrating the method. New extensions to the identification of general nonlinear systems are briefly discussed. The CVA method provides an approach giving reliable, automatic implementation of identification, filtering, and control for online adaptive control.
Conference Paper
The wealth of process information generated from sensor readings can be used to detect bias changes, drift and/or higher levels of noise in various process sensors. New multivariate statistical techniques permit frequent audits of process sensors. These methods are based on the evaluation of residuals generated by utilizing plant models developed with principal components analysis (PCA) or partial least squares (PLS) methods. The fact that the prediction of each variable in the process involves all the other process variables (PLS) and even itself (PCA), may cause false alarms even though the related sensors function properly. A multipass PLS regression technique is proposed to eliminate the false alarms. The sensor with the highest corruption is discarded from both the calibration and the test data when a sensor failure is detected. This eliminates the effect of the corrupted data on the prediction of the remaining process variables and prevents false alarms. The technique is applied to a High Temperature Short Time (HTST) Pasteurization Pilot plant with six temperature measurements and one flow rate measurement.
Conference Paper
Very general reduced-order filtering and modeling problems are phrased in terms of choosing a state based upon past information to optimally predict the future as measured by a quadratic prediction error criterion. The canonical variate method is extended to approximately solve this problem and give a near optimal reduced-order state space model. The approach is related to the Hankel norm approximation method. The central step in the computation involves a singular value decomposition which is numerically very accurate and stable. An application to reduced-order modeling of transfer functions for stream flow dynamics is given.
Article
In this paper we extend previous work by ourselves and other researchers in the use of principal component analysis (PCA) for statistical process control in chemical processes. PCA has been used by several authors to develop techniques to monitor chemical processes and detect the presence of disturbances [1–5]. In past work, we have developed methods which not only detect disturbances, but isolate the sources of the disturbances [4]. The approach was based on static PCA models, T2 and Q charts [6], and a model bank of possible disturbances. In this paper we use a well-known ‘time lag shift’ method to include dynamic behavior in the PCA model. The proposed dynamic PCA model development procedure is desirable due to its simplicity of construction, and is not meant to replace the many well-known and more elegant procedures used in model identification. While dynamic linear model identification, and time lag shift are well known methods in model building, this is the first application we are aware of in the area of statistical process monitoring. Extensive testing on the Tennessee Eastman process simulation [7] demonstrates the effectiveness of the proposed methodology.
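The 'time lag shift' construction mentioned above is a one-liner worth making explicit (the helper name is mine): augment each observation with its previous rows, so that a static PCA on the result also captures auto- and cross-correlation.

```python
import numpy as np

def time_lag_shift(X, lags):
    """Augment each row of X with its `lags` previous rows.

    Returns an array of shape (n - lags, p * (lags + 1)); static PCA on
    this matrix models dynamic (lagged) correlation structure.
    """
    n, p = X.shape
    blocks = [X[lags - k : n - k] for k in range(lags + 1)]
    return np.hstack(blocks)

# Demo on a tiny 6 x 2 matrix with 2 lags
X = np.arange(12.0).reshape(6, 2)
lagged = time_lag_shift(X, 2)
```

Row t of the result is the concatenation [x_t, x_{t-1}, ..., x_{t-lags}], which is exactly what lets ordinary T² and Q charts built on the augmented matrix respond to dynamic disturbances.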
Article
Multivariate statistical methods for the analysis, monitoring and diagnosis of process operating performance are becoming more important because of the availability of on-line process computers which routinely collect measurements on large numbers of process variables. Traditional univariate control charts have been extended to multivariate quality control situations using the Hotelling T2 statistic. Recent approaches to multivariate statistical process control which utilize not only product quality data (Y), but also all of the available process variable data (X) are based on multivariate statistical projection methods (principal component analysis, (PCA), partial least squares, (PLS), multi-block PLS and multi-way PCA). An overview of these methods and their use in the statistical process control of multivariate continuous and batch processes is presented. Applications are provided on the analysis of historical data from the catalytic cracking section of a large petroleum refinery, on the monitoring and diagnosis of a continuous polymerization process and on the monitoring of an industrial batch process.
Article
A modification of the Partial Least-Squares (PLS) modelling procedure is presented which permits incorporation of a dynamic transformation into the standard algebraic form of this model. This makes it possible to use the resulting dynamic PLS model for control system design by employing precompensators and postcompensators constructed from the input and output loading matrices and basing the controller for the compensated plant on the dynamic inner relation of the modified PLS model.
Article
A method of identification of linear input-output models using canonical variate analysis (CVA) is developed for application to chemical processes. This approach yields both a process model and a nonparametric description of model uncertainty, utilizing CVA for selection of a state coordinate system that optimally relates past inputs and outputs to future outputs. Regression procedures are then used for estimation of the state-space model parameters, and the Akaike Information Criterion (AIC) is used to determine the model order. The primary computations involve singular value decompositions which are numerically stable and accurate. The effectiveness of the CVA approach is first evaluated with simulated chemical processes that exhibit most of the practical problems encountered by existing system identification methods: nonlinear dynamics, unknown model orders and time delays, nonminimum phase dynamics, partial stiffness (requiring two time-scale approaches when other identification methods are used), low input excitation in some frequency bands, and measurement and process noise. The CVA methodology is then applied to the identification of models for a pilot-scale distillation column.
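The core canonical-variate computation can be sketched as follows (illustrative only, with my own names and simulated data; this is the whitened cross-covariance SVD, not Larimore's full CVA/AIC procedure): stack past and future windows of the series, whiten with Cholesky factors, and take an SVD, whose singular values are the canonical correlations and whose leading left vectors define the state combinations of the past.

```python
import numpy as np

def cva_canonical(Y, lag, order):
    """Canonical-variate sketch relating stacked past to stacked future."""
    n = Y.shape[0]
    m = n - 2 * lag + 1
    past = np.hstack([Y[lag - 1 - k : lag - 1 - k + m] for k in range(lag)])
    fut = np.hstack([Y[lag + k : lag + k + m] for k in range(lag)])
    past = past - past.mean(axis=0)
    fut = fut - fut.mean(axis=0)
    Spp = past.T @ past / (m - 1)
    Sff = fut.T @ fut / (m - 1)
    Spf = past.T @ fut / (m - 1)
    Lp = np.linalg.cholesky(Spp)
    Lf = np.linalg.cholesky(Sff)
    H = np.linalg.solve(Lp, Spf) @ np.linalg.inv(Lf).T  # whitened cross-cov
    U, sv, _ = np.linalg.svd(H)
    J = np.linalg.solve(Lp.T, U[:, :order]).T           # state-extraction matrix
    return past @ J.T, sv

# Demo on a simulated bivariate VAR(1) (A is an arbitrary stable matrix)
rng = np.random.default_rng(7)
A = np.array([[0.7, 0.1], [0.0, 0.5]])
n = 500
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A @ Y[t - 1] + rng.normal(size=2)

states, corrs = cva_canonical(Y, lag=3, order=2)
```

The returned states are the "memory" of the past that best predicts the future; in the monitoring context of the main article, a T² statistic on these states is what is charted.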
Article
Recently a great deal of attention has been given to numerical algorithms for subspace state space system identification (N4SID). In this paper, we derive two new N4SID algorithms to identify mixed deterministic-stochastic systems. Both algorithms determine state sequences through the projection of input and output data. These state sequences are shown to be outputs of non-steady state Kalman filter banks. From these it is easy to determine the state space system matrices. The N4SID algorithms are always convergent (non-iterative) and numerically stable since they only make use of QR and Singular Value Decompositions. Both N4SID algorithms are similar, but the second one trades off accuracy for simplicity. These new algorithms are compared with existing subspace algorithms in theory and in practice.
Article
Multivariate statistical procedures for monitoring the progress of batch processes are developed. Multi-way partial least squares (MPLS) is used to extract the information from the process measurement variable trajectories that is more relevant to the final quality variables of the product. The only information needed is a historical database of past successful batches. New batches can be monitored through simple monitoring charts which are consistent with the philosophy of statistical process control. These charts monitor the batch operation and provide on-line predictions of the final product qualities. Approximate confidence intervals for the predictions from PLS models are developed. The approach is illustrated using a simulation study of a styrene-butadiene batch reactor.