Model Free Diagnosis of Pneumatic Systems using Machine Learning

The 9th International Fluid Power Conference, 9. IFK, March 24-26, 2014, Aachen, Germany
Wolfgang Ertel, Robin Lehmann∗∗, Ralf Medow∗∗, Matthias Finkbeiner∗∗, Andreas Meyer∗∗
Institute for Artificial Intelligence, Hochschule Ravensburg-Weingarten
http://iki.hs-weingarten.de
Festo AG & Co. KG, Esslingen Berkheim∗∗
http://www.festo.de
E-Mail: ertel@hs-weingarten.de
We address the task of model free fault detection in arbitrary pneumatic systems based on continuous air-flow
measurements and present a universal diagnostic module that treats the pneumatic system as a black box. This
module can be applied to arbitrarily complex systems for which no mathematical models exist. We use machine
learning algorithms for acquiring the diagnostic knowledge. The diagnostic module is trained on air-flow data
of the pneumatic system in normal operation using the one-class-learning algorithm nearest neighbour data
description (NNDD). On a real pneumatic system, all test data were classified correctly (zero error rate).
Keywords: Model free diagnosis, machine learning, pattern matching, pneumatic systems, airflow.
Target Audience: Condition Monitoring and Diagnosis, Energy Management, Simulation and Validation.
1 Introduction
High energy costs, efficiency requirements and awareness of climate change make energy efficiency a core task
for industrial businesses. In this context, monitoring and diagnosis of relevant parameters has high priority. Festo
is currently developing an energy efficiency module (E²M for short). The module (a prototype is shown in Figure 1)
can be attached to the air-supply pipe of an arbitrary pneumatic system. It continually monitors the volume
flow rate of the supply air and two pressure values. Based on these measurements it has to minimize the air
consumption with the goal of saving energy. The current version of the E²M switches off the system as soon as
the air-flow falls below a certain threshold. This simple threshold decision shall now be replaced by an intelligent
classifier that, based on the air-flow pattern, detects whether the behaviour of the system deviates from its normal
operation. In case of a deviation from the norm, an appropriate action is taken, such as notifying the human
operator.
Figure 1: The E²M, comprising a control unit, sensors for air-flow and pressure, two valves and a sound absorber.
A classical approach for developing a diagnostic algorithm would be to model the behaviour of the system, for
example by means of a simulation or with mathematical formulas. A different approach uses simple sensors, such
as mechanical switches for detecting the end positions of a pneumatic cylinder, in as many parts of the system as
possible. Unfortunately, mathematical modelling is typically impossible for complex systems, and monitoring of
individual parts is very expensive. Even for simple systems this requires a human effort of many person-months
or years.
As an important requirement, the improved E²M with the intelligent classifier must allow for easy integration
into an arbitrary pneumatic system. Thus, we pursue an inductive approach. The E²M "gets to know" the pneumatic
system with statistical methods or machine learning algorithms and as a consequence is able to classify its
states of operation into good and bad.¹ For this purpose it must be able to detect deviations of the actual air-flow
pattern from that of normal operation. In addition (for lack of a mathematical model) the required knowledge
for this binary classification task has to be determined from operation data of the system. For such a scenario,
machine learning techniques (machine learning being a sub-area of artificial intelligence) provide the appropriate
means /1/.
A particular challenge in this project was the following requirement from Festo: during the learning phase, when
the E²M acquires its knowledge about the system to be monitored, it should get access to states of the system
during normal operation as well as to faulty, i.e. abnormal, operation modes. It turned out, however, that this is
not feasible, and thus the E²M can only be trained with data from normal operation. The reasons for this lack of
faulty training data are:
• Collecting training data of faulty operation on a real pneumatic production system is typically too expensive
  because of costly downtime of the system for each error to be monitored.
• Pneumatic production systems are typically unique for a certain task in only one particular company, and
  no engineer has enough insight into the potentially infinite set of possible faults that might occur in such a
  system. Thus, selecting only a few faults ad hoc normally leads to a non-representative statistical distribution
  of errors that the E²M learns. This can lead to bad classification results. In such a scenario it is much
  better to train the diagnostic module only with data from normal operation.
Therefore, during the learning phase, the learning algorithm has no access to faulty operation modes of the system.
After learning, however, it has to distinguish between normal and faulty operation. As a solution to this problem we
apply a modification of the so-called one-class-learning (OCL) algorithm nearest neighbour data description
(NNDD) /2/.
The diagnostic module presented here only gets data from an air-flow sensor. This sensor monitors the total
air-flow through the system. Repeatedly, after a fixed time-interval, a fixed length time series of air-flow values
is used to calculate a vector of numeric features as input for the OCL classification task. Then a feature based
classifier is trained with OCL (e.g. during the installation of a new pneumatic system) to detect deviations of the
air-flow pattern from that of the normal operation.
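The blockwise processing described above can be sketched as follows. This is a minimal illustration, assuming non-overlapping blocks and a uniformly sampled series; the paper does not state whether consecutive blocks overlap, and the function name `split_into_blocks` is hypothetical.

```python
import numpy as np

def split_into_blocks(x, block_length):
    """Cut a 1-D series into consecutive, non-overlapping blocks of equal length.

    Samples at the end that do not fill a whole block are dropped
    (an assumption made for this sketch).
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_length
    return x[: n_blocks * block_length].reshape(n_blocks, block_length)

# Synthetic stand-in for an air-flow recording of 1000 samples
series = np.arange(1000)
blocks = split_into_blocks(series, 226)
```

Each row of `blocks` would then be turned into one feature vector.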
2 Feature based Classification
If the air-flow time series were a deterministic periodic function of time, classifying the measured time
series into normal and abnormal behaviour of the monitored system would be as easy as computing the norm of
the difference of two functions. However, this assumption is not valid. As can be seen in Figure 2, the air-flow
curve shows a certain pattern, but it is far from periodic. Thus, we have to use different techniques.
We decided to use classical machine learning algorithms to train a feature based classifier that has to decide
whether the observed air-flow pattern deviates from the normal pattern which was learned during an initial
operation of the system. In the first step, such a feature based classifier computes a vector of numeric features from
a finite interval of the air-flow time series. The features can be statistical parameters such as mean, standard
deviation etc. In the second step, the classifier uses the feature vector to output the class of the observed pattern.
As an extension, instead of a binary classification into good and bad patterns, the classifier can be trained to
output the type of fault in the observed system. Possible faults in a pneumatic system could be:
¹ Although the experiments presented here are restricted to this binary classification task, the method used can
easily be generalized to more than two classes.
Figure 2: Air-flow time series (two panels, air-flow [litres/min] over time [sec]) of a real production system in
normal fault free operation. Instead of a deterministic periodic behaviour, certain fuzzy patterns appear in the graphs.
• Leakage: Leakage in central parts of the system or in the air-supply pipe leads to permanently increased air-flow.
  Leaks in particular sub-circuits lead to temporary deviations in the air-flow pattern.
• Blocked cylinder.
• Wear of particular parts or of the whole system.
• Speed variations: The whole system or parts of it perform faster or slower than normal.
• ...
Please note, however, that classifying the fault type requires much more effort from the operator of
the system, who has to provide training data for all classes to be distinguished, i.e. for all possible faults of
the system. Even though a classification of faults into many categories is highly desirable, in this first study we
restrict our experiments to binary classification.
2.1 Model free diagnosis of industrial production systems
There exist numerous machine learning algorithms for training a classifier based on a file of training data. On
input of such a file, the classifier generates a function that can distinguish two or more classes /3/. Among the well
known supervised learning algorithms we find neural networks, decision tree learning, the pseudoinverse method
which is equivalent to linear least squares regression, nearest neighbour methods and support vector machines.
For all supervised learning algorithms, the training data need to be labeled. This means, a numeric class label has
to be attached to every feature vector. Thus, for fault diagnosis of a technical system, at every point in time in the
training data, the state of the system (good or faulty) has to be provided, either automatically or manually.
In industrial routine, training of the classifier may happen during initiation of the system or later on demand.
Capturing training data for the normal fault free operation is easy. However, as already mentioned in the introduction,
collecting training data of a system with errors is problematic.
Therefore we decided not to use classical two (or more) class classification algorithms. Rather, we applied
one-class-learning algorithms (Section 3), which require data of only one class during training. This solves our
problem. We can train the classifier with only data of the normal operation of the system. No data of faulty states
are required.
2.2 Features
The air-flow time series is a list of pairs of numbers (t_i, x_i), i = 1, 2, ..., with t_i a time value and x_i the
air-flow value at that time. The values of a short time interval in the curves in Figure 2 are:

time [sec]   air-flow [l/min]
...          ...
35017.846    694
35017.971    685
35018.111    685
35018.221    689
35018.346    700
35018.471    700
35018.612    709
...          ...
For a fixed time interval of such a time series and its derivative² we compute the following nine standard numeric
features:
minimum, 1st quartile, median, 3rd quartile, maximum, arithmetic mean, standard deviation, skewness,
kurtosis
Skewness is a figure for the deviation of the empirical density from an axially symmetric function. Kurtosis tells
us how peaked the density is, i.e. how close the density is to a delta function.
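A minimal sketch of how these statistical features could be computed for one block, using the sample values listed in this section. Several details are assumptions not stated in the paper: the population standard deviation, moment-based skewness and (non-excess) kurtosis, linear interpolation for the quartiles, and the helper names `nine_features` and `block_features`.

```python
import numpy as np

def nine_features(values):
    """Minimum, quartiles, maximum, mean, standard deviation, skewness, kurtosis."""
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    std = v.std()  # population standard deviation (assumption)
    centred = v - mean
    # Moment-based skewness and kurtosis; set to 0 for a constant block.
    skew = np.mean(centred**3) / std**3 if std > 0 else 0.0
    kurt = np.mean(centred**4) / std**4 if std > 0 else 0.0
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return np.array([v.min(), q1, med, q3, v.max(), mean, std, skew, kurt])

def block_features(t, x):
    """Nine features of the air-flow block followed by nine of its derivative."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    dx = np.diff(x) / np.diff(t)  # derivative from two adjacent samples
    return np.concatenate([nine_features(x), nine_features(dx)])

# The seven sample rows listed in this section
t = [35017.846, 35017.971, 35018.111, 35018.221, 35018.346, 35018.471, 35018.612]
x = [694, 685, 685, 689, 700, 700, 709]
f = block_features(t, x)
```

The result `f` holds the 18 statistical features of one block; the four periodogram features are added separately.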
Furthermore, we compute the periodogram /4/. Similar to the discrete Fourier transform of a discrete time series,
the periodogram is an estimator for the spectral density of the potentially infinitely long time series based on a
finite vector of values $x_1, \ldots, x_n$. The spectral density represents the power density of all frequencies
$\nu_j$ that occur in the time series. The periodogram $\hat{I}(\nu_j)$ is defined as

$$\hat{I}(\nu_j) = \frac{1}{n}\left|\sum_{t=1}^{n} e^{-2\pi i t \nu_j}\, x_t\right|^2 = \frac{1}{n}\left[\left(\sum_{t=1}^{n} x_t \cos(2\pi t \nu_j)\right)^2 + \left(\sum_{t=1}^{n} x_t \sin(2\pi t \nu_j)\right)^2\right] \tag{1}$$

for $\nu_j = j/n$ and $j = 0, \ldots, n-1$. Since the absolute time values are not relevant here, the formula can be
simplified to

$$I(j) = \hat{I}(j/n) = \frac{1}{n}\left[\left(\sum_{t=1}^{n} x_t \cos(2\pi t j/n)\right)^2 + \left(\sum_{t=1}^{n} x_t \sin(2\pi t j/n)\right)^2\right] \tag{2}$$

As the values $x_t$ are real, the periodogram is symmetric with respect to $j = (n-1)/2$. Therefore we need to
consider the values $I(0), \ldots, I(\lfloor (n-1)/2 \rfloor)$ only. With $n$ values per data block, at most $n/2$
frequencies may occur in the periodogram. If e.g. 400 values cover a time interval of 50 seconds, the maximal
frequency is 200 / (50 sec) = 4 Hertz.
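The periodogram of Equation (2) can be sketched in two ways: via the fast Fourier transform and literally as the cosine and sine sums. This is an illustration on synthetic data, assuming uniform sampling; the function names and the test signal are hypothetical and not part of the paper.

```python
import numpy as np

def periodogram(x):
    """I(j) for j = 0..floor((n-1)/2), computed with the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # |FFT|^2 equals the squared cosine sum plus the squared sine sum. Starting
    # the time index at 0 instead of 1 changes only the phase, not the magnitude.
    power = np.abs(np.fft.fft(x)) ** 2 / n
    return power[: (n - 1) // 2 + 1]

def periodogram_direct(x):
    """Same quantity, written as the cosine and sine sums of Equation (2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1)
    out = []
    for j in range((n - 1) // 2 + 1):
        c = np.sum(x * np.cos(2 * np.pi * t * j / n))
        s = np.sum(x * np.sin(2 * np.pi * t * j / n))
        out.append((c**2 + s**2) / n)
    return np.array(out)

# Synthetic block: 400 samples over 50 s containing a 1 Hz oscillation
n, duration = 400, 50.0
time = np.arange(n) * duration / n
x = 700 + 50 * np.sin(2 * np.pi * 1.0 * time)
I = periodogram(x)
peak_j = int(np.argmax(I[1:])) + 1   # skip j = 0, which only carries the mean
peak_hz = peak_j / duration          # index j corresponds to j / duration Hertz
```

For this block the periodogram has 200 values, matching the count used in the text, and its largest non-constant component sits at the 1 Hz oscillation that was put into the signal.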
Theoretically the 200 values of the periodogram could be used as additional features. However, this does not
make much sense, since many of these features are either correlated or zero. Thus, we use principal component
analysis (PCA) to compress these 200 values to four³ principal components (/1, 5/). These four principal
components are then used directly as features. Together with the two times nine statistical features this yields a
set of 22 features.
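The compression step can be sketched with a plain singular value decomposition. This is only an illustration on synthetic data: the matrix `P` stands in for a collection of periodograms (one per row), and the helper names `fit_pca` and `project` are hypothetical.

```python
import numpy as np

def fit_pca(P, n_components=4):
    """Fit PCA on the rows of P (one periodogram per row)."""
    P = np.asarray(P, dtype=float)
    mean = P.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(P - mean, full_matrices=False)
    eigenvalues = s**2 / (len(P) - 1)  # eigenvalues of the covariance matrix
    return mean, vt[:n_components], eigenvalues

def project(P, mean, components):
    """Map periodograms to their principal-component scores."""
    return (np.asarray(P, dtype=float) - mean) @ components.T

rng = np.random.default_rng(0)
# Synthetic stand-in: 60 periodograms of 200 values built from 4 latent factors
latent = rng.normal(size=(60, 4))
mixing = rng.normal(size=(4, 200))
P = latent @ mixing + 0.001 * rng.normal(size=(60, 200))

mean, components, eigenvalues = fit_pca(P)
scores = project(P, mean, components)  # four features per block
explained = eigenvalues[:4].sum() / eigenvalues.sum()
```

Inspecting `eigenvalues` is one way to judge how many components carry most of the variance.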
² The derivative is calculated numerically from two adjacent values.
³ Why do we use four principal components? This figure was determined with PCA by comparing the eigenvalues of the
covariance matrix of the periodograms of an air-flow time series of a typical industrial pneumatic system. It turned
out that all eigenvalues except the first four can be neglected.
3 One-Class-Learning
As already mentioned, in many industrial applications for training a diagnostic classifier no negative but abundant
positive data are available. Thus, we need a learning algorithm that can be trained on only positive data from the
normal operation of the pneumatic system. After learning, however, the classifier must be able to distinguish
between normal operation and error states of the system.
One-Class-Learning as a sub-area of supervised learning offers a number of algorithms as a solution for this task.
In statistics, these and related techniques are known as outlier detection /6/. We selected a modified version of
nearest neighbour data description (NNDD) due to its simplicity and high accuracy /2/.
NNDD is a lazy-learning technique. It is called "lazy" because during learning the feature vectors are only
normalized and stored. The actual nearest-neighbour algorithm is not used until new feature vectors have to be
classified. The normalization of all features ensures equal a priori weight for all features. Without normalization,
a feature with a range of [−10000, +10000] dominates the classification compared to one with a range of [0, 10⁻⁴].
Therefore all features are first scaled linearly to the interval [0, 1].
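A minimal sketch of this linear scaling, assuming the minimum and range are taken from the training data and then reused for new vectors (the paper does not spell this out). The helper names are hypothetical.

```python
import numpy as np

def fit_minmax(X_train):
    """Per-feature minimum and range of the training data."""
    X_train = np.asarray(X_train, dtype=float)
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant feature: avoid division by zero
    return lo, span

def apply_minmax(X, lo, span):
    """Scale feature vectors with the parameters learned from the training data."""
    return (np.asarray(X, dtype=float) - lo) / span

# Two features with very different ranges, as in the example in the text
X_train = np.array([[-10000.0, 0.0], [0.0, 5e-5], [10000.0, 1e-4]])
lo, span = fit_minmax(X_train)
X_scaled = apply_minmax(X_train, lo, span)
```

With this choice, new vectors may fall slightly outside [0, 1] if they exceed the range seen during training.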
3.1 Nearest Neighbour Data Description (NNDD)
Given a training set $X = (x_1, \ldots, x_n)$ of $n$ feature vectors, a new vector $q$ is accepted if

$$D(q, \mathrm{NN}(q)) \le \gamma \bar{D}, \tag{3}$$

that is, if its distance to a nearest neighbour is not greater than a threshold $\gamma \bar{D}$. This is a
modification of the NNDD algorithm presented in /2/. $D(x, y)$ is a metric. We use the Euclidean norm and define

$$D(x, y) := \|x - y\|_2 .$$

The function

$$\mathrm{NN}(q) = \operatorname*{argmin}_{x \in X} \{ D(q, x) \}$$

determines the nearest neighbour of $q$ in $X$ and

$$\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D(x_i, \mathrm{NN}(x_i))$$

is the distance from each point to its nearest neighbour averaged over all points in $X$ (for a training vector
$x_i$, the nearest neighbour is understood as the nearest other vector in $X$, since otherwise this distance would
be zero). In Equation 3 we could use γ = 1, but with suboptimal classification results. We determine γ via cross
validation (Section 3.2) such that the error rate of the NNDD classifier is minimized. In Figure 3, for a set of
two dimensional data points (black), the areas of positive classification are colored in red.
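The acceptance rule of Equation 3 can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the class name `NNDD` and its methods are hypothetical, and a training point is excluded from its own neighbour search when computing the average nearest-neighbour distance.

```python
import numpy as np

def pairwise_distances(A, B):
    """Euclidean distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=2))

class NNDD:
    def __init__(self, gamma=1.0):
        self.gamma = gamma

    def fit(self, X):
        # "Lazy" learning: the training vectors are only stored.
        self.X = np.asarray(X, dtype=float)
        d = pairwise_distances(self.X, self.X)
        np.fill_diagonal(d, np.inf)        # a point is not its own neighbour
        self.d_bar = d.min(axis=1).mean()  # average nearest-neighbour distance
        return self

    def nn_distance(self, Q):
        Q = np.atleast_2d(np.asarray(Q, dtype=float))
        return pairwise_distances(Q, self.X).min(axis=1)

    def accept(self, Q):
        """True where the distance to the nearest neighbour is at most gamma * d_bar."""
        return self.nn_distance(Q) <= self.gamma * self.d_bar

# Four training points on the unit square: every nearest-neighbour distance is 1
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
clf = NNDD(gamma=0.5).fit(X)
result = clf.accept([[0.1, 0.0], [0.5, 0.5], [3.0, 3.0]])
```

Only the first query lies within 0.5 of a training point, so it is the only one accepted.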
3.2 Cross validation
Cross validation is used to vary the parameter γ (or some other parameter of the learning algorithm), train the
algorithm on a set X of training data and then test the trained classifier on an independent set of test data. Finally,
γ is set to the value that minimizes the error on the test data. The k-fold cross validation algorithm works as
follows:
1. Partition the set of data vectors into k blocks of about equal size, X = X_1 ∪ ... ∪ X_k.
2. For all parameter values in a certain set (e.g. γ = 0 ... 100):
      For i = 1 to k:
         Train on X \ X_i, evaluate the error on X_i.
      Compute the average error over all k runs.
3. Select the smallest value γ = γ1 with minimum error.
Figure 3: NNDD with γ = 0.5. On a set of 16 two dimensional data points (black dots) we applied NNDD and classified
10000 random points. The red points belong to the positive class (the class of the training data) and the blue ones
to the negative class.
For each value of γ, the data set X is first split into k blocks of equal size. Then the algorithm is repeatedly trained
on all k subsets of k − 1 blocks and tested on the remaining block. The error is averaged over the k subsets, and
γ = γ1 with minimal error is selected.
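The selection of γ could be sketched as follows. Since only normal data are available, this sketch assumes that the cross validation error is the fraction of held-out normal vectors that are rejected; the paper does not define the error used at this step. The function names, the grid of γ values and the synthetic data are hypothetical.

```python
import numpy as np

def nn_dist(Q, X, exclude_self=False):
    """Distance from each row of Q to its nearest neighbour among the rows of X."""
    d = np.sqrt(((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    if exclude_self:
        np.fill_diagonal(d, np.inf)  # only valid when Q and X are the same set
    return d.min(axis=1)

def cv_error(X, gamma, k, rng):
    """Average fraction of held-out normal vectors that are rejected."""
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i in range(k):
        test = X[folds[i]]
        train = X[np.concatenate([folds[j] for j in range(k) if j != i])]
        d_bar = nn_dist(train, train, exclude_self=True).mean()
        rejected = nn_dist(test, train) > gamma * d_bar
        errors.append(rejected.mean())
    return float(np.mean(errors))

def select_gamma(X, gammas, k=5, seed=0):
    X = np.asarray(X, dtype=float)
    # The same seed gives the same partition for every gamma, so errors are comparable.
    errs = [cv_error(X, g, k, np.random.default_rng(seed)) for g in gammas]
    best = min(errs)
    gamma_1 = min(g for g, e in zip(gammas, errs) if e == best)
    return gamma_1, errs

rng = np.random.default_rng(1)
X = rng.uniform(size=(50, 2))          # synthetic stand-in for normalized features
gammas = np.arange(0.5, 10.01, 0.5)
gamma_1, errs = select_gamma(X, gammas)
```

Because the partition is fixed, the error can only decrease or stay equal as γ grows, which is why the smallest γ reaching the minimum is the informative choice.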
4 Experiments
In order to evaluate our OCL algorithm, we need a pneumatic system which we can run in normal mode and in
different error modes with defined errors. This was not possible on real production systems. Therefore we used
the laboratory system shown in Figure 4 which can work in two different modes. In the fill-mode the rotatable
arm grabs a cup from the hole on the left, fills it with water, then closes it with a lid and puts it on the carousel to
the right. This is being repeated until the carousel is filled with four cups. In the empty-mode repeatedly all cups
are taken from the carousel, drained into the hole (rear middle) and then disposed into the hole to the left.
Figure 4: The laboratory system. A video of the system in operation can be found at
http://www.hs-weingarten.de/~ertel/labdev3440917.avi.
On this system we recorded air-flow time series in fill-mode and in empty-mode, both in normal operation and with a
number of different faults. Figure 5 shows air-flow measurements in normal mode (top) and with a small
leak (bottom). The figure shows significant differences between two recordings in normal operation
(top left versus top right), i.e. the operation is not deterministic. It also shows the differences between normal
operation (top) and operation with a small leak (bottom). For the human observer it is not easy to distinguish
normal operation from a small leak based on these air-flow patterns.

Nr.  Class                          Name
2    air-flow                       1st quartile
3    air-flow                       median
4    air-flow                       3rd quartile
5    air-flow                       maximum
6    air-flow                       arithmetic mean
7    air-flow                       standard deviation
8    air-flow                       skewness
9    air-flow                       kurtosis
15   first derivative of air-flow   arithmetic mean
16   first derivative of air-flow   standard deviation
17   first derivative of air-flow   skewness
18   first derivative of air-flow   kurtosis
19   PCA of periodogram             1st principal component
20   PCA of periodogram             2nd principal component
21   PCA of periodogram             3rd principal component
22   PCA of periodogram             4th principal component
Table 1: The 16 features used in the experiments.
Figure 5: Air-flow time series in fill-mode in normal operation (top) and with a small leak in module 1 (bottom).
Please note the differences between two recordings in normal operation (top) and the differences between normal
operation and operation with a small leak (top versus bottom).
4.1 Evaluation method
For training the OCL algorithm we used data from normal operation only (Figure 5 top). For our experiments we
used the subset of 16 features shown in Table 1 out of the 22 features defined above. We removed some of the
features because they turned out to be redundant.
The block-length of a time interval for computing a feature vector was set to approximately one period of the
machine's operation. This block-length is a parameter that has to be set manually. Our experiments show that the
results are not very sensitive to this parameter as long as it is not too small. As a heuristic rule, the block length
should ideally be the length of one period. If there is no obvious periodicity, the block length should be longer
than about three times a period. A block-length that is too short can lead to bad statistics.
In the following experiments we set the block-length to 226 time-steps (28.25 seconds) in fill-mode and to 184
time-steps (23 seconds) in empty-mode. We collected fault free data in fill-mode and in empty-mode and data
with seven types of fault implemented in the system. These faults were produced by small or big leaks in two
different modules of the system or by placing a throttle valve in two different modules, causing a delay of the
system operation. In some experiments we added combinations of these faults to the system.
To evaluate the classification accuracy we used the error measures

$$\mathrm{FP} = \text{relative number of false positive classifications} = \frac{|\,\text{negative data with positive classification}\,|}{|\,\text{negative data}\,|}$$

and

$$\mathrm{FN} = \text{relative number of false negative classifications} = \frac{|\,\text{positive data with negative classification}\,|}{|\,\text{positive data}\,|}$$

The obvious goal is to achieve FN = 0 and FP = 0. Since a false negative classification may lead to a needless
shutdown of the system, this type of error is typically much more expensive for the operator of the system than a
false positive classification, i.e. an overlooked fault.⁴
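The two error measures can be computed directly from the classifier decisions. In this small sketch, "positive" means accepted as normal operation, as in the definitions above; the function name and the example decisions are hypothetical.

```python
import numpy as np

def error_rates(accepted_negative, accepted_positive):
    """FP and FN from boolean decisions (True = classified as normal).

    accepted_negative: decisions for data recorded with faults (negative data)
    accepted_positive: decisions for data from normal operation (positive data)
    """
    accepted_negative = np.asarray(accepted_negative, dtype=bool)
    accepted_positive = np.asarray(accepted_positive, dtype=bool)
    fp = accepted_negative.mean()     # faulty blocks wrongly accepted
    fn = (~accepted_positive).mean()  # normal blocks wrongly rejected
    return float(fp), float(fn)

# One of four faulty blocks accepted, one of five normal blocks rejected
fp, fn = error_rates([False, False, True, False], [True, True, True, True, False])
```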
4.2 Results
To judge the quality of the classifier we implemented various faults in the laboratory system and recorded the
air-flow time series. The various recorded files together with the number of blocks are listed in Table 2. From
each block we then compute one feature vector.
fill-mode empty-mode
data type n[blocks] n[blocks]
fault free 50 46
leak 1 16 17
leak 2 16 17
small leak module 1 16 17
delay module 1 21 22
small leak module 2 16 17
delay module 2 19 20
small leak module 1, delay module 2 18
Table 2: The data sets with different types of faults and their size n measured in blocks. For each block a feature
vector was computed.
First we trained on the fault free data and minimized the threshold γ = γ1 by cross validation such that all
training data just barely get a positive classification. Then we performed tests on the data with different types of
error. The surprisingly good results are listed in Table 3. All positive (training data) and all negative data (test
data) were correctly classified, that is FP = FN = 0. To judge the safety of the classification, we increased the
threshold γ until the first negative data point was classified as positive (false positive). This value γ2 is then
related to γ1: the bigger γ2/γ1, the safer the classification. The ratio γ2/γ1 corresponds to the width of the margin
between the sets of positive and negative data in 16-dimensional feature space. In all tests except those with a small
leak in module 1 this ratio is above two. This means that NNDD classifies very safely.
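The ratio γ2/γ1 can be illustrated with a small sketch. As a simplification, γ1 is taken here as the smallest threshold that accepts every training vector when each is compared against the other training vectors, rather than determined by cross validation as in the experiments; the function names and the toy data are hypothetical.

```python
import numpy as np

def nn_dist(Q, X, exclude_self=False):
    """Distance from each row of Q to its nearest neighbour among the rows of X."""
    d = np.sqrt(((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    if exclude_self:
        np.fill_diagonal(d, np.inf)  # only valid when Q and X are the same set
    return d.min(axis=1)

def safety_ratio(X_normal, X_fault):
    own = nn_dist(X_normal, X_normal, exclude_self=True)
    d_bar = own.mean()
    gamma_1 = own.max() / d_bar                         # all normal points just accepted
    gamma_2 = nn_dist(X_fault, X_normal).min() / d_bar  # first faulty point accepted
    return gamma_1, gamma_2, gamma_2 / gamma_1

X_normal = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_fault = np.array([[4.0, 1.0], [1.0, 6.0]])
g1, g2, ratio = safety_ratio(X_normal, X_fault)
```

In this toy example the closest faulty point is three average nearest-neighbour distances away from the normal data, so the threshold could be tripled before a false positive occurs.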
To judge the classification accuracy, please note the tiny differences between the air-flow time series in normal
operation and the time series with small leak in module 1 in Figure 5 (top two graphs versus the two graphs on
the bottom). There is almost no difference in the graphs, but NNDD can separate the classes.
fill-mode empty-mode
FP FN γ2/γ1 FP FN γ2/γ1
fault free 0 γ1 = 1.6 0 γ1 = 1.7
leak 1 0 0 7.25 0 0 10.0
leak 2 0 0 8.38 0 0 8.24
small leak module 1 0 0 1.25 0 0 3.05
delay module 1 0 0 3.38 0 0 2.88
small leak module 2 0 0 3.31 0 0 4.65
delay module 2 0 0 7.19 0 0 3.24
small leak module 1 and delay module 2 0 0 2.38
Table 3: Results of the NNDD classifier trained on the fault free data and tested on data generated with the
different faults in the system.

In order to make the learning task harder, we combined fault free data with data from runs with various faults,
trained the classifier on them and then tested the accuracy on data with different faults. We wanted to know
whether the classifier is able, for example, to discriminate between two different leaks. The results are listed in
Table 4. The data used for training and test are described in the first and second column. Again NNDD can
perfectly separate all data sets with zero error. According to the high ratio γ2/γ1 in row two of the table, the
task of separating leaks 1 and 2 seems to be quite easy for NNDD. The even more complex task in row three
seems to be quite hard, but can still be solved with zero error.
⁴ This holds of course only for faults that do not damage the system.
training data (positive class)                  test data (negative class)                              FP   FN   γ2/γ1
fill-mode fault free, empty-mode fault free     fill-mode and empty-mode, both with leak 1 and leak 2   0    0    3.5
fill-mode leak 1, empty-mode leak 1             fill-mode leak 2, empty-mode leak 2                     0    0    6.32
fault free, leak 1, fill-mode and empty-mode    leak 2, fill-mode and empty-mode                        0    0    1.21
Table 4: Results of the NNDD classifier on the laboratory system. Training on various mixed data sets, test on
different mixed data sets.
5 Summary and Conclusion
This interdisciplinary research between artificial intelligence and mechanical engineering has shown a promising
way towards model free adaptive diagnosis of arbitrary pneumatic systems. But this is not all. It can be applied
to many other systems and machines. For example for the diagnosis of electrical systems, the air-flow time series
simply has to be replaced by the electrical current.
Many questions have been answered. Others are still open, and there are new questions and work to do, such as:
Experiments on real production systems must be performed. The presented algorithms have to be implemented
and tested on an embedded computer in the E²M module. Other algorithms have to be compared with NNDD.
And the feature set has to be optimized via cross validation.
Finally the authors want to thank Festo and the state of Baden-Württemberg for funding a sabbatical of the first
author during which this work was done.
Nomenclature
Variable   Description                          Unit
X          training set                         1
x          feature vector                       1
q          feature vector (query)               1
D          distance metric                      1
D̄          average nearest neighbour distance   1
γ          classification threshold             1
γ1         minimal classification threshold     1
γ2         maximal classification threshold     1
References
/1/ C.M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
/2/ D.M.J. Tax. One-class classification. PhD thesis, Delft University of Technology, 2001.
http://resolver.tudelft.nl/uuid:e588fc3e-7503-4013-9b6a-73c7b7f6b173.
/3/ W. Ertel. Introduction to Artificial Intelligence. Springer-Verlag, London, 2011.
/4/ P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer, 2009.
/5/ W. Ertel. Advanced mathematics for engineers. Lecture notes, Hochschule Ravensburg-Weingarten, 2013.
http://www.hs-weingarten.de/~ertel/vorlesungen/mae/matheng-skript-1314.pdf.
/6/ V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intelligence Review,
22(2):85–126, 2004.