
The 9th International Fluid Power Conference, 9. IFK, March 24-26, 2014, Aachen, Germany

Model Free Diagnosis of Pneumatic Systems using Machine Learning

Wolfgang Ertel∗, Robin Lehmann∗,∗∗, Ralf Medow∗∗, Matthias Finkbeiner∗∗ , Andreas Meyer∗∗

Institute for Artiﬁcial Intelligence, Hochschule Ravensburg-Weingarten∗

http://iki.hs-weingarten.de

Festo AG & Co. KG, Esslingen Berkheim∗∗

http://www.festo.de

E-Mail: ertel@hs-weingarten.de

We address the task of model free fault detection in arbitrary pneumatic systems based on continuous air-flow

measurements and present a universal diagnostic module that treats the pneumatic system as a black box. This

module can be applied to arbitrarily complex systems for which no mathematical models exist. We use machine

learning algorithms for acquiring the diagnostic knowledge. The diagnostic module is trained on air-ﬂow data

of the pneumatic system in normal operation using the one-class-learning algorithm nearest neighbour data description

(NNDD). We achieve excellent classification results with zero error rate on a real pneumatic system.

Keywords: Model free diagnosis, machine learning, pattern matching, pneumatic systems, airﬂow.

Target Audience: Condition Monitoring and Diagnosis, Energy Management, Simulation and Validation.

1 Introduction

High energy costs, efﬁciency requirements and awareness of climate change make energy efﬁciency a core task

for industrial businesses. In this context, monitoring and diagnosis of relevant parameters has high priority. Festo

is currently developing an energy efficiency module (E²M for short). The module (a prototype is shown in Figure 1)

can be attached to the air-supply pipe of an arbitrary pneumatic system. It continually monitors the volume

ﬂow rate of the supply air and two pressure values. Based on these measurements it has to minimize the air

consumption with the goal of saving energy. The current version of the E²M switches off the system as soon as

the air-ﬂow falls below a certain threshold. This simple threshold decision shall now be replaced by an intelligent

classiﬁer that, based on the air-ﬂow pattern, detects whether the behaviour of the system deviates from its normal

operation. In case of a deviation from the norm, an appropriate action is taken, e.g. notification of the human operator.

Figure 1: The E²M, comprising a control unit, sensors for air-flow and pressure, two valves and a sound absorber.

A classical approach for developing a diagnostic algorithm would be to model the behaviour of the system, for

example by means of a simulation or with mathematical formulas. A different approach uses simple sensors such


as mechanical switches for detecting the end positions of a pneumatic cylinder in as many parts of the system as

possible. Unfortunately, mathematical modelling is typically impossible for complex systems, and monitoring of

individual parts is very expensive. Even for simple systems this requires a human effort of many person-months

or years.

As an important requirement, the improved E²M with the intelligent classifier must allow for easy integration

in an arbitrary pneumatic system. Thus, we pursue an inductive approach. The E²M "gets to know" the pneumatic

system with statistical methods or machine learning algorithms and as a consequence is able to classify its

states of operation into good and bad.¹ For this purpose it must be able to detect deviations of the actual air-flow

pattern from that of normal operation. In addition (for lack of a mathematical model) the required knowledge

for this binary classiﬁcation task has to be determined from operation data of the system. For such a scenario

machine learning (a sub-area of artiﬁcial intelligence) techniques provide the appropriate means /1/.

A particular challenge in this project was the following requirement from Festo. During the learning phase, when

the E²M acquires its knowledge about the system to be monitored, it shall get access to states of the system

during normal operation as well as to faulty, i.e. abnormal, operation modes. It turned out however, that this is

not feasible and thus the E²M can only be trained with data from normal operation. The reasons for this lack of

faulty training data are:

• Collecting training data of faulty operation on a real pneumatic production system is typically too expensive

because of costly downtime of the system for each error to be monitored.

• Pneumatic production systems are typically unique for a certain task in only one particular company and

no engineer has enough insight into the potentially inﬁnite set of possible faults that might occur in such a

system. Thus, selecting only a few faults ad hoc normally leads to a non-representative statistical distribution

of errors that the E²M learns. This can lead to bad classification results. In such a scenario it is much

better to only train the diagnostic module with data from the normal operation.

Therefore during the learning phase, the learning algorithm has no access to faulty operation modes of the system.

But after learning, it has to distinguish between normal and faulty operation. As a solution for this problem we

will apply a modification of the so-called one-class-learning (OCL) algorithm nearest neighbour data description

(NNDD) /2/.

The diagnostic module presented here only gets data from an air-ﬂow sensor. This sensor monitors the total

air-ﬂow through the system. Repeatedly, after a ﬁxed time-interval, a ﬁxed length time series of air-ﬂow values

is used to calculate a vector of numeric features as input for the OCL classiﬁcation task. Then a feature based

classiﬁer is trained with OCL (e.g. during the installation of a new pneumatic system) to detect deviations of the

air-ﬂow pattern from that of the normal operation.
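The windowing step described above, repeatedly cutting a fixed-length block out of the air-flow stream before feature extraction, can be sketched in a few lines. This is a minimal illustration under our own assumptions (non-overlapping windows, illustrative function name); the paper does not give code:

```python
def blocks(series, block_len):
    """Cut the measured air-flow series into consecutive, non-overlapping
    blocks of fixed length; a possibly incomplete final block is dropped."""
    return [series[i:i + block_len]
            for i in range(0, len(series) - block_len + 1, block_len)]
```

Each returned block is then mapped to one feature vector for the classifier.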

2 Feature based Classiﬁcation

If the air-flow time series were a deterministic periodic function of time, classification of the measured time

series into normal and abnormal behaviour of the monitored system would be as easy as computing the norm of

the difference of two functions. However, this assumption is not valid. As can be seen in Figure 2, the air-flow

curve shows a certain pattern, but it is far from periodic. Thus, we have to use different techniques.

We decided to use classical machine learning algorithms to train a feature based classiﬁer that has to decide

whether the observed air-ﬂow pattern deviates from the normal pattern which was trained during an initial oper-

ation of the system. In the ﬁrst step such a feature based classiﬁer computes a vector of numeric features from

a finite interval of the air-flow time series. The features can be statistical parameters such as mean, standard

deviation, etc. In the second step the classifier uses the feature vector to output the class of the observed pattern.

As an extension, instead of a binary classiﬁcation into good and bad patterns, the classiﬁer can be trained to

output the type of fault in the observed system. Possible faults in a pneumatic system could be:

¹ Although the experiments presented here are restricted to this binary classification task, the method used can easily be generalized to more than two classes.


[Figure 2: two plots of airflow [litres/min] over time [sec], one for the interval around 35000–35150 s and one around 45100–45250 s.]

Figure 2: Air-flow time series of a real production system in normal, fault free operation. Instead of a deterministic periodic behaviour, certain fuzzy patterns appear in the graphs.

• Leakage: Leakage in central parts of the system or in the air-supply pipe leads to permanently increased air-flow. Leaks in particular sub-circuits lead to temporary deviations in the air-flow pattern.

• Blocked cylinder.

• Wearing of particular parts or the whole system.

• Speed variations: The whole system or parts of it perform faster or slower than normal.

• ...

Please note, however, that such a classiﬁcation of the fault type causes a much higher effort for the operator of

the system because he has to provide training data for all classes to be distinguished, i.e. for all possible faults of

the system. Even though a classiﬁcation of faults into many categories is highly desirable, in this ﬁrst study, we

restrict our experiments to binary classiﬁcation.

2.1 Model free diagnosis of industrial production systems

There exist numerous machine learning algorithms for training a classiﬁer based on a ﬁle of training data. On

input of such a ﬁle, the classiﬁer generates a function that can distinguish two or more classes /3/. Among the well

known supervised learning algorithms we ﬁnd neural networks, decision tree learning, the pseudoinverse method

which is equivalent to linear least squares regression, nearest neighbour methods and support vector machines.

For all supervised learning algorithms, the training data need to be labeled. This means, a numeric class label has

to be attached to every feature vector. Thus, for fault diagnosis of a technical system, at every point in time in the

training data, the state of the system (good or faulty) has to be provided, either automatically or manually.

In industrial routine, training of the classiﬁer may happen during initiation of the system or later on demand. Cap-

turing training data for the normal fault free operation is easy. However, as already mentioned in the introduction,

collection of training data of a system with errors is problematic.

Therefore we decided not to use classical two (or more) class classification algorithms. Rather, we applied one-

class-learning algorithms (Section 3), which require data of only one class during training. This solves our

problem. We can train the classiﬁer with only data of the normal operation of the system. No data of faulty states

are required.


2.2 Features

The air-flow time series is a list of pairs of numbers (t_i, x_i), i = 1, ..., with t_i a time value and x_i the air-flow value at

that time. The values of a short time interval in the curves in Figure 2 are:

time [sec]    air-flow [l/min]
   ...              ...
35017.846        694
35017.971        685
35018.111        685
35018.221        689
35018.346        700
35018.471        700
35018.612        709
   ...              ...

For a fixed time interval of such a time series and its derivative² we compute the following nine numeric standard

features:

minimum, 1st quartile, median, 3rd quartile, maximum, arithmetic mean, standard deviation, skewness, kurtosis

Skewness measures the deviation of the empirical density from an axially symmetric function. Kurtosis tells

us how peaked the density is, i.e. how close the density is to a delta function.
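The nine features listed above can be computed in a few lines. The sketch below uses population formulas for standard deviation, skewness and kurtosis, and linear interpolation for the quartiles; the paper does not state which conventions were used, so these choices are our assumptions:

```python
import math

def statistical_features(x):
    """The nine standard features for one block of values: minimum,
    1st quartile, median, 3rd quartile, maximum, arithmetic mean,
    standard deviation, skewness and kurtosis."""
    n = len(x)
    s = sorted(x)

    def quantile(p):
        # linear interpolation between the two nearest order statistics
        idx = p * (n - 1)
        lo = int(idx)
        hi = min(lo + 1, n - 1)
        return s[lo] + (idx - lo) * (s[hi] - s[lo])

    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    skew = sum((v - mean) ** 3 for v in x) / (n * sd ** 3) if sd > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in x) / (n * sd ** 4) if sd > 0 else 0.0
    return [s[0], quantile(0.25), quantile(0.5), quantile(0.75), s[-1],
            mean, sd, skew, kurt]
```

Applied once to the air-flow block and once to its numerical derivative, this yields the 2 × 9 statistical features.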

Furthermore, we compute the periodogram /4/. Similar to the discrete Fourier transform of a discrete time series,

the periodogram is an estimator for the spectral density of the potentially infinitely long time series, based on a

finite vector of values x_1, ..., x_n. The spectral density represents the power density of all frequencies ν_j that

occur in the time series. The periodogram Î(ν_j) is defined as

\[
\hat{I}(\nu_j) = \frac{1}{n}\left|\sum_{t=1}^{n} e^{2\pi i t \nu_j} x_t\right|^2
= \frac{1}{n}\left[\left(\sum_{t=1}^{n} x_t \cos(2\pi t \nu_j)\right)^2
+ \left(\sum_{t=1}^{n} x_t \sin(2\pi t \nu_j)\right)^2\right] \qquad (1)
\]

for ν_j = j/n and j = 0, ..., n − 1. Since the absolute time values are not relevant here, the formula can be

simplified to

\[
I(j) = \hat{I}(j/n) = \frac{1}{n}\left[\left(\sum_{t=1}^{n} x_t \cos(2\pi t j/n)\right)^2
+ \left(\sum_{t=1}^{n} x_t \sin(2\pi t j/n)\right)^2\right] \qquad (2)
\]

As the values x_t are real, the periodogram is symmetric with respect to j = (n−1)/2. Therefore we need to

consider only the values I(0), ..., I(⌊(n−1)/2⌋). With n values per data block, at most n/2 frequencies may

occur in the periodogram. If e.g. 400 values cover a time interval of 50 seconds, the maximal frequency is

200 / 50 sec = 4 Hertz.
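Equation 2 translates directly into code. The sketch below computes the non-redundant half of the periodogram; it is a naive O(n²) implementation for illustration only (in practice an FFT-based routine would be used):

```python
import math

def periodogram(x):
    """Periodogram I(j), j = 0 .. floor((n-1)/2), of a real-valued series
    x_1, ..., x_n, following Equation 2. The remaining values are redundant
    because the periodogram of a real series is symmetric."""
    n = len(x)
    out = []
    for j in range((n - 1) // 2 + 1):
        c = sum(x[t - 1] * math.cos(2 * math.pi * t * j / n)
                for t in range(1, n + 1))
        s = sum(x[t - 1] * math.sin(2 * math.pi * t * j / n)
                for t in range(1, n + 1))
        out.append((c * c + s * s) / n)
    return out
```

A pure cosine of integer frequency j₀ produces a single sharp peak at index j₀, which is a convenient sanity check.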

Theoretically the 200 values of the periodogram could be used as additional features. However, this does not

make much sense, since many of the features are either correlated or zero. Thus, we use principal component

analysis (PCA) to compress these 200 values to the four³ principal components (/1, 5/). These four principal

components are then directly used as features. Together with the two times nine statistical features this yields a

set of 22 features.
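The PCA compression step can be illustrated with a minimal pure-Python sketch using power iteration with deflation on the covariance matrix. This is our own illustrative reconstruction, not the paper's implementation; in practice a numerical library would be used:

```python
import math
import random

def pca_components(data, k, iters=100):
    """Top-k principal components of `data` (rows of equal length) via
    power iteration with deflation."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - mean[j] for j in range(d)] for row in data]
    # covariance matrix C = (1/n) X^T X
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / n
          for b in range(d)] for a in range(d)]
    comps = []
    for _ in range(k):
        v = [random.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(u * u for u in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [u / norm for u in w]
        comps.append(v)
        # deflation: remove the found component's contribution from C
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps
```

Projecting a periodogram onto the first four components found this way yields the four PCA features.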

² The derivative is calculated numerically from two adjacent values.

³ Why do we use four principal components? This figure was determined with PCA by comparing the eigenvalues of the covariance matrix of the periodograms of an air-flow time series of a typical industrial pneumatic system. It turned out that all eigenvalues except the first four can be neglected.


3 One-Class-Learning

As already mentioned, in many industrial applications no negative but abundant positive data are available for

training a diagnostic classifier. Thus, we need a learning algorithm that can be trained on only positive data from the

normal operation of the pneumatic system. After learning, however, the classiﬁer must be able to distinguish

between normal operation and error states of the system.

One-Class-Learning as a sub-area of supervised learning offers a number of algorithms as a solution for this task.

In statistics, these and related techniques are known as outlier detection /6/. We selected a modiﬁed version of

nearest neighbour data description (NNDD) due to its simplicity and high accuracy /2/.

NNDD is a lazy-learning technique. It is called “lazy” because during learning the feature vectors are only

normalized and stored. The actual nearest-neighbour algorithm is not used until new feature vectors have to be

classiﬁed. The normalization of all features ensures equal a priori weight for all features. Without normalization

a feature with a range of [−10000, +10000] dominates the classification compared to one with range [0, 10⁻⁴].

Therefore all features are first scaled linearly to the interval [0, 1].
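The linear scaling to [0, 1] can be sketched as follows. Note that the per-feature minima and maxima must come from the training set and be reused for every new query vector; how values outside the training range are handled is not specified in the paper, so the unclamped mapping below is an assumption:

```python
def fit_minmax(X):
    """Per-feature minimum and maximum over the training set X."""
    d = len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    return lo, hi

def scale(row, lo, hi):
    """Linearly map each feature to [0, 1]; constant features map to 0."""
    return [(v - a) / (b - a) if b > a else 0.0
            for v, a, b in zip(row, lo, hi)]
```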

3.1 Nearest Neighbour Data Description (NNDD)

Given a training set X = (x_1, ..., x_n) of n feature vectors, a new vector q is accepted if

\[
D(q, \mathrm{NN}(q)) \leq \gamma \bar{D}, \qquad (3)
\]

that is, if its distance to a nearest neighbour is not greater than the threshold γD̄. This is a modification of the

NNDD algorithm presented in /2/. D(x, y) is a metric. We use the Euclidean norm and define

\[
D(x, y) := \|x - y\|_2.
\]

The function

\[
\mathrm{NN}(q) = \operatorname{argmin}_{x \in X} \{ D(q, x) \}
\]

determines the nearest neighbour of q in X, and

\[
\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D(x_i, \mathrm{NN}(x_i))
\]

is the distance from each point to its nearest neighbour, averaged over all points in X. In Equation 3 we could

use γ = 1, but with suboptimal classification results. We determine γ via cross validation (Section 3.2) such that

the error rate of the NNDD classifier is minimized. In Figure 3, for a set of two-dimensional data points (black),

the areas of positive classification are colored in red.
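Putting the definitions above together, a minimal NNDD sketch looks as follows. Excluding a training point as its own nearest neighbour when computing D̄ is our reading of the definition (otherwise every distance would be zero):

```python
import math

def dist(x, y):
    """Euclidean metric D(x, y) = ||x - y||_2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nn_dist(q, X, skip=None):
    """Distance from q to its nearest neighbour in X, optionally skipping
    index `skip` so a training point is not its own neighbour."""
    return min(dist(q, x) for i, x in enumerate(X) if i != skip)

def train_nndd(X):
    """Average nearest-neighbour distance D-bar over the training set."""
    return sum(nn_dist(x, X, skip=i) for i, x in enumerate(X)) / len(X)

def accept(q, X, d_bar, gamma):
    """Equation 3: accept q as normal iff D(q, NN(q)) <= gamma * D-bar."""
    return nn_dist(q, X) <= gamma * d_bar
```

Training is indeed "lazy": `train_nndd` only stores a single scalar besides the (normalized) training vectors themselves.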

3.2 Cross validation

Cross validation is used to vary the parameter γ (or some other parameter of the learning algorithm), train the

algorithm on a set X of training data and then test the trained classifier on an independent set of test data. Finally,

γ will be set to the value that minimizes the error on the test data. The k-fold cross validation algorithm works as

follows:

1. Partition the set of data vectors into k blocks of about equal size: X = X_1 ∪ ... ∪ X_k.

2. For all parameter values in a certain set (e.g. γ = 0 ... 100):

   • For i = 1 to k:

     – Train on X \ X_i, evaluate the error on X_i.

   • Compute the average error over all k runs.

3. Select the smallest value γ = γ_1 with minimum error.


Figure 3: NNDD with γ = 0.5. On a set of 16 two-dimensional data points (black dots) we applied NNDD and classified 10000 random points. The red points belong to the positive class (the class of the training data) and the blue ones to the negative class.

For each value of γ, first the data set X is split into k blocks of equal size. Then the algorithm is repeatedly trained

on all k subsets of k − 1 blocks and tested on the remaining block. The error is averaged over the k subsets and

γ = γ_1 with minimal error is selected.
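The k-fold procedure above can be sketched as follows. Since only normal data are available, the error on a held-out fold is simply the fraction of rejected (falsely abnormal) points; the candidate γ values and fold assignment are illustrative assumptions:

```python
import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nn_error(train, test, gamma):
    """Fraction of test points rejected by the NNDD rule
    D(q, NN(q)) <= gamma * D-bar."""
    d_bar = sum(min(euclid(p, q) for j, q in enumerate(train) if j != i)
                for i, p in enumerate(train)) / len(train)
    rejected = sum(1 for q in test
                   if min(euclid(q, p) for p in train) > gamma * d_bar)
    return rejected / len(test)

def cross_validate(X, gammas, k=5):
    """k-fold cross validation: return the smallest gamma (gammas given in
    ascending order) achieving the minimum average error."""
    folds = [X[i::k] for i in range(k)]
    best = None
    for g in gammas:
        errs = [nn_error([p for j, f in enumerate(folds) if j != i for p in f],
                         folds[i], g)
                for i in range(k)]
        avg = sum(errs) / k
        if best is None or avg < best[0]:  # strict '<' keeps the smallest gamma
            best = (avg, g)
    return best[1]
```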

4 Experiments

In order to evaluate our OCL algorithm, we need a pneumatic system which we can run in normal mode and in

different error modes with deﬁned errors. This was not possible on real production systems. Therefore we used

the laboratory system shown in Figure 4 which can work in two different modes. In the ﬁll-mode the rotatable

arm grabs a cup from the hole on the left, ﬁlls it with water, then closes it with a lid and puts it on the carousel to

the right. This is repeated until the carousel is filled with four cups. In the empty-mode, all cups are repeatedly

taken from the carousel, drained into the hole (rear middle) and then disposed into the hole to the left.

Figure 4: The laboratory system. A video of the system in operation can be found at http://www.hs-weingarten.de/~ertel/labdev3440917.avi.

On this system we recorded air-flow time series in fill-mode and empty-mode, both in normal operation and with a

number of different faults. In Figure 5, air-flow measurements in normal mode (top) and with a small leak

(bottom) are shown. The figure shows significant differences between two recordings in normal operation

(top left versus top right), i.e. the operation is not deterministic. It also shows the differences between normal


Nr.  Class                           Name
2    air-flow:                       1st quartile
3                                    median
4                                    3rd quartile
5                                    maximum
6                                    arithmetic mean
7                                    standard deviation
8                                    skewness
9                                    kurtosis
15   first derivative of air-flow:   arithmetic mean
16                                   standard deviation
17                                   skewness
18                                   kurtosis
19   PCA of periodogram:             1st principal component
20                                   2nd principal component
21                                   3rd principal component
22                                   4th principal component

Table 1: The 16 features used in the experiments.

operation (top) and operation with small leak (bottom). For the human observer it is not easy to distinguish

normal operation and small leak from these air-ﬂow patterns.

Air-ﬂow time series in ﬁll-mode, normal operation.

Air-ﬂow time series in ﬁll-mode with small leak in module 1.

Figure 5: Air-ﬂow measurements in normal mode (top) and with small leak in module 1 (bottom). Please note

the differences between two recordings in normal operation (top) and the differences between normal operation

and operation with small leak (top versus bottom).

4.1 Evaluation method

For training the OCL algorithm we used data from normal operation only (Figure 5 top). For our experiments we

used the subset of 16 features shown in Table 1 out of the 22 features deﬁned above. We removed some of the

features because they turned out to be redundant.

The block-length of a time interval for computing a feature vector was set to approximately one period of the

machine's operation. This block-length is a parameter that has to be set manually. Our experiments show that the

results are not very sensitive to this parameter as long as it is not too small. As a heuristic rule, the block-length

should ideally be the length of one period. If there is no obvious periodicity, the block-length should be longer

than about three times a period. Too short a block-length can lead to bad statistics.

In the following experiments we set the block-length to 226 time-steps (28.25 seconds) in fill-mode and to 184

time-steps (23 seconds) in empty-mode. We collected fault free data in fill-mode and in empty-mode and data


with seven types of fault implemented in the system. These faults were produced by small or big leaks in two

different modules of the system or by placing a throttle valve in two different modules causing a delay of the

system operation. In some experiments we added combinations of these faults to the system.

To evaluate the classification accuracy we used the error measures

FP = relative number of false positive classifications = |negative data with positive classification| / |negative data|

and

FN = relative number of false negative classifications = |positive data with negative classification| / |positive data|.

The obvious goal is to achieve FN = 0 and FP = 0. Since a false negative classification may lead to a needless

shutdown of the system, for the operator of the system this type of error typically is much more expensive than a

false positive classification, i.e. ignoring a fault.⁴
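The two error measures translate directly into code (here `True` stands for the positive, i.e. normal, class; the function name is illustrative):

```python
def error_rates(labels, predictions):
    """FP and FN rates as defined above.

    labels:      True for positive (normal) data, False for negative (faulty).
    predictions: the classifier's output, same convention."""
    neg = [p for l, p in zip(labels, predictions) if not l]
    pos = [p for l, p in zip(labels, predictions) if l]
    fp = sum(1 for p in neg if p) / len(neg)      # faults classified as normal
    fn = sum(1 for p in pos if not p) / len(pos)  # normal data flagged as faulty
    return fp, fn
```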

4.2 Results

To judge the quality of the classiﬁer we implemented various faults in the laboratory system and recorded the

air-ﬂow time series. The various recorded ﬁles together with the number of blocks are listed in Table 2. From

each block we then compute one feature vector.

data type                               n [blocks], fill-mode   n [blocks], empty-mode
fault free                                       50                      46
leak 1                                           16                      17
leak 2                                           16                      17
small leak module 1                              16                      17
delay module 1                                   21                      22
small leak module 2                              16                      17
delay module 2                                   19                      20
small leak module 1, delay module 2              18                      –

Table 2: The data sets with different types of faults and their size n measured in blocks. For each block a feature vector was computed.

First we trained on the fault free data and minimized the threshold γ = γ_1 by cross validation such that all

training data just get a positive classification. Then we performed tests on the data with different types of error. The

surprisingly good results are listed in Table 3. All positive (training data) and all negative data (test data) were

correctly classified, that is FP = FN = 0. To judge the safety of the classification, we increased the threshold

γ until the first negative data point was classified as positive (false positive). This value γ_2 is then related to γ_1.

The bigger γ_2/γ_1, the safer the classification. The ratio γ_2/γ_1 corresponds to the width of the margin between

the sets of positive and negative data in the 16-dimensional feature space. In all tests except those with a small leak in

module 1 this ratio is above two. This means that NNDD classifies very safely.
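The computation of γ_2 and the safety ratio γ_2/γ_1 can be sketched as follows: γ_2 is the nearest-neighbour distance of the closest negative point, measured in units of D̄. This is our own illustrative reconstruction of the procedure described above:

```python
import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def safety_ratio(train, negatives, gamma1):
    """gamma_2 / gamma_1, where gamma_2 is the threshold at which the
    closest negative point would first be accepted as positive."""
    d_bar = sum(min(euclid(p, q) for j, q in enumerate(train) if j != i)
                for i, p in enumerate(train)) / len(train)
    gamma2 = min(min(euclid(q, p) for p in train)
                 for q in negatives) / d_bar
    return gamma2 / gamma1
```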

To judge the classiﬁcation accuracy, please note the tiny differences between the air-ﬂow time series in normal

operation and the time series with small leak in module 1 in Figure 5 (top two graphs versus the two graphs on

the bottom). There is almost no difference in the graphs, but NNDD can separate the classes.

In order to make the learning task harder, we combined fault free data with data from runs with various faults,

trained the classiﬁer on them and then tested the accuracy on data with different faults. We wanted to know

whether the classiﬁer for example is able to discriminate between two different leaks. The results are listed in

Table 4. The data used for training and test are described in the ﬁrst and second column. Again NNDD can

⁴ This holds of course only for faults that do not damage the system.


                                            fill-mode                empty-mode
                                       FP   FN   γ_2/γ_1        FP   FN   γ_2/γ_1
fault free                             –    0    γ_1 = 1.6      –    0    γ_1 = 1.7
leak 1                                 0    0    7.25           0    0    10.0
leak 2                                 0    0    8.38           0    0    8.24
small leak module 1                    0    0    1.25           0    0    3.05
delay module 1                         0    0    3.38           0    0    2.88
small leak module 2                    0    0    3.31           0    0    4.65
delay module 2                         0    0    7.19           0    0    3.24
small leak module 1 and delay module 2 0    0    2.38           –    –    –

Table 3: Results of the NNDD classifier trained on the fault free data and tested on data generated with the different faults in the system.

perfectly separate all data sets with zero error. According to the high ratio γ_2/γ_1 in row two of the table, the

task of separating leaks 1 and 2 seems to be quite easy for NNDD. The even more complex task in row three

seems to be quite hard, but can still be solved with zero error.

training data (positive class)                 test data (negative class)             FP   FN   γ_2/γ_1
fill-mode fault free, empty-mode fault free    fill-mode and empty-mode, both with    0    0    3.5
                                               leak 1 and leak 2
fill-mode leak 1, empty-mode leak 1            fill-mode leak 2, empty-mode leak 2    0    0    6.32
fault free, leak 1, fill-mode and empty-mode   leak 2, fill-mode and empty-mode       0    0    1.21

Table 4: Results of the NNDD classifier on the laboratory system. Training on various mixed data sets, test on different mixed data sets.

5 Summary and Conclusion

This interdisciplinary research between artificial intelligence and mechanical engineering has shown a promising

way towards model free adaptive diagnosis of arbitrary pneumatic systems. The approach is not limited to pneumatics:

it can be applied to many other systems and machines. For the diagnosis of electrical systems, for example, the

air-flow time series simply has to be replaced by the electrical current.

Many questions have been answered. Others are still open, and there is new work to do: experiments on real

production systems must be performed; the presented algorithms have to be implemented and tested on an

embedded computer in the E²M module; other algorithms have to be compared with NNDD; and the feature set

has to be optimized via cross validation.

Finally the authors want to thank Festo and the state of Baden-Württemberg for funding a sabbatical of the ﬁrst

author during which this work was done.


Nomenclature

Variable   Description                           Unit
X          training set                          1
x          feature vector                        1
q          feature vector (query)                1
D          distance metric                       1
D̄          average nearest-neighbour distance    1
γ          classification threshold              1
γ_1        minimal classification threshold      1
γ_2        maximal classification threshold      1

References

/1/ C.M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.

/2/ D.M.J. Tax. One-class classification. PhD thesis, Delft University of Technology, 2001. http://resolver.tudelft.nl/uuid:e588fc3e-7503-4013-9b6a-73c7b7f6b173.

/3/ W. Ertel. Introduction to Artificial Intelligence. Springer-Verlag, London, 2011.

/4/ P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer, 2009.

/5/ W. Ertel. Advanced Mathematics for Engineers. Lecture notes, Hochschule Ravensburg-Weingarten, 2013. http://www.hs-weingarten.de/~ertel/vorlesungen/mae/matheng-skript-1314.pdf.

/6/ V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.