
This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted

PDF and full text (HTML) versions will be made available soon.

Ischemia Episode Detection in ECG using Kernel Density Estimation, Support

Vector Machine and Feature Selection

BioMedical Engineering OnLine 2012, 11:30 doi:10.1186/1475-925X-11-30

Jinho Park (jinho@gist.ac.kr)

Witold Pedrycz (wpedrycz@ualberta.ca)

Moongu Jeon (mgjeon@gist.ac.kr)

ISSN

1475-925X

Article type

Research

Submission date

21 January 2012

Acceptance date

23 May 2012

Publication date

15 June 2012

Article URL

http://www.biomedical-engineering-online.com/content/11/1/30

This peer-reviewed article was published immediately upon acceptance. It can be downloaded,

printed and distributed freely for any purposes (see copyright notice below).

Articles in BioMedical Engineering OnLine are listed in PubMed and archived at PubMed Central.

For information about publishing your research in BioMedical Engineering OnLine or any BioMed

Central journal, go to

http://www.biomedical-engineering-online.com/authors/instructions/

For information about other BioMed Central publications go to

http://www.biomedcentral.com/

BioMedical Engineering OnLine

© 2012 Park et al. ; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),

which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Ischemia Episode Detection in ECG using Kernel Density Estimation, Support Vector Machine and Feature Selection

Jinho Park1, Witold Pedrycz2 and Moongu Jeon1∗

1School of Information and Communications, Gwangju Institute of Science and Technology, 1 Oryong-dong, Buk-gu,

Gwangju, Republic of Korea

2Department of Electrical and Computer Engineering, University of Alberta, Canada and Systems Research Institute, Polish

Academy of Sciences, Warsaw, Poland

Email: Jinho Park - jinho@gist.ac.kr;

Witold Pedrycz - wpedrycz@ualberta.ca;

Moongu Jeon∗- mgjeon@gist.ac.kr;

∗Corresponding author

Abstract

Background

Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection.

Methods

For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values over five successive beats to reduce the effects of outliers. Finally we apply classifiers to those features.

Results

We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods.

Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic

ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively.

The SVM classifier detects 355 ischemic ST episodes.

Conclusions

We proposed a new method for detecting ischemia in ECG. It combines signal processing techniques, namely removing baseline wandering and detecting time positions of QRS complexes by the discrete wavelet transform, with explicit feature extraction from the morphology of ECG waveforms. It was shown that the selected features were sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter.


Keywords

Myocardial ischemia, Discrete wavelet transform, Kernel density estimation, Support vector machine, QRS

complex detection, ECG baseline wandering removal

Background

Coronary artery disease is one of the leading causes of death in the modern world. This disease mainly results from atherosclerosis and thrombosis, and it manifests itself as coronary ischemic syndrome [1].

When a patient experiences coronary ischemic syndrome, his or her electrocardiogram (ECG) shows some characteristic changes. Each cardiac cycle in the ECG can be divided into P, Q, R, S and T waves as shown in Figure 1, where the QRS complex and the T wave represent ventricular depolarization and repolarization, respectively. In most cases of normal ECG, the ST segment has the same electric potential as the PR segment. When myocardial ischemia is present, however, the electric potential of the ST segment is elevated or depressed with respect to the potential of the PR segment [1,2]. When ischemia occurs, either the PR segment is altered or the ST segment deviates from its normal level. If the PR segment moves instead of the ST segment, it looks as if the ST segment itself were modified, because the PR segment provides a kind of reference voltage level [1].

Figure 1 Normal ECG and ST segment elevation. (a) Normal ECG is divided into P, Q, R, S and T parts. Q, R

and S parts are called QRS complex in total. (b) This ECG waveform shows ST segment elevation

The ST segment deviation is mainly due to injury current in myocardial cells [1]. If the coronary artery becomes blocked by a blood clot, some myocytes become unresponsive to depolarization, or repolarize earlier than adjacent myocytes. In this case, a voltage gradient can occur in the myocytes, and this appears as ST-segment deviation in the ECG [1]. Figure 2 shows two cases in which the voltage level of the ST segment deviates from its normal position. The left column of the figure shows the distribution of electric charges around myocytes when the heart is in the resting state. This is related to the PR segment in the ECG. The right column shows the distribution of electric charges right after the ventricles have contracted. This is related to the QRS complex and the ST segment in the ECG. The shaded region represents the area affected by myocardial ischemia. In the case of the upper row in Figure 2, there is no voltage gradient at first. After the ventricles have contracted, however, a voltage gradient appears because the injured area did not respond to electric depolarization. In the second case, shown in the bottom row, there is no voltage gradient right after the ventricles have contracted. In the left figure, however, there was an initial voltage gradient, and this modifies the PR segment. The PR segment acts as a reference voltage level when we judge whether the ST segment has deviated from its normal position. The modified PR segment therefore makes us conclude that there was an ST segment deviation [1].

Figure 2 Cause of ST segment deviation [1]. The left column shows the distribution of electric charges before the ventricles contract. The right column shows the charge distribution after the ventricles have contracted. The shaded area represents the region affected by ischemia

There are several approaches to detecting ischemic ST deviations. Some researchers used entropy. Rabbani et al. used the fact that the signal perturbation of normal people is lower than that of ischemic patients. They computed an entropy measure of wavelet subbands of the ECG signal, and classified the ECG by examining which signal exhibited a more chaotic perturbation [3]. Lemire et al. calculated signal entropy at various frequency levels by computing the entropy in each wavelet scale [4]. Some used an adaptive neuro-fuzzy inference system. Pang et al. used the Karhunen-Loève transform to extract several feature values, and classified the ECG signal by an adaptive neuro-fuzzy inference system [5]. Tonekabonipour et al. used a multi-layer perceptron and radial basis functions to detect ischemic episodes, and classified ECG signals by an adaptive neuro-fuzzy network [6]. Many papers used artificial neural networks. Stamkopoulos et al. used nonlinear principal component analysis to analyze complex data, and classified the ECG signal by a radial basis function neural network [7]. Maglaveras et al. used a neural network optimized with a backpropagation algorithm [8]. Afsar et al. used the Karhunen-Loève transform to find feature values, and classified an input ECG by using a neural network [9]. Papaloukas et al. used an artificial neural network which was trained by a Bayesian regularization method [10]. Other papers studied different approaches. Bulusu et al. determined morphological features of the ECG, and classified the ECG data by a support vector machine. Andreao et al. used hidden Markov models to analyze ECG segments, and detected ischemia episodes by using a median filter and linear interpolation [11]. Faganeli and Jager tried to distinguish ischemic ST episodes from non-ischemic ST episodes caused by heart rate change. To this end, they computed heart rate values, the Mahalanobis distance of Karhunen-Loève transform coefficients and Legendre orthonormal polynomial coefficients [12]. Exarchos et al. used a decision tree. They formed decision rules comprising specific thresholds, and developed a fuzzy model to classify ischemic ECG signals [13]. Garcia et al. considered the root mean square of the difference between the input signal and the average signal composed of the first 100 beats. They adopted an adaptive amplitude threshold to classify the ECG signal [14]. Murugan and Radhakrishnan used the ant-miner algorithm to detect ischemic ECG beats. They calculated several feature values, such as the ST segment deviation, from the input ECG signal [15]. Bakhshipour et al. analyzed coefficients resulting from the wavelet transform. They examined the relative quotient of the coefficients at each decomposition level of the wavelet transform [16].

We approached this problem by extracting feature values from an ECG waveform. We first found time positions of

QRS complexes, and then determined values of the three features. We calculated the feature values for each heart

beat, and averaged their values in five successive beats. After that, we classified them by the methods of kernel

density estimation and support vector machine.

We show techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete

wavelet transform. With these explicit methods of dealing with ECG, we could discriminate ischemic ST episode

from normal ECG. We did not adopt implicit methods such as artificial neural networks or decision trees, because

we considered it was important to utilize explicit features for processes of decision making. The artificial neural

network has a kind of black box nature in its hidden layers [17], and a decision tree is apt to include several

numerical thresholds [13].

Methods

Materials

We used the European ST-T database from Physionet. The European ST-T database has 90 two-channel records, each two hours in duration [18,19]. Each record in this database has a different number of ST episodes. Overall there are 367 ischemic ST episodes in the database. The sampling frequency of each ECG record is 250 Hz.

We excluded 5 records because they were unsuitable for our experiments. The records e0133, e0155, e0509, and e0611 had no ischemic ST episodes. The record e0163 had only a very short ST episode, whose length was just 31 seconds.

Removing baseline wandering

The ST segments in ECG can be strongly affected by baseline wandering [20]. Main causes of the baseline

wandering are respiration and electrode impedance change due to perspiration [20,21]. The frequency content of

the baseline wandering is usually in a range below 0.5 Hz [20,21].

We use discrete wavelet transform to remove baseline wandering in ECG. We transform signal vector into two

sequences of coefficients, approximation and detail coefficients sequences [22]. We do this in each step in an

iterative fashion, until we get an input signal whose length is smaller than the length of the filter which characterizes

the wavelet. In our case, we used Daubechies8 wavelet with filter length of 8. The resulting approximation

coefficient sequence becomes the input signal to the next discrete wavelet transform as shown in Figure 3(a) [22].

Figure 3 Removing baseline wandering in ECG. (a) Discrete wavelet transform of $\mathrm{ecg}(n)$ to find coefficient sequences $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$. The symbol $\vec{0}(n)$ denotes the zero sequence. (b) Top: input ECG, $\mathrm{ecg}(n)$; bottom: wandering baseline in ECG, $\mathrm{baseline}(n)$. (c) Top: $\mathrm{ecg}(n)$; bottom: $\mathrm{ecg}(n) - \mathrm{baseline}(n)$. When $k$ is (d) too small or (e) too large, top: $\mathrm{ecg}(n)$, middle: $\mathrm{baseline}(n)$, bottom: $\mathrm{ecg}(n) - \mathrm{baseline}(n)$

In each step, the coefficient sequence implies a band of frequencies. If the sampling frequency of a discrete ECG signal $\mathrm{ecg}(n)$ is $x$, we can determine a continuous and band-limited signal within frequency limits of $[0, x/2]$ by Nyquist sampling theorem [23]. Therefore if we have transformed the input signal $\mathrm{ecg}(n)$ into the approximation coefficient sequence $h_1(n)$ and detail coefficient sequence $g_1(n)$, then the frequency content of $g_1(n)$ is from $x/4$ to $x/2$, and the frequency content of $h_1(n)$ is below $x/4$. In this regard, if we have transformed the $\mathrm{ecg}(n)$ into the approximation coefficient sequence $h_k(n)$, and the detail coefficient sequences $g_k(n), g_{k-1}(n), \cdots, g_1(n)$, the frequency contents of $g_k(n), g_{k-1}(n), \cdots, g_1(n)$ become

\[
\left[\frac{x}{2^{k+1}}, \frac{x}{2^{k}}\right],\ \left[\frac{x}{2^{k}}, \frac{x}{2^{k-1}}\right],\ \cdots,\ \left[\frac{x}{2^{2}}, \frac{x}{2}\right]
\]

respectively [24,25].

To remove baseline wandering, we should choose an appropriate wavelet scale. We follow an argument similar to that presented by Arvinti et al. except that they used the stationary wavelet transform instead of its discrete counterpart [26]. We remove signal components whose frequency content is less than 1/2 Hz [20,21]. If we have transformed the ECG signal $\mathrm{ecg}(n)$ into coefficient sequences $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$, the frequency contents of $h_k(n)$ and $g_k(n)$ become $\left[0, \frac{x}{2^{k+1}}\right]$ and $\left[\frac{x}{2^{k+1}}, \frac{x}{2^{k}}\right]$ respectively, where $x$ is the sampling frequency. If we choose $k$ such that $\frac{x}{2^{k+1}} \le \frac{1}{2}$, that is, $k = \lceil \log_2 x \rceil$, the frequency content of the approximation coefficient sequence $h_k(n)$ becomes less than 1/2 Hz. Thus, we assign the zero sequence $\vec{0}(n)$ to all the detail coefficient sequences $g_k(n), g_{k-1}(n), \cdots, g_1(n)$, and calculate the inverse transform of $h_k(n), \vec{0}(n), \vec{0}(n), \cdots, \vec{0}(n)$ to form the $\mathrm{baseline}(n)$ in the bottom of Figure 3(b). If we subtract $\mathrm{baseline}(n)$ from $\mathrm{ecg}(n)$, we obtain the flattened signal like the one shown in Figure 3(c).

If we select a wrong wavelet scale $k$ to find coefficient sequences of $\mathrm{ecg}(n)$, we obtain disappointing results. The flattened signal in Figure 3(c) is obtained when $k$ is $\lceil \log_2 250 \rceil = 8$, where 250 is the sampling frequency expressed in Hz. When we select $k = 4$ to use $h_4(n), g_4(n), \cdots, g_1(n)$, we obtain the plot in Figure 3(d). The middle waveform, $\mathrm{baseline}(n)$, results from the inverse discrete wavelet transform of $h_4(n), \vec{0}(n), \vec{0}(n), \cdots, \vec{0}(n)$. This middle waveform is too detailed, so the bottom waveform $\mathrm{ecg}(n) - \mathrm{baseline}(n)$ is negatively affected. When we select $k = 12$, see Figure 3(e), the bottom waveform is not different from the input waveform $\mathrm{ecg}(n)$.

We adopt a discrete wavelet transform to retain the details of the ECG waveform because filtering by some cut-off

frequency can deteriorate the quality of the ECG waveforms [27].
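As a small worked check of the scale rule in this section, the following Python sketch computes k for a given sampling frequency and lists the nominal frequency band of each coefficient sequence. The function names `baseline_scale` and `coefficient_bands` are illustrative and do not come from the paper, and the bands are the ideal dyadic ones; a real wavelet filter lets neighbouring bands overlap somewhat.

```python
import math


def baseline_scale(fs_hz, cutoff_hz=0.5):
    """Smallest k such that fs_hz / 2**(k + 1) <= cutoff_hz.

    For cutoff_hz = 0.5 this equals ceil(log2(fs_hz)), the rule in the text.
    """
    k = 1
    while fs_hz / 2 ** (k + 1) > cutoff_hz:
        k += 1
    return k


def coefficient_bands(fs_hz, k):
    """Nominal (low, high) band in Hz of h_k and of g_k, ..., g_1."""
    bands = {"h%d" % k: (0.0, fs_hz / 2 ** (k + 1))}
    for j in range(k, 0, -1):
        bands["g%d" % j] = (fs_hz / 2 ** (j + 1), fs_hz / 2 ** j)
    return bands


if __name__ == "__main__":
    fs = 250  # sampling frequency of the European ST-T records, in Hz
    k = baseline_scale(fs)
    assert k == math.ceil(math.log2(fs))
    print("k =", k)
    for name, (low, high) in coefficient_bands(fs, k).items():
        print("%s: %.3f to %.3f Hz" % (name, low, high))
```

For 250 Hz the approximation band ends at 250/512 Hz, just under the 1/2 Hz limit, which is why one scale fewer would leave part of the baseline drift in the detail coefficients.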

Detecting QRS complexes

We have to select an appropriate wavelet scale to capture proper time positions of QRS complexes. We will deal with only the flattened ECG waveform $\mathrm{ecg}(n) - \mathrm{baseline}(n)$ referred to in the previous section. We will denote it as $\mathrm{fecg}(n)$.

First, we determine the sequences of wavelet coefficients of the $\mathrm{fecg}(n)$, obtaining $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ where $k = \lceil \log_2 x \rceil$ and $x$ is the sampling frequency. We assign zero to all the coefficient sequences except one, $g_j(n)$. Then, we calculate the inverse transform of $\vec{0}(n)$ (approximation coefficients), $\vec{0}(n)$ (detail coefficients, onward), $\cdots, \vec{0}(n), g_j(n), \vec{0}(n), \cdots, \vec{0}(n)$ to obtain $\mathrm{pulse}(n)$. To find a protruding segment, that is, a QRS complex, we compute the score for each wavelet scale $j$,

\[
\mathrm{score}_j = \left| \sum_{l} \mathrm{fecg}(l)\, \frac{|\mathrm{pulse}(l)|}{\sum_{m} |\mathrm{pulse}(m)|} \right| .
\]

We select the wavelet scale $j$ which produces the largest drop of $\mathrm{score}_j - \mathrm{score}_{j+1}$ $(j \ge 2)$. The bottom waveform in Figure 4(b) shows the time positions of QRS complexes when selecting this suitable wavelet scale.

Figure 4 Selection of wavelet scale to find the time positions of QRS complexes. (a) Discrete wavelet transform

and inverse transform. (b) Top: A flattened ECG waveform, fecg (n). Middle: waveform resulted from the inverse

transform, pulse(n). Bottom: fecg (n)|pulse(n)|
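The score and the scale choice described in this section can be sketched in a few lines of Python. This is a minimal illustration that assumes the single-scale reconstructions pulse(n) have already been computed elsewhere; the function names `scale_score` and `choose_scale` are hypothetical, and the return value for an all-zero pulse is our own convention rather than something the paper specifies.

```python
def scale_score(fecg, pulse):
    """score_j = | sum_l fecg(l) * |pulse(l)| / sum_m |pulse(m)| |."""
    total = sum(abs(p) for p in pulse)
    if total == 0:
        return 0.0  # assumption: an all-zero pulse contributes no score
    return abs(sum(f * abs(p) for f, p in zip(fecg, pulse)) / total)


def choose_scale(scores):
    """Return the scale j >= 2 with the largest drop score_j - score_(j+1).

    `scores` maps each scale j to score_j; a scale is a candidate only
    when score_(j+1) is also available.
    """
    candidates = [j for j in scores if j >= 2 and j + 1 in scores]
    return max(candidates, key=lambda j: scores[j] - scores[j + 1])


if __name__ == "__main__":
    fecg = [0.0, 0.1, 1.0, 0.1, 0.0]
    # A pulse concentrated on the R peak weights the large sample fully.
    print(scale_score(fecg, [0, 0, 1, 0, 0]))
    # A pulse spread over all samples dilutes the weighted mean.
    print(scale_score(fecg, [1, -1, 1, -1, 1]))
```

The score is a weighted mean of the flattened ECG with weights |pulse(n)|, so it is large when the reconstructed pulse concentrates on the tall QRS samples.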

After finding the locations of QRS complexes, we choose QRS onset and offset points in each QRS complex. We search for the QRS onset point in the backward direction from the peak point of each QRS complex. We take as the QRS onset the point at which fecg (n) has changed its direction of rising and falling twice. In the same way, we take the QRS offset point in the forward direction from the peak point.

Algorithm 1 shows a process of removing baseline wandering and detecting QRS complexes.
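The onset and offset search can be illustrated with the following Python sketch, which mirrors the two pairs of while loops in Algorithm 1. The function name `qrs_onset_offset` is hypothetical. Two details are our own assumptions and not part of the paper: the index bounds checks, and handling a downward QRS complex by negating the signal, which is equivalent to reversing the inequality signs.

```python
def qrs_onset_offset(fecg, peak):
    """Locate QRS onset and offset sample indices around one peak index."""
    # Assumption: a downward complex is searched on the negated signal.
    s = fecg if fecg[peak] > 0 else [-v for v in fecg]
    last = len(s) - 1

    # Backward search: first pass samples that keep falling as we move
    # away from the peak, then pass samples that keep rising.
    j = 1
    while peak - j >= 0 and s[peak - j] <= s[peak - j + 1]:
        j += 1
    while peak - j >= 0 and s[peak - j] > s[peak - j + 1]:
        j += 1
    onset = max(peak - j, 0)

    # Forward search: the mirror image of the backward search.
    j = 1
    while peak + j <= last and s[peak + j - 1] >= s[peak + j]:
        j += 1
    while peak + j <= last and s[peak + j - 1] < s[peak + j]:
        j += 1
    offset = min(peak + j, last)
    return onset, offset


if __name__ == "__main__":
    beat = [0.0, 0.0, 0.1, 0.0, -0.2, 1.0, -0.3, 0.0, 0.1, 0.0, 0.0]
    print(qrs_onset_offset(beat, 5))
```

On real recordings small noise ripples can end either loop early, so some smoothing before the search may be needed; the paper does not discuss this point.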

Feature formation for classification problems

We deal with the flattened waveform, $\mathrm{fecg}(n)$, to obtain the values of the features. We take voltage level of QRS onset point as the reference from which we measure voltage deviation [2,28]. We denote the mean value of electric potentials at QRS onset points as $\overline{\mathrm{fecg}}(\text{QRS onset})$. We consider this value as an effective zero voltage, so we measure voltage deviation from the $\overline{\mathrm{fecg}}(\text{QRS onset})$.

To form the first feature, we sum up all the voltage deviation from QRS offset point to T wave peak point as shown in Figure 5(a) and (b).

\[
\mathrm{feature}_1 = \sum_{i=\text{QRS offset}}^{\text{T peak}} \left| \mathrm{fecg}(i) - \overline{\mathrm{fecg}}(\text{QRS onset}) \right|
\]

The second feature is similar to the first feature with an exception of the ending position of the sum. We terminate the summation as we reach the first point, $F$, at which the voltage becomes equal to the reference voltage $\overline{\mathrm{fecg}}(\text{QRS onset})$, see Figure 5. When doing this, we add the signed values of the voltage deviation to find whether the area is lower or higher with respect to the reference voltage. Then we divide the value by the voltage at QRS peak point. The second feature value is given as follows.

\[
\mathrm{feature}_2 = \left( \sum_{i=\text{QRS offset}}^{F} \left( \mathrm{fecg}(i) - \overline{\mathrm{fecg}}(\text{QRS onset}) \right) \right) \Big/ \left| \mathrm{fecg}(\text{QRS peak}) \right|
\]

The third feature is a slope from the QRS onset point to the QRS offset point.

\[
\mathrm{feature}_3 = \left| \frac{\mathrm{fecg}(\text{QRS offset}) - \mathrm{fecg}(\text{QRS onset})}{\text{QRS offset} - \text{QRS onset}} \right|
\]

We calculate these three feature values for each heart beat. Then we average these values in five successive beats, and arrange the three mean values as $(\mathrm{feature}_1, \mathrm{feature}_2, \mathrm{feature}_3)$.

Figure 5 Features used in the classification process. (a), (b) Area between QRS offset and T peak with respect to

the reference mean voltage fecg (QRS onset). (a) ST segment elevation. (b) ST segment depression. (c)

Normalized and signed sum of voltage deviations from the QRS offset to the first point F at which voltage becomes

equal to the reference voltage. (d) Slope from the QRS onset point to the QRS offset point. Markers ⃝, ? and △

designate QRS onset, peak and offset points respectively

Algorithm 2 shows the pseudo-code of computing feature values.
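The feature definitions can be made concrete with a short Python sketch. It is a simplified illustration, not the authors' implementation: the names `beat_features` and `average_in_groups` are hypothetical, the onset and offset indices are passed per beat instead of being derived from mean index differences as in Algorithm 2, the point F is assumed to be the first sample at which the deviation is zero or changes sign, and no guard is included for a zero peak amplitude or coinciding onset and offset.

```python
def beat_features(fecg, onset, peak, offset, t_peak, ref):
    """Return (feature1, feature2, feature3) for one heart beat.

    `ref` is the effective zero voltage, the mean of fecg at QRS onsets.
    """
    dev = [v - ref for v in fecg]

    # feature1: unsigned area between QRS offset and T peak.
    feature1 = sum(abs(dev[i]) for i in range(offset, t_peak + 1))

    # feature2: signed sum from QRS offset until the deviation reaches
    # zero or changes sign (assumed location of F), normalised by the
    # absolute QRS peak voltage.
    signed_sum = 0.0
    starts_positive = dev[offset] > 0
    for i in range(offset, len(dev)):
        if dev[i] == 0 or (dev[i] > 0) != starts_positive:
            break
        signed_sum += dev[i]
    feature2 = signed_sum / abs(fecg[peak])

    # feature3: absolute slope of the line from QRS onset to QRS offset.
    feature3 = abs((fecg[offset] - fecg[onset]) / (offset - onset))
    return feature1, feature2, feature3


def average_in_groups(features, group=5):
    """Average per-beat feature tuples over consecutive groups of beats.

    Beats left over after the last full group are dropped.
    """
    points = []
    for start in range(0, len(features) - group + 1, group):
        chunk = features[start:start + group]
        points.append(tuple(sum(col) / group for col in zip(*chunk)))
    return points


if __name__ == "__main__":
    fecg = [0.0, 0.0, 1.0, 0.2, 0.2, 0.3, 0.0, 0.0]
    print(beat_features(fecg, onset=1, peak=2, offset=3, t_peak=5, ref=0.0))
```

In this toy beat the ST level stays above the reference, so the second feature is positive; a depressed ST segment would give a negative value, which is what the signed sum is meant to capture.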

Classification by kernel density estimation

We approximate probability density at a point by considering the other points. Let us assume we have $d$-dimensional points $\{x_1, x_2, \cdots, x_n\}$. We can estimate the probability density at a point $y$ as $p(y) = \frac{1}{n}\frac{K}{V}$ where $V$ is a small volume around $y$, and $K$ is a number of enclosed points in the volume $V$ [29]. We replace the term $\frac{K}{V}$ by $d$-dimensional Gaussian function as follows [30].

\[
p(y) = \frac{1}{n}\frac{K}{V} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{(\sqrt{2\pi})^{d}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(y-x_i)^{T}\Sigma^{-1}(y-x_i)}
\]

If we assume that the covariance matrix $\Sigma$ is a diagonal matrix with each diagonal element $b_j^2$ $(1 \le j \le d)$, the probability density at the point $y$ is given as follows [31].

\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{(\sqrt{2\pi})^{d}\,(b_1 b_2 \cdots b_d)}\, e^{-\frac{1}{2}\sum_{j=1}^{d}\left(\frac{[y]_j - [x_i]_j}{b_j}\right)^{2}}
\]

Algorithm 1 A procedure to find time positions of QRS onset, peak and offset points. This procedure includes the method of removing baseline wandering in ECG. nBeats stands for the number of QRS peaks. It is the length of the sequences idx_QRS_Onset(n), idx_QRS_Peak(n) and idx_QRS_Offset(n).

Input: Sampling_Hz, ecg(n)
Output: idx_QRS_Onset(n), idx_QRS_Peak(n), idx_QRS_Offset(n)
k ← ⌈log2 Sampling_Hz⌉
Discrete wavelet transform (DWT) of ecg(n) into $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$
for i = 1 to k do
    $g_i(n)$ ← $\vec{0}(n)$ {// $\vec{0}(n)$ means zero sequence.}
end for
Inverse wavelet transform (IDWT) of $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ into baseline(n)
fecg(n) ← ecg(n) − baseline(n)
DWT of fecg(n) into $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$
$h_k(n)$ ← $\vec{0}(n)$
$g_k(n)$ ← $\vec{0}(n)$
for i = 1 to k − 1 do
    $g'_i(n)$ ← $g_i(n)$
    $g_i(n)$ ← $\vec{0}(n)$
end for
for i = 1 to k − 1 do
    $g_i(n)$ ← $g'_i(n)$
    IDWT of $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ into pulse(n)
    $\mathrm{score}_i$ ← $\left| \sum_{l} \mathrm{fecg}(l)\, \frac{|\mathrm{pulse}(l)|}{\sum_{m} |\mathrm{pulse}(m)|} \right|$
    $g_i(n)$ ← $\vec{0}(n)$
end for
chosen_scale ← $\arg\max_{2 \le i \le k-2} \{\mathrm{score}_i - \mathrm{score}_{i+1}\}$
$g_{\mathrm{chosen\_scale}}(n)$ ← $g'_{\mathrm{chosen\_scale}}(n)$
IDWT of $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ into pulse(n)
needle(n) ← |fecg(n) pulse(n)|
Make idx_QRS_Peak(n) by searching for local maxima of needle(n)
for i = 1 to nBeats do
    if fecg(idx_QRS_Peak(i)) > 0 then
        j ← 1
        while fecg(idx_QRS_Peak(i) − j) ≤ fecg(idx_QRS_Peak(i) − j + 1) do
            j ← j + 1
        end while
        while fecg(idx_QRS_Peak(i) − j) > fecg(idx_QRS_Peak(i) − j + 1) do
            j ← j + 1
        end while
        idx_QRS_Onset(i) ← idx_QRS_Peak(i) − j
        j ← 1
        while fecg(idx_QRS_Peak(i) + j − 1) ≥ fecg(idx_QRS_Peak(i) + j) do
            j ← j + 1
        end while
        while fecg(idx_QRS_Peak(i) + j − 1) < fecg(idx_QRS_Peak(i) + j) do
            j ← j + 1
        end while
        idx_QRS_Offset(i) ← idx_QRS_Peak(i) + j
    else
        ··· {//When QRS complex protrudes downward, code is same with reversing directions of inequality signs.}
    end if
end for

Algorithm 2 A procedure to compute feature values. nBeats denotes the number of QRS peaks. It is the length of the sequences idx_QRS_Onset(n), idx_QRS_Peak(n) and idx_QRS_Offset(n). $n_{cl}$ is equal to nBeats/5.

Input: fecg(n), idx_QRS_Onset(n), idx_QRS_Peak(n), idx_QRS_Offset(n)
Output: $\{x_1^{(cl)}, x_2^{(cl)}, \cdots, x_{n_{cl}}^{(cl)}\}$ {//cl can be S (ST episode) or N (normal).}
mean_idx_diff1 ← $\left(\sum_{i=1}^{\mathrm{nBeats}} (\text{idx\_QRS\_Peak}(i) - \text{idx\_QRS\_Onset}(i))\right) / \mathrm{nBeats}$
mean_idx_diff2 ← $\left(\sum_{i=1}^{\mathrm{nBeats}} (\text{idx\_QRS\_Offset}(i) - \text{idx\_QRS\_Peak}(i))\right) / \mathrm{nBeats}$
{//mean_idx_diff1 and mean_idx_diff2 are truncated into integers.}
$\overline{\mathrm{fecg}}(\text{QRS onset})$ ← $\left(\sum_{i=1}^{\mathrm{nBeats}} \mathrm{fecg}(\text{idx\_QRS\_Onset}(i))\right) / \mathrm{nBeats}$
for i = 1 to nBeats do
    k ← idx_QRS_Peak(i) + mean_idx_diff2
    $\mathrm{feature}_1(i)$ ← $\sum_{j=k}^{\text{T peak}} \left| \mathrm{fecg}(j) - \overline{\mathrm{fecg}}(\text{QRS onset}) \right|$
    $\mathrm{feature}_2(i)$ ← $\left(\sum_{j=k}^{F} (\mathrm{fecg}(j) - \overline{\mathrm{fecg}}(\text{QRS onset}))\right) / \left|\mathrm{fecg}(\text{idx\_QRS\_Peak}(i))\right|$
    m ← idx_QRS_Peak(i) − mean_idx_diff1
    $\mathrm{feature}_3(i)$ ← $\left| \frac{\mathrm{fecg}(k) - \mathrm{fecg}(m)}{k - m} \right|$
end for
for i = 1 to nBeats/5 do
    $[x_i^{(cl)}]_1$ ← $\frac{1}{5}\sum_{j=5i-4}^{5i} \mathrm{feature}_1(j)$
    $[x_i^{(cl)}]_2$ ← $\frac{1}{5}\sum_{j=5i-4}^{5i} \mathrm{feature}_2(j)$
    $[x_i^{(cl)}]_3$ ← $\frac{1}{5}\sum_{j=5i-4}^{5i} \mathrm{feature}_3(j)$
end for

We classify a test point by examining posterior probabilities in which the test point belongs to two classes, normal or ischemic ST episode. We assume we have $n_S$ points $\{x_1^{(S)}, x_2^{(S)}, \cdots, x_{n_S}^{(S)}\}$, and $n_N$ points $\{x_1^{(N)}, x_2^{(N)}, \cdots, x_{n_N}^{(N)}\}$. The first and the second set designate training sets of ischemic ST episode and normal part, respectively. Each point is described by three components $(\mathrm{feature}_1, \mathrm{feature}_2, \mathrm{feature}_3)$.

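We compute posterior probability in which the test point $y$ belongs to each class by Bayes' theorem as follows [29].

\[
P(\text{class} \mid y) = \frac{P(\text{class})\, p(y \mid \text{class})}{P(\text{class}=N)\, p(y \mid \text{class}=N) + P(\text{class}=S)\, p(y \mid \text{class}=S)}
\]

The prior probability $P(\text{class})$ is given as $P(\text{class}=N) = n_N/(n_N + n_S)$ or $P(\text{class}=S) = n_S/(n_N + n_S)$. The likelihoods $p(y \mid \text{class}=N)$ and $p(y \mid \text{class}=S)$ read as

\[
p(y \mid \text{class}=N) = \frac{1}{n_N (\sqrt{2\pi})^{3} \left(b_1^{(N)} b_2^{(N)} b_3^{(N)}\right)} \sum_{i=1}^{n_N} e^{-\frac{1}{2}\sum_{j=1}^{3}\left(\frac{[y]_j - [x_i^{(N)}]_j}{b_j^{(N)}}\right)^{2}},
\]

\[
p(y \mid \text{class}=S) = \frac{1}{n_S (\sqrt{2\pi})^{3} \left(b_1^{(S)} b_2^{(S)} b_3^{(S)}\right)} \sum_{i=1}^{n_S} e^{-\frac{1}{2}\sum_{j=1}^{3}\left(\frac{[y]_j - [x_i^{(S)}]_j}{b_j^{(S)}}\right)^{2}}.
\]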

The quantities $b_i^{(N)}$ and $b_i^{(S)}$ $(1 \le i \le 3)$ are called kernel bandwidths. We calculate these bandwidths for each class (N or S) and component $(1 \le i \le 3)$. These kernel bandwidths impact accuracy of kernel density estimation [32].

We have $n_{cl}$ training points $\{x_1^{(cl)}, x_2^{(cl)}, \cdots, x_{n_{cl}}^{(cl)}\}$ where $cl$ denotes class, N (normal) or S (ischemic ST episode). For each component $(1 \le i \le 3)$ of the feature vector, we calculate the mean value of differences as follows.

\[
\mathrm{mean}_i^{(cl)} = \frac{1}{n_{cl}(n_{cl}-1)/2} \sum_{j=1}^{n_{cl}} \sum_{k=j+1}^{n_{cl}} \left| [x_j^{(cl)}]_i - [x_k^{(cl)}]_i \right|
\]

We choose half of the mean, $\frac{1}{2}\mathrm{mean}_i^{(cl)}$, as kernel bandwidth $b_i^{(cl)}$ for each class $cl$ (N or S), and component $i$ $(1 \le i \le 3)$.
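The bandwidth rule and the Bayes decision of this section can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function names `bandwidths`, `density` and `posterior_ischemic` are hypothetical, the tiny training sets in the usage part are made up, and the fallback value 0.5 when both densities underflow to zero is our own choice. A component whose training values are all identical would give a zero bandwidth and is not handled here.

```python
import math


def bandwidths(points):
    """Half the mean absolute pairwise difference, per feature component."""
    n, d = len(points), len(points[0])
    pairs = n * (n - 1) / 2
    result = []
    for i in range(d):
        total = sum(abs(points[j][i] - points[k][i])
                    for j in range(n) for k in range(j + 1, n))
        result.append(0.5 * total / pairs)
    return result


def density(y, points, b):
    """Gaussian kernel density estimate p(y | class) with diagonal bandwidths."""
    d = len(y)
    norm = len(points) * math.sqrt(2 * math.pi) ** d * math.prod(b)
    total = 0.0
    for x in points:
        u2 = sum(((y[j] - x[j]) / b[j]) ** 2 for j in range(d))
        total += math.exp(-0.5 * u2)
    return total / norm


def posterior_ischemic(y, normal, ischemic):
    """P(class = S | y) with priors n_N/(n_N + n_S) and n_S/(n_N + n_S)."""
    # The common factor 1/(n_N + n_S) of the priors cancels in the ratio.
    joint_n = len(normal) * density(y, normal, bandwidths(normal))
    joint_s = len(ischemic) * density(y, ischemic, bandwidths(ischemic))
    if joint_n + joint_s == 0:
        return 0.5  # assumption: undecided when both densities underflow
    return joint_s / (joint_n + joint_s)


if __name__ == "__main__":
    normal = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.1), (0.1, 0.2, 0.0)]
    ischemic = [(1.0, 1.0, 1.0), (1.2, 0.9, 1.1), (0.9, 1.1, 1.2)]
    print(posterior_ischemic((1.0, 1.0, 1.1), normal, ischemic))
```

Because the bandwidths come directly from the spread of the training points, no numerical parameter has to be supplied by the user, which is the property the paper highlights for the KDE classifier.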

Classification with the use of support vector machine

Let us assume we have $n_{cl}$ training points $\{x_1^{(cl)}, x_2^{(cl)}, \cdots, x_{n_{cl}}^{(cl)}\}$. Each point is described as $(\mathrm{feature}_1, \mathrm{feature}_2, \mathrm{feature}_3)$ in a three-dimensional feature space. We construct support vector machine classifier by solving the following optimization problem [33]

\[
\min_{w,b,\xi} \left\{ \frac{1}{2} w^{T} \cdot w + C \sum_{j=1}^{n_{cl}} \xi_j \right\}
\quad \text{subject to} \quad
t_i^{(cl)} \left( w^{T} \cdot \phi\left(x_i^{(cl)}\right) + b \right) \ge 1 - \xi_i,\ \xi_i \ge 0.
\]

The target label $t_i^{(cl)}$ is specified as 1 (normal) or -1 (ischemic ST episode). The parameter $C$ controls the trade-off between the slack variable ($\xi_i$) penalty and the margin ($w^{T} \cdot w$) [29]. The dual form of the above classifier reads as follows

\[
\max_{\alpha} \left\{ \sum_{j=1}^{n_{cl}} \alpha_j - \frac{1}{2} \alpha^{T} \cdot H \alpha \right\}
\quad \text{subject to} \quad
\sum_{j=1}^{n_{cl}} t_j^{(cl)} \alpha_j = 0,\ 0 \le \alpha_j \le C
\]

where the matrix $H$ is expressed as

\[
H_{ij} \equiv t_i^{(cl)} t_j^{(cl)} K\left(x_i^{(cl)}, x_j^{(cl)}\right) = t_i^{(cl)} t_j^{(cl)}\, \phi\left(x_i^{(cl)}\right) \cdot \phi\left(x_j^{(cl)}\right) = t_i^{(cl)} t_j^{(cl)}\, e^{-\frac{1}{3}\left\| x_i^{(cl)} - x_j^{(cl)} \right\|^{2}}
\]

[33]. When we classify a new pattern $y$, we examine decision function, $\mathrm{sgn}\left( \sum_{j=1}^{n_{cl}} t_j^{(cl)} \alpha_j K\left(x_j^{(cl)}, y\right) + b \right)$. Whenever the input training set $\{x_1^{(cl)}, x_2^{(cl)}, \cdots, x_{n_{cl}}^{(cl)}\}$ was changed, we varied the parameter $C$ to find its value which produces the highest classification rate.
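To make the kernel matrix and the decision function concrete, the following Python sketch evaluates them for given multipliers. It does not solve the dual problem: the values of `alphas` and `bias` in the usage part are hypothetical placeholders chosen only to satisfy the equality constraint, and the function names are illustrative. In practice a library solver would supply the multipliers.

```python
import math


def rbf_kernel(x, y, gamma=1.0 / 3.0):
    """K(x, y) = exp(-gamma * ||x - y||^2), with gamma = 1/3 as in the text."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def h_matrix(points, labels):
    """H_ij = t_i * t_j * K(x_i, x_j), the matrix used in the dual problem."""
    return [[ti * tj * rbf_kernel(xi, xj) for xj, tj in zip(points, labels)]
            for xi, ti in zip(points, labels)]


def decide(y, points, labels, alphas, bias):
    """Sign of sum_j t_j * alpha_j * K(x_j, y) + b; 1 normal, -1 ischemic."""
    s = sum(t * a * rbf_kernel(x, y)
            for x, t, a in zip(points, labels, alphas)) + bias
    return 1 if s >= 0 else -1


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    labels = [1, -1]        # 1 = normal, -1 = ischemic ST episode
    alphas = [1.0, 1.0]     # hypothetical multipliers, not a trained model
    print(h_matrix(points, labels))
    print(decide((0.1, 0.0, 0.0), points, labels, alphas, bias=0.0))
```

With three features, the factor 1/3 in the exponent equals one over the number of features, so the kernel width itself needs no tuning and only C remains to be chosen.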

Experiments setting

We used kernel density estimation and support vector machine methods to evaluate the proposed approach. We completed the experiment for each channel and record available in the European ST-T database. First, we trained the classifier on a subset of ST episodes and normal ECG. Then we tested how well the feature values discriminated the two classes, ST episode and normal. When we formed the ST episode data, we used all the ischemic ST episodes but excluded ST deviations resulting from non-ischemic causes such as position-related changes in the electrical axis of the heart. To preserve the balance between ST episode and normal ECG data, we collected as much normal data from the beginning of each record as there was ST episode data. When dividing the data into training and test sets, we assigned one tenth of the data to the training set, and the rest to the test set. In the cases of e0106 lead 0, e0110, e0136, e0170, e0304, e0601, and e0615 records, we used one third of all data for training and two thirds for testing because these records had very little ischemic ST episode data. To avoid the ambiguous region between ischemic ST episode and normal ECG, we removed 10 seconds of ECG data from each side of the boundary.

When we classify a test set {yi}, four quantities are computed: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). TP is the number of ischemic events correctly detected. FN is the number of erroneously rejected (missed) ischemic events. FP is the number of non-ischemic, that is, normal parts which the classifier erroneously detected as ischemic events. TN is the number of normal parts which our classifier correctly rejected as non-ischemic events [34]. These are the numbers of corresponding yi points, each of which was obtained by averaging the three feature values over five successive beats in Algorithm 2. The sensitivity and specificity are expressed in the usual fashion, Se = TP/(TP + FN) and Sp = TN/(TN + FP) respectively [6].

We tested the classifiers by counting how many ST episodes were correctly caught, out of 367 episodes in the 85

records of European ST-T database. For an interval of ischemic ST episode data, we formed n test points

{y1,y2,··· ,yn} from the data (Algorithm 2), and classified each test point and then counted numbers of two

classes, “ischemic” and “normal”. If the number of class “ischemic” was larger than n/2, we declared the interval

to be an ischemic ST episode. The experiments were completed for 367 ischemic ST episodes.
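The two evaluation rules above are simple enough to check by hand. The sketch below, with hypothetical function names, computes sensitivity and specificity from the four counts and applies the majority vote that decides whether an interval is declared an ischemic ST episode. The counts in the usage part are those reported for the Gaussian kernel in Table 1.

```python
def sensitivity(tp, fn):
    """Se = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Sp = TN / (TN + FP)."""
    return tn / (tn + fp)


def is_ischemic_episode(labels):
    """Majority vote: True when more than half of the points are 'ischemic'."""
    ischemic = sum(1 for label in labels if label == "ischemic")
    return ischemic > len(labels) / 2


if __name__ == "__main__":
    # Point-level counts of the Gaussian kernel row in Table 1.
    print(round(sensitivity(tp=27600, fn=1794), 3))
    print(round(specificity(tn=21441, fp=2075), 3))
    print(is_ischemic_episode(["ischemic", "ischemic", "normal"]))
```

Note that sensitivity and specificity are counted over the averaged five-beat points, whereas the number of detected episodes is counted over whole intervals, so the two measures need not move together.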

We compared the results of kernel density estimation (KDE) and support vector machine (SVM) methods with those formed by artificial neural network (ANN). The corresponding ANN classifier exhibits the following topology. The input layer has three nodes which accept $\mathrm{feature}_1$, $\mathrm{feature}_2$ and $\mathrm{feature}_3$ respectively. The output layer has two nodes which have target values (1,0) and (0,1) in the cases of “ischemia” and “normal” classes, respectively. We initialized bias weights as 0, and assigned random values between -1.0 and 1.0 to the weights of the network. The learning was carried out by running the backpropagation method [17] for 3000 iterations. We used a sigmoid activation function $1/(1 + e^{-x})$ and set the learning rate to 0.01. We adopted various topologies of hidden layers such as 3 → (5) → (5) → 2, 3 → (6) → 2, 3 → (7) → 2 and 3 → (8) → 2, where the number in each parenthesis represents the number of nodes in the corresponding hidden layer. We used the stochastic (incremental) gradient descent method to alleviate some drawbacks of the standard gradient descent method, see [17].

Results

KDE with various kernels

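We can use various kernels in kernel density estimation. If we have training points $\{x_1, x_2, \cdots, x_n\}$ and a test point $y$, the probability density at $y$ is given as follows [35].

\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_G}{b_1 b_2 b_3}\, e^{-\frac{1}{2}u_i^{2}} \quad \text{(Gaussian)},
\]
\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_R}{b_1 b_2 b_3}\, 1_{\{|u_i| \le 1\}} \quad \text{(Rectangular)},
\]
\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_E}{b_1 b_2 b_3} \left(1 - u_i^{2}\right) 1_{\{|u_i| \le 1\}} \quad \text{(Bartlett-Epanechnikov)},
\]
\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_B}{b_1 b_2 b_3} \left(1 - u_i^{2}\right)^{2} 1_{\{|u_i| \le 1\}} \quad \text{(Biweight)},
\]
\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_{Triw}}{b_1 b_2 b_3} \left(1 - u_i^{2}\right)^{3} 1_{\{|u_i| \le 1\}} \quad \text{(Triweight)},
\]
\[
p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_{Tria}}{b_1 b_2 b_3} \left(1 - u_i\right) 1_{\{|u_i| \le 1\}} \quad \text{(Triangular)}.
\]

Here $k_G$, $k_R$, $k_E$, $k_B$, $k_{Triw}$ and $k_{Tria}$ are constants, and $u_i^{2}$ is given as $u_i^{2} \equiv \sum_{j=1}^{3}\left(\frac{[y]_j - [x_i]_j}{b_j}\right)^{2}$ because we use three feature values. The indicator function $1_{\{|u_i| \le 1\}}$ is given as follows.

\[
1_{\{|u_i| \le 1\}} =
\begin{cases}
1 & (\text{if } |u_i| \le 1) \\
0 & (\text{otherwise})
\end{cases}
\]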

Table 1 shows classification results for various kernels. In all cases we used Daubechies8 wavelet to produce training and test sets. We took each bandwidth $b_i^{(cl)} = \mathrm{mean}_i^{(cl)} \cdot \mathrm{factor}$ for class $cl$, ischemic or normal, and $1 \le i \le 3$. The “detect” means how many ST episodes our classifier correctly detected, out of total 367 episodes. The “factor” in this table specifies how we multiplied on the $\mathrm{mean}_i^{(cl)}$ to form the kernel bandwidth $b_i^{(cl)}$. We varied this factor from 0.1 to 3.0, and selected the one for which a sum of sensitivity and specificity values attains a maximum. Because the Gaussian kernel produced best results, in the sequel we will use the Gaussian kernel. Table 2 shows the results with respect to various kernel bandwidths.

Table 1 Classification results with respect to various kernels

kernel        factor  Se.    Sp.    TP     TN     FP    FN    detect
Gaussian      0.5     0.939  0.912  27600  21441  2075  1794  349
Rectangular   1.5     0.892  0.913  26209  21460  2056  3185  329
Epanechnikov  1.7     0.904  0.915  26583  21522  1994  2811  335
Biweight      2.0     0.912  0.916  26794  21533  1983  2600  333
Triweight     2.1     0.916  0.916  26923  21542  1974  2471  336
Triangular    1.8     0.908  0.917  26680  21554  1962  2714  334

Table 2 Classification results of Gaussian kernels with respect to various bandwidths

factor  Se.    Sp.    TP     TN     FP    FN    detect
0.2     0.943  0.867  27728  20399  3117  1666  353
0.3     0.944  0.893  27745  20996  2520  1649  352
0.4     0.942  0.905  27697  21279  2237  1697  351
0.5     0.939  0.912  27600  21441  2075  1794  349
0.6     0.934  0.915  27453  21526  1990  1941  343
0.7     0.929  0.916  27318  21550  1966  2076  338
0.8     0.924  0.916  27148  21529  1987  2246  337
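The relation between the counts and the rates in Tables 1 and 2, and the rule used to pick the bandwidth factor, can be written as a short sketch. The function names are illustrative only; the counts in the usage note are the Gaussian-kernel values reported in Table 2 for factors 0.4, 0.5 and 0.6.

```python
def sens_spec(tp, tn, fp, fn):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def best_factor(results):
    """results maps factor -> (TP, TN, FP, FN).

    Returns the factor whose sensitivity + specificity is largest.
    """
    return max(results, key=lambda f: sum(sens_spec(*results[f])))


# Counts for the Gaussian kernel at three factors (from Table 2).
table2_subset = {
    0.4: (27697, 21279, 2237, 1697),
    0.5: (27600, 21441, 2075, 1794),
    0.6: (27453, 21526, 1990, 1941),
}
```

With these three rows, `best_factor(table2_subset)` selects the factor 0.5, and `sens_spec(27600, 21441, 2075, 1794)` rounds to the reported 0.939 and 0.912.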

Results for KDE, SVM and ANN with various wavelets

We examined the classifiers to find out how their performance depends on the mother wavelets which were used to

produce training and test sets in Algorithm 1. We used 7 wavelets, Haar, Daubechies4, Daubechies8,

Daubechies10, Coiflet6, Coiflet12 and Coiflet18 [22,36]. The number forming a part of the name of each wavelet

designates the length of the filter which characterizes the corresponding wavelet. Figure 6 shows selected shapes of wavelet functions except for the Haar wavelet, which is given as

\[
\mathrm{Haar}(t) =
\begin{cases}
1 & (0 \le t < 1/2) \\
-1 & (1/2 \le t \le 1).
\end{cases}
\]
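For reference, the Haar wavelet can be evaluated directly. This small sketch assumes the usual convention that the value 1 is taken on the first half of the unit interval and -1 on the second half, that the point t = 1/2 belongs to the second half, and that the function is zero outside [0, 1].

```python
def haar(t):
    """Haar mother wavelet on [0, 1]; zero elsewhere (assumed convention)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t <= 1.0:
        return -1.0
    return 0.0
```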

Table 3 shows the classification results obtained for KDE. The kernel bandwidth is expressed as $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for each class cl and 1 ≤ i ≤ 3. We used the Gaussian kernel.

Table 3 Classification results for KDE with respect to various wavelets

wavelet       Se.    Sp.    TP     TN     FP    FN    detect
Haar          0.915  0.893  25906  20245  2420  2418  339
Daubechies4   0.936  0.906  27488  21130  2186  1886  343
Daubechies8   0.939  0.912  27600  21441  2075  1794  349
Daubechies10  0.942  0.916  27862  21585  1969  1710  348
Coiflet6      0.934  0.900  28586  21837  2430  2027  349
Coiflet12     0.932  0.914  27846  21757  2041  2045  349
Coiflet18     0.937  0.919  27721  21612  1911  1859  349


Figure 6 Shapes of various wavelets. (a) Daubechies4, (b) Daubechies8, (c) Daubechies10, (d) Coiflet6, (e)

Coiflet12 and (f) Coiflet18

Table 4 shows the classification results for the KDE with respect to various bandwidths and wavelets. The first column for each wavelet item represents the sum of sensitivity and specificity. The second column shows how many ST episodes were correctly detected. We used the kernel bandwidths $b_i^{(cl)} = \mathrm{mean}_i^{(cl)} \cdot \mathrm{factor}$ for each class cl and 1 ≤ i ≤ 3. The sum of sensitivity and specificity becomes maximum when the bandwidth $b_i^{(cl)}$ is around $b_i^{(cl)} \approx \mathrm{mean}_i^{(cl)}/2$.

Table 4 Classification results for KDE versus selected values of bandwidths and types of wavelets

        Haar           Daub4          Daub8          Daub10         Coif6          Coif12         Coif18
factor  Se.+Sp. detect Se.+Sp. detect Se.+Sp. detect Se.+Sp. detect Se.+Sp. detect Se.+Sp. detect Se.+Sp. detect
0.2     1.770   348    1.797   348    1.811   353    1.817   355    1.794   356    1.812   360    1.825   354
0.3     1.797   350    1.825   350    1.837   352    1.843   356    1.822   354    1.833   357    1.848   353
0.4     1.808   345    1.838   344    1.847   351    1.856   353    1.833   354    1.844   355    1.857   354
0.5     1.808   339    1.842   343    1.851   349    1.859   348    1.834   349    1.846   349    1.856   349
0.6     1.806   336    1.840   339    1.849   343    1.857   343    1.832   345    1.843   346    1.854   346
0.7     1.800   331    1.836   336    1.846   338    1.853   338    1.829   340    1.837   341    1.849   344
0.8     1.792   325    1.829   334    1.839   337    1.848   335    1.824   334    1.831   339    1.842   340

Table 5 shows the classification results obtained for SVM. The parameter C controls the trade-off between the slack variable ($\xi_i$) penalty and the margin ($w^T \cdot w$). We examined the classification accuracy for values of C ranging from 0.1 to 300.0 in steps of 0.1, and selected the one that maximized the sum of sensitivity and specificity.
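The search over C can be sketched as a plain grid search. The sketch below is generic and does not call libsvm: `evaluate` is a hypothetical callback, supplied by the user, that trains and tests an SVM for a given C and returns the pair (sensitivity, specificity).

```python
def select_c(evaluate, start=0.1, stop=300.0, step=0.1):
    """Return the C on the grid that maximises sensitivity + specificity.

    evaluate(c) is a user-supplied function (hypothetical here) returning
    (sensitivity, specificity) for the trade-off parameter c.
    """
    best_c, best_score = None, float("-inf")
    n_steps = int(round((stop - start) / step))
    for k in range(n_steps + 1):
        # Build each grid value from the index to avoid accumulating
        # floating-point error over 3000 additions.
        c = round(start + k * step, 10)
        se, sp = evaluate(c)
        if se + sp > best_score:
            best_c, best_score = c, se + sp
    return best_c
```

Ties are resolved in favour of the smallest C, since a later value replaces the current best only when its score is strictly larger; this tie-breaking rule is a choice of the sketch.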

Table 5 Classification results for SVM for various wavelets

wavelet       C      Se.    Sp.    TP     TN     FP    FN    detect
Haar          291.3  0.924  0.907  26163  20547  2118  2161  345
Daubechies4   242.9  0.937  0.923  27527  21519  1797  1847  349
Daubechies8   245.5  0.941  0.923  27658  21712  1804  1736  355
Daubechies10  174.3  0.943  0.927  27894  21838  1716  1678  349
Coiflet6      288.2  0.933  0.918  28571  22284  1983  2042  348
Coiflet12     52.8   0.929  0.918  27757  21858  1940  2134  348
Coiflet18     23.4   0.936  0.927  27692  21805  1718  1888  352

Table 6 shows the classification results obtained by ANN. The number in parenthesis represents the number of nodes in the corresponding hidden layer. For each topology, the first, second and third columns express sensitivity, specificity and the “detect” respectively. We repeated the experiment 10 times and averaged the results, because the random initialization of weights gave different results each time.

Table 6 Results for ANN classifiers with respect to various wavelets and sizes of hidden layers

         3 → (5) → (5) → 2     3 → (6) → 2           3 → (7) → 2           3 → (8) → 2
wavelet  Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect
Haar     0.851  0.920  311.8   0.866  0.916  319.2   0.867  0.917  319.1   0.867  0.916  320.7
Daub4    0.866  0.932  304.2   0.881  0.930  313.7   0.880  0.932  315.2   0.881  0.931  317.8
Daub8    0.864  0.931  307.2   0.878  0.929  317.9   0.875  0.931  319.4   0.880  0.932  319.4
Daub10   0.866  0.939  312.5   0.882  0.935  321.8   0.885  0.935  325.5   0.882  0.937  325.1
Coif6    0.848  0.930  311.2   0.868  0.920  319.3   0.863  0.923  319     0.872  0.927  321
Coif12   0.855  0.936  310.5   0.868  0.933  318.2   0.866  0.935  319.6   0.874  0.935  319.8
Coif18   0.874  0.938  311.8   0.883  0.937  319.2   0.884  0.936  320.2   0.886  0.937  323.5

Tables 3, 5 and 6 show that the Daubechies8 and Daubechies10 wavelets give superior results. The shapes of these two wavelets are similar to typical ECG waveforms [37,38]. From now on, we use the Daubechies8 wavelet exclusively.


Effects of baseline wandering in ECG

Table 7 shows the classification results by KDE, SVM and ANN when we did not remove baseline wandering in ECG. Comparing this table with Tables 3, 5 and 6 shows that it is essential to remove baseline wandering in Algorithm 1. In Table 7, we selected the kernel bandwidths $b_i^{(cl)}$ in the KDE classifier as $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ (1 ≤ i ≤ 3). We used the ANN classifier with sizes of layers expressed as 3 → (7) → 2. The results of ANN were obtained by averaging the results of 10 repetitions of the experiment. The parameter C of the SVM classifier was 297.9.

Table 7 Classification results for KDE, SVM and ANN without removal of baseline wandering

      Se.    Sp.    TP       TN       FP      FN      detect
KDE   0.852  0.837  22132    17211    3342    3839    328
SVM   0.870  0.835  22605    17161    3392    3366    326
ANN   0.785  0.827  20388.8  17005.7  3547.3  5582.2  296.1

If we use an unsuitable wavelet scale like the one in Figure 3 to remove baseline wandering, it becomes difficult to obtain good results. As the sampling frequency was 250 Hz, we selected the wavelet scale $\lceil \log_2 250 \rceil = 8$ in Algorithm 1. Table 8 shows the classification results when wrong wavelet scales were selected. The kernel bandwidth setting in KDE and the layer composition of the ANN classifier were the same as in Table 7. The middle row, wavelet scale 8, in Table 8 was our choice in Algorithm 1. Each entry in the row of wavelet scale 8 has counterparts in the “Daubechies8” rows in Tables 3, 5 and 6.
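The choice of wavelet scale from the sampling frequency is a one-line computation. The helper name below is illustrative only.

```python
import math


def baseline_scale(sampling_hz):
    """Wavelet scale ceil(log2(sampling frequency))."""
    return math.ceil(math.log2(sampling_hz))
```

For the 250 Hz recordings used here, `baseline_scale(250)` gives 8, since log2(250) is about 7.97.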

Table 8 Classification results for KDE, SVM and ANN with incorrectly selected wavelet scales to remove baseline wandering

          KDE                   SVM                              ANN
          Se.    Sp.    detect  trade-off  Se.    Sp.    detect  Se.    Sp.    detect
scale 6   0.851  0.785  319     288.2      0.842  0.818  318     0.748  0.864  277.5
scale 7   0.931  0.905  344     232.8      0.932  0.915  349     0.876  0.933  320.7
scale 8   0.939  0.912  349     245.5      0.941  0.923  355     0.875  0.931  319.4
scale 9   0.929  0.906  347     110.4      0.930  0.918  350     0.859  0.918  320.7
scale 10  0.921  0.896  340     97.5       0.916  0.907  343     0.838  0.896  313.2

Effects of simulated noise

We examined performance of the classifiers when we added simulated noise into the original ECG signal. We

modeled the noise as the sum of wandering baseline and AC power line 60 Hz noise.

Let us assume we have original signal data, ecg(i) (1 ≤ i ≤ n). First, we compute the mean value and standard deviation of the ECG signal as

\[
m = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ecg}(i)
\quad \text{and} \quad
s = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \mathrm{ecg}(i) \right)^2 - m^2 }.
\]

Then we form a new signal ecg′(i) by

\[
\mathrm{ecg}'(i) = \mathrm{ecg}(i) + s \cdot a \cdot \left( \sin\left( b \cdot \frac{i}{\mathrm{SampFreq}} \right) + \frac{1}{2} \cos\left( 2\pi 60 \cdot \frac{i}{\mathrm{SampFreq}} \right) \right)
\]

where a is an amplification factor and b is an angular frequency of the added baseline. Here SampFreq means the sampling frequency, which was 250 Hz in our case. We varied a from 0.1 to 1.0 in steps of 0.1, and selected b to be equal to 2, 4 or 6.
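The noise model above translates directly into code. The following is a minimal sketch, with the sample index i running from 1 to n as in the text; the function name and the list-based signal representation are choices made for illustration.

```python
import math


def add_noise(ecg, a, b, samp_freq=250.0):
    """Add a wandering baseline and 60 Hz power-line noise to an ECG signal.

    ecg: list of samples; a: amplification factor;
    b: angular frequency of the added baseline.
    """
    n = len(ecg)
    m = sum(ecg) / n
    # Guard against a tiny negative value caused by rounding.
    s = math.sqrt(max(sum(v * v for v in ecg) / n - m * m, 0.0))
    noisy = []
    for i, v in enumerate(ecg, start=1):
        t = i / samp_freq
        baseline = math.sin(b * t)
        power_line = 0.5 * math.cos(2.0 * math.pi * 60.0 * t)
        noisy.append(v + s * a * (baseline + power_line))
    return noisy
```

Because the noise is scaled by the standard deviation s of the record itself, the same value of a produces noise of comparable relative strength across records with different amplitudes.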

Figure 7 shows the original ECG and its noise-impacted version. Tables 9, 10 and 11 show the experimental results for the noisy ECG signal. In Tables 9 and 11, the first, second and third columns in each b item represent the sensitivity, specificity and the “detect” respectively. The kernel bandwidth is set as $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for the KDE classifier. The layer composition of the ANN classifier was 3 → (7) → 2. In Table 10, the first column in each b item contains the trade-off parameter C which produced the best results, followed by the sensitivity, specificity and the “detect”.


Table 9 Classification results for KDE versus varying intensity of noise

     b=2                   b=4                   b=6
a    Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect
0.1  0.937  0.903  346     0.934  0.902  346     0.933  0.904  346
0.2  0.933  0.893  346     0.927  0.892  341     0.916  0.879  337
0.3  0.929  0.884  342     0.915  0.873  340     0.908  0.856  338
0.4  0.925  0.864  346     0.903  0.852  344     0.885  0.834  327
0.5  0.916  0.858  346     0.882  0.848  333     0.871  0.817  333
0.6  0.906  0.866  343     0.872  0.832  329     0.858  0.803  321
0.7  0.891  0.856  340     0.859  0.815  325     0.848  0.794  322
0.8  0.887  0.850  339     0.852  0.799  325     0.838  0.793  317
0.9  0.881  0.849  341     0.837  0.798  312     0.837  0.778  301
1.0  0.870  0.844  335     0.832  0.787  317     0.824  0.779  319

Table 10 Classification results for SVM with varying intensity of the simulated noise

     b=2                          b=4                          b=6
a    C      Se.    Sp.    detect  C      Se.    Sp.    detect  C      Se.    Sp.    detect
0.1  79.0   0.936  0.920  350     254.6  0.940  0.915  352     147.4  0.936  0.916  349
0.2  143.0  0.929  0.914  347     168.5  0.929  0.904  349     33.9   0.922  0.888  339
0.3  84.1   0.923  0.910  348     50.9   0.915  0.891  341     72.3   0.902  0.875  337
0.4  124.5  0.919  0.899  348     51.3   0.903  0.869  341     52.0   0.888  0.848  330
0.5  65.1   0.909  0.895  347     86.7   0.882  0.859  333     79.4   0.872  0.828  328
0.6  96.7   0.903  0.887  343     67.3   0.879  0.844  330     71.6   0.865  0.809  323
0.7  119.3  0.888  0.880  339     27.2   0.868  0.832  329     36.6   0.856  0.796  313
0.8  201.8  0.893  0.863  334     24.9   0.851  0.829  320     27.1   0.848  0.785  310
0.9  278.2  0.888  0.857  335     16.7   0.841  0.816  308     71.0   0.850  0.777  315
1.0  211.7  0.879  0.859  324     58.2   0.835  0.806  311     20.3   0.837  0.776  317

Table 11 Classification results for ANN with varying intensity of the simulated noise

     b=2                   b=4                   b=6
a    Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect
0.1  0.880  0.929  322.2   0.870  0.921  320.8   0.874  0.922  320.3
0.2  0.867  0.928  322.8   0.844  0.922  315.1   0.840  0.905  314.2
0.3  0.863  0.929  319.3   0.837  0.906  313.4   0.811  0.895  303.4
0.4  0.856  0.917  320.5   0.823  0.887  312.3   0.784  0.874  300.6
0.5  0.830  0.918  320.1   0.811  0.874  309.1   0.778  0.850  300.3
0.6  0.818  0.916  314.9   0.803  0.858  299.2   0.762  0.833  292
0.7  0.803  0.910  309.6   0.759  0.865  281.9   0.761  0.811  283.2
0.8  0.814  0.898  308.6   0.753  0.839  288.2   0.745  0.801  280.7
0.9  0.798  0.892  303.4   0.723  0.838  273.9   0.728  0.795  261.7
1.0  0.777  0.890  297.1   0.724  0.807  265.3   0.728  0.788  269.3

Figure 7 ECG signal affected by synthetic noise. (a) Original signal. (b) Noise-affected signal when a is 1.0 and

b is 6.0

Comparison with others’ works

To compare our approach with others’ works, we tested the classifiers on 10 selected records, e0103, e0104, e0105,

e0108, e0113, e0114, e0147, e0159, e0162 and e0206. Table 12 shows results of comparison. The papers by

Papaloukas et al. [10], Goletsis et al. [39], Exarchos et al. [13] and Murugan et al. [15] in Table 12 dealt with the 10

records.


Table 12 Results of comparative analysis

Researcher              Sensitivity  Specificity
Papaloukas et al. [10]  0.90         0.90
Goletsis et al. [39]    0.912        0.909
Exarchos et al. [13]    0.912        0.922
Murugan et al. [15]     0.923        0.943
Present work by KDE     0.945        0.943
Present work by SVM     0.957        0.953

We used the Daubechies8 wavelet in Algorithm 1 to analyze the ECG waveform, and took the kernel bandwidths $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for the KDE classifier with the Gaussian kernel. We used the SVM classifier with C = 281.1.

Discussion

Table 1 showed how the classification results were dependent on various kernel functions in kernel density

estimation. Gaussian kernel produced best results.

Tables 3, 4, 5 and 6 show how the classification results depend on mother wavelets used in Algorithm 1.

Daubechies8 and Daubechies10 wavelets were best. Because we implemented wavelet transform program with the

use of matrix multiplication, we selected Daubechies8 wavelet to reduce computational burden. Daubechies10

wavelet did not produce much better classification accuracy than Daubechies8 wavelet.

Tables 2 and 4 indicate that the choice of kernel bandwidths was reasonable. When we took the kernel bandwidths $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for class cl, 1 ≤ i ≤ 3, we obtained the best results except for the case of the Coiflet18 wavelet. Even in that case, the best parameter $b_i^{(cl)} = 0.4 \cdot \mathrm{mean}_i^{(cl)}$ was close enough to $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$. We maintained this choice in Tables 7, 8 and 9. In this way, we could automatically select 6 kernel bandwidths (3 features for each of the 2 classes), and this exempted us from choosing any numerical parameters.
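The automatic bandwidth selection can be sketched as follows. The sketch assumes that $\mathrm{mean}_i^{(cl)}$ is the mean of the absolute values of feature i over the training samples of class cl; this reading, as well as the function name, is an assumption of the example and should be checked against the definition given with the classifier.

```python
def class_bandwidths(samples, factor=0.5):
    """Bandwidths b_i = factor * mean_i for one class.

    samples: list of feature tuples belonging to one class.
    mean_i is taken as the mean of |feature_i| (an assumption).
    """
    n = len(samples)
    dims = len(samples[0])
    return [factor * sum(abs(x[j]) for x in samples) / n
            for j in range(dims)]
```

Calling the function once for the ischemic samples and once for the normal samples yields the 6 bandwidths, with `factor=0.5` corresponding to the choice $\mathrm{mean}_i^{(cl)}/2$.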

The SVM classifiers in Tables 5, 7 and 8 produced better results than the KDE classifiers, but they required us to

determine optimal value of the parameter C. Whenever we used different wavelets on the same data set in Table 5,

we had to choose different trade-off parameter C. This was also the case in Table 8 where we intentionally selected

incorrect wavelet scales to remove baseline wandering.

The order of magnitude of feature3 was very different from that of feature1 and feature2. When we produced the feature values using the Daubechies8 wavelet in Algorithm 1, the mean values of |feature1|, |feature2| and |feature3| were 7.327, 7.613 and 0.004, respectively. Thus we had to normalize the feature values to use them in classification. Even if the orders of magnitude of feature1, feature2 and feature3 were very different, the equation of kernel density estimation included the term

\[
\frac{1}{b_1^{(cl)} b_2^{(cl)} b_3^{(cl)}} \sum_{i} \exp\left( -\frac{1}{2} \sum_{j=1}^{3} \left( \frac{[y]_j - [x_i^{(cl)}]_j}{b_j^{(cl)}} \right)^2 \right).
\]

Furthermore, each operand in the inner sum comes in the form of

\[
\frac{[y]_j - [x_i^{(cl)}]_j}{b_j^{(cl)}},
\]

that is, each feature difference is normalized by its kernel bandwidth. We thought these would be helpful to overcome the difference of order of magnitude between feature1, feature2 and feature3. This was a main driving force to adopt the kernel density estimation.
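The effect of normalizing by the bandwidth can be checked numerically. In the sketch below, the feature values and bandwidths are made-up numbers chosen only to resemble the magnitudes quoted above; rescaling the third feature together with its bandwidth leaves the exponent of the Gaussian kernel unchanged.

```python
def u_squared(y, x, b):
    """Sum over features of ((y_j - x_j) / b_j)^2."""
    return sum(((yj - xj) / bj) ** 2 for yj, xj, bj in zip(y, x, b))


# Hypothetical test point, training point and bandwidths.
y = (7.0, 8.0, 0.004)
x = (6.0, 7.5, 0.002)
b = (3.5, 3.8, 0.002)

# Same data with the third feature (and its bandwidth) multiplied by 1000.
y_scaled = (7.0, 8.0, 4.0)
x_scaled = (6.0, 7.5, 2.0)
b_scaled = (3.5, 3.8, 2.0)
```

Here `u_squared(y, x, b)` and `u_squared(y_scaled, x_scaled, b_scaled)` agree up to rounding, so a feature with small magnitude contributes on the same footing as the others as long as its bandwidth is proportional to its typical size.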

We implemented the KDE and ANN classifiers ourselves in the C language. For the SVM classifier, we used the libsvm library [33]. We compiled the programs with gcc and g++ without using any SIMD (single instruction multiple data) math library. The total size of the ECG text files which we used in our analysis was 200.4 MB. This amount covers only the voltage information and does not include time information. When we ran our programs to process the ECG text files on a Pentium 4 3.2 GHz CPU, it took 243.0 seconds until the procedures of removing baseline wandering and detecting time positions in Algorithm 1 were completed. This was when we used the Daubechies8 wavelet. Feature extraction in Algorithm 2 required 0.6 seconds. It took 1.2 seconds for the KDE classifier to process all the files.
