Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection.
ABSTRACT Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, accurately, and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method that employs wavelets and simple feature selection.
For training and testing, we use the European ST-T database, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering and detect the time positions of QRS complexes with a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features that can differentiate ST episodes from normal ECG: 1) the area between the QRS offset and T-peak points, 2) the normalized and signed sum from the QRS offset to the effective zero voltage point, and 3) the slope from the QRS onset to the offset point. We average the feature values over five successive beats to reduce the effect of outliers. Finally, we apply classifiers to those features.
We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) classifiers. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detected 349 of the 367 ischemic ST episodes. Sensitivity and specificity for SVM were 0.941 and 0.923, respectively. The SVM classifier detected 355 ischemic ST episodes.
We proposed a new method for detecting ischemia in ECG. It combines signal processing techniques (removal of baseline wandering and detection of the time positions of QRS complexes by the discrete wavelet transform) with explicit feature extraction from the morphology of ECG waveforms. The number of selected features was shown to be sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can select kernel bandwidths automatically, meaning that the algorithm does not require any numerical parameter values to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter.
BioMedical Engineering OnLine 2012, 11:30 doi:10.1186/1475-925X-11-30
Jinho Park (jinho@gist.ac.kr)
Witold Pedrycz (wpedrycz@ualberta.ca)
Moongu Jeon (mgjeon@gist.ac.kr)
ISSN: 1475-925X
Article type: Research
Submission date: 21 January 2012
Acceptance date: 23 May 2012
Publication date: 15 June 2012
Article URL: http://www.biomedical-engineering-online.com/content/11/1/30
© 2012 Park et al. ; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Jinho Park (1), Witold Pedrycz (2) and Moongu Jeon (1,*)
(1) School of Information and Communications, Gwangju Institute of Science and Technology, 1 Oryong-dong, Buk-gu, Gwangju, Republic of Korea
(2) Department of Electrical and Computer Engineering, University of Alberta, Canada and Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Email: Jinho Park - jinho@gist.ac.kr; Witold Pedrycz - wpedrycz@ualberta.ca; Moongu Jeon (*) - mgjeon@gist.ac.kr
(*) Corresponding author
Keywords
Myocardial ischemia, Discrete wavelet transform, Kernel density estimation, Support vector machine, QRS
complex detection, ECG baseline wandering removal
Background
Coronary artery disease is one of the leading causes of death in the modern world. This disease mainly results from
atherosclerosis and thrombosis, and it manifests itself as coronary ischemic syndrome [1].
When a patient experiences coronary ischemic syndrome, his or her electrocardiogram (ECG) shows some characteristic
changes. Each segment of the ECG can be divided into P, Q, R, S and T waves as shown in Figure 1, where the QRS
complex and the T wave represent ventricular depolarization and repolarization, respectively. In most cases of normal
ECG, the ST segment has the same electric potential as the PR segment. When myocardial ischemia is present,
however, the electric potential of the ST segment is elevated or depressed with respect to the potential of the PR
segment [1,2]. When ischemia occurs, either the PR segment is altered or the ST segment deviates from its normal level. If
the PR segment moves instead of the ST segment, it looks as if the ST segment itself were modified, because
the PR segment provides a kind of reference voltage level [1].
Figure 1 Normal ECG and ST segment elevation. (a) Normal ECG is divided into P, Q, R, S and T parts. The Q, R
and S parts together are called the QRS complex. (b) This ECG waveform shows ST segment elevation
The ST segment deviation is mainly due to injury current in myocardial cells [1]. If a coronary artery becomes
blocked by a blood clot, some myocytes become unresponsive to depolarization, or repolarize earlier than
adjacent myocytes. In this case, a voltage gradient can occur across the myocytes, and this appears as
ST-segment deviation in the ECG [1]. Figure 2 shows two cases in which the voltage level of the ST segment deviates
from its normal position. The left column of the figure shows the distribution of electric charges around myocytes
when the heart is in the resting state, which corresponds to the PR segment in the ECG. The right column shows the
distribution of electric charges right after the ventricles have contracted, which corresponds to the QRS complex and
the ST segment in the ECG. The shaded region represents the area affected by myocardial ischemia. In the case of the
upper row in Figure 2, there is no voltage gradient at first. After the ventricles have contracted, however, a voltage
gradient appears because the injured area did not respond to electric depolarization. In the second case, in the bottom row,
there is no voltage gradient right after the ventricles have contracted. In the left figure, however, there was an initial
voltage gradient, and this modifies the PR segment. The PR segment acts as a reference voltage level when we
judge whether the ST segment has deviated from its normal position, so the modified PR segment leads us to conclude that
there was an ST segment deviation [1].
Figure 2 Cause of ST segment deviation [1]. The left column shows the distribution of electric charges before the
ventricles contract. The right column shows the charge distribution after the ventricles have contracted. The shaded area
is the region affected by ischemia
There are several approaches to detecting ischemic ST deviations. Some researchers used entropy. Rabbani et al.
used the fact that the signal perturbation of normal subjects is lower than that of ischemic patients. They
computed an entropy measure of wavelet subbands of the ECG signal, and classified the ECG by examining which signal
exhibited a more chaotic perturbation [3]. Lemire et al. calculated signal entropy at various frequency levels,
computing the entropy in each wavelet scale [4]. Others used adaptive neuro-fuzzy inference systems. Pang et al. used the
Karhunen-Loève transform to extract several feature values, and classified the ECG signal with an adaptive neuro-fuzzy
inference system [5]. Tonekabonipour et al. used a multi-layer perceptron and radial basis functions to detect ischemic
episodes, and classified ECG signals with an adaptive neuro-fuzzy network [6]. Many papers used
artificial neural networks. Stamkopoulos et al. used nonlinear principal component analysis to analyze complex
data, and classified the ECG signal with a radial basis function neural network [7]. Maglaveras et al. used a neural network
optimized with a backpropagation algorithm [8]. Afsar et al. used the Karhunen-Loève transform to find feature values,
and classified an input ECG with a neural network [9]. Papaloukas et al. used an artificial neural network that
was trained by a Bayesian regularization method [10]. Other papers studied different approaches. Bulusu et al.
determined morphological features of the ECG, and classified the ECG data with a support vector machine. Andreao et al.
used hidden Markov models to analyze ECG segments, and detected ischemia episodes by using a median filter and
linear interpolation [11]. Faganeli and Jager tried to distinguish ischemic ST episodes from non-ischemic ST episodes
caused by heart rate changes. To this end, they computed heart rate values, the Mahalanobis distance of
Karhunen-Loève transform coefficients, and Legendre orthonormal polynomial coefficients [12]. Exarchos et al.
used a decision tree. They formed decision rules comprising specific thresholds, and developed a fuzzy model to
classify ischemic ECG signals [13]. Garcia et al. considered the root mean square of the difference between the input
signal and an average signal composed of the first 100 beats, and adopted an adaptive amplitude threshold to classify
the ECG signal [14]. Murugan and Radhakrishnan used the ant-miner algorithm to detect ischemic ECG beats. They
calculated several feature values, such as the ST segment deviation, from the input ECG signal [15]. Bakhshipour et al.
analyzed the coefficients resulting from the wavelet transform, examining the relative quotient of the coefficients at
each decomposition level of the wavelet transform [16].
We approached this problem by extracting feature values from an ECG waveform. We first found the time positions of
QRS complexes, and then determined the values of the three features. We calculated the feature values for each heart
beat, and averaged them over five successive beats. After that, we classified them by kernel
density estimation and a support vector machine.
We show techniques for removing baseline wandering and detecting the time positions of QRS complexes by the discrete
wavelet transform. With these explicit methods of dealing with ECG, we could discriminate ischemic ST episodes
from normal ECG. We did not adopt implicit methods such as artificial neural networks or decision trees, because
we considered it important to use explicit features in the decision-making process. An artificial neural
network has a black-box nature in its hidden layers [17], and a decision tree is apt to include several
numerical thresholds [13].
Methods
Materials
We used the European ST-T database from Physionet. The European ST-T database has 90 two-channel records, each
two hours in duration [18,19]. Each record in this database has a different number of ST
episodes. Overall there are 367 ischemic ST episodes in the database. The sampling frequency of the ECG data is 250
Hz.
We excluded 5 records because of problems with them. The records e0133, e0155, e0509, and e0611 had no
ischemic ST episodes. The record e0163 had only a very short ST episode, just 31 seconds long.
Removing baseline wandering
The ST segments in ECG can be strongly affected by baseline wandering [20]. The main causes of baseline
wandering are respiration and electrode impedance changes due to perspiration [20,21]. The frequency content of
the baseline wandering is usually in a range below 0.5 Hz [20,21].
We use the discrete wavelet transform to remove baseline wandering in ECG. We transform the signal vector into two
sequences of coefficients, the approximation and detail coefficient sequences [22]. We repeat this step
iteratively until we get an input signal whose length is smaller than the length of the filter that characterizes
the wavelet. In our case, we used the Daubechies8 wavelet with a filter length of 8. The resulting approximation
coefficient sequence becomes the input signal to the next discrete wavelet transform, as shown in Figure 3(a) [22].
Figure 3 Removing baseline wandering in ECG. (a) Discrete wavelet transform of $\mathrm{ecg}(n)$ to find the coefficient
sequences $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$. $\vec{0}(n)$ denotes the zero sequence. (b) Top: input ECG, $\mathrm{ecg}(n)$;
bottom: wandering baseline in ECG, $\mathrm{baseline}(n)$. (c) Top: $\mathrm{ecg}(n)$; bottom: $\mathrm{ecg}(n) - \mathrm{baseline}(n)$. When $k$ is (d)
too small or (e) too large, top: $\mathrm{ecg}(n)$; middle: $\mathrm{baseline}(n)$; bottom: $\mathrm{ecg}(n) - \mathrm{baseline}(n)$
In each step, the coefficient sequence implies a band of frequencies. If the sampling frequency of a discrete ECG
signal $\mathrm{ecg}(n)$ is $x$, we can determine a continuous and band-limited signal within frequency limits of
$\left[0, \frac{x}{2}\right]$ by the Nyquist sampling theorem [23]. Therefore, if we have transformed the input signal
$\mathrm{ecg}(n)$ into the approximation coefficient sequence $h_1(n)$ and the detail coefficient sequence $g_1(n)$, then
the frequency content of $g_1(n)$ is from $\frac{x}{4}$ to $\frac{x}{2}$, and the frequency content of $h_1(n)$ is below
$\frac{x}{4}$. In this regard, if we have transformed $\mathrm{ecg}(n)$ into the approximation coefficient sequence
$h_k(n)$ and the detail coefficient sequences $g_k(n), g_{k-1}(n), \cdots, g_1(n)$, the frequency contents of
$g_k(n), g_{k-1}(n), \cdots, g_1(n)$ become
\[
\left[\frac{x}{2^{k+1}}, \frac{x}{2^{k}}\right],\ \left[\frac{x}{2^{k}}, \frac{x}{2^{k-1}}\right],\ \cdots,\ \left[\frac{x}{2^{2}}, \frac{x}{2}\right]
\]
respectively [24,25].
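The band limits above can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `dwt_frequency_bands` and the dictionary keys are hypothetical, and the bands are the nominal (ideal) ones, ignoring the overlap that real wavelet filters introduce.

```python
def dwt_frequency_bands(fs, k):
    """Nominal frequency bands (Hz) after a k-level DWT of a signal sampled at fs Hz.

    Keys "g1".."gk" are the detail sequences; "hk" is the final approximation.
    (Hypothetical helper, not part of the paper.)
    """
    bands = {}
    for level in range(1, k + 1):
        # Detail sequence g_level nominally covers [fs / 2^(level+1), fs / 2^level].
        bands[f"g{level}"] = (fs / 2 ** (level + 1), fs / 2 ** level)
    # The remaining approximation sequence h_k nominally covers [0, fs / 2^(k+1)].
    bands[f"h{k}"] = (0.0, fs / 2 ** (k + 1))
    return bands


bands = dwt_frequency_bands(250, 8)
print(bands["g1"])  # (62.5, 125.0)
print(bands["h8"])  # (0.0, 0.48828125)
```

For a 250 Hz signal and eight levels, the approximation band ends just below 0.5 Hz, which is the range usually attributed to baseline wandering.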
To remove baseline wandering, we should choose an appropriate wavelet scale. We follow an argument similar to that
presented by Arvinti et al., except that they used the stationary wavelet transform instead of its discrete
counterpart [26]. We remove signal components whose frequency content is less than 1/2 Hz [20,21]. If we have
transformed the ECG signal $\mathrm{ecg}(n)$ into coefficient sequences $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$, the
frequency contents of $h_k(n)$ and $g_k(n)$ become
\[
\left[0, \frac{x}{2^{k+1}}\right] \quad \text{and} \quad \left[\frac{x}{2^{k+1}}, \frac{x}{2^{k}}\right]
\]
respectively, where $x$ is the sampling frequency. If we choose $k$ such that
\[
\frac{x}{2^{k+1}} \le \frac{1}{2}, \qquad k = \lceil \log_2 x \rceil,
\]
the frequency content of the approximation coefficient sequence $h_k(n)$ becomes less than 1/2 Hz. Thus, we assign the
zero sequence $\vec{0}(n)$ to all the detail coefficient sequences $g_k(n), g_{k-1}(n), \cdots, g_1(n)$, and calculate the
inverse transform of $h_k(n), \vec{0}(n), \vec{0}(n), \cdots, \vec{0}(n)$ to form the $\mathrm{baseline}(n)$ in the bottom
of Figure 3(b). If we subtract $\mathrm{baseline}(n)$ from $\mathrm{ecg}(n)$, we obtain a flattened signal
like the one shown in Figure 3(c).
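The procedure of keeping only the coarsest approximation, zeroing every detail sequence, inverting, and subtracting can be sketched in plain Python. To stay short and dependency-free, the sketch below substitutes the Haar wavelet for the Daubechies wavelet used in the paper, and it requires the signal length to be a multiple of $2^k$; all function names are hypothetical.

```python
import math


def choose_scale(fs):
    """Smallest k with fs / 2^(k+1) <= 1/2 Hz, i.e. k = ceil(log2(fs))."""
    return math.ceil(math.log2(fs))


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def estimate_baseline(ecg, k):
    """Inverse transform of (h_k, 0, ..., 0): the slowly varying baseline."""
    if len(ecg) % (2 ** k) != 0:
        raise ValueError("this sketch needs len(ecg) to be a multiple of 2**k")
    approx = list(ecg)
    for _ in range(k):
        approx, _detail = haar_dwt(approx)  # keep h, discard g
    for _ in range(k):
        approx = haar_idwt(approx, [0.0] * len(approx))  # details set to zero
    return approx


def remove_baseline(ecg, k):
    """ecg(n) - baseline(n)."""
    baseline = estimate_baseline(ecg, k)
    return [x - b for x, b in zip(ecg, baseline)]


print(choose_scale(250))  # 8
toy = [5.0 + (-1) ** i for i in range(8)]  # constant offset plus a fast oscillation
print([round(v, 6) for v in remove_baseline(toy, 2)])
# [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
```

With the Haar wavelet the baseline estimate is a piecewise-constant block average, which is cruder than what a longer Daubechies filter gives, but the sequence of steps is the same.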
If we select a wrong wavelet scale $k$ to find the coefficient sequences of $\mathrm{ecg}(n)$, we obtain disappointing results. The
flattened signal in Figure 3(c) is obtained when $k$ is $\lceil \log_2 250 \rceil = 8$, where 250 is the sampling frequency expressed
in Hz. When we select $k = 4$ and use $h_4(n), g_4(n), \cdots, g_1(n)$, we obtain the plot in Figure 3(d). The middle waveform,
$\mathrm{baseline}(n)$, results from the inverse discrete wavelet transform of $h_4(n), \vec{0}(n), \vec{0}(n), \cdots, \vec{0}(n)$. This middle
waveform is too detailed, so the bottom waveform $\mathrm{ecg}(n) - \mathrm{baseline}(n)$ is negatively affected. When we select
$k = 12$, see Figure 3(e), the bottom waveform is not different from the input waveform $\mathrm{ecg}(n)$.
We adopt a discrete wavelet transform to retain the details of the ECG waveform because filtering by some cut-off
frequency can deteriorate the quality of the ECG waveforms [27].
Detecting QRS complexes
We have to select an appropriate wavelet scale to capture the proper time positions of QRS complexes. We will deal
only with the flattened ECG waveform $\mathrm{ecg}(n) - \mathrm{baseline}(n)$ introduced in the previous section. We will denote it as
$\mathrm{fecg}(n)$.
First, we determine the sequences of wavelet coefficients of $\mathrm{fecg}(n)$, obtaining
$h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$, where $k = \lceil \log_2 x \rceil$ and $x$ is the sampling frequency. We assign zero to all the
coefficient sequences except one, $g_j(n)$. Then, we calculate the inverse transform of $\vec{0}(n)$ (approximation
coefficients), $\vec{0}(n)$ (detail coefficients, onward), $\cdots, \vec{0}(n), g_j(n), \vec{0}(n), \cdots, \vec{0}(n)$ to obtain $\mathrm{pulse}(n)$. To find a
protruding segment, that is, a QRS complex, we compute the score for each wavelet scale $j$,
\[
\mathrm{score}_j = \left| \sum_{l} \mathrm{fecg}(l) \, \frac{|\mathrm{pulse}(l)|}{\sum_{m} |\mathrm{pulse}(m)|} \right| .
\]
We select the wavelet scale $j$ which produces the largest drop $\mathrm{score}_j - \mathrm{score}_{j+1}$ ($j \ge 2$). The bottom
waveform in Figure 4(b) shows the time positions of QRS complexes when selecting this suitable wavelet scale.
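The score and the scale selection rule can be written down directly. The sketch below assumes that the single-scale reconstructions $\mathrm{pulse}(n)$ have already been computed elsewhere; the function names and the dictionary layout `{j: score_j}` are hypothetical.

```python
def scale_score(fecg, pulse):
    """score_j: magnitude of the |pulse|-weighted average of fecg."""
    total = sum(abs(p) for p in pulse)
    if total == 0:
        return 0.0  # an all-zero pulse carries no weight
    return abs(sum(f * abs(p) for f, p in zip(fecg, pulse)) / total)


def choose_qrs_scale(scores):
    """Return the scale j >= 2 with the largest drop score_j - score_(j+1).

    scores is a dict {j: score_j}; only scales whose successor j + 1 is
    also present are considered.
    """
    candidates = [j for j in scores if j >= 2 and (j + 1) in scores]
    return max(candidates, key=lambda j: scores[j] - scores[j + 1])


print(scale_score([1.0, 2.0, 3.0], [1.0, -1.0, 2.0]))        # 2.25
print(choose_qrs_scale({1: 0.1, 2: 0.9, 3: 0.2, 4: 0.15}))  # 2
```

A scale whose reconstruction concentrates on the QRS spikes weights the large R-wave samples heavily, so its score is high; the largest drop marks the last scale that still does so.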
Figure 4 Selection of wavelet scale to find the time positions of QRS complexes. (a) Discrete wavelet transform
and inverse transform. (b) Top: A flattened ECG waveform, fecg (n). Middle: waveform resulted from the inverse
transform, pulse(n). Bottom: fecg (n)|pulse(n)|
After finding the locations of QRS complexes, we choose the QRS onset and offset points in each QRS complex. We
search for the QRS onset point in the backward direction from the peak point of each QRS complex. We take as the QRS onset
the point at which $\mathrm{fecg}(n)$ has changed its direction of rising and falling twice. In the same way, we
take the QRS offset point in the forward direction from the peak point.
Algorithm 1 shows the process of removing baseline wandering and detecting QRS complexes.
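The onset and offset search for an upward-pointing QRS complex can be sketched as follows. The loop conditions follow the while-loops of Algorithm 1; the bounds checks and the function names are additions made for this illustration, and the downward-pointing case (inequalities reversed) is omitted.

```python
def find_qrs_onset(fecg, peak):
    """Walk backward from an upward QRS peak until the slope has changed twice."""
    j = 1
    # First leg: descend the rising flank of the R wave (going backward in time).
    while peak - j > 0 and fecg[peak - j] <= fecg[peak - j + 1]:
        j += 1
    # Second leg: climb back up until the direction changes again.
    while peak - j > 0 and fecg[peak - j] > fecg[peak - j + 1]:
        j += 1
    return max(peak - j, 0)


def find_qrs_offset(fecg, peak):
    """Walk forward from an upward QRS peak until the slope has changed twice."""
    last = len(fecg) - 1
    j = 1
    while peak + j < last and fecg[peak + j - 1] >= fecg[peak + j]:
        j += 1
    while peak + j < last and fecg[peak + j - 1] < fecg[peak + j]:
        j += 1
    return min(peak + j, last)


beat = [0.0, 0.0, 0.1, -0.2, 0.5, 1.0, 0.4, -0.3, 0.0, 0.0]
print(find_qrs_onset(beat, 5), find_qrs_offset(beat, 5))  # 1 9
```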
Feature formation for classification problems
We deal with the flattened waveform, $\mathrm{fecg}(n)$, to obtain the values of the features. We take the voltage level of the QRS
onset point as the reference from which we measure voltage deviation [2,28]. We denote the mean value of electric
potentials at QRS onset points as $\mathrm{fecg}(\text{QRS onset})$. We consider this value as an effective zero voltage, so we
measure voltage deviation from $\mathrm{fecg}(\text{QRS onset})$.
To form the first feature, we sum up all the voltage deviations from the QRS offset point to the T wave peak point as shown
in Figure 5(a) and (b).
\[
\mathrm{feature}_1 = \sum_{i=\text{QRS offset}}^{\text{T peak}} \left| \mathrm{fecg}(i) - \mathrm{fecg}(\text{QRS onset}) \right|
\]
The second feature is similar to the first feature except for the ending position of the sum. We terminate
the summation when we reach the first point, $F$, at which the voltage becomes equal to the reference voltage
$\mathrm{fecg}(\text{QRS onset})$, see Figure 5. When doing this, we add the signed values of the voltage deviation to find
whether the area is lower or higher with respect to the reference voltage. Then we divide the value by the voltage at
the QRS peak point. The second feature value is given as follows.
\[
\mathrm{feature}_2 = \frac{\sum_{i=\text{QRS offset}}^{F} \left( \mathrm{fecg}(i) - \mathrm{fecg}(\text{QRS onset}) \right)}{\left| \mathrm{fecg}(\text{QRS peak}) \right|}
\]
The third feature is the slope from the QRS onset point to the QRS offset point.
\[
\mathrm{feature}_3 = \left| \frac{\mathrm{fecg}(\text{QRS offset}) - \mathrm{fecg}(\text{QRS onset})}{\text{QRS offset} - \text{QRS onset}} \right|
\]
We calculate these three feature values for each heart beat. Then we average these values over five successive beats,
and arrange the three mean values as $(\mathrm{feature}_1, \mathrm{feature}_2, \mathrm{feature}_3)$.
Figure 5 Features used in the classification process. (a), (b) Area between QRS offset and T peak with respect to
the reference mean voltage fecg (QRS onset). (a) ST segment elevation. (b) ST segment depression. (c)
Normalized and signed sum of voltage deviations from the QRS offset to the first point F at which voltage becomes
equal to the reference voltage. (d) Slope from the QRS onset point to the QRS offset point. Markers ⃝, ? and △
designate QRS onset, peak and offset points respectively
Algorithm 2 shows the pseudo-code of computing feature values.
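As a rough illustration of the three features and the five-beat averaging, the sketch below works on a plain list of samples. It makes two assumptions that the text does not spell out: the point F is taken as the first sample at or after the QRS offset where the deviation from the reference is zero or changes sign, and the T-peak index is supplied by the caller. All names are hypothetical.

```python
def beat_features(fecg, onset, peak, offset, t_peak, ref):
    """Return (feature1, feature2, feature3) for one beat.

    ref is the effective zero voltage (mean fecg value at the QRS onsets).
    """
    # Feature 1: unsigned area between QRS offset and T peak.
    f1 = sum(abs(fecg[i] - ref) for i in range(offset, t_peak + 1))

    # Feature 2: signed sum from QRS offset up to F, normalized by the peak.
    # Assumption: F is where the deviation first hits zero or flips sign.
    first_dev = fecg[offset] - ref
    signed = 0.0
    for i in range(offset, len(fecg)):
        dev = fecg[i] - ref
        if dev == 0 or dev * first_dev < 0:
            break
        signed += dev
    f2 = signed / abs(fecg[peak])

    # Feature 3: magnitude of the slope from QRS onset to QRS offset.
    f3 = abs((fecg[offset] - fecg[onset]) / (offset - onset))
    return f1, f2, f3


def average_in_groups(features, group=5):
    """Average per-beat feature triples over successive groups of beats."""
    n_groups = len(features) // group  # leftover beats are dropped
    return [
        tuple(sum(f[c] for f in features[g * group:(g + 1) * group]) / group
              for c in range(3))
        for g in range(n_groups)
    ]


beat = [0.0, 0.0, 1.0, 0.2, 0.2, 0.3, 0.1, 0.0, 0.0]
print(beat_features(beat, onset=1, peak=2, offset=3, t_peak=5, ref=0.0))
```

For this toy beat the values are approximately 0.7, 0.8 and 0.1 (up to floating-point rounding): an elevated ST segment gives a positive second feature, and a depressed one would give a negative value.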
Algorithm 1 A procedure to find time positions of QRS onset, peak and offset points. This procedure includes the
method of removing baseline wandering in ECG. nBeats stands for the number of QRS peaks. It is the length of
the sequences idx_QRS_Onset(n), idx_QRS_Peak(n) and idx_QRS_Offset(n).
Input: Sampling_Hz, $\mathrm{ecg}(n)$
Output: idx_QRS_Onset(n), idx_QRS_Peak(n), idx_QRS_Offset(n)
$k \leftarrow \lceil \log_2 \text{Sampling\_Hz} \rceil$
Discrete wavelet transform (DWT) of $\mathrm{ecg}(n)$ into $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$
for i = 1 to k do
    $g_i(n) \leftarrow \vec{0}(n)$ {// $\vec{0}(n)$ means zero sequence.}
end for
Inverse wavelet transform (IDWT) of $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ into $\mathrm{baseline}(n)$
$\mathrm{fecg}(n) \leftarrow \mathrm{ecg}(n) - \mathrm{baseline}(n)$
DWT of $\mathrm{fecg}(n)$ into $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$
$h_k(n) \leftarrow \vec{0}(n)$
$g_k(n) \leftarrow \vec{0}(n)$
for i = 1 to k − 1 do
    $g'_i(n) \leftarrow g_i(n)$
    $g_i(n) \leftarrow \vec{0}(n)$
end for
for i = 1 to k − 1 do
    $g_i(n) \leftarrow g'_i(n)$
    IDWT of $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ into $\mathrm{pulse}(n)$
    $\mathrm{score}_i \leftarrow \left| \sum_l \mathrm{fecg}(l) \, \frac{|\mathrm{pulse}(l)|}{\sum_m |\mathrm{pulse}(m)|} \right|$
    $g_i(n) \leftarrow \vec{0}(n)$
end for
chosen_scale $\leftarrow \arg\max_{2 \le i \le k-2} \{ \mathrm{score}_i - \mathrm{score}_{i+1} \}$
$g_{\text{chosen\_scale}}(n) \leftarrow g'_{\text{chosen\_scale}}(n)$
IDWT of $h_k(n), g_k(n), g_{k-1}(n), \cdots, g_1(n)$ into $\mathrm{pulse}(n)$
$\mathrm{needle}(n) \leftarrow |\mathrm{fecg}(n)\,\mathrm{pulse}(n)|$
Make idx_QRS_Peak(n) by searching for local maxima of $\mathrm{needle}(n)$
for i = 1 to nBeats do
    if fecg(idx_QRS_Peak(i)) > 0 then
        j ← 1
        while fecg(idx_QRS_Peak(i) − j) ≤ fecg(idx_QRS_Peak(i) − j + 1) do
            j ← j + 1
        end while
        while fecg(idx_QRS_Peak(i) − j) > fecg(idx_QRS_Peak(i) − j + 1) do
            j ← j + 1
        end while
        idx_QRS_Onset(i) ← idx_QRS_Peak(i) − j
        j ← 1
        while fecg(idx_QRS_Peak(i) + j − 1) ≥ fecg(idx_QRS_Peak(i) + j) do
            j ← j + 1
        end while
        while fecg(idx_QRS_Peak(i) + j − 1) < fecg(idx_QRS_Peak(i) + j) do
            j ← j + 1
        end while
        idx_QRS_Offset(i) ← idx_QRS_Peak(i) + j
    else
        ··· {// When the QRS complex protrudes downward, the code is the same with the directions of the inequality signs reversed.}
    end if
end for

Algorithm 2 A procedure to compute feature values. nBeats denotes the number of QRS peaks. It is the length of
the sequences idx_QRS_Onset(n), idx_QRS_Peak(n) and idx_QRS_Offset(n). $n_{cl}$ is equal to nBeats/5.
Input: $\mathrm{fecg}(n)$, idx_QRS_Onset(n), idx_QRS_Peak(n), idx_QRS_Offset(n)
Output: $\left\{ x^{(cl)}_1, x^{(cl)}_2, \cdots, x^{(cl)}_{n_{cl}} \right\}$ {// cl can be S (ST episode) or N (normal).}
mean_idx_diff_1 $\leftarrow \left( \sum_{i=1}^{\text{nBeats}} \left( \text{idx\_QRS\_Peak}(i) - \text{idx\_QRS\_Onset}(i) \right) \right) / \text{nBeats}$
mean_idx_diff_2 $\leftarrow \left( \sum_{i=1}^{\text{nBeats}} \left( \text{idx\_QRS\_Offset}(i) - \text{idx\_QRS\_Peak}(i) \right) \right) / \text{nBeats}$
{// mean_idx_diff_1 and mean_idx_diff_2 are truncated into integers.}
$\mathrm{fecg}(\text{QRS onset}) \leftarrow \left( \sum_{i=1}^{\text{nBeats}} \mathrm{fecg}(\text{idx\_QRS\_Onset}(i)) \right) / \text{nBeats}$
for i = 1 to nBeats do
    k ← idx_QRS_Peak(i) + mean_idx_diff_2
    $\mathrm{feature}_1(i) \leftarrow \sum_{j=k}^{\text{T peak}} \left| \mathrm{fecg}(j) - \mathrm{fecg}(\text{QRS onset}) \right|$
    $\mathrm{feature}_2(i) \leftarrow \left( \sum_{j=k}^{F} \left( \mathrm{fecg}(j) - \mathrm{fecg}(\text{QRS onset}) \right) \right) / \left| \mathrm{fecg}(\text{idx\_QRS\_Peak}(i)) \right|$
    m ← idx_QRS_Peak(i) − mean_idx_diff_1
    $\mathrm{feature}_3(i) \leftarrow \left| \frac{\mathrm{fecg}(k) - \mathrm{fecg}(m)}{k - m} \right|$
end for
for i = 1 to nBeats/5 do
    $\left[ x^{(cl)}_i \right]_1 \leftarrow \frac{1}{5} \sum_{j=5i-4}^{5i} \mathrm{feature}_1(j)$
    $\left[ x^{(cl)}_i \right]_2 \leftarrow \frac{1}{5} \sum_{j=5i-4}^{5i} \mathrm{feature}_2(j)$
    $\left[ x^{(cl)}_i \right]_3 \leftarrow \frac{1}{5} \sum_{j=5i-4}^{5i} \mathrm{feature}_3(j)$
end for

Classification by kernel density estimation
We approximate the probability density at a point by considering the other points. Let us assume we have
$d$-dimensional points $\{x_1, x_2, \cdots, x_n\}$. We can estimate the probability density at a point $y$ as
$p(y) = \frac{1}{n}\frac{K}{V}$, where $V$ is a small volume around $y$, and $K$ is the number of points enclosed in the
volume $V$ [29]. We replace the term $\frac{K}{V}$ by a $d$-dimensional Gaussian function as follows [30].
\[
p(y) = \frac{1}{n}\frac{K}{V} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\left(\sqrt{2\pi}\right)^{d} \left|\Sigma\right|^{1/2}} \, e^{-\frac{1}{2} (y - x_i)^{T} \Sigma^{-1} (y - x_i)}
\]
If we assume that the covariance matrix $\Sigma$ is a diagonal matrix with each diagonal element $b_j^2$ ($1 \le j \le d$), the
probability density at the point $y$ is given as follows [31].
\[
p(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\left(\sqrt{2\pi}\right)^{d} (b_1 b_2 \cdots b_d)} \, e^{-\frac{1}{2} \sum_{j=1}^{d} \left( \frac{[y]_j - [x_i]_j}{b_j} \right)^{2}}
\]
We classify a test point by examining the posterior probabilities with which the test point belongs to the two classes, normal
or ischemic ST episode. We assume we have $n_S$ points $\left\{ x^{(S)}_1, x^{(S)}_2, \cdots, x^{(S)}_{n_S} \right\}$, and $n_N$ points
$\left\{ x^{(N)}_1, x^{(N)}_2, \cdots, x^{(N)}_{n_N} \right\}$. The first and the second set designate the training sets of ischemic ST episode and normal
part, respectively. Each point is described by three components $(\mathrm{feature}_1, \mathrm{feature}_2, \mathrm{feature}_3)$.
We compute the posterior probability with which the test point $y$ belongs to each class by Bayes' theorem as follows [29].
\[
P(\text{class} \mid y) = \frac{P(\text{class}) \, p(y \mid \text{class})}{P(\text{class} = N) \, p(y \mid \text{class} = N) + P(\text{class} = S) \, p(y \mid \text{class} = S)}
\]
The prior probability $P(\text{class})$ is given as $P(\text{class} = N) = n_N/(n_N + n_S)$ or $P(\text{class} = S) = n_S/(n_N + n_S)$.
The likelihoods $p(y \mid \text{class} = N)$ and $p(y \mid \text{class} = S)$ read as
\[
p(y \mid \text{class} = N) = \frac{1}{n_N \left(\sqrt{2\pi}\right)^{3} \left( b^{(N)}_1 b^{(N)}_2 b^{(N)}_3 \right)} \sum_{i=1}^{n_N} e^{-\frac{1}{2} \sum_{j=1}^{3} \left( \frac{[y]_j - \left[ x^{(N)}_i \right]_j}{b^{(N)}_j} \right)^{2}},
\]
\[
p(y \mid \text{class} = S) = \frac{1}{n_S \left(\sqrt{2\pi}\right)^{3} \left( b^{(S)}_1 b^{(S)}_2 b^{(S)}_3 \right)} \sum_{i=1}^{n_S} e^{-\frac{1}{2} \sum_{j=1}^{3} \left( \frac{[y]_j - \left[ x^{(S)}_i \right]_j}{b^{(S)}_j} \right)^{2}}.
\]
The quantities $b^{(N)}_i$ and $b^{(S)}_i$ ($1 \le i \le 3$) are called kernel bandwidths. We calculate these bandwidths for each class
(N or S) and component ($1 \le i \le 3$). These kernel bandwidths impact the accuracy of kernel density estimation [32].
We have $n_{cl}$ training points $\left\{ x^{(cl)}_1, x^{(cl)}_2, \cdots, x^{(cl)}_{n_{cl}} \right\}$, where $cl$ denotes the class, N (normal) or S (ischemic ST episode).
For each component ($1 \le i \le 3$) of the feature vector, we calculate the mean value of differences as follows.
\[
\mathrm{mean}^{(cl)}_i = \frac{1}{n_{cl}(n_{cl} - 1)/2} \sum_{j=1}^{n_{cl}} \sum_{k=j+1}^{n_{cl}} \left| \left[ x^{(cl)}_j \right]_i - \left[ x^{(cl)}_k \right]_i \right|
\]
We choose half of the mean, $\frac{1}{2} \mathrm{mean}^{(cl)}_i$, as the kernel bandwidth $b^{(cl)}_i$ for each class $cl$ (N or S) and component $i$
($1 \le i \le 3$).
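The bandwidth rule, the diagonal Gaussian likelihood, and the Bayes posterior fit in a short plain-Python sketch. The function names are hypothetical, points are tuples of three feature values, and the sketch assumes every class has at least two distinct values per component (otherwise a bandwidth of zero would cause a division by zero).

```python
import math


def kernel_bandwidths(points):
    """Half of the mean pairwise absolute difference, per component."""
    n, d = len(points), len(points[0])
    n_pairs = n * (n - 1) / 2
    return [
        0.5 * sum(abs(points[j][i] - points[k][i])
                  for j in range(n) for k in range(j + 1, n)) / n_pairs
        for i in range(d)
    ]


def kde_likelihood(y, points, bw):
    """p(y | class) with a diagonal Gaussian kernel of bandwidths bw."""
    d = len(bw)
    norm = len(points) * math.sqrt(2.0 * math.pi) ** d * math.prod(bw)
    total = sum(
        math.exp(-0.5 * sum(((y[j] - x[j]) / bw[j]) ** 2 for j in range(d)))
        for x in points
    )
    return total / norm


def posterior_ischemic(y, normal_pts, ischemic_pts):
    """P(class = S | y) by Bayes' theorem with priors from the class sizes."""
    n_n, n_s = len(normal_pts), len(ischemic_pts)
    prior_n, prior_s = n_n / (n_n + n_s), n_s / (n_n + n_s)
    like_n = kde_likelihood(y, normal_pts, kernel_bandwidths(normal_pts))
    like_s = kde_likelihood(y, ischemic_pts, kernel_bandwidths(ischemic_pts))
    return prior_s * like_s / (prior_n * like_n + prior_s * like_s)


print(kernel_bandwidths([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]))  # [1.0, 2.0, 3.0]
```

A test point would then be labelled ischemic when `posterior_ischemic` exceeds 0.5. Because the bandwidths come from the training data alone, no numerical parameter has to be supplied by hand.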
Classification with the use of support vector machine
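Let us assume we have $n_{cl}$ training points $\left\{ x^{(cl)}_1, x^{(cl)}_2, \cdots, x^{(cl)}_{n_{cl}} \right\}$. Each point is described as
$(\mathrm{feature}_1, \mathrm{feature}_2, \mathrm{feature}_3)$ in a three-dimensional feature space. We construct the support vector machine
classifier by solving the following optimization problem [33]
\[
\min_{w, b, \xi} \left\{ \frac{1}{2} w^{T} \cdot w + C \sum_{j=1}^{n_{cl}} \xi_j \right\}
\]
subject to
\[
t^{(cl)}_i \left( w^{T} \cdot \phi\!\left( x^{(cl)}_i \right) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0.
\]
The target label $t^{(cl)}_i$ is specified as 1 (normal) or -1 (ischemic ST episode). The parameter $C$ controls the trade-off
between the slack variable ($\xi_i$) penalty and the margin ($w^{T} \cdot w$) [29]. The dual form of the above classifier reads as
follows
\[
\max_{\alpha} \left\{ \sum_{j=1}^{n_{cl}} \alpha_j - \frac{1}{2} \alpha^{T} \cdot H \alpha \right\}
\]
subject to
\[
\sum_{j=1}^{n_{cl}} t^{(cl)}_j \alpha_j = 0, \qquad 0 \le \alpha_j \le C
\]
where the matrix $H$ is expressed as
\[
H_{ij} \equiv t^{(cl)}_i t^{(cl)}_j K\!\left( x^{(cl)}_i, x^{(cl)}_j \right) = t^{(cl)}_i t^{(cl)}_j \, \phi\!\left( x^{(cl)}_i \right) \cdot \phi\!\left( x^{(cl)}_j \right) = t^{(cl)}_i t^{(cl)}_j \, e^{-\frac{1}{3} \left\| x^{(cl)}_i - x^{(cl)}_j \right\|^{2}}
\]
[33]. When we classify a new pattern $y$, we examine the decision function
\[
\operatorname{sgn} \left\{ \sum_{j=1}^{n_{cl}} t^{(cl)}_j \alpha_j K\!\left( x^{(cl)}_j, y \right) + b \right\}.
\]
Whenever the input training set $\left\{ x^{(cl)}_1, x^{(cl)}_2, \cdots, x^{(cl)}_{n_{cl}} \right\}$ was changed, we varied the parameter $C$ to find the value which produces the highest
classification rate.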
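The kernel and the decision function can be evaluated with a few lines of Python once the multipliers $\alpha_j$ and the bias $b$ are known. The sketch below does not solve the dual problem; the support vectors, multipliers and bias in the usage lines are made-up values chosen only to exercise the code, and the function names are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0 / 3.0):
    """K(a, b) = exp(-gamma * ||a - b||^2), with gamma = 1/3 as in the text."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svm_decision(y, support_vectors, labels, alphas, bias):
    """sgn(sum_j t_j * alpha_j * K(x_j, y) + b); +1 = normal, -1 = ischemic."""
    s = sum(t * a * rbf_kernel(x, y)
            for x, t, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if s >= 0 else -1


# Made-up model: one "normal" and one "ischemic" support vector.
svs = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
labels = [1, -1]
alphas = [1.0, 1.0]
print(svm_decision((0.2, 0.0, 0.0), svs, labels, alphas, bias=0.0))  # 1
print(svm_decision((5.0, 5.0, 4.8), svs, labels, alphas, bias=0.0))  # -1
```

In practice the multipliers would come from a quadratic-programming solver or an SVM library, with $C$ tuned per training set as described above.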
Experiments setting
We used kernel density estimation and support vector machine methods to evaluate the proposed approach. We
ran the experiment for each channel and record available in the European ST-T database. First, we trained
the classifier on a subset of the ST episode and normal ECG data. Then we tested how well the feature values
discriminated the two classes, ST episode and normal. When we formed the ST episode data, we used all the
ischemic ST episodes but excluded ST deviations resulting from non-ischemic causes such as position-related changes
in the electrical axis of the heart. To preserve the balance between ST episode and normal ECG data, we collected
as much normal data from the beginning of each record as there was ST episode data. When dividing the data
into training and test sets, we assigned one tenth of the data to the training set and the rest to the test set. For
the e0106 lead 0, e0110, e0136, e0170, e0304, e0601, and e0615 records, we used one
third of the data for training and two thirds for testing, because these records had very little ischemic ST episode data. To
avoid the ambiguous region between ischemic ST episodes and normal ECG, we removed 10 seconds of ECG
data from each side of the boundary.
When we classify a test set {yi}, four quantities are computed: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). TP is the number of ischemic events correctly detected. FN is the number of erroneously rejected (missed) ischemic events. FP is the number of non-ischemic, that is, normal parts which the classifier erroneously detected as ischemic events. TN is the number of normal parts which the classifier correctly rejected as non-ischemic events [34]. These are counts of the corresponding points yi, which were obtained by averaging the three feature values over five successive beats in Algorithm 2. The sensitivity and specificity are defined in the usual fashion, Se = TP/(TP + FN) and Sp = TN/(TN + FP), respectively [6].
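As a quick check of these definitions, the short Python sketch below recomputes Se and Sp from the counts the paper reports for the Gaussian kernel with factor 0.5 in Table 1 (TP = 27600, FN = 1794, TN = 21441, FP = 2075). The function names are illustrative only.

```python
def sensitivity(tp, fn):
    """Se = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Sp = TN / (TN + FP)."""
    return tn / (tn + fp)


# Counts reported for the Gaussian kernel in Table 1.
tp, fn, tn, fp = 27600, 1794, 21441, 2075
se = round(sensitivity(tp, fn), 3)
sp = round(specificity(tn, fp), 3)
print(se, sp)  # 0.939 0.912
```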
We tested the classifiers by counting how many ST episodes were correctly caught, out of 367 episodes in the 85 records of the European ST-T database. For an interval of ischemic ST episode data, we formed n test points {y1,y2,··· ,yn} from the data (Algorithm 2), classified each test point, and then counted how many points fell into each of the two classes, “ischemic” and “normal”. If the number of points in class “ischemic” was larger than n/2, we declared the interval to be an ischemic ST episode. The experiments were completed for all 367 ischemic ST episodes.
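The episode-level rule is a strict majority vote over the per-point labels. A minimal Python sketch, assuming the per-point classifier output is already available as a list of strings (the label strings and the function name are illustrative):

```python
def is_ischemic_episode(labels):
    """Declare an episode when strictly more than half of the points are ischemic."""
    n = len(labels)
    n_ischemic = sum(1 for lab in labels if lab == "ischemic")
    return n_ischemic > n / 2


detected = is_ischemic_episode(["ischemic", "ischemic", "normal"])
tie = is_ischemic_episode(["ischemic", "normal"])
```

Because the comparison is strict, an interval whose points split evenly between the two classes is not counted as detected.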
We compared the results of the kernel density estimation (KDE) and support vector machine (SVM) methods with those produced by an artificial neural network (ANN). The ANN classifier has the following topology. The input layer has three nodes, which accept feature1, feature2 and feature3, respectively. The output layer has two nodes, which have target values (1,0) and (0,1) for the “ischemia” and “normal” classes, respectively. We initialized the bias weights to 0 and assigned random values between -1.0 and 1.0 to the other weights of the network. Learning was carried out by running the backpropagation method [17] for 3000 iterations. We used the sigmoid activation function 1/(1 + e^(−x)) and set the learning rate to 0.01. We adopted various topologies of hidden layers, namely 3 → (5) → (5) → 2, 3 → (6) → 2, 3 → (7) → 2 and 3 → (8) → 2, where the number in each parenthesis represents the number of nodes in the corresponding hidden layer. We used the stochastic (incremental) gradient descent method to alleviate some drawbacks of the standard gradient descent method; see [17].
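To make the topology notation concrete, the following Python sketch builds a 3 → (7) → 2 network with the initialization described in the text (bias weights 0, other weights uniform in [-1, 1]) and runs a single forward pass with the sigmoid activation. It deliberately omits backpropagation training; the helper names, the random seed and the input vector are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Weights uniform in [-1, 1]; bias weights start at 0."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Propagate x through every layer, applying the sigmoid at each node."""
    for weights, biases in layers:
        x = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


rng = random.Random(0)      # fixed seed so the sketch is repeatable
sizes = [3, 7, 2]           # the 3 -> (7) -> 2 topology
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
out = forward([0.5, -0.2, 0.1], layers)   # hypothetical feature vector
```

The two outputs would be compared with the targets (1,0) and (0,1) during training.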
Results
KDE with various kernels
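We can use various kernels in kernel density estimation. If we have training points {x1,x2,··· ,xn} and a test point y, the probability density at y is given as follows [35].

$$p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_G}{b_1 b_2 b_3}\, e^{-\frac{1}{2}u_i^2} \quad \text{(Gaussian)},$$

$$p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_R}{b_1 b_2 b_3}\, 1_{\{|u_i| \le 1\}} \quad \text{(Rectangular)},$$

$$p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_E}{b_1 b_2 b_3} \left(1 - u_i^2\right) 1_{\{|u_i| \le 1\}} \quad \text{(Bartlett-Epanechnikov)},$$

$$p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_B}{b_1 b_2 b_3} \left(1 - u_i^2\right)^2 1_{\{|u_i| \le 1\}} \quad \text{(Biweight)},$$

$$p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_{Triw}}{b_1 b_2 b_3} \left(1 - u_i^2\right)^3 1_{\{|u_i| \le 1\}} \quad \text{(Triweight)},$$

$$p(y) = \frac{1}{n}\sum_{i=1}^{n} \frac{k_{Tria}}{b_1 b_2 b_3} \left(1 - |u_i|\right) 1_{\{|u_i| \le 1\}} \quad \text{(Triangular)}.$$

Here $k_G$, $k_R$, $k_E$, $k_B$, $k_{Triw}$ and $k_{Tria}$ are constants, and $u_i^2$ is given as

$$u_i^2 \equiv \sum_{j=1}^{3} \left( \frac{[y]_j - [x_i]_j}{b_j} \right)^2$$

because we use three feature values. The indicator function $1_{\{|u_i| \le 1\}}$ is given as follows.

$$1_{\{|u_i| \le 1\}} = \begin{cases} 1 & (\text{if } |u_i| \le 1) \\ 0 & (\text{otherwise}) \end{cases}$$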
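The Gaussian case can be sketched directly in Python. Two points are assumptions rather than statements from the paper: the constant k_G is taken to be the usual normalizer (2π)^(-d/2), and classification is shown as picking the class with the larger estimated density. The training points and bandwidths in the usage lines are toy values.

```python
import math


def gaussian_kde(y, train, bandwidths):
    """Density at y from training points, one bandwidth per feature."""
    d = len(bandwidths)
    k_g = (2.0 * math.pi) ** (-d / 2.0)   # assumed normalizing constant
    norm = k_g / math.prod(bandwidths)
    total = 0.0
    for x in train:
        # u^2: squared distance after scaling each feature by its bandwidth
        u2 = sum(((yj - xj) / bj) ** 2
                 for yj, xj, bj in zip(y, x, bandwidths))
        total += math.exp(-0.5 * u2)
    return norm * total / len(train)


def classify(y, train_by_class, bandwidths_by_class):
    """Assumed rule: choose the class whose density at y is largest."""
    return max(train_by_class,
               key=lambda cl: gaussian_kde(y, train_by_class[cl],
                                           bandwidths_by_class[cl]))


# Toy data (hypothetical), three features per point.
train = {"normal": [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
         "ischemic": [(5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]}
bw = {"normal": (1.0, 1.0, 1.0), "ischemic": (1.0, 1.0, 1.0)}
label_a = classify((0.2, 0.0, 0.0), train, bw)
label_b = classify((4.9, 5.0, 5.0), train, bw)
```

The other kernels differ only in the expression applied to u^2 and in the constant.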
Table 1 shows classification results for various kernels. In all cases we used the Daubechies8 wavelet to produce training and test sets. We took each bandwidth $b_i^{(cl)} = \mathrm{mean}_i^{(cl)} \cdot \mathrm{factor}$ for class cl, ischemic or normal, and 1 ≤ i ≤ 3. The “detect” column gives how many ST episodes our classifier correctly detected, out of a total of 367 episodes. The “factor” in this table specifies the value by which we multiplied $\mathrm{mean}_i^{(cl)}$ to form the kernel bandwidth $b_i^{(cl)}$. We varied this factor from 0.1 to 3.0, and selected the value for which the sum of sensitivity and specificity attains a maximum. Because the Gaussian kernel produced the best results, in the sequel we will use the Gaussian kernel. Table 2 shows the results with respect to various kernel bandwidths.
Table 1 Classification results with respect to various kernels
kernel        factor  Se.    Sp.    TP     TN     FP    FN    detect
Gaussian      0.5     0.939  0.912  27600  21441  2075  1794  349
Rectangular   1.5     0.892  0.913  26209  21460  2056  3185  329
Epanechnikov  1.7     0.904  0.915  26583  21522  1994  2811  335
Biweight      2.0     0.912  0.916  26794  21533  1983  2600  333
Triweight     2.1     0.916  0.916  26923  21542  1974  2471  336
Triangular    1.8     0.908  0.917  26680  21554  1962  2714  334
Table 2 Classification results of Gaussian kernels with respect to various bandwidths
factor  Se.    Sp.    TP     TN     FP    FN    detect
0.2     0.943  0.867  27728  20399  3117  1666  353
0.3     0.944  0.893  27745  20996  2520  1649  352
0.4     0.942  0.905  27697  21279  2237  1697  351
0.5     0.939  0.912  27600  21441  2075  1794  349
0.6     0.934  0.915  27453  21526  1990  1941  343
0.7     0.929  0.916  27318  21550  1966  2076  338
0.8     0.924  0.916  27148  21529  1987  2246  337
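The selection criterion, the factor that maximizes Se + Sp, can be replayed on the values reported in Table 2. The Python sketch below only reuses those reported numbers; it does not rerun the classifier.

```python
# (factor, Se, Sp) triples as reported in Table 2 for the Gaussian kernel.
rows = [
    (0.2, 0.943, 0.867), (0.3, 0.944, 0.893), (0.4, 0.942, 0.905),
    (0.5, 0.939, 0.912), (0.6, 0.934, 0.915), (0.7, 0.929, 0.916),
    (0.8, 0.924, 0.916),
]

# Pick the row whose sensitivity + specificity is largest.
best_factor, best_se, best_sp = max(rows, key=lambda r: r[1] + r[2])
print(best_factor)  # 0.5
```

The maximum sum, 1.851, occurs at factor 0.5, which corresponds to the bandwidth choice of half the mean.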
Results for KDE, SVM and ANN with various wavelets
We examined the classifiers to find out how their performance depends on the mother wavelets which were used to produce training and test sets in Algorithm 1. We used 7 wavelets: Haar, Daubechies4, Daubechies8, Daubechies10, Coiflet6, Coiflet12 and Coiflet18 [22,36]. The number forming a part of the name of each wavelet designates the length of the filter which characterizes the corresponding wavelet. Figure 6 shows the shapes of these wavelet functions except for the Haar wavelet, which is given as

$$\mathrm{Haar}(t) = \begin{cases} 1 & (0 \le t < 1/2) \\ -1 & (1/2 \le t \le 1). \end{cases}$$
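For reference, the Haar wavelet is simple enough to write as a plain Python function. Assigning the point t = 1/2 to the negative branch and returning 0 outside [0, 1] are conventions assumed here.

```python
def haar(t):
    """Haar mother wavelet: +1 on the first half of [0, 1], -1 on the second."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t <= 1.0:
        return -1.0
    return 0.0   # assumed: zero outside the unit interval


samples = [haar(k / 8.0) for k in range(8)]   # t = 0, 1/8, ..., 7/8
```

Sampling at eight evenly spaced points gives four values of +1 followed by four values of -1, so the samples sum to zero.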
Table 3 shows the classification results obtained for KDE. The kernel bandwidth is expressed as $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for each class cl and 1 ≤ i ≤ 3. We used the Gaussian kernel.
Table 3 Classification results for KDE with respect to various wavelets
wavelet       Se.    Sp.    TP     TN     FP    FN    detect
Haar          0.915  0.893  25906  20245  2420  2418  339
Daubechies4   0.936  0.906  27488  21130  2186  1886  343
Daubechies8   0.939  0.912  27600  21441  2075  1794  349
Daubechies10  0.942  0.916  27862  21585  1969  1710  348
Coiflet6      0.934  0.900  28586  21837  2430  2027  349
Coiflet12     0.932  0.914  27846  21757  2041  2045  349
Coiflet18     0.937  0.919  27721  21612  1911  1859  349
Figure 6 Shapes of various wavelets. (a) Daubechies4, (b) Daubechies8, (c) Daubechies10, (d) Coiflet6, (e)
Coiflet12 and (f) Coiflet18
Table 4 shows the classification results for the KDE with respect to various bandwidths and wavelets. The first column for each wavelet item represents the sum of sensitivity and specificity. The second column shows how many ST episodes were correctly detected. We used the kernel bandwidths $b_i^{(cl)} = \mathrm{mean}_i^{(cl)} \cdot \mathrm{factor}$ for each class cl and 1 ≤ i ≤ 3. The sum of sensitivity and specificity becomes maximum when the bandwidth $b_i^{(cl)}$ is around $b_i^{(cl)} \approx \mathrm{mean}_i^{(cl)}/2$.
Table 4 Classification results for KDE versus selected values of bandwidths and types of wavelets
factor  Haar           Daub4          Daub8          Daub10         Coif6          Coif12         Coif18
        Se+Sp  detect  Se+Sp  detect  Se+Sp  detect  Se+Sp  detect  Se+Sp  detect  Se+Sp  detect  Se+Sp  detect
0.2     1.770  348     1.797  348     1.811  353     1.817  355     1.794  356     1.812  360     1.825  354
0.3     1.797  350     1.825  350     1.837  352     1.843  356     1.822  354     1.833  357     1.848  353
0.4     1.808  345     1.838  344     1.847  351     1.856  353     1.833  354     1.844  355     1.857  354
0.5     1.808  339     1.842  343     1.851  349     1.859  348     1.834  349     1.846  349     1.856  349
0.6     1.806  336     1.840  339     1.849  343     1.857  343     1.832  345     1.843  346     1.854  346
0.7     1.800  331     1.836  336     1.846  338     1.853  338     1.829  340     1.837  341     1.849  344
0.8     1.792  325     1.829  334     1.839  337     1.848  335     1.824  334     1.831  339     1.842  340
Table 5 shows the classification results obtained for SVM. The parameter C controls the trade-off between the slack variable (ξi) penalty and the margin (wT· w). We examined the classification accuracy for values of C from 0.1 to 300.0 in steps of 0.1, and selected the value that maximized the sum of sensitivity and specificity.
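The search over C is a plain grid search with Se + Sp as the score. The Python sketch below shows the shape of such a loop; `toy_score` is a hypothetical stand-in for "train an SVM with this C and measure (Se, Sp)", and building the grid from integers is a design choice made here to avoid floating-point drift, not something the paper specifies.

```python
def c_grid(start_tenths=1, stop_tenths=3000):
    """C values 0.1, 0.2, ..., 300.0, built from integers to avoid drift."""
    return [k / 10.0 for k in range(start_tenths, stop_tenths + 1)]


def select_c(score_fn, grid):
    """Return the C whose (Se, Sp) pair has the largest sum.

    score_fn is a hypothetical callback mapping C -> (sensitivity, specificity).
    """
    return max(grid, key=lambda c: sum(score_fn(c)))


def toy_score(c):
    """Made-up score surface that peaks at C = 245.5 (illustration only)."""
    return (0.9 - abs(c - 245.5) / 1000.0, 0.9)


grid = c_grid()
best_c = select_c(toy_score, grid)
```

With a real scorer, each of the 3000 grid points costs one SVM training run, which is why the search has to be repeated whenever the training set changes.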
Table 5 Classification results for SVM for various wavelets
wavelet       C      Se.    Sp.    TP     TN     FP    FN    detect
Haar          291.3  0.924  0.907  26163  20547  2118  2161  345
Daubechies4   242.9  0.937  0.923  27527  21519  1797  1847  349
Daubechies8   245.5  0.941  0.923  27658  21712  1804  1736  355
Daubechies10  174.3  0.943  0.927  27894  21838  1716  1678  349
Coiflet6      288.2  0.933  0.918  28571  22284  1983  2042  348
Coiflet12     52.8   0.929  0.918  27757  21858  1940  2134  348
Coiflet18     23.4   0.936  0.927  27692  21805  1718  1888  352
Table 6 shows the classification results obtained by ANN. The number in parenthesis represents the number of nodes in the corresponding hidden layer. For each topology, the first, second and third columns give the sensitivity, specificity and “detect”, respectively. We repeated the experiment 10 times and averaged the results, because the random initialization of the weights produced different results each time.
Table 6 Results for ANN classifiers with respect to various wavelets and sizes of hidden layers
wavelet  3 → (5) → (5) → 2     3 → (6) → 2           3 → (7) → 2           3 → (8) → 2
         Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect
Haar     0.851  0.920  311.8   0.866  0.916  319.2   0.867  0.917  319.1   0.867  0.916  320.7
Daub4    0.866  0.932  304.2   0.881  0.930  313.7   0.880  0.932  315.2   0.881  0.931  317.8
Daub8    0.864  0.931  307.2   0.878  0.929  317.9   0.875  0.931  319.4   0.880  0.932  319.4
Daub10   0.866  0.939  312.5   0.882  0.935  321.8   0.885  0.935  325.5   0.882  0.937  325.1
Coif6    0.848  0.930  311.2   0.868  0.920  319.3   0.863  0.923  319     0.872  0.927  321
Coif12   0.855  0.936  310.5   0.868  0.933  318.2   0.866  0.935  319.6   0.874  0.935  319.8
Coif18   0.874  0.938  311.8   0.883  0.937  319.2   0.884  0.936  320.2   0.886  0.937  323.5
Tables 3, 5 and 6 show that the Daubechies8 and Daubechies10 wavelets give superior results. The shapes of these two wavelets are similar to typical ECG waveforms [37,38]. From now on, we use the Daubechies8 wavelet exclusively.
Effects of baseline wandering in ECG
Table 7 shows the classification results by KDE, SVM and ANN when we did not remove baseline wandering in ECG. Comparing this table with Tables 3, 5 and 6 shows that it is essential to remove baseline wandering in Algorithm 1. In Table 7, we selected the kernel bandwidths $b_i^{(cl)}$ in the KDE classifier as $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$, (1 ≤ i ≤ 3). We used the ANN classifier with sizes of layers expressed as 3 → (7) → 2. The results of ANN were obtained by averaging the results of 10 repetitions of the experiment. The parameter C of the SVM classifier was 297.9.
Table 7 Classification results for KDE, SVM and ANN without removal of baseline wandering
     Se.    Sp.    TP       TN       FP      FN      detect
KDE  0.852  0.837  22132    17211    3342    3839    328
SVM  0.870  0.835  22605    17161    3392    3366    326
ANN  0.785  0.827  20388.8  17005.7  3547.3  5582.2  296.1
If we use an unsuitable wavelet scale like the one in Figure 3 to remove baseline wandering, it becomes difficult to obtain good results. As the sampling frequency was 250 Hz, we selected the wavelet scale ⌈log2 250⌉ = 8 in Algorithm 1. Table 8 shows the classification results when wrong wavelet scales were selected. The kernel bandwidth setting in KDE and the layer composition of the ANN classifier were the same as in Table 7. The middle row of Table 8, wavelet scale 8, was our choice in Algorithm 1. Each entry in the row of wavelet scale 8 has a counterpart in the “Daubechies8” rows of Tables 3, 5 and 6.
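The scale rule quoted in the text, the ceiling of the base-2 logarithm of the sampling frequency, is easy to verify numerically. The function name below is illustrative.

```python
import math


def baseline_scale(sampling_hz):
    """Wavelet scale used for baseline removal: ceil(log2(sampling frequency))."""
    return math.ceil(math.log2(sampling_hz))


scale = baseline_scale(250)
print(scale)  # 8
```

Since log2(250) is about 7.97, the ceiling is 8, matching the scale chosen in Algorithm 1.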
Table 8 Classification results for KDE, SVM and ANN with incorrectly selected wavelet scales to remove baseline wandering
          KDE                   SVM                              ANN
          Se.    Sp.    detect  trade-off  Se.    Sp.    detect  Se.    Sp.    detect
scale 6   0.851  0.785  319     288.2      0.842  0.818  318     0.748  0.864  277.5
scale 7   0.931  0.905  344     232.8      0.932  0.915  349     0.876  0.933  320.7
scale 8   0.939  0.912  349     245.5      0.941  0.923  355     0.875  0.931  319.4
scale 9   0.929  0.906  347     110.4      0.930  0.918  350     0.859  0.918  320.7
scale 10  0.921  0.896  340     97.5       0.916  0.907  343     0.838  0.896  313.2
Effects of simulated noise
We examined performance of the classifiers when we added simulated noise into the original ECG signal. We
modeled the noise as the sum of wandering baseline and AC power line 60 Hz noise.
Let us assume we have original signal data, ecg(i) (1 ≤ i ≤ n). First, we compute the mean value and standard deviation of the ECG signal as $m = \left(\sum_{i=1}^{n} \mathrm{ecg}(i)\right)/n$ and $s = \sqrt{\left(\sum_{i=1}^{n} (\mathrm{ecg}(i))^2\right)/n - m^2}$. Then we form a new signal ecg′(i) by

$$\mathrm{ecg}'(i) = \mathrm{ecg}(i) + s \cdot a \cdot \left( \sin\left( b \cdot \frac{i}{\mathrm{SampFreq}} \right) + \frac{1}{2}\cos\left( 2\pi 60 \cdot \frac{i}{\mathrm{SampFreq}} \right) \right)$$

where a is an amplification factor and b is the angular frequency of the added baseline. Here SampFreq means the sampling frequency, which was 250 Hz in our case. We varied a from 0.1 to 1.0 in steps of 0.1, and selected b to be equal to 2, 4 or 6.
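The noise model translates almost line by line into Python. The sketch below follows the formula for ecg′(i): a slow sine of angular frequency b plus a half-amplitude 60 Hz cosine, both scaled by a times the signal's standard deviation s. The function name, the clamp on the variance and the toy input are assumptions added for illustration.

```python
import math


def add_noise(ecg, a, b, samp_freq=250.0):
    """Return ecg plus simulated baseline wander and 60 Hz mains noise."""
    n = len(ecg)
    m = sum(ecg) / n
    var = sum(v * v for v in ecg) / n - m * m
    s = math.sqrt(max(var, 0.0))          # clamp tiny negative round-off
    noisy = []
    for i, v in enumerate(ecg, start=1):  # samples are indexed 1..n in the text
        t = i / samp_freq
        baseline = math.sin(b * t)                         # wandering baseline
        mains = 0.5 * math.cos(2.0 * math.pi * 60.0 * t)   # power-line term
        noisy.append(v + s * a * (baseline + mains))
    return noisy


# Toy signal (hypothetical), 200 samples.
signal = [0.0, 1.0, 0.0, -1.0] * 50
noisy_signal = add_noise(signal, a=0.5, b=2.0)
```

Because the two terms are bounded by 1 and 1/2, each sample is perturbed by at most 1.5 · a · s.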
Figure 7 shows the original ECG and its noise-impacted version. Tables 9, 10 and 11 show the experimental results for the noisy ECG signal. In each b item, the columns give the sensitivity, specificity and “detect”, respectively; in Table 10, each b item additionally begins with the trade-off parameter C which produced the best results. The kernel bandwidth is set as $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for the KDE classifier. The layer composition of the ANN classifier was 3 → (7) → 2.
Table 9 Classification results for KDE versus varying intensity of noise
a    b=2                   b=4                   b=6
     Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect
0.1  0.937  0.903  346     0.934  0.902  346     0.933  0.904  346
0.2  0.933  0.893  346     0.927  0.892  341     0.916  0.879  337
0.3  0.929  0.884  342     0.915  0.873  340     0.908  0.856  338
0.4  0.925  0.864  346     0.903  0.852  344     0.885  0.834  327
0.5  0.916  0.858  346     0.882  0.848  333     0.871  0.817  333
0.6  0.906  0.866  343     0.872  0.832  329     0.858  0.803  321
0.7  0.891  0.856  340     0.859  0.815  325     0.848  0.794  322
0.8  0.887  0.850  339     0.852  0.799  325     0.838  0.793  317
0.9  0.881  0.849  341     0.837  0.798  312     0.837  0.778  301
1.0  0.870  0.844  335     0.832  0.787  317     0.824  0.779  319
Table 10 Classification results for SVM with varying intensity of the simulated noise
a    b=2                          b=4                          b=6
     C      Se.    Sp.    detect  C      Se.    Sp.    detect  C      Se.    Sp.    detect
0.1  79.0   0.936  0.920  350     254.6  0.940  0.915  352     147.4  0.936  0.916  349
0.2  143.0  0.929  0.914  347     168.5  0.929  0.904  349     33.9   0.922  0.888  339
0.3  84.1   0.923  0.910  348     50.9   0.915  0.891  341     72.3   0.902  0.875  337
0.4  124.5  0.919  0.899  348     51.3   0.903  0.869  341     52.0   0.888  0.848  330
0.5  65.1   0.909  0.895  347     86.7   0.882  0.859  333     79.4   0.872  0.828  328
0.6  96.7   0.903  0.887  343     67.3   0.879  0.844  330     71.6   0.865  0.809  323
0.7  119.3  0.888  0.880  339     27.2   0.868  0.832  329     36.6   0.856  0.796  313
0.8  201.8  0.893  0.863  334     24.9   0.851  0.829  320     27.1   0.848  0.785  310
0.9  278.2  0.888  0.857  335     16.7   0.841  0.816  308     71.0   0.850  0.777  315
1.0  211.7  0.879  0.859  324     58.2   0.835  0.806  311     20.3   0.837  0.776  317
Table 11 Classification results for ANN with varying intensity of the simulated noise
a    b=2                   b=4                   b=6
     Se.    Sp.    detect  Se.    Sp.    detect  Se.    Sp.    detect
0.1  0.880  0.929  322.2   0.870  0.921  320.8   0.874  0.922  320.3
0.2  0.867  0.928  322.8   0.844  0.922  315.1   0.840  0.905  314.2
0.3  0.863  0.929  319.3   0.837  0.906  313.4   0.811  0.895  303.4
0.4  0.856  0.917  320.5   0.823  0.887  312.3   0.784  0.874  300.6
0.5  0.830  0.918  320.1   0.811  0.874  309.1   0.778  0.850  300.3
0.6  0.818  0.916  314.9   0.803  0.858  299.2   0.762  0.833  292
0.7  0.803  0.910  309.6   0.759  0.865  281.9   0.761  0.811  283.2
0.8  0.814  0.898  308.6   0.753  0.839  288.2   0.745  0.801  280.7
0.9  0.798  0.892  303.4   0.723  0.838  273.9   0.728  0.795  261.7
1.0  0.777  0.890  297.1   0.724  0.807  265.3   0.728  0.788  269.3
Figure 7 ECG signal affected by synthetic noise. (a) Original signal. (b) Noise-affected signal when a is 1.0 and
b is 6.0
Comparison with others’ works
To compare our approach with others’ works, we tested the classifiers on 10 selected records, e0103, e0104, e0105,
e0108, e0113, e0114, e0147, e0159, e0162 and e0206. Table 12 shows results of comparison. The papers by
Papaloukas et al. [10], Goletsis et al. [39], Exarchos et al. [13] and Murugan et al. [15] in Table 12 dealt with the 10
records.
Table 12 Results of comparative analysis
Researcher              Sensitivity  Specificity
Papaloukas et al. [10]  0.90         0.90
Goletsis et al. [39]    0.912        0.909
Exarchos et al. [13]    0.912        0.922
Murugan et al. [15]     0.923        0.943
Present work by KDE     0.945        0.943
Present work by SVM     0.957        0.953
We used the Daubechies8 wavelet in Algorithm 1 to analyze the ECG waveform, and took the kernel bandwidths $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for the KDE classifier with the Gaussian kernel. We used the SVM classifier with C = 281.1.
Discussion
Table 1 showed how the classification results depend on the kernel function used in kernel density estimation. The Gaussian kernel produced the best results.
Tables 3, 4, 5 and 6 show how the classification results depend on the mother wavelet used in Algorithm 1. The Daubechies8 and Daubechies10 wavelets were best. Because we implemented the wavelet transform program using matrix multiplication, we selected the Daubechies8 wavelet to reduce the computational burden. The Daubechies10 wavelet did not produce much better classification accuracy than the Daubechies8 wavelet.
Tables 2 and 4 indicate that the choice of kernel bandwidths was reasonable. When we took the kernel bandwidths $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$ for class cl, 1 ≤ i ≤ 3, we obtained the best results except for the case of the Coiflet18 wavelet. Even in that case, the best parameter $b_i^{(cl)} = 0.4 \cdot \mathrm{mean}_i^{(cl)}$ was close enough to $b_i^{(cl)} = \mathrm{mean}_i^{(cl)}/2$. We maintained this choice in Tables 7, 8 and 9. In this way, we could automatically select 6 kernel bandwidths, and this exempted us from choosing any numerical parameters.
The SVM classifiers in Tables 5, 7 and 8 produced better results than the KDE classifiers, but they required us to determine the optimal value of the parameter C. Whenever we used a different wavelet on the same data set in Table 5, we had to choose a different trade-off parameter C. This was also the case in Table 8, where we intentionally selected incorrect wavelet scales to remove baseline wandering.
The order of magnitude of feature3 was very different from that of feature1 and feature2. When we produced the feature values using the Daubechies8 wavelet in Algorithm 1, the mean values of |feature1|, |feature2| and |feature3| were 7.327, 7.613 and 0.004, respectively. Thus we had to normalize the feature values to use them in classification. Even if the orders of magnitude of feature1, feature2 and feature3 were very different, the equation of kernel density estimation included a term

$$\frac{1}{b_1^{(cl)} b_2^{(cl)} b_3^{(cl)}} \sum_i e^{-\frac{1}{2}\sum_{j=1}^{3} \left( \frac{[y]_j - \left[x_i^{(cl)}\right]_j}{b_j^{(cl)}} \right)^2}.$$

Furthermore each operand in the sum comes in the form of

$$\frac{[y]_j - \left[x_i^{(cl)}\right]_j}{b_j^{(cl)}},$$

that is, normalization by kernel bandwidth. We thought these would be helpful to overcome the difference of order of magnitude between feature1, feature2 and feature3. This was a main driving force to adopt the kernel density estimation.
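A small numeric illustration of this normalization is given below in Python. It uses the three mean magnitudes quoted in the text as stand-ins for the per-class means; treating them as the quantities from which the bandwidths are built is an assumption made only for this example, as are the two sample points.

```python
means = (7.327, 7.613, 0.004)                 # mean |feature_j| from the text
bandwidths = tuple(m / 2.0 for m in means)    # assumed: b_j = mean_j / 2


def scaled_offsets(y, x, b):
    """Per-feature differences divided by the matching bandwidth."""
    return [(yj - xj) / bj for yj, xj, bj in zip(y, x, b)]


# Hypothetical pair of points differing by 10% of each feature's mean magnitude.
x = means
y = tuple(1.1 * m for m in means)
raw = [yj - xj for yj, xj in zip(y, x)]
scaled = scaled_offsets(y, x, bandwidths)
```

The raw differences span more than three orders of magnitude (about 0.73, 0.76 and 0.0004), while every scaled offset is close to 0.2, so each feature contributes comparably to the exponent.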
We implemented the KDE and ANN classifiers ourselves in the C language. For the SVM classifier, we used the libsvm library [33]. We compiled the programs with gcc and g++ without using any SIMD (single instruction multiple data) math library. The total size of the ECG text files used in our analysis was 200.4 MB. These files contain only voltage information, not time information. When we ran our programs on a Pentium 4 3.2 GHz CPU, removing baseline wandering and detecting time positions in Algorithm 1 took 243.0 seconds with the Daubechies8 wavelet. Feature extraction in Algorithm 2 required 0.6 seconds. It took 1.2 seconds for the KDE classifier to process all the files.