FEATURE RELEVANCE ASSESSMENT IN AUTOMATIC
INTER-PATIENT HEART BEAT CLASSIFICATION
G. de Lannoy (1,2), D. François (1), J. Delbeke (2) and M. Verleysen (1)
(1) Machine Learning Group, Université catholique de Louvain
pl. du Levant 3, 1348 Louvain-la-Neuve, Belgium
(2) Department of Physiology and Pharmacology, Université catholique de Louvain
av. Hippocrate 54, 1200 Bruxelles, Belgium
{gael.delannoy, damien.francois, jean.delbeke, michel.verleysen}@uclouvain.be
Keywords: Heart beat classification, AAMI standards, Support vector machines, Unbalanced data, Feature selection,
Inter-patient classification.
Abstract: Long-term ECG recordings are often required for the monitoring of the cardiac function in clinical applica-
tions. Due to the high number of beats to evaluate, inter-patient computer-aided heart beat classification is of
great importance for physicians. The main difficulty is the extraction of discriminative features from the heart
beat time series. The objective of this work is the assessment of the relevance of feature sets previously pro-
posed in the literature. For this purpose, inter-patient classification of heart beats following AAMI guidelines
is investigated. The class unbalance is taken into account by using a support vector machine (SVM) classifier
that integrates distinct weights for the classes. The performances of the SVM model with an appropriate selec-
tion of features are better than those of previously reported inter-patient classification models. These results
show that the choice of the features is of major importance, and that some usual feature sets do not serve the
classification performances. In addition, the results drop significantly when the class unbalance is not taken
into account, which shows that this issue must be addressed to grasp the importance of the pathological cases.
1 INTRODUCTION
The analysis of the electrocardiogram (ECG) signal
provides critical information on the cardiac function
of patients. Long-term recordings of the ECG signal
are, for example, required for the clinical diagnosis
of some disease conditions, and for the evaluation of
new drugs during phase-one studies by pharmaceu-
tical groups. Such long-term recordings are usually
obtained using the popular Holter recorders.
These systems are ambulatory heart activity
recording units with signal storage capacities
ranging from 24 to 48 hours, thus providing
data on hundreds to thousands of heart beats. The
analysis is usually performed off-line by cardiolo-
gists, whose diagnosis may rely on just a few transient
patterns. Due to the high number of beats to evalu-
ate, this task is very expensive and reliable visual in-
spection is difficult. Computer-aided classification of
pathological beats is therefore of great importance.
However, this is a difficult task in real situations.
First, several sources of noise pollute the ECG signal.
Among these, power line interferences, muscular arti-
facts, poor electrode contacts and baseline wandering
due to respiration can sometimes be identified. Sec-
ond, the classes are very unbalanced since a vast ma-
jority of the heart beats are normal healthy beats and
just a small number of beats are pathological, though
those are of major importance. Third, artificial intelligence
methods require the extraction of discriminative
features from the heart beat time series. Extracting
the information available in the ECG signal
into a set of relevant features is both difficult and
crucial, and requires proper expertise.
Computer-aided heart beat classification has been
addressed previously in the literature. Several fea-
tures characterizing the heart beats and several classi-
fication models have been investigated (Clifford et al.,
2006). However, very few reported works follow
the standards defined by the American Association
for Medical Instrumentation (AAMI), which makes it
very difficult to assess the relative merits of the meth-
ods and of the proposed extracted features (Associa-
tion for the Advancement of Medical Instrumentation,
1998). Also, the unbalanced classes issue is usually
not taken into account.
Furthermore, most of the proposed methods require
labeled beats from the tested patient in the training
of the model and actually perform what could be re-
ferred to as “intra-patient” classification. By con-
trast, “inter-patient” classification consists in classi-
fying the beats of a new tested patient according to a
reference database built from data coming from other
patients. This is a much harder task of generalization
but it is also much more useful since labeled beats
from a new patient are usually not available in time in
real situations.
In this work, inter-patient classification of heart
beats following the AAMI guidelines is investigated.
First, the class unbalance is taken into account by us-
ing a support vector machine classifier that integrates
distinct weights for the classes depending on their priors.
Second, a large number of distinct features proposed
in the literature are combined and evaluated,
and the relevance of each type of feature is
discussed.
The remainder of this paper is organized as follows.
Section 2 briefly reviews the state of the art in
heart beat classification. Section 3 provides a short
overview of the theoretical background for the math-
ematical methods used in this work. Section 4 de-
scribes the methodology followed by the experiments
and Section 5 presents the results.
2 STATE OF THE ART
This section provides a short review of the state of the
art in supervised heart beat classification. Two kinds
of heart beat classification paradigms can be distin-
guished, corresponding to either intra-patient classi-
fication or inter-patient classification.
Inter-patient classification consists in classifying
the beats of a new tested patient according to a refer-
ence database and a model built from data from other
patients. This process thus implies generalization
from one patient to another. As far as intra-patient
classification is concerned, the reference database
must contain previously labeled beats from the tested
patient. The results that can be achieved are natu-
rally better than when inter-patient classification is
performed, but the patient labeled beats are usually
not available in real situations. Furthermore, because
pathological beats can be very rare, there is no guar-
antee that the few training beats that would be labeled
for this patient would contain representatives for each
class; and the classifier could possibly fail in predict-
ing something it has not learned.
Despite these major drawbacks, the majority of
previously reported work is about intra-patient classi-
fication. Different models have been proposed for this
task, including neural networks (Osowski and Hoai,
2001), k-nearest neighbors (Christov et al., 2006),
hidden Markov models (Cheng and Chan, 1998) and
support vector machines (Melgani and Bazi, 2008). A
comprehensive review of intra-patient classification
methods and their results can be found in (Clifford
et al., 2006).
As far as inter-patient classification is concerned,
the first study to establish a reliable inter-patient clas-
sification methodology following AAMI standards is
(Chazal et al., 2004). A linear discriminant analy-
sis (LDA) classifier model is trained and the results
are evaluated on an independent test set. The un-
balanced classes issue is addressed by introducing
weights in the linear discriminant functions. In (Park
et al., 2008), these classification results are improved
on the same dataset using SVM and other features.
Hierarchical SVMs are used to reduce the effect of
unbalanced classes.
The classification performances heavily rely on
the extraction of relevant features from the heart beat
time series. A variety of features have been proposed
to characterize the heart beats. The representation of
the heart beat signal by the coefficients of Hermite ba-
sis functions expansions is introduced for a clustering
application in (Lagerholm et al., 2000), and later used
for classification by (Osowski et al., 2004) and (Park
et al., 2008). Another type of features that has been
proposed is the representation of the heart beats by
higher order statistics, and in particular the cumulants
of order 2, 3 and 4 (Osowski and Hoai, 2001).
Another widely used group of features is morphological
features (later referred to as segmentation
features) (Christov et al., 2006; Chazal et al.,
2004). These features require the annotation of the
PQRST waves and then summarize the morphology
of the heart beat series by their duration, area, Q-T
intervals, S-T intervals, the height of the QRS complex,
etc. In most previously reported works, the
successive time differences between the R spikes of
heart beats (later referred to as R-R intervals) are
combined with the other features. However, the
intrinsic relevance of each type of features remains
unknown. In this paper, this relevance is investigated
using feature selection techniques (François, 2008;
Guyon et al., 2006).
3 THEORETICAL BACKGROUND
This section provides a brief theoretical background
on mathematical methods that are used in this work.
BIOSIGNALS 2010 - International Conference on Bio-inspired Systems and Signal Processing
3.1 Support Vector Machines
A support vector machine (SVM) is a supervised
learning method that was first introduced by Vap-
nik (Vapnik, 1999). The two-classes case is de-
scribed here, because its extension to multiple classes
is straightforward by applying the one-against-all or
one-against-one methods. Let us first define the $p$-dimensional
feature vector $x_k = \{x_k^1, x_k^2, \ldots, x_k^p\}$ and the
associated class value $y_k \in \{-1, 1\}$ for a given heart
beat $k$, with $k$ ranging from 1 to $K$, $K$ being the total
number of heart beats.
SVMs are linear machines that rely on a prepro-
cessing to represent the features in a higher dimen-
sion, typically much higher than the original feature
space. With an appropriate non-linear mapping ϕ(x)
to a sufficiently high dimensional space, finite data
from two categories can always be separated by a hy-
perplane. In SVMs, this hyperplane is chosen as the
one with the largest margin.
Assume each observation $x_k$ has been transformed
to $z_k = \varphi(x_k)$. The equation of the hyperplane in
the augmented space is defined as $g(z) = a^t z$, where
both the weight and the transformed pattern vectors
are augmented by $a_0 = w_0$ and $z_0 = 1$ respectively. A
separating hyperplane thus ensures that

$$y_k \, g(z_k) \geq 1, \qquad k = 1, \ldots, K. \qquad (1)$$

The distance from any hyperplane to a transformed
pattern $z$ is $|g(z)| / \|a\|$, and Eq. 1 implies that

$$\frac{y_k \, g(z_k)}{\|a\|} \geq b, \qquad k = 1, \ldots, K \qquad (2)$$

where it is assumed that $b$ is an existing positive margin.
The objective is then to find the weight vector
$a$ that maximizes $b$. As the solution vector can be
scaled arbitrarily, the constraint $b \|a\| = 1$ is usually
imposed, which is equivalent to minimizing $\|a\|^2$. By
constructing the Lagrangian, this primal optimization
can be reformulated in a so-called dual form that maximizes

$$L(\alpha) = \sum_{k=1}^{K} \alpha_k - \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{K} \alpha_k \alpha_j y_k y_j z_k^t z_j, \qquad (3)$$

with respect to the Lagrange multipliers $\alpha_k$ associated
to inequalities 1, subject to the constraints

$$\sum_{k=1}^{K} y_k \alpha_k = 0, \qquad 0 \leq \alpha_k \leq C \qquad (4)$$

where $C$ represents the regularizing parameter and determines
the balance between the complexity of the
model and the classification error. These equations
can be efficiently solved using quadratic program-
ming. For this type of optimization, there exist many
highly effective learning algorithms. A common
method for solving the problem is Platt’s Sequen-
tial Minimal Optimization (SMO) algorithm, which
breaks the problem down into 2-dimensional sub-
problems that may be solved analytically, eliminating
the need for a numerical optimization algorithm such
as conjugate gradient methods (Platt, 1999).
In the dual form, the explicit form of the mapping
function $\varphi$ need not be known as long as the kernel
function $K(x_i, x_j) = \varphi(x_i)^t \varphi(x_j)$ is defined. The kernel
can for example be the linear kernel $K(x_i, x_j) = x_i^t x_j$
or the radial basis function kernel
$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, where $\gamma$ is a kernel parameter to
be tuned.
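As a minimal sketch of the two kernels named above, the following plain-Python functions evaluate the linear and radial basis function kernels on feature vectors given as lists of floats. The function names are illustrative choices, not taken from the paper or from any library.

```python
import math

def linear_kernel(xi, xj):
    """Linear kernel: the inner product xi^t xj."""
    return sum(a * b for a, b in zip(xi, xj))

def rbf_kernel(xi, xj, gamma):
    """Radial basis function kernel: exp(-gamma * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)
```

Both functions only need the original feature vectors, which is the point of the dual form: the mapped vectors never have to be computed explicitly.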
3.2 Hermite Basis Functions
The representation of the heart beat signal via Her-
mite basis functions (HBF) was first introduced by
(Lagerholm et al., 2000) for a clustering application
and later by (Osowski et al., 2004) for classifica-
tion. This approach exploits similarities between the
shapes of HBF and typical ECG waveforms. Let us
denote the heart beat signal by $x(t)$. Its expansion into
a Hermite series of order $N$ is written as

$$x(t) = \sum_{n=0}^{N-1} c_n \phi_n(t, \sigma) \qquad (5)$$

where $c_n$ are the expansion coefficients and $\sigma$ is the
width parameter. $\phi_n(t, \sigma)$ are the Hermite basis functions
of the $n$th order defined as follows:

$$\phi_n(t, \sigma) = \frac{1}{\sqrt{\sigma 2^n n! \sqrt{\pi}}} \, e^{-t^2 / 2\sigma^2} H_n(t/\sigma) \qquad (6)$$

where $H_n(t/\sigma)$ is the Hermite polynomial of the $n$th
order. The Hermite polynomials satisfy the following
recurrence relation:

$$H_n(x) = 2x H_{n-1}(x) - 2(n-1) H_{n-2}(x) \qquad (7)$$

with $H_0(x) = 1$ and $H_1(x) = 2x$.
The higher the order of the Hermite polynomial,
the higher its frequency of changes in the time do-
main, and the better the capability of the expansion in
Eq. 5 to reconstruct the signal (Clifford et al., 2006).
The width parameter σcan be tuned to provide a good
representation of beats with large differences in dura-
tions. The coefficients cnof the HBF expansion can
be estimated by minimizing the sum of squared errors
using singular value decomposition and the pseudo-
inverse technique. These coefficients summarize the
shape of the heart beat signal and can be treated as the
features used in the classification process.
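The Hermite expansion can be sketched in a few lines of plain Python. The code below evaluates the Hermite polynomials with the recurrence of Eq. 7 and the basis functions of Eq. 6. For the coefficients it uses a simple projection (a discretized inner product, valid because the basis functions are orthonormal) rather than the SVD and pseudo-inverse fit described in the text; this substitution, the function names and the uniform time grid are assumptions made for illustration only.

```python
import math

def hermite_poly(n, x):
    """Hermite polynomial H_n(x), built with the recurrence of Eq. 7."""
    h_prev, h = 1.0, 2.0 * x              # H_0(x) and H_1(x)
    if n == 0:
        return h_prev
    for k in range(2, n + 1):
        h_prev, h = h, 2.0 * x * h - 2.0 * (k - 1) * h_prev
    return h

def hermite_basis(n, t, sigma):
    """Hermite basis function phi_n(t, sigma) of Eq. 6."""
    norm = math.sqrt(sigma * (2.0 ** n) * math.factorial(n) * math.sqrt(math.pi))
    return math.exp(-t * t / (2.0 * sigma ** 2)) * hermite_poly(n, t / sigma) / norm

def project_onto_hermite(signal, times, order, sigma):
    """Approximate c_0..c_{order-1} by a Riemann sum of x(t) * phi_n(t).

    Assumes a uniform time grid that covers the support of the beat.
    """
    dt = times[1] - times[0]
    return [dt * sum(x * hermite_basis(n, t, sigma) for x, t in zip(signal, times))
            for n in range(order)]
```

On a signal that is itself a combination of basis functions, the projection returns the mixing coefficients up to discretization error; on real beats a least-squares fit as in the text would normally be preferred.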
3.3 Higher Order Statistics
The statistical properties of the heart beat signal can
be represented by its higher order statistics (HOS).
The cumulants of order two, three and four are usually
used (Osowski and Hoai, 2001). Assuming the
heart beat signal $x(t)$ has a zero mean, its cumulant
$C_x^i$ of order $i$ can be computed as follows:

$$C_x^2(\tau_1) = E\{x(t)\,x(t+\tau_1)\}$$

$$C_x^3(\tau_1, \tau_2) = E\{x(t)\,x(t+\tau_1)\,x(t+\tau_2)\}$$

$$C_x^4(\tau_1, \tau_2, \tau_3) = E\{x(t)\,x(t+\tau_1)\,x(t+\tau_2)\,x(t+\tau_3)\} - C_x^2(\tau_1)\,C_x^2(\tau_3 - \tau_2) - C_x^2(\tau_2)\,C_x^2(\tau_3 - \tau_1) - C_x^2(\tau_3)\,C_x^2(\tau_2 - \tau_1)$$

where $E$ is the expectation operator and $\tau_1, \tau_2, \tau_3$ are
the time lags.
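The cumulant definitions above can be estimated from a sampled, zero-mean beat by replacing the expectation with a time average. The sketch below does this in plain Python; the lags are expressed in samples, the average runs only over time indices for which every shifted sample exists, and the function names are hypothetical.

```python
def moment(x, *lags):
    """Time-average estimate of E{x(t) x(t+l1) x(t+l2) ...}, lags in samples."""
    shifts = (0,) + lags
    start = -min(shifts)              # first t such that every t + shift >= 0
    stop = len(x) - max(shifts)       # one past the last t with t + shift < len(x)
    total = 0.0
    for t in range(start, stop):
        prod = 1.0
        for s in shifts:
            prod *= x[t + s]
        total += prod
    return total / (stop - start)

def cumulant2(x, t1):
    return moment(x, t1)

def cumulant3(x, t1, t2):
    return moment(x, t1, t2)

def cumulant4(x, t1, t2, t3):
    """Fourth-order cumulant of a zero-mean signal."""
    return (moment(x, t1, t2, t3)
            - cumulant2(x, t1) * cumulant2(x, t3 - t2)
            - cumulant2(x, t2) * cumulant2(x, t3 - t1)
            - cumulant2(x, t3) * cumulant2(x, t2 - t1))
```

At zero lags the fourth-order cumulant reduces to the fourth moment minus three times the squared variance, which gives a quick sanity check for any implementation.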
4 METHODOLOGY
Let us assume that a reference database has been ob-
tained and labeled by a cardiologist, with all patholo-
gies of interest being represented. Given a new ECG
signal, for example recorded using a Holter system,
one wants to use the information contained in the ref-
erence database in order to predict the pathologies
present in the new signal.
4.1 ECG Filtering
The filtering procedure defined in (Chazal et al.,
2004) is used in this work. The ECG signal is first fil-
tered by two median filters. The first median filter is
of 200 msec width and removes the QRS complexes
and the P waves. The resulting signal is then pro-
cessed with a second median filter of 600 msec width
to remove the T waves. The signal resulting from the
second filter operation contains the baseline wander-
ings and can be subtracted from the original signal.
Powerline and other high frequency artifacts are then
removed from the baseline corrected signal with a FIR
filter.
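The baseline-removal part of this procedure can be sketched as two cascaded median filters followed by a subtraction. The plain-Python version below is only an illustration: the sampling frequency `fs` is a parameter supplied by the caller, windows are truncated at the signal borders (an assumption, since the text does not specify border handling), and the final FIR filtering step is not shown.

```python
from statistics import median

def median_filter(x, width):
    """Sliding median with an odd window; the window is truncated at the borders."""
    half = width // 2
    n = len(x)
    return [median(x[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def remove_baseline(ecg, fs):
    """Estimate the baseline with 200 ms and 600 ms median filters and subtract it."""
    w1 = int(0.2 * fs) | 1            # force an odd number of samples
    w2 = int(0.6 * fs) | 1
    baseline = median_filter(median_filter(ecg, w1), w2)
    return [s - b for s, b in zip(ecg, baseline)]
```

A constant offset with one narrow spike is a convenient check: the offset is removed while the spike, which is much shorter than either window, is preserved.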
4.2 Heart Beat Extraction
Several computer-aided annotation algorithms have
been reported in the literature in order to automatically
detect the characteristic points of the ECG (Clifford
et al., 2006). The standard ecgpuwave segmentation
software (see http://www.physionet.org/physiotools/software-index.shtml)
provided with the MIT-BIH database
is used to provide estimates of such characteristic
points. Nevertheless, even the best annotation algo-
rithms sometimes fail in detecting the exact beginning
of the beats (the start of the P wave). However, the R
spike has a very high detection rate and can be used
as a more reliable marker. Defining a static window
around the R spike is thus a safer way to separate the
beats without missing a large amount of data. A win-
dow of 250 msec before and after the R position is
used in this work.
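The static windowing around the R spike can be sketched as follows. The function takes the R positions as sample indices and cuts 250 ms on each side; the sampling frequency `fs`, the function name and the choice to skip beats whose window would run past the ends of the recording are assumptions made for this illustration.

```python
def extract_beats(ecg, r_positions, fs, half_window_s=0.25):
    """Cut a fixed window of half_window_s seconds on each side of every R spike."""
    half = int(round(half_window_s * fs))
    beats = []
    for r in r_positions:
        # Skip beats whose window would extend beyond the recording.
        if r - half >= 0 and r + half < len(ecg):
            beats.append(ecg[r - half:r + half + 1])
    return beats
```

Every extracted beat then has the same length, 2 * half + 1 samples, with the R spike at the centre, which is convenient for the feature extractors that expect aligned inputs.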
4.3 Feature Extraction
Five groups of features are extracted from each heart
beat: R-R intervals, Hermite basis function expansion
coefficients, higher order statistics, segmentation fea-
tures and patient-normalized segmentation features.
1. R-R intervals: This group consists of six features
built from the R-R interval series. The first three
features are the R-R interval to the previous beat,
the R-R interval to the next beat and the average
of the R-R intervals in a window of 10 surrounding
beats. The other three features correspond
to the ratio between each of the previous three values
and its mean value for this patient. These
last features are thus independent of the mean
normal behavior of the heart of each patient, which
can naturally differ greatly between individuals
and possibly mislead the classifier.
2. Segmentation features: A set of 24 features
is computed from the estimated characteristic
points. Some of these features are boolean
flags indicating the presence or absence of QRS, P
and T waves. If the waves are present, their duration,
maximum and minimum values, area, standard
deviation, skewness and kurtosis are computed
as features. The complete list of segmentation
features can be found in (Christov et al.,
2006).
3. HBF coefficients: The parameters for computing
the HBF expansion coefficients as defined in (Park
et al., 2008) are used. The order of the Hermite
polynomial is set to 20, and the width parameter σ
is estimated so as to minimize the reconstruction
error. Figure 1 shows a normal beat and its recon-
struction from the estimated HBF coefficients.
4. Higher order statistics: The 2nd, 3rd and 4th or-
der cumulant functions are computed. The pa-
rameters as defined in (Osowski et al., 2004) are
used: the lag parameters range from -250 msec to
250 msec centered on the R spike and 10 equally
spaced sample points of each cumulant are used as
features, for a total of 30 features. Figure 2 shows
an example of cumulants for a normal beat.
5. Patient-normalized segmentation features: This
group of features contains the same features as the
segmentation group, but the values are normalized
by their mean value for each patient. The normal-
ization is obviously not applied to boolean seg-
mentation features. Here again, the objective is
to make each feature independent from the mean
behavior of the heart of a patient, because it can
naturally be very different between individuals.
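As an illustration of the first feature group in the list above, the sketch below builds the six R-R features from the R spike times of one patient. The exact definition of the local window is not given in the text; here it is taken as the R-R intervals within `window // 2` positions on each side of the beat, which is an assumption, as are the function name and the decision to drop the first and last beat (which lack a previous or next interval).

```python
def rr_features(r_times, window=10):
    """Return [prev, next, local_avg, prev/mean, next/mean, local_avg/mean] per beat."""
    rr = [b - a for a, b in zip(r_times, r_times[1:])]   # rr[i]: beat i to beat i+1
    half = window // 2
    raw = []
    for k in range(1, len(r_times) - 1):                 # beats with both neighbours
        lo, hi = max(0, k - half), min(len(rr), k + half)
        local_avg = sum(rr[lo:hi]) / (hi - lo)
        raw.append([rr[k - 1], rr[k], local_avg])
    # Patient-level mean of each of the three raw features.
    means = [sum(col) / len(col) for col in zip(*raw)]
    return [f + [v / m for v, m in zip(f, means)] for f in raw]
```

The last three values of each row are ratios to the patient's own averages, so a patient with a naturally slow or fast rhythm yields values around 1 for normal beats.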
Figure 1: (1) A normal ECG beat and (2) its reconstruction
from HBF coefficients.
Figure 2: (1) A normal ECG beat (2) its cumulant of the 2nd
order (3) its cumulant of the third order and (4) its cumulant
of the fourth order.
4.4 Classification Model
Classification models based on SVMs with the one-
against-one multi-class strategy are considered in this
study. The training of the model is performed on a ref-
erence database, and this model is then applied to get
a prediction of the class label of new heart beats from
another patient. Several types of kernels are evaluated
in this study, and the linear kernel always outperforms
the other kernels. As this is in accordance with pre-
vious works on heart beat classification using SVMs
(Park et al., 2008), only the results with the linear ker-
nel are reported here.
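The one-against-one strategy mentioned above trains one binary classifier per pair of classes and lets them vote. The sketch below shows only the voting step in plain Python; the `pairwise` dictionary of decision functions is a hypothetical stand-in for trained binary SVMs, and breaking ties by class order is an assumption.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over all pairwise decision functions.

    pairwise[(a, b)](x) >= 0 is read as a vote for class a, otherwise for class b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) >= 0 else b
        votes[winner] += 1
    # max() keeps the first class in `classes` among those with the most votes.
    return max(classes, key=lambda c: votes[c])
```

With the four AAMI classes used in this work, this amounts to six pairwise classifiers per prediction.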
The relative proportions of the classes of the available
training examples dramatically influence the performances
of the SVM classifier. If a small number of
classes dominate the training data, the performances
of SVM drop significantly (Yinggang and Qinming,
2006). One solution to this problem is to randomly
downsample the larger classes, but this results in a
waste of potentially useful data. A better solution fol-
lowed in this work is to include all training examples
but reduce the contribution of dominating classes in
the training process.
This is achieved by weighting the parameter $C$ of
class $i$ in Eq. 4 to $w_i C$. The problem is to find the optimal
$w_i$ values. In a two-class problem, they can easily be
estimated by cross-validation. In a multi-class problem,
this is much more difficult; in our experiments
these values are set according to the prior probabilities
of each class in the training data. Let us define
$n_i$, the number of beats in class $i$. The weight associated
with each class $i$ is then set as $w_i = n_i / N$ where $N$
is the total number of beats. Intuitively, the addition
of weights in the classifier for small classes means
that more attention is given to pathological classes
and less to normal beats. This is in line with the
preference of doctors, who would rather see a healthy
patient wrongly diagnosed as ill than an ill patient
diagnosed as healthy and left untreated.
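The weighting rule can be illustrated on the training counts of Table 2. The sketch below computes the class priors n_i / N as the formula is printed in the text and, as a clearly labeled assumption, the inverse-prior variant N / n_i that is commonly used when rare classes are meant to receive the larger penalty; the text alone does not settle which reading the experiments used. The function names and the base value C = 1.0 are hypothetical.

```python
# Training-set beat counts per AAMI class, taken from Table 2.
TRAIN_COUNTS = {"N": 45801, "S": 938, "V": 3708, "F": 414}

def prior_weights(counts):
    """w_i = n_i / N, the formula as printed in the text."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def inverse_prior_weights(counts):
    """Assumed alternative: w_i = N / n_i, larger for rarer classes."""
    total = sum(counts.values())
    return {c: total / n for c, n in counts.items()}

def per_class_C(weights, C=1.0):
    """Class-specific regularization parameters w_i * C used in Eq. 4."""
    return {c: w * C for c, w in weights.items()}
```

Under the inverse-prior reading, errors on class F are penalized roughly a hundred times more than errors on class N, which matches the stated intuition that pathological classes receive more attention.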
4.5 Performance Evaluation
In a heart beat classification task, around 90% of beats
are normal beats and a dummy classifier which would
always predict the normal class would get 90% accuracy.
For this reason, it is important to look at class
accuracies separately and to use the average of these
class accuracies as performance measure rather than
considering the overall classification accuracy.
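The difference between the two measures is easy to reproduce. The plain-Python sketch below computes the overall accuracy and the average of the class accuracies for lists of true and predicted labels; the function names are illustrative, and every class is assumed to occur at least once in `y_true`.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of beats whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_accuracies(y_true, y_pred, classes):
    """Per-class accuracy: fraction of the beats of each class that are recognized."""
    accs = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        accs[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return accs

def average_class_accuracy(y_true, y_pred, classes):
    """Unweighted mean of the per-class accuracies."""
    accs = class_accuracies(y_true, y_pred, classes)
    return sum(accs.values()) / len(accs)
```

For a dummy classifier that always answers N on data with 90% normal beats and four classes, the overall accuracy is 0.9 while the average class accuracy is only 0.25.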
Table 1: Grouping of the MIT-BIH labeled heart beat types according to the AAMI standards.

| Normal beats (N) | Supraventricular ectopic beats (S) | Ventricular ectopic beats (V) | Fusion beats (F) |
|---|---|---|---|
| Normal beats | Atrial premature beat | Premature ventricular contraction | Fusion of ventricular and normal beats |
| Left bundle branch block beats | Aberrated atrial premature beat | Ventricular escape beats | |
| Right bundle branch block beats | Nodal (junctional) premature beats | | |
| Atrial escape beats | Supraventricular premature beats | | |
| Nodal (junctional) escape beats | | | |
5 EXPERIMENTS AND RESULTS
Data from the MIT-BIH arrhythmia database (Gold-
berger et al., 2000) are used in our experiments.
The database contains 48 half-hour long ambulatory
recordings obtained from 48 patients, for a total of approximately
110,000 heart beats labeled into 15 distinct
types. Following the AAMI recommendations,
the four recordings with paced beats are rejected and
the MIT-BIH labeled types are then grouped into four
more clinically relevant heart beat classes (Associa-
tion for the Advancement of Medical Instrumentation,
1998) (see Table 1 for grouping details):
N-class includes beats originating in the sinus node
(normal and bundle branch block beat types);
S-class includes supraventricular ectopic beats;
V-class includes ventricular ectopic beats (VEBs);
F-class includes beats that result from fusing normal
and VEBs.
The dataset configuration is the same as in (Chazal
et al., 2004; Park et al., 2008). The 44 available
recordings are divided in two independent datasets of
22 recordings each with approximately the same ratio
of heart beat classes. The first dataset is the training
set, and is used to build the model. The second
dataset is the test set, and is used to obtain an inde-
pendent measure of the performances of the classifier.
Table 2 shows the number of beats in each class and
their frequencies in the two datasets.
All features introduced in Section 4.3 are com-
puted and all the possible combinations of the five
feature groups are evaluated with the weighted SVM
model, for a total of 31 configurations. Table 3 holds
the most interesting results out of the 31 configura-
tions, together with the results of previously reported
models that also followed AAMI guidelines and inter-
patient classification. The best and worst results of
the SVM model when no weights are defined (“raw”
models) are also shown in the table.
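The count of 31 configurations follows from taking every non-empty subset of the five feature groups (2^5 - 1 = 31). A short sketch of the enumeration, using the group abbreviations of Table 3 and a hypothetical function name:

```python
from itertools import combinations

# Abbreviations as used in the header of Table 3.
FEATURE_GROUPS = ["R-R", "Seg", "HBF", "HOS", "NSeg"]

def all_configurations(groups):
    """Every non-empty combination of feature groups, smallest first."""
    return [combo
            for size in range(1, len(groups) + 1)
            for combo in combinations(groups, size)]

configs = all_configurations(FEATURE_GROUPS)
print(len(configs))
```

Each tuple in `configs` identifies one model to train and evaluate, for example `("R-R", "Seg")` for the combination of R-R intervals and segmentation features.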
The most remarkable observation is that when R-R
features are not included in the model, it has been
impossible to obtain more than 55.4% of mean accuracy.
The other feature groups, when included alone
in the model, always lead to an accuracy below 50%.
Best overall performances are obtained with the
combination of R-R intervals and segmentation fea-
tures with 83.0% accuracy. The addition of any other
features to this selection always leads to a lower accu-
racy. In particular, the normalization of the segmen-
tation features with respect to each patient provides a
lower accuracy when these features are coupled with
R-R intervals than when their non normalised version
is used. It is also interesting to note that R-R interval
features yield 80.8% of accuracy by themselves and
are clearly the most important features to include in
the model.
The weights included in the SVM to take the un-
balanced class ratio into account are also of major im-
portance. If no weights are defined, the best average
accuracy that can be obtained by the SVM model de-
creases to 54.3%, with an accuracy of only 7.4% for
class S and of 33.0% for class F which is clearly un-
acceptable.
The weighted SVM model with the best selection
(or any of the top three selections in Table 3) achieves
performances significantly higher than previously re-
ported models, with a reduced number of features.
Furthermore, the weighted SVM model yields better
results for each of the pathological classes with a class
accuracy always over 80%.
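Table 2: Distribution of heart beat classes in the two independent datasets.

| | N | S | V | F | Total |
|---|---|---|---|---|---|
| Training (beats) | 45801 | 938 | 3708 | 414 | 50861 |
| Training (%) | 90.05% | 1.84% | 7.29% | 0.81% | 100% |
| Test (beats) | 44202 | 1835 | 3204 | 388 | 49629 |
| Test (%) | 89.06% | 3.7% | 6.46% | 0.78% | 100% |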
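Table 3: Selection of the most interesting results out of the 31 configurations with the weighted SVM model (sorted in
decreasing average accuracy). Best and worst results of the SVM model with no weights (“raw” SVM) are also displayed.
Results of previously reported comparable models are also included. Class accuracies (N, S, V, F) and their average are in %.
The feature-set marks (R-R, Seg, HBF, HOS, NSeg) are given only where the text states them or two marks survive.

| Model | Feature sets | N | S | V | F | Avg. |
|---|---|---|---|---|---|---|
| Weighted SVM (Top 5) | R-R + Seg | 75.1 | 89.3 | 86.9 | 80.7 | 83.0 |
| | R-R | 77.8 | 63.8 | 86.9 | 94.6 | 80.8 |
| | two groups (unspecified) | 75.4 | 89.4 | 75.4 | 66.8 | 76.8 |
| | two groups (unspecified) | 83.8 | 78.7 | 73.0 | 35.1 | 67.6 |
| | unspecified | 88.5 | 78.6 | 74.2 | 4.6 | 61.5 |
| ... | | | | | | |
| Weighted SVM (Selected) | Seg | 63.2 | 61.5 | 80.2 | 16.8 | 55.4 |
| | two groups (unspecified) | 79.5 | 28.8 | 73.8 | 3.6 | 46.4 |
| | unspecified | 78.0 | 25.9 | 78.7 | 0.3 | 45.7 |
| | unspecified | 78.1 | 2.1 | 60.8 | 6.2 | 36.8 |
| ... | | | | | | |
| Weighted SVM (Bottom 3) | unspecified | 47.8 | 13.5 | 57.4 | 15.2 | 33.5 |
| | unspecified | 75.2 | 3.7 | 52.6 | 0.8 | 33.1 |
| | unspecified | 68.5 | 4.5 | 53.7 | 1.0 | 31.9 |
| “Raw” SVM (Best) | R-R + Seg | 96.2 | 7.4 | 80.5 | 33.0 | 54.3 |
| “Raw” SVM (Worst) | unspecified | 100.0 | 0 | 0 | 0 | 25.0 |
| Hierar. SVM (Park et al., 2008) | two groups (unspecified) | 86.2 | 82.6 | 80.8 | 54.9 | 76.1 |
| Weighted LDA (Chazal et al., 2004) | two groups (unspecified) | 86.7 | 53.3 | 67.3 | 71.6 | 69.7 |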
6 CONCLUSIONS
The classification of heart beats is of great impor-
tance for clinical applications involving the long-term
monitoring of the cardiac function. The main diffi-
culty is the extraction of discriminative features from
the heart beat time series. The goal of this work is
the assessment of the relevance of feature sets often
used in the literature. Five feature groups are consid-
ered: R-R intervals, segmentation features, HBF coef-
ficients, higher-order statistics and patient-normalized
segmentation features.
For this purpose, this work has followed and mo-
tivated the use of:
AAMI guidelines for the establishment of reliable
classifiers and for the evaluation of their relative
merits;
inter-patient rather than intra-patient classifica-
tion;
weighted multi-class SVM models to address the
class unbalance problem;
preprocessing and dataset preparation according
to the literature;
and the average class accuracy as performance
measure.
Best results are obtained with the combination of
R-R intervals and segmentation features, with an aver-
age class accuracy of 83.0%. Any addition of features
to these two groups leads to a lower performance. In
particular, the normalization of the segmentation fea-
tures with respect to each patient provides a lower
accuracy when these features are coupled with R-R
intervals. When R-R intervals are not added in the
model, it has been impossible to obtain more than
55.4% (obtained by the segmentation features alone)
which is unacceptable. By contrast, it is interesting
to observe that R-R intervals alone already lead to
80.8% of average accuracy.
These results show that R-R intervals are clearly
the most significant features to include in a heart beat
classification problem. The second most important
features are morphological features. The other feature
groups such as Hermite basis function expansion co-
efficients, higher-order autocorrelation statistics and
patient-normalized features do not seem to serve the
classification performances.
These results obtained with the weighted SVM
model and R-R intervals combined with segmentation
features are significantly better than those of previously
reported inter-patient classification models. In particular,
the class accuracies for the pathological classes
all exceed 80%;
those classes are of crucial importance for the diagnosis.
Furthermore, these performances are achieved
with a reduced number of features. The choice of the
features is thus a task of major importance, as a bad
selection or too many features can lead to unacceptable
results.
Another important issue for classification of heart
beats resides in the class unbalance, which is addressed here
by including weights in the SVM model. Indeed,
the average accuracy obtained by the model with our
best feature selection decreases from 83.0% to 54.3%,
with an accuracy of only 7.4% for class S and of
33.0% for class F when these weights are removed,
leading to rather useless models that are unable to
grasp the importance of the pathological cases.
ACKNOWLEDGEMENTS
G. de Lannoy is funded by a Belgian F.R.I.A. grant.
This work was partly supported by the Belgian
“Région Wallonne” ADVENS and DEEP projects.
REFERENCES
Association for the Advancement of Medical Instrumenta-
tion (1998). Testing and reporting performance results
of cardiac rhythm and st segment measurement algo-
rithms. ANSI/AAMI EC38:1998.
Chazal, P. D., O’Dwyer, M., and Reilly, R. B. (2004). Auto-
matic classification of heartbeats using ecg morphol-
ogy and heartbeat interval features. Biomedical Engi-
neering, IEEE Transactions on, 51:1196–1206.
Cheng, W. and Chan, K. (1998). Classification of electro-
cardiogram using hidden markov models. Engineer-
ing in Medicine and Biology Society, 1998. Proceed-
ings of the 20th Annual International Conference of
the IEEE, 1:143–146.
Christov, I., Gómez-Herrero, G., Krasteva, V., Jekova, I.,
Gotchev, A., and Egiazarian, K. (2006). Comparative
study of morphological and time-frequency ecg de-
scriptors for heartbeat classification. Med. Eng. Phys.,
28(9):876–887.
Clifford, G. D., Azuaje, F., and McSharry, P. (2006). Ad-
vanced Methods And Tools for ECG Data Analysis.
Artech House, Inc., Norwood, MA, USA.
François, D. (2008). Feature selection. In Wang, J., ed-
itor, Encyclopedia of data mining and warehousing,
second edition, Information Science Reference.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov,
P. C., Mark, R., Mietus, J., Moody, G., Peng, C.-K.,
and Stanley, H. (2000). PhysioBank, PhysioToolkit,
and PhysioNet: Components of a new research re-
source for complex physiologic signals. Circulation,
101(23):e215–e220.
Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. A.
(2006). Feature Extraction: Foundations and Appli-
cations (Studies in Fuzziness and Soft Computing).
Springer-Verlag New York, Inc., Secaucus, NJ, USA.
Lagerholm, M., Peterson, C., Braccini, G., Edenbrandt,
L., and Sörnmo, L. (2000). Clustering ECG com-
plexes using Hermite functions and self-organizing
maps. Biomedical Engineering, IEEE Transactions
on, 47(7):838–848.
Melgani, F. and Bazi, Y. (2008). Classification of electro-
cardiogram signals with support vector machines and
particle swarm optimization. Information Technology
in Biomedicine, IEEE Transactions on, 12(5):667–
677.
Osowski, S. and Hoai, L. (2001). ECG beat recognition us-
ing fuzzy hybrid neural network. Biomedical Engi-
neering, IEEE Transactions on, 48(11):1265–1271.
Osowski, S., Hoai, L., and Markiewicz, T. (2004). Sup-
port vector machine-based expert system for reliable
heartbeat recognition. Biomedical Engineering, IEEE
Transactions on, 51(4):582–589.
Park, K., Cho, B., Lee, D., Song, S., Lee, J., Chee, Y., Kim,
I., and Kim, S. (2008). Hierarchical support vector
machine based heartbeat classification using higher
order statistics and Hermite basis function. In Com-
puters in Cardiology, 2008, pages 229–232.
Platt, J. C. (1999). Fast training of support vector machines
using sequential minimal optimization. In Schölkopf,
B., Burges, C., and Smola, A., editors, Advances in
Kernel Methods. The MIT Press.
Vapnik, V. N. (1999). The Nature of Statistical Learning
Theory (Information Science and Statistics). Springer.
Yinggang, Z. and Qinming, H. (2006). An unbalanced
dataset classification approach based on v-support
vector machine. In Intelligent Control and Automa-
tion, 2006. WCICA 2006. The Sixth World Congress
on, volume 2, pages 10496–10501.
BIOSIGNALS 2010 - International Conference on Bio-inspired Systems and Signal Processing
... Since the SVM can manage high-dimensional and large datasets this classifier constitutes a suited choice for many tasks which are related to biomedical classification problems [11]. Facial expression classification [27], text classification [58], beat detection [46] and QRS complex classification [23] are only a few examples for successful applications of the SVM. ...
... To reduce the contribution of dominating classes in the training process one has to weight the parameter C by adding a new parameter w i for each class [23]. ...
... The linear Kernel has been chosen because it is computationally easiest to apply. Furthermore, applying this Kernel one just needs to find the optimal parameter C. Moreover this, the linear Kernel turned out to be even more powerful compared to more complex Kernels in other investigations [23]. ...
... Use of the RR interval ratio can reduce the overlap between S and class N heartbeat and thus increase the S detection rate. First the ECG heart-beat time series are divided into two data sets one for training and another for testing as used by Lannoy [10] as shown in Table II. The two sets of heart beat time series are transformed into R-peak aligned time time series as proposed in Section IIIA and is shown in Figure 2 for the test record 213. ...
... The classification experiments were conducted based on the division of MIT-BIH records into two data sets as used by Chazal [6] and Lannoy [10] is shown in Table II. For the comparative results shown in Table III, we considered the papers which used the complete data of 44 records of MIT-BIH Arrhythmia Database excluding the other 4 records containing the paced beats of the respective 4 patients as recommended by AAMI standard. ...
Conference Paper
Full-text available
An ElectroCardiogram (ECG) inter-patient heartbeat time series classification method by a hierarchical system of based on support vector machine and Decision rule, using full heart-beat time series by alignment of R-peaks of all beats, is proposed. PQRST Time series of heart-beats having converted into equal length series by alignment of R-peaks of all heart-beats based on R-peak of largest length PQRST series in the data and by padding zeroes to the smaller length series on either side, was used in this experimentation. The main objective of this paper is to identify the abnormalities in ECG heart beats based on AAMI Categorization. Experiments were conducted on ECG data of 44 patients obtained from MIT-BIH Arrhythmia database. Results were compared with existing methods such as weighted support vector machine (SVM), hierarchical SVM and weighted linear discriminant analysis (LDA). Comparative analysis confirms the viability and superiority of the proposed approach in terms of Total classification accuracy (TCA). Proposed system achieved Sensitivities of 98.7%, 85.9%, 88.8%, 58.3%, PPV% of 98.53%, 82.2%, 89.9%, 85.6% for N, S, V and F classes respectively and a TCA of 97.3%.
Article
Full-text available
In this paper, one-dimensional Discrete Anamorphic Stretch Transform is proposed as an additional pre-processor for the feature extraction of the ECG signal using discrete wavelet transform in order to enhance the arrhythmia classification accuracy. Three DAST kernels: linear, sublinear, and superlinear kernels are proposed for enhancing the morphological features of the QRS complex. Its effectiveness is evaluated using two classifiers: feed-forward-based neural network and support vector machine with radial basis function. The MIT–BIH arrhythmia database and the generic cardiac beat classes such as normal (N), supraventricular ectopic (S), ventricular ectopic (V), fusion (F) and unknown beat (Q) are used for evaluating the proposed pre-processor. The training and testing of the classifier follow an inter-patient as well as intra-patient procedures. The classifier with SVM_RBF and the proposed pre-processor using DAST result in an increase in the average accuracy, sensitivity, specificity, positive predictivity, F-score and overall accuracy by 1.29%, 15.63%, 3.7%, 35.7%, 20.66%, and 2.796%, respectively, compared to that without DAST. The percentage improvement in the above performance metrics using ANN Classifier with DAST is 2.99%, 27.73%, 6.83%, 64.27%, 31.53% and 6.48%, respectively, compared to that without DAST. The morphological features obtained using DAST and DWT are also combined with RR-interval features. The combined feature set is found to have better classification accuracy than that using only morphological features. The accuracy of the proposed classifier is also found to be improved compared to many of the standard ECG classifiers reported in the literature.
Chapter
Full-text available
Blind Source Separation approaches have proved their efficiency to solve problems dealing with recovering a set of underlying sources from recoded observations without any a priori knowledge on the mixture process and sources. For this reason, we propose to use them to extract the true fetal ECG signal and consequently to calculate its instantaneous heart rate. Thus, we aim the application of the Robust Second-Order Blind Identification (RSOBI) algorithm, which exploits non-stationarity properties and second-order statistics, to a set of ECG mixtures recorded on pregnant mother. The obtained results show that we can separate original mixtures into 3 main sources which can be considered as the fetal ECG, the maternal ECG and noise. The recovered fetal ECG signals were found very clean and have permitted to perform fetal instantaneous heart rate calculation with a high precision.KeywordsBlind source separationFetal ECGInstantaneous heart rateRSOBI
Chapter
Electrocardiography (ECG) is a test at that checks the electric activity of the coronary heart. Arrhythmia is a sort of coronary heart ailment characterized through abnormal heartbeats. The prognosis is primarily based totally at the hobby of the R top withinside the ECG sign. The maximum not unusual place sort of arrhythmia is atrial fibrillation. The heartbeat will become abnormal and fast because of this. In order to stay a healthful life, it is far important to repair a everyday coronary heart rhythm. The present-day technique of detecting arrhythmia is to connect the tool to a lead and ship an ECG sign to a health practitioner, while the occasion is occurring. The uncooked ECG sign acquired from the present-day database is preprocessed the use of the FFT filter. The diploma of the polynomial equation is decided through the wide variety of factors that have to match in an effort to create an easy curve that replicates the HRV sign A polynomial diploma of n = 6 equation yields the high-quality outcomes in becoming the HRV sign. Statistical and wavelet parameters are mixed right into an unmarried set of parameters in hybrid with curve becoming. The performance of the proposed set of rules is in comparison with that of different algorithms and evaluated on an MIT database. The proposed offline technique has an accuracy of 94%.KeywordsSignalFeaturesClassifiersFFT
Article
Full-text available
An arrhythmia classification model based on an adaptive boosting algorithm is proposed in this paper. According to the AAMI standard, 15 kinds of abnormal cardiac rhythms are grouped and the datasets are segmented by the non-crossover method. The electrocardiogram (ECG) signals are denoised by the filter method, and then divided into fixed-length ECG beats, and five features are extracted from time-domain and frequency-domain. Then, the base classifier of the algorithm and its optimal algorithm parameters is selected to realize the multi-classification of cardiac anomalies, aiming at mining hidden knowledge from human physiological data to detect human health status, making the diagnosis process more automatic, efficient, and intelligent.
Chapter
The report of World Health Organization (WHO) specifies that the diagnosis and treatment of cardiovascular diseases are challenging tasks. To study the electrical conductivity of the heart, Electrocardiogram (ECG) which is an inexpensive diagnostic tool, is used. Classification is the most well-known topic for arrhythmia detection related to cardiovascular disease. Many algorithms have been evolved for the classification of heartbeat arrhythmia in the previous few decades using the CAD system. In this paper, we have developed a new deep CNN (11-layer) model for automatically classifying ECG heartbeats into five different groups according to the ANSI-AAMI standard (1998) without using feature extraction and selection techniques. The experiment is performed on publicly available Physionet MIT-BIH database and evaluated results are then compared with the existing works mentioned in the literature. To handle the problem of minority classes as well as the class imbalance problem, the database has been oversampled artificially using SMOTE technique. The augmented ECG database was employed for training the model while the testing was performed on the unseen dataset. On evaluation of the results from the experiment, we found that the proposed CNN model performed better in comparison to the experiments mentioned in other papers in terms of accuracy, sensitivity, and specificity. abstract environment.
Article
Full-text available
Objective: To train convolutional networks using multi-lead ECG data and classify new data accurately to provide reliable information for clinical diagnosis. Methods: The data were pre-processed with a bandpass filter, and signal framing was adopted to adjust the data of different lengths to the same size to facilitate network training and prediction. The dataset was expanded by increasing the sample size to improve the detection rate of abnormal samples. A depth-wise separable convolution structure was used for more specific feature extraction for different channels of twelve-lead ECG data. We trained the two classifiers for each label using the improved DenseNet to classify different labels. Results: The propose model showed an accuracy of 80.13% for distinguishing between normal and abnormal ECG with a sensitivity of 80.38%, a specificity of 79.91% and a F1 score of 79.35%. Conclusions: The model proposed herein can rapidly and effectively classify the ECG data. The running time of a single dataset on GPU is 33.59 ms, which allows real-time prediction to meet the clinical requirements.
Conference Paper
In this paper, an efficient heart beat classification algorithm suitable for implementation on mobile devices is presented. A simplified ECG model is used for feature extraction in the time domain. The QRS complex is modeled using straight lines, while P and T waves are modeled using parabolas. The model parameters are estimated by minimizing the root mean square (RMS) of the model error. Heart beats are classified as one of the following: normal (N), supraventricular (S) and Ventricular (V) ectopic beats using a feed-forward neural network. A series of tests have been performed to evaluate the classification algorithm using the MIT-BIH arrhythmia database ECG signals subset and expressed in the terms of sensitivity (Se), specificity (Sp) and accuracy (Acc). The best results were achieved when the classification algorithm was applied on the third model set. The proposed algorithm has been implemented as a J2ME mobile application. It has been tested on signals recorded by a telemedicine health care system and have achieved an average accuracy above 93%.
Article
Full-text available
An abstract is not available.
Conference Paper
Full-text available
The heartbeat class detection of the electrocardiogram is important in cardiac disease diagnosis. For detecting morphological QRS complex, conventional detection algorithm have been designed to detect P, QRS, T wave. However, the detection of the P and T wave is difficult because their amplitudes are relatively low, and occasionally they are included in noise. We applied two morphological feature extraction methods: higher-order statistics and Hermite basis functions. Moreover, we assumed that the QRS complexes of class N and S may have a morphological similarity, and those of class V and F may also have their own similarity. Therefore, we employed a hierarchical classification method using support vector machines, considering those similarities in the architecture. The results showed that our hierarchical classification method gives better performance than the conventional multiclass classification method. In addition, the Hermite basis functions gave more accurate results compared to the higher order statistics.
Article
Full-text available
An integrated method for clustering of QRS complexes is presented which includes basis function representation and self-organizing neural networks (NN's). Each QRS complex is decomposed into Hermite basis functions and the resulting coefficients and width parameter are used to represent the complex. By means of this representation, unsupervised self-organizing NN's are employed to cluster the data into 25 groups. Using the MIT-BIH arrhythmia database, the resulting clusters are found to exhibit a very low degree of misclassification (1.5%). The integrated method outperforms, on the MIT-BIH database, both a published supervised learning method as well as a conventional template cross-correlation clustering method.
Article
Full-text available
This paper presents a new solution to the expert system for reliable heartbeat recognition. The recognition system uses the support vector machine (SVM) working in the classification mode. Two different preprocessing methods for generation of features are applied. One method involves the higher order statistics (HOS) while the second the Hermite characterization of QRS complex of the registered electrocardiogram (ECG) waveform. Combining the SVM network with these preprocessing methods yields two neural classifiers, which have been combined into one final expert system. The combination of classifiers utilizes the least mean square method to optimize the weights of the weighted voting integrating scheme. The results of the performed numerical experiments for the recognition of 13 heart rhythm types on the basis of ECG waveforms confirmed the reliability and advantage of the proposed approach.
Article
Full-text available
A method for the automatic processing of the electrocardiogram (ECG) for the classification of heartbeats is presented. The method allocates manually detected heartbeats to one of the five beat classes recommended by ANSI/AAMI EC57:1998 standard, i.e., normal beat, ventricular ectopic beat (VEB), supraventricular ectopic beat (SVEB), fusion of a normal and a VEB, or unknown beat type. Data was obtained from the 44 nonpacemaker recordings of the MIT-BIH arrhythmia database. The data was split into two datasets with each dataset containing approximately 50,000 beats from 22 recordings. The first dataset was used to select a classifier configuration from candidate configurations. Twelve configurations processing feature sets derived from two ECG leads were compared. Feature sets were based on ECG morphology, heartbeat intervals, and RR-intervals. All configurations adopted a statistical classifier model utilizing supervised learning. The second dataset was used to provide an independent performance assessment of the selected configuration. This assessment resulted in a sensitivity of 75.9%, a positive predictivity of 38.5%, and a false positive rate of 4.7% for the SVEB class. For the VEB class, the sensitivity was 77.7%, the positive predictivity was 81.9%, and the false positive rate was 1.2%. These results are an improvement on previously reported results for automated heartbeat classification systems.
Article
Full-text available
The prompt and adequate detection of abnormal cardiac conditions by computer-assisted long-term monitoring systems depends greatly on the reliability of the implemented ECG automatic analysis technique, which has to discriminate between different types of heartbeats. In this paper, we present a comparative study of the heartbeat classification abilities of two techniques for extraction of characteristic heartbeat features from the ECG: (i) QRS pattern recognition method for computation of a large collection of morphological QRS descriptors; (ii) Matching Pursuits algorithm for calculation of expansion coefficients, which represent the time-frequency correlation of the heartbeats with extracted learning basic waveforms. The Kth nearest neighbour classification rule has been applied for assessment of the performances of the two ECG feature sets with the MIT-BIH arrhythmia database for QRS classification in five heartbeat types (normal beats, left and right bundle branch blocks, premature ventricular contractions and paced beats), as well as with five learning datasets-one general learning set (GLS, containing 424 heartbeats) and four local sets (GLS+about 0.5, 3, 6, 12 min from the beginning of the ECG recording). The achieved accuracies by the two methods are sufficiently high and do not show significant differences. Although the GLS was selected to comprise almost all types of appearing heartbeat waveforms in each file, the guaranteed accuracy (sensitivity between 90.7% and 99%, specificity between 95.5% and 99.9%) was reasonably improved when including patient-specific local learning set (sensitivity between 94.8% and 99.9%, specificity between 98.6% and 99.9%), with optimal size found to be about 3 min. The repeating waveforms, like normal beats, blocks, paced beats are better classified by the Matching Pursuits time-frequency descriptors, while the wide variety of bizarre premature ventricular contractions are better recognized by the morphological descriptors.
Chapter
In the history of research of the learning problem one can extract four periods that can be characterized by four bright events: (i) Constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, (iv) constructing the alternatives to neural networks.
Conference Paper
Support vector machine (SVM) has been extensively studied and has shown remarkable success in many applications. However, when faced with unbalanced datasets, the SVM can not get ideal classification result and even in some cases the classification ability was very bad and unaccepted. The V-support vector machine (V-SVM) is a new formulation of the regular SVM, and its parameter V has intuitive meanings compared with C (the penalty constant in SVM). By investigating the relation between SVM and V-SVM, we gave an equation between V and C, meanwhile we analyzed the factor behind the classification failure of SVM on unbalanced dataset. Then a classification algorithm based on V-SVM was addressed to overcome this inconvenience. Experimental results show the effectiveness of the proposed algorithm
Article
The newly inaugurated Research Resource for Complex Physiologic Signals, which was created under the auspices of the National Center for Research Resources of the National Institutes of Health, is intended to stimulate current research and new investigations in the study of cardiovascular and other complex biomedical signals. The resource has 3 interdependent components. PhysioBank is a large and growing archive of well-characterized digital recordings of physiological signals and related data for use by the biomedical research community. It currently includes databases of multiparameter cardiopulmonary, neural, and other biomedical signals from healthy subjects and from patients with a variety of conditions with major public health implications, including life-threatening arrhythmias, congestive heart failure, sleep apnea, neurological disorders, and aging. PhysioToolkit is a library of open-source software for physiological signal processing and analysis, the detection of physiologically significant events using both classic techniques and novel methods based on statistical physics and nonlinear dynamics, the interactive display and characterization of signals, the creation of new databases, the simulation of physiological and other signals, the quantitative evaluation and comparison of analysis methods, and the analysis of nonstationary processes. PhysioNet is an on-line forum for the dissemination and exchange of recorded biomedical signals and open-source software for analyzing them. It provides facilities for the cooperative analysis of data and the evaluation of proposed new algorithms. In addition to providing free electronic access to PhysioBank data and PhysioToolkit software via the World Wide Web (http://www.physionet. org), PhysioNet offers services and training via on-line tutorials to assist users with varying levels of expertise.