Asymmetric–margin support vector machines for lung tissue classification
Jimison Iavindrasana, Adrien Depeursinge, Gilles Cohen, Antoine Geissbuhler and Henning Müller
Abstract—This paper concerns lung tissue classification using asymmetric–margin support vector machines (ASVM) to handle the imbalance of the positive and negative classes in a one–against–all multiclass classification problem. The hyperparameters of the algorithm are obtained by optimizing an upper bound of the leave–one–out error of the ASVM. The ASVM is applied to the dataset with its original distribution and to a dataset oversampled so that the ratio of the examples equals the prevalence of patients having the tissue in the database. The two versions of the ASVM models were compared with a model built with a conventional SVM. The ASVM improved the results obtained with a conventional SVM, and the incorporation of prior knowledge concerning the prevalence of the patients further improved the results obtained with the ASVM.
I. INTRODUCTION
Interstitial lung diseases (ILD) form a heterogeneous group of diseases containing more than 150 disorders of the lung tissue. Many of the diseases are rare and present unspecific symptoms. During the diagnostic process, all available information including the patient's personal data, medication, past medical history, host risk factors and laboratory tests (e.g. pulmonary function tests, hematocrit, ...) is meticulously analyzed to find any indicator of the presence of an ILD. Besides the patient's clinical data, imaging of the chest resolves ambiguities in a large number of cases by enabling the visual assessment of the lung tissue [1]. The most common imaging modality used is the chest X–ray because of its low cost and radiation dose. It is, however, sometimes of limited usefulness for the characterization of lung tissue, as the tissues are overlaid with other anatomical structures, which can make the reading difficult. The
gold standard imaging technique used in case of doubt is
the high–resolution computed tomography (HRCT), which
provides three–dimensional images of the lung tissue with
high spatial resolution. Most of the histological diagnoses
of ILDs are associated with a given combination of image
findings (i.e. abnormal lung tissue) [2]. The most common
lung tissue patterns are emphysema, ground glass, fibrosis,
micronodules and consolidation. These are characterized by
distinct texture properties in HRCT imaging. The detection
and characterization of the lung tissue patterns in HRCT is
time–consuming and requires experience. In order to reduce
the risk of omission of important tissue lesions and to ensure
the reproducibility of image interpretation, computer–aided
The authors are with the Service of Medical Informatics, University Hospitals of Geneva, Rue Gabrielle–Perret–Gentil 4, 1211 Geneva 14, Switzerland (phone: +41 22 372 8874; email: jimison.iavindrasana@sim.hcuge.ch). Henning Müller is also with the HES–SO Valais, TechnoArk 3, 3960 Sierre, Switzerland.
Fig. 1. Construction of the block instances from manually delineated ROIs.
diagnosis (CAD) was proposed several times for HRCT of
the lung [3], [4], [5], [6], [7], [8], [9], [10]. The typical
approaches use supervised machine learning to draw decision
boundaries in feature spaces spanned by texture attributes.
The reported performance of these approaches suggests that
these systems have the potential to be valuable tools in clin
ical routine by providing second opinions to the clinicians.
However, the CAD system must include a sufficient number of classes of lung tissue to cover the heterogeneous visual findings associated with ILDs. A CAD system that aims at detecting one single lung tissue pattern is of limited use, as the radiologist still needs to look for other pathological lung tissue patterns in the image series.
A major performance problem of multi–class CAD systems is the challenge of learning from highly imbalanced datasets. In a given dataset of HRCT images of patients affected with ILDs, the instances consist of manually delineated regions of interest (ROI) showing examples of the lung tissue patterns that are cut out into square blocks that may or may not overlap (see Figure 1). The resulting distributions of the classes depend both on the prevalence of each lung tissue sort and on the average ROI surface and can as a consequence be highly imbalanced. Although equal sensitivity and specificity are needed among the classes, most machine learning techniques favor the performance of the majority class, and research efforts are needed to ensure balanced performance among all classes.
In this article, support vector machines with asymmetric
margins (ASVM) are used to classify the ROIs.
The paper is structured as follows: Section 2 introduces
the method for handling imbalanced datasets with SVMs and
the estimation of the SVM/ASVM hyperparameters. Section
3 details the materials and the classification method used
followed by the presentation of the results in Section 4.
A discussion of the results is found in Section 5 and a
conclusion and future ideas in Section 6.
II. LEARNING IMBALANCED DATA SETS WITH ASVM
After a short review of the techniques to handle imbal
anced datasets the SVM algorithm is introduced followed by
details on the built–in dataset imbalance management and a
method to estimate the SVM/ASVM hyperparameters.
A. Imbalanced learning approaches in the literature
Two main approaches were proposed in the literature to manage imbalanced datasets: the resampling strategy and the algorithm–based strategy [11]. The first one is a data–driven strategy performed by down–sampling the majority class or oversampling the minority class. This method has many variants with respect to the resampling technique: resampling at random, undersampling by removing redundant or noisy majority examples [12], oversampling with synthetic examples drawn using clustering algorithms [13] and [14], and oversampling positive examples located near the decision function [15]. A comparative study of the available resampling strategies was carried out by [16] with the C4.5 algorithm, where random undersampling and oversampling outperformed all other resampling strategies. The second strategy (algorithm–driven) consists of altering the misclassification cost of the classes, such as in [17], or altering the data representation to achieve a high separability of the data (see for example [18]). A comparison of the resampling and cost–sensitive strategies with SVMs can be found in [14]. In this comparative study, the SVM with asymmetric margins was used and it outperformed the resampling technique (a combination of undersampling and oversampling with artificial examples generated with the k–means algorithm).
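As a concrete illustration of the data–driven strategy, random oversampling of the minority class can be sketched in a few lines (a minimal numpy sketch; the function name and the fixed seed are ours, not taken from the cited studies):

```python
import numpy as np

def random_oversample(X, y, minority=+1, seed=0):
    # Duplicate randomly chosen minority examples until both classes
    # have the same number of instances (resampling at random).
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.concatenate([idx_maj, idx_min, extra])
    return X[keep], y[keep]
```

Random undersampling is the mirror image: instead of duplicating minority examples, majority examples are discarded at random until the class sizes match.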
B. Support Vector Machines
A maximum margin classifier looks for an optimal hyperplane separating the training dataset such that the distance of the training points to the optimal hyperplane is maximized. This assumes that the training data are separable. Finding this optimal hyperplane is equivalent to solving the following quadratic optimization problem:

min (w^T w)  s.t.  y_i(w^T x_i + b) >= 1    (1)

where w is a vector perpendicular to the hyperplane, b is a scalar value and {x_i, y_i}_{i=1}^N are the training points (x_i in R^d, y_i in {-1, 1}, N the number of examples and d the number of variables). If certain conditions hold and using the Lagrangian formulation, the previous problem is equivalent to its dual, which is a quadratic optimization problem and which can be solved using several optimization techniques:

max_{alpha >= 0}  theta(alpha) = sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j    (2)
s.t.  sum_i alpha_i y_i = 0

where the alpha_i are the Lagrangian multipliers. A support vector machine (SVM) is a maximum margin classifier that uses only the points on both sides of the margin, the support vectors (points x_i for which alpha_i > 0), to build a model.
For a non–separable training dataset, penalty variables xi_i are introduced to soften the constraints of the maximum margin formulation (1). The penalty variables xi_i behave as follows: 0 < xi_i <= 1 if the point is on the correct side of the hyperplane and xi_i > 1 if the point is on the wrong side. A cost variable C is also introduced to control the trade–off between the width of the margin and the points within the margin. The final goal of the SVM classifier is then to maximize the margin while minimizing the total sum of the penalties, and thus equation (1) becomes:

min (w^T w) + C sum_i xi_i^p    (3)
s.t.  y_i(w^T x_i + b) >= 1 - xi_i,  xi_i >= 0

The primal formulation of the SVM can be solved using [19] for separable cases (Eq. (1)) and [20] for non–separable cases (Eq. (3)). However, the dual problem is often solved because the duality theory provides a convenient way to deal with the constraints. The dual optimization problem can also be written in terms of dot products, permitting the use of kernel functions. The kernel trick makes it possible to apply the maximum margin algorithm to a transformed version of a non–separable dataset (the feature space) via a mapping function phi. The related dual problem can be expressed as
max_alpha  2 alpha^T e - alpha^T (G(K) + (1/C) I_n) alpha    (4)
s.t.  alpha >= 0,  alpha^T y = 0

where e is the n–vector of ones, alpha in R^n, the Gram matrix G(K) is defined by G_ij(K) = [K]_ij y_i y_j = k(x_i, x_j) y_i y_j, I_n is the n x n identity matrix and alpha >= 0 means alpha_i >= 0, i = 1,...,n. The transformation function phi is integrated into the definition of the Gram matrix. According to Mercer's theorem, (4) can be expressed by transforming the input data with phi and taking the dot product to define the kernel, or by taking any kernel directly and using it without knowing the function phi. One such kernel is the Gaussian kernel (also called the radial basis function (RBF) kernel), expressed as K(x_i, x_j) = phi(x_i)^T phi(x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)). For such a kernel, the misclassification cost C and the kernel hyperparameter sigma require optimization. The graph (A) of Figure 3 illustrates the data classification with an SVM in a feature space.
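To make the kernelized dual concrete, the following sketch fits it by simple projected gradient ascent on a toy dataset. This is our own simplification, not the solver used in the paper: the bias term is dropped, which removes the equality constraint alpha^T y = 0, leaving only the box constraint alpha >= 0; a production solver such as libSVM handles the full problem.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def fit_l2svm_dual(X, y, C=10.0, sigma=1.0, lr=0.01, iters=2000):
    # Projected gradient ascent on the L2-SVM dual of Eq. (4):
    #   max_a 2 a^T e - a^T (G(K) + I/C) a,  a >= 0   (bias term omitted)
    K = rbf_kernel(X, sigma)
    G = K * np.outer(y, y)                 # Gram matrix G_ij = y_i y_j K_ij
    M = G + np.eye(len(y)) / C
    a = np.zeros(len(y))
    for _ in range(iters):
        grad = 2 * np.ones(len(y)) - 2 * M @ a
        a = np.maximum(0.0, a + lr * grad)  # project back onto a >= 0
    return a, K

def predict(a, y, K):
    # decision value f(x_j) = sum_i a_i y_i k(x_i, x_j)
    return np.sign(K @ (a * y))
```

Here prediction is evaluated on the training points only; for a new point one would compute its kernel values against the support vectors (the points with a_i > 0).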
Many researchers consider SVMs one of the best classification algorithms due to their theoretical foundation based on structural risk minimization, implying a better generalization performance [21]. However, SVMs may provide bad results if they are used with the wrong parameters. The usual way to find the parameters of SVMs is to scan a range of possible values of the parameters, evaluate the classifier with a data splitting scheme such as cross–validation or a bootstrapping procedure, and then select the values providing the best performance. A better method to measure the generalization performance is to evaluate SVMs with the leave–one–out procedure during the grid search. These processes are expensive with respect to computation time because they require an SVM resolution
at each step. A more efficient way to choose the SVM
parameters is to take advantage of the underlying theory
using the bound of the leave–one–out error.
For the SVM with an RBF kernel and in the case of separable training data (hard–margin SVM), Vapnik showed that the leave–one–out error is upper bounded by 4 R^2 ||w||^2 (the radius margin bound) [21]. R is the radius of the smallest sphere containing all phi(x_i) and is the solution of the following optimization problem:

max_beta  1 - beta^T K beta
s.t.  0 <= beta_i, i = 1,...,n
      e^T beta = 1

This bound of the leave–one–out error can be used to estimate the parameter sigma of the RBF kernel and the soft margin parameter C. The reader is referred to [22] for a survey of SVM error bound estimation.
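As an illustration, R^2 can be approximated by a few Frank–Wolfe iterations over the simplex constraint (0 <= beta_i, e^T beta = 1). This is our own sketch of one possible solver for the problem above, not the solver used in the paper; it assumes an RBF kernel so that K_ii = 1.

```python
import numpy as np

def radius_squared(K, iters=500):
    # Frank-Wolfe for  max_b 1 - b^T K b  over the simplex
    # (equivalently: minimize b^T K b with b_i >= 0, sum b_i = 1).
    n = K.shape[0]
    b = np.full(n, 1.0 / n)                # feasible starting point
    for t in range(iters):
        g = 2 * K @ b                      # gradient of b^T K b
        j = int(np.argmin(g))              # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
        e_j = np.zeros(n)
        e_j[j] = 1.0
        b = (1 - gamma) * b + gamma * e_j  # stays on the simplex
    return 1.0 - b @ K @ b
```

Sanity checks: if all points coincide (K all ones), the sphere degenerates and R^2 = 0; for two points mapped far apart (K close to the identity), R^2 approaches 0.5.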
C. SVM with Asymmetrical Misclassification Cost
The SVM formulations above (Eq. 2) mean that the misclassification costs of positive and negative examples (in the case of a binary classification) are the same. This SVM formulation may be inappropriate for problems with a high imbalance between classes or those whose error penalty is not the same for each class.
The SVM algorithm natively implements a cost–sensitive strategy. For this purpose, two misclassification costs (C+ for y_i = +1 and C- for y_i = -1) are introduced. In this case, the primal formulation of the SVM is:
min_{w,b,xi}  <w, w> + C+ sum_{i in i+} (xi_i^+)^2 + C- sum_{i in i-} (xi_i^-)^2    (5)
s.t.  y_i(<w, Phi(x_i)> + b) >= 1 - xi_i^+,  i in i+
      y_i(<w, Phi(x_i)> + b) >= 1 - xi_i^-,  i in i-
      xi_i >= 0, i = 1,...,n

where i+ = {i | y_i = +1} and i- = {i | y_i = -1}. The corresponding dual form is:

max_alpha  2 alpha^T e - alpha^T (G(K) + (1/C+) I_n^+ + (1/C-) I_n^-) alpha    (6)
s.t.  alpha >= 0,  alpha^T y = 0

where e, alpha and G(K) have the same expressions as in (Eq. 4) and I_n^+ (resp. I_n^-) is a diagonal matrix with a 1 for i in i+ (resp. i in i-) and 0 elsewhere. It is important to highlight that for an identical misclassification cost for positive and negative examples, we obtain the formulation (Eq. 3) with (Eq. 4). Figure 3 illustrates the differences between SVM and ASVM.
Other approaches were proposed in the literature to handle imbalanced datasets with SVMs. A naive post–processing method consists of shifting the separating hyperplane away from the positive examples. Another one, proposed in [17], applies a conformal transformation in the feature space to achieve a high separability of the training data.
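In the dual of Eq. (6), the asymmetry reduces to a per–class diagonal penalty: 1/C+ on positive examples, 1/C- on negative ones. The sketch below makes that explicit with the same simplified projected-gradient solver idea as before (our own simplification: the bias term is omitted so the equality constraint alpha^T y = 0 drops out; real solvers such as libSVM keep it):

```python
import numpy as np

def fit_asvm_dual(X, y, C_pos, C_neg, sigma=1.0, lr=0.01, iters=3000):
    # Projected gradient ascent on the asymmetric-margin dual (Eq. 6):
    #   max_a 2 a^T e - a^T (G(K) + (1/C+) I+ + (1/C-) I-) a,  a >= 0
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))            # RBF kernel matrix
    G = K * np.outer(y, y)                        # Gram matrix
    inv_c = np.where(y > 0, 1.0 / C_pos, 1.0 / C_neg)  # per-class penalty
    M = G + np.diag(inv_c)
    a = np.zeros(len(y))
    for _ in range(iters):
        a = np.maximum(0.0, a + lr * (2 - 2 * M @ a))
    return np.sign(K @ (a * y))                   # decision on training points
```

Setting C_pos > C_neg widens the margin on the negative side, which is the intended effect when positives are the rare class.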
Fig. 2. Illustration of an SVM.
Fig. 3. Illustration of the asymmetric–margin SVM on a toy classification problem. Squares represent positive and circles negative examples; dark symbols stand for training and grey for test examples. The graph (A) shows the decision boundary induced by a conventional SVM. The graph (B) shows the new boundary obtained by introducing two cost hyperparameters, respectively for positive and negative examples. Notice the two positive examples (grey squares), which are misclassified in (A) and correctly classified in (B), and also the direction of the vector w perpendicular to the separating hyperplane in the graph (B).
D. Radius margin bound
As stated above, the radius margin bound (RMB) proposed by Vapnik is for hard–margin SVMs. To obtain the radius margin bound of the soft–margin SVM, the soft margin should be cast into the hard margin formulation, which is achieved using the following change:

w~ = (w, sqrt(C) xi)^T

and setting the i–th training point to (phi(x_i), y_i e_i / sqrt(C))^T.
The kernel function becomes K~(x_i, x_j) = K(x_i, x_j) + delta_ij / C, where delta_ij = 1 if i = j and 0 otherwise. The new radius margin bound is R~^2 ||w~||^2, where R~^2 is the objective value of

max_beta  1 + 1/C - beta^T (K + (1/C) I_n) beta    (7)
s.t.  0 <= beta_i, i = 1,...,n
      e^T beta = 1
To solve the asymmetric–margin problem, all SVM solvers use only one value of C and balance the misclassification with weights to obtain the values of C+ and C-. To exploit the radius–margin bound for asymmetrical misclassification costs, we introduce the following relation: C = C+ + C- = w+ C + w- C with w+ + w- = 1, i.e. the cost asymmetry is only taken into account during the resolution of the SVM optimization problem. We also use the heuristic proposed by Morik et al. [23]: the potential total cost of the false positives equals the potential total cost of the false negatives, i.e. the costs C+ and C- conform to the relation

C+ / C- = (number of negative training examples) / (number of positive training examples) = N- / N+    (8)

Thus, we obtain the values w+ = N- / N and w- = N+ / N, where N+, N- and N are respectively the number of positive, negative and all training examples. With these weights, it is now possible to introduce a higher cost when the SVM misclassifies positive examples than when it misclassifies negative examples.
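In code, the Morik heuristic amounts to a few lines (a sketch; the function name is ours):

```python
def morik_costs(n_pos, n_neg, C):
    # Morik heuristic: C+ / C- = N- / N+, with C = C+ + C-.
    # Weights: w+ = N-/N and w- = N+/N, so C+ = w+ * C and C- = w- * C.
    n = n_pos + n_neg
    w_pos = n_neg / n
    w_neg = n_pos / n
    return w_pos * C, w_neg * C
```

The rarer the positive class, the larger its share of the total cost budget C, which is exactly the behavior wanted for the minority tissue classes.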
E. Gradient descent algorithm and SVM model selection
Optimization is generally concerned with the minimization (or maximization) of a function f whose parameters are subject to one or more functional constraints; here f is a continuously differentiable function. A gradient descent algorithm is a method to solve such an optimization problem. It looks iteratively (until a stop criterion is reached) for a direction d_k in R^n, d_k != 0, from a starting point x_0 in R^n satisfying:

for all k > 1, there exists eps > 0 such that  grad f(x_k)^T d_k <= -eps ||grad f(x_k)|| ||d_k||    (9)

There are many ways to define the descent direction d_k. One strategy, known as the Newton method, is to use

d_k = -(grad^2 f(x_k))^{-1} grad f(x_k)    (10)

The computation of the second derivative (Hessian) of f is computationally expensive. The quasi–Newton method approximates the Hessian at each iteration. The algorithm for the hyperparameter selection using the quasi–Newton method is shown below, as introduced in [22].
The radius margin bound of the L2–SVM (p = 2 in (3)) with the RBF kernel is continuously differentiable with respect to the parameters C and sigma. Thus, the optimal parameters can be computed using the gradient descent algorithm according to the following:

d BRML2 / d V_t = d(R^2 ||w||^2) / d V_t = ||w||^2 (d R^2 / d V_t) + R^2 (d ||w||^2 / d V_t)    (11)

Algorithm for model selection using the gradient descent algorithm:
1. Initialize the SVM hyperparameters
2. Solve the SVM problem using a standard SVM algorithm
3. Minimize the RMB according to the values of the Lagrangian multipliers with a gradient descent algorithm
4. Go to step 2, or stop when the minimum of the RMB is reached

where V_t is the t–th parameter of the L2–SVM.
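As a rough illustration of steps 3 and 4, the descent loop can be sketched as below. This is a generic steepest-descent skeleton of ours, with a numerical gradient standing in for the analytic derivative of Eq. (11) and a toy quadratic standing in for the actual radius–margin bound; the paper uses a quasi–Newton direction instead of plain steepest descent.

```python
import numpy as np

def num_grad(f, v, h=1e-6):
    # Central-difference gradient, standing in for the analytic Eq. (11).
    g = np.zeros_like(v)
    for t in range(len(v)):
        e = np.zeros_like(v)
        e[t] = h
        g[t] = (f(v + e) - f(v - e)) / (2 * h)
    return g

def gradient_descent(f, v0, lr=0.1, tol=1e-8, max_iter=10000):
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        g = num_grad(f, v)
        if np.linalg.norm(g) < tol:   # stop criterion
            break
        v = v - lr * g                # d_k = -grad f (steepest descent)
    return v

# Toy stand-in for the bound as a function of the hyperparameters (C, sigma^2);
# a real run would re-solve the SVM and recompute R^2 ||w||^2 at each step.
bound = lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 0.5) ** 2
v_opt = gradient_descent(bound, [0.0, 0.0])
```

In practice the hyperparameters are optimized in log scale so that C and sigma^2 stay positive.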
For V = (C, sigma^2), the required derivatives are:

d ||w||^2 / d C = sum_{i=1}^n alpha_i^2 / C^2
d ||w||^2 / d sigma^2 = sum_{i,j=1}^n alpha_i alpha_j y_i y_j (d k~(x_i, x_j) / d sigma^2)
d R^2 / d C = sum_{i=1}^n beta_i (1 - beta_i) / C^2
d R^2 / d sigma^2 = sum_{i,j=1}^n beta_i beta_j (d k~(x_i, x_j) / d sigma^2)
d k~(x_i, x_j) / d sigma^2 = k~(x_i, x_j) ||x_i - x_j||^2 / (2 sigma^4)

III. MATERIAL AND METHODS
This section introduces the dataset we are using for the experiments, the software used for the extraction of features, the SVM software, the features built from the HRCT images and the learning process. The latter includes the implementation of the resampling methods, the model selection process and the details of the metrics used to assess the quality of the classification.
A. Dataset
The dataset used in this work is extracted from an in–house
multimedia collection of cases at the University Hospitals
of Geneva (HUG). The diagnosis of each ILD case was confirmed by a biopsy or an equivalent test (e.g. bronchoalveolar lavage, tuberculin skin test, Kveim test, ...). For
each collected patient, 99 clinical parameters associated with
13 of the most frequent diagnoses of ILDs were collected
from the electronic health record (EHR), describing the
patient’s clinical state at the time of the stay when the HRCT
image series were acquired. The lung tissue patterns related
to the ILD diagnosis were manually delineated in HRCT
images series (1mm slice thickness, no contrast agent) by
two experienced radiologists at the HUG. The distributions
of the 6 most represented tissue sorts are detailed in Table I
in terms of number of ROIs, volumes and number of block
instances obtained as shown in Figure 1. The size of the
blocks is 32 × 32 × 1 pixels.
B. Software
The image processing algorithms include wavelet–based
features and grey–level histograms and were implemented
in Java. The classification task is carried out with libSVM
implementing the SVM–L2 for binary classification [24].
TABLE I
DISTRIBUTION OF THE CLASSES IN TERMS OF ROIS, VOLUMES AND BLOCKS. THE NUMBER OF INSTANCES CORRESPONDS TO THE NUMBER OF BLOCKS.

label           ROIs   volume (liters)   blocks   patients
healthy          100         5.12          3043       7
emphysema         66         1.15           422       5
ground glass     427         4.91          2313      37
fibrosis         473         8.45          3113      38
micronodules     297        16.06          6133      16
consolidation    196         0.69            90      14
Total           1559        36.38         15114      87
C. Texture features
The features used to characterize the texture properties of the 6 lung tissue patterns are derived from grey–level histograms and tailored wavelet transforms (WT). The resulting feature space has a dimension of 46.
1) Grey–level histograms: Thanks to Hounsfield Units (HU), the pixel values in HRCT images correspond uniquely to the density of the observed tissue and thus contain essential information for the characterization of the lung tissue. To encode this information, 22 histogram bins of grey levels in the interval [-1050; 600[ are used as texture features. An additional feature related to the number of air pixels is computed as the number of pixel values below -1000 HU.
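These histogram features are straightforward to compute; a minimal numpy sketch (the function name and the per-block normalization are our own choices, not stated in the paper):

```python
import numpy as np

def grey_level_features(block_hu):
    # 22 histogram bins over [-1050, 600) Hounsfield Units, plus an
    # air-pixel fraction (values below -1000 HU), normalized by block size.
    edges = np.linspace(-1050, 600, 23)      # 23 edges -> 22 bins
    hist, _ = np.histogram(block_hu, bins=edges)
    n = block_hu.size
    air = np.count_nonzero(block_hu < -1000)
    return np.concatenate([hist / n, [air / n]])
```

For the 32 x 32 blocks of this paper, block_hu would be the block's pixel values converted to HU.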
2) Wavelet–based features: Near affine–invariant texture features are derived from a tailored WT. A frame transform is used to ensure translation–invariant descriptions of the lung tissue patterns [10], [25]. Based on the assumption that no predominant orientations are contained in the lung tissue patterns, a rotation–invariant nonseparable WT is implemented using isotropic polyharmonic B–spline scaling functions and wavelets [26], [27]. At last, an augmented scale progression is obtained using the quincunx lattice for upsampling the filters by a factor of sqrt(2) at each iteration of the WT. Within each unique subband i, the wavelet coefficients are characterized by a mixture of two Gaussians with fixed means mu^i_{1,2} = mu^i and distinct variances sigma^i_{1,2}. 24 wavelet–based features are thus generated by 8 iterations of the WT.
D. Imbalance management
The datasets we are using contain imbalance with respect to the class distribution. Three strategies were implemented to handle the imbalance. The first strategy implements the data–driven method (BAL): a random down–sampling of the majority class is carried out to obtain 50% of positive and negative cases during the model selection. The second strategy uses the cost–sensitive method (ASVM): the ratio of the original dataset is kept during the model selection process and the values of the cost hyperparameters are adjusted according to the imbalance rate of the positive and negative cases. The third strategy uses a combination of the resampling and cost–sensitive methods (ASVM + RES): the resampling level is based on the prevalence of the patients in the database (see Table I). If the ratio of the tissue is less than the prevalence, the examples of this class are oversampled (consolidation, emphysema, fibrosis, ground glass). If the ratio of the tissue is greater than the prevalence, the majority class is oversampled so that the ratio of positive and negative cases equals the prevalence (healthy, micronodules).
E. Model selection
The selection of the model was inspired by the experimental setup proposed in [28]. We have chosen 5 starting points for the gradient descent. These 5 starting points were applied
to 5 random training files. The parameters obtained with
an initialization point providing the least hyperparameter
variance are considered. The median is taken as a new starting point and is evaluated on the whole training set by means of a leave–one–patient–out (LOPO) procedure [29].
The final parameters are the median of those obtained from
this last step. We analyze the error rate, the sensitivity, the
specificity and the precision of the prediction on test sets.
As we are in a multi–class classification, we use the one–
against–all procedure.
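The LOPO protocol groups all blocks of one patient into the test fold so that no patient contributes to both training and testing. It can be sketched as a plain generator over patient identifiers (a sketch; the function name is ours):

```python
def leave_one_patient_out(patient_ids):
    # Yield (train_indices, test_indices) pairs, one fold per patient.
    # All blocks of the held-out patient go to the test fold together.
    unique = sorted(set(patient_ids))
    for p in unique:
        test = [i for i, q in enumerate(patient_ids) if q == p]
        train = [i for i, q in enumerate(patient_ids) if q != p]
        yield train, test
```

This patient-level split avoids the optimistic bias of a plain leave-one-out over blocks, since blocks cut from the same ROI are highly correlated.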
F. Model comparisons
In many classification projects, the accuracy is chosen as
the main performance criterion of a model. With imbalanced
datasets, we have to take into account the ability of the
classifier to predict the examples of each class (sensitivity,
specificity and precision). These four metrics are used to
measure the performance of each model. To assess the multiclass performance of the algorithms, the geometric mean is computed as follows:
A_geom = (prod_{l=1}^{Nclass} A_l)^{1/Nclass}    (12)

with Nclass the number of classes and A_l the class–specific accuracies.
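In code, the geometric mean of the class-specific accuracies is a one-liner (sketch; the function name is ours):

```python
import math

def geometric_mean(accuracies):
    # A_geom = (prod_l A_l) ** (1 / Nclass)
    n = len(accuracies)
    return math.prod(accuracies) ** (1.0 / n)
```

Unlike the arithmetic mean, a single class with accuracy 0 drives A_geom to 0, which is why this metric rewards balanced performance across all classes.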
To evaluate the best strategy for our dataset, we carried out
a McNemar test with Bonferroni correction on the prediction
results. This test measures if the predictions made with 2
models are significantly different from the statistical point
of view. We also use the area under the receiver operating characteristic curve (AUC) to rank the three strategies. Each strategy is assigned a score from 3 to 1 (best to worst) according to the AUC value.
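McNemar's test compares two classifiers through the discordant predictions only. The sketch below uses the exact (binomial) form of the test together with a Bonferroni correction; this is our own stdlib implementation for illustration, and the paper may have used the chi-square approximation instead.

```python
import math

def mcnemar_exact(b, c):
    # Exact (binomial) McNemar test on the two discordant counts:
    # b = cases model 1 got right and model 2 wrong, c = the reverse.
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # two-sided p-value under H0: discordant pairs split 50/50
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2.0 ** n
    return min(1.0, 2.0 * tail)

def bonferroni(p_values):
    # Bonferroni correction for m pairwise comparisons
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three strategies there are three pairwise comparisons, so each raw p-value is multiplied by 3 before being compared to the significance level.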
IV. RESULTS
Results of the model selection and the associated classification performance obtained with the various techniques are described in this section.
A. Model selection using the gradient descent
For the model selection, we used five starting points (C,σ)
on five random training sets: (1,1), (5,5), (5,1), (1,5) and
(10,1). Among these five initialization points, the first four
converged around the same region but a few initialization
TABLE II
PERFORMANCE ON consolidation VS. ALL CLASSES.

Consolidation    BAL    ASVM   ASVM + RES
Error            0.02   0.02   0.02
Sensitivity      0.39   0.40   0.40
Specificity      0.99   0.99   0.99
Precision        0.16   0.16   0.16
F–measure        0.23   0.23   0.23
AUC              0.69   0.69   0.69
TABLE III
PERFORMANCE ON emphysema VS. ALL CLASSES.

Emphysema        BAL    ASVM   ASVM + RES
Error            0.22   0.22   0.22
Sensitivity      0.45   0.46   0.46
Specificity      0.79   0.79   0.78
Precision        0.06   0.06   0.06
F–measure        0.10   0.10   0.10
AUC              0.62   0.62   0.62
points provided high variance on C. The median of these 20 intermediate parameters was used as a starting point for a LOPO cross–validation to obtain the final parameters.
The algorithm converges after 7 to 18 iterations and the
computation of these parameters varies from 10 minutes to
24 hours depending on the size of the resampled training
data.
B. Classification performance
Tables II, III, IV, V, VI and VII summarize the classification results obtained using the three models. The AUC values obtained with the various techniques are summarized in Figure 4. The best A_geom value of 0.752 was obtained using the ASVM+RES approach. It is followed by ASVM with A_geom = 0.749, and the worst performance is obtained with BAL with A_geom = 0.746.
V. DISCUSSION
The convergence of the four initialization points to the
same region indicates a consistency of the use of the RMB
TABLE IV
PERFORMANCE ON fibrosis VS. ALL CLASSES.

Fibrosis         BAL    ASVM   ASVM + RES
Error            0.21   0.19   0.20
Sensitivity      0.80   0.77   0.79
Specificity      0.79   0.82   0.80
Precision        0.50   0.53   0.51
F–measure        0.61   0.63   0.62
AUC              0.79   0.79   0.80
TABLE V
PERFORMANCE ON ground glass VS. ALL CLASSES.

Ground glass     BAL    ASVM   ASVM + RES
Error            0.46   0.46   0.45
Sensitivity      0.86   0.87   0.88
Specificity      0.48   0.48   0.50
Precision        0.23   0.23   0.24
F–measure        0.36   0.37   0.38
AUC              0.67   0.68   0.69
TABLE VI
PERFORMANCE ON healthy VS. ALL CLASSES.

Healthy          BAL    ASVM   ASVM + RES
Error            0.24   0.23   0.23
Sensitivity      0.95   0.95   0.95
Specificity      0.71   0.72   0.72
Precision        0.45   0.46   0.46
F–measure        0.61   0.62   0.62
AUC              0.83   0.83   0.84
TABLE VII
PERFORMANCE ON MICRONODULES VS. ALL CLASSES.

Micronodules    Error   Sensitivity   Specificity   Precision   F–measure   AUC
BAL             0.31    0.55          0.79          0.64        0.59        0.67
ASVM            0.32    0.53          0.79          0.63        0.57        0.66
ASVM+RES        0.30    0.45          0.87          0.69        0.54        0.66
for SVM hyperparameter estimation. The computation time
depends primarily on the size of the training set. During
the model selection process, i.e. the LOPO cross–validation, the
SVMs were run only twice per fold, and the algorithm
converged after 7 to 18 iterations.
With respect to the multiclass performance, the geometric
mean ranked the third strategy (ASVM+RES) as the best
choice for lung tissue classification, especially for the
fibrosis, ground glass and healthy tissues (see Figure 4).
The McNemar statistical test on each pair of these
strategies indicates that there are no significant differences
between the results obtained with the three strategies for the
classification of consolidation and emphysema. The McNemar
test also indicates no significant difference between ASVM
and ASVM+RES for the classification of healthy tissue.
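The McNemar test compares two classifiers through their disagreements on the same test examples. A minimal sketch with the standard chi-square approximation follows; the disagreement counts are illustrative, not the paper's:

```python
# Sketch: McNemar's test on paired classifier decisions.
# b = examples classifier A got right and B got wrong,
# c = examples A got wrong and B got right. Counts are illustrative.
import math

def mcnemar_p(b, c):
    """Two-sided McNemar test with continuity correction,
    chi-square approximation with 1 degree of freedom."""
    if b + c == 0:
        return 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

# Example: the two classifiers disagree on 40 + 25 examples.
p = mcnemar_p(40, 25)
print(round(p, 3))  # p > 0.05 -> no significant difference
```

Only the discordant pairs enter the statistic; examples both classifiers handle identically carry no information about which one is better.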
Table VIII summarizes the ranking of the three strategies
according to the AUC values for four tissue types.
The AUC was chosen to rank the strategies because
it takes into account both the sensitivity and the specificity
of the classifier, i.e. the rates of true positive and true
negative cases. Depending on the final use of the classification
models, the ranking in Table VIII may no longer hold.
For instance, if the F–measure were used to rank the three
strategies as in Table VIII, the ASVM and ASVM+RES
strategies would obtain the same rank. A ranking according
to the F–measure would agree with a ranking based on the AUC
except for fibrosis, where ASVM has the highest F–measure
(i.e. would be ranked as the best) even though it has the lowest
sensitivity.

TABLE VIII
RANKING OF THE THREE STRATEGIES ACCORDING TO THE AUC VALUES
(RANK 3 = BEST).

Strategy    fibrosis   ground glass   healthy   micronodules   Total ranking
BAL         1          1              1         3              6
ASVM        2          2              2         2              8
ASVM+RES    3          3              3         1              10

Fig. 4. AUC values obtained with the various techniques. The horizontal
bars at the top of the histogram indicate that the two results have no
significant difference according to the McNemar statistical test.
The BAL strategy outperforms the other strategies for the
classification of micronodules. The asymmetric margin of
the SVM altered the sensitivity of the model. The addition of
synthetic majority examples combined with an asymmetric–
margin SVM further deteriorated the sensitivity of the
model but improved the precision and the specificity of the
classifier. A possible explanation of this phenomenon is the
existence of outliers among the micronodules examples, which
were misclassified by the ASVM and ASVM+RES models,
thus decreasing the sensitivity and the precision.
The results obtained in the classification of healthy tissue
are of particular interest for clinical practice, where the three
models have a sensitivity equal to 95%. Indeed, an ASVM
model for the classification of healthy tissue can be used to
detect abnormal tissue types in HRCT.
Table II highlights the problem of using accuracy as a
performance metric in the classification of imbalanced
datasets. According to this table, the accuracy is around 98%
because the algorithm correctly classified 99% of the negative
examples, while the sensitivity and precision on the positive
examples are low.
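This effect can be reproduced with hypothetical counts (not the paper's actual confusion matrix): with 100 positive and 9 900 negative examples, a classifier that misses most positives still scores near-perfect accuracy:

```python
# Sketch: why accuracy is misleading on imbalanced data.
# Hypothetical counts (not the paper's actual confusion matrix):
# 100 positive vs. 9900 negative examples.
tp, fn = 39, 61       # positives found / missed
tn, fp = 9700, 200    # negatives correctly rejected / false alarms

accuracy = (tp + tn) / (tp + fn + tn + fp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)

print(f"accuracy    {accuracy:.3f}")     # ~0.97: looks excellent
print(f"sensitivity {sensitivity:.3f}")  # 0.39: most positives missed
print(f"specificity {specificity:.3f}")
print(f"precision   {precision:.3f}")    # ~0.16: most alarms are false
```

Because the negatives dominate the denominator, accuracy is driven almost entirely by specificity, which is why the positive-class metrics must be reported separately.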
The precision for the classification of consolidation, emphysema
and ground glass is very low, and more investigation
is needed to improve these results. This is probably
due to the characteristics of these tissues: they are not
very specific according to a discrimination measure of each of
these tissues against the rest. The latter was computed with the
Rayleigh quotient (the ratio of the between–class variance to
the within–class variance) on the dataset projected onto the
principal component axes. Figure 5 highlights the high
disparity of the consolidation examples in the principal component
space.
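The discrimination measure mentioned above can be sketched as follows, on a 1-D projection with synthetic data (the paper computes it on the principal-component projection of its own feature set):

```python
# Sketch: Rayleigh quotient (between-class variance / within-class
# variance) as a class-separability measure on projected features.
# Data are synthetic; the paper applies this to its PCA-projected set.
import numpy as np

def rayleigh_quotient(x, y):
    """x: 1-D projected feature values, y: labels in {-1, +1}."""
    classes = np.unique(y)
    overall_mean = x.mean()
    between = sum((x[y == c].mean() - overall_mean) ** 2 * (y == c).mean()
                  for c in classes)
    within = sum(x[y == c].var() * (y == c).mean() for c in classes)
    return between / within

rng = np.random.default_rng(0)
# Well-separated classes -> large quotient; overlapping -> small.
x_sep = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
x_ovl = np.concatenate([rng.normal(-0.3, 1, 200), rng.normal(0.3, 1, 200)])
y = np.array([-1] * 200 + [1] * 200)

print(rayleigh_quotient(x_sep, y))  # large: classes are discriminable
print(rayleigh_quotient(x_ovl, y))  # small: classes overlap
```

A small quotient for a tissue class against the rest indicates that the class means are close relative to the spread within each class, which is consistent with the low precision observed for these tissues.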
Many investigations will be carried out in the future to
improve the classification of the three lung tissues cited in
the previous paragraph. Increasing the number of examples
of these tissues may improve the classification results. The
consolidation tissues, for example, come from younger patients
compared with the other patients in the database. Another
avenue for future investigation is to vary the oversampling
ratio. The computational speed of the gradient descent method
(compared to a fine–grained grid search) and the stability of
the model will allow us to carry out more experiments with
respect to the oversampling ratio of the minority classes. The
addition of clinical features may also increase the classification
performance. The use of a kernel–based algorithm to transform
the input space into a linearly separable dataset did not provide
good results, and it is possible that the RBF kernel is not well
suited for the classification of these tissues. Other approaches
allowing the selection of an appropriate kernel also figure in
the list of future works [30].

Fig. 5. Projection of the consolidation vs. all dataset on the first three
principal component axes. The consolidation examples are labelled as +1.
VI. CONCLUSIONS
We presented in this paper the effectiveness of
asymmetric–margin SVMs for imbalanced lung tissue
classification. The introduction of prior knowledge of the
prevalence of the patients in the database to correct the ratio
of the examples improved the results of the algorithm. Artificial
cases were created using the k–means algorithm.
The conventional SVM was better only in the classification
of micronodules, due to the presence of outliers in the
examples. While the results obtained with ASVM for the
classification of fibrosis and healthy tissues are satisfactory,
more investigation is needed for the classification of
consolidation, emphysema, ground glass and also micronodules.
Increasing the number of cases, varying the oversampling
ratio, adding clinical features and selecting an appropriate
kernel for each classification task are the most important
directions for future investigation.
ACKNOWLEDGMENT
This work was supported by the Swiss National Science
Foundation (FNS) with grant 200020–118638/1 and
the equalization fund of Geneva University Hospitals and
the University of Geneva (grants 05–I–13 and 05–9–II).
REFERENCES
[1] K. R. Flaherty, T. E. King, Jr., G. Raghu, J. P. Lynch III, T. V.
Colby, W. D. Travis, B. H. Gross, E. A. Kazerooni, G. B. Toews,
Q. Long, S. Murray, V. N. Lama, S. E. Gay, and F. J. Martinez,
“Idiopathic interstitial pneumonia: What is the effect of a multidisciplinary
approach to diagnosis?” American Journal of Respiratory and Critical
Care Medicine, vol. 170, pp. 904–910, July 2004.
[2] W. R. Webb, N. L. Müller, and D. P. Naidich, Eds., High–Resolution
CT of the Lung. Philadelphia, PA, USA: Lippincott Williams &
Wilkins, 2001.
[3] S. Delorme, M.-A. Keller-Reichenbecher, I. Zuna, W. Schlegel,
and G. Van Kaick, “Usual interstitial pneumonia: Quantitative
assessment of high–resolution computed tomography findings by
computer–assisted texture–based image analysis,” Investigative
Radiology, vol. 32, no. 9, pp. 566–574, September 1997.
[4] C.-R. Shyu, C. E. Brodley, A. C. Kak, A. Kosaka, A. M. Aisen, and
L. S. Broderick, “ASSERT: A physician–in–the–loop content–based
retrieval system for HRCT image databases,” Computer Vision and
Image Understanding (special issue on content–based access for image
and video libraries), vol. 75, no. 1/2, pp. 111–132, July/August 1999.
[5] R. Uppaluri, E. A. Hoffman, M. Sonka, G. W. Hunninghake, and
G. McLennan, “Interstitial lung disease: A quantitative study using the
adaptive multiple feature method,” American Journal of Respiratory
and Critical Care Medicine, vol. 159, no. 2, pp. 519–525, February
1999.
[6] I. C. Sluimer, P. F. van Waes, M. A. Viergever, and B. van Ginneken,
“Computer–aided diagnosis in high resolution CT of the lungs,”
Medical Physics, vol. 30, no. 12, pp. 3081–3090, December 2003.
[Online]. Available: http://link.aip.org/link/?MPH/30/3081/1
[7] F. Chabat, G.-Z. Yang, and D. M. Hansell, “Obstructive lung
diseases: Texture classification for differentiation at CT,” Radiology,
vol. 228, no. 3, pp. 871–877, September 2003. [Online]. Available:
http://radiology.rsnajnls.org/cgi/content/abstract/228/3/871
[8] Y. Uchiyama, S. Katsuragawa, H. Abe, J. Shiraishi, F. Li, Q. Li, C.-T.
Zhang, K. Suzuki, and K. Doi, “Quantitative computerized analysis
of diffuse lung disease in high–resolution computed tomography,”
Medical Physics, vol. 30, no. 9, pp. 2440–2454, September 2003.
[9] T. Zrimec and J. S. J. Wong, “Improving computer aided disease
detection using knowledge of disease appearance,” in MEDINFO
2007. Proceedings of the 12th World Congress on Health (Medical)
Informatics, vol. 129. IOS Press, August 2007, pp. 1324–1328.
[10] A. Depeursinge, D. Sage, A. Hidki, A. Platon, P.-A. Poletti, M. Unser,
and H. Müller, “Lung tissue classification using Wavelet frames,” in
Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th
Annual International Conference of the IEEE. Lyon, France: IEEE
Computer Society, August 2007, pp. 6259–6262.
[11] N. Japkowicz and S. Stephen, “The class imbalance problem: A
systematic study,” Intelligent Data Analysis Journal, vol. 6, no. 5,
November 2002.
[12] M. Kubat and S. Matwin, “Addressing the curse of imbalanced training
sets: one–sided selection,” in Proceedings of the 14th International
Conference on Machine Learning. Morgan Kaufmann, 1997, pp.
179–186.
[13] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer,
“SMOTE: Synthetic minority over–sampling technique,” Journal of
Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[14] G. Cohen, M. Hilario, H. Sax, S. Hugonnet, and
A. Geissbuhler, “Learning from imbalanced data in surveillance
of nosocomial infection,” Artificial Intelligence in Medicine,
vol. 37, no. 1, pp. 7–18, May 2006. [Online]. Available:
http://dx.doi.org/10.1016/j.artmed.2005.03.002
[15] H. Han, W. Wang, and B. Mao, “Borderline–SMOTE: A new over–
sampling method in imbalanced data sets learning,” in International
Conference on Intelligent Computing (ICIC), 2005, pp. 878–887.
[16] J. V. Hulse, T. M. Khoshgoftaar, and A. Napolitano, “Experimental
perspectives on learning from imbalanced data,” in ICML, 2007, pp.
935–942.
[17] K. Veropoulos, C. Campbell, and N. Cristianini, “Controlling the sensi-
tivity of support vector machines,” in Proceedings of the International
Joint Conference on AI, 1999, pp. 55–60.
[18] G. Wu and E. Y. Chang, “Class–boundary alignment for imbalanced
dataset learning,” in ICML 2003 Workshop on Learning from
Imbalanced Data Sets, 2003, pp. 49–56.
[19] Y.-J. Lee and O. L. Mangasarian, “SSVM: A smooth support vector
machine for classification,” Computational Optimization and Applications,
vol. 20, no. 1, pp. 5–22, 2001.
[20] O. Chapelle, “Training a support vector machine in the primal,” Neural
Computation, vol. 19, pp. 1155–1178, 2007.
[21] V. Vapnik, Statistical Learning Theory. New York, NY: Wiley, 1998.
[22] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing
multiple parameters for support vector machines,” Machine Learning,
vol. 46, no. 1, pp. 131–159, 2002.
[23] K. Morik, M. Imhoff, P. Brockhausen, T. Joachims, and U. Gather,
“Knowledge discovery and knowledge validation in intensive care,”
Artificial Intelligence in Medicine, vol. 19, no. 3, pp. 225–249, 2000.
[24] K.-M. Chung, W.-C. Kao, C.-L. Sun, L.-L. Wang, and C.-J. Lin,
“Radius margin bounds for support vector machines with the RBF
kernel,” Neural Computation, vol. 15, no. 11, pp. 2643–2681, 2003.
[25] M. Unser, “Texture classification and segmentation using wavelet
frames,” IEEE Transactions on Image Processing, vol. 4, no. 11, pp.
1549–1560, November 1995.
[26] A. Depeursinge, D. Van De Ville, M. Unser, and H. Müller, “Lung
tissue analysis using isotropic polyharmonic B–spline wavelets,” in
MICCAI 2008 Workshop on Pulmonary Image Analysis, New York,
USA, September 2008, pp. 125–134.
[27] D. Van De Ville, T. Blu, and M. Unser, “Isotropic polyharmonic
B–Splines: Scaling functions and wavelets,” IEEE Transactions on
Image Processing, vol. 14, no. 11, pp. 1798–1813, November 2005.
[28] G. Rätsch, T. Onoda, and K.-R. Müller, “Soft margins for
AdaBoost,” Machine Learning, vol. 42, no. 3, pp. 287–320,
2001.
[29] M. Dundar, G. Fung, L. Bogoni, M. Macari, A. Megibow,
and B. Rao, “A methodology for training and validating
a CAD system and potential pitfalls,” International Congress
Series, vol. 1268, pp. 1010–1014, June 2004. CARS 2004
– Computer Assisted Radiology and Surgery, Proceedings of
the 18th International Congress and Exhibition. [Online]. Avail-
able: http://www.sciencedirect.com/science/article/B75814CHRSVD
6S/2/06d1476fa7e0028d30aa5db70037f836
[30] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “More
efficiency in multiple kernel learning,” in ICML, 2007, pp. 775–782.
WCCI2010 asymSVM