
Asymmetric–margin support vector machines for lung tissue classification

Jimison Iavindrasana, Adrien Depeursinge, Gilles Cohen, Antoine Geissbuhler and Henning Müller

Abstract—This paper concerns lung tissue classification using

asymmetric–margin support vector machine (ASVM) to handle

the imbalance of the positive and negative classes in a one–

against–all multiclass classification problem. The hyperparam-

eters of the algorithm are obtained using an optimization of

the upper bound of the leave–one–out error of the ASVM. The

ASVM is applied on the dataset with its original distribution

and oversampled so that the ratio of the examples is equal to

the prevalence of patients having the tissue in the database.

The two versions of the ASVM models were compared with a

model built with a conventional SVM. The ASVM improved the

results obtained with a conventional SVM. The incorporation

of prior knowledge concerning the prevalence of the patients

improved the results obtained with ASVM.

I. INTRODUCTION

Interstitial lung diseases (ILD) form a heterogeneous group of diseases containing more than 150 disorders of the lung tissue. Many of the diseases are rare and present unspecific symptoms. During the diagnostic process, all available information including the patient’s personal data, medication, past medical history, host risk factors and laboratory tests (e.g. pulmonary function tests, hematocrit, ...) is meticulously analyzed to find any indicator of the presence of an ILD. Besides the patient’s clinical data, imaging of the chest helps to resolve ambiguity in a large number of cases by enabling the visual assessment of the lung tissue [1]. The most common imaging modality used is the chest X–ray because of its low cost and radiation dose. It is sometimes of limited usefulness for the characterization of lung tissue, as the tissue is overlaid with other anatomical structures, which makes the reading difficult. The

gold standard imaging technique used in case of doubt is

the high–resolution computed tomography (HRCT), which

provides three–dimensional images of the lung tissue with

high spatial resolution. Most of the histological diagnoses

of ILDs are associated with a given combination of image

findings (i.e. abnormal lung tissue) [2]. The most common

lung tissue patterns are emphysema, ground glass, fibrosis,

micronodules and consolidation. These are characterized by

distinct texture properties in HRCT imaging. The detection

and characterization of the lung tissue patterns in HRCT is

time–consuming and requires experience. In order to reduce

the risk of omission of important tissue lesions and to ensure

the reproducibility of image interpretation, computer–aided diagnosis (CAD) has been proposed several times for HRCT of the lung [3], [4], [5], [6], [7], [8], [9], [10]. The typical approaches use supervised machine learning to draw decision boundaries in feature spaces spanned by texture attributes. The reported performance of these approaches suggests that these systems have the potential to be valuable tools in clinical routine by providing second opinions to the clinicians. However, the CAD system must include a sufficient number of classes of lung tissue to cover the heterogeneous visual findings associated with ILDs. A CAD system that aims at detecting a single lung tissue pattern is of limited use, as the radiologist still needs to look for other pathological lung tissue patterns in the image series.

Fig. 1. Construction of the block instances from manually delineated ROIs.

The authors are with the Service of Medical Informatics, University Hospitals of Geneva, 4, Rue Gabrielle–Perret–Gentil, 1211 Geneva 14, Switzerland (phone: +41 22 372 8874; email: jimison.iavindrasana@sim.hcuge.ch). Henning Müller is also with the HES–SO Valais, TechnoArk 3, 3960 Sierre, Switzerland.

A major performance problem of multi–class CAD sys-

tems is the challenge of learning from highly imbalanced

datasets. In a given dataset of HRCT images of patients

affected with ILDs, the instances consist of manually de-

lineated regions of interest (ROI) showing examples of the

lung tissue patterns that are cut out into square blocks that

may overlap or not (see Figure 1). The resulting class distributions depend both on the prevalence of each lung tissue type and on the average ROI surface, and can consequently be highly imbalanced. Although equal sensitivity and specificity are needed among the classes, most machine learning techniques favor the performance of the majority class, and research efforts are needed to ensure balanced performance among all classes.

In this article, support vector machines with asymmetric

margins (ASVM) are used to classify the ROIs.

The paper is structured as follows: Section 2 introduces

the method for handling imbalanced datasets with SVMs and

the estimation of the SVM/ASVM hyperparameters. Section


3 details the materials and the classification method used

followed by the presentation of the results in Section 4.

A discussion of the results is found in Section 5 and a

conclusion and future ideas in Section 6.

II. LEARNING IMBALANCED DATA SETS WITH ASVM

After a short review of the techniques to handle imbalanced datasets, the SVM algorithm is introduced, followed by details on the built–in dataset imbalance management and a method to estimate the SVM/ASVM hyperparameters.

A. Imbalanced learning approaches in the literature

Two main approaches were proposed in the literature to

manage imbalanced datasets: the resampling strategy and

the algorithm–based strategy [11]. The first one is a data–

driven strategy and performed by down–sampling the ma-

jority class or oversampling the minority class. This method

has many variants with respect to the resampling technique:

resampling at random, undersampling by removing redun-

dant or noisy majority examples [12], oversampling with

synthetic examples drawn using clustering algorithms [13]

and [14], oversampling positive examples located near the

decision function [15]. A comparative study of the available resampling strategies was carried out in [16] with the C4.5 algorithm; random undersampling and oversampling outperformed the other resampling strategies. The second

strategy (algorithm–driven) consists of altering the misclas-

sification cost of the classes such as in [17] or altering the

data representation to achieve a high separability of the data

(see for example [18]). A comparison of the resampling and

cost–sensitive strategy with SVM can be found in [14]. In this

comparative study, the SVM with asymmetric margins was

used and it outperformed the resampling technique (com-

bination of undersampling and oversampling with artificial

examples generated with the k–means algorithm).

B. Support Vector Machines

A maximum margin classifier looks for an optimal hyperplane separating the training dataset such that the distance of the training points to the hyperplane is maximized. This assumes that the training data are separable. Finding this optimal hyperplane is equivalent to solving the following quadratic optimization problem:

$$\min\ (w^T w) \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 \tag{1}$$

where $w$ is a vector perpendicular to the hyperplane, $b$ is a scalar value and $\{x_i, y_i\}_{i=1}^{N}$ are the training points ($x_i \in \mathbb{R}^d$, $y_i \in \{-1, 1\}$, $N$ the number of examples and $d$ the number of variables). If certain conditions hold and using the Lagrangian formulation, the previous problem is equivalent to its dual, which is a quadratic optimization problem that can be solved using several optimization techniques:

$$\max_{\alpha \ge 0}\ \theta(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0 \tag{2}$$

where $\alpha_i$ are the Lagrangian multipliers. A support vector machine (SVM) is a maximum margin classifier which uses only the points on both sides of the margin, called support vectors (points $x_i$ for which $\alpha_i > 0$), to build a model.

For a non–separable training data set, penalty variables $\xi_i$ are introduced to soften the constraints of the maximum margin formulation (1). The penalty variables take the following values: $0 < \xi_i \le 1$ if the point is on the correct side of the hyperplane and $\xi_i > 1$ if the point is on the wrong side. A cost variable $C$ is also introduced to control the trade–off between the width of the margin and the points within the margin. The final goal of the SVM classifier is then to maximize the margin while minimizing the total sum of the penalties, and thus equation (1) becomes:

$$\min\ (w^T w) + C \sum_i \xi_i^p \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0 \tag{3}$$

The primal formulation of the SVM can be solved using [19] for separable cases (Eq. (1)) and [20] for non–separable cases (Eq. (3)). However, the dual problem is often solved because the duality theory provides a convenient way to deal with the constraints. The dual optimization problem can also be written in terms of dot products, which permits the use of kernel functions. The kernel trick makes it possible to apply the maximum margin algorithm to a transformed version of a non–separable dataset (feature space) via a mapping function $\phi$. The related dual problem can be expressed as

$$\max_{\alpha}\ 2\alpha^T e - \alpha^T \left( G(K) + \frac{1}{C} I_n \right) \alpha \quad \text{s.t.} \quad \alpha \ge 0,\ \alpha^T y = 0 \tag{4}$$

where $e$ is the $n$–vector of ones, $\alpha \in \mathbb{R}^n$, $G(K)$ is the Gram matrix defined by $G_{ij}(K) = [K]_{ij} y_i y_j = k(x_i, x_j) y_i y_j$, $I_n$ is the $n \times n$ identity matrix and $\alpha \ge 0$ means $\alpha_i \ge 0$, $i = 1, \dots, n$. The transformation function $\phi$ is integrated into the definition of the Gram matrix. According to Mercer’s theorem, (3) can be expressed either by transforming the input data with $\phi$ and taking the dot product to define the kernel, or by directly taking any kernel satisfying Mercer’s condition and using it without knowing the function $\phi$. One such kernel is the Gaussian kernel (also called radial basis function (RBF) kernel), expressed as $K(x_i, x_j) = \phi(x_i)^T \phi(x_j) = e^{-\|x_i - x_j\|^2 / (2\sigma^2)}$. For such a kernel, the misclassification cost $C$ and the kernel hyperparameter $\sigma$ require optimization. Graph (A) of Figure 3 illustrates the data classification with SVM in a feature space.
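As a minimal sketch of the quantities entering the dual (4), the Gaussian kernel matrix and the regularized Gram matrix $G(K) + I_n/C$ could be computed with NumPy as follows. The function names, the toy points and the values of σ and C are illustrative assumptions and are not taken from the experimental setup of this paper.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows of X
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def regularized_gram(K, y, C):
    """Matrix G(K) + I_n / C appearing in the dual (4)."""
    y = np.asarray(y, dtype=float)
    G = K * np.outer(y, y)  # G_ij = k(x_i, x_j) y_i y_j
    return G + np.eye(len(y)) / C

# Toy data (hypothetical): two points with opposite labels
X = np.array([[0.0, 0.0], [1.0, 0.0]])
y = np.array([1, -1])
K = rbf_kernel_matrix(X, sigma=1.0)
G_reg = regularized_gram(K, y, C=2.0)
```

Adding $1/C$ to the diagonal is what turns the soft–margin problem with squared penalties into a hard–margin problem on a modified kernel.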

Many researchers consider SVMs to be among the best classification algorithms because of their theoretical foundation in structural risk minimization, which implies good generalization performance [21]. However, SVMs may provide poor results if they are used with badly chosen hyperparameters. The usual way to find

the parameters of SVMs is to scan a range of possible values

of the parameters, evaluate the classifier with a data splitting

such as cross–validation or a bootstrapping procedure and

then select those providing the best performance. A better

method to measure the generalization performance is to

evaluate SVMs with the leave–one–out procedure during the

grid search. These processes are expensive with respect to

computation time because they require an SVM resolution


at each step. A more efficient way to choose the SVM

parameters is to take advantage of the underlying theory

using the bound of the leave–one–out error.

For the SVM with an RBF kernel and in the case of separable training data (hard margin SVM), Vapnik showed that the leave–one–out error is upper bounded by $4R^2\|w\|^2$ (the radius margin bound) [21]. $R$ is the radius of the smallest sphere containing all $\phi(x_i)$, and $R^2$ is the optimal objective value of the following optimization problem:

$$\max_{\beta}\ 1 - \beta^T K \beta \quad \text{s.t.} \quad 0 \le \beta_i,\ i = 1, \dots, n, \quad e^T \beta = 1$$

This bound of the leave–one–out error can be used to

estimate the parameter σ of the RBF kernel and the soft

margin parameter C. The readers are referred to [22] for a

survey of the SVM error bounds estimation.
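To make the definition of $R^2$ concrete, the following sketch approximates the optimum of the problem above with a plain Frank–Wolfe iteration over the simplex. This solver is a stand–in chosen for brevity; it is not the optimizer used in this work, and the function name, iteration count and toy kernel matrix are assumptions.

```python
import numpy as np

def enclosing_sphere_radius_sq(K, n_iter=5000):
    """Approximate R^2 = max_beta sum_i beta_i K_ii - beta^T K beta
    subject to beta >= 0 and sum(beta) = 1 (for an RBF kernel K_ii = 1)."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    diag = np.diag(K).copy()
    beta = np.full(n, 1.0 / n)
    for t in range(n_iter):
        grad = diag - 2.0 * K @ beta   # gradient of the concave objective
        i = int(np.argmax(grad))       # best vertex of the simplex
        step = 2.0 / (t + 2.0)         # standard Frank-Wolfe step size
        beta *= (1.0 - step)
        beta[i] += step
    return float(beta @ diag - beta @ K @ beta), beta

# Toy RBF kernel matrix for two points at squared distance 1 (sigma = 1)
k = np.exp(-0.5)
K = np.array([[1.0, k], [k, 1.0]])
r2, beta = enclosing_sphere_radius_sq(K)
```

For two points the smallest enclosing sphere is centered at their midpoint in feature space, so $R^2 = (1 - k)/2$, which the iteration approaches.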

C. SVM with Asymmetrical Misclassification Cost

The SVM formulation above (Eq. 3) implies that the misclassification costs of positive and negative examples (in the case of a binary classification) are the same. This formulation may be inappropriate for problems with a high imbalance between classes or for those whose error penalty is not the same for each class.

The SVM algorithm natively supports a cost–sensitive strategy. For this purpose, two misclassification costs ($C^+$ for $y_i = +1$ and $C^-$ for $y_i = -1$) are introduced. In this case, the primal formulation of the SVM is:

$$\begin{aligned}
\min_{w,b,\xi}\quad & \langle w, w \rangle + C^+ \sum_{i \in i^+} (\xi_i^+)^2 + C^- \sum_{i \in i^-} (\xi_i^-)^2 \\
\text{s.t.}\quad & y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i^+, \quad i \in i^+ \\
& y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i^-, \quad i \in i^- \\
& \xi_i \ge 0, \quad i = 1, \dots, n
\end{aligned} \tag{5}$$

where $i^+ = \{i \mid y_i = +1\}$ and $i^- = \{i \mid y_i = -1\}$. The corresponding dual form is:

$$\max_{\alpha}\ 2\alpha^T e - \alpha^T \left( G(K) + \frac{1}{C^+} I_n^+ + \frac{1}{C^-} I_n^- \right) \alpha \quad \text{s.t.} \quad \alpha \ge 0,\ \alpha^T y = 0 \tag{6}$$

where $e$, $\alpha$ and $G(K)$ have the same expression as in (Eq. 4) and $I_n^+$ (resp. $I_n^-$) is a diagonal matrix composed of 1 for $i \in i^+$ (resp. $i \in i^-$) and 0 elsewhere. It is important to highlight that for an identical misclassification cost for positive and negative examples, we recover the symmetric formulation (Eq. 3) with its dual (Eq. 4). Figure 3 illustrates the differences between SVM and ASVM.

Other approaches were proposed in the literature to handle imbalanced datasets with SVMs. A naive post–processing method consists of shifting the separating hyperplane away from the positive examples. Another one, proposed in [17], applies a conformal transformation in the feature space to achieve a high separability of the training data.

Fig. 2. Illustration of an SVM.

Fig. 3. Illustration of the asymmetric–margin SVM on a toy classification problem. Squares represent positive and circles negative examples; dark symbols stand for training and grey for test examples. Graph (A) shows the decision boundary induced by a conventional SVM. Graph (B) shows the new boundary obtained by introducing two cost hyperparameters, one for positive and one for negative examples. Notice the two positive examples (grey squares), which are misclassified in (A) and correctly classified in (B), and also the direction of the vector w perpendicular to the separating hyperplane in graph (B).
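The only difference between the duals (4) and (6) is the diagonal term added to the Gram matrix. A small sketch of that term is given below; the function name, the toy kernel matrix and the cost values are hypothetical and serve only to show that equal costs give back the symmetric case.

```python
import numpy as np

def asymmetric_gram(K, y, c_pos, c_neg):
    """Matrix G(K) + I_n^+ / C^+ + I_n^- / C^- appearing in the dual (6)."""
    y = np.asarray(y)
    G = np.asarray(K, dtype=float) * np.outer(y, y)
    # 1/C^+ on the diagonal for positive examples, 1/C^- for negative ones
    inv_cost = np.where(y == 1, 1.0 / c_pos, 1.0 / c_neg)
    return G + np.diag(inv_cost)

# Toy kernel matrix and labels (hypothetical values)
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
y = np.array([1, -1, -1])

G_asym = asymmetric_gram(K, y, c_pos=4.0, c_neg=1.0)
G_sym = asymmetric_gram(K, y, c_pos=2.0, c_neg=2.0)  # same as G(K) + I_n / 2
```

A larger $C^+$ shrinks the diagonal entry of the positive examples, which penalizes their slack more strongly.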

D. Radius margin bound

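As stated above, the radius margin bound (RMB) proposed by Vapnik is for hard–margin SVMs. To obtain the radius margin bound of the soft–margin SVM, the soft–margin problem should be cast into the hard–margin formulation, which is achieved using the following change of variables:

$$\tilde{w} \equiv \begin{bmatrix} w \\ \sqrt{C}\,\xi \end{bmatrix}$$

and by setting the $i$–th training point to

$$\begin{bmatrix} \phi(x_i) \\ y_i e_i / \sqrt{C} \end{bmatrix}$$

The kernel function becomes $\tilde{K}(x_i, x_j) = K(x_i, x_j) + \delta_{ij}/C$, where $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. The new radius margin bound is $\tilde{R}^2 \|\tilde{w}\|^2$, where $\tilde{R}^2$ is the objective value of

$$\max_{\beta}\ 1 + \frac{1}{C} - \beta^T \left( K + \frac{I_n}{C} \right) \beta \quad \text{s.t.} \quad 0 \le \beta_i,\ i = 1, \dots, n, \quad e^T \beta = 1 \tag{7}$$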

To solve the asymmetric–margin problem, SVM solvers typically use a single value of $C$ and balance the misclassification costs with class weights to obtain the values of $C^+$ and $C^-$. To exploit the radius–margin bound with asymmetrical misclassification costs, we introduce the following relation: $C = C^+ + C^- = w^+ C + w^- C$ with $w^+ + w^- = 1$, i.e. the cost asymmetry is only taken into account during the resolution of the SVM optimization problem. We also use the heuristic proposed by Morik et al. [23]: the potential total cost of the false positives equals the potential total cost of the false negatives, i.e. the costs $C^+$ and $C^-$ conform to the relation in Eq. 8:

$$\frac{C^+}{C^-} = \frac{\text{number of negative training examples}}{\text{number of positive training examples}} = \frac{N^-}{N^+} \tag{8}$$

Thus, we obtain the values $w^+ = N^-/N$ and $w^- = N^+/N$, where $N^+$, $N^-$ and $N$ are respectively the number of positive, negative and all training examples. With these weights, it is now possible to introduce a higher cost when the SVMs misclassify positive examples compared to a misclassification of negative examples.

E. Gradient descent algorithm and SVM model selection

Optimization is generally concerned with the minimization (or maximization) of a function whose parameters are subject to one or more functional constraints. Let $f$ be a continuously differentiable function. A gradient descent algorithm is a method to solve an optimization problem. Starting from a point $x_0 \in \mathbb{R}^n$, it iteratively looks (until a stop criterion is reached) for a direction $d_k \in \mathbb{R}^n$, $d_k \ne 0$, satisfying:

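$$\forall k > 1,\ \exists \epsilon > 0 \text{ such that } \nabla f(x_k)^T d_k \le -\epsilon \|\nabla f(x_k)\| \|d_k\| \tag{9}$$

There are many ways to define the descent direction $d_k$. One strategy, known as the Newton method, is to use

$$d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k) \tag{10}$$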

The computation of the second derivative (Hessian) of f

is computationally expensive. The quasi–Newton method

approximates the Hessian at each iteration. The algorithm of

the hyperparameter selection using the quasi Newton method

is shown below as introduced in [22].
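To illustrate the quasi–Newton idea on its own, independently of the SVM setting, here is a generic BFGS sketch applied to a toy quadratic. It is not the model selection algorithm of [22]; the Armijo backtracking line search, the tolerances and the test function are assumptions made for this example.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimize f with BFGS updates of the inverse Hessian approximation."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                  # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                 # quasi-Newton direction, cf. Eq. (10)
        step = 1.0
        # Backtracking line search (Armijo sufficient-decrease condition)
        while f(x + step * d) > f(x) + 1e-4 * step * (g @ d) and step > 1e-12:
            step *= 0.5
        s = step * d
        x_new = x + s
        g_new = grad(x_new)
        yk = g_new - g
        sy = s @ yk
        if sy > 1e-12:             # curvature condition keeps H positive definite
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, yk)) @ H @ (I - rho * np.outer(yk, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Toy objective (hypothetical): an ill-conditioned quadratic with minimum at (1, -2)
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

x_min = bfgs_minimize(f, grad, x0=[5.0, 5.0])
```

Only gradients are evaluated; the curvature information of Eq. (10) is accumulated from successive gradient differences instead of computing the Hessian.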

The radius margin bound of the L2–SVM (p = 2 in (3)) with the RBF kernel is continuously differentiable with respect to the parameters $C$ and $\sigma$. Thus, the optimal parameters can be computed using the gradient descent algorithm according to the following:

$$\frac{\partial B_{RM}^{L2}}{\partial V_t} = \frac{\partial (R^2 \|w\|^2)}{\partial V_t} = \|w\|^2 \frac{\partial R^2}{\partial V_t} + R^2 \frac{\partial (\|w\|^2)}{\partial V_t} \tag{11}$$

where $V_t$ is the $t$–th parameter of the L2–SVM. For $V = (C, \sigma^2)$,

$$\begin{aligned}
\frac{\partial \|w\|^2}{\partial C} &= \sum_{i=1}^{n} \alpha_i^2 / C^2 \\
\frac{\partial \|w\|^2}{\partial \sigma^2} &= -\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \frac{\partial \tilde{k}(x_i, x_j)}{\partial \sigma^2} \\
\frac{\partial R^2}{\partial C} &= -\sum_{i=1}^{n} \beta_i (1 - \beta_i) / C^2 \\
\frac{\partial R^2}{\partial \sigma^2} &= -\sum_{i,j=1}^{n} \beta_i \beta_j \frac{\partial \tilde{k}(x_i, x_j)}{\partial \sigma^2} \\
\frac{\partial \tilde{k}(x_i, x_j)}{\partial \sigma^2} &= \frac{\tilde{k}(x_i, x_j) \|x_i - x_j\|^2}{2\sigma^4}
\end{aligned}$$

Algorithm for model selection using the gradient descent algorithm
1. Initialize SVM hyperparameters
2. Solve SVM problem using a standard SVM algorithm
3. Minimize the RMB according to the values of the Lagrangian multipliers with a gradient descent algorithm
4. Go to step 2 or stop when the minimum of the RMB is reached

III. MATERIAL AND METHODS

This section introduces the dataset used for the experiments, the software used for the extraction of features, the SVM software, the features built from the HRCT images and the learning process. The latter includes the implementation of the resampling methods, the model selection process and the details of the metrics used to assess the quality of the classification.

A. Dataset

The dataset used in this work is extracted from an in–house

multimedia collection of cases at the University Hospitals

of Geneva (HUG). The diagnosis of each ILD case was

confirmed by a biopsy or an equivalent test (e.g. bron-

choalveolar lavage, tuberculin skin test, Kveim test, ...). For

each collected patient, 99 clinical parameters associated with

13 of the most frequent diagnoses of ILDs were collected

from the electronic health record (EHR), describing the

patient’s clinical state at the time of the stay when the HRCT

image series were acquired. The lung tissue patterns related

to the ILD diagnosis were manually delineated in HRCT

images series (1mm slice thickness, no contrast agent) by

two experienced radiologists at the HUG. The distributions

of the 6 most represented tissue sorts are detailed in Table I

in terms of number of ROIs, volumes and number of block

instances obtained as shown in Figure 1. The size of the

blocks is 32 × 32 × 1 pixels.
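One possible way to cut a delineated ROI into square block instances is sketched below. The rule for keeping a block (here: the block must lie entirely inside the ROI mask) and the stride are assumptions, since the exact rule is conveyed by Figure 1 rather than stated in the text; `extract_blocks`, `image` and `roi_mask` are hypothetical names.

```python
import numpy as np

def extract_blocks(image, roi_mask, size=32, stride=32):
    """Return all size x size blocks of one slice lying fully inside the ROI.

    A stride equal to the block size gives non-overlapping blocks;
    a smaller stride gives overlapping blocks.
    """
    blocks = []
    rows, cols = roi_mask.shape
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            # Assumed rule: keep the block only if every pixel is in the ROI
            if roi_mask[r:r + size, c:c + size].all():
                blocks.append(image[r:r + size, c:c + size])
    return blocks

# Toy slice with a rectangular ROI of 64 x 96 pixels (hypothetical data)
image = np.arange(128 * 128, dtype=float).reshape(128, 128)
roi_mask = np.zeros((128, 128), dtype=bool)
roi_mask[0:64, 0:96] = True

non_overlapping = extract_blocks(image, roi_mask, size=32, stride=32)
overlapping = extract_blocks(image, roi_mask, size=32, stride=16)
```

Because the number of blocks grows with the ROI surface, large ROIs of one tissue type can dominate the block distribution even when few patients show that tissue.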

B. Software

The image processing algorithms include wavelet–based

features and grey–level histograms and were implemented

in Java. The classification task is carried out with libSVM

implementing the SVM–L2 for binary classification [24].


TABLE I
DISTRIBUTION OF THE CLASSES IN TERMS OF ROIS, VOLUMES AND BLOCKS. THE NUMBER OF INSTANCES CORRESPONDS TO THE NUMBER OF BLOCKS.

label            ROIs    volume (liters)    blocks    patients
healthy           100         5.12            3043        7
emphysema          66         1.15             422        5
ground glass      427         4.91            2313       37
fibrosis          473         8.45            3113       38
micronodules      297        16.06            6133       16
consolidation     196         0.69              90       14
Total            1559        36.38           15114       87

C. Texture features

The features used to characterize the texture properties

of the 6 lung tissue patterns are derived from grey–level his-

tograms and tailored wavelet transforms (WT). The resulting

feature space has a dimension of 46.

1) Grey–level histograms: Thanks to Hounsfield Units (HU), the pixel values in HRCT images correspond unambiguously to the density of the observed tissue and thus contain essential information for the characterization of the lung tissue. To encode this information, 22 histogram bins of grey–levels in the interval [−1050; 600[ are used as texture features. An additional feature related to the number of air pixels is computed as the number of pixel values below −1000 HU.
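A hedged sketch of these 23 grey–level features is given below. Equal–width bins (75 HU each) and raw, unnormalized counts are assumptions; the original implementation was written in Java and its binning details are not specified here.

```python
import numpy as np

def grey_level_features(block_hu, n_bins=22, lo=-1050.0, hi=600.0,
                        air_threshold=-1000.0):
    """22 histogram counts on [lo, hi[ plus the number of air pixels."""
    values = np.asarray(block_hu, dtype=float).ravel()
    # Keep the interval half-open: values equal to `hi` are excluded
    in_range = values[(values >= lo) & (values < hi)]
    hist, _ = np.histogram(in_range, bins=n_bins, range=(lo, hi))
    air_pixels = int(np.sum(values < air_threshold))
    return np.concatenate([hist, [air_pixels]])

# Toy pixel values in HU (hypothetical); 600 and 700 fall outside the interval
block = np.array([-1040.0, -1010.0, -900.0, 10.0, 599.0, 600.0, 700.0])
features = grey_level_features(block)
```

In a real setting `block_hu` would be a 32 × 32 block of one HRCT slice.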

2) Wavelet–based features: Near affine–invariant texture features are derived from a tailored WT. A frame transform is used to ensure translation–invariant descriptions of the lung tissue patterns [10], [25]. Based on the assumption that no predominant orientations are contained in the lung tissue patterns, a rotation–invariant nonseparable WT is implemented using isotropic polyharmonic B–spline scaling functions and wavelets [26], [27]. Finally, an augmented scale progression is obtained using the quincunx lattice for upsampling the filters by a factor of $\sqrt{2}$ at each iteration of the WT. Within each unique subband $i$, the wavelet coefficients are characterized by a mixture of two Gaussians with fixed means $\mu^i_{1,2} = \mu^i$ and distinct variances $\sigma^i_{1,2}$. 24 wavelet–based features are thus generated by 8 iterations of the WT.

D. Imbalance management

The dataset is imbalanced with respect to the class distribution. Three strategies were implemented to handle the imbalance. The first strategy implements the data–driven method (BAL). A random down–sampling of the majority class is carried out to obtain 50% of positive and 50% of negative cases during the model selection. The second strategy uses the cost–sensitive method (ASVM). The class ratio of the original dataset is kept during the model selection process and the values of the cost hyperparameters are adjusted according to the imbalance rate of the positive and negative cases. The third strategy uses a combination of the resampling and cost–sensitive methods (ASVM + RES). The resampling level is based on the prevalence of patients showing the tissue in the database (see Table I). If the ratio of the tissue is less than the prevalence, the examples of this class are oversampled (consolidation, emphysema, fibrosis, ground glass). If the ratio of the tissue is greater than the prevalence, the majority class is oversampled so that the ratio of positive and negative cases is equal to the prevalence (healthy, micronodules).
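The cost adjustment of the ASVM strategy follows the weights of Section II-D ($w^+ = N^-/N$, $w^- = N^+/N$). The sketch below applies them to the block counts of Table I for consolidation against all other classes; the function name and the value of C are illustrative assumptions, and the actual training folds contain fewer examples than the full table.

```python
def asymmetric_costs(n_pos, n_neg, C):
    """Split a single cost C into (C+, C-) with C+/C- = N-/N+ and C+ + C- = C."""
    n = n_pos + n_neg
    w_pos = n_neg / n   # weight of the positive class, w+ = N-/N
    w_neg = n_pos / n   # weight of the negative class, w- = N+/N
    return w_pos * C, w_neg * C

# Block counts from Table I: 90 consolidation blocks out of 15114
n_pos = 90
n_neg = 15114 - 90
c_pos, c_neg = asymmetric_costs(n_pos, n_neg, C=1.0)
cost_ratio = c_pos / c_neg   # equals N-/N+, about 167 for this class
```

With such a ratio, misclassifying one consolidation block costs roughly as much as misclassifying 167 blocks of the other classes.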

E. Model selection

The selection of the model was inspired by the experimen-

tal setup proposed in [28]. We have chosen 5 starting points

for the gradient descent. These 5 starting points were applied

to 5 random training files. The parameters obtained with

an initialization point providing the least hyperparameter

variance are considered. The median is taken as a new

starting point and is evaluated on the whole training set by the

means of a leave–one–patient–out (LOPO) procedure [29].

The final parameters are the median of those obtained from

this last step. We analyze the error rate, the sensitivity, the

specificity and the precision of the prediction on test sets.

As we are in a multi–class classification, we use the one–

against–all procedure.
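The aggregation step of this procedure can be sketched as a median over the hyperparameter pairs obtained from several runs. Taking the median separately for C and σ is an assumption, and the run values below are made up purely for illustration.

```python
from statistics import median

def median_hyperparameters(runs):
    """Component-wise median of a list of (C, sigma) pairs."""
    cs = [c for c, _ in runs]
    sigmas = [s for _, s in runs]
    return median(cs), median(sigmas)

# Hypothetical outcomes of five gradient descent runs; one run diverges on C
runs = [(2.0, 1.1), (2.4, 0.9), (1.8, 1.0), (50.0, 1.2), (2.2, 1.0)]
start_point = median_hyperparameters(runs)
```

The median is insensitive to the single run with a very large C, which is the motivation for preferring it to the mean when a few initializations show a high variance.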

F. Model comparisons

In many classification projects, the accuracy is chosen as

the main performance criterion of a model. With imbalanced

datasets, we have to take into account the ability of the

classifier to predict the examples of each class (sensitivity,

specificity and precision). These four metrics are used to

measure the performance of each model. To assess the mul-

ticlass performance of the algorithms, the geometric mean is

computed as follows:

$$A_{geom} = \sqrt[N_{class}]{\prod_{l=1}^{N_{class}} A_l}, \tag{12}$$

with $N_{class}$ the number of classes and $A_l$ the class–specific accuracies.
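Eq. (12) translates directly into a few lines of code. The accuracies used below are hypothetical and only illustrate that the geometric mean penalizes a single weak class more than the arithmetic mean does.

```python
import math

def geometric_mean_accuracy(class_accuracies):
    """Geometric mean of the class-specific accuracies, Eq. (12)."""
    n_class = len(class_accuracies)
    if n_class == 0:
        raise ValueError("at least one class accuracy is required")
    return math.prod(class_accuracies) ** (1.0 / n_class)

# Hypothetical class-specific accuracies
balanced = geometric_mean_accuracy([0.75, 0.75])
unbalanced = geometric_mean_accuracy([0.90, 0.60])   # same arithmetic mean
```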

To evaluate the best strategy for our dataset, we carried out

a McNemar test with Bonferroni correction on the prediction

results. This test measures if the predictions made with 2

models are significantly different from the statistical point

of view. We also use the area under the receiver operating characteristic curve (AUC) to rank the three strategies. Each strategy is assigned a score from 3 to 1 (best to worst) according to the AUC value.
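A minimal sketch of such a pairwise comparison is shown below. It uses the chi–square form of the McNemar statistic without continuity correction, a significance level of 0.05 and three pairwise comparisons; these choices, like the counts in the example, are assumptions because the exact variant is not specified in the text.

```python
import math

def mcnemar_p_value(b, c):
    """p-value of the McNemar test from the discordant counts.

    b: examples classified correctly by model A only
    c: examples classified correctly by model B only
    """
    if b + c == 0:
        return 1.0
    stat = (b - c) ** 2 / (b + c)           # chi-square statistic, 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square(1)

def significantly_different(b, c, n_comparisons=3, alpha=0.05):
    """Bonferroni correction: compare the p-value with alpha / n_comparisons."""
    return mcnemar_p_value(b, c) < alpha / n_comparisons

# Hypothetical discordant counts for one pair of strategies
p = mcnemar_p_value(15, 5)
different = significantly_different(15, 5)
```

In this example the uncorrected p–value is below 0.05, yet the difference is not declared significant once the threshold is divided by the number of comparisons.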

IV. RESULTS

Results of the model selection and associated classifica-

tion performance obtained with the various techniques are

described in this section.

A. Model selection using the gradient descent

For the model selection, we used five starting points (C, σ) on five random training sets: (1,1), (5,5), (5,1), (1,5) and (10,1). Among these five initialization points, the first four converged around the same region, but a few initialization points provided a high variance on C. The median of these 20 intermediate parameters was used as a starting point on a LOPO cross–validation to obtain the final parameters. The algorithm converges after 7 to 18 iterations and the computation of these parameters takes from 10 minutes to 24 hours depending on the size of the resampled training data.

TABLE II
PERFORMANCE ON consolidation VS. ALL CLASSES.

Consolidation    BAL     ASVM    ASVM + RES
Error            0.02    0.02    0.02
Sensitivity      0.39    0.40    0.40
Specificity      0.99    0.99    0.99
Precision        0.16    0.16    0.16
F–measure        0.23    0.23    0.23
AUC              0.69    0.69    0.69

TABLE III
PERFORMANCE ON emphysema VS. ALL CLASSES.

Emphysema        BAL     ASVM    ASVM + RES
Error            0.22    0.22    0.22
Sensitivity      0.45    0.46    0.46
Specificity      0.79    0.79    0.78
Precision        0.06    0.06    0.06
F–measure        0.10    0.10    0.10
AUC              0.62    0.62    0.62

B. Classification performance

Tables II, III, IV, V, VI and VII summarize the classification results obtained using the three models. The AUC values obtained with the various techniques are summarized in Figure 4. The best $A_{geom}$ value of 0.752 was obtained using the ASVM+RES approach. It is followed by ASVM with $A_{geom} = 0.749$, and the worst performance is obtained with BAL with $A_{geom} = 0.746$.

TABLE IV
PERFORMANCE ON fibrosis VS. ALL CLASSES.

Fibrosis         BAL     ASVM    ASVM + RES
Error            0.21    0.19    0.20
Sensitivity      0.80    0.77    0.79
Specificity      0.79    0.82    0.80
Precision        0.50    0.53    0.51
F–measure        0.61    0.63    0.62
AUC              0.79    0.79    0.80

TABLE V
PERFORMANCE ON ground glass VS. ALL CLASSES.

Ground glass     BAL     ASVM    ASVM + RES
Error            0.46    0.46    0.45
Sensitivity      0.86    0.87    0.88
Specificity      0.48    0.48    0.50
Precision        0.23    0.23    0.24
F–measure        0.36    0.37    0.38
AUC              0.67    0.68    0.69

TABLE VI
PERFORMANCE ON healthy VS. ALL CLASSES.

Healthy          BAL     ASVM    ASVM + RES
Error            0.24    0.23    0.23
Sensitivity      0.95    0.95    0.95
Specificity      0.71    0.72    0.72
Precision        0.45    0.46    0.46
F–measure        0.61    0.62    0.62
AUC              0.83    0.83    0.84

TABLE VII
PERFORMANCE ON micronodules VS. ALL CLASSES.

Micronodules     BAL     ASVM    ASVM + RES
Error            0.31    0.32    0.30
Sensitivity      0.55    0.53    0.45
Specificity      0.79    0.79    0.87
Precision        0.64    0.63    0.69
F–measure        0.59    0.57    0.54
AUC              0.67    0.66    0.66

V. DISCUSSION

The convergence of the four initialization points to the same region indicates the consistency of the RMB for SVM hyperparameter estimation. The computation time depends primarily on the size of the training set. During the model selection process, i.e. LOPO cross–validation, the SVMs were run only 2 times per fold and the algorithm converged after 7 to 18 iterations.

With respect to the multiclass performance, the geometric

mean ranked the third strategy (ASVM+RES) as the best

choice for the lung tissue classification especially for the

fibrosis, ground glass and healthy tissues (see for example Figure 4). The McNemar statistical test on each pair of these

strategies indicates that there are no significant differences

in the results obtained with the three strategies for the clas-

sification of consolidation and emphysema. The McNemar

test also indicates no significant difference between ASVM

and ASVM+RES for the classification of healthy tissue.

Table VIII summarizes the ranking of the three strategies

according to the value of the AUC for four tissue types.

The choice of the AUC to rank the strategies was taken

because it takes into account the sensitivity and the specificity

of the classifier i.e. the ratio of true positive and true neg-

ative cases. Depending on the final use of the classification

models, the ranking in Table VIII may not hold anymore.

For instance, if the f–measure was used to rank the three

strategies as in Table VIII, the ASVM and ASVM+RES

strategies would have the same rank. A ranking according

to the f–measure would agree with a ranking based on AUC

except for fibrosis where ASVM has the highest f–measure (i.e. would be ranked as the best) even if it has the lowest sensitivity.

TABLE VIII
RANKING OF THE THREE STRATEGIES ACCORDING TO THE AUC VALUES.

                 BAL    ASVM    ASVM+RES
fibrosis          1      2        3
ground glass      1      2        3
healthy           1      2        3
micronodules      3      2        1
Total ranking     6      8       10

[Figure 4: bar chart of the AUC (vertical axis) per tissue class (healthy, emphysema, ground glass, fibrosis, micronodules, consolidation) for BAL, ASVM and ASVM + RES; horizontal bars labelled "No diff." mark pairs without significant difference.]

Fig. 4. AUC values obtained with the various techniques. The horizontal bars at the top of the histogram indicate if the two results have no significant difference according to the McNemar statistical test.

The BAL strategy outperforms the other strategies for the

classification of micronodules. The asymmetric–margin of

the SVM altered the sensitivity of the model. The addition of

synthetic majority examples combined with an asymmetric–

margin SVM has further deteriorated the sensitivity of the

model but improved the precision and the specificity of the

classifier. A possible explanation of this phenomenon is the

existence of outliers in the micronodules examples, which

were misclassified with the ASVM and ASVM+RES and

thus decreasing the sensitivity and the precision.

The results obtained in the classification of healthy tissue are of particular interest in clinical practice, as the three models have a sensitivity equal to 95%. Indeed, an ASVM

model for the classification of healthy tissue can be used to

detect abnormal tissue types in HRCT.

Table II highlights the question of using the accuracy as a

performance metric in the classification of imbalanced data

sets. According to this table, the accuracy is around 98%

but the algorithm correctly classified 99% of the negative

examples while the sensitivity and precision on positive

examples are low.
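The effect can be reproduced with a small worked example. The confusion counts below are hypothetical and were chosen only so that sensitivity and precision match the rounded values of Table II; they are not results of this study. The F–measure is assumed to be the balanced harmonic mean of precision and sensitivity.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics of a binary classifier from its confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2.0 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts: 100 positive examples among 10000
m = binary_metrics(tp=40, fp=210, tn=9690, fn=60)
```

Here the accuracy exceeds 97% although only 40% of the positive examples are detected and only 16% of the positive predictions are correct, which gives an F–measure of about 0.23.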

The precision for the classification of consolidation, emphysema and ground glass is very low and more investigations are needed to improve the results. This is probably due to the characteristics of these tissues: they are not very specific according to a discrimination measure of these tissues against the rest. The latter was carried out with the Rayleigh quotient (ratio of the between–class variance and the within–class variance) on the dataset projected onto the principal component axes. Figure 5 highlights the wide spread of the consolidation examples in the principal component space.

Many investigations will be carried out in the future to

improve the classification of the three lung tissues cited in

the previous paragraph. Increasing the number of examples

of these tissues may provide better improvement of the clas-

-4000-2000

0 2000 4000 6000 8000

10000

12000

14000

16000-6000

-4000

-2000

0

2000

4000

6000

8000

-5000

-4000

-3000

-2000

-1000

0

1000

2000

3000

4000

5000

Class -1

Class +1

Fig. 5.

principal component axes. The consolidation examples are labelled as +1.

Projection of the consolidation vs all dataset on the first three

sification results. The consolidation tissues, for examples, are

from younger patients compared to the other patients in the database. Another avenue for future investigation is to vary the oversampling ratio. The computational speed of the gradient descent method (compared to a fine–grained grid–search) and the stability of the model will allow us to carry out more experiments with respect to the oversampling ratio of the minority classes. The addition of clinical features may also increase the performance of the classification. The use of a kernel–based algorithm to map the input space into a space where the classes are linearly separable did not provide good results, and it is possible that the RBF kernel is not well suited to the classification of these tissues. Other approaches allowing the selection of an appropriate kernel also figure in the list of future work [30].
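To make the notion of an oversampling ratio concrete, the following sketch computes how many artificial positive examples would be needed to reach a given target share of positives. It assumes that the "ratio of the examples" means positives divided by all examples; the function name `synthetic_needed` and the counts are hypothetical, and the sketch does not generate the artificial cases themselves.

```python
import math
from fractions import Fraction


def synthetic_needed(n_pos, n_neg, target_fraction):
    """Extra positives so that positives / (positives + negatives) >= target.

    Assumption: the target is the share of positives among all examples.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    # Exact rational arithmetic avoids off-by-one errors from float rounding.
    p = Fraction(target_fraction).limit_denominator(1000)
    wanted_pos = p / (1 - p) * n_neg
    return max(0, math.ceil(wanted_pos - n_pos))


# Hypothetical class counts: 50 positive and 800 negative examples.
extra_for_20_percent = synthetic_needed(50, 800, 0.2)
extra_for_balance = synthetic_needed(50, 800, 0.5)
print(extra_for_20_percent, extra_for_balance)
```

Sweeping `target_fraction` over a range of values would give one simple way to organise the experiments on the oversampling ratio mentioned above.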

VI. CONCLUSIONS

We presented in this paper the effectiveness of asymmetric–margin SVMs for imbalanced lung tissue classification. Introducing prior knowledge of the prevalence of the patients in the database to correct the ratio of the examples improved the results obtained with the algorithm. Artificial cases were created with the k–means algorithm. The conventional SVM performed better only for the classification of micronodules, owing to the presence of outliers in the examples. While the results obtained with ASVM for the classification of fibrosis and healthy tissue are satisfactory, more investigation is needed for the classification of consolidation, emphysema, ground glass and also micronodules. Increasing the number of cases, varying the oversampling ratio, adding clinical features and selecting an appropriate kernel for each classification are the most important directions for future investigation.

ACKNOWLEDGMENT

This work was supported by the Swiss National Sci-

ence Foundation (FNS) with grant 200020–118638/1 and

the equalization fund of Geneva University Hospitals and

University of Geneva (grants 05–I–13 and 05–9–II).

REFERENCES

[1] K. R. Flaherty, T. E. King, J. Ganesh Raghu, J. P. Lynch III, T. V. Colby, W. D. Travis, B. H. Gross, E. A. Kazerooni, G. B. Toews, Q. Long, S. Murray, V. N. Lama, S. E. Gay, and F. J. Martinez, “Idiopathic interstitial pneumonia: What is the effect of a multidisciplinary approach to diagnosis?” American Journal of Respiratory and Critical Care Medicine, vol. 170, pp. 904–910, July 2004.

[2] W. R. Webb, N. L. Müller, and D. P. Naidich, Eds., High–Resolution CT of the Lung. Philadelphia, PA, USA: Lippincott Williams & Wilkins, 2001.

[3] S. Delorme, M.-A. Keller-Reichenbecher, I. Zuna, W. Schlegel, and G. Van Kaick, “Usual interstitial pneumonia: Quantitative assessment of high–resolution computed tomography findings by computer–assisted texture–based image analysis,” Investigative Radiology, vol. 32, no. 9, pp. 566–574, September 1997.

[4] C.-R. Shyu, C. E. Brodley, A. C. Kak, A. Kosaka, A. M. Aisen, and L. S. Broderick, “ASSERT: A physician–in–the–loop content–based retrieval system for HRCT image databases,” Computer Vision and Image Understanding (special issue on content–based access for image and video libraries), vol. 75, no. 1/2, pp. 111–132, July/August 1999.

[5] R. Uppaluri, E. A. Hoffman, M. Sonka, G. W. Hunninghake, and G. McLennan, “Interstitial lung disease: A quantitative study using the adaptive multiple feature method,” American Journal of Respiratory and Critical Care Medicine, vol. 159, no. 2, pp. 519–525, February 1999.

[6] I. C. Sluimer, P. F. van Waes, M. A. Viergever, and B. van Ginneken, “Computer–aided diagnosis in high resolution CT of the lungs,” Medical Physics, vol. 30, no. 12, pp. 3081–3090, December 2003. [Online]. Available: http://link.aip.org/link/?MPH/30/3081/1

[7] F. Chabat, G.-Z. Yang, and D. M. Hansell, “Obstructive lung diseases: Texture classification for differentiation at CT,” Radiology, vol. 228, no. 3, pp. 871–877, September 2003. [Online]. Available: http://radiology.rsnajnls.org/cgi/content/abstract/228/3/871

[8] Y. Uchiyama, S. Katsuragawa, H. Abe, J. Shiraishi, F. Li, Q. Li, C.-T. Zhang, K. Suzuki, and K. Doi, “Quantitative computerized analysis of diffuse lung disease in high–resolution computed tomography,” Medical Physics, vol. 30, no. 9, pp. 2440–2454, September 2003.

[9] T. Zrimec and J. S. J. Wong, “Improving computer aided disease detection using knowledge of disease appearance,” in MEDINFO 2007. Proceedings of the 12th World Congress on Health (Medical) Informatics, vol. 129. IOS Press, August 2007, pp. 1324–1328.

[10] A. Depeursinge, D. Sage, A. Hidki, A. Platon, P.-A. Poletti, M. Unser, and H. Müller, “Lung tissue classification using Wavelet frames,” in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE. Lyon, France: IEEE Computer Society, August 2007, pp. 6259–6262.

[11] N. Japkowicz and S. Stephen, “The class imbalance problem: A systematic study,” Intelligent Data Analysis Journal, vol. 6, no. 5, November 2002.

[12] M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one–sided selection,” in Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, 1997, pp. 179–186.

[13] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[14] G. Cohen, M. Hilario, H. Sax, S. Hugonnet, and A. Geissbuhler, “Learning from imbalanced data in surveillance of nosocomial infection,” Artificial Intelligence in Medicine, vol. 37, no. 1, pp. 7–18, May 2006. [Online]. Available: http://dx.doi.org/10.1016/j.artmed.2005.03.002

[15] H. Han, W. Wang, and B. Mao, “Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,” in International Conference on Intelligent Computing (ICIC), 2005, pp. 878–887.

[16] J. V. Hulse, T. M. Khoshgoftaar, and A. Napolitano, “Experimental perspectives on learning from imbalanced data,” in ICML, 2007, pp. 935–942.

[17] K. Veropoulos, C. Campbell, and N. Cristianini, “Controlling the sensitivity of support vector machines,” in Proceedings of the International Joint Conference on AI, 1999, pp. 55–60.

[18] G. Wu and E. Y. Chang, “Class-boundary alignment for imbalanced dataset learning,” in ICML 2003 Workshop on Learning from Imbalanced Data Sets, 2003, pp. 49–56.

[19] Y.-J. Lee and O. L. Mangasarian, “SSVM: A smooth support vector machine for classification,” Computational Optimization and Applications, vol. 20, no. 1, pp. 5–22, 2001.

[20] O. Chapelle, “Training a support vector machine in the primal,” Neural Computation, vol. 19, pp. 1155–1178, 2007.

[21] V. Vapnik, Statistical learning theory. Wiley, New York, NY, 1998.

[22] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing multiple parameters for support vector machines,” Machine Learning, vol. 46, no. 1, pp. 131–159, 2002.

[23] K. Morik, M. Imhoff, P. Brockhausen, T. Joachims, and U. Gather, “Knowledge discovery and knowledge validation in intensive care,” Artificial Intelligence in Medicine, vol. 19, no. 3, pp. 225–249, 2000.

[24] K.-M. Chung, W.-C. Kao, C.-L. Sun, L.-L. Wang, and C.-J. Lin, “Radius margin bounds for support vector machines with the RBF kernel,” Neural Computation, vol. 15, no. 11, pp. 2643–2681, 2003.

[25] M. Unser, “Texture classification and segmentation using wavelet frames,” IEEE Transactions on Image Processing, vol. 4, no. 11, pp. 1549–1560, November 1995.

[26] A. Depeursinge, D. Van De Ville, M. Unser, and H. Müller, “Lung tissue analysis using isotropic polyharmonic B–spline wavelets,” in MICCAI 2008 Workshop on Pulmonary Image Analysis, New York, USA, September 2008, pp. 125–134.

[27] D. Van De Ville, T. Blu, and M. Unser, “Isotropic polyharmonic B–Splines: Scaling functions and wavelets,” IEEE Transactions on Image Processing, vol. 14, no. 11, pp. 1798–1813, November 2005.

[28] G. Rätsch, T. Onoda, K.-R. Müller, and T. O. Gmd, “Soft margins for AdaBoost,” Journal of Machine Learning, vol. 42, no. 3, pp. 287–320, 1998.

[29] M. Dundar, G. Fung, L. Bogoni, M. Macari, A. Megibow, and B. Rao, “A methodology for training and validating a CAD system and potential pitfalls,” International Congress Series, vol. 1268, pp. 1010–1014, June 2004, CARS 2004 – Computer Assisted Radiology and Surgery. Proceedings of the 18th International Congress and Exhibition. [Online]. Available: http://www.sciencedirect.com/science/article/B7581-4CHRSVD-6S/2/06d1476fa7e0028d30aa5db70037f836

[30] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “More efficiency in multiple kernel learning,” in ICML, 2007, pp. 775–782.