An MR Brain Images Classifier via Principal
Component Analysis and Kernel Support Vector
Machine
Yudong Zhang, Lenan Wu
School of Information Science and Engineering, Southeast University, Nanjing China
Email: zhangyudongnuaa@gmail.com, wuln@seu.edu.cn
Abstract: Automated and accurate classification of MR brain images is extremely important for
medical analysis and interpretation. Over the last decade numerous methods have been proposed. In
this paper, we present a novel method to classify a given MR brain image as normal or abnormal.
The proposed method first employs the wavelet transform to extract features from images, followed
by principal component analysis (PCA) to reduce the dimensionality of the features. The reduced
features are submitted to a kernel support vector machine (KSVM). K-fold stratified cross validation
is used to enhance the generalization of the KSVM. We chose seven common brain diseases (glioma,
meningioma, Alzheimer's disease, Alzheimer's disease plus visual agnosia, Pick's disease, sarcoma,
and Huntington's disease) as abnormal brains, and collected 160 MR brain images (20 normal and
140 abnormal) from the Harvard Medical School website. We ran the proposed method with four
different kernels and found that the GRB kernel achieves the highest classification accuracy of
99.38%, while the LIN, HPOL, and IPOL kernels achieve 95%, 96.88%, and 98.12%, respectively. We
also compared our method with those reported in the literature over the last decade, and the results
showed that our DWT+PCA+KSVM with GRB kernel still achieved the most accurate classification.
The average processing time for a 256×256 image on an IBM P4 laptop with a 3 GHz processor and
2 GB of RAM is 0.0448 s. The experimental data show that our method is effective and rapid. It could
be applied to the field of MR brain image classification and can assist doctors in diagnosing whether
a patient's brain is normal or abnormal to a certain degree.
Keywords: Magnetic Resonance Imaging; Discrete Wavelet Transform; Principal Component Analysis;
Kernel Support Vector Machine; Classification
1 Introduction
Magnetic resonance imaging (MRI) is an imaging technique that produces high quality images of
the anatomical structures of the human body, especially in the brain, and provides rich information for
clinical diagnosis and biomedical research [1-5]. The diagnostic value of MRI is greatly magnified
by the automated and accurate classification of MR images [6-8].
Wavelet transform is an effective tool for feature extraction from MR brain images, because it
allows analysis of images at various levels of resolution due to its multi-resolution analytic property.
However, this technique requires large storage and is computationally expensive [9]. In order to reduce
the feature vector dimensions and increase the discriminative power, the principal component analysis
(PCA) was used [10]. PCA is appealing since it effectively reduces the dimensionality of the data and
therefore reduces the computational cost of analyzing new data [11]. Then the problem of how to
classify the input data arises.
In recent years, researchers have proposed numerous approaches to this goal, which fall into two
categories. One category is supervised classification, including the support vector machine (SVM) [12]
and k-nearest neighbors (k-NN) [13]. The other category is unsupervised classification [14], including
the self-organizing feature map (SOFM) [12] and fuzzy c-means [15]. All these methods achieved
good results, but supervised classifiers perform better than unsupervised classifiers in terms of
classification accuracy (successful classification rate). However, the classification accuracy of most
existing methods was lower than 95%, so the goal of this paper is to find a more accurate method.
Among supervised classification methods, SVMs are state-of-the-art classifiers based on machine
learning theory [16-18]. Compared with other methods such as artificial neural
network, decision tree, and Bayesian network, SVMs have significant advantages of high accuracy,
elegant mathematical tractability, and direct geometric interpretation. Moreover, they do not need a large
number of training samples to avoid overfitting [19].
Original SVMs are linear classifiers. In this paper, we introduce kernel SVMs (KSVMs),
which extend the original linear SVMs to nonlinear SVM classifiers by applying a kernel function to
replace the dot product in the original SVMs [20]. The KSVMs allow us to fit the
maximum-margin hyperplane in a transformed feature space. The transformation may be nonlinear and
the transformed space high dimensional; thus though the classifier is a hyperplane in the
high-dimensional feature space, it may be nonlinear in the original input space [21].
The rest of this paper is organized as follows. Section 2 gives the detailed procedures of
preprocessing, including the discrete wavelet transform (DWT) and principal component analysis
(PCA). Section 3 first introduces the motivation and principles of linear SVMs, and then turns to
kernel SVMs. Section 4 introduces K-fold cross validation, which protects the classifier from
overfitting. The experiments in Section 5 use a total of 160 images as the dataset and show the results
of feature extraction and reduction; we then compare our method, with different kernels, against the
latest methods of the last decade. Finally, Section 6 is devoted to conclusions and discussions.
2 Preprocessing
In total, our method consists of three stages:
Step 1. Preprocessing (including feature extraction and feature reduction);
Step 2. Training the kernel SVM;
Step 3. Submitting new MRI brain images to the trained kernel SVM and outputting the prediction.
As shown in Fig. 1, this flowchart is a canonical and standard classification pipeline which has
proven effective [22]. We will explain the detailed procedures of the
preprocessing in the following subsections.
Fig. 1 Methodology of our proposed algorithm
2.1 Feature Extraction
The most conventional tool of signal analysis is Fourier transform (FT), which breaks down a
time domain signal into constituent sinusoids of different frequencies, thus, transforming the signal
from the time domain to the frequency domain. However, the FT has a serious drawback: it discards
the time information of the signal. For example, an analyst cannot tell from a Fourier spectrum when
a particular event took place. Thus, the quality of the classification decreases as time information is lost.
Gabor adapted the FT to analyze only a small section of the signal at a time, a technique
called windowing or the short-time Fourier transform (STFT) [23]. It applies a window of a particular shape to
the signal. STFT can be regarded as a compromise between the time information and frequency
information. It provides some information about both time and frequency domain. However, the
precision of the information is limited by the size of the window.
Wavelet transform (WT) represents the next logical step: a windowing technique with variable
size. Thus, it preserves both time and frequency information of the signal. The development of signal
analysis is shown in Fig. 2.
Fig. 2 The development of signal analysis
Another advantage of the WT is that it adopts "scale" instead of the traditional "frequency"; namely,
it does not produce a time-frequency view of the signal but a time-scale view. The time-scale view is a
different way to view data, but a more natural and powerful one, because "scale" is more commonly
used in daily life than "frequency", and "in large/small scale" is more easily understood than "in
high/low frequency".
2.2 Discrete Wavelet Transform
The discrete wavelet transform (DWT) is a powerful implementation of the WT using the dyadic
scales and positions [24]. The fundamentals of the DWT are introduced as follows. Suppose x(t) is a
square-integrable function; then the continuous WT of x(t) relative to a given wavelet ψ(t) is defined as

W_\psi(a, b) = \int_{-\infty}^{\infty} x(t) \, \psi_{a,b}(t) \, dt \qquad (1)

where

\psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \psi\!\left(\frac{t - b}{a}\right) \qquad (2)
Here, the wavelet ψ_{a,b}(t) is calculated from the mother wavelet ψ(t) by translation and dilation: a is the
dilation factor and b is the translation parameter (both real positive numbers). Several different kinds
of wavelets have gained popularity throughout the development of wavelet analysis. The most
important is the Haar wavelet, which is the simplest one and is often the preferred wavelet in many
applications [25-27].
Eq. (1) can be discretized by restraining a and b to a discrete dyadic lattice (a = 2^j, b = 2^j k) to give
the DWT, which can be expressed as follows.
ca_{j,k}(n) = DS\left[\sum_n x(n) \, g_j^*(n - 2^j k)\right]
cd_{j,k}(n) = DS\left[\sum_n x(n) \, h_j^*(n - 2^j k)\right] \qquad (3)
Here ca_{j,k} and cd_{j,k} refer to the coefficients of the approximation components and the detail
components, respectively; g(n) and h(n) denote the low-pass filter and the high-pass filter, respectively;
j and k represent the wavelet scale and translation factors, respectively; and DS denotes the
downsampling operator. Eq. (3) is the fundamental of wavelet decomposition: it decomposes the signal
x(n) into two signals, the approximation coefficients ca(n) and the detail coefficients cd(n). This
procedure is called one-level decomposition.
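To make Eq. (3) concrete, a one-level decomposition can be reproduced in Matlab (the platform
used in Section 5), assuming the Wavelet Toolbox is available; the signal x below is a hypothetical
example rather than data from our experiments.

% One-level 1D DWT (Eq. (3)) with the Haar wavelet; minimal sketch.
x = sin(2*pi*0.05*(1:128)) + 0.1*randn(1, 128);  % hypothetical test signal
[ca, cd] = dwt(x, 'haar');   % approximation (ca) and detail (cd) coefficients
% Each output has about half the length of x, reflecting the DS operator.
xr = idwt(ca, cd, 'haar');   % the inverse transform reconstructs the signal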
Fig. 3 A 3-level wavelet decomposition tree
[Fig. 2 comprises three panels: the Fourier transform (amplitude vs. frequency), the short-time
Fourier transform (frequency vs. time), and the wavelet transform (scale vs. time). Fig. 3 shows the
decomposition tree: the signal S splits into ca1 and cd1, ca1 into ca2 and cd2, and ca2 into ca3 and cd3.]
The above decomposition process can be iterated with successive approximations being
decomposed in turn, so that one signal is broken down into various levels of resolution. The whole
process is called the wavelet decomposition tree, shown in Fig. 3.
2.3 2D DWT
Fig. 4 Schematic diagram of 2D DWT
In the case of 2D images, the DWT is applied to each dimension separately. Fig. 4 illustrates the
schematic diagram of the 2D DWT. As a result, there are four sub-band images (LL, LH, HL, and HH)
at each scale. The LL sub-band is used for the next level of the 2D DWT.
The LL sub-band can be regarded as the approximation component of the image, while the LH, HL,
and HH sub-bands can be regarded as the detail components of the image. As the level of
decomposition increases, a more compact but coarser approximation component is obtained. Thus,
wavelets provide a simple hierarchical framework for interpreting the image information. In our
algorithm, a level-3 decomposition via the Haar wavelet was utilized to extract features.
Border distortion is a technical issue related to the digital filters commonly used in the DWT. As we
filter the image, the filter mask extends beyond the image at the edges, so the pixels outside the image
must be padded. In our algorithm, the symmetric padding method [28] was utilized to calculate the
boundary values.
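The whole feature-extraction stage described above can be sketched in a few lines of Matlab with the
Wavelet Toolbox. This is an illustrative sketch only: the file name brain.png is a hypothetical stand-in
for a 256×256 MR slice.

% Level-3 2D Haar DWT with symmetric padding (Section 2.3); minimal sketch.
dwtmode('sym');                      % symmetric padding at the borders [28]
img = double(imread('brain.png'));   % hypothetical 256x256 T2-weighted slice
[C, S] = wavedec2(img, 3, 'haar');   % 3-level 2D wavelet decomposition
ca3 = appcoef2(C, S, 'haar', 3);     % 32x32 level-3 approximation (LL3) sub-band
feature = ca3(:)';                   % 1x1024 feature vector for this image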
2.4 Feature Reduction
Excessive features increase computation time and storage memory. Furthermore, they sometimes
make classification more complicated, a problem known as the curse of dimensionality. It is therefore
necessary to reduce the number of features.
PCA is an efficient tool to reduce the dimension of a data set consisting of a large number of
interrelated variables while retaining most of the variation. It does so by transforming the data set
into a new set of variables ordered according to their variance or importance. This technique has three
effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each
other; it orders the resulting orthogonal components so that those with the largest variation come first;
and it eliminates those components contributing the least to the variation in the data set.
It should be noted that the input vectors should be normalized to have zero mean and unit variance
before performing PCA. This normalization is a standard procedure. Details about PCA can be found
in Ref. [10].
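As a minimal sketch of this reduction step, suppose X is the N×1024 matrix of DWT feature vectors
(one row per image; the variable name is ours). The princomp routine of the Matlab 2011b-era
Statistics Toolbox then yields the component scores and variances, from which the smallest number of
components reaching a variance threshold can be kept.

% PCA feature reduction: keep the fewest components explaining >= 95% variance.
X = zscore(X);                            % normalize: zero mean, unit variance
[coeff, score, latent] = princomp(X);     % PCA (princomp in Matlab 2011b)
explained = cumsum(latent) / sum(latent); % cumulative explained variance
k = find(explained >= 0.95, 1, 'first');  % smallest k reaching the threshold
Xred = score(:, 1:k);                     % N x k reduced feature matrix

On our data this criterion retains 19 components (see Tab. 3 in Section 5.3).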
3 Kernel SVM
The introduction of the support vector machine (SVM) is a landmark in the field of machine learning.
The advantages of SVMs include high accuracy, elegant mathematical tractability, and direct geometric
interpretation [29]. Recently, many improved SVM variants have emerged rapidly, among which the kernel
SVMs are the most popular and effective. Kernel SVMs have the following advantages [30]: (1) work
very well in practice and have been remarkably successful in such diverse fields as natural language
categorization, bioinformatics and computer vision; (2) have few tunable parameters; and (3) training
often involves convex quadratic optimization [31]. Hence, solutions are global and usually unique, thus
avoiding the convergence to local minima exhibited by other statistical learning systems, such as neural
networks.
3.1 Motivation
Suppose some prescribed data points each belong to one of two classes, and the goal is to classify
which class a new data point will be located in. Here a data point is viewed as a p-dimensional vector,
and our task is to create a (p-1)-dimensional hyperplane. There are many possible hyperplanes that
might classify the data successfully. One reasonable choice for the best hyperplane is the one that
represents the largest separation, or margin, between the two classes, since we can expect better
behavior in response to data unseen during training, i.e., better generalization performance. Therefore,
we choose the hyperplane so that the distance from it to the nearest data point on each side is
maximized [32]. Fig. 5 shows the geometric interpretation of linear SVMs. Here H1, H2, and H3 are
three hyperplanes which can classify the two classes successfully; however, H2 and H3 do not have
the largest margin, so they will not perform well on new test data. H1 has the maximum margin to the
support vectors (S11, S12, S13, S21, S22, and S23), so it is chosen as the best classification hyperplane
[33].
Fig. 5 The geometric interpretation of linear SVMs (H denotes a hyperplane, S denotes a support
vector)
3.2 Principles of Linear SVMs
Given a p-dimensional, N-sized training dataset of the form

\{(x_n, y_n) \mid x_n \in R^p, \; y_n \in \{-1, +1\}\}, \quad n = 1, \dots, N \qquad (4)

where y_n is either -1 or +1, corresponding to class 1 or class 2, and each x_n is a p-dimensional
vector, the maximum-margin hyperplane which divides class 1 from class 2 is the support vector
machine we want. Considering that any hyperplane can be written in the form

w \cdot x - b = 0 \qquad (5)

where \cdot denotes the dot product and w the normal vector to the hyperplane, we want to choose
w and b to make the margin between the two parallel hyperplanes (shown in Fig. 6) as large as
possible while still separating the data. So we define the two parallel hyperplanes by the equation

w \cdot x - b = \pm 1 \qquad (6)
Fig. 6 The concept of parallel hyperplanes (w denotes the weight, and b denotes the bias).
Therefore, the task can be transformed into an optimization problem: we want to maximize the
distance between the two parallel hyperplanes subject to the constraint that no data points fall into the
margin. Using simple mathematical knowledge, the problem can be formulated as

\min_{w, b} \; \|w\| \quad \text{s.t.} \quad y_n(w \cdot x_n - b) \ge 1, \; n = 1, \dots, N \qquad (7)

In practical situations, \|w\| is usually replaced by its square:

\min_{w, b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_n(w \cdot x_n - b) \ge 1, \; n = 1, \dots, N \qquad (8)
The reason lies in the fact that \|w\| involves a square root calculation. After it is superseded
by formula (8), the solution does not change, but the problem is converted into a quadratic programming
optimization that is easy to solve using Lagrange multipliers [34] and standard quadratic
programming techniques and programs [35, 36].
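To illustrate, the QP in formula (8) can be handed directly to a generic solver. The sketch below,
assuming Matlab's Optimization Toolbox, solves the primal problem with quadprog on a hypothetical,
linearly separable 2D toy set; it is not the solver used in Section 5, where the SVM routines of the
biostatistics toolbox are employed.

% Hard-margin linear SVM as the QP of formula (8), variables z = [w; b].
X = [1 1; 2 2.5; 3 2; 6 5; 7 7; 8 6];   % hypothetical 2D data points
y = [-1; -1; -1; 1; 1; 1];              % class labels
[N, p] = size(X);
H = blkdiag(eye(p), 0);                 % quadratic term penalizes w only, not b
f = zeros(p + 1, 1);                    % no linear term
A = -[bsxfun(@times, y, X), -y];        % y_n*(w'*x_n - b) >= 1 as A*z <= -1
bvec = -ones(N, 1);
z = quadprog(H, f, A, bvec);            % solve the quadratic program
w = z(1:p); b = z(end);
pred = sign(X * w - b);                 % reproduces y on separable data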
3.3 Kernel SVMs
Traditional SVMs construct a hyperplane to classify data, so they cannot deal with classification
problems in which the different classes of data lie on different sides of a hypersurface; to handle such
cases, the kernel strategy is applied to SVMs [37]. The resulting algorithm is formally similar,
except that every dot product is replaced by a nonlinear kernel function. The kernel is related to the
transform φ(x_i) by the equation k(x_i, x_j) = φ(x_i)·φ(x_j). The value w also lives in the transformed
space, with w = Σ_i α_i y_i φ(x_i). Dot products with w for classification can be computed by
w·φ(x) = Σ_i α_i y_i k(x_i, x).
From another point of view, the KSVMs allow us to fit the maximum-margin hyperplane in a
transformed feature space. The transformation may be nonlinear and the transformed space higher
dimensional; thus, though the classifier is a hyperplane in the higher-dimensional feature space, it may
be nonlinear in the original input space. Three common kernels [38] are listed in Tab. 1. For each
kernel there is at least one adjustable parameter, so that the kernel can be made flexible and tailored
to practical data.
Tab. 1 Three common kernels (HPOL, IPOL, and GRB) with their formulas and parameters

Name                              Formula                                        Parameter
Homogeneous Polynomial (HPOL)     k(x_i, x_j) = (x_i \cdot x_j)^d                d
Inhomogeneous Polynomial (IPOL)   k(x_i, x_j) = (x_i \cdot x_j + 1)^d            d
Gaussian Radial Basis (GRB)       k(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)    γ
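To make the formulas in Tab. 1 concrete, they can be written as Matlab anonymous functions
mapping two sample matrices (m×p and n×p) to an m×n Gram matrix. This is an illustrative sketch;
the parameter values d and gamma below are placeholders, not the tuned values of Section 5.4.

% The three kernels of Tab. 1 as anonymous functions; minimal sketch.
d = 2; gamma = 0.5;                      % illustrative parameters
hpol = @(U, V) (U * V').^d;              % homogeneous polynomial
ipol = @(U, V) (U * V' + 1).^d;          % inhomogeneous polynomial
sqd  = @(U, V) bsxfun(@plus, sum(U.^2, 2), sum(V.^2, 2)') - 2*(U*V');
grb  = @(U, V) exp(-gamma * sqd(U, V));  % Gaussian radial basis

Handles of this form can also be passed to the biostatistics toolbox's svmtrain through its
'Kernel_Function' option.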
4 K-fold Stratified Cross Validation
Fig. 7 A 5-fold Cross Validation
Since the classifier is trained on a given dataset, it may achieve high classification accuracy
only on this training dataset and not on other, independent datasets. To avoid this overfitting, we need
to integrate cross validation into our method. Cross validation will not increase the final classification
accuracy, but it makes the classifier reliable and generalizable to other independent datasets.
Cross validation methods come in three types: random subsampling, K-fold cross validation, and
leave-one-out validation. K-fold cross validation is applied here because it is simple and easy and uses
all the data for training and validation. The mechanism is to create a K-fold partition of the whole
dataset, repeat K times using K-1 folds for training and the remaining fold for validation, and finally
average the error rates of the K experiments. The schematic diagram of 5-fold cross validation is
shown in Fig. 7.
The K folds can be partitioned purely randomly; however, some folds may then have quite different
distributions from the others. Therefore, stratified K-fold cross validation was employed, in which
every fold has nearly the same class distribution [39]. Another challenge is to determine the number of
folds. If K is set too large, the bias of the true error-rate estimator will be small, but the variance of the
estimator will be large and the computation will be time-consuming. Alternatively, if K is set too small,
the computation time will decrease and the variance of the estimator will be small, but the bias of the
estimator will be large [40]. In this study, we empirically determined K as 5 through trial and error:
we let K vary from 3 to 10 with a step of 1, trained the SVM for each value, and finally selected the
optimal K corresponding to the highest classification accuracy.
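A minimal sketch of the stratified 5-fold procedure with the GRB kernel follows, assuming Xred and
labels hold the reduced features and the ±1 class labels from preprocessing (our notation) and using
cvpartition from the Statistics Toolbox, which stratifies the folds when given class labels; the sigma
value is illustrative, not the tuned one.

% Stratified 5-fold cross validation of the GRB-kernel SVM; minimal sketch.
K = 5;
cv = cvpartition(labels, 'KFold', K);   % stratified folds: class ratios preserved
acc = zeros(K, 1);
for i = 1:K
    tr = training(cv, i);               % logical index of the K-1 training folds
    te = test(cv, i);                   % logical index of the validation fold
    model = svmtrain(Xred(tr, :), labels(tr), ...
        'Kernel_Function', 'rbf', 'RBF_Sigma', 1);
    pred = svmclassify(model, Xred(te, :));
    acc(i) = mean(pred == labels(te));  % per-fold classification accuracy
end
fprintf('Mean CV accuracy: %.4f\n', mean(acc));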
5 Experiments and Discussions
The experiments were carried out on an IBM P4 platform with a 3 GHz processor and 2 GB of
RAM, running the Windows XP operating system. The algorithm was developed in house using the
wavelet toolbox and the biostatistics toolbox of Matlab 2011b (The MathWorks©). We downloaded an
open SVM toolbox, extended it to the kernel SVM, and applied it to MR brain image classification.
The programs can be run or tested on any computer platform where Matlab is available.
5.1 Database
The dataset consists of T2-weighted MR brain images in the axial plane with 256×256 in-plane
resolution, which were downloaded from the website of Harvard Medical School (URL:
http://med.harvard.edu/AANLIB/), the OASIS dataset (URL: http://www.oasis-brains.org/), and the
ADNI dataset (URL: http://adni.loni.ucla.edu/). We chose the T2 modality since T2 images offer
higher contrast and clearer vision compared to T1 and PET modalities.
The abnormal brain MR images of the dataset consist of the following diseases: glioma,
meningioma, Alzheimer’s disease, Alzheimer’s disease plus visual agnosia, Pick’s disease, sarcoma,
and Huntington’s disease. The samples of each disease are illustrated in Fig. 8.
Fig. 8 Samples of brain MRIs: (a) normal brain; (b) glioma; (c) meningioma; (d) Alzheimer's disease;
(e) Alzheimer's disease with visual agnosia; (f) Pick's disease; (g) sarcoma; (h) Huntington's disease.
We randomly selected 20 images for each type of brain. Since there are one type of normal brain
and seven types of abnormal brain in the dataset, 160 images were selected, consisting of 20 normal
and 140 (= 7 types of diseases × 20 images/disease) abnormal brain images. The setting of the training
and validation images under 5-fold cross validation is shown in Tab. 2.
Tab. 2 Setting of training and validation images (5-fold stratified cross validation)

Total No.      Training (128)           Validation (32)
of images      Normal    Abnormal       Normal    Abnormal
160            16        112            4         28
5.2 Feature Extraction
Fig. 9 The procedures of 3-level 2D DWT: (a) normal brain MRI; (b) level-3 wavelet coefficients
The three levels of wavelet decomposition greatly reduce the input image size, as shown in Fig. 9.
The top-left corner of the wavelet-coefficient image contains the approximation coefficients of level 3,
whose size is only 32×32 = 1024.
5.3 Feature Reduction
Fig. 10 Variance against the number of principal components (x axis in log scale)
As stated above, the number of extracted features was reduced from 65536 to 1024. However, this
is still too large for calculation. Thus, PCA is used to further reduce the dimensionality of the features.
The curve of the cumulative sum of variance versus the number of principal components is shown in
Fig. 10.
The variance versus the number of principal components from 1 to 20 is listed in Tab. 3. It
shows that only 19 principal components (bold font in the table), which are only 1.86% of the original
features, preserve 95.4% of the total variance.
Tab. 3 Detailed data of PCA

No. of Prin. Comp.   1      2      3      4      5      6      7      8      9      10
Variance (%)         42.3   55.6   62.4   68.1   72.3   76.2   79.3   82.1   84.0   85.6
No. of Prin. Comp.   11     12     13     14     15     16     17     18     19     20
Variance (%)         87.3   88.6   89.8   91.0   92.0   93.0   93.9   94.6   95.4   96.1
5.4 Classification Accuracy
We tested four SVMs with different kernels (LIN, HPOL, IPOL, and GRB). In the case of the
linear kernel, the KSVM degrades to the original linear SVM.
We ran hundreds of simulations in order to estimate the optimal parameters of the kernel
functions, such as the order d in the HPOL and IPOL kernels and the scaling factor γ in the GRB
kernel. The confusion matrices of our methods are listed in Tab. 4. The element in row i and column j
gives the number of class-i samples assigned to class j by the supervised classifier.
Tab. 4 Confusion matrices of our DWT+PCA+KSVM method with the LIN, HPOL, IPOL, and GRB
kernels (O denotes output, T denotes target)

LIN             Normal (O)   Abnormal (O)
Normal (T)      17           3
Abnormal (T)    5            135

HPOL            Normal (O)   Abnormal (O)
Normal (T)      19           1
Abnormal (T)    4            136

IPOL            Normal (O)   Abnormal (O)
Normal (T)      18           2
Abnormal (T)    1            139

GRB             Normal (O)   Abnormal (O)
Normal (T)      20           0
Abnormal (T)    1            139
The results show that the proposed DWT+PCA+KSVM method obtains excellent results on both
training and validation images. For the LIN kernel, the overall classification accuracy was
(17+135)/160 = 95%; for the HPOL kernel, (19+136)/160 = 96.88%; for the IPOL kernel,
(18+139)/160 = 98.12%; and for the GRB kernel, (20+139)/160 = 99.38%. Obviously, the GRB
kernel SVM outperformed the other three kernel SVMs.
Moreover, we compared our method with six popular methods (DWT+SOM [12], DWT+SVM
with linear kernel [12], DWT+SVM with RBF kernel [12], DWT+PCA+ANN [41],
DWT+PCA+kNN [41], and DWT+PCA+ACPSO+FNN [25]) described in the recent literature, using
the same MRI dataset and the same number of images. The comparison results are shown in
Tab. 5. They indicate that our proposed DWT+PCA+KSVM with GRB kernel performed best
among the 10 methods, achieving the highest classification accuracy of 99.38%. The next best is the
DWT+PCA+ACPSO+FNN method [25] with 98.75% classification accuracy, and the third is our
proposed DWT+PCA+KSVM with IPOL kernel with 98.12% classification accuracy.
Tab. 5 Classification accuracy comparison of 10 different algorithms on the same MRI dataset and
the same number of images

Approach from the literature                 Classification Accuracy (%)
DWT+SOM [12]                                 94
DWT+SVM with linear kernel [12]              96
DWT+SVM with RBF kernel [12]                 98
DWT+PCA+ANN [41]                             97
DWT+PCA+kNN [41]                             98
DWT+PCA+ACPSO+FNN [25]                       98.75

Approach from this paper                     Classification Accuracy (%)
DWT+PCA+KSVM (LIN)                           95
DWT+PCA+KSVM (HPOL)                          96.88
DWT+PCA+KSVM (IPOL)                          98.12
DWT+PCA+KSVM (GRB)                           99.38
5.5 Time Analysis
Computation time is another important factor in evaluating a classifier. The time for SVM training
was not considered, since the parameters of the SVM remain unchanged after training. We sent all
160 images into the classifier, recorded the corresponding computation times, computed the average
values, and depicted the time consumed at each stage in Fig. 11.
Fig. 11 Computation times at different stages
For each 256×256 image, the average computation time for feature extraction, feature reduction,
and SVM classification is 0.023 s, 0.0187 s, and 0.0031 s, respectively. Feature extraction is the most
time-consuming stage at 0.023 s; feature reduction costs 0.0187 s; and SVM classification costs the
least, only 0.0031 s.
The total computation time for each 256×256 image is about 0.0448 s, which is fast enough for
real-time diagnosis.
6 Conclusions and Discussions
In this study we have developed a novel DWT+PCA+KSVM method to distinguish between
normal and abnormal MRIs of the brain. We tested four different kernels: LIN, HPOL, IPOL,
and GRB. The experiments demonstrate that the GRB kernel SVM obtained 99.38% classification
accuracy on the 160 MR images, higher than the LIN, HPOL, and IPOL kernels and than other
popular methods in the recent literature.
Future work should focus on the following four aspects. First, the proposed SVM-based
method could be employed for MR images with other contrast mechanisms, such as T1-weighted,
proton-density-weighted, and diffusion-weighted images. Second, the computation time could be
reduced by using advanced wavelet transforms such as the lifting wavelet. Third, multi-class
classification, which focuses on specific disorders studied using brain MRI, can also be explored.
Fourth, novel kernels will be tested to increase the classification accuracy.
The DWT can efficiently extract the information from original MR images with little loss.
The advantage of the DWT over the Fourier transform is its spatial resolution, viz., the DWT captures
both frequency and location information. In this study we chose the Haar wavelet, although there are
other outstanding wavelets such as the Daubechies series; we will compare the performance of
different wavelet families in future work. Another research direction lies in the stationary wavelet
transform and the wavelet packet transform.
The importance of PCA was demonstrated in the discussion above. If we omitted the PCA
procedure, we would meet a huge search space (as shown in Fig. 10 and Tab. 3, PCA reduced the
1024-dimensional search space to 19 dimensions), which would cause a heavy computation burden
and worsened classification accuracy. There are some other excellent feature
transformation methods, such as ICA and manifold learning. In the future, we will focus on
investigating the performance of these algorithms.
The proposed DWT+PCA+KSVM with GRB kernel shows superiority over the SVMs with LIN,
HPOL, and IPOL kernels. The reason is that the GRB kernel takes the form of an exponential
function, which can enlarge the distance between samples to an extent that the HPOL kernel cannot
reach. Therefore, we will also apply the GRB kernel to other industrial fields.
There are two different schools of classification. One is white-box classification, such as
decision trees or rule-based models. Readers can extract reasonable rules from this kind of classifier;
for example, a typical decision tree can be interpreted as "if age is less than 15, turn to the left node;
then, if gender is male, turn to the right node; and so on". Therefore, white-box classifiers make sense
to patients.
The other school is black-box classification. Here the classifier is opaque: the reader cannot
extract reasonable rules from it, even though this kind of classifier often works better and achieves
higher classification accuracy than white-box classifiers. From another point of view, this kind of
classifier is really designed by "artificial intelligence" or "computer intelligence": the computer
constructs the classifier using its own intelligence, not human sense.
Our method belongs to the latter school. Our goal is to construct a universal classifier that does
not take account of age, gender, brain structure, focus of disease, and the like [42], but merely
centers on classification accuracy and high robustness. This kind of classifier may need further
improvement, since patients may need convincing and irrefutable proof before accepting the
diagnosis of their diseases.
The literature already describes wavelet transforms, PCA, and kernel SVMs individually. The most
important contribution of this paper is to combine them into a powerful tool for distinguishing
normal MR brains from abnormal MR brains. Meanwhile, we tested four kernels and found the GRB
kernel to be the most successful one. This technique of brain MRI classification based on PCA and
KSVM is a potentially valuable tool for computer-assisted clinical diagnosis.
References
[1] Zhang, Y., L. Wu, and S. Wang, "Magnetic Resonance Brain Image Classification by an Improved Artificial Bee Colony Algorithm," Progress In Electromagnetics Research, Vol. 116, pp. 65-79, 2011.
[2] Mohsin, S. A., N. M. Sheikh, and U. Saeed, "MRI Induced Heating of Deep Brain Stimulation Leads: Effect of the Air-Tissue Interface," Progress In Electromagnetics Research, Vol. 83, pp. 81-91, 2008.
[3] Golestanirad, L., et al., "Effect of Realistic Modeling of Deep Brain Stimulation on the Prediction of Volume of Activated Tissue," Progress In Electromagnetics Research, Vol. 126, pp. 1-16, 2012.
[4] Mohsin, S. A., "Concentration of the Specific Absorption Rate Around Deep Brain Stimulation Electrodes During MRI," Progress In Electromagnetics Research, Vol. 121, pp. 469-484, 2011.
[5] Oikonomou, A., I. S. Karanasiou, and N. K. Uzunoglu, "Phased-Array Near Field Radiometry for Brain Intracranial Applications," Progress In Electromagnetics Research, Vol. 109, pp. 345-360, 2010.
[6] Scapaticci, R., et al., "A Feasibility Study on Microwave Imaging for Brain Stroke Monitoring," Progress In Electromagnetics Research B, Vol. 40, pp. 305-324, 2012.
[7] Asimakis, N. P., et al., "Theoretical Analysis of a Passive Acoustic Brain Monitoring System," Progress In Electromagnetics Research B, Vol. 23, pp. 165-180, 2010.
[8] Chaturvedi, C. M., et al., "2.45 GHz (CW) Microwave Irradiation Alters Circadian Organization, Spatial Memory, DNA Structure in the Brain Cells and Blood Cell Counts of Male Mice, Mus Musculus," Progress In Electromagnetics Research B, Vol. 29, pp. 23-42, 2011.
[9] Emin Tagluk, M., M. Akin, and N. Sezgin, "Classification of sleep apnea by using wavelet transform and artificial neural networks," Expert Systems with Applications, Vol. 37, No. 2, pp. 1600-1607, 2010.
[10] Zhang, Y., L. Wu, and G. Wei, "A New Classifier for Polarimetric SAR Images," Progress In Electromagnetics Research, Vol. 94, pp. 83-104, 2009.
[11] Camacho, J., J. Picó, and A. Ferrer, "Corrigendum to 'The best approaches in the on-line monitoring of batch processes based on PCA: Does the modelling structure matter?' [Anal. Chim. Acta 642 (2009) 59-68]," Analytica Chimica Acta, Vol. 658, No. 1, pp. 106-106, 2010.
[12] Chaplot, S., L. M. Patnaik, and N. R. Jagannathan, "Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network," Biomedical Signal Processing and Control, Vol. 1, No. 1, pp. 86-92, 2006.
[13] Cocosco, C. A., A. P. Zijdenbos, and A. C. Evans, "A fully automatic and robust brain MRI tissue classification method," Medical Image Analysis, Vol. 7, No. 4, pp. 513-527, 2003.
[14] Zhang, Y. and L. Wu, "Weights optimization of neural network via improved BCO approach," Progress In Electromagnetics Research, Vol. 83, pp. 185-198, 2008.
[15] Yeh, J.-Y. and J. C. Fu, "A hierarchical genetic algorithm for segmentation of multi-spectral human-brain MRI," Expert Systems with Applications, Vol. 34, No. 2, pp. 1285-1295, 2008.
[16] Patil, N. S., et al., "Regression Models Using Pattern Search Assisted Least Square Support Vector Machines," Chemical Engineering Research and Design, Vol. 83, No. 8, pp. 1030-1037, 2005.
[17] Wang, F.-F. and Y.-R. Zhang, "The Support Vector Machine for Dielectric Target Detection Through a Wall," Progress In Electromagnetics Research Letters, Vol. 23, pp. 119-128, 2011.
[18] Xu, Y., et al., "A Support Vector Regression Based Nonlinear Modeling Method for SiC MESFET," Progress In Electromagnetics Research Letters, Vol. 2, pp. 103-114, 2008.
[19] Li, D., W. Yang, and S. Wang, "Classification of foreign fibers in cotton lint using machine vision and multi-class support vector machine," Computers and Electronics in Agriculture, Vol. 74, No. 2, pp. 274-279, 2010.
[20] Gomes, T. A. F., et al., "Combining meta-learning and search techniques to select parameters for support vector machines," Neurocomputing, Vol. 75, No. 1, pp. 3-13, 2012.
[21] Hable, R., "Asymptotic normality of support vector machine variants and other regularized kernel methods," Journal of Multivariate Analysis, Vol. 106, pp. 92-117, 2012.
[22] Ghosh, A., B. Uma Shankar, and S. K. Meher, "A novel approach to neuro-fuzzy classification," Neural Networks, Vol. 22, No. 1, pp. 100-109, 2009.
[23] Durak, L., "Shift-invariance of short-time Fourier transform in fractional Fourier domains," Journal of the Franklin Institute, Vol. 346, No. 2, pp. 136-146, 2009.
[24] Zhang, Y. and L. Wu, "Crop Classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, Vol. 11, No. 5, pp. 4721-4743, 2011.
[25] Zhang, Y., S. Wang, and L. Wu, "A Novel Method for Magnetic Resonance Brain Image Classification based on Adaptive Chaotic PSO," Progress In Electromagnetics Research, Vol. 109, pp. 325-343, 2010.
[26] Ala, G., E. Francomano, and F. Viola, "A Wavelet Operator on the Interval in Solving Maxwell's Equations," Progress In Electromagnetics Research Letters, Vol. 27, pp. 133-140, 2011.
[27] Iqbal, A. and V. Jeoti, "A Novel Wavelet-Galerkin Method for Modeling Radio Wave Propagation in Tropospheric Ducts," Progress In Electromagnetics Research B, Vol. 36, pp. 35-52, 2012.
[28] Messina, A., "Refinements of damage detection methods based on wavelet analysis of dynamical shapes," International Journal of Solids and Structures, Vol. 45, No. 14-15, pp. 4068-4097, 2008.
[29] Martiskainen, P., et al., "Cow behaviour pattern recognition using a three-dimensional accelerometer and support vector machines," Applied Animal Behaviour Science, Vol. 119, No. 1-2, pp. 32-38, 2009.
[30] Bermejo, S., B. Monegal, and J. Cabestany, "Fish age categorization from otolith images using multi-class support vector machines," Fisheries Research, Vol. 84, No. 2, pp. 247-253, 2007.
[31] Muniz, A. M. S., et al., "Comparison among probabilistic neural network, support vector machine and logistic regression for evaluating the effect of subthalamic stimulation in Parkinson disease on ground reaction force during gait," Journal of Biomechanics, Vol. 43, No. 4, pp. 720-726, 2010.
[32] Bishop, C. M., Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., 2006.
[33] Vapnik, V., The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., 1995.
[34] Jeyakumar, V., J. H. Wang, and G. Li, "Lagrange multiplier characterizations of robust best approximations under constraint data uncertainty," Journal of Mathematical Analysis and Applications, Vol. 393, No. 1, pp. 285-297, 2012.
[35] Cucker, F. and S. Smale, "On the mathematical foundations of learning," Bulletin of the American Mathematical Society, Vol. 39, pp. 1-49, 2002.
[36] Poggio, T. and S. Smale, "The Mathematics of Learning: Dealing with Data," Notices of the American Mathematical Society (AMS), Vol. 50, No. 5, pp. 537-544, 2003.
[37] Acevedo-Rodríguez, J., et al., "Computational load reduction in decision functions using support vector machines," Signal Processing, Vol. 89, No. 10, pp. 2066-2071, 2009.
[38] Deris, A. M., A. M. Zain, and R. Sallehuddin, "Overview of Support Vector Machine in Modeling Machining Performances," Procedia Engineering, Vol. 24, pp. 308-312, 2011.
[39] May, R. J., H. R. Maier, and G. C. Dandy, "Data splitting for artificial neural networks using SOM-based stratified sampling," Neural Networks, Vol. 23, No. 2, pp. 283-294, 2010.
[40] Armand, S., et al., "Linking clinical measurements and kinematic gait patterns of toe-walking using fuzzy decision trees," Gait & Posture, Vol. 25, No. 3, pp. 475-484, 2007.
[41] El-Dahshan, E.-S. A., T. Hosny, and A.-B. M. Salem, "Hybrid intelligent techniques for MRI brain images classification," Digital Signal Processing, Vol. 20, No. 2, pp. 433-441, 2010.
[42] Evans, A. C., et al., "Brain templates and atlases," NeuroImage, Vol. 62, No. 2, pp. 911-922, 2012.