
An MR Brain Images Classifier via Principal Component Analysis and Kernel Support Vector Machine


Yudong Zhang, Lenan Wu
School of Information Science and Engineering, Southeast University, Nanjing, China
Abstract: Automated and accurate classification of MR brain images is extremely important for medical analysis and interpretation, and numerous methods have been proposed over the last decade. In this paper, we present a novel method to classify a given MR brain image as normal or abnormal. The proposed method first employs the wavelet transform to extract features from images and then applies principal component analysis (PCA) to reduce the dimensionality of the features. The reduced features are submitted to a kernel support vector machine (KSVM). K-fold stratified cross validation is used to enhance the generalization of the KSVM. We chose seven common brain diseases (glioma, meningioma, Alzheimer’s disease, Alzheimer’s disease plus visual agnosia, Pick’s disease, sarcoma, and Huntington’s disease) as abnormal brains, and collected 160 MR brain images (20 normal and 140 abnormal) from the Harvard Medical School website. We tested the proposed method with four different kernels and found that the GRB kernel achieves the highest classification accuracy, 99.38%. The LIN, HPOL, and IPOL kernels achieve 95%, 96.88%, and 98.12%, respectively. We also compared our method with methods from the literature of the last decade, and the results show that our DWT+PCA+KSVM with the GRB kernel still achieves the most accurate classification. The average processing time for a 256×256 image on a P4 IBM laptop with a 3 GHz processor and 2 GB RAM is 0.0448 s. The experimental data indicate that our method is effective and rapid. It could be applied to MR brain image classification and could, to a certain degree, assist doctors in diagnosing whether a patient is normal or abnormal.
Keywords: Magnetic Resonance Imaging; Discrete Wavelet Transform; Principal Component Analysis; Kernel Support Vector Machine; Classification
1 Introduction
Magnetic resonance imaging (MRI) is an imaging technique that produces high-quality images of the anatomical structures of the human body, especially the brain, and provides rich information for clinical diagnosis and biomedical research [1-5]. The diagnostic value of MRI is greatly enhanced by automated and accurate classification of MR images [6-8].
The wavelet transform is an effective tool for feature extraction from MR brain images, because its multi-resolution property allows images to be analyzed at various levels of resolution. However, this technique requires large storage and is computationally expensive [9]. To reduce the feature vector dimensions and increase the discriminative power, principal component analysis (PCA) is used [10]. PCA is appealing because it effectively reduces the dimensionality of the data and therefore reduces the computational cost of analyzing new data [11]. The remaining problem is how to classify the input data.
In recent years, researchers have proposed many approaches for this goal, which fall into two categories. One category is supervised classification, including the support vector machine (SVM) [12] and k-nearest neighbors (k-NN) [13]. The other category is unsupervised classification [14], including the self-organizing feature map (SOFM) [12] and fuzzy c-means [15]. Although all these methods achieved good results, supervised classifiers performed better than unsupervised classifiers in terms of classification accuracy (success classification rate). However, the classification accuracies of most existing methods were lower than 95%, so the goal of this paper is to find a more accurate method.
Among supervised classification methods, SVMs are state-of-the-art classifiers based on machine learning theory [16-18]. Compared with other methods such as artificial neural networks, decision trees, and Bayesian networks, SVMs have the significant advantages of high accuracy, elegant mathematical tractability, and direct geometric interpretation. Besides, they do not need a large number of training samples to avoid overfitting [19].
Original SVMs are linear classifiers. In this paper, we introduce kernel SVMs (KSVMs), which extend the original linear SVMs to nonlinear classifiers by replacing the dot product in the original SVMs with a kernel function [20]. KSVMs allow us to fit the maximum-margin hyperplane in a transformed feature space. The transformation may be nonlinear and the transformed space high-dimensional; thus, although the classifier is a hyperplane in the high-dimensional feature space, it may be nonlinear in the original input space [21].
The rest of this paper is organized as follows. Section 2 gives the detailed preprocessing procedures, including the discrete wavelet transform (DWT) and principal component analysis (PCA). Section 3 first introduces the motivation and principles of the linear SVM and then turns to the kernel SVM. Section 4 introduces K-fold cross validation, which protects the classifier from overfitting. The experiments in Section 5 use a total of 160 images as the dataset and show the results of feature extraction and reduction; afterwards, we compare our method with different kernels to the latest methods of the last decade. Finally, Section 6 is devoted to conclusions and discussions.
2 Preprocessing
In total, our method consists of three stages:
Step 1. Preprocessing (including feature extraction and feature reduction);
Step 2. Training the kernel SVM;
Step 3. Submitting new MR brain images to the trained kernel SVM and outputting the prediction.
As shown in Fig. 1, this flowchart follows a canonical, standard classification pipeline, which has been reported as the best classification method in [22]. The preprocessing procedures are explained in detail in the following subsections.
Fig. 1 Methodology of our proposed algorithm
2.1 Feature Extraction
The most conventional tool of signal analysis is the Fourier transform (FT), which breaks down a time-domain signal into constituent sinusoids of different frequencies, thus transforming the signal from the time domain to the frequency domain. However, the FT has a serious drawback: it discards the time information of the signal. For example, an analyst cannot tell from a Fourier spectrum when a particular event took place. As a result, classification quality decreases when this time information is lost.
Gabor adapted the FT to analyze only a small section of the signal at a time. This technique is called windowing, or the short-time Fourier transform (STFT) [23]. It multiplies the signal by a window of a particular shape. The STFT can be regarded as a compromise between time information and frequency information: it provides some information about both the time and frequency domains. However, the precision of this information is limited by the size of the window.
The wavelet transform (WT) represents the next logical step: a windowing technique with variable-sized windows. Thus, it preserves both the time and frequency information of the signal. The development of signal analysis is shown in Fig. 2.
Fig. 2 The development of signal analysis
Another advantage of the WT is that it adopts “scale” instead of the traditional “frequency”; namely, it produces a time-scale view of the signal rather than a time-frequency view. The time-scale view is a different way to view data, but it is a more natural and powerful way, because “scale” is more commonly used in daily life than “frequency”: “at a large/small scale” is more easily understood than “at high/low frequency”.
2.2 Discrete wavelet transform
The discrete wavelet transform (DWT) is a powerful implementation of the WT using dyadic scales and positions [24]. The fundamentals of the DWT are introduced as follows. Suppose x(t) is a square-integrable function; then the continuous WT of x(t) relative to a given wavelet ψ(t) is defined as
$$W_{\psi}(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}(t)\,dt$$
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)$$
Here, the wavelet ψ_{a,b}(t) is calculated from the mother wavelet ψ(t) by translation and dilation: a is the dilation factor (a positive real number) and b is the translation parameter (a real number). Several kinds of wavelets have gained popularity throughout the development of wavelet analysis. The most important one is the Haar wavelet, which is the simplest and often the preferred wavelet in many applications [25-27].
The continuous WT above can be discretized by restricting a and b to the dyadic lattice a = 2^j, b = k·2^j (with j and k integers, so that a > 0) to give the DWT, which can be expressed as follows.
$$ca_{j,k}(n) = \mathrm{DS}\left[\sum_{n} x(n)\, g\!\left(n - 2^{j}k\right)\right]$$
$$cd_{j,k}(n) = \mathrm{DS}\left[\sum_{n} x(n)\, h\!\left(n - 2^{j}k\right)\right]$$
Here ca_{j,k} and cd_{j,k} refer to the coefficients of the approximation components and the detail components, respectively; g(n) and h(n) denote the low-pass filter and the high-pass filter, respectively; j and k represent the wavelet scale and translation factors, respectively; and the DS operator denotes downsampling. The above equation pair is the foundation of wavelet decomposition: it decomposes the signal x(n) into two signals, the approximation coefficients ca(n) and the detail coefficients cd(n). This procedure is called one-level decomposition.
Fig. 3 A 3-level wavelet decomposition tree
The above decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken down into various levels of resolution. The whole process is called a wavelet decomposition tree, as shown in Fig. 3.
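To make the one-level decomposition and its iteration concrete, the following minimal Python/NumPy sketch assumes the orthonormal Haar filters g = [1, 1]/√2 and h = [1, −1]/√2 and an input whose length stays even at every level. The function names are hypothetical; the authors used the Matlab Wavelet Toolbox, not this code.

```python
import numpy as np

# Orthonormal Haar analysis filters (assumed): low-pass G, high-pass H.
G = np.array([1.0, 1.0]) / np.sqrt(2.0)
H = np.array([1.0, -1.0]) / np.sqrt(2.0)


def haar_dwt_1level(x):
    """One-level Haar DWT: filter and downsample by 2 in one step.

    ca[k] = G[0]*x[2k] + G[1]*x[2k+1] and cd[k] = H[0]*x[2k] + H[1]*x[2k+1],
    i.e. each non-overlapping pair of samples is combined once.
    Assumes len(x) is even.
    """
    x = np.asarray(x, dtype=float)
    pairs = x.reshape(-1, 2)      # row k holds (x[2k], x[2k+1])
    ca = pairs @ G                # approximation coefficients
    cd = pairs @ H                # detail coefficients
    return ca, cd


def haar_wavedec(x, levels):
    """Decomposition tree: repeatedly split the current approximation.

    Returns [ca_J, cd_J, cd_{J-1}, ..., cd_1], coarsest level first.
    """
    details = []
    ca = np.asarray(x, dtype=float)
    for _ in range(levels):
        ca, cd = haar_dwt_1level(ca)
        details.append(cd)
    return [ca] + details[::-1]


x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = haar_wavedec(x, levels=3)
print([c.round(4).tolist() for c in coeffs])
```

Because the Haar filters are orthonormal, the total energy of all coefficients equals the energy of the input, which is a convenient sanity check for any implementation of the tree.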
2.3 2D DWT
Fig. 4 Schematic diagram of 2D DWT
For 2D images, the DWT is applied to each dimension separately. Fig. 4 illustrates the schematic diagram of the 2D DWT. As a result, there are four sub-band images (LL, LH, HL, and HH) at each scale. The LL sub-band is used as the input to the next level of 2D DWT.
The LL sub-band can be regarded as the approximation component of the image, while the LH, HL, and HH sub-bands can be regarded as the detail components. As the decomposition level increases, a more compact but coarser approximation component is obtained. Thus, wavelets provide a simple hierarchical framework for interpreting image information. In our algorithm, a level-3 decomposition with the Haar wavelet was used to extract features.
Border distortion is a technical issue related to the digital filters commonly used in the DWT. When we filter the image, the filter mask extends beyond the image at the edges, so the pixels outside the image must be padded. In our algorithm, the symmetric padding method [28] was used to calculate the boundary values.
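The separable 2D decomposition and the level-3 LL features can be sketched as follows. This NumPy sketch is illustrative, not the authors' Matlab implementation: it uses orthonormal Haar steps, labels the detail sub-bands with one common convention (toolboxes differ on which band is called LH and which HL), and applies symmetric padding only when a dimension is odd, because the length-2 Haar filter needs no border extension for even sizes. A random array stands in for a real 256×256 MR slice.

```python
import numpy as np


def haar_dwt2_1level(img):
    """One level of separable 2D Haar DWT, returning (LL, LH, HL, HH).

    Odd dimensions are first extended by symmetric (mirror) padding.
    Sub-band naming follows one assumed convention: first letter is the
    horizontal filter, second letter the vertical filter.
    """
    a = np.asarray(img, dtype=float)
    pad_r, pad_c = a.shape[0] % 2, a.shape[1] % 2
    if pad_r or pad_c:
        a = np.pad(a, ((0, pad_r), (0, pad_c)), mode="symmetric")
    s = np.sqrt(2.0)
    # Horizontal step: combine pairs of adjacent columns, then downsample.
    lo = (a[:, 0::2] + a[:, 1::2]) / s
    hi = (a[:, 0::2] - a[:, 1::2]) / s
    # Vertical step: combine pairs of adjacent rows, then downsample.
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh


def haar_features(img, levels=3):
    """Feed LL into the next level and return the final LL as a flat vector."""
    ll = np.asarray(img, dtype=float)
    for _ in range(levels):
        ll, _, _, _ = haar_dwt2_1level(ll)
    return ll.ravel()


rng = np.random.default_rng(0)
image = rng.random((256, 256))          # stand-in for a 256x256 MR slice
features = haar_features(image, levels=3)
print(features.shape)                    # 256 -> 128 -> 64 -> 32, so 32*32 values
```

Each level halves both dimensions, so three levels turn 65536 pixels into the 1024 approximation coefficients used as features in Section 5.2.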
2.4 Feature Reduction
Excessive features increase computation time and memory storage. Furthermore, they sometimes make classification more complicated, a phenomenon known as the curse of dimensionality. It is therefore necessary to reduce the number of features.
PCA is an efficient tool for reducing the dimension of a data set consisting of a large number of interrelated variables while retaining most of the variation. It achieves this by transforming the data set into a new set of variables ordered by their variance, or importance. This technique has three effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each other; it orders the resulting orthogonal components so that those with the largest variation come first; and it eliminates the components contributing least to the variation in the data set.
It should be noted that the input vectors should be normalized to have zero mean and unit variance before PCA is performed. This normalization is a standard procedure. Details about PCA can be found in Ref. [10].
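As a sketch of the normalization-plus-PCA step, the NumPy code below standardizes each feature, computes the principal axes by singular value decomposition, and keeps the smallest number of components whose cumulative explained variance reaches a threshold. The function name and the 0.95 threshold are assumptions, chosen to mirror the roughly 95% retained variance reported in Section 5.3; the synthetic matrix merely stands in for the 160 × 1024 wavelet feature matrix.

```python
import numpy as np


def fit_pca(X, var_threshold=0.95):
    """Standardize X (n_samples x n_features), then keep the fewest principal
    components whose cumulative explained variance reaches var_threshold.

    Returns (scores, components, mean, std, cumulative). New samples can be
    projected with ((X_new - mean) / std) @ components.T.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant features unscaled
    Z = (X - mean) / std                     # zero mean, unit variance per feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    cumulative = np.cumsum(S ** 2) / np.sum(S ** 2)
    k = min(int(np.searchsorted(cumulative, var_threshold)) + 1, S.size)
    components = Vt[:k]
    scores = Z @ components.T                # uncorrelated, variance-ordered features
    return scores, components, mean, std, cumulative


rng = np.random.default_rng(1)
latent = rng.normal(size=(160, 5))           # hypothetical low-rank structure
X = latent @ rng.normal(size=(5, 200)) + 0.05 * rng.normal(size=(160, 200))
scores, components, mean, std, cumulative = fit_pca(X, var_threshold=0.95)
k = components.shape[0]
print(k, scores.shape, round(float(cumulative[k - 1]), 4))
```

Keeping the training mean, standard deviation, and components matters in practice: a new image must be normalized and projected with the statistics learned from the training set, not with its own.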
3 Kernel SVM
The introduction of the support vector machine (SVM) is a landmark in the field of machine learning. The advantages of SVMs include high accuracy, elegant mathematical tractability, and direct geometric interpretation [29]. Recently, many improved SVMs have been developed, among which kernel SVMs are the most popular and effective. Kernel SVMs have the following advantages [30]: (1) they work very well in practice and have been remarkably successful in fields as diverse as natural language categorization, bioinformatics, and computer vision; (2) they have few tunable parameters; and (3) their training often involves convex quadratic optimization [31]. Hence, solutions are global and usually unique, which avoids the convergence to local minima exhibited by other statistical learning systems, such as neural networks.
3.1 Motivation
Suppose some prescribed data points each belong to one of two classes, and the goal is to decide which class a new data point belongs to. Here a data point is viewed as a p-dimensional vector, and our task is to create a (p-1)-dimensional hyperplane that separates the classes. There are many possible hyperplanes that might classify the data successfully. One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes, since we can expect it to behave better on data unseen during training, i.e., to have better generalization performance. Therefore, we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized [32]. Fig. 5 shows the geometric interpretation of linear SVMs. Here H1, H2, and H3 are three hyperplanes that classify the two classes successfully; however, H2 and H3 do not have the largest margin, so they are not expected to perform as well on new test data. H1 has the maximum margin to the support vectors (S11, S12, S13, S21, S22, and S23), so it is chosen as the best classification hyperplane.
Fig. 5 The geometric interpretation of linear SVMs (H denotes the hyperplane, S denotes the support vector)
3.2 Principles of Linear SVMs
Given a p-dimensional N-size training dataset of the form
$$\left\{(\mathbf{x}_n, y_n) \;\middle|\; \mathbf{x}_n \in \mathbb{R}^{p},\ y_n \in \{-1, +1\}\right\},\quad n = 1, \ldots, N$$
where y_n is either −1 or 1, corresponding to class 1 or class 2, and each x_n is a p-dimensional vector. The maximum-margin hyperplane that divides class 1 from class 2 is the support vector machine we want.
Any hyperplane can be written in the form
$$\mathbf{w}\cdot\mathbf{x} - b = 0$$
where · denotes the dot product and w is the normal vector to the hyperplane. We want to choose w and b so that the margin between the two parallel hyperplanes (shown in Fig. 6) is as large as possible while the data are still separated. We therefore define the two parallel hyperplanes by the equations
$$\mathbf{w}\cdot\mathbf{x} - b = 1 \qquad\text{and}\qquad \mathbf{w}\cdot\mathbf{x} - b = -1$$
Fig. 6 The concept of parallel hyperplanes (w denotes the weight, and b denotes the bias).
Therefore, the task can be transformed into an optimization problem: we want to maximize the distance between the two parallel hyperplanes, subject to the constraint that no data point falls into the margin. Since the distance between the two hyperplanes is 2/||w||, maximizing it is equivalent to minimizing ||w||, and the problem can be formulated as
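$$\min_{\mathbf{w},\,b}\ \lVert\mathbf{w}\rVert \quad \text{s.t.}\quad y_n\left(\mathbf{w}\cdot\mathbf{x}_n - b\right) \ge 1,\quad n = 1, \ldots, N$$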
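The step from maximizing the distance between the two hyperplanes to minimizing ||w|| can be made explicit with a short worked derivation (a standard geometric argument, included here for clarity). Start from a point x₊ on w·x − b = 1 and move along the unit normal w/||w|| by a distance t until reaching w·x − b = −1:

$$
\mathbf{w}\cdot\Big(\mathbf{x}_+ - t\,\frac{\mathbf{w}}{\lVert\mathbf{w}\rVert}\Big) - b
= \left(\mathbf{w}\cdot\mathbf{x}_+ - b\right) - t\,\frac{\mathbf{w}\cdot\mathbf{w}}{\lVert\mathbf{w}\rVert}
= 1 - t\,\lVert\mathbf{w}\rVert = -1
\quad\Longrightarrow\quad
t = \frac{2}{\lVert\mathbf{w}\rVert}
$$

The margin 2/||w|| is therefore largest exactly when ||w|| is smallest. The single constraint y_n(w·x_n − b) ≥ 1 encodes both w·x_n − b ≥ 1 for y_n = 1 and w·x_n − b ≤ −1 for y_n = −1, so no training point lies strictly inside the margin.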
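In practice, ||w|| is usually replaced by ½||w||², which gives
$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} \quad \text{s.t.}\quad y_n\left(\mathbf{w}\cdot\mathbf{x}_n - b\right) \ge 1,\quad n = 1, \ldots, N$$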
The reason is that ||w|| involves a square root. Replacing it with the squared form does not change the solution, because ||w|| is nonnegative and squaring is monotonic on nonnegative numbers, but it turns the problem into a quadratic programming optimization that is easy to solve using Lagrange multipliers [34] and standard quadratic programming techniques and programs [35, 36].
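For reference, the textbook Lagrangian treatment of this hard-margin problem, written with the same w·x − b convention, is sketched below; it is a standard result rather than a derivation reproduced from this paper. Introducing multipliers α_n ≥ 0 gives

$$
L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\lVert\mathbf{w}\rVert^{2} - \sum_{n=1}^{N} \alpha_n \left[\, y_n\left(\mathbf{w}\cdot\mathbf{x}_n - b\right) - 1 \,\right]
$$

Setting ∂L/∂w = 0 and ∂L/∂b = 0 yields w = Σ_n α_n y_n x_n and Σ_n α_n y_n = 0. Substituting these back gives the dual problem

$$
\max_{\boldsymbol{\alpha}}\ \sum_{n=1}^{N} \alpha_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m \left(\mathbf{x}_n\cdot\mathbf{x}_m\right)
\quad \text{s.t.}\quad \alpha_n \ge 0,\ \ \sum_{n=1}^{N} \alpha_n y_n = 0
$$

The training data enter the dual only through the dot products x_n·x_m, which is what allows them to be replaced by a kernel function k(x_n, x_m) in Section 3.3. Training points with α_n > 0 are the support vectors.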
3.3 Kernel SVMs
Traditional SVMs construct a hyperplane to classify data, so they cannot deal with classification problems in which the different types of data lie on different sides of a hypersurface. To handle such problems, the kernel strategy is applied to SVMs [37]. The resulting algorithm is formally similar, except that every dot product is replaced by a nonlinear kernel function. The kernel is related to the transform φ(x_i) by the equation k(x_i, x_j) = φ(x_i)·φ(x_j). The weight vector w also lies in the transformed space, with w = Σ_i α_i y_i φ(x_i). Dot products with w for classification can therefore be computed as w·φ(x) = Σ_i α_i y_i k(x_i, x).
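The identity behind the kernel trick can be checked numerically. The sketch below, written for illustration only, uses the degree-2 homogeneous polynomial kernel (HPOL with d = 2), whose explicit feature map is the vector of all products x_i·x_j, and compares the decision value computed through an explicit w in the transformed space with the one computed from kernel evaluations alone. The support vectors, multipliers α_i, and bias b are random placeholders, not values from a trained SVM.

```python
import numpy as np


def phi_poly2(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel:
    all ordered products x_i * x_j, so phi(x) . phi(z) = (x . z) ** 2."""
    x = np.asarray(x, dtype=float)
    return np.outer(x, x).ravel()


def k_poly2(x, z):
    """The same inner product computed without forming phi."""
    return float(np.dot(x, z)) ** 2


rng = np.random.default_rng(0)
X_sv = rng.normal(size=(6, 3))               # hypothetical support vectors
y_sv = np.array([1, -1, 1, 1, -1, -1])       # their labels
alpha = rng.uniform(0.1, 1.0, size=6)        # placeholder multipliers, not trained
b = 0.3                                      # placeholder bias
x_new = rng.normal(size=3)

# Route 1: build w explicitly in the transformed space, w = sum_i alpha_i y_i phi(x_i).
w = sum(a * y * phi_poly2(xi) for a, y, xi in zip(alpha, y_sv, X_sv))
f_explicit = w @ phi_poly2(x_new) - b

# Route 2: kernel trick, sum_i alpha_i y_i k(x_i, x) - b, never forming phi.
f_kernel = sum(a * y * k_poly2(xi, x_new) for a, y, xi in zip(alpha, y_sv, X_sv)) - b

print(f_explicit, f_kernel)                  # equal up to floating-point rounding
```

Here φ has only 9 components, but for the GRB kernel the corresponding feature space is infinite-dimensional, which is why the kernel route is the only practical one.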
From another point of view, KSVMs allow the maximum-margin hyperplane to be fitted in a transformed feature space. The transformation may be nonlinear and the transformed space higher-dimensional; thus, although the classifier is a hyperplane in the higher-dimensional feature space, it may be nonlinear in the original input space. Three common kernels [38] are listed in Tab. 1. Each kernel has at least one adjustable parameter, which makes the kernel flexible enough to tailor itself to practical data.
Tab. 1 Three Common Kernels (HPOL, IPOL, and GRB) with their formula and parameters
Kernel | Formula | Parameter
Homogeneous Polynomial (HPOL) | $k(x_i, x_j) = (x_i \cdot x_j)^{d}$ | d
Inhomogeneous Polynomial (IPOL) | $k(x_i, x_j) = (x_i \cdot x_j + 1)^{d}$ | d
Gaussian Radial Basis (GRB) | $k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$ | γ
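The four kernels compared in this paper (LIN plus the three in Tab. 1) can be written as vectorized Gram-matrix functions. This NumPy sketch is an illustration under assumptions: the default values d = 2 and γ = 0.5 are hypothetical placeholders, since the paper tuned d and γ by simulation and does not report them in this section, and the 19-dimensional inputs only mimic the size of the PCA output.

```python
import numpy as np


def k_lin(X, Z):
    """LIN: plain dot product; the KSVM then reduces to a linear SVM."""
    return X @ Z.T


def k_hpol(X, Z, d=2):
    """HPOL: (x_i . x_j) ** d."""
    return (X @ Z.T) ** d


def k_ipol(X, Z, d=2):
    """IPOL: (x_i . x_j + 1) ** d."""
    return (X @ Z.T + 1.0) ** d


def k_grb(X, Z, gamma=0.5):
    """GRB: exp(-gamma * ||x_i - x_j||^2) for every pair of rows."""
    sq = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Z ** 2, axis=1)[None, :]
        - 2.0 * (X @ Z.T)
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 19))       # five hypothetical 19-dimensional feature vectors
K = k_grb(X, X, gamma=0.5)
print(np.round(np.diag(K), 6))     # a point is at distance 0 from itself
```

Expanding ||x_i − x_j||² as ||x_i||² + ||x_j||² − 2 x_i·x_j lets the whole GRB Gram matrix be computed with one matrix product instead of a double loop.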
4 K-fold Stratified Cross Validation
Fig. 7 A 5-fold Cross Validation
Since the classifier is trained on a given dataset, it may achieve high classification accuracy only on this training dataset and not on other independent datasets. To avoid this overfitting, we integrate cross validation into our method. Cross validation does not increase the final classification accuracy, but it makes the classifier reliable and helps ensure that it generalizes to other independent datasets.
Cross validation methods are of three types: random subsampling, K-fold cross validation, and leave-one-out validation. K-fold cross validation is applied here because it is simple and easy and uses all data for both training and validation. The mechanism is to create a K-fold partition of the whole dataset, repeat K times using K−1 folds for training and the remaining fold for validation, and finally average the error rates of the K experiments. The schematic diagram of 5-fold cross validation is shown in Fig. 7.
The K folds can be partitioned purely at random; however, some folds may then have quite different class distributions from the others. Therefore, stratified K-fold cross validation was employed, in which every fold has nearly the same class distribution [39]. Another challenge is determining the number of folds. If K is set too large, the bias of the true error rate estimator will be small, but the variance of the estimator will be large and the computation will be time-consuming. Conversely, if K is set too small, the computation time will decrease and the variance of the estimator will be small, but the bias of the estimator will be large [40]. In this study, we determined K = 5 empirically by trial and error: we varied K from 3 to 10 in steps of 1, trained the SVM with each value, and selected the K value corresponding to the highest classification accuracy.
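A stratified 5-fold partition of the 160 images can be sketched as below. It is an illustrative NumPy implementation, not the authors' code: the round-robin dealing of each shuffled class into the folds and the random seed are assumptions. With 20 normal and 140 abnormal images, every fold receives 4 normal and 28 abnormal images, giving 128 training and 32 validation images per experiment, which matches the 128/32 split in Tab.2.

```python
import numpy as np


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with (nearly) equal class proportions.

    Each class is shuffled separately and dealt round-robin into the folds,
    so every fold receives about 1/k of every class.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for pos, i in enumerate(idx):
            folds[pos % k].append(int(i))
    return [np.array(sorted(f)) for f in folds]


# 20 normal (label 0) and 140 abnormal (label 1) images, as in Section 5.1.
labels = np.array([0] * 20 + [1] * 140)
folds = stratified_kfold(labels, k=5)
for v, val_idx in enumerate(folds, start=1):
    train_idx = np.setdiff1d(np.arange(labels.size), val_idx)
    print(f"Experiment {v}: train={train_idx.size}, val={val_idx.size}, "
          f"normal in val={int(np.sum(labels[val_idx] == 0))}")
```

The same function can be reused for the trial-and-error search over K described above by calling it with k from 3 to 10; when a class size is not divisible by K, the first folds simply receive one extra sample of that class.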
5 Experiments and discussions
The experiments were carried out on a P4 IBM platform with a 3 GHz processor and 2 GB RAM, running the Windows XP operating system. The algorithm was developed in-house using the wavelet toolbox and the biostatistical toolbox of Matlab 2011b (The MathWorks ©). We downloaded an open SVM toolbox, extended it to a kernel SVM, and applied it to MR brain image classification. The programs can be run or tested on any computer platform where Matlab is available.
5.1 Database
The dataset consists of T2-weighted MR brain images in the axial plane with 256×256 in-plane resolution, which were downloaded from the website of Harvard Medical School (URL: …), the OASIS dataset (URL: …), and the ADNI dataset (URL: …). We chose the T2 modality since T2 images have higher contrast and clearer vision than T1 and PET modalities.
The abnormal brain MR images in the dataset cover the following diseases: glioma, meningioma, Alzheimer’s disease, Alzheimer’s disease plus visual agnosia, Pick’s disease, sarcoma, and Huntington’s disease. Samples of each disease are illustrated in Fig. 8.
Fig. 8 Samples of brain MRIs: (a) normal brain; (b) glioma; (c) meningioma; (d) Alzheimer’s disease; (e) Alzheimer’s disease with visual agnosia; (f) Pick’s disease; (g) sarcoma; (h) Huntington’s disease.
We randomly selected 20 images for each type of brain. Since there is one type of normal brain and there are seven types of abnormal brain in the dataset, 160 images were selected, consisting of 20 normal and 140 (= 7 types of diseases × 20 images/disease) abnormal brain images. Since 5-fold cross validation was used, the setting of the training and validation images is shown in Tab.2.
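Tab.2 Setting of Training and Validation Images (5-fold Stratified Cross Validation)
Set | Normal | Abnormal | Total
Total No. of images | 20 | 140 | 160
Training (each experiment) | 16 | 112 | 128
Validation (each experiment) | 4 | 28 | 32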
5.2 Feature extraction
Fig. 9 The procedures of 3-level 2D DWT: (a) normal brain MRI; (b) level-3 wavelet coefficients
The three levels of wavelet decomposition greatly reduce the input image size as shown in Fig. 9.
The top left corner of the wavelet coefficients image denotes the approximation coefficients of level-3,
whose size is only 32×32 = 1024.
5.3 Feature Reduction
Fig. 10 Variances against No. of principal components (x axis is log scale)
As stated above, the number of extracted features was reduced from 65536 (256×256) to 1024. However, this is still too large for calculation. Thus, PCA is used to reduce the dimensionality of the features further. The curve of the cumulative sum of variance versus the number of principal components is shown in Fig. 10.
The variances for 1 to 20 principal components are listed in Tab.3. It shows that only 19 principal components, which are only 1.86% (19/1024) of the original features, preserve 95.4% of the total variance.
Tab.3 Detailed data of PCA (No. of principal components versus variance (%))
5.4 Classification Accuracy
We tested four SVMs with different kernels (LIN, HPOL, IPOL, and GRB). With the linear (LIN) kernel, the KSVM reduces to the original linear SVM.
We ran hundreds of simulations to estimate the optimal kernel parameters, such as the order d in the HPOL and IPOL kernels and the scaling factor γ in the GRB kernel. The confusion matrices of our methods are listed in Tab.4. The element in the ith row and jth column represents the number of images belonging to class i that are assigned to class j by the supervised classification.
Tab.4 Confusion matrices of our DWT+PCA+KSVM method (kernels: LIN, HPOL, IPOL, and GRB)
Kernel | Target | Normal (O) | Abnormal (O)
LIN | Normal (T) | 17 | 3
LIN | Abnormal (T) | 5 | 135
HPOL | Normal (T) | 19 | 1
HPOL | Abnormal (T) | 4 | 136
IPOL | Normal (T) | 18 | 2
IPOL | Abnormal (T) | 1 | 139
GRB | Normal (T) | 20 | 0
GRB | Abnormal (T) | 1 | 139
(O denotes output, T denotes target)
The results show that the proposed DWT+PCA+KSVM method obtains excellent results on both training and validation images. The overall classification accuracy was (17+135)/160 = 95% for the LIN kernel, (19+136)/160 = 96.88% for the HPOL kernel, (18+139)/160 = 98.12% for the IPOL kernel, and (20+139)/160 = 99.38% for the GRB kernel. The GRB kernel SVM therefore outperformed the other three kernel SVMs.
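The overall accuracies above follow directly from the correctly classified counts. A small Python check, using only the counts stated in the text and the class sizes of 20 normal and 140 abnormal images, recomputes them together with the number of misclassified images per class; the dictionary layout is just an illustrative choice.

```python
N_NORMAL, N_ABNORMAL = 20, 140

# Correctly classified (normal, abnormal) counts reported for each kernel.
correct = {
    "LIN": (17, 135),
    "HPOL": (19, 136),
    "IPOL": (18, 139),
    "GRB": (20, 139),
}

accuracy = {}
for kernel, (normal_ok, abnormal_ok) in correct.items():
    accuracy[kernel] = 100.0 * (normal_ok + abnormal_ok) / (N_NORMAL + N_ABNORMAL)
    missed_normal = N_NORMAL - normal_ok        # normal images labeled abnormal
    missed_abnormal = N_ABNORMAL - abnormal_ok  # abnormal images labeled normal
    print(f"{kernel}: accuracy = {accuracy[kernel]:.3f}%, "
          f"misclassified normal = {missed_normal}, abnormal = {missed_abnormal}")
```

The unrounded values for HPOL, IPOL, and GRB are 96.875%, 98.125%, and 99.375%, which the text reports to two decimal places.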
Moreover, we compared our method with six popular methods described in the recent literature (DWT+SOM [12], DWT+SVM with linear kernel [12], DWT+SVM with RBF-based kernel [12], DWT+PCA+ANN [41], DWT+PCA+kNN [41], and DWT+PCA+ACPSO+FNN [25]) that used the same MRI dataset and the same number of images. The comparison results are shown in Tab.5. They indicate that our proposed DWT+PCA+KSVM with the GRB kernel performed best among the 10 methods, achieving the best classification accuracy of 99.38%. Next is the DWT+PCA+ACPSO+FNN method [25] with 98.75% classification accuracy, and third is our proposed DWT+PCA+KSVM with the IPOL kernel with 98.12% classification accuracy.
Tab.5 Classification Accuracy comparison of 10 different algorithms for the same MRI dataset and
same number of images.
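Approach from literatures | Classification Accuracy (%)
DWT+SOM [12] | …
DWT+SVM with linear kernel [12] | …
DWT+SVM with RBF based kernel [12] | …
DWT+PCA+ANN [41] | …
DWT+PCA+kNN [41] | …
DWT+PCA+ACPSO+FNN [25] | 98.75
Approach from this paper | Classification Accuracy (%)
DWT+PCA+KSVM (LIN) | 95.00
DWT+PCA+KSVM (HPOL) | 96.88
DWT+PCA+KSVM (IPOL) | 98.12
DWT+PCA+KSVM (GRB) | 99.38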
5.5 Time Analysis
Computation time is another important factor in evaluating a classifier. The SVM training time was not considered, since training is performed once and the SVM parameters remain unchanged afterwards. We fed all 160 images into the classifier, recorded the corresponding computation times, computed the average values, and depicted the time consumed by the different stages in Fig. 11.
Fig. 11 Computation times at different stages
For each 256×256 image, the averaged computation times of feature extraction, feature reduction, and SVM classification are 0.023 s, 0.0187 s, and 0.0031 s, respectively. Feature extraction is the most time-consuming stage, and SVM classification costs the least time.
The total computation time for each 256×256 image is about 0.0448 s, which is rapid enough for real-time diagnosis.
6 Conclusions and Discussions
In this study we have developed a novel DWT+PCA+KSVM method to distinguish between normal and abnormal brain MRIs. We tested four different kernels: LIN, HPOL, IPOL, and GRB. The experiments demonstrate that the GRB kernel SVM obtained 99.38% classification accuracy on the 160 MR images, higher than the LIN, HPOL, and IPOL kernels and other popular methods in the recent literature.
Future work should focus on the following four aspects. First, the proposed SVM-based method could be employed for MR images with other contrast mechanisms, such as T1-weighted, proton density weighted, and diffusion weighted images. Second, the computation could be accelerated by using advanced wavelet transforms such as the lifting wavelet transform. Third, multi-class classification, which focuses on specific disorders studied using brain MRI, can also be explored. Fourth, novel kernels will be tested to increase the classification accuracy.
The DWT can efficiently extract information from original MR images with little loss. The advantage of the DWT over the Fourier transform is its spatial resolution; that is, the DWT captures both frequency and location information. In this study we chose the Haar wavelet, although there are other outstanding wavelets such as the Daubechies series. We will compare the performance of different wavelet families in future work. Another research direction lies in the stationary wavelet transform and the wavelet packet transform.
The importance of PCA was demonstrated in Section 5.3. If we omitted PCA, we would face a huge search space (as shown in Fig. 10 and Tab.3, PCA reduced the 1024-dimensional search space to a 19-dimensional one), which would cause a heavy computational burden and worse classification accuracy. There are other excellent feature transformation methods, such as independent component analysis (ICA) and manifold learning; in the future, we will investigate the performance of these algorithms.
The proposed DWT+PCA+KSVM method with the GRB kernel outperforms the LIN, HPOL, and IPOL kernel SVMs. We attribute this to the exponential form of the GRB kernel, which can enlarge the distance between samples to an extent that HPOL cannot reach. Therefore, we will apply the GRB kernel to other industrial fields.
There are two different schools of classification. One is white-box classification, such as decision trees or rule-based models, from which readers can extract reasonable rules. For example, a typical decision tree can be interpreted as “If age is less than 15, turn to the left node; then, if gender is male, turn to the right node; and so on.” Therefore, white-box classifiers make sense to patients.
The other school is black-box classification. In this case the classifier is opaque, so the reader cannot extract reasonable rules, even though this kind of classifier may work better and achieve higher classification accuracy than white-box classifiers. From another point of view, this kind of classifier is really designed by “artificial intelligence” or “computer intelligence”: the computer constructs the classifier using its own intelligence, not human sense.
Our method belongs to the latter school. Our goal is to construct a universal classifier that does not take into account age, gender, brain structure, focus of disease, and the like [42], but focuses solely on classification accuracy and high robustness. This kind of classifier may need further improvement, since patients may need convincing and irrefutable proof before accepting the diagnosis of their diseases.
Wavelet transforms, PCA, and kernel SVMs have each been described in the literature. The most important contribution of this paper is a method that combines them into a powerful tool for distinguishing normal MR brains from abnormal ones. Meanwhile, we tested four kernels and found the GRB kernel to be the most successful. This technique of brain MRI classification based on PCA and KSVM is a potentially valuable tool for computer-assisted clinical diagnosis.
[1] Zhang, Y., L. Wu, and S. Wang, "Magnetic Resonance Brain Image Classification by an Improved Artificial Bee Colony Algorithm," Progress In Electromagnetics Research, Vol. 116, pp. 65-79, 2011.
[2] Mohsin, S. A., N. M. Sheikh, and U. Saeed, "MRI Induced Heating of Deep Brain Stimulation Leads: Effect of the Air-Tissue Interface," Progress In Electromagnetics Research, Vol. 83, pp. 81-91, 2008.
[3] Golestanirad, L., et al., "Effect of Realistic Modeling of Deep Brain Stimulation on the Prediction of Volume of Activated Tissue," Progress In Electromagnetics Research, Vol. 126, pp. 1-16, 2012.
[4] Mohsin, S. A., "Concentration of the Specific Absorption Rate Around Deep Brain Stimulation Electrodes During MRI," Progress In Electromagnetics Research, Vol. 121, pp. 469-484.
[5] Oikonomou, A., I. S. Karanasiou, and N. K. Uzunoglu, "Phased-Array Near Field Radiometry for Brain Intracranial Applications," Progress In Electromagnetics Research, Vol. 109, pp. 345-360, 2010.
[6] Scapaticci, R., et al., "A Feasibility Study on Microwave Imaging for Brain Stroke Monitoring," Progress In Electromagnetics Research B, Vol. 40, pp. 305-324, 2012.
[7] Asimakis, N. P., et al., "Theoretical Analysis of a Passive Acoustic Brain Monitoring System," Progress In Electromagnetics Research B, Vol. 23, pp. 165-180, 2010.
[8] Chaturvedi, C. M., et al., "2.45 GHz (CW) Microwave Irradiation Alters Circadian Organization, Spatial Memory, DNA Structure in the Brain Cells and Blood Cell Counts of Male Mice, Mus musculus," Progress In Electromagnetics Research B, Vol. 29, pp. 23-42.
[9] Emin Tagluk, M., M. Akin, and N. Sezgin, "Classification of sleep apnea by using wavelet transform and artificial neural networks," Expert Systems with Applications, Vol. 37, No. 2, pp. 1600-1607, 2010.
[10] Zhang, Y., L. Wu, and G. Wei, "A New Classifier for Polarimetric SAR Images," Progress In Electromagnetics Research, Vol. 94, pp. 83-104, 2009.
[11] Camacho, J., J. Picó, and A. Ferrer, "Corrigendum to "The best approaches in the on-line
monitoring of batch processes based on PCA: Does the modelling structure matter?" [Anal.
Chim. Acta Volume 642 (2009) 59-68]," Analytica Chimica Acta, Vol. 658, No. 1, pp. 106-106,
[12] Chaplot, S., L. M. Patnaik, and N. R. Jagannathan, "Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network," Biomedical Signal Processing and Control, Vol. 1, No. 1, pp. 86-92, 2006.
[13] Cocosco, C. A., A. P. Zijdenbos, and A. C. Evans, "A fully automatic and robust brain MRI tissue classification method," Medical Image Analysis, Vol. 7, No. 4, pp. 513-527, 2003.
[14] Zhang, Y. and L. Wu, "Weights optimization of neural network via improved BCO approach," Progress in Electromagnetics Research, Vol. 83, pp. 185-198, 2008.
[15] Yeh, J.-Y. and J. C. Fu, "A hierarchical genetic algorithm for segmentation of multi-spectral human-brain MRI," Expert Systems with Applications, Vol. 34, No. 2, pp. 1285-1295, 2008.
[16] Patil, N. S., et al., "Regression Models Using Pattern Search Assisted Least Square Support Vector Machines," Chemical Engineering Research and Design, Vol. 83, No. 8, pp. 1030-1037, 2005.
[17] Wang, F.-F. and Y.-R. Zhang, "The Support Vector Machine for Dielectric Target Detection Through a Wall," Progress in Electromagnetics Research Letters, Vol. 23, pp. 119-128, 2011.
[18] Xu, Y., et al., "An Support Vector Regression Based Nonlinear Modeling Method for SiC MESFET," Progress in Electromagnetics Research Letters, Vol. 2, pp. 103-114, 2008.
[19] Li, D., W. Yang, and S. Wang, "Classification of foreign fibers in cotton lint using machine vision and multi-class support vector machine," Computers and Electronics in Agriculture, Vol. 74, No. 2, pp. 274-279, 2010.
[20] Gomes, T. A. F., et al., "Combining meta-learning and search techniques to select parameters for support vector machines," Neurocomputing, Vol. 75, No. 1, pp. 3-13, 2012.
[21] Hable, R., "Asymptotic normality of support vector machine variants and other regularized kernel methods," Journal of Multivariate Analysis, Vol. 106, pp. 92-117, 2012.
[22] Ghosh, A., B. Uma Shankar, and S. K. Meher, "A novel approach to neuro-fuzzy classification," Neural Networks, Vol. 22, No. 1, pp. 100-109, 2009.
[23] Durak, L., "Shift-invariance of short-time Fourier transform in fractional Fourier domains," Journal of the Franklin Institute, Vol. 346, No. 2, pp. 136-146, 2009.
[24] Zhang, Y. and L. Wu, "Crop Classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, Vol. 11, No. 5, pp. 4721-4743, 2011.
[25] Zhang, Y., S. Wang, and L. Wu, "A Novel Method for Magnetic Resonance Brain Image Classification based on Adaptive Chaotic PSO," Progress in Electromagnetics Research, Vol. 109, pp. 325-343, 2010.
[26] Ala, G., E. Francomano, and F. Viola, "A Wavelet Operator on the Interval in Solving Maxwell's Equations," Progress in Electromagnetics Research Letters, Vol. 27, pp. 133-140, 2011.
[27] Iqbal, A. and V. Jeoti, "A Novel Wavelet-Galerkin Method for Modeling Radio Wave Propagation in Tropospheric Ducts," Progress in Electromagnetics Research B, Vol. 36, pp. 35-52, 2012.
[28] Messina, A., "Refinements of damage detection methods based on wavelet analysis of dynamical shapes," International Journal of Solids and Structures, Vol. 45, No. 14-15, pp. 4068-4097, 2008.
[29] Martiskainen, P., et al., "Cow behaviour pattern recognition using a three-dimensional accelerometer and support vector machines," Applied Animal Behaviour Science, Vol. 119, No. 1-2, pp. 32-38, 2009.
[30] Bermejo, S., B. Monegal, and J. Cabestany, "Fish age categorization from otolith images using multi-class support vector machines," Fisheries Research, Vol. 84, No. 2, pp. 247-253, 2007.
[31] Muniz, A. M. S., et al., "Comparison among probabilistic neural network, support vector machine and logistic regression for evaluating the effect of subthalamic stimulation in Parkinson disease on ground reaction force during gait," Journal of Biomechanics, Vol. 43, No. 4, pp. 720-726, 2010.
[32] Bishop, C. M., Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., 2006.
[33] Vapnik, V., The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., 1995.
[34] Jeyakumar, V., J. H. Wang, and G. Li, "Lagrange multiplier characterizations of robust best approximations under constraint data uncertainty," Journal of Mathematical Analysis and Applications, Vol. 393, No. 1, pp. 285-297, 2012.
[35] Cucker, F. and S. Smale, "On the mathematical foundations of learning," Bulletin of the American Mathematical Society, Vol. 39, pp. 1-49, 2002.
[36] Poggio, T. and S. Smale, "The Mathematics of Learning: Dealing with Data," Notices of the American Mathematical Society (AMS), Vol. 50, No. 5, pp. 537-544, 2003.
[37] Acevedo-Rodríguez, J., et al., "Computational load reduction in decision functions using support vector machines," Signal Processing, Vol. 89, No. 10, pp. 2066-2071, 2009.
[38] Deris, A. M., A. M. Zain, and R. Sallehuddin, "Overview of Support Vector Machine in Modeling Machining Performances," Procedia Engineering, Vol. 24, pp. 308-312, 2011.
[39] May, R. J., H. R. Maier, and G. C. Dandy, "Data splitting for artificial neural networks using SOM-based stratified sampling," Neural Networks, Vol. 23, No. 2, pp. 283-294, 2010.
[40] Armand, S., et al., "Linking clinical measurements and kinematic gait patterns of toe-walking using fuzzy decision trees," Gait & Posture, Vol. 25, No. 3, pp. 475-484, 2007.
[41] El-Dahshan, E.-S. A., T. Hosny, and A.-B. M. Salem, "Hybrid intelligent techniques for MRI brain images classification," Digital Signal Processing, Vol. 20, No. 2, pp. 433-441, 2010.
[42] Evans, A. C., et al., "Brain templates and atlases," NeuroImage, Vol. 62, No. 2, pp. 911-922, 2012.