
Research Article Open Access

International Journal of

Biomedical Data Mining

ISSN: 2090-4924


Mingle, Biomedical Data Mining 2015, 4:1

http://dx.doi.org/10.4172/2090-4924.1000114

Volume 4 • Issue 1 • 1000114

Biomedical Data Mining

ISSN: 2090-4924 JBDM, an open access journal

A Discriminative Feature Space for Detecting and Recognizing Pathologies

of the Vertebral Column

Damian Mingle*

WPC Healthcare, 1802 Williamson Court I, Brentwood, USA

*Corresponding author: Damian Mingle, Chief Data Scientist, WPC

Healthcare, 1802 Williamson Court I, Brentwood, USA, Tel: 615-364-9660;

E-mail: dmingle@wpchealthcare.com

Received June 30, 2015; Accepted August 19, 2015; Published September 15,

2015

Citation: Mingle D (2015) A Discriminative Feature Space for Detecting and

Recognizing Pathologies of the Vertebral Column. Biomedical Data Mining 4: 114.

doi:10.4172/2090-4924.1000114

Copyright: © 2015 Mingle D. This is an open-access article distributed under the

terms of the Creative Commons Attribution License, which permits unrestricted

use, distribution, and reproduction in any medium, provided the original author and

source are credited.

Abstract

Each year it becomes more difficult for healthcare providers to determine whether a patient has a pathology related to the vertebral column. Automated systems offer great potential to improve the efficiency and quality of care provided to patients. In many cases, however, automated systems allow misclassifications and force providers to review more cases than necessary. In this study, we analyzed methods to increase True Positives and lower False Positives, comparing them against state-of-the-art techniques in the biomedical community. We found that by applying the studied techniques of a data-driven model, the benefits to healthcare providers are significant and align with the methodologies and techniques utilized in the current research community.

Keywords: Vertebral column; Feature engineering; Probabilistic

modeling; Pattern recognition

Introduction

Over the years there has been an increase in machine learning (ML) techniques, such as Random Forest (RF), Boosting (ADA), Logistic Regression (GLM), Decision Trees (RPART), Support Vector Machines (SVM), and Artificial Neural Networks (ANN), applied to many medical fields. A significant reason is the limited capacity of human beings to act as diagnostic tools over time: stress, fatigue, inefficiencies, and lack of knowledge all become barriers to high-quality outcomes. There have been studies on applications of data mining in different fields, namely biochemistry, genetics, oncology, neurology, and EEG analysis. However, the literature suggests that there are few comparisons of machine learning algorithms and techniques in medical and biological areas. Of these ML algorithms, the most common approach to developing nonparametric and nonlinear classifications is based on ANNs.

In general, the numerous machine learning methods that have been applied can be grouped into two sets: knowledge-driven models and data-driven models. The parameters of knowledge-driven models are estimated from expert knowledge of detecting and recognizing pathologies of the vertebral column. The parameters of data-driven models, on the other hand, are estimated from quantitative measures of association between evidential features within the data. The classification models most commonly used for pathologies of the vertebral column have been SVMs.

Studies have shown that ML algorithms are more accurate than statistical techniques, especially when the feature space is complex or the input datasets are expected to have different statistical distributions [1]. These algorithms have the potential to identify and model the complex non-linear relationships between the features of the biomedical data set collected by Dr. da Mota, namely: pelvic incidence (PI), pelvic tilt (PT), lumbar lordosis angle (LLA), sacral slope (SS), pelvic radius (PR), and grade of spondylolisthesis (GOS).

These methods can handle a large number of evidential features that may be important in detecting abnormalities of the vertebral column. However, increasing the number of input features may lead to increased complexity and larger numbers of model parameters; in turn, the model becomes susceptible to overfitting due to the curse of dimensionality.

This work aims to present medical decision support for healthcare providers who are working to diagnose pathologies of the vertebral column. The framework comprises three subsystems: feature engineering, feature selection, and model selection.

Pathologies of the vertebral column

Vertebrae, intervertebral discs, nerves, muscles, the medulla, and joints make up the vertebral column. The essential functions of the vertebral column are: (i) supporting the human body; (ii) protecting the nerve roots and spinal medulla; and (iii) making the body's movement possible [2].

The structure of the intervertebral disc can be injured by a single trauma or by several small traumas to the column. Various pathologies, such as disc hernias and spondylolisthesis, can cause intense pain. Backaches can be the result of complications arising within this complex system. We briefly characterize the biomechanical attributes that represent each patient in the data set.

Patient characteristics: Dr. Henrique da Mota collected data on 310 patients from sagittal panoramic radiographies of the spine while at the Centre Medico-Chirurgical de Readaptation des Massues in Lyon, France [3]. 100 patients were volunteers with no pathology in their spines (labeled 'Normal'). The remaining patients had disc hernia (60 patients) or spondylolisthesis (150 patients).

Decision support for orthopedists is automated using ML algorithms and techniques on real clinical cases described by the above biomechanical attributes. In the following, we compare the many ML models evaluated in this study.

Problem statement and standard solutions

Classification refers to the problem of categorizing observations


into classes. Predictive modeling uses samples of data for which the class is known to generate a model for classifying new observations. We are interested in only two possible outcomes: 'Normal' and 'Abnormal'. Complex datasets make it difficult not to misclassify some observations. However, our goal was to minimize those errors using the receiver operating characteristic (ROC) curve.

The literature suggests using an ordinal data approach for detecting reject regions in combination with SVM, and selecting the misclassification costs as follows: assign a low cost Clow when classifying a case as reject, and a high cost Chigh when misclassifying. Therefore, wr = Clow/Chigh is the cost of rejecting (normalized by the cost of erring). The method accounts for both the rejection rate and the misclassification rate [2].
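The cost formulation above can be sketched as a simple threshold rule; this is an illustrative reading (a Chow-style rule under the stated cost ratio), not the exact reject-region construction of [2]:

```python
def classify_with_reject(p_abnormal, w_r=0.04):
    """Reject-option rule: erring costs 1 (Chigh, normalized), rejecting costs w_r.

    Committing to a class incurs expected cost 1 - max posterior;
    reject whenever that expected cost exceeds the rejection cost w_r.
    """
    p_max = max(p_abnormal, 1.0 - p_abnormal)
    if 1.0 - p_max > w_r:
        return "Reject"
    return "Abnormal" if p_abnormal >= 0.5 else "Normal"
```

With a small wr such as 0.04, only cases whose posterior is very close to 0.5 are rejected; raising wr makes rejection less attractive relative to guessing.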

Description of the data

It is useful to understand the basic features of the data in our study. Simple summaries of the sample and the measures, together with graphical analysis, form a solid basis for our quantitative analysis of the vertebral column dataset. We conducted univariate analysis, which identifies the distribution, central tendency, and dispersion of the data.

The distribution table includes the 1st and 3rd quartiles, the values below which 25% and 75% of the observations fall (Table 1).

Distributions: The distribution of the biomechanical features by class is shown in Figure 1.

Correlation: A correlation analysis provides insights into the independence of the numeric input variables. Modeling often assumes independence, and better models result when independent input variables are used. Table 2 lists the correlations between each pair of variables.

We made use of a hierarchical dendrogram to provide visual clues to the degree of closeness between variables [4]. The hierarchical correlation dendrogram presents a view of the variables of the dataset showing their relationships. The purpose is to efficiently locate groupings of variables that are highly correlated. The length of the lines in the dendrogram gives a visual indication of the degree of correlation; for example, shorter lines indicate more tightly correlated variables (Figure 2).
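A dendrogram of this kind can be sketched by converting the absolute correlation matrix into a distance and clustering hierarchically; the choice of 1 − |r| as the distance and average linkage are our assumptions, since the study does not state them:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def correlation_linkage(X):
    """Hierarchical clustering of variables by correlation.

    Tightly correlated variables get a small distance (1 - |r|),
    so they merge early and appear on short branches.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # condensed distance vector
    return linkage(condensed, method="average")  # pass to scipy dendrogram() to plot
```

Passing the returned linkage matrix to `scipy.cluster.hierarchy.dendrogram` with the variable names as labels produces a plot like Figure 2.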

The feature engineering and data replication method

We developed a method which we termed Feature Bayes. This method makes use of a probabilistic model together with synthetic data creation. Additionally, the data has been feature engineered and further refined through automated feature selection. In order to maximize prediction accuracy, we generated 54 additional features. We define a row vector A = [a1 a2 … a6] using the original six features of the vertebral column dataset, and N as the number of terms.

The features were constructed as follows:

'Trim mean 80%' calculates the mean taken after excluding a percentage of data points from the top and bottom tails of the vector,

Trim mean = (Σ aij over the retained elements) / N (1)

Information theory's 'Entropy' is the expected value of the information contained in each message received [5] and is generally constructed as

Entropy = −Σn=1..6 an log2 an (2)

'Range' is the span of variation between the upper and lower limits and is generally defined as

Amax − Amin (3)

We developed 'Standard Deviation of A' as a quantity calculated to indicate the extent of deviation of the group as a whole,

σ = √( Σ (X − x̄)² / n ) (4)
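The row-vector statistics above can be computed per patient; a minimal sketch, where the 10%-per-tail trim and the magnitude-based normalization used for the entropy term are our assumptions (the paper does not spell them out):

```python
import math


def row_features(a):
    """Statistical features of one patient row a = [a1..a6]."""
    n = len(a)
    s = sorted(a)
    k = int(n * 0.10)                      # drop 10% from each tail -> central 80%
    trimmed = s[k:n - k] if n - 2 * k > 0 else s
    trim_mean = sum(trimmed) / len(trimmed)
    # Shannon entropy over magnitudes normalized to a distribution
    total = sum(abs(x) for x in a)
    probs = [abs(x) / total for x in a if x != 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    value_range = max(a) - min(a)          # (3)
    mean = sum(a) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in a) / n)  # (4)
    return {"trim_mean": trim_mean, "entropy": entropy,
            "range": value_range, "std": std}
```

Note that for a six-element vector a 10% trim removes no elements (int(0.6) = 0), so the trimmed mean only differs from the plain mean for longer vectors.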

              Pelvic_Incidence  Pelvic_Tilt  Lumbar_Lordosis_Angle  Sacral_Slope  Pelvic_Radius  Degree_Spondylolisthesis
Minimum       26.15             -6.555       14                     13.37         70.08          -11.058
1st quartile  45.7              10.759       36.64                  33.11         110.66         1.474
Median        59.6              16.481       49.78                  42.65         118.15         10.432
Mean          60.96             17.916       52.28                  43.04         117.54         27.525
3rd quartile  74.01             21.936       63.31                  52.55         125.16         42.81
Maximum       129.83            49.432       125.74                 121.43        157.85         418.543

Class: Abnormal 145, Normal 72

Table 1: Descriptive statistics of sample data

Correlation summary using the Pearson correlation

                          pelvic_radius  pelvic_tilt  degree_spondylolisthesis  lumbar_lordosis_angle  sacral_slope  pelvic_incidence
pelvic_radius             1              0.01917945   -0.04701219               -0.04345604            -0.34769211   -0.25869222
pelvic_tilt               0.01917945     1            0.37008759                0.45104586             0.04615349    0.63071714
degree_spondylolisthesis  -0.04701219    0.37008759   1                         0.50847068             0.55060557    0.64788429
lumbar_lordosis_angle     -0.04345604    0.45104586   0.50847068                1                      0.53161132    0.68128788
sacral_slope              -0.34769211    0.04615349   0.55060557                0.53161132             1             0.80429566
pelvic_incidence          -0.25869222    0.63071714   0.64788429                0.68128788             0.80429566    1

*Note that only correlations between numeric variables are reported

Table 2: Pearson correlation matrix (Sample)


‘Cosine of A’ was generated to capture the trigonometric function

that is equal to the proportion of the adjacent side to an acute angle of

the hypotenuse,

cos A

(5)

‘Tangent of A’ was generated to capture the trigonometric

function equal to the proportion of the opposite side over the adjacent

side in a right triangle,

tan A

(6)

‘Sine of A’ was generated to capture the trigonometric function

that is equal to the relationship of the opposite side of a given angle to

the hypotenuse,

sin A

(7)

‘25th Percentile of A’ is the value of vector A such that 25% of the

relevant population is below that value,

25

25 *

100

th Percentile N

=

(8)

Figure 1: Distribution of biomechanical features by class.

Figure 2: Hierarchical dendrogram of vertebral column (Sample).


'20th Percentile of A' is the value of vector A such that 20% of the relevant population is below that value,

20th Percentile = (20/100) × N (9)

'75th Percentile of A' is the value of vector A such that 75% of the relevant population is below that value,

75th Percentile = (75/100) × N (10)

'80th Percentile of A' is the value of vector A such that 80% of the relevant population is below that value,

80th Percentile = (80/100) × N (11)

'Pelvic Incidence Squared' was used to change the pelvic incidence from a single dimension into an area; many physical quantities are integrals of some other quantity,

a1² (12)

For each element of the row vector A we performed a square root calculation, which yields a definite quantity when multiplied by itself,

√aij (13)

For each element of the row vector A we created a 'Natural Log of ai,j', a logarithm to the base e,

ln aij (14)

'Sum of pelvic incidence and pelvic tilt',

a1 + a2 (15)

For each element of the row vector A we created a 'Cubed' value of ai,j,

aij³ (16)

'Difference of pelvic incidence and pelvic tilt',

a1 − a2 (17)

'Product of pelvic incidence and pelvic tilt',

a1 × a2 (18)

'Sum of pelvic tilt and lumbar lordosis angle',

a2 + a3 (19)

'Sum of lumbar lordosis angle and sacral slope',

a3 + a4 (20)

'Sum of pelvic radius and degree spondylolisthesis',

a5 + a6 (21)

'Difference of pelvic tilt and lumbar lordosis angle',

a2 − a3 (22)

'Difference of lumbar lordosis angle and sacral slope',

a3 − a4 (23)

'Difference of sacral slope and pelvic radius',

a4 − a5 (24)

'Difference of pelvic radius and degree spondylolisthesis',

a5 − a6 (25)

'Quotient of pelvic tilt and pelvic incidence',

a2 / a1 (26)

'Quotient of lumbar lordosis angle and pelvic tilt',

a3 / a2 (27)

'Quotient of sacral slope and lumbar lordosis angle',

a4 / a3 (28)

'Quotient of pelvic radius and sacral slope',

a5 / a4 (29)

'Quotient of degree spondylolisthesis and pelvic radius',

a6 / a5 (30)

'Sum of elements of A',

Σn=1..6 an (31)

'Average of A elements',

x̄ = (Σn=1..6 an) / 6 (32)

'Median of A elements',

Median = [ (n/2)th term + (n/2 + 1)th term ] / 2 (33)

'Euler's number raised to the power of ai,j',

e^aij (34)
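Features (12) through (34) are simple element-wise and pairwise transforms; a compact sketch of a representative subset (the dictionary keys are illustrative names, not the paper's exact feature labels, and e^a can overflow float range for extreme GOS values such as the 418.5 maximum in Table 1):

```python
import math


def engineered_features(a):
    """Pairwise and element-wise transforms over a = [a1..a6]."""
    a1, a2, a3, a4, a5, a6 = a
    feats = {
        "sum_pi_pt": a1 + a2,     # (15)
        "diff_pi_pt": a1 - a2,    # (17)
        "prod_pi_pt": a1 * a2,    # (18)
        "quot_pt_pi": a2 / a1,    # (26)
        "sum_all": sum(a),        # (31)
        "mean_all": sum(a) / 6,   # (32)
    }
    for i, x in enumerate(a, start=1):
        feats[f"sq_{i}"] = x ** 2          # (12) for a1
        feats[f"cube_{i}"] = x ** 3        # (16)
        feats[f"cos_{i}"] = math.cos(x)    # (5)
        feats[f"exp_{i}"] = math.exp(x)    # (34); may overflow for very large inputs
        if x > 0:                          # sqrt/ln only defined for positive values
            feats[f"sqrt_{i}"] = math.sqrt(x)  # (13)
            feats[f"ln_{i}"] = math.log(x)     # (14)
    return feats
```

The guard on positive values matters in practice: pelvic tilt and degree of spondylolisthesis can be negative in this dataset (Table 1), so the square-root and logarithm features are only defined for a subset of patients.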

Patient data generated with oversampling

The category 'Normal' was significantly underrepresented in the dataset. We employed the Synthetic Minority Oversampling Technique (SMOTE) [6], applying it to the class value 'Normal' with five nearest neighbors to construct an additional 100 instances.

Algorithm SMOTE(T, N, k)

Input: Number of minority class samples T; amount of SMOTE N%; number of nearest neighbors k

Output: (N/100) * T synthetic minority class samples

1. (* If N is less than 100%, randomize the minority class samples, as only a random percent of them will be SMOTEd. *)
2. if N < 100
3.   then randomize the T minority class samples
4.   T = (N/100) * T
5.   N = 100
6. end if
7. N = (int)(N/100) (* The amount of SMOTE is assumed to be in integral multiples of 100. *)
8. k = number of nearest neighbors
9. numattrs = number of attributes
10. Sample[][]: array of original minority class samples
11. newindex: count of synthetic samples generated, initialized to 0
12. Synthetic[][]: array of synthetic samples
(* Compute k nearest neighbors for each minority class sample only. *)
13. for i ← 1 to T
14.   compute the k nearest neighbors of i, and save the indices in nnarray
15.   Populate(N, i, nnarray)
16. end for

Populate(N, i, nnarray) (* Function to generate the synthetic samples. *)
17. while N ≠ 0
18.   choose a random number between 1 and k, call it nn; this step chooses one of the k nearest neighbors of i
19.   for attr ← 1 to numattrs
20.     compute: dif = Sample[nnarray[nn]][attr] − Sample[i][attr]
21.     compute: gap = random number between 0 and 1
22.     Synthetic[newindex][attr] = Sample[i][attr] + gap * dif
23.   end for
24.   newindex++
25.   N = N − 1
26. end while
27. return (* End of Populate. *)

End of Pseudo-Code.
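The pseudocode translates to a short implementation; this sketch assumes N is an integral multiple of 100 (as in step 7) and uses squared Euclidean distance for the neighbour search:

```python
import random


def smote(samples, n_percent, k=5, rng=None):
    """Generate (n_percent/100) synthetic points per minority sample.

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = rng or random.Random(0)
    n = n_percent // 100
    synthetic = []
    for i, s in enumerate(samples):
        # k nearest neighbours of s among the other minority samples
        others = [x for j, x in enumerate(samples) if j != i]
        neighbours = sorted(
            others, key=lambda x: sum((a - b) ** 2 for a, b in zip(s, x))
        )[:k]
        for _ in range(n):
            nn = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(s, nn)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the convex hull of the original minority samples.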

Variance captured while increasing feature space

In an effort to reduce the dimensionality further, we opted to use principal components analysis (PCA), choosing enough eigenvectors to account for 0.95 of the variance of the sub-selected attributes [7]. We decided to standardize the data rather than merely center it, which allows the PCA to be computed from the correlation matrix rather than the covariance matrix. The maximum number of attributes to include through this transformation was 10. We then chose 0.95 for the proportion of variance covered, which allowed us to retain enough principal components to account for the appropriate proportion of variance. At the completion of this process we retained 288 components.
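Standardizing and then decomposing the correlation matrix can be sketched as follows; this illustrates the procedure (standardize, eigendecompose, keep components up to the variance target), not the exact tool chain used in the study:

```python
import numpy as np


def pca_correlation(X, var_target=0.95):
    """PCA of standardized data, i.e. eigendecomposition of the correlation
    matrix, keeping the fewest components covering var_target variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(X, rowvar=False)          # correlation, not covariance
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # sort by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, var_target)) + 1
    return Z @ eigvecs[:, :k], k                 # scores and component count
```

Standardizing first means every attribute contributes unit variance, so no single large-scale feature (such as degree of spondylolisthesis) dominates the leading components.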

Automated feature selection methods

We utilized a supervised method to select features: a correlation-based feature subset selection evaluator [7]. This method of evaluation assesses the value of a subset of features by analyzing the individual predictive ability of each feature along with the degree of redundancy between them. The preference is for subsets of features that are highly correlated with the class while having low inter-correlation. Furthermore, we required that the algorithm iteratively add the feature most correlated with the class, provided there was not an existing feature in the subset with a higher correlation to the feature being analyzed. We searched the space of feature subsets using greedy hill climbing augmented with backtracking, governed by a limit on consecutive non-improving nodes. We set the direction of the search by starting with the empty set of attributes and searching forward, and we specified five as the number of consecutive non-improving nodes to allow before terminating the search. This method selected 19 attributes from the 60 features. Of those 19 features, only PT and GOS are original data inputs, representing approximately 11%; the other 89% are feature engineered (Table 3).
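The subset evaluator can be sketched with the standard CFS merit score; this simplified version stops at the first non-improving addition rather than allowing five non-improving expansions, and it scores correlation with Pearson's r rather than Weka's symmetrical-uncertainty measure:

```python
import numpy as np


def cfs_forward(X, y):
    """Greedy forward search with a CFS-style merit:
    prefer features correlated with the class but not with each other."""
    n_feat = X.shape[1]
    r_cf = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    r_ff = np.abs(np.corrcoef(X, rowvar=False))

    def merit(subset):
        # merit = k*avg(feature-class corr) / sqrt(k + k(k-1)*avg(inter-corr))
        k = len(subset)
        avg_cf = r_cf[subset].mean()
        if k == 1:
            return avg_cf
        sub = r_ff[np.ix_(subset, subset)]
        avg_ff = (sub.sum() - k) / (k * (k - 1))   # off-diagonal average
        return k * avg_cf / np.sqrt(k + k * (k - 1) * avg_ff)

    selected, best = [], 0.0
    while len(selected) < n_feat:
        m, j = max(
            (merit(selected + [j]), j)
            for j in range(n_feat) if j not in selected
        )
        if m <= best:
            break
        best = m
        selected.append(j)
    return selected
```

The denominator penalizes adding a feature that is redundant with the current subset, which is what keeps highly inter-correlated engineered features from all being selected together.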

Evaluation and classifier

We used receiver operating characteristic (ROC) curves, which compare the false positive rate to the true positive rate. They let us assess the trade-off between the number of observations incorrectly classified as positive and the number of observations correctly classified as positive.

The 'Area Under the Curve' (AUC) summarizes this curve. Accuracy is the proportion of predictions that were correct,

Accuracy = (True Positive + True Negative) / (True Positive + False Negative + False Positive + True Negative)

The misclassification rate, or error rate, is defined as: Error rate = 1 − Accuracy

We use other metrics in conjunction with the error rate to help guide the evaluation process, namely Recall, Precision, False Positive Rate, True Positive Rate, False Negative Rate, and F-Measure [8].

Recall is the Sensitivity or True Positive Rate and gives the proportion of positive cases that are correctly identified,

Recall = True Positive / (True Positive + False Negative)

The False Positive Rate is the proportion of negative cases incorrectly classified as positive,

False Positive Rate = False Positive / (False Positive + True Negative)

The True Negative Rate or Specificity is the proportion of negative cases classified correctly,

Number of Folds (%)  Attribute
10                   80th Percentile of A
10                   Product of PI and PT
10                   Sum of PR and GOS
10                   PR Cubed
10                   e^PT
10                   e^PR
10                   e^GOS
30                   PT
30                   25th Percentile of A
60                   Quotient of PT and PI
70                   Square root of PT
90                   GOS
90                   Range of elements in A
100                  Standard Deviation of elements of A
100                  20th Percentile of A
100                  Sum of PR and GOS
100                  Difference of PR and GOS
100                  Quotient of PR and GOS
100                  GOS Cubed

Table 3: Evaluation mode: 10-fold cross-validation


True Negative Rate = True Negative / (False Positive + True Negative)

The False Negative Rate is the proportion of positive cases incorrectly classified as negative,

False Negative Rate = False Negative / (True Positive + False Negative)

Precision is the proportion of predicted positive cases that were classified correctly,

Precision = True Positive / (True Positive + False Positive)

F-Measure is computed using the harmonic mean and combines the information retrieval metrics precision and recall; the higher the F-Measure, the higher the classification quality,

F-Measure = 2 × (Precision × Recall) / (Precision + Recall)
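All of these metrics derive from the four confusion-matrix counts, so they can be computed in one small helper:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Evaluation metrics from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)            # sensitivity / true positive rate
    fp_rate = fp / (fp + tn)
    tn_rate = tn / (fp + tn)           # specificity
    fn_rate = fn / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy, "error_rate": 1 - accuracy,
        "recall": recall, "fp_rate": fp_rate, "tn_rate": tn_rate,
        "fn_rate": fn_rate, "precision": precision, "f_measure": f_measure,
    }
```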

We simplified the classification task by using a Naïve Bayes classifier, which assumes the attributes have independent distributions and thereby estimates

P(d | cj) = p(d1 | cj) × p(d2 | cj) × … × p(dn | cj)

Essentially this determines the probability of generating instance d given class cj. The Naïve Bayes classifier is often represented as a graph stating that each class causes certain features with a certain probability [9] (Figure 3).
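The factored likelihood above can be sketched with Gaussian per-feature densities; the Gaussian form is an illustrative choice, since the study does not fix the distribution of each p(di | cj):

```python
import math


def gaussian_nb_fit(X, y):
    """Per class: prior plus (mean, variance) of each feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        stats = []
        for j in range(len(rows[0])):
            vals = [r[j] for r in rows]
            mu = sum(vals) / n
            var = sum((v - mu) ** 2 for v in vals) / n + 1e-9  # avoid zero variance
            stats.append((mu, var))
        model[c] = (n / len(X), stats)
    return model


def gaussian_nb_predict(model, x):
    """argmax over classes of log prior + sum of per-feature log likelihoods."""
    def log_like(prior, stats):
        ll = math.log(prior)
        for xj, (mu, var) in zip(x, stats):
            ll += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        return ll
    return max(model, key=lambda c: log_like(*model[c]))
```

Working in log space turns the product of likelihoods into a sum, which avoids numerical underflow when many features are multiplied together.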

In order to emphasize the benefits of incorporating feature engineering, feature selection, and PCA, we referenced prior research using two standard learning models and the rejoSVM classifier [2]. All training and testing was uniformly applied as before.

Furthermore, we abandoned SVM as a base and instead chose to show the value of incorporating our methods within a simple Naïve Bayes algorithm [10-13]. Moreover, methods such as Feature Bayes may be used as a decision support tool for healthcare providers, particularly those with minimal resources or limited access to an ongoing professional peer network [14-16] (Tables 4 and 5).

Methods that produce high true positives and low false positives are ideal for medical settings; they allow healthcare providers to have a higher degree of confidence in the diagnoses provided to patients [17,18]. Given a small dataset, which is typical of biomedical datasets, Feature Bayes helps to maximize the predictive accuracy, which could benefit the medical expert in future patient evaluations [19,20] (Table 6).

Conclusion

The analysis of the vertebral column data allowed us to incorporate feature engineering, feature selection, and model evaluation techniques. With these methods, we were able to provide a more accurate way of classifying pathologies. The Feature Bayes method proved valuable, obtaining higher true positives and lower false positives than traditional or more current methods such as rejoSVM. This makes it useful as a biomedical screening tool to aid healthcare providers in their medical decisions. Further studies should be developed around the analysis of the Feature Bayes method; moreover, a comparison of ensemble learning techniques using Feature Bayes could prove beneficial.

References

1. Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and Markov blanket induction for causal discovery and feature selection for classification part I: Algorithms and empirical evaluation. The Journal of Machine Learning Research 11: 171-234.

2. da Rocha Neto AR, Sousa R, Barreto GDA, Cardoso JS (2011) Diagnostic

of pathology on the vertebral column with embedded reject option. Pattern

Recognition and Image Analysis 6669: 588-595.

3. Berthonnaud E, Dimnet J, Roussouly P, Labelle H (2005) Analysis of the

sagittal balance of the spine and pelvis using shape and orientation parameters.

Journal of spinal disorders & techniques 18: 40-47.

4. Aghagolzadeh M, Soltanian-Zadeh H, Araabi B, Aghagolzadeh A (2007) A

hierarchical clustering based on mutual information maximization. Image

Processing 1: I 277- I 280.

5. Nguyen XV, Chan J, Romano S, Bailey J (2014) Effective global approaches

for mutual information based feature selection. In Proceedings of the 20th ACM

SIGKDD international conference on Knowledge discovery and data mining,

ACM.

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.855    0.115    0.883      0.855   0.869      0.935     Abnormal
0.85     0.145    0.857      0.885   0.871      0.935     Normal
0.87     0.13     0.87       0.87    0.87       0.935     Weighted Avg.

Table 4: Detailed accuracy by class (40% Train).

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.894    0.029    0.977      0.894   0.933      0.985     Abnormal
0.971    0.106    0.872      0.971   0.919      0.985     Normal
0.927    0.062    0.932      0.927   0.927      0.985     Weighted Avg.

Table 5: Detailed accuracy by class (80% Train).

Training Size  Method                             Accuracy
40%            SVM (linear)                       85
               SVM (KMOD)                         83.9
               rejoSVM (wr=0.04)                  96.5
               Naïve Bayes (6-original data)      87.7
               Naïve Bayes (60-transformed data)  81.8
               Feature Bayes                      93.5
80%            SVM (linear)                       84.3
               SVM (KMOD)                         85.9
               rejoSVM (wr=0.04)                  96.9
               Naïve Bayes (6-original data)      81.5
               Naïve Bayes (60-transformed data)  77.2
               Feature Bayes                      98.5

Table 6: Comparison of the performance of different methods.

Figure 3: Naïve Bayes Classifier.

6. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16: 321-357.

7. Hall MA (1999) Correlation-based feature selection for machine learning. CiteSeerX.

8. Powers DMW (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. CiteSeerX.

9. Zhang H (2004) The optimality of naive Bayes. AA 1: 3.

10. Alba E, García-Nieto J, Jourdan L, Talbi EG (2007) Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms. Evolutionary Computation.

11. Bermingham ML, Pong-Wong R, Spiliopoulou A, Hayward C, Rudan I, et al. (2015) Application of high-dimensional feature selection: evaluation for genomic prediction in man. Scientific Reports 5: 10312.

12. Brown G, Pocock A, Zhao MJ, Luján M (2012) Conditional likelihood

maximisation: a unifying framework for information theoretic feature selection.

The Journal of Machine Learning Research 13: 27-66.

13. Hand DJ, Yu K (2001) Idiot's Bayes—not so stupid after all? International

statistical review 69: 385-398.

14. Jordan A (2001) On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems 14: 841-848.

15. López FG, Torres MG, Batista BM, Pérez JAM, Moreno-Vega JM (2006)

Solving feature subset selection problem by a parallel scatter search. European

Journal of Operational Research 169: 477-489.

16. Murty MN, Devi VS (2011) Pattern recognition: An algorithmic approach.

17. Neto ARR, Barreto GA (2009) On the application of ensembles of classifiers to the diagnosis of pathologies of the vertebral column: A comparative analysis. Latin America Transactions, IEEE (Revista IEEE America Latina) 7: 487-496.

18. Rennie JD, Shih L, Teevan J, Karger DR (2003) Tackling the poor assumptions of naive Bayes text classifiers. ICML 3: 616-623.

19. Rish I (2001) An empirical study of the naive Bayes classifier. IJCAI 2001 workshop on empirical methods in artificial intelligence, IBM, New York.

20. Yamada M, Jitkrittum W, Sigal L, Xing EP, Sugiyama M (2014) High dimensional

feature selection by feature-wise kernelized lasso. Neural computation 26: 185-

207.