# SVM results - worse when standardizing data than without

I have been testing an SVM binary classifier on an EEG data set. The feature vectors have 240 components, which are already scaled to the interval [0, 1]. With them I obtain an AUC of approximately 80% (with a reasonable ROC curve).

When I standardize the data (mean 0, variance 1), the performance drops to an AUC of approximately 50% (with a ROC curve close to random classification). I find this effect of standardization very strange, since most texts advise standardizing the data to improve performance.
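As a sanity check, the comparison can be reproduced with scikit-learn. This is only a sketch on synthetic stand-in data (not the actual EEG features); one detail worth checking in the real experiment is that the scaler is fitted on the training folds only, since fitting it on the full set before cross-validation leaks information:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the EEG features: 240 components scaled to [0, 1].
X, y = make_classification(n_samples=400, n_features=240, n_informative=20,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)

svm = SVC(kernel="rbf", gamma="scale")

# AUC with the raw [0, 1] features.
auc_raw = cross_val_score(svm, X, y, cv=5, scoring="roc_auc").mean()

# AUC with standardization done inside the CV loop (fit on training folds only).
auc_std = cross_val_score(make_pipeline(StandardScaler(), svm),
                          X, y, cv=5, scoring="roc_auc").mean()

print(f"AUC, raw [0,1] features: {auc_raw:.3f}")
print(f"AUC, standardized:       {auc_std:.3f}")
```

If the gap between the two numbers persists even with the scaler inside the pipeline, the cause is more likely in the features themselves (see the answers below about near-constant components).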

Any clues? Has anybody observed similar effects?

Thanks for your help.

## All Answers (16)

**Matthias Scheller Lichtenauer · X-rite, Regensdorf**

See for instance paragraph 3.2 of:

http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

And why do you use AUC as a criterion to evaluate your classifier? AUC is used in signal detection theory to assess observers with different criteria, but in SVM classification, correct prediction on held-out data (accuracy) would be, IMHO, more appropriate, since an SVM does not change its criterion, unless you use an online learning algorithm that changes over time.

**Aureli Soria-Frisch · Starlab Barcelona SL**

Accuracy alone won't do, since you are only looking at one type of error (but there are two). I cannot follow your statement: "in signal detection theory to assess observers with different criteria".

**Matthias Scheller Lichtenauer · X-rite, Regensdorf**

Concerning criteria: consider a situation in which medical doctors look at X-rays and mammograms and have to decide whether the respective patient has cancer or not (I know reality is a bit more complicated). If the observer assumes that the therapy is easy and non-invasive, she may be tempted to decide "there is a cancer" in case of doubt, while a person assuming the cure might be as bad as the illness will be more conservative. This attitude is called the criterion. With each criterion, a different point in false-positive-rate/true-positive-rate space is reached. TSD and AUC are used to find an average performance of all these test persons, and respectively to separate sensitivity from criterion.

(see for example Swets, 1973, The ROC in Psychology, in:Science)

If I understand your post correctly, you have a set of EEGs and want to classify them into two classes. So I do not see how an SVM would yield more than just one point of your ROC curve. And this point can usually be summarized by accuracy: correct predictions (true positives plus true negatives) divided by the total number of samples.

**Aureli Soria-Frisch · Starlab Barcelona SL**

You perfectly describe the problem of taking only accuracy into account: if you have a classifier that always predicts the positive class and the data are dominated by positive examples, you will get a high accuracy while producing a 100% false-positive rate, which accuracy alone does not reveal.
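The point above can be illustrated in a few lines (the 90/10 class split is hypothetical, just to make the effect visible):

```python
import numpy as np

# Imbalanced labels: 90% positive, 10% negative (hypothetical proportions).
y_true = np.array([1] * 90 + [0] * 10)
y_pred = np.ones_like(y_true)  # degenerate classifier: always says "positive"

accuracy = (y_pred == y_true).mean()
# Fraction of true negatives that were called positive.
false_positive_rate = (y_pred[y_true == 0] == 1).mean()

print(accuracy)             # 0.9 -- looks good in isolation
print(false_positive_rate)  # 1.0 -- every negative is misclassified
```

This is why a single accuracy number can hide a useless classifier, while the ROC point (TPR, FPR) exposes it.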

**Eric David Moyer · Wright State University**

One possibility is that some of your features are almost constant except for a small noise-driven variation. This noise would then be greatly amplified by the standardization.

**Israel Vaughn · The University of Arizona**

AUC is probably the best general metric to assess performance, because theoretically the ideal observer (the log-likelihood ratio) maximizes the AUC, and no other observer, including an SVM, can achieve a higher AUC. The log-likelihood ratio, however, assumes that the probability distribution functions are known; if they are not, it cannot be computed.

AUC does not take specific requirements into account. For example, if you are mainly interested in a reasonable probability of detection (true-positive rate) at very low false-positive rates, AUC essentially averages over the entire curve and may not weight the low-false-positive-rate region heavily enough. In that case, you would need a different performance metric.
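One concrete option for that situation is a partial AUC restricted to the low-FPR region, which scikit-learn exposes via the `max_fpr` parameter of `roc_auc_score` (the scores below are synthetic, just for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
# Hypothetical classifier scores: informative but noisy.
scores = y_true + rng.normal(scale=1.0, size=500)

full_auc = roc_auc_score(y_true, scores)
# Partial AUC over FPR <= 0.1 only (standardized to [0.5, 1] by sklearn).
partial_auc = roc_auc_score(y_true, scores, max_fpr=0.1)

print(f"full AUC:               {full_auc:.3f}")
print(f"partial AUC (FPR<=0.1): {partial_auc:.3f}")
```

Two classifiers with similar full AUC can differ substantially on the partial AUC, which is the relevant comparison when false positives are expensive.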

**Peshawa Jammal Muhammad Ali · Koya University**

Read the link above.
