Conference Paper

Asymmetric-margin support vector machines for lung tissue classification.

DOI: 10.1109/IJCNN.2010.5596346 Conference: International Joint Conference on Neural Networks, IJCNN 2010, Barcelona, Spain, 18-23 July, 2010
Source: DBLP

ABSTRACT This paper concerns lung tissue classification using an asymmetric-margin support vector machine (ASVM) to handle the imbalance between the positive and negative classes in a one-against-all multiclass classification problem. The hyperparameters of the algorithm are obtained by optimizing an upper bound on the leave-one-out error of the ASVM. The ASVM is applied both to the dataset with its original class distribution and to an oversampled version in which the ratio of examples matches the prevalence of patients having the tissue in the database. The two versions of the ASVM model were compared with a model built with a conventional SVM. The ASVM improved on the results obtained with a conventional SVM, and incorporating prior knowledge of patient prevalence further improved the ASVM's results.
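The asymmetric-margin idea, penalizing errors on the rare class more heavily than errors on the majority class, can be sketched with per-class misclassification costs. This is a minimal illustration using synthetic data and scikit-learn's `class_weight` mechanism, not the authors' implementation or their leave-one-out hyperparameter selection; the class ratio and weight value are assumptions:

```python
# Sketch (not the paper's code): emulating an asymmetric-margin SVM for an
# imbalanced one-against-all subproblem by weighting the minority class.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced binary problem: 20 positive vs. 200 negative examples.
X_pos = rng.normal(loc=2.0, size=(20, 2))
X_neg = rng.normal(loc=0.0, size=(200, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 200)

# class_weight scales C per class, which shifts the margin asymmetrically:
# errors on the rare positive class cost 10x more than negative-class errors.
asym = SVC(kernel="rbf", C=1.0, class_weight={1: 10.0, 0: 1.0}).fit(X, y)
conv = SVC(kernel="rbf", C=1.0).fit(X, y)

# On points drawn from the positive region, the weighted model typically
# recovers at least as many positives as the conventional SVM.
X_test = rng.normal(loc=2.0, size=(50, 2))
print(asym.predict(X_test).sum(), conv.predict(X_test).sum())
```

Weighting the rare class plays the same role as an asymmetric margin: it shifts the decision boundary toward the majority class so that fewer rare-class examples are missed.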

    ABSTRACT: This paper extends the utility of support vector machines, a recent innovation in AI being adopted for cancer diagnostics, by analytically modelling the impact of imperfect class labelling of training data. It uses ROC computations to study the SVM's ability to classify new examples correctly even when the training data contain misclassified examples. It uses design of experiments (DOE) to reveal that misclassifications present in training data affect training quality, and hence performance, albeit not as strongly as the SVM's primary design parameters. Still, our results strongly support striving to develop the best-trained SVM when it is intended, for instance, for medical diagnostics, because misclassified training data shrink the decision-boundary distance and increase generalization error. Further, this study affirms that, to be effective, the SVM design optimization objective should incorporate the real-life costs or consequences of classifying wrongly.
    NMIMS Management Review. 01/2013; XXIII(Oct - Nov):67 - 90.
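The effect of imperfect class labelling on ROC performance described in the abstract can be sketched as follows; the dataset, noise rate, and classifier settings are illustrative assumptions, not the paper's experimental setup:

```python
# Sketch: measure how randomly flipped training labels degrade an SVM's ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
aucs = {}
for noise in (0.0, 0.2):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise   # flip this fraction of labels
    y_noisy[flip] = 1 - y_noisy[flip]
    clf = SVC(kernel="rbf").fit(X_tr, y_noisy)
    # decision_function scores suffice for ROC; calibrated probabilities
    # are not required.
    aucs[noise] = roc_auc_score(y_te, clf.decision_function(X_te))
print(aucs)
```

Comparing the AUC at 0% and 20% label noise gives a direct, threshold-free view of how label corruption erodes the classifier's ranking quality.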
    ABSTRACT: This paper extends the utility of asymmetric soft-margin support vector machines by analytically modeling imperfect class labeling in the training data. It uses ROC computations to first establish the strong relationship between the SVM's performance and its ability to classify examples correctly, even in the presence of misclassified training examples. It uses design of experiments (DOE) to reveal that misclassification also affects training quality, and hence performance, though not as strongly. Still, our results strongly support striving to develop the best-trained SVM when it is intended, for instance, for medical diagnostics, as misclassifications shrink the decision-boundary distance and increase generalization error.
    I. TWO-CLASS CLASSIFICATION AND THE SVM. Clinical diagnostics has always depended on the clinician's ability to diagnose pathologies by observing the symptoms exhibited by the patient and then classifying his or her condition. Correct diagnosis can make the difference between life and death through correct and timely intervention, be it for hypertension, diabetes, or the various types of malignancy. Similar situations arise where the precise link between cause and effect is not yet established and one must process a certain amount of data to draw inferences that guide decisions. For hypertension, for example, attempts have recently been made to probe the situation beyond the measurement of systolic/diastolic blood pressure; such studies attempt to predict the occurrence of hypertension from observations of age, sex, family history, smoking habits, lipoprotein, triglyceride, uric acid, total cholesterol, body mass index, and so on. In many such situations the treatment given is based on a binary classification: the ailment is present, or it is not [1].
Classification is challenging not only with respect to acquiring the relevant data through tests on factors known to be associated with the pathology, but also with respect to the data analytics adopted to produce reliable and correct predictions. The present paper examines one such data-analysis technique, now about 20 years in use, known as the support vector machine (SVM), which helps one develop classification models based on statistical principles of learning. Like artificial neural networks, an SVM is data driven: it is trained on a dataset of examples with known class labels, and then used to predict the class of new examples. How well an SVM works is measured by the accuracy with which it predicts the class of unseen examples (examples not included in training). The present work moves beyond the simple notion of "accuracy", the conventional classifier performance measure, by analytically modeling correct and incorrect labeling of instances in the training sample; this has not been done before. We focus specifically on the effect of imperfect labeling of input data on the ROC. The support vector machine is an algorithmic approach to the classification of instances (for example, patients who may or may not have diabetes) proposed by Vapnik and his colleagues [2], and it falls within the broader context of supervised learning in artificial intelligence (AI). It begins with a set of data comprising feature vectors of instances {x_i} and a class tag or label {y_i} attached to each instance. The most common application of an SVM trains a model that learns from those instances and estimates the model parameters; that model is then used to predict, with a high degree of correctness, the class label y of an instance for which only feature values are available.
By an elaborate optimization procedure, an SVM is designed to display minimum classification error on unseen instances, an attribute measured by its "generalization error." Binary classification is its most common use.
    POMS 2014; 01/2014
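The supervised-learning workflow described above (train an SVM on labeled feature vectors {x_i, y_i}, then predict the class of unseen examples and estimate generalization error on held-out data) can be sketched as follows; the dataset, split, and kernel settings are illustrative assumptions:

```python
# Minimal sketch of the train/predict workflow for a binary diagnostic task.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)     # binary diagnostic labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standardize features, then fit an RBF-kernel SVM on the labeled instances.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)

# Accuracy on examples not seen in training estimates generalization error.
acc = accuracy_score(y_te, model.predict(X_te))
print(f"held-out accuracy: {acc:.3f}")
```

The held-out accuracy is exactly the "ability to predict the class of unseen examples" the text refers to; one minus this accuracy is an empirical estimate of the generalization error.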
