Controlling the Sensitivity of Support Vector Machines

Proceedings of the International Joint Conference on Artificial Intelligence; 06/1999
Source: CiteSeer


For many applications it is important to accurately distinguish false negative results from false positives. This is particularly important for medical diagnosis where the correct balance between sensitivity and specificity plays an important role in evaluating the performance of a classifier. In this paper we discuss two schemes for adjusting the sensitivity and specificity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks.

1 Introduction

Since their introduction by Vapnik and coworkers [Vapnik, 1995; Cortes and Vapnik, 1995], Support Vector Machines (SVMs) have been successfully applied to a number of real world problems such as handwritten character and digit recognition [Scholkopf, 1997; Cortes, 1995; LeCun et al., 1995; Vapnik, 1995], face detection [Osuna et al., 1997] and speaker identification [Schmidt, 1996]. They e...
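
The sensitivity/specificity trade-off and the ROC description mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classifier that outputs a real-valued decision score and labels a case positive when the score reaches a threshold; the function names, the scores, and the labels below are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Label score >= threshold as positive; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    # Start above every score (nothing predicted positive), then lower the threshold.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        sensitivity, specificity = sensitivity_specificity(scores, labels, t)
        points.append((1.0 - specificity, sensitivity))
    return points


# Hypothetical decision scores and true labels (+1 = diseased, -1 = healthy).
scores = [2.1, 0.4, -0.3, -1.2, 0.9, -0.8]
labels = [1, 1, 1, -1, -1, -1]

for t in (0.0, -0.5):
    print(t, sensitivity_specificity(scores, labels, t))
print(roc_points(scores, labels))
```

Lowering the threshold can only keep or raise sensitivity and keep or lower specificity, which is why sweeping it traces out an ROC curve from (0, 0) to (1, 1).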

    • "Here, we apply SBIC on real datasets and compare its performance with Cost-sensitive Support Vector Machine (CSSVM) (Veropoulos et al., 1999) and SMOTE (Chawla et al., 2002). CSSVM is an SVM algorithm designed for imbalanced classification, where the formulation tries a more strict classification for the minority points by assigning a higher penalty to their mis-classification in the training period. "
    ABSTRACT: When the training data in a two-class classification problem is overwhelmed by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC) that learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the training data into account, SBIC utilizes the concept of absent data, i.e. data from the minority class which can help better find the boundary between the two classes. SBIC simultaneously optimizes the weights of the empirical similarity function and finds the locations of absent data points. As such, SBIC uses an embedded mechanism for synthetic data generation which does not modify the training dataset, but alters the algorithm to suit imbalanced datasets. Therefore, SBIC uses the ideas of both major schools of thoughts in imbalanced classification: Like cost-sensitive approaches SBIC operates on an algorithm level to handle imbalanced structures; and similar to synthetic data generation approaches, it utilizes the properties of unobserved data points from the minority class. The application of SBIC to imbalanced datasets suggests it is comparable to, and in some cases outperforms, other commonly used classification techniques for imbalanced datasets.
    • "The underrepresentation of honeycombing (HC) examples was identified as a potential limitation in the training dataset (only 42 of 1798 VOIs). In order to account for this problem, we compared several different imbalanced data learning approaches, including weighted SVM costs [20], the synthetic minority oversampling technique (SMOTE) [21], SMOTE with different costs [22], and granular SVM – repetitive undersampling [23]. Of these approaches, the SMOTE with different costs method demonstrated the best performance when evaluated on the training dataset, so this is the method that we adopted for all of our experiments. "
    ABSTRACT: Lack of classifier robustness is a barrier to widespread adoption of computer-aided diagnosis systems for computed tomography (CT). We propose a novel Robustness-Driven Feature Selection (RDFS) algorithm that preferentially selects features robust to variations in CT technical factors. We evaluated RDFS in CT classification of fibrotic interstitial lung disease using 3D texture features. CTs were collected for 99 adult subjects separated into three datasets: training, multi-reconstruction, testing. Two thoracic radiologists provided cubic volumes of interest corresponding to six classes: pulmonary fibrosis, ground-glass opacity, honeycombing, normal lung parenchyma, airway, vessel. The multi-reconstruction dataset consisted of CT raw sinogram data reconstructed by systematically varying slice thickness, reconstruction kernel, and tube current (using a synthetic reduced-tube-current algorithm). Two support vector machine classifiers were created, one using RDFS ("with-RDFS") and one not ("without-RDFS"). Classifier robustness was compared on the multi-reconstruction dataset, using Cohen's kappa to assess classification agreement against a reference reconstruction. Classifier performance was compared on the testing dataset using the extended g-mean (EGM) measure. With-RDFS exhibited superior robustness (kappa 0.899-0.989) compared to without-RDFS (kappa 0.827-0.968). Both classifiers demonstrated similar performance on the testing dataset (EGM 0.778 for with-RDFS; 0.785 for without-RDFS), indicating that RDFS does not compromise classifier performance when discarding nonrobust features. 
    RDFS is highly effective at improving classifier robustness against slice thickness, reconstruction kernel, and tube current without sacrificing performance, a result that has implications for multicenter clinical trials that rely on accurate and reproducible quantitative analysis of CT images collected under varied conditions across multiple sites, scanners, and timepoints.
    07/2015; DOI:10.1109/TMI.2015.2459064
    • "These experiment conditions were adopted in related studies [7], [8]. Furthermore, we equalize the number of musical pieces of each class per each subject by excluding some musical pieces randomly in order to prevent imbalance problem [22]–[24]. Thus, N_music, which is the total number of musical pieces used in our experiment, is a little different according to each user, and the detail about this is shown in TABLE III. Moreover, the task implemented by each subject in our experiment is shown in Fig. 1. "
    ABSTRACT: This paper presents a human-centered method for favorite music estimation using EEG-based audio features. In order to estimate user's favorite musical pieces, our method utilizes his/her EEG signals for calculating new audio features suitable for representing the user's music preference. Specifically, projection, which transforms original audio features into the features reflecting the preference, is calculated by applying kernel Canonical Correlation Analysis (CCA) to the audio features and the EEG features which are extracted from the user's EEG signals during listening to favorite musical pieces. By using the obtained projection, the new EEG-based audio features can be derived since this projection provides the best correlation between the user's EEG signals and their corresponding audio signals. Thus, successful estimation of user's favorite musical pieces via a Support Vector Machine (SVM) classifier using the new audio features becomes feasible. Since our method does not need acquisition of EEG signals for obtaining new audio features from new musical pieces after calculating the projection, this indicates the high practicability of our method. Experimental results show that our method outperforms methods using original audio features or EEG features.
    2015 IEEE International Conference on Digital Signal Processing, Singapore; 07/2015
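
The first citing excerpt above describes the cost-sensitive SVM as assigning a higher penalty to misclassified minority points during training. One common way to write that idea is a soft-margin objective with separate costs for the two classes. The sketch below evaluates such an objective for a fixed linear classifier; it is an illustration under that assumption, not the exact formulation of the cited paper, and the names `c_pos` and `c_neg` and the toy data are hypothetical.

```python
def cost_sensitive_objective(w, b, X, y, c_pos, c_neg):
    """0.5 * ||w||^2 plus class-weighted hinge losses.

    Slack on positive examples (label +1) is charged c_pos,
    slack on negative examples (label -1) is charged c_neg.
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slack = max(0.0, 1.0 - margin)  # zero when the point clears the margin
        penalty += (c_pos if yi == 1 else c_neg) * slack
    return regularizer + penalty


# Toy data: two positive and two negative points on a line.
X = [[0.5, 0.0], [2.0, 0.0], [-0.5, 0.0], [-3.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

print(cost_sensitive_objective(w, b, X, y, c_pos=1.0, c_neg=1.0))   # 1.5
print(cost_sensitive_objective(w, b, X, y, c_pos=10.0, c_neg=1.0))  # 6.0
```

With equal costs the two margin violations count the same. Raising `c_pos` makes the violation on the positive point ten times as expensive, so an optimizer minimizing this objective would be pushed toward a boundary that favors the positive class, trading specificity for sensitivity.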