Controlling the Sensitivity of Support Vector Machines

Proceedings of the International Joint Conference on Artificial Intelligence, 06/1999
Source: CiteSeer


For many applications it is important to distinguish accurately between false negative and false positive results. This is particularly important for medical diagnosis, where the correct balance between sensitivity and specificity plays an important role in evaluating the performance of a classifier. In this paper we discuss two schemes for adjusting the sensitivity and specificity of Support Vector Machines and describe their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks.

1 Introduction

Since their introduction by Vapnik and coworkers [Vapnik, 1995; Cortes and Vapnik, 1995], Support Vector Machines (SVMs) have been successfully applied to a number of real-world problems such as handwritten character and digit recognition [Scholkopf, 1997; Cortes, 1995; LeCun et al., 1995; Vapnik, 1995], face detection [Osuna et al., 1997] and speaker identification [Schmidt, 1996]. They e...
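The scheme of assigning different error penalties to the two classes can be sketched with a minimal linear SVM trained by subgradient descent on a class-weighted hinge loss. This is a simplified illustration, not the paper's dual QP formulation, and all function and parameter names here are assumptions:

```python
def train_weighted_svm(X, y, c_pos, c_neg, lr=0.01, lam=0.01, epochs=500):
    """Subgradient descent on the class-weighted hinge loss:
    lam/2 * ||w||^2 + c_pos * sum_{y=+1} hinge_i + c_neg * sum_{y=-1} hinge_i."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            c = c_pos if yi == 1 else c_neg  # per-class misclassification cost
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # hinge active: pull towards the correct side, weighted by class cost
                w = [wj - lr * (lam * wj - c * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * c * yi
            else:
                # hinge inactive: only the regulariser shrinks w
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Raising c_pos relative to c_neg shifts the decision boundary to favour sensitivity over specificity; sweeping the ratio and recording (sensitivity, 1 - specificity) pairs traces out an ROC curve of the kind used to describe classifier performance.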

    • "Here, we apply SBIC on real datasets and compare its performance with the Cost-sensitive Support Vector Machine (CSSVM) (Veropoulos et al., 1999) and SMOTE (Chawla et al., 2002). CSSVM is an SVM algorithm designed for imbalanced classification, whose formulation enforces a stricter classification of the minority points by assigning a higher penalty to their misclassification during training. "
    ABSTRACT: When the training data in a two-class classification problem is overwhelmed by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC), which learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the training data into account, SBIC utilizes the concept of absent data, i.e. data from the minority class which can help better find the boundary between the two classes. SBIC simultaneously optimizes the weights of the empirical similarity function and finds the locations of absent data points. As such, SBIC uses an embedded mechanism for synthetic data generation which does not modify the training dataset, but alters the algorithm to suit imbalanced datasets. SBIC therefore draws on both major schools of thought in imbalanced classification: like cost-sensitive approaches, it operates at the algorithm level to handle imbalanced structures; and like synthetic data generation approaches, it utilizes the properties of unobserved data points from the minority class. The application of SBIC to imbalanced datasets suggests it is comparable to, and in some cases outperforms, other commonly used classification techniques for imbalanced datasets.
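SMOTE, the synthetic-data-generation baseline cited above (Chawla et al., 2002), oversamples the minority class by interpolating between a minority point and one of its k nearest minority-class neighbours. A minimal sketch under that description (function and parameter names are illustrative):

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point,
    pick one of its k nearest minority-class neighbours, and interpolate
    a random fraction of the way between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because each synthetic point is a convex combination of two minority points, the new samples stay inside the minority class's convex hull rather than duplicating existing points.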
    • "Many methods exist for imbalanced data classification, including sampling [13], ensembling [14], [15], and support vector machine adaptation [16], [17]. Some recent reviews and monographs on imbalanced data classification are also available [18], [24], [25]. "
    ABSTRACT: Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. For these applications, it is very common that their class distributions are imbalanced, with minority (or positive) samples being only a small portion of the population, which imposes significant challenges for learning models to accurately identify minority samples. This problem is further complicated by the presence of noise, because noisy samples are similar to minority samples and any treatment of the class imbalance may falsely focus on the noise and degrade accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting, employs an ensemble-based framework to partition the graph stream into chunks, each containing a number of noisy graphs with imbalanced class distributions. For each individual chunk, we propose a boosting algorithm to combine discriminative subgraph pattern selection and model learning into a unified framework for graph classification. To tackle concept drifting in graph streams, an instance-level weighting mechanism is used to dynamically adjust the instance weight, through which the boosting framework can emphasize difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph streams.
    IEEE Transactions on Cybernetics, 04/2015; 45(5):940-954. DOI:10.1109/TCYB.2014.2341031 · 3.47 Impact Factor
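The instance-level weighting the abstract describes can be illustrated with a generic AdaBoost-style reweighting step. This is a textbook sketch under standard boosting assumptions, not the paper's exact update: samples the current base learner misclassifies gain weight, so later learners emphasise them.

```python
import math

def boost_reweight(weights, correct):
    """One AdaBoost-style step: compute the weighted error of the current
    base learner, then up-weight misclassified samples and renormalise.
    `weights` are assumed to sum to 1 and 0 < weighted error < 1."""
    err = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - err) / err)  # learner's vote in the ensemble
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    z = sum(new)  # normaliser so the weights remain a distribution
    return [w / z for w in new], alpha
```

After the update the misclassified samples carry half of the total weight mass, which is what forces the next base learner to concentrate on the difficult instances.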
    • "A cost-sensitive extension of SVM, developed to cope with imbalanced data, is known as the weighted SVM (WSVM) [10]. The main idea is to use a weighting scheme during learning, so that the WSVM algorithm builds the decision hyperplane according to the relative contribution of each data point in training. "
    ABSTRACT: In the medical domain, data features often contain missing values, which can create serious bias in predictive modeling; standard data mining methods often produce poor performance measures on such data. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. The proposed method is based on a multilevel framework combining the cost-sensitive SVM with an expectation-maximization (EM) imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces faster, more accurate, and more robust classification results.
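The iterated-regression idea behind the imputation step can be sketched for a single incomplete column. This toy single-predictor version is an illustration only, not the paper's multilevel EM method, and the function and parameter names are assumptions:

```python
def regression_impute(rows, tcol, pcol, iters=25):
    """Iterated regression imputation sketch for one incomplete column:
    initialise missing entries in column tcol with the column mean, then
    repeatedly refit a least-squares line on the fully observed column
    pcol and re-impute the missing entries from the refreshed fit."""
    filled = [list(r) for r in rows]
    observed = [r[tcol] for r in filled if r[tcol] is not None]
    mean = sum(observed) / len(observed)
    missing = [i for i, r in enumerate(filled) if r[tcol] is None]
    for i in missing:
        filled[i][tcol] = mean  # crude starting guess
    for _ in range(iters):
        xs = [r[pcol] for r in filled]
        ys = [r[tcol] for r in filled]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        for i in missing:  # re-impute from the current regression line
            filled[i][tcol] = intercept + slope * filled[i][pcol]
    return filled
```

Each pass fits the regression on both observed and currently imputed values, so the imputations and the fitted line refine each other until they stop changing, which mirrors the alternation at the heart of EM-style imputation.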