Vladimir Vapnik’s research while affiliated with Columbia University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (119)


Figure 1: Data set contains 183 points. A Gaussian kernel was used; SVs are surrounded by small circles. Panels (a)–(d) show the cluster contours obtained for different values of the Gaussian kernel width parameter.
A Support Vector Method for Clustering
  • Article
  • Full-text available

February 2001

·

416 Reads

·

104 Citations

·

Vladimir Vapnik

We present a novel method for clustering using the support vector machine approach. Data points are mapped to a high dimensional feature space, where support vectors are used to define a sphere enclosing them. The boundary of the sphere forms in data space a set of closed contours containing the data. Data points enclosed by each contour are defined as a cluster. As the width parameter of the Gaussian kernel is decreased, these contours fit the data more tightly and splitting of contours occurs. The algorithm works by separating clusters according to valleys in the underlying probability distribution, and thus clusters can take on arbitrary geometrical shapes. As in other SV algorithms, outliers can be dealt with by introducing a soft margin constant leading to smoother cluster boundaries. The structure of the data is explored by varying the two parameters. We investigate the dependence of our method on these parameters and apply it to several data sets.
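A minimal sketch of the clustering rule described above, assuming scikit-learn and NumPy: a one-class SVM with a Gaussian (RBF) kernel stands in for the enclosing sphere in feature space (the two formulations coincide for Gaussian kernels), and two points are assigned to the same cluster when every sampled point on the segment joining them stays inside the learned contour. The parameter names and the interpolation resolution are illustrative, not the paper's settings.

```python
# Hypothetical sketch of support vector clustering (SVC), not the authors' code.
# Assumes scikit-learn and NumPy. With a Gaussian (RBF) kernel, OneClassSVM is
# equivalent to the enclosing-sphere formulation, so its decision function
# serves as the "inside the contour" test.
import numpy as np
from sklearn.svm import OneClassSVM

def svc_cluster(X, gamma=1.0, nu=0.1, n_interp=10):
    """Cluster X by the SVC rule: two points share a cluster if the whole
    segment between them stays inside the learned high-density region."""
    est = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X)

    def inside(points):
        # decision_function >= 0 means inside the enclosing contour
        return np.all(est.decision_function(points) >= 0)

    n = len(X)
    # adjacency: sample points along each segment and test them
    adj = np.zeros((n, n), dtype=bool)
    ts = np.linspace(0.0, 1.0, n_interp)[:, None]
    for i in range(n):
        for j in range(i + 1, n):
            seg = X[i] * (1 - ts) + X[j] * ts
            adj[i, j] = adj[j, i] = inside(seg)

    # connected components of the adjacency graph give the clusters
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        stack = [start]
        labels[start] = current
        while stack:
            k = stack.pop()
            for m in np.where(adj[k] & (labels < 0))[0]:
                labels[m] = current
                stack.append(m)
        current += 1
    return labels
```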



Table 1. Median of the ratio of the test error to the best model for the 12 experiments reported in figures 3 and 4. The last row is an average over the 12 experiments. 
Figure 2. Prediction of the generalization error for the following methods: DEE, CV5 (left), FPE, ADJ (right).  
Figure 3. Approximation ratios for the sinc function. Numerical results can be found in Tables 1 and 2, each letter corresponding to the same experiment.  
Figure 4. Approximation ratios for the step function. Numerical results can be found in Tables 1 and 2, each letter corresponding to the same experiment.  
Model Selection for Small Sample Regression

December 2000

·

745 Reads

·

143 Citations

Machine Learning

Introduction Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimate of the ratio of the expected training error to the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix. 2 Risk of the mean square estimator Given a collection of data (x_1, y_1), ..., (x_n, y_n), where y_i = f(x_i, θ_0) + ε_i and x_i ...
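A brief illustration of penalty-based model selection for small samples, assuming NumPy. For simplicity it scores each candidate polynomial degree with Akaike's FPE penalty, one of the baselines compared in Figure 2, rather than the covariance-eigenvalue penalty proposed in the paper.

```python
# Hypothetical sketch of penalty-based model selection for small-sample
# regression. It uses Akaike's FPE penalty (one of the baselines in Figure 2),
# not the covariance-eigenvalue penalty proposed in the paper.
import numpy as np

def fpe_select(x, y, max_degree=10):
    """Pick a polynomial degree by minimizing the FPE-penalized training error."""
    n = len(x)
    best_deg, best_score = None, np.inf
    for d in range(1, max_degree + 1):
        # design matrix for a degree-d polynomial (d + 1 free parameters)
        A = np.vander(x, d + 1)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        train_err = np.mean((A @ coef - y) ** 2)
        p = d + 1                                     # number of parameters
        if p >= n:
            break
        fpe = train_err * (1 + p / n) / (1 - p / n)   # Akaike's FPE
        if fpe < best_score:
            best_deg, best_score = d, fpe
    return best_deg
```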


Bounds on Error Expectation for SVM

September 2000

·

177 Reads

·

38 Citations

The book provides an overview of recent developments in large margin classifiers, examines connections with other methods (e.g., Bayesian inference), and identifies strengths and weaknesses of the method, as well as directions for future research. The concept of large margins is a unifying principle for the analysis of many different approaches to the classification of data from examples, including boosting, mathematical programming, neural networks, and support vector machines. The fact that it is the margin, or confidence level, of a classification—that is, a scale parameter—rather than a raw training error that matters has become a key tool for dealing with classifiers. This book shows how this idea applies to both the theoretical analysis and the design of algorithms. Among the contributors are Manfred Opper, Vladimir Vapnik, and Grace Wahba.



Prior Knowledge in Support Vector Kernels

July 2000

·

67 Reads

·

94 Citations

Advances in Neural Information Processing Systems

We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions. 1 INTRODUCTION When we are trying to extract regularities from data, we often have additional knowledge about the functions that we estimate. For instance, in image classification tasks, there exist transformations which leave class membership invariant (e.g. local translations); moreover, it is usually the case that images have a local structure in that not all correlations between image regions carry equal amounts of information. The present study investigates the question of how to make use of these two sources of knowledge by designing appropriate Support Vector (SV) kernel functions. We start by giving a brief introduction to SV machines (Vapnik & Chervonenkis, 1979; Vapnik, 1995) (Sec. 2). Regarding pri...
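A rough sketch of a locality-improved kernel in the spirit described above: polynomial correlations are first computed over small local image patches and then combined by a second polynomial, so that correlations between nearby pixels carry more weight than long-range ones. The patch size and the exponents d1 and d2 are placeholder values, not the paper's reported settings.

```python
# Hypothetical sketch of a locality-improved kernel: patch-wise polynomial
# correlations combined globally by a second polynomial. Patch size and the
# exponents d1, d2 are assumptions, not the paper's settings.
import numpy as np

def locality_improved_kernel(x, y, img_shape=(16, 16), patch=3, d1=2, d2=2):
    """Kernel value between two flattened images x and y."""
    X = x.reshape(img_shape)
    Y = y.reshape(img_shape)
    h, w = img_shape
    total = 0.0
    for i in range(0, h - patch + 1):
        for j in range(0, w - patch + 1):
            # local correlation within one patch, raised to d1
            local = np.sum(X[i:i+patch, j:j+patch] * Y[i:i+patch, j:j+patch])
            total += local ** d1
    # combine the patch contributions with a second polynomial of degree d2
    return total ** d2
```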


Yann LeCun, L. D. Jackel, Léon Bottou, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon, Urs A. Müller, Eduard Säckinger, Patrice Simard, and Vladimir Vapnik

July 2000

·

140 Reads

This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclassification rates less than a given threshold. 1. Introduction Great strides have been achieved in pattern recognition in recent years. Particularly striking results have been attained in the area of handwritten digit recognition. This rapid progress has resulted from a combination of a number of developments including the proliferation of powerful, inexpensive computers, the invention of new algorithms that take advantage of these computers, and the availability of large databases of characters that can be used for training and testing. At AT&T Bell Laboratories we have developed a suite of classifier algorithms. In this paper we contrast the ...


Figure 1: Architecture of LeNet 1. Each plane represents a feature map, i.e. a set of units whose weights are constrained to be identical. Input images are sized to fit in a 16 x 16 pixel field, but enough blank pixels are added around the border of this field to avoid edge effects in the convolution calculations.
Figure 2: Error rate on the test set (%). The uncertainty in the quoted error rates is about 0.1%.
Figure 3: Percent of test patterns rejected to achieve 0.5% error on the remaining test examples for some of the systems.
Learning Algorithms For Classification: A Comparison On Handwritten Digit Recognition

July 2000

·

2,524 Reads

·

234 Citations

This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclassification rates less than a given threshold. 1. Introduction Great strides have been achieved in pattern recognition in recent years. Particularly striking results have been attained in the area of handwritten digit recognition. This rapid progress has resulted from a combination of a number of developments including the proliferation of powerful, inexpensive computers, the invention of new algorithms that take advantage of these computers, and the availability of large databases of characters that can be used for training and testing. At AT&T Bell Laboratories we have developed a suite of classifier algorithms. In this paper we contras...
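A small sketch of the rejection measurement mentioned above, assuming NumPy: given per-pattern confidence scores and correctness flags on a test set, it finds the smallest fraction of low-confidence patterns that must be rejected for the error on the remaining patterns to fall below a target such as 0.5% (Figure 3). The function and argument names are illustrative, not taken from the paper.

```python
# Hypothetical sketch of the rejection-rate measurement: reject the least
# confident test patterns until the error on the remaining ones drops below
# a target threshold (e.g. 0.5%, as in Figure 3).
import numpy as np

def rejection_rate(confidences, correct, target_error=0.005):
    """confidences: per-pattern confidence scores; correct: boolean array.
    Returns the fraction rejected to reach the target error, or None."""
    order = np.argsort(confidences)          # least confident first
    correct = np.asarray(correct, dtype=bool)[order]
    n = len(correct)
    for k in range(n):                       # reject the k least confident
        kept = correct[k:]
        if kept.size and np.mean(~kept) <= target_error:
            return k / n
    return None
```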


Extracting Support Data for a Given Task

June 2000

·

166 Reads

·

335 Citations

We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly overlapping small (approximately 4%) subsets of the data base. This finding opens up the possibility of compressing data bases significantly by disposing of the data which is not important for the solution of a given task. In addition, we show that the theory allows us to predict the classifier that will have the best generalization ability, based solely on performance on the training set and characteristics of the learning machines. This finding is important for cases where the amount of available data is limited. In: U. M. Fayyad and R. Uthurusamy (eds.): Proceedings, First International Conference on Knowledge Discovery ...
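A hedged sketch of the support-data idea, assuming scikit-learn and its bundled digits data (not the database used in the paper): train an SVM, keep only its support vectors as a compressed training set, retrain on that subset, and compare test accuracy. The kernel and its parameters are illustrative.

```python
# Hypothetical sketch: extract the support vectors of a trained SVM as a
# compressed training set and retrain on that subset only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = SVC(kernel="poly", degree=3).fit(X_tr, y_tr)
sv_idx = full.support_                      # indices of the support data
print(f"support data: {len(sv_idx)} of {len(X_tr)} patterns "
      f"({100 * len(sv_idx) / len(X_tr):.1f}%)")

# retrain using only the extracted support data
compressed = SVC(kernel="poly", degree=3).fit(X_tr[sv_idx], y_tr[sv_idx])
print("full-set accuracy:    ", full.score(X_te, y_te))
print("support-data accuracy:", compressed.score(X_te, y_te))
```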


Model Selection for Support Vector Machines

April 2000

·

161 Reads

·

117 Citations

New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of the model parameters and the relative quality of performance for any parameter value.
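An illustrative sketch of choosing SVM parameters from training-set quantities alone, assuming scikit-learn. For simplicity it ranks each (gamma, C) setting by the classical leave-one-out bound (number of support vectors divided by the number of training examples) rather than by the span functional introduced here.

```python
# Hypothetical sketch of SVM model selection from training-set quantities only.
# Uses the classical LOO bound #SV / n as the score, not the span functional.
import numpy as np
from sklearn.svm import SVC

def select_params(X, y, gammas=(0.01, 0.1, 1.0), Cs=(0.1, 1.0, 10.0)):
    best, best_bound = None, np.inf
    for gamma in gammas:
        for C in Cs:
            clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
            loo_bound = len(clf.support_) / len(X)   # LOO error <= #SV / n
            if loo_bound < best_bound:
                best, best_bound = (gamma, C), loo_bound
    return best, best_bound
```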


Citations (90)


... 3) Support Vector Machine Algorithm: Classification and regression problems are resolved using one of the most well-known supervised learning algorithms, Support Vector Machine, or SVM [23]. But the ML Classification problem makes extensive use of it. ...

Reference:

A Novel Hybrid Deep Learning Model for Detecting Breast Cancer
Reinforced SVM Method and Memorization Mechanisms
  • Citing Article
  • May 2021

Pattern Recognition

... Hence, high-complexity architecture designs do not ensure sufficient explanatory and predictive capacities [86,87]. Accordingly, the statistical learning theory claims that the prevalence of learning with low complexity models should be preferred [88][89][90][91]. Therefore, this research aimed to encounter the simplest competent model to adequately predict the geotechnical soil properties (under consideration). ...

Rethinking statistical learning theory: learning using statistical invariants

Machine Learning

... SVMs are a family of machine learning algorithms developed by Vladimir Vapnik [10], used to solve classification, regression and anomaly detection problems. They aim to separate data into classes using a boundary or hyperplane, while maximizing the distance between the different groups of data and the separating boundary. ...

Knowledge transfer in SVM and neural networks

Annals of Mathematics and Artificial Intelligence

... Obviously, the commonly used kernels such as the Gaussian kernel and polynomial kernel, designed for the vector-form data, can not be directly adopted for matrix-type data. To this end, B. Schölkopf et al. [23] suggested constructing a locality-improved kernel via the general polynomial kernel and local correlations, which results in an incomplete polynomial kernel, see (5.1) for its specific formula. In the same spirit, neighborhood kernels were proposed by V. L. Brailovsky et al. [5] and a histogram intersection kernel to image classification was introduced by A. Barla et al. [2]. ...

Prior knowledge in support vector kernels
  • Citing Article
  • January 1997

... Other approaches to time series prediction include support vector regression. Müller et al. [68] used support vector regression (SVR) for time series forecasting on benchmark problems. Lau et al. [69] implemented SVR for Sunspot time series forecasting with better results than the radial basis function network in relatively long-term prediction. ...

Predicting Time Series with Support Vector Machines
  • Citing Article
  • January 1999

... In this manner, a search for linear relations in the feature space is conducted, which can then determine efficient solutions to nonlinear problems. SVM has been widely applied in the field of pattern recognition and has been applied to such problems as text recognition [16], handwritten numeral recognition [17], face detection [18], system control [19], and many other related applications. The accuracy of SVM classification is highly affected by the kernel function and its parameters since the relationship between the parameters and model classification accuracy in a multimodal function is irregular. ...

Discovering informative patterns and data cleaning
  • Citing Article
  • January 1996

... Related Work: In recent work, Liu et al. [13] proposed a direct change estimator for graphical models based on the ratio of the probability densities of the two models [9,10,25,26,31]. They focused on the special case of the L1 norm, i.e., δθ* ∈ R^(p^2) is sparse, and provided non-asymptotic error bounds for the estimator along with a sample complexity of n_1 = O(s^2 log p) and n_2 = O(n_1^2) for an unbounded density ratio model, where s is the number of changed edges and p is the number of variables. ...

Statistical Inference Problems and Their Rigorous Solutions
  • Citing Conference Paper
  • April 2015

Lecture Notes in Computer Science

... It means that we could first train a model that predicts the missing (privileged) features x*_k from the original x_k and then replace x*_k with their predictions in the decision model trained on both x_k and x*_k. However, Vapnik and Izmailov [10] developed another approach, which bases the LUPI paradigm on a mechanism of knowledge transfer from the space of the teacher's explanations to the space of the student's decisions. The authors have illustrated it for the well-known SVM classifier (Boser et al. [11], Cortes and Vapnik [12]). ...

Learning with Intelligent Teacher: Similarity Control and Knowledge Transfer
  • Citing Conference Paper
  • April 2015

Lecture Notes in Computer Science

... This research utilized the scikit-learn library by calling "from sklearn.svm import SVC" command to assign each kernels into our model, as seen in Table 1 [25]. Afterward, we add kernel as parameters in the SVC function, with its default setting. ...

Support-vector networks
  • Citing Article
  • January 2009

Chemical Biology & Drug Design

... In general, fitting a nonlinear parametric model to a time series is a complex task since there is a wide possible set of nonlinear patterns. However, technological advancements have allowed researchers to consider more flexible modeling techniques, such as support vector machines (SVMs) adapted to regression [13], artificial neural networks (ANNs), and wavelet methods [14]. ...

Support vector regression machines

Advances in Neural Information Processing Systems