Vladimir Vapnik's research while affiliated with Columbia University and other places

Publications (119)

Article
The paper is devoted to two problems: (1) reinforcement of SVM algorithms, and (2) justification of memorization mechanisms for generalization. (1) The current SVM algorithm was designed for the case when the risk for the set of nonnegative slack variables is defined by the l1 norm. In this paper, along with that classical l1 norm, we consider risks defined...
Article
Full-text available
This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In a classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to constru...
Article
Full-text available
The paper considers general machine learning models, where knowledge transfer is positioned as the main method to improve their convergence properties. Previous research was focused on mechanisms of knowledge transfer in the context of SVM framework; the paper shows that this mechanism is applicable to neural network framework as well. The paper de...
Article
This article describes a method for constructing a special rule (we call it synergy rule) that uses as its input information the outputs (scores) of several monotonic rules which solve the same pattern recognition problem. As an example of scores of such monotonic rules we consider here scores of SVM classifiers. In order to construct the optimal s...
Conference Paper
The paper considers several topics on learning with privileged information: (1) general machine learning models, where privileged information is positioned as the main mechanism to improve their convergence properties, (2) existing and novel approaches to leverage that privileged information, (3) algorithmic realization of one of these (namely, kno...
Article
Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the...
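To make the unified scheme concrete, here is a minimal numpy sketch of generalized distillation for binary classification: a teacher fit on a privileged representation produces softened labels, and a student on the regular features learns from a blend of hard and soft targets. The logistic models, the temperature T, and the imitation weight lam are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, targets, lr=0.1, epochs=500):
    # Gradient descent on cross-entropy with (possibly soft) targets.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - targets) / len(X)
    return w

# X_priv: privileged features (training time only); X: noisy regular view.
rng = np.random.default_rng(0)
n = 200
X_priv = rng.normal(size=(n, 5))
X = X_priv[:, :2] + rng.normal(scale=1.0, size=(n, 2))
y = (X_priv[:, 0] + X_priv[:, 1] > 0).astype(float)

# 1) Teacher learns on the privileged representation.
w_teacher = fit_logistic(X_priv, y)
T = 2.0                                  # softening temperature (assumed)
soft = sigmoid(X_priv @ w_teacher / T)   # teacher's soft labels

# 2) Student learns on regular features from a hard/soft blend of targets.
lam = 0.5                                # imitation weight (assumed)
w_student = fit_logistic(X, lam * soft + (1 - lam) * y)
```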
Article
This paper presents direct settings and rigorous solutions of the main Statistical Inference problems. It shows that rigorous solutions require solving multidimensional Fredholm integral equations of the first kind in the situation where not only the right-hand side of the equation is an approximation, but the operator in the equation is also defin...
Article
This paper describes a new paradigm of machine learning, in which Intelligent Teacher is involved. During training stage, Intelligent Teacher provides Student with information that contains, along with classification of each example, additional privileged information (for example, explanation) of this example. The paper describes two mechanisms tha...
Conference Paper
This paper introduces an advanced setting of machine learning problem in which an Intelligent Teacher is involved. During training stage, Intelligent Teacher provides Student with information that contains, along with classification of each example, additional privileged information (explanation) of this example. The paper describes two mechanisms...
Conference Paper
This paper presents direct settings and rigorous solutions of Statistical Inference problems. It shows that rigorous solutions require solving ill-posed Fredholm integral equations of the first kind in the situation where not only the right-hand side of the equation is an approximation, but the operator in the equation is also defined approximately...
Article
We introduce a constructive setting for the problem of density ratio estimation through the solution of a multidimensional integral equation. In this equation, not only its right hand side is approximately known, but also the integral operator is approximately defined. We show that this ill-posed problem has a rigorous solution and obtain the solut...
Conference Paper
Radial basis function (RBF) kernels for SVM have been routinely used in a wide range of classification problems, delivering consistently good performance for those problems where the kernel computations are numerically feasible (high-dimensional problems typically use linear kernels). One of the drawbacks of RBF kernels is the necessity of selectin...
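The paper's own remedy is truncated above; as a baseline for the selection problem it mentions, a minimal cross-validated grid search over the RBF width (assuming scikit-learn; the grid is illustrative) looks like this:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# Select gamma (the RBF width) by 5-fold cross-validation.
search = GridSearchCV(
    SVC(kernel="rbf", C=1.0),
    param_grid={"gamma": [1e-4, 1e-3, 1e-2, 1e-1]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```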
Article
We introduce a general constructive setting of the density ratio estimation problem as a solution of a (multidimensional) integral equation. In this equation, not only its right hand side is known approximately, but also the integral operator is defined approximately. We show that this ill-posed problem has a rigorous solution and obtain the soluti...
Article
We describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the pairs object/classification are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives no...
Conference Paper
Full-text available
Recently Vapnik et al. [11, 12, 13] introduced a new learning model, called Learning Using Privileged Information (LUPI). In this model, along with standard training data, the teacher supplies the student with additional (privileged) information. In the optimistic case, the LUPI model can improve the bound for the probability of test error from O(1...
Article
Full-text available
In Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with the privileged information in the correcting space. The goal of the learner is to find a classifier with a low generalization error in the decision space. We consider an empirical risk minimizati...
Conference Paper
In Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with the privileged information in the correcting space. The goal of the learner is to find a classifier with a low generalization error in the decision space. We consider an empirical risk minimization...
Article
In the Afterword to the second edition of the book "Estimation of Dependences Based on Empirical Data" by V. Vapnik, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced. This Afterword also suggested an extension of the SVM method (the so-called SVM(gamma)+ method) to implement algorithms which address the L...
Article
Full-text available
We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Popper’s work from the p...
Conference Paper
In this paper we consider a new paradigm of learning: learning using hidden information. The classical paradigm of supervised learning is to learn a decision rule from labeled data (x_i, y_i), x_i ∈ X, y_i ∈ {−1, 1}, i = 1, …, ℓ. In this paper we consider a new setting: given training vectors in space X along with labels and descripti...
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high gener...
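A minimal sketch of that idea with a modern library stand-in (scikit-learn's SVC; the kernel and C are illustrative), where the kernel supplies the implicit non-linear map:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
# The RBF kernel plays the role of the implicit high-dimensional map;
# a linear decision surface is built in that feature space.
clf = SVC(kernel="rbf", C=10.0)
clf.fit(X, y)
print("support vectors per class:", clf.n_support_)  # only these define the surface
```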
Conference Paper
We focus on distribution-free transductive learning. In this setting the learning algorithm is given a ‘full sample’ of unlabeled points. Then, a training sample is selected uniformly at random from the full sample and the labels of the training points are revealed. The goal is to predict the labels of the remaining unlabeled points as accurately a...
Article
Full-text available
We consider a large volume principle for transductive learning that prioritizes the transductive equivalence classes according to the volume they occupy in hypothesis space. We approximate volume maximization using a geometric interpretation of the hypothesis space. The resulting algorithm is defined via a non-convex optimization problem that can s...
Conference Paper
Full-text available
We study classification tasks where one is given a set of labeled examples, and a set of "non-examples" of meaningful concepts in the same domain that do not belong to either class (referred to as the universum). We describe an algorithmic approach to leverage universum points and show experimentally that inference based on the labeled data and the...
Chapter
Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are consi...
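A minimal sketch of the time-series setup under stated assumptions (scikit-learn's SVR, which implements the ε-insensitive case only; the window length and parameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

t = np.linspace(0, 20, 500)
series = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

d = 10                                    # sliding-window embedding dimension
X = np.array([series[i:i + d] for i in range(len(series) - d)])
y = series[d:]                            # one-step-ahead targets

model = SVR(kernel="rbf", C=10.0, epsilon=0.05)  # eps-insensitive tube
model.fit(X[:400], y[:400])
print("test MSE:", np.mean((model.predict(X[400:]) - y[400:]) ** 2))
```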
Article
Full-text available
We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is imp...
Conference Paper
Full-text available
We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are c...
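A single-machine sketch of the split-and-combine step (the paper runs these stages in parallel; the four chunks and toy data are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
chunks = np.array_split(np.arange(len(X)), 4)

sv_idx = []
for idx in chunks:
    # Optimize each subset separately, keep only its support vectors.
    clf = SVC(kernel="rbf", C=1.0).fit(X[idx], y[idx])
    sv_idx.extend(idx[clf.support_])

# Combine the partial results and retrain on the pooled support vectors.
final = SVC(kernel="rbf", C=1.0).fit(X[sv_idx], y[sv_idx])
print("retained", len(sv_idx), "of", len(X), "points")
```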
Article
Full-text available
Contents: 0 A Roadmap (0.1 How to read this Thesis; 0.2 A Short Review of Approximation and Regression Estimation; 0.3 The Reason for Support Vectors); 1 Introduction (1.1 The Regression Problem; 1.2 A Special...
Article
We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions.
Article
Full-text available
A new regression technique based on the concept of support vectors is introduced. We compare support vector regression with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high-dimensional spaces because...
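A minimal sketch of such a comparison with scikit-learn stand-ins (the dataset and settings are illustrative, not the paper's benchmarks):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=400, n_features=10, random_state=0)
for name, model in [
    ("SVR", SVR(kernel="rbf", C=10.0, epsilon=0.1)),
    ("bagged trees", BaggingRegressor(DecisionTreeRegressor(), n_estimators=50)),
]:
    print(name, cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```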
Article
The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights and threshold so as to minimize an upper bound on...
Article
We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points...
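A compact numpy/scipy sketch of the sphere idea, assuming the standard SVDD dual and a segment-sampling connectivity test; the kernel width q, the bound C, and the toy data are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.4, (30, 2)), rng.normal(2, 0.4, (30, 2))])
n, q, C = len(X), 1.0, 1.0

K = np.exp(-q * ((X[:, None] - X[None]) ** 2).sum(-1))  # Gaussian Gram matrix

# SVDD dual with K_ii = 1: minimize b^T K b  s.t.  sum(b) = 1, 0 <= b_i <= C.
res = minimize(lambda b: b @ K @ b, np.full(n, 1.0 / n),
               jac=lambda b: 2 * K @ b,
               bounds=[(0.0, C)] * n,
               constraints={"type": "eq", "fun": lambda b: b.sum() - 1.0})
beta = res.x

def sq_dist(z):
    # Squared feature-space distance from point z to the sphere center.
    k = np.exp(-q * ((X - z) ** 2).sum(-1))
    return 1.0 - 2.0 * beta @ k + beta @ K @ beta

on_boundary = (beta > 1e-6) & (beta < C - 1e-6)   # unbounded support vectors
if not on_boundary.any():
    on_boundary = beta > 1e-6
R2 = np.mean([sq_dist(X[i]) for i in np.where(on_boundary)[0]])  # radius^2

# Same cluster iff the segment between two points stays inside the sphere.
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    for j in range(i + 1, n):
        seg = (sq_dist(X[i] + t * (X[j] - X[i])) <= R2 + 1e-9
               for t in np.linspace(0.0, 1.0, 8))
        adj[i, j] = adj[j, i] = all(seg)

n_clusters, labels = connected_components(adj)
print("clusters found:", n_clusters)
```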
Article
DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinc...
Conference Paper
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embed...
Article
Full-text available
We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points...
Article
Full-text available
The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search, become intractabl...
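A hedged stand-in for this procedure: the paper descends gradients of smooth error estimates, while the sketch below substitutes cross-validation error and a derivative-free optimizer over (log C, log gamma):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def cv_error(theta):
    # Treat estimated generalization error as a function of the parameters,
    # optimized in log space to keep C and gamma positive.
    C, gamma = np.exp(theta)
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return 1.0 - cross_val_score(clf, X, y, cv=3).mean()

res = minimize(cv_error, x0=np.log([1.0, 1e-3]), method="Nelder-Mead")
print("best (C, gamma):", np.exp(res.x), "cv error:", res.fun)
```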
Article
Full-text available
We present a novel method for clustering using the support vector machine approach. Data points are mapped to a high dimensional feature space, where support vectors are used to define a sphere enclosing them. The boundary of the sphere forms in data space a set of closed contours containing the data. Data points enclosed by each contour are define...
Article
Full-text available
Introduction: Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotical analysis. We present a new penalization method for performing mo...
Article
This paper is an extended version of [12].
Article
We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions. Introduction: When we are trying to extract regularities from data...
Article
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclass...
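A minimal sketch of this kind of benchmark (scikit-learn classifiers; pickled model size is used as a rough memory proxy, an assumption of this sketch):

```python
import pickle
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for name, clf in [("SVM", SVC()), ("k-NN", KNeighborsClassifier())]:
    t0 = time.perf_counter()
    clf.fit(Xtr, ytr)                      # training time
    t1 = time.perf_counter()
    acc = clf.score(Xte, yte)              # raw accuracy + recognition time
    t2 = time.perf_counter()
    size = len(pickle.dumps(clf))          # memory proxy
    print(f"{name}: acc={acc:.3f} train={t1 - t0:.2f}s "
          f"test={t2 - t1:.2f}s model={size / 1e3:.0f}kB")
```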
Article
Full-text available
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclass...
Article
We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly o...
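A quick way to reproduce the flavor of that observation (scikit-learn, digits data; the kernels are illustrative) is to measure the Jaccard overlap of the support vector index sets:

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
sv_sets = {}
for kernel, kw in [("linear", {}), ("poly", {"degree": 3}), ("rbf", {})]:
    clf = SVC(kernel=kernel, C=1.0, **kw).fit(X, y)
    sv_sets[kernel] = set(clf.support_)    # indices of support vectors

names = list(sv_sets)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = sv_sets[names[i]], sv_sets[names[j]]
        print(names[i], "vs", names[j], "Jaccard:", len(a & b) / len(a | b))
```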
Article
We introduce an algorithm for estimating the values of a function at a set of test points x_{ℓ+1}, …, x_{ℓ+m} given a set of training points (x_1, y_1), …, (x_ℓ, y_ℓ), without estimating (as an intermediate step) the regression function. We demonstrate that this direct (transductive) way for estimating values of the regression (or classic...
Article
New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of paramete...
Article
Traditional classification approaches generalize poorly on image classification tasks, because of the high dimensionality of the feature space. This paper shows that Support Vector Machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms. Heavy-tailed RBF kernels of the fo...
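The kernel formula is truncated above, so the sketch below hedges with one heavy-tailed (Laplacian-type) instance passed to an SVM as a custom Gram-matrix callable; rho and the toy histograms are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

rho = 1.0

def laplacian_kernel(A, B):
    # exp(-rho * sum_i |a_i - b_i|): heavier tails than the Gaussian RBF.
    d = np.abs(A[:, None, :] - B[None, :, :]).sum(-1)
    return np.exp(-rho * d)

rng = np.random.default_rng(0)
H = rng.dirichlet(np.ones(16), size=200)          # toy histogram features
y = (H[:, :8].sum(1) > 0.5).astype(int)

clf = SVC(kernel=laplacian_kernel, C=1.0).fit(H, y)
print("training accuracy:", clf.score(H, y))
```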
Article
Full-text available
This paper compares the performance of classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclassificati...
Article
This paper compares the performance of classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also rejection, training time, recognition time, and memory requirements. From "Comparison of Learning Algorithms for Handwritten Digit Recognition", International Conference on Artificial Neural Networks, EC2 & Cie, 1995; Y. LeCun, L. Bottou...
Conference Paper
The problem of estimating density, conditional probability, and conditional density is considered as an ill-posed problem of solving integral equations. To solve these equations, the support vector method (SVM) is used.
Article
Developed only recently, support vector learning machines achieve high generalization ability by minimizing a bound on the expected test error; however, so far there existed no way of adding knowledge about invariances of a classification problem at hand. We present a method of incorporating prior knowledge about transformation invariances by app...
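A minimal sketch of that virtual support vector recipe (scikit-learn digits; the one-pixel shift stands in for the problem's known invariance):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)          # 8x8 digit images, flattened

clf = SVC(kernel="rbf").fit(X, y)
sv, sv_y = X[clf.support_], y[clf.support_]  # first-pass support vectors

def shift_right(imgs):
    # Translate each 8x8 image one pixel to the right (an assumed invariance).
    im = imgs.reshape(-1, 8, 8)
    out = np.zeros_like(im)
    out[:, :, 1:] = im[:, :, :-1]
    return out.reshape(len(imgs), -1)

X2 = np.vstack([sv, shift_right(sv)])        # support vectors + virtual SVs
y2 = np.concatenate([sv_y, sv_y])
clf2 = SVC(kernel="rbf").fit(X2, y2)         # retrain on the enlarged set
print("retrained on", len(X2), "points")
```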
Article
Recent years have witnessed an increasing interest in Support Vector (SV) machines, which use Mercer kernels for efficiently performing computations in high-dimensional spaces. In pattern recognition, the SV algorithm constructs nonlinear decision functions by training a classifier to perform a linear separation in some high-dimensional space which...
Conference Paper
Full-text available
We introduce a method of feature selection for Support Vector Machines.
Article
Full-text available
We introduce a method of feature selection for Support Vector Machines. The method is based upon finding those features which minimize bounds on the leave-one-out error. This search can be efficiently performed via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and...
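Not the paper's bound-gradient method, but a related and simpler routine in the same spirit: recursive feature elimination driven by a linear SVM's weights (scikit-learn's RFE), on toy data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)
# Repeatedly drop the features with the smallest linear-SVM weights.
selector = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X, y)
print("selected features:", list(selector.get_support(indices=True)))
```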
Conference Paper
We present a novel kernel method for data clustering using a description of the data by support vectors. The kernel reflects a projection of the data points from data space to a high dimensional feature space. Cluster boundaries are defined as spheres in feature space, which represent complex geometric shapes in data space. We utilize this geometri...
Article
Full-text available
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also rejection, training time, recognition time, and memory requirements. From "Comparison of Learning Algorithms for Handwritten Digit Recognition", Y. LeCun, L. Jackel, L. Bottou, A. Brunot, C. Cortes,...
Article
Full-text available
The step function θ(x) = 1 if x > 0, and 0 otherwise (1.1), defines the setting where, instead of knowing the distribution function F(x), we are given the iid (independently and identically distributed) data x_1, …, x_ℓ (1.2) generated by F. The problem of density estimation is known to b...
Article
A Support Vector Machine (SVM) algorithm for multivariate density estimation is developed based on regularization principles and bounds on the convergence of empirical distribution functions. The algorithm is compared to Gaussian Mixture Models (GMMs). Our algorithm outperforms GMMs for data drawn from mixtures of Gaussians in R^2 and R^6. Our a...
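A hedged one-dimensional sketch of the underlying inverse problem: match the empirical distribution function with a nonnegative mixture of smooth bumps (the basis, bandwidth, and NNLS solver are assumptions of this sketch, not the paper's SVM procedure):

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.sort(np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(1, 1.0, 150)]))
F_emp = np.arange(1, len(x) + 1) / len(x)          # empirical CDF at the data

centers = np.linspace(x.min(), x.max(), 25)
h = 0.5                                            # basis bandwidth (assumed)
A = norm.cdf((x[:, None] - centers[None, :]) / h)  # CDFs of the Gaussian bumps

w, _ = nnls(A, F_emp)                              # nonnegative mixture weights
w /= w.sum()                                       # density integrates to 1

def density(t):
    # Mixture of Gaussian bumps recovered from the distribution function.
    return (w * norm.pdf((t[:, None] - centers[None, :]) / h) / h).sum(-1)

print(density(np.linspace(-4, 4, 5)))
```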
Article
Full-text available
Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing a structure on multi-dimensional kernels which are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VC-dimension). SVAD uses ideas from ANOVA decomposition methods and...
Article
Full-text available
In this report we describe how the Support Vector (SV) technique of solving linear operator equations can be applied to the problem of density estimation [4]. We present a new optimization procedure and set of kernels closely related to current SV techniques that guarantee the monotonicity of the approximation. This technique estimates densities with...
Chapter
ContributorsPeter Bartlett, Kristin P. Bennett, Christopher J.C. Burges, Nello Cristianini, Alex Gammerman, Federico Girosi, Simon Haykin, Thorsten Joachims, Linda Kaufman, Jens Kohlmorgen, Ulrich Kreßel, Davide Mattera, Klaus-Robert Müller, Manfred Opper, Edgar E. Osuna, John C. Platt, Gunnar Rätsch, Bernhard Schölkopf, John Shawe-Taylor, Alexande...
Chapter
ContributorsPeter Bartlett, Kristin P. Bennett, Christopher J.C. Burges, Nello Cristianini, Alex Gammerman, Federico Girosi, Simon Haykin, Thorsten Joachims, Linda Kaufman, Jens Kohlmorgen, Ulrich Kreßel, Davide Mattera, Klaus-Robert Müller, Manfred Opper, Edgar E. Osuna, John C. Platt, Gunnar Rätsch, Bernhard Schölkopf, John Shawe-Taylor, Alexande...
Chapter
ContributorsPeter Bartlett, Kristin P. Bennett, Christopher J.C. Burges, Nello Cristianini, Alex Gammerman, Federico Girosi, Simon Haykin, Thorsten Joachims, Linda Kaufman, Jens Kohlmorgen, Ulrich Kreßel, Davide Mattera, Klaus-Robert Müller, Manfred Opper, Edgar E. Osuna, John C. Platt, Gunnar Rätsch, Bernhard Schölkopf, John Shawe-Taylor, Alexande...
Article
Full-text available
A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality...
Article
In this report we describe how the Support Vector (SV) technique of solving linear operator equations can be applied to the problem of density estimation [4]. We present a new optimization procedure and set of kernels closely related to current SV techniques that guarantee the monotonicity of the approximation. This technique estimates densities with...
Article
Full-text available
We address the problem of determining what size test set guarantees statistically significant results in a character recognition task, as a function of the expected error rate. We provide a statistical analysis showing that if, for example, the expected character error rate is around 1 percent, then, with a test set of at least 10,000 statistically...
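A standard binomial sizing rule in the spirit of that analysis (a sketch, not the paper's exact derivation):

```python
from scipy.stats import norm

def test_set_size(p, beta=0.2, confidence=0.95):
    # Normal approximation to the binomial: require that the confidence
    # half-width z * sqrt(p(1-p)/n) be at most beta * p, which gives
    # n >= z^2 * (1 - p) / (beta^2 * p).
    z = norm.ppf(0.5 + confidence / 2)
    return int(z ** 2 * (1 - p) / (beta ** 2 * p)) + 1

print(test_set_size(0.01))   # expected character error rate of 1 percent
```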
Article
Full-text available
The support vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights, and threshold that minimize an upper bound on the e...
Conference Paper
Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are c...
Article
Large VC-dimension classifiers can learn difficult tasks, but are usually impractical because they generalize well only if they are trained with huge quantities of data. In this paper we show that even very high-order polynomial classifiers can be trained with a small amount of training data and yet generalize better than classifiers with a smaller...
Article
We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework also encompasses methods for cleaning databases containing corrupted data. Both on-line and off-line algorithms are proposed and experimentally checked on databases of handwrit...
Conference Paper
Full-text available
Two view-based object recognition algorithms are compared: (1) a heuristic algorithm based on oriented filters, and (2) a support vector learning machine trained on low-resolution images of the objects. Classification performance is assessed using a large number of images generated by a computer graphics system under precisely controlled conditions....
Conference Paper
Full-text available
A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality...
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high gener...
Conference Paper
Full-text available
This paper compares the performance of classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also rejection, training time, recognition time, and memory requirements. From "Comparison of Learning Algorithms for Handwritten Digit Recognition", International Conference on Artificial Neural Networks, EC2 & Cie, 1995; Y. LeCun, L. Bottou, C. J...
Conference Paper
We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly overlapp...
Conference Paper
In an optical character recognition problem, we compare (as a function of training set size) the performance of three neural network based ensemble methods (two versions of boosting and a committee of neural networks trained independently) to that of a single network. In boosting, the patterns actually used for training are a subset of all...
Conference Paper
We compare the performance of three types of neural network-based ensemble techniques to that of a single neural network. The ensemble algorithms are two versions of boosting and committees of neural networks trained independently. For each of the four algorithms, we experimentally determine the test and training error curves in an optical characte...
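A minimal sketch of such a comparison with scikit-learn stand-ins (the papers used neural networks; the trees and settings here are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
base = DecisionTreeClassifier(max_depth=8)
for name, clf in [
    ("single learner", base),
    ("boosting", AdaBoostClassifier(n_estimators=50)),
    ("committee (bagging)", BaggingClassifier(base, n_estimators=50)),
]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```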