# Vladimir Vapnik's research while affiliated with Columbia University and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (119)

The paper is devoted to two problems: (1) reinforcement of SVM algorithms, and (2) justification of memorization mechanisms for generalization.
(1) The current SVM algorithm was designed for the case when the risk for the set of nonnegative slack variables is defined by the l1 norm. In this paper, along with that classical l1 norm, we consider risks define...

This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In a classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to constru...

The paper considers general machine learning models, where knowledge transfer is positioned as the main method to improve their convergence properties. Previous research was focused on mechanisms of knowledge transfer in the context of the SVM framework; the paper shows that this mechanism is applicable to the neural network framework as well. The paper de...

This article describes a method for constructing a special rule (we call it synergy rule) that uses as its input information the outputs (scores) of several monotonic rules which solve the same pattern recognition problem. As an example of scores of such monotonic rules we consider here scores of SVM classifiers. In order to construct the optimal s...

The paper considers several topics on learning with privileged information: (1) general machine learning models, where privileged information is positioned as the main mechanism to improve their convergence properties, (2) existing and novel approaches to leverage that privileged information, (3) algorithmic realization of one of these (namely, kno...

Distillation (Hinton et al., 2015) and privileged information (Vapnik &
Izmailov, 2015) are two techniques that enable machines to learn from other
machines. This paper unifies these two techniques into generalized
distillation, a framework to learn from multiple machines and data
representations. We provide theoretical and causal insight about the...

This paper presents direct settings and rigorous solutions of the main Statistical Inference problems. It shows that rigorous solutions require solving multidimensional Fredholm integral equations of the first kind in the situation where not only the right-hand side of the equation is an approximation, but the operator in the equation is also defin...

This paper describes a new paradigm of machine learning, in which an Intelligent Teacher is involved. During the training stage, the Intelligent Teacher provides the Student with information that contains, along with the classification of each example, additional privileged information (for example, an explanation) of this example. The paper describes two mechanisms tha...

This paper introduces an advanced setting of the machine learning problem in which an Intelligent Teacher is involved. During the training stage, the Intelligent Teacher provides the Student with information that contains, along with the classification of each example, additional privileged information (an explanation) of this example. The paper describes two mechanisms...

This paper presents direct settings and rigorous solutions of Statistical Inference problems. It shows that rigorous solutions require solving ill-posed Fredholm integral equations of the first kind in the situation where not only the right-hand side of the equation is an approximation, but the operator in the equation is also defined approximately...

We introduce a constructive setting for the problem of density ratio estimation through the solution of a multidimensional integral equation. In this equation, not only its right hand side is approximately known, but also the integral operator is approximately defined. We show that this ill-posed problem has a rigorous solution and obtain the solut...

Radial basis function (RBF) kernels for SVM have been routinely used in a wide range of classification problems, delivering consistently good performance for those problems where the kernel computations are numerically feasible (high-dimensional problems typically use linear kernels). One of the drawbacks of RBF kernels is the necessity of selectin...
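As a minimal illustration of the kernel discussed in the abstract above, the sketch below evaluates a Gaussian RBF kernel in plain Python. The function name `rbf_kernel` and the width parameter `gamma` are assumptions made for this example; the truncated abstract does not name the parameter that must be selected, and the sketch is not the authors' method.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2).

    `gamma` (hypothetical name) controls how quickly similarity decays
    with squared Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    # Squared Euclidean distance between the two points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 0.0]
    # The same pair of points looks similar or dissimilar depending on gamma.
    for gamma in (0.1, 1.0, 10.0):
        print(gamma, rbf_kernel(a, b, gamma))
```

A larger `gamma` makes the kernel value fall off faster with distance, so the same two points can look nearly identical or nearly unrelated depending on its value.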

We introduce a general constructive setting of the density ratio estimation
problem as a solution of a (multidimensional) integral equation. In this
equation, not only its right hand side is known approximately, but also the
integral operator is defined approximately. We show that this ill-posed problem
has a rigorous solution and obtain the soluti...

We describe a method for predicting a classification of an object given
classifications of the objects in the training set, assuming that the pairs
object/classification are generated by an i.i.d. process from a continuous
probability distribution. Our method is a modification of Vapnik's
support-vector machine; its main novelty is that it gives no...

Recently Vapnik et al. [11, 12, 13] introduced a new learning model, called Learning Using Privileged Information (LUPI). In this model, along with standard training data, the teacher supplies the student with additional (privileged) information. In the optimistic case, the LUPI model can improve the bound for the probability of test error from O(1...

In Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with the privileged information in the correcting space. The goal of the learner is to find a classifier with a low generalization error in the decision space. We consider an empirical risk minimizati...

In Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with the privileged information in the correcting space. The goal of the learner is to find a classifier with a low generalization error in the decision space. We consider an empirical risk minimization...

In the Afterword to the second edition of the book "Estimation of Dependences Based on Empirical Data" by V. Vapnik, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced. This Afterword also suggested an extension of the SVM method (the so-called SVM(gamma)+ method) to implement algorithms which address the L...

We compare Karl Popper’s ideas concerning the falsifiability of a theory with similar notions from the part of statistical
learning theory known as VC-theory. Popper’s notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located
some divergences, we discuss how best to view Popper’s work from the p...

In this paper we consider a new paradigm of learning: learning using hidden information. The classical paradigm of supervised learning is to learn a decision rule from labeled data $(x_i, y_i)$, $x_i \in X$, $y_i \in \{-1, 1\}$, $i = 1, \ldots, \ell$. In this paper we consider a new setting: given training vectors in space X along with labels and descripti...

The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high gener...

We focus on distribution-free transductive learning. In this setting the learning algorithm is given a ‘full sample’ of unlabeled points. Then, a training sample is selected uniformly at random from the full sample and the labels of the training points are revealed. The goal is to predict the labels of the remaining unlabeled points as accurately a...

We consider a large volume principle for transductive learning that prioritizes the transductive equivalence classes according
to the volume they occupy in hypothesis space. We approximate volume maximization using a geometric interpretation of the
hypothesis space. The resulting algorithm is defined via a non-convex optimization problem that can s...

We study classification tasks where one is given a set of labeled examples, and a set of "non-examples" of meaningful concepts in the same domain that do not belong to either class (referred to as the universum). We describe an algorithmic approach to leverage universum points and show experimentally that inference based on the labeled data and the...

Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function and discuss how to choose the regularization parameters in these models. Two applications are consi...
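The two cost functions named in the abstract above can be sketched in a few lines of Python. The forms below are the standard textbook definitions, and the parameter names `eps` and `delta` are assumptions for this example; the paper's exact parameterization is not given in the truncated abstract.

```python
def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Zero inside the tube |residual| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def huber_loss(residual: float, delta: float) -> float:
    """Quadratic for |residual| <= delta, linear beyond it.

    The two branches agree at |residual| == delta, so the loss is continuous.
    """
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


if __name__ == "__main__":
    # Compare both losses on a small and a large residual.
    for residual in (0.05, 3.0):
        print(residual,
              eps_insensitive_loss(residual, eps=0.1),
              huber_loss(residual, delta=1.0))
```

Both losses grow only linearly for large residuals, which makes them less sensitive to outliers than a squared loss; the ε-insensitive loss additionally ignores residuals smaller than `eps`.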

We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is imp...

We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are c...

Contents: 0 A Roadmap; 0.1 How to read this Thesis; 0.2 A Short Review of Approximation and Regression Estimation; 0.3 The Reason for Support Vectors; 1 Introduction; 1.1 The Regression Problem; 1.2 A Special...

We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions.

A new regression technique based on the concept of support vectors is introduced. We compare support vector regression with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because...

The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights and threshold such as to minimize an upper bound on...

We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points...

DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinc...

We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embed...

The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search become intractabl...

We present a novel method for clustering using the support vector machine approach. Data points are mapped to a high dimensional feature space, where support vectors are used to define a sphere enclosing them. The boundary of the sphere forms in data space a set of closed contours containing the data. Data points enclosed by each contour are define...

Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotical analysis. We present a new penalization method for performing mo...

This paper is an extended version of [12].

We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions. Introduction: When we are trying to extract regularities from data...

This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclass...

We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly o...

We introduce an algorithm for estimating the values of a function at a set of test points $x_{\ell+1}, \ldots, x_{\ell+m}$ given a set of training points $(x_1, y_1), \ldots, (x_\ell, y_\ell)$ without estimating (as an intermediate step) the regression function. We demonstrate that this direct (transductive) way for estimating values of the regression (or classic...

New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of paramete...

Traditional classification approaches generalize poorly on image classification tasks, because of the high dimensionality of the feature space. This paper shows that Support Vector Machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms. Heavy-tailed RBF kernels of the fo...

This paper compares the performance of classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclassificati...

This paper compares the performance of classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also rejection, training time, recognition time, and memory requirements. "Comparison of Learning Algorithms for Handwritten Digit Recognition", International Conference on Neural F. and P. Cie Publishers, 1995 Y. Le L. Bottou...

The problem of estimating density, conditional probability, and
conditional density is considered as an ill-posed problem of solving
integral equations. To solve these equations the support vector method
(SVM) is used.
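For the density case, the integral equation can be sketched as follows. The notation ($p$ for the density, $F$ for the distribution function, $F_\ell$ for its empirical estimate from a sample $x_1, \ldots, x_\ell$, and $\theta$ for the unit step function) is assumed here for illustration and is not taken from the abstract.

$$
\int_{-\infty}^{x} p(t)\,dt = F(x), \qquad F_\ell(x) = \frac{1}{\ell} \sum_{i=1}^{\ell} \theta(x - x_i)
$$

Since $F$ is unknown in practice, the right-hand side is replaced by $F_\ell$. Small errors in $F_\ell$ can produce large changes in the recovered $p$, which is the sense in which the problem is ill-posed.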

Developed only recently, support vector learning machines achieve high generalization ability by minimizing a bound on the expected test error; however, so far there existed no way of adding knowledge about invariances of a classification problem at hand. We present a method of incorporating prior knowledge about transformation invariances by app...

The last years have witnessed an increasing interest in Support Vector (SV) machines, which use Mercer kernels for efficiently performing computations in high-dimensional spaces. In pattern recognition, the SV algorithm constructs nonlinear decision functions by training a classifier to perform a linear separation in some high-dimensional space which...

We introduce a method of feature selection for Support Vector Machines.

We introduce a method of feature selection for Support Vector Machines. The method is based upon finding those features which minimize bounds on the leave-one-out error. This search can be efficiently performed via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and...

We present a novel kernel method for data clustering using a
description of the data by support vectors. The kernel reflects a
projection of the data points from data space to a high dimensional
feature space. Cluster boundaries are defined as spheres in feature
space, which represent complex geometric shapes in data space. We
utilize this geometri...

This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also rejection, training time, recognition time, and memory requirements. Comparison of Learning Algorithms for Handwritten Digit Recognition. Y. LeCun, L. Jackel, L. Bottou, A. Brunot, C. Cortes,...

... (1.1), with the step function $\theta(x) = 1$ for $x > 0$ and $0$ otherwise, where instead of knowing the distribution function $F(x)$ we are given the i.i.d. (independently and identically distributed) data $x_1, \ldots, x_\ell$ (1.2) generated by $F$. The problem of density estimation is known to b...

A Support Vector Machine (SVM) algorithm for multivariate density estimation is developed based on regularization principles and bounds on the convergence of empirical distribution functions. The algorithm is compared to Gaussian Mixture Models (GMMs). Our algorithm outperforms GMMs for data drawn from mixtures of Gaussians in $\mathbb{R}^2$ and $\mathbb{R}^6$. Our a...

Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing a structure on multi-dimensional kernels which are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VC-dimension). SVAD uses ideas from ANOVA decomposition methods and...

In this report we describe how the Support Vector (SV) technique of solving linear operator equations can be applied to the problem of density estimation [4]. We present a new optimization procedure and set of kernels closely related to current SV techniques that guarantee the monotonicity of the approximation. This technique estimates densities with...

Contributors: Peter Bartlett, Kristin P. Bennett, Christopher J.C. Burges, Nello Cristianini, Alex Gammerman, Federico Girosi, Simon Haykin, Thorsten Joachims, Linda Kaufman, Jens Kohlmorgen, Ulrich Kreßel, Davide Mattera, Klaus-Robert Müller, Manfred Opper, Edgar E. Osuna, John C. Platt, Gunnar Rätsch, Bernhard Schölkopf, John Shawe-Taylor, Alexande...

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality...

We address the problem of determining what size test set
guarantees statistically significant results in a character recognition
task, as a function of the expected error rate. We provide a statistical
analysis showing that if, for example, the expected character error rate
is around 1 percent, then, with a test set of at least 10,000
statistically...

The support vector (SV) machine is a novel type of learning
machine, based on statistical learning theory, which contains polynomial
classifiers, neural networks, and radial basis function (RBF) networks
as special cases. In the RBF case, the SV algorithm automatically
determines centers, weights, and threshold that minimize an upper bound
on the e...

Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function and discuss how to choose the regularization parameters in these models. Two applications are c...

Large VC-dimension classifiers can learn difficult tasks, but are usually impractical because they generalize well only if they are trained with huge quantities of data. In this paper we show that even very high-order polynomial classifiers can be trained with a small amount of training data and yet generalize better than classifiers with a smaller...

We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework also encompasses methods for cleaning databases containing corrupted data. Both on-line and off-line algorithms are proposed and experimentally checked on databases of handwrit...

Two view-based object recognition algorithms are compared: (1) a heuristic algorithm based on oriented filters, and (2) a support vector learning machine trained on low-resolution images of the objects. Classification performance is assessed using a high number of images generated by a computer graphics system under precisely controlled conditions....

This paper compares the performance of classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also rejection, training time, recognition time, and memory requirements. "Comparison of Learning Algorithms for Handwritten Digit Recognition", International Conference on Neural F. and P. Cie Publishers, 1995 Y. Le L. Bottou, C. J...

We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly overlapp...

In an optical character recognition problem, we compare (as a function of training set size) the performance of three neural network based ensemble methods (two versions of boosting and a committee of neural networks trained independently) to that of a single network. In boosting, the number of patterns actually used for training is a subset of all...

We compare the performance of three types of neural network-based ensemble techniques to that of a single neural network. The ensemble algorithms are two versions of boosting and committees of neural networks trained independently. For each of the four algorithms, we experimentally determine the test and training error curves in an optical characte...