Vladimir Vapnik’s research while affiliated with Columbia University and other places

Publications (119)


A new learning paradigm: Learning using privileged information
  • Article

July 2009 · 944 Reads · 738 Citations · Neural Networks

Vladimir Vapnik

In the Afterword to the second edition of the book "Estimation of Dependences Based on Empirical Data" by V. Vapnik, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced. This Afterword also suggested an extension of the SVM method (the so-called SVMγ+ method) to implement algorithms which address the LUHI paradigm (Vapnik, 1982-2006, Sections 2.4.2 and 2.5.3 of the Afterword). See also (Vapnik, Vashist, & Pavlovitch, 2008, 2009) for further development of the algorithms. In contrast to the existing machine learning paradigm, where a teacher does not play an important role, the advanced learning paradigm considers some elements of human teaching. In the new paradigm, along with examples, a teacher can provide students with hidden information that exists in explanations, comments, comparisons, and so on. This paper discusses details of the new paradigm and corresponding algorithms, introduces some new algorithms, considers several specific forms of privileged information, demonstrates the superiority of the new learning paradigm over the classical learning paradigm when solving practical problems, and discusses general questions related to the new ideas.
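
As a rough, hypothetical sketch of the LUPI setting (this is the simple knowledge-transfer variant, not the SVMγ+ algorithm the paper develops): privileged features available only at training time are regressed from the ordinary features, and the predictions stand in for them at test time. All data and parameter choices below are made up:

```python
# Hypothetical sketch of the LUPI setting (not Vapnik's SVMgamma+ algorithm):
# privileged features x_star exist only at training time, so a regressor
# learns to predict them from x; its predictions stand in at test time.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
x_star = rng.normal(size=(n, 1))                  # privileged (teacher) feature
y = (x_star[:, 0] > 0).astype(int)                # labels driven by x_star
x = np.hstack([x_star + rng.normal(scale=2.0, size=(n, 1)),
               rng.normal(size=(n, 4))])          # noisy student view + noise dims

x_tr, x_te, y_tr, y_te = x[:200], x[200:], y[:200], y[200:]
xs_tr = x_star[:200]

# Baseline: learn from x alone.
baseline = SVC(kernel="rbf").fit(x_tr, y_tr)

# Knowledge transfer: regress x_star from x on the training set, then append
# the predicted x_star as an extra feature at both train and test time.
transfer = Ridge().fit(x_tr, xs_tr)
augment = lambda x_: np.hstack([x_, transfer.predict(x_)])
lupi = SVC(kernel="rbf").fit(augment(x_tr), y_tr)

print("baseline accuracy:", baseline.score(x_te, y_te))
print("with transferred info:", lupi.score(augment(x_te), y_te))
```

Whether the transferred feature helps depends on the data; the two scores are printed only to make the comparison concrete.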


Learning using hidden information (Learning with teacher)

June 2009 · 53 Reads · 44 Citations

In this paper we consider a new paradigm of learning: learning using hidden information. The classical paradigm of supervised learning is to learn a decision rule from labeled data $(x_i, y_i)$, $x_i \in X$, $y_i \in \{-1, 1\}$, $i = 1, \ldots, \ell$. In this paper we consider a new setting: given training vectors in space $X$ along with labels and a description of this data in another space $X^*$, find in space $X$ a decision rule better than the one found in the classical paradigm.



Support-vector networks

January 2009 · 3,260 Reads · 24,935 Citations · Chemical Biology & Drug Design

The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
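
A minimal sketch of the construction this abstract describes, using scikit-learn's SVC with a polynomial kernel as the non-linear input transformation; the toy dataset and parameter values are illustrative assumptions, not the paper's OCR setup:

```python
# Minimal sketch of a support-vector network with a polynomial input
# transformation, via scikit-learn's SVC (dataset and parameters illustrative).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft-margin formulation: C trades margin width against training errors,
# covering the non-separable case the paper extends the method to.
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("support vectors per class:", clf.n_support_)
```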


Large Margin vs. Large Volume in Transductive Learning

September 2008 · 8 Reads · 1 Citation · Lecture Notes in Computer Science

We focus on distribution-free transductive learning. In this setting the learning algorithm is given a ‘full sample’ of unlabeled points. Then, a training sample is selected uniformly at random from the full sample and the labels of the training points are revealed. The goal is to predict the labels of the remaining unlabeled points as accurately as possible. The full sample partitions the transductive hypothesis space into a finite number of equivalence classes. All hypotheses in the same equivalence class generate the same dichotomy of the full sample. We consider a large volume principle, whereby the priority of each equivalence class is proportional to its “volume” in the hypothesis space.
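
The equivalence-class structure is easy to see numerically. The sketch below is purely illustrative: it samples random linear hypotheses on a small full sample, groups them by the dichotomy they generate, and estimates each class's volume as the fraction of hypothesis space producing it, which is the quantity the large volume principle takes as a prior:

```python
# Illustrative Monte Carlo view of transductive equivalence classes: random
# linear hypotheses grouped by the dichotomy (sign pattern) they generate
# on a small full sample; the class volume is the fraction of hypotheses
# producing that pattern.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
full_sample = rng.normal(size=(6, 2))        # the unlabeled 'full sample'

W = rng.normal(size=(100_000, 2))            # random homogeneous linear hypotheses
signs = np.sign(W @ full_sample.T).astype(int)

counts = Counter(map(tuple, signs))
for dichotomy, c in counts.most_common(5):
    print(dichotomy, "estimated volume:", c / len(W))
```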


[Tables and figures from the full text: Table 1, results for three dataset collections; Table 2, UCI datasets taken from [2]; Fig. 1, large-margin vs. large-volume prior; Fig. 2, visualization of the hypothesis space with equivalence classes of (a) different and (b) equal volumes; Fig. 3, structure of the function f(ρ), where k is the index of the smallest eigenvalue λ_i such that d_i = 0.]
Large margin vs. large volume in transductive learning
  • Article
  • Full-text available

September 2008 · 516 Reads · 20 Citations · Machine Learning

We consider a large volume principle for transductive learning that prioritizes the transductive equivalence classes according to the volume they occupy in hypothesis space. We approximate volume maximization using a geometric interpretation of the hypothesis space. The resulting algorithm is defined via a non-convex optimization problem that can still be solved exactly and efficiently. We provide a bound on the test error of the algorithm and compare it to transductive SVM (TSVM) using 31 datasets.


[From the full text: Table 1, performance of SVM compared to U-SVM for different amounts of training data and different types of Universum data on MNIST 5 vs. 8; Figure 2, part of a typical scanned sheet used to compile the ABCDETC dataset, where the fourth row tells the subject what to write in the five rows below it.]
Inference with the Universum

June 2006 · 446 Reads · 199 Citations

Jason Weston · Ronan Collobert · [...] · Vladimir Vapnik

We study classification tasks where one is given a set of labeled examples, and a set of "non-examples" of meaningful concepts in the same domain that do not belong to either class (referred to as the universum). We describe an algorithmic approach to leverage universum points and show experimentally that inference based on the labeled data and the universum can improve over using the labeled data alone, at least in the small sample case. Finally, we list some conjectures describing how and why the universum helps, and experimentally attempt to test each hypothesis.
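
A minimal sketch of this kind of algorithm (not the authors' exact implementation): a linear classifier trained by subgradient descent with the usual hinge loss on labeled points plus an ε-insensitive penalty that keeps universum points near the decision boundary. Data and hyperparameters are invented:

```python
# Sketch of a Universum-style objective (not the paper's exact algorithm):
# hinge loss on labeled data plus an eps-insensitive penalty that pushes
# universum outputs toward zero, trained by subgradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
U = rng.normal(0, 1, (100, 2))            # "non-examples": neither class

w, b = np.zeros(2), 0.0
lam, C_u, eps, lr = 0.01, 0.1, 0.1, 0.01
for _ in range(500):
    f_x, f_u = X @ w + b, U @ w + b
    # Hinge-loss subgradient: active where y * f(x) < 1.
    m = y * f_x < 1
    g_w = lam * w - (y[m][:, None] * X[m]).sum(0) / len(X)
    g_b = -y[m].sum() / len(X)
    # eps-insensitive universum term: active where |f(u)| > eps; its
    # subgradient pushes f(u) back toward the decision boundary.
    a = np.abs(f_u) > eps
    s = np.sign(f_u[a])
    g_w += C_u * (s[:, None] * U[a]).sum(0) / len(U)
    g_b += C_u * s.sum() / len(U)
    w -= lr * g_w
    b -= lr * g_b

print("train accuracy:", ((X @ w + b) * y > 0).mean())
print("mean |f(u)| on universum:", np.abs(U @ w + b).mean())
```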


Predicting Time Series with Support Vector Machines

April 2006 · 355 Reads · 781 Citations

Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are considered: data from (a) a noisy (normal and uniform noise) Mackey-Glass equation and (b) the Santa Fe competition (set D). In both cases Support Vector Machines show excellent performance. In case (b) the Support Vector approach improves the best known result on the benchmark by 29%.
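
A sketch of the basic setup with scikit-learn's SVR and the ε-insensitive loss, applied to a lag-embedded noisy series; the series, embedding length, and parameters are stand-ins, not the paper's Mackey-Glass or Santa Fe experiments:

```python
# Illustrative time-series prediction with eps-insensitive SVR: embed a noisy
# series into lag vectors and regress the next value (parameters made up).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
t = np.arange(1200)
series = np.sin(0.1 * t) + 0.1 * rng.normal(size=t.size)  # stand-in series

lags = 6
X = np.array([series[i : i + lags] for i in range(len(series) - lags)])
y = series[lags:]
X_tr, y_tr, X_te, y_te = X[:800], y[:800], X[800:], y[800:]

# epsilon sets the width of the insensitive tube; C is the regularization
# trade-off between flatness and fitting the training targets.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("test RMSE:", np.sqrt(((pred - y_te) ** 2).mean()))
```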



Multivariate Density Estimation: An SVM Approach

October 2004 · 151 Reads · 5 Citations

We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities, Gaussians and Laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well as or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.
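
The inverse-operator idea can be shown in its simplest form in one dimension (the paper's SVM formulation is multivariate and more sophisticated): expand the density in Gaussian bumps and choose nonnegative weights so that the resulting distribution function matches the empirical one. Everything below is an illustrative assumption:

```python
# 1-D sketch of density estimation as an inverse problem (illustrative only,
# not the paper's SVM algorithm): expand the density in Gaussian bumps and
# fit their cumulative integrals to the empirical distribution function.
import numpy as np
from scipy.optimize import nnls
from scipy.special import erf

rng = np.random.default_rng(0)
data = np.sort(rng.laplace(size=300))
F_emp = np.arange(1, len(data) + 1) / len(data)   # empirical CDF at the data

centers = np.linspace(data.min(), data.max(), 25)
h = (data.max() - data.min()) / 25                # bump width (a guess)

# CDF of a unit-mass Gaussian bump centered at c, evaluated at the data.
A = 0.5 * (1 + erf((data[:, None] - centers) / (h * np.sqrt(2))))

alpha, _ = nnls(A, F_emp)                         # nonnegative weights
alpha /= alpha.sum()                              # normalize to a density

# Evaluate the estimated density at a few query points.
xs = np.linspace(-6, 6, 5)
dens = (alpha * np.exp(-((xs[:, None] - centers) ** 2) / (2 * h**2))
        / (h * np.sqrt(2 * np.pi))).sum(axis=1)
for q, d in zip(xs, dens):
    print(f"p({q:+.1f}) ~ {d:.3f}")
```

Here the nonnegativity constraint and the small basis act as the regularization; the paper's approach instead solves the inverse problem with SVM machinery.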


Citations (90)


... 3) Support Vector Machine Algorithm: Classification and regression problems are solved using one of the most well-known supervised learning algorithms, the Support Vector Machine (SVM) [23], though it is used most extensively for classification problems. ...

Reference:

A Novel Hybrid Deep Learning Model for Detecting Breast Cancer
Reinforced SVM Method and Memorization Mechanisms
  • Citing Article
  • May 2021

Pattern Recognition

... Hence, high-complexity architecture designs do not ensure sufficient explanatory and predictive capacity [86,87]. Accordingly, statistical learning theory holds that learning with low-complexity models should be preferred [88-91]. Therefore, this research aimed to find the simplest competent model that adequately predicts the geotechnical soil properties under consideration. ...

Rethinking statistical learning theory: learning using statistical invariants

Machine Learning

... SVMs are a family of machine learning algorithms developed by Vladimir Vapnik [10], used to solve classification, regression and anomaly detection problems. They aim to separate data into classes using a boundary or hyperplane, while maximizing the distance between the different groups of data and the separating boundary. ...

Knowledge transfer in SVM and neural networks

Annals of Mathematics and Artificial Intelligence

... Obviously, commonly used kernels such as the Gaussian kernel and the polynomial kernel, designed for vector-form data, cannot be directly applied to matrix-type data. To this end, B. Schölkopf et al. [23] suggested constructing a locality-improved kernel via the general polynomial kernel and local correlations, which results in an incomplete polynomial kernel; see (5.1) for its specific formula. In the same spirit, neighborhood kernels were proposed by V. L. Brailovsky et al. [5], and a histogram intersection kernel for image classification was introduced by A. Barla et al. [2]. ...

Prior knowledge in support vector kernels
  • Citing Article
  • January 1997

... Other approaches to time series prediction include support vector regression. Müller et al. [68] used support vector regression (SVR) for time series forecasting on benchmark problems. Lau et al. [69] implemented SVR for Sunspot time series forecasting with better results than the radial basis function network in relatively long-term prediction. ...

Predicting Time Series with Support Vector Machines
  • Citing Article
  • January 1999

... In this manner, a search for linear relations in the feature space is conducted, which can then determine efficient solutions to nonlinear problems. SVM has been widely applied in the field of pattern recognition, in problems such as text recognition [16], handwritten numeral recognition [17], face detection [18], and system control [19], among many other applications. The accuracy of SVM classification is highly affected by the kernel function and its parameters, since the relationship between the parameters and model classification accuracy in a multimodal function is irregular. ...

Discovering informative patterns and data cleaning
  • Citing Article
  • January 1996

... Related Work: In recent work, Liu et al. [13] proposed a direct change estimator for graphical models based on the ratio of the probability densities of the two models [9,10,25,26,31]. They focused on the special case of the $L_1$ norm, i.e., $\delta\theta^* \in R^{p^2}$ is sparse, and provided non-asymptotic error bounds for the estimator along with sample complexities of $n_1 = O(s^2 \log p)$ and $n_2 = O(n_1^2)$ for an unbounded density ratio model, where $s$ is the number of changed edges and $p$ is the number of variables. ...

Statistical Inference Problems and Their Rigorous Solutions
  • Citing Conference Paper
  • April 2015

Lecture Notes in Computer Science

... It means that we could first train a model that predicts the missing (privileged) features $x_k^*$ from the original $x_k$, and then replace $x_k^*$ with these predictions in a decision model trained on both $x_k$ and $x_k^*$. However, Vapnik and Izmailov [10] developed another approach, which bases the LUPI paradigm on a mechanism of knowledge transfer from the space of the teacher's explanations to the space of the student's decisions. The authors illustrated it for the well-known SVM classifier (Boser et al. [11], Cortes and Vapnik [12]). ...

Learning with Intelligent Teacher: Similarity Control and Knowledge Transfer
  • Citing Conference Paper
  • April 2015

Lecture Notes in Computer Science

... This research utilized the scikit-learn library, calling "from sklearn.svm import SVC" to assign each kernel to our model, as seen in Table 1 [25]. Afterward, we pass the kernel as a parameter to the SVC function with its default settings. ...

Support-vector networks
  • Citing Article
  • January 2009

Chemical Biology & Drug Design

... In general, fitting a nonlinear parametric model to a time series is a complex task since there is a wide possible set of nonlinear patterns. However, technological advancements have allowed researchers to consider more flexible modeling techniques, such as support vector machines (SVMs) adapted to regression [13], artificial neural networks (ANNs), and wavelet methods [14]. ...

Support vector regression machines

Advances in Neural Information Processing Systems