Vladimir Vapnik’s research while affiliated with Columbia University and other places


Publications (119)


Constructive Setting for Problems of Density Ratio Estimation
  • Article

April 2015 · 32 Reads · 8 Citations · Statistical Analysis and Data Mining

Vladimir Vapnik · Rauf Izmailov

We introduce a constructive setting for the problem of density ratio estimation through the solution of a multidimensional integral equation. In this equation, not only is the right-hand side known only approximately, but the integral operator itself is also defined only approximately. We show that this ill-posed problem has a rigorous solution and obtain the solution in closed form. The key element of this solution is the novel V-matrix, which captures the geometry of the observed samples. We compare our method with previously proposed ones, using both synthetic and real data. Our experimental results demonstrate the strong potential of the new approach.
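For orientation, a standard way to write this setting (the notation here is assumed, not quoted from the paper): with densities p1(x) and p2(x), the ratio R(x) = p1(x)/p2(x) satisfies a Fredholm-type integral equation

∫ θ(x − x') R(x') dF2(x') = F1(x),   where θ(u) = ∏_k 1{u_k ≥ 0},

and F1, F2 are the cumulative distribution functions of the two samples. Replacing F1 and F2 with their empirical estimates makes both the right-hand side and the integral operator only approximately known, which is exactly the ill-posed situation the abstract describes.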


Support Vector Regression Machines
  • Data
  • File available

September 2014 · 1,104 Reads

Chris J. C. Burges · [...] · Vladimir Vapnik

Multidimensional splines with infinite number of knots as SVM kernels

August 2013 · 95 Reads · 19 Citations

Radial basis function (RBF) kernels for SVM have been routinely used in a wide range of classification problems, delivering consistently good performance where the kernel computations are numerically feasible (high-dimensional problems typically use linear kernels). One drawback of RBF kernels is the need to select a proper value of the hyperparameter γ in addition to the standard SVM penalty parameter C; this selection process can lead to overfitting. Another (more obscure) drawback of RBF is its inherent non-optimality as an approximating function. To address these issues, we propose to extend the concept of polynomial splines (designed explicitly for approximation purposes) to multidimensional normalized splines with an infinite number of knots, and to use the resulting hyperparameter-free kernel SVMs instead of RBF kernel SVMs. We tested our approach on a number of standard classification datasets used in the literature. The results suggest that the new kernels mostly deliver better classification performance than the RBF kernel (for problems of moderately large dimension), while allowing faster computation (when measured on large cross-validation grids) and reducing the chance of overfitting.
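A minimal sketch of how such a hyperparameter-free kernel could be plugged into an SVM, assuming the classical one-dimensional spline kernel with an infinite number of knots from Vapnik (1998), inputs scaled to [0, 1], and a product-plus-normalization construction for the multidimensional case; the paper's exact construction may differ, and the function names here are illustrative:

import numpy as np
from sklearn.svm import SVC

def spline_1d(x, y):
    # Classical infinite-knot spline kernel for scalars x, y in [0, 1].
    m = np.minimum(x, y)
    return 1 + x * y + x * y * m - (x + y) / 2 * m**2 + m**3 / 3

def spline_kernel(X, Y):
    # Product of coordinate-wise 1-D spline kernels.
    K = np.ones((X.shape[0], Y.shape[0]))
    for d in range(X.shape[1]):
        K *= spline_1d(X[:, d][:, None], Y[:, d][None, :])
    return K

def normalized_spline_kernel(X, Y):
    # Cosine normalization: K(x, y) / sqrt(K(x, x) * K(y, y)).
    K = spline_kernel(X, Y)
    dx = np.sqrt(np.diag(spline_kernel(X, X)))
    dy = np.sqrt(np.diag(spline_kernel(Y, Y)))
    return K / np.outer(dx, dy)

# Only the penalty C remains to tune: the cross-validation grid is
# one-dimensional, versus the two-dimensional (C, gamma) grid for RBF.
clf = SVC(kernel=normalized_spline_kernel, C=1.0)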


Constructive Setting of the Density Ratio Estimation Problem and its Rigorous Solution

June 2013 · 27 Reads · 12 Citations

We introduce a general constructive setting of the density ratio estimation problem as the solution of a (multidimensional) integral equation. In this equation, not only is the right-hand side known only approximately, but the integral operator is also defined only approximately. We show that this ill-posed problem has a rigorous solution and obtain the solution in closed form. The key element of this solution is the novel V-matrix, which captures the geometry of the observed samples. We compare our method with three well-known previously proposed ones. Our experimental results demonstrate the strong potential of the new approach.
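For reference, the V-matrix in the authors' related work is commonly defined (up to the choice of the measure μ; this form is assumed here rather than quoted from the paper) as

V_ij = ∫ θ(x − X_i) θ(x − X_j) dμ(x),   θ(u) = ∏_k 1{u_k ≥ 0},

i.e., V_ij is the μ-measure of the set of points lying coordinate-wise above both samples X_i and X_j, which is the sense in which the matrix captures the mutual geometry of the observed sample.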


Learning by Transduction

January 2013 · 154 Reads · 224 Citations

We describe a method for predicting the classification of an object given the classifications of the objects in the training set, assuming that the object/classification pairs are generated i.i.d. from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives not only the prediction itself but also a practicable measure of the evidence found in support of that prediction. We also describe a procedure for assigning degrees of confidence to predictions made by the support vector machine. Some experimental results are presented, and possible extensions of the algorithms are discussed.
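A hedged sketch of the transductive confidence idea, as an illustration of the general mechanism rather than the paper's exact procedure (the function name and the choice of nonconformity score are assumptions): each tentative label for the test object is tried in turn, and the SVM dual coefficient of the test object yields a p-value for that label.

import numpy as np
from sklearn.svm import SVC

def label_p_values(X_train, y_train, x_test, labels=(0, 1)):
    p = {}
    for y in labels:
        # Add the test object with the tentative label y and retrain.
        X = np.vstack([X_train, x_test])
        ys = np.append(y_train, y)
        clf = SVC(kernel='linear').fit(X, ys)
        # Nonconformity: absolute dual coefficient (0 for non-support vectors).
        alpha = np.zeros(len(ys))
        alpha[clf.support_] = np.abs(clf.dual_coef_[0])
        # p-value: fraction of examples at least as "strange" as the test one.
        p[y] = np.mean(alpha >= alpha[-1])
    return p

# Predict the label with the largest p-value; a small second-largest
# p-value corresponds to high confidence in that prediction.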



SMO-Style Algorithms for Learning Using Privileged Information

January 2010 · 830 Reads · 55 Citations

Recently Vapnik et al. [11, 12, 13] introduced a new learning model called Learning Using Privileged Information (LUPI). In this model, along with standard training data, the teacher supplies the student with additional (privileged) information. In the optimistic case, the LUPI model can improve the bound on the probability of test error from O(1/√n) to O(1/n), where n is the number of training examples. Since the semi-supervised learning model with n labeled and N unlabeled examples can only achieve the bound O(1/√(n+N)) in the optimistic case, the LUPI model can significantly outperform it. To implement the LUPI model, Vapnik et al. [11, 12, 13] suggested an SVM-type algorithm called SVM+, which, however, requires solving a more difficult optimization problem than the one traditionally solved for SVM. In this paper we develop two new algorithms for solving the optimization problem of SVM+. Our algorithms have a structure similar to the empirically successful SMO algorithm for SVM. Our experiments show that, in terms of the generalization error/running time tradeoff, one of our algorithms is superior to the widely used interior point optimizer.
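For context, the SVM+ optimization problem (reproduced here from Vapnik and Vashist's formulation as best recalled, so treat it as a sketch) models the slacks through a correcting function of the privileged features x*_i:

min_{w, b, w*, b*}  (1/2)‖w‖² + (γ/2)‖w*‖² + C Σ_i [⟨w*, x*_i⟩ + b*]

subject to  y_i(⟨w, x_i⟩ + b) ≥ 1 − (⟨w*, x*_i⟩ + b*)  and  ⟨w*, x*_i⟩ + b* ≥ 0.

Each slack ξ_i = ⟨w*, x*_i⟩ + b* is thus no longer a free variable; this couples the constraints and makes the dual problem harder than the standard SVM dual, which is the difficulty the SMO-style algorithms above are designed to handle.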


On the Theory of Learning with Privileged Information (Full version)

January 2010 · 233 Reads · 7 Citations

In the Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with privileged information in the correcting space. The goal of the learner is to find a classifier with a low generalization error in the decision space. We consider an empirical risk minimization algorithm, called Privileged ERM, that takes the privileged information into account in order to find a good function in the decision space. We outline conditions on the correcting space that, if satisfied, allow Privileged ERM to achieve a much faster learning rate in the decision space than that of regular empirical risk minimization.


On the Theory of Learning with Privileged Information

January 2010 · 119 Reads · 82 Citations

In the Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with privileged information in the correcting space. The goal of the learner is to find a classifier with a low generalization error in the decision space. We consider an empirical risk minimization algorithm, called Privileged ERM, that takes the privileged information into account in order to find a good function in the decision space. We outline conditions on the correcting space that, if satisfied, allow Privileged ERM to achieve a much faster learning rate in the decision space than that of regular empirical risk minimization.


A new learning paradigm: Learning using privileged information

July 2009 · 944 Reads · 745 Citations · Neural Networks

In the Afterword to the second edition of the book "Estimation of Dependences Based on Empirical Data" by V. Vapnik, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced. This Afterword also suggested an extension of the SVM method (the so-called SVM(gamma)+ method) to implement algorithms that address the LUHI paradigm (Vapnik, 1982-2006, Sections 2.4.2 and 2.5.3 of the Afterword); see also Vapnik, Vashist, and Pavlovitch (2008, 2009) for further development of the algorithms. In contrast to the existing machine learning paradigm, where a teacher does not play an important role, the advanced learning paradigm incorporates elements of human teaching: along with examples, a teacher can provide students with hidden information in the form of explanations, comments, comparisons, and so on. This paper discusses details of the new paradigm and the corresponding algorithms, introduces some new algorithms, considers several specific forms of privileged information, demonstrates the superiority of the new learning paradigm over the classical one on practical problems, and discusses general questions related to the new ideas.
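The defining asymmetry of the paradigm is that privileged information exists only at training time. A minimal sketch of one simple way to exploit it (a knowledge-transfer baseline of the kind discussed in the follow-up literature, not the SVM(gamma)+ algorithm itself; the function names are illustrative):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVC

def fit_lupi_baseline(X, X_star, y):
    # Privileged features X_star are available here, at training time only.
    transfer = Ridge().fit(X, X_star)          # learn x -> privileged space
    X_aug = np.hstack([X, transfer.predict(X)])
    clf = SVC().fit(X_aug, y)
    return transfer, clf

def predict_lupi_baseline(transfer, clf, X):
    # No privileged features at test time; substitute their predictions.
    return clf.predict(np.hstack([X, transfer.predict(X)]))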


Citations (90)


... 3) Support Vector Machine Algorithm: Classification and regression problems are resolved using one of the most well-known supervised learning algorithms, the Support Vector Machine (SVM) [23], which is used extensively in ML classification problems. ...

Reference:

A Novel Hybrid Deep Learning Model for Detecting Breast Cancer
Reinforced SVM Method and Memorization Mechanisms
  • Citing Article
  • May 2021

Pattern Recognition

... Vapnik et al. propose a machine learning paradigm that reframes a machine learning problem as the problem of estimating the conditional probability function, rather than searching for the function that minimizes a given loss functional. This paradigm accounts for the relationships among elements of the data set and is therefore associated with data quality (Vapnik and Izmailov [2019]). ...

Rethinking statistical learning theory: learning using statistical invariants

Machine Learning

... SVMs are a family of machine learning algorithms developed by Vladimir Vapnik [10], used to solve classification, regression and anomaly detection problems. They aim to separate data into classes using a boundary or hyperplane, while maximizing the distance between the different groups of data and the separating boundary. ...

Knowledge transfer in SVM and neural networks

Annals of Mathematics and Artificial Intelligence

... Obviously, the commonly used kernels such as the Gaussian kernel and the polynomial kernel, designed for vector-form data, cannot be directly adopted for matrix-type data. To this end, B. Schölkopf et al. [23] suggested constructing a locality-improved kernel via the general polynomial kernel and local correlations, which results in an incomplete polynomial kernel; see (5.1) for its specific formula. In the same spirit, neighborhood kernels were proposed by V. L. Brailovsky et al. [5], and a histogram intersection kernel for image classification was introduced by A. Barla et al. [2]. ...

Prior knowledge in support vector kernels
  • Citing Article
  • January 1997

... Other approaches to time series prediction include support vector regression. Müller et al. [68] used support vector regression (SVR) for time series forecasting on benchmark problems. Lau et al. [69] implemented SVR for Sunspot time series forecasting with better results than the radial basis function network in relatively long-term prediction. ...

Predicting Time Series with Support Vector Machines
  • Citing Article
  • January 1999

... In this manner, a search for linear relations in the feature space is conducted, which can then determine efficient solutions to nonlinear problems. SVM has been widely applied in the field of pattern recognition, in problems such as text recognition [16], handwritten numeral recognition [17], face detection [18], system control [19], and many other related applications. The accuracy of SVM classification is highly affected by the kernel function and its parameters, since the relationship between the parameters and model classification accuracy in a multimodal function is irregular. ...

Discovering informative patterns and data cleaning
  • Citing Article
  • January 1996

... Related Work: In recent work, Liu et al. [13] proposed a direct change estimator for graphical models based on the ratio of the probability densities of the two models [9,10,25,26,31]. They focused on the special case of the L1 norm, i.e., δθ* ∈ R^(p²) is sparse, and provided non-asymptotic error bounds for the estimator along with a sample complexity of n1 = O(s² log p) and n2 = O(n1²) for an unbounded density ratio model, where s is the number of changed edges and p is the number of variables. ...

Statistical Inference Problems and Their Rigorous Solutions
  • Citing Conference Paper
  • April 2015

Lecture Notes in Computer Science

... This means that we could first train a model that predicts the missing (privileged) features x*_k from the original x_k and then replace x*_k with their predictions in the decision model trained on both x_k and x*_k. However, Vapnik and Izmailov [10] developed another approach, which bases the LUPI paradigm on a mechanism of knowledge transfer from the space of the teacher's explanations to the space of the student's decisions. The authors illustrated it for the well-known SVM classifier (Boser et al. [11], Cortes and Vapnik [12]). ...

Learning with Intelligent Teacher: Similarity Control and Knowledge Transfer
  • Citing Conference Paper
  • April 2015

Lecture Notes in Computer Science

... This research utilized the scikit-learn library, calling "from sklearn.svm import SVC" to assign each kernel to our model, as seen in Table 1 [25]. Afterward, we pass the kernel as a parameter to the SVC function with its default settings. ...
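For illustration, the usage pattern described in that snippet amounts to the following minimal sketch with scikit-learn's actual SVC API:

from sklearn.svm import SVC

# Each built-in kernel is selected by name via the `kernel` parameter,
# leaving all other settings at their defaults.
models = {name: SVC(kernel=name) for name in ('linear', 'poly', 'rbf', 'sigmoid')}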

Support-vector networks
  • Citing Article
  • January 2009

Chemical Biology & Drug Design

... In general, fitting a nonlinear parametric model to a time series is a complex task since there is a wide possible set of nonlinear patterns. However, technological advancements have allowed researchers to consider more flexible modeling techniques, such as support vector machines (SVMs) adapted to regression [13], artificial neural networks (ANNs), and wavelet methods [14]. ...

Support vector regression machines

Advances in Neural Information Processing Systems