Multivariate Chebyshev Inequalities

The Annals of Mathematical Statistics 12/1960; DOI: 10.1214/aoms/1177705673
Source: OAI

ABSTRACT If $X$ is a random variable with $EX^2 = \sigma^2$, then by Chebyshev's inequality, \begin{equation*}\tag{1.1}P\{|X| \geqq \epsilon\} \leqq \sigma^2/\epsilon^2.\end{equation*} If in addition $EX = 0$, one obtains a corresponding one-sided inequality \begin{equation*}\tag{1.2}\quad P\{X \geqq \epsilon\} \leqq \sigma^2/ (\epsilon^2 + \sigma^2)\end{equation*} (see, e.g., [8] p. 198). In each case a distribution for $X$ is known that results in equality, so that the bounds are sharp. By a change of variable we can take $\epsilon = 1$. There are many possible multivariate extensions of (1.1) and (1.2). Those providing bounds for $P\{\max_{1 \leqq j \leqq k} |X_j| \geqq 1\}$ and $P\{|\max_{1 \leqq j \leqq k} X_j \geqq 1\}$ have been investigated in [3, 5, 9] and [4], respectively. We consider here various inequalities involving (i) the minimum component or (ii) the product of the components of a random vector. Derivations and proofs of sharpness for these two classes of extensions show remarkable similarities. Some of each type occur as special cases of a general theorem in Section 3. Bounds are given under various assumptions concerning variances, covariances and independence. Notation. We denote the vector $(1, \cdots, 1)$ by $e$ and $(0, \cdots, 0)$ by 0; the dimensionality will be clear from the context. If $x = (x_1, \cdots, x_k)$ and $y = (y_1, \cdots, y_k)$, we write $x \geqq y(x > y)$ to mean $x_j \geqq y_j(x_j > y_j), j = 1, 2, \cdots, k$. If $\Sigma = (\sigma_{ij}): k \times k$ is a moment matrix, for convenience we write $\sigma_{jj} = \sigma^2_j, j = 1, \cdots, k$. Unless otherwise stated, we assume that $\Sigma$ is positive definite.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In this paper, we propose a maximum margin classifier that deals with uncertainty in data input. Specifically, we reformulate the SVM framework such that each input training entity is not solely a feature vector representation, but a multi-dimensional Gaussian distribution with given probability density, i.e., with a given mean and covariance matrix. The latter expresses the uncertainty. We arrive at a convex optimization problem, which is solved in the primal form using a gradient descent approach. The resulting classifier, which we name SVM with Gaussian Sample Uncertainty (SVM-GSU), is tested on synthetic data, as well as on the problem of event detection in video using the large-scale TRECVID MED 2014 dataset, and the problem of image classification using the MNIST dataset of handwritten digits. Experimental results verify the effectiveness of the proposed classifier.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In this paper, a robust support vector regression (RSVR) method with uncertain input and output data is studied. First, the data uncertainties are investigated under a stochastic framework and two linear robust formulations are derived. Linear formulations robust to ellipsoidal uncertainties are also considered from a geometric perspective. Second, kernelized RSVR formulations are established for nonlinear regression problems. Both linear and nonlinear formulations are converted to second-order cone programming problems, which can be solved efficiently by the interior point method. Simulation demonstrates that the proposed method outperforms existing RSVRs in the presence of both input and output data uncertainties.
    IEEE transactions on neural networks and learning systems 11/2012; 23(11):1690-1700. DOI:10.1109/TNNLS.2012.2212456 · 4.37 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: This paper presents a novel Second Order Cone Program- ming (SOCP) formulation for large scale binary classifica- tion tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering al- gorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint en- sures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets when compared to the sate-of-the-art classifiers, Support Vector Machines (SVMs). Experiments on real world and syn- thetic datasets show that the proposed algorithm outper- forms SVM solvers in terms of training time and achieves similar accuracies.
    Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006; 01/2006


Available from