Multivariate Chebyshev Inequalities

The Annals of Mathematical Statistics 01/1960; DOI: 10.1214/aoms/1177705673

ABSTRACT If $X$ is a random variable with $EX^2 = \sigma^2$, then by Chebyshev's inequality, \begin{equation*}\tag{1.1}P\{|X| \geqq \epsilon\} \leqq \sigma^2/\epsilon^2.\end{equation*} If in addition $EX = 0$, one obtains a corresponding one-sided inequality \begin{equation*}\tag{1.2}\quad P\{X \geqq \epsilon\} \leqq \sigma^2/ (\epsilon^2 + \sigma^2)\end{equation*} (see, e.g., [8] p. 198). In each case a distribution for $X$ is known that results in equality, so that the bounds are sharp. By a change of variable we can take $\epsilon = 1$.

There are many possible multivariate extensions of (1.1) and (1.2). Those providing bounds for $P\{\max_{1 \leqq j \leqq k} |X_j| \geqq 1\}$ and $P\{\max_{1 \leqq j \leqq k} X_j \geqq 1\}$ have been investigated in [3, 5, 9] and [4], respectively. We consider here various inequalities involving (i) the minimum component or (ii) the product of the components of a random vector. Derivations and proofs of sharpness for these two classes of extensions show remarkable similarities. Some of each type occur as special cases of a general theorem in Section 3. Bounds are given under various assumptions concerning variances, covariances and independence.

Notation. We denote the vector $(1, \cdots, 1)$ by $e$ and $(0, \cdots, 0)$ by $0$; the dimensionality will be clear from the context. If $x = (x_1, \cdots, x_k)$ and $y = (y_1, \cdots, y_k)$, we write $x \geqq y$ ($x > y$) to mean $x_j \geqq y_j$ ($x_j > y_j$), $j = 1, 2, \cdots, k$. If $\Sigma = (\sigma_{ij}): k \times k$ is a moment matrix, for convenience we write $\sigma_{jj} = \sigma^2_j, j = 1, \cdots, k$. Unless otherwise stated, we assume that $\Sigma$ is positive definite.
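The sharpness of (1.1) and (1.2) can be checked by hand on small discrete distributions. The sketch below is illustrative only: the three-point and two-point laws are standard textbook constructions that attain equality, not necessarily the ones used in the paper, and the helper names (`two_sided_extremal`, `one_sided_extremal`, `moment`, `tail`) are hypothetical. Exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction


def two_sided_extremal(sigma2, eps):
    """Three-point law on {-eps, 0, eps} with EX^2 = sigma2.

    Assumed construction (requires sigma2 <= eps**2); it attains equality in (1.1).
    """
    p = sigma2 / (2 * eps**2)
    return {-eps: p, Fraction(0): 1 - 2 * p, eps: p}


def one_sided_extremal(sigma2, eps):
    """Two-point law with EX = 0 and EX^2 = sigma2.

    Assumed construction; it attains equality in (1.2).
    """
    p = sigma2 / (eps**2 + sigma2)
    return {eps: p, -sigma2 / eps: 1 - p}


def moment(dist, order):
    """Return E[X**order] for a finite distribution given as {value: probability}."""
    return sum(prob * x**order for x, prob in dist.items())


def tail(dist, event):
    """Return the probability of the set of support points where event(x) is true."""
    return sum(prob for x, prob in dist.items() if event(x))


# Illustrative parameters: sigma^2 = 1/4 and, after the change of variable, eps = 1.
sigma2, eps = Fraction(1, 4), Fraction(1)

d1 = two_sided_extremal(sigma2, eps)
bound_11 = sigma2 / eps**2                    # right-hand side of (1.1)
prob_11 = tail(d1, lambda x: abs(x) >= eps)   # left-hand side of (1.1)

d2 = one_sided_extremal(sigma2, eps)
bound_12 = sigma2 / (eps**2 + sigma2)         # right-hand side of (1.2)
prob_12 = tail(d2, lambda x: x >= eps)        # left-hand side of (1.2)

print("(1.1):", prob_11, "<=", bound_11)
print("(1.2):", prob_12, "<=", bound_12)
```

With these parameters both probabilities coincide with their bounds, which is what sharpness means here: no smaller right-hand side could hold for every distribution with the stated moments.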

  • ABSTRACT: We show how to compute lower bounds for the supremum Bayes error if the class-conditional distributions must satisfy moment constraints, where the supremum is with respect to the unknown class-conditional distributions. Our approach makes use of Curto and Fialkow's solutions for the truncated moment problem. The lower bound shows that the popular Gaussian assumption is not robust in this regard. We also construct an upper bound for the supremum Bayes error by constraining the decision boundary to be linear.
    IEEE Transactions on Information Theory 01/2012; 58(6):3606-3612. · 2.62 Impact Factor
  • ABSTRACT: Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high-density regions or clusters from individual class-conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. However, this formulation and, more generally, any relaxation that depends on second-order moments are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with the true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
    Optimization and Engineering 01/2013; 14(2). · 0.83 Impact Factor