Article

Abstract

We propose a novel approach to concentration for non-independent random variables. The main idea is to “pretend” that the random variables are independent and pay a multiplicative price measuring how far they are from actually being independent. This price is encapsulated in the Hellinger integral between the joint and the product of the marginals, which is then upper bounded leveraging tensorisation properties. Our bounds represent a natural generalisation of concentration inequalities in the presence of dependence: we recover exactly the classical bounds (McDiarmid’s inequality) when the random variables are independent. Furthermore, in a “large deviations” regime, we obtain the same decay in the probability as for the independent case, even when the random variables display non-trivial dependencies. To show this, we consider a number of applications of interest. First, we provide a bound for Markov chains with finite state space. Then, we consider the Simple Symmetric Random Walk, which is a non-contracting Markov chain, and a non-Markovian setting in which the stochastic process depends on its entire past. To conclude, we propose an application to Markov Chain Monte Carlo methods, where our approach leads to an improved lower bound on the minimum burn-in period required to reach a certain accuracy. In all of these settings, we provide a regime of parameters in which our bound fares better than what the state of the art can provide.
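As a minimal numerical sketch of the idea (our own toy example, not taken from the paper): for an event E, Hölder's inequality gives P(E) ≤ H_α(P‖Q)^(1/α) · Q(E)^((α−1)/α), where Q is the product of the marginals and H_α is the Hellinger integral. The distributions and the choice α = 2 below are assumptions for illustration.

```python
# Illustrative sketch (not from the paper): bound the probability of an event
# under a joint distribution P by the product of marginals Q, paying a
# multiplicative price given by the Hellinger integral H_a(P||Q).
# Via Holder's inequality:  P(E) <= H_a(P||Q)^(1/a) * Q(E)^((a-1)/a),  a > 1.

def hellinger_integral(p, q, a):
    """H_a(P||Q) = sum_x p(x)^a q(x)^(1-a) over the support of Q."""
    return sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q) if qi > 0)

# A dependent pair of bits: joint P over (x, y) in {00, 01, 10, 11} (made up).
joint = [0.40, 0.10, 0.10, 0.40]                  # positively correlated bits
px = [joint[0] + joint[1], joint[2] + joint[3]]   # marginal of X
py = [joint[0] + joint[2], joint[1] + joint[3]]   # marginal of Y
product = [px[0]*py[0], px[0]*py[1], px[1]*py[0], px[1]*py[1]]

event = [0, 3]                                    # E = {x = y}
P_E = sum(joint[i] for i in event)
Q_E = sum(product[i] for i in event)

a = 2.0
H = hellinger_integral(joint, product, a)
bound = H**(1/a) * Q_E**((a - 1)/a)
print(P_E, Q_E, bound)    # the joint probability never exceeds the bound
```

When the bits are independent, H = 1 and the bound collapses to a function of Q(E) alone, mirroring how the paper recovers the independent case exactly.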

Conference Paper
We develop explicit, general bounds for the probability that the normalized partial sums of a function of a Markov chain on a general alphabet will exceed the steady-state mean of that function by a given amount. Our bounds combine simple information-theoretic ideas together with techniques from optimization and some fairly elementary tools from analysis. In one direction, we obtain a general bound for the important class of Doeblin chains; this bound is optimal, in the sense that in the special case of independent and identically distributed random variables it essentially reduces to the classical Hoeffding bound. In another direction, motivated by important problems in simulation, we develop a series of bounds in a form which is particularly suited to these problems, and which apply to the more general class of "geometrically ergodic" Markov chains.
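The classical Hoeffding bound that this reduces to in the i.i.d. case can be checked numerically; the fair-coin setup below is our own illustration, not the paper's.

```python
# A quick check of the classical Hoeffding bound:
#   P(S_n/n - mu >= t) <= exp(-2 n t^2)   for i.i.d. variables in [0, 1].
# We compare against the exact binomial tail for fair coin flips
# (the parameters are assumptions for illustration).
from math import comb, exp

def binom_tail(n, p, k):
    """P(S_n >= k) for S_n ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, p, t = 100, 0.5, 0.1
exact = binom_tail(n, p, int((p + t) * n))   # P(sample mean deviates by >= t)
hoeffding = exp(-2 * n * t**2)
print(exact, hoeffding)                      # exact tail lies below the bound
```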
Article
We present a new and simple approach to concentration inequalities in the context of dependent random processes and random fields. Our method is based on coupling and does not use information inequalities. If one has uniform control on the coupling, one obtains exponential concentration inequalities. If such uniform control is no longer possible, one obtains polynomial or stretched-exponential concentration inequalities. Our abstract results apply to Gibbs random fields, both at high and low temperatures, and in particular to the low-temperature Ising model, which is a concrete example of non-uniformity of the coupling.
Article
In this paper we study non-interactive correlation distillation (NICD), a generalization of noise sensitivity previously considered in [5, 31, 39]. We extend the model to NICD on trees. In this model there is a fixed undirected tree with players at some of the nodes. One node is given a uniformly random string and this string is distributed throughout the network, with the edges of the tree acting as independent binary symmetric channels. The goal of the players is to agree on a shared random bit without communicating. Our new contributions include the following: • In the case of a k-leaf star graph (the model considered in [31]), we resolve the open question of whether the success probability must go to zero as k → ∞. We show that this is indeed the case and provide matching upper and lower bounds on the asymptotically optimal rate (a slowly-decaying polynomial). • In the case of the k-vertex path graph, we show that it is always optimal for all players to use the same 1-bit function. • In the general case we show that all players should use monotone functions. We also show, somewhat surprisingly, that for certain trees it is better if not all players use the same function. Our techniques include the use of the reverse Bonami-Beckner inequality. Although the usual Bonami-Beckner inequality has been frequently used before, its reverse counterpart seems not to be well known. To demonstrate its strength, we use it to prove a new isoperimetric inequality for the discrete cube and a new result on the mixing of short random walks on the cube. Another tool that we need is a tight bound on the probability that a Markov chain stays inside certain sets; we prove a new theorem generalizing and strengthening previous such bounds [2, 3, 6]. On the probabilistic side, we use the "reflection principle" and the FKG and related inequalities in order to study the problem on general trees.
Article
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of σ-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders), and present several other minimax results.
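The order-1 limit mentioned above can be checked numerically; the two three-point distributions below are made up for illustration.

```python
# Hedged illustration of the order parameter: the Renyi divergence
#   D_a(P||Q) = (1/(a-1)) * log sum_x p(x)^a q(x)^(1-a)
# tends to the Kullback-Leibler divergence as the order a -> 1.
from math import log

def renyi(p, q, a):
    return log(sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))) / (a - 1)

def kl(p, q):
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.7, 0.2, 0.1]
Q = [0.4, 0.4, 0.2]
for a in (2.0, 1.1, 1.01, 1.001):
    print(a, renyi(P, Q, a))        # nonincreasing toward KL as a -> 1
print("KL:", kl(P, Q))
```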
Conference Paper
We develop explicit, general bounds for the probability that the empirical sample averages of a function of a Markov chain on a general alphabet will exceed the steady-state mean of that function by a given amount. Our bounds combine simple information-theoretic ideas together with techniques from optimization and some fairly elementary tools from analysis. In one direction, motivated by central problems in simulation, we develop bounds for the general class of "geometrically ergodic" Markov chains. These bounds take a form that is particularly suited to simulation problems, and they naturally lead to a new class of sampling criteria. These are illustrated by several examples. In another direction, we obtain a new bound for the important special class of Doeblin chains; this bound is optimal, in the sense that in the special case of independent and identically distributed random variables it essentially reduces to the classical Hoeffding bound.
Article
We prove the first Chernoff-Hoeffding bounds for general nonreversible finite-state Markov chains based on the standard L¹ (variation distance) mixing time of the chain. Specifically, consider an ergodic Markov chain M and a weight function f: [n] → [0,1] on the state space [n] of M with mean μ = E_{v∼π}[f(v)], where π is the stationary distribution of M. A t-step random walk (v₁, …, v_t) on M starting from the stationary distribution π has expected total weight E[X] = μt, where X = Σ_{i=1}^t f(vᵢ). Let T be the L¹ mixing time of M. We show that the probability of X deviating from its mean by a multiplicative factor of δ, i.e., Pr[|X − μt| ≥ δμt], is at most exp(−Ω(δ²μt/T)) for 0 ≤ δ ≤ 1, and exp(−Ω(δμt/T)) for δ > 1. In fact, the bounds hold even if the weight functions fᵢ for i ∈ [t] are distinct, provided that all of them have the same mean μ. We also obtain a simplified proof for the Chernoff-Hoeffding bounds based on the spectral expansion λ of M, which is the square root of the second largest eigenvalue (in absolute value) of MM̃, where M̃ is the time-reversal Markov chain of M. We show that the probability Pr[|X − μt| ≥ δμt] is at most exp(−Ω(δ²(1−λ)μt)) for 0 ≤ δ ≤ 1, and exp(−Ω(δ(1−λ)μt)) for δ > 1. Both of our results extend to continuous-time Markov chains, and to the case where the walk starts from an arbitrary distribution x, at the price of a multiplicative factor depending on x in the concentration bounds.
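The spectral expansion λ is easy to compute by hand for a toy chain; the two-state chain below (with flip probabilities a and b, our own example) is reversible, so the time reversal M̃ equals M and λ is simply the absolute value of the second eigenvalue, 1 − a − b.

```python
# Toy computation (our own example) of the spectral-expansion quantity from
# the abstract.  For a two-state chain with P(0->1) = a and P(1->0) = b,
# the eigenvalues of M are 1 and 1 - a - b, so lambda = |1 - a - b| and
# mixing is fast when 1 - lambda is large.
a, b = 0.3, 0.2
M = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]      # stationary distribution
lam = abs(1 - a - b)                 # spectral expansion of this chain

# Sanity check that pi is indeed stationary: pi M = pi.
nxt = [pi[0]*M[0][0] + pi[1]*M[1][0], pi[0]*M[0][1] + pi[1]*M[1][1]]
print(lam, nxt)                      # nxt equals pi up to rounding
```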
Article
For a pair of random variables (X, Y) on the space 𝒳 × 𝒴 and a positive constant λ, it is an important problem of information theory to look for subsets 𝒜 of 𝒳 and ℬ of 𝒴 such that the conditional probability that Y is in ℬ, given that X is in 𝒜, is larger than λ. In many typical situations, in order to satisfy this condition, ℬ must be chosen much larger than 𝒜. We shall deal with the most frequently investigated case, when X = (X₁, …, Xₙ), Y = (Y₁, …, Yₙ), and the (Xᵢ, Yᵢ) are independent, identically distributed pairs of random variables with a finite range. Suppose that the distribution of (X, Y) is positive for all pairs of values (x, y). We show that if 𝒜 and ℬ satisfy the above condition with a constant λ and the probability of ℬ goes to 0, then the probability of 𝒜 goes to 0 even faster. Generalizations and some exact estimates of the exponents of probabilities are given. Our methods reveal an interesting connection with a so-called hypercontraction phenomenon in theoretical physics.
Article
We prove concentration inequalities for some classes of Markov chains and Φ-mixing processes, with constants independent of the size of the sample, that extend the inequalities for product measures of Talagrand. The method is based on information inequalities put forward by Marton in the case of contracting Markov chains. Using a simple duality argument on entropy, our results also include the family of logarithmic Sobolev inequalities for convex functions. Applications to bounds on suprema of dependent empirical processes complete this work.
Article
This paper establishes Hoeffding's lemma and inequality for bounded functions of general-state-space and not necessarily reversible Markov chains. The sharpness of these results is characterized by the optimality of the ratio between variance proxies in the Markov-dependent and independent settings. The boundedness of functions is shown necessary for such results to hold in general. To showcase the usefulness of the new results, we apply them for non-asymptotic analyses of MCMC estimation, respondent-driven sampling and high-dimensional covariance matrix estimation on time series data with a Markovian nature. In addition to statistical problems, we also apply them to study the time-discounted rewards in econometric models and the multi-armed bandit problem with Markovian rewards arising from the field of machine learning.
Article
In this work, the probability of an event under some joint distribution is bounded by measuring it with the product of the marginals instead (which is typically easier to analyze), together with a measure of the dependence between the two random variables. These results find applications in adaptive data analysis, where multiple dependencies are introduced, and in learning theory, where they can be employed to bound the generalization error of a learning algorithm. Bounds are given in terms of Sibson's mutual information, α-divergences, Hellinger divergences, and f-divergences. A case of particular interest is the Maximal Leakage (or Sibson's mutual information of order infinity), since this measure is robust to post-processing and composes adaptively. The corresponding bound can be seen as a generalization of classical bounds, such as Hoeffding's and McDiarmid's inequalities, to the case of dependent random variables.
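Maximal Leakage has a closed form on finite alphabets, L(X → Y) = log Σ_y max_x P(y|x); the 2×3 channel below is a made-up example.

```python
# A small worked example (our own) of Maximal Leakage, i.e. Sibson mutual
# information of order infinity:  L(X -> Y) = log sum_y max_x P(y|x).
from math import log

# Channel P(y|x): rows are x, columns are y (a made-up 2x3 channel).
channel = [[0.7, 0.2, 0.1],
           [0.1, 0.3, 0.6]]

leakage = log(sum(max(row[y] for row in channel) for y in range(3)))
print(leakage)   # log(0.7 + 0.3 + 0.6) = log(1.6)
```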
Book
This monograph presents a mathematical theory of concentration inequalities for functions of independent random variables. The basic phenomenon under investigation is that if a function of many independent random variables does not depend too much on any of them then it is concentrated around its expected value. This book offers a host of inequalities to quantify this statement. The authors describe the interplay between the probabilistic structure (independence) and a variety of tools ranging from functional inequalities, transportation arguments, to information theory. Applications to the study of empirical processes, random projections, random matrix theory, and threshold phenomena are presented. The book offers a self-contained introduction to concentration inequalities, including a survey of concentration of sums of independent random variables, variance bounds, the entropy method, and the transportation method. Deep connections with isoperimetric problems are revealed. Special attention is paid to applications to the supremum of empirical processes.
Book
Communication Systems and Information Theory. A Measure of Information. Coding for Discrete Sources. Discrete Memoryless Channels and Capacity. The Noisy-Channel Coding Theorem. Techniques for Coding and Decoding. Memoryless Channels with Discrete Time. Waveform Channels. Source Coding with a Fidelity Criterion. Index.
Article
We prove a version of McDiarmid's bounded differences inequality for Markov chains, with constants proportional to the mixing time of the chain. We also show variance bounds and Bernstein-type inequalities for empirical averages of Markov chains. In the case of non-reversible chains, we introduce a new quantity called the "pseudo spectral gap", and show that it plays a similar role for non-reversible chains as the spectral gap plays for reversible chains. Our techniques for proving these results are based on a coupling construction of Katalin Marton, and on spectral techniques due to Pascal Lezaud. The pseudo spectral gap generalises the multiplicative reversiblization approach of Jim Fill.
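For intuition, the multiplicative reversiblization P*P mentioned at the end can be computed by hand for a small non-reversible chain; the 3-state forward-biased cycle below is our own toy example, and the pseudo spectral gap is at least gap(P*P), obtained by taking k = 1 in its definition.

```python
# Toy computation (ours, not from the paper) of the reversiblization P*P for
# the 3-state cycle that moves forward with probability p and stays put
# otherwise.  P*P works out to a circulant matrix with diagonal entry c0 and
# off-diagonal entries c1 (derived by hand), whose eigenvalues are
# c0 + 2*c1*cos(2*pi*m/3) for m = 0, 1, 2.
from math import cos, pi as PI

p = 0.5
c0 = (1 - p)**2 + p**2
c1 = p * (1 - p)
eigs = sorted(c0 + 2*c1*cos(2*PI*m/3) for m in range(3))
gap = 1 - eigs[1]          # 1 minus the second-largest eigenvalue of P*P
print(eigs, gap)           # pseudo spectral gap >= gap
```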
Article
We prove that an irreducible aperiodic Markov chain is geometrically ergodic if and only if any separately bounded functional of the stationary chain satisfies an appropriate subgaussian deviation inequality from its mean.
Article
The noisiness of a channel can be measured by comparing suitable functionals of the input and output distributions. For instance, the worst-case ratio of output relative entropy to input relative entropy is bounded from above by unity, by the data processing theorem. However, for a fixed reference input distribution, this quantity may be strictly smaller than one, giving so-called strong data processing inequalities (SDPIs). The same considerations apply to an arbitrary Φ-divergence. This paper presents a systematic study of optimal constants in SDPIs for discrete channels, including their variational characterizations, upper and lower bounds, structural results for channels on product probability spaces, and the relationship between SDPIs and so-called Φ-Sobolev inequalities (another class of inequalities that can be used to quantify the noisiness of a channel by controlling entropy-like functionals of the input distribution by suitable measures of input-output correlation). Several applications to information theory, discrete probability, and statistical physics are discussed.
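One concrete SDPI-type constant is the Dobrushin coefficient for total variation, which upper-bounds the contraction of every Φ-divergence through the channel; the channel below is a made-up example.

```python
# Illustration (our own numbers) of the Dobrushin coefficient of a channel,
#   eta_TV(K) = max_{x, x'} TV(K(.|x), K(.|x')),
# which upper-bounds the strong data processing constant for every
# Phi-divergence, so any divergence contracts at least this fast.
def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

K = [[0.8, 0.1, 0.1],     # K(.|x) for x = 0, 1 (a made-up channel)
     [0.2, 0.5, 0.3]]

eta = max(tv(K[i], K[j]) for i in range(len(K)) for j in range(len(K)))
print(eta)
```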
Article
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of σ-algebras, and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
Conference Paper
We consider the following problem: Alice and Bob observe sequences Xⁿ and Yⁿ respectively, where {(Xᵢ, Yᵢ)}_{i=1}^∞ are drawn i.i.d. from P(x, y), and they output U and V respectively, which are required to have a joint law that is close in total variation to a specified Q(u, v). One important technique for establishing impossibility results for this problem is the Hirschfeld-Gebelein-Rényi maximal correlation, which was considered by Witsenhausen [1]. Hypercontractivity, studied by Ahlswede and Gács [2], and reverse hypercontractivity, recently studied by Mossel et al. [3], provide another approach for proving impossibility results. We consider the tightest impossibility results that can be obtained using hypercontractivity and reverse hypercontractivity, and provide a necessary and sufficient condition on the source distribution P(x, y) for when this approach subsumes the maximal correlation approach. We show that the binary pair source distribution with symmetric noise satisfies this condition.
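For the binary symmetric pair mentioned at the end, the maximal correlation can be computed by hand: with crossover probability ε, the matrix B(x, y) = P(x, y)/√(p(x)q(y)) equals [[1−ε, ε], [ε, 1−ε]], and its second singular value is 1 − 2ε (our own worked example).

```python
# Maximal correlation of the binary symmetric pair, computed by hand.
# The marginals are uniform, so B(x,y) = P(x,y)/sqrt(p(x)q(y)) is the
# symmetric matrix [[1-e, e], [e, 1-e]]; eigenvalues of a symmetric 2x2
# [[a, b], [b, a]] are a + b and a - b, so the singular values are
# 1 and |1 - 2e|, and the maximal correlation is the second one.
eps = 0.1
B = [[1 - eps, eps], [eps, 1 - eps]]
sv = sorted([abs(B[0][0] + B[0][1]), abs(B[0][0] - B[0][1])])
rho = sv[0]                          # second-largest singular value
print(rho)                           # equals 1 - 2*eps
```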
Conference Paper
The noisiness of a channel can be measured by comparing suitable functionals of the input and output distributions. For instance, if we fix a reference input distribution, then the worst-case ratio of output relative entropy to input relative entropy for any other input distribution is bounded by one, by the data processing theorem. However, for a fixed reference input distribution, this quantity may be strictly smaller than one, giving so-called strong data processing inequalities (SDPIs). This paper shows that the problem of determining both the best constant in an SDPI and any input distributions that achieve it can be addressed using so-called logarithmic Sobolev inequalities, which relate input relative entropy to certain measures of input-output correlation. Another contribution is a proof of equivalence between SDPIs and a limiting case of certain strong data processing inequalities for the Rényi divergence.
Article
We formulate the measure concentration inequality, for the Hamming distance and for Talagrand's "convex hull" distance, in the case of dependent random variables, in a more general form than in the earlier papers [M5] and [S]. This makes it possible to obtain measure concentration inequalities for Gibbs states over a box of the ν-dimensional lattice with fixed boundary condition, when the Gibbs state satisfies a strong mixing condition that lies between the Dobrushin-Shlosman condition and its weakening in the sense of Olivieri, Picco and Martinelli. We also extend the use of the measure concentration inequality for Talagrand's "convex hull" distance in the following direction: for random variables Z(Xⁿ) satisfying an inequality Z(x̂ⁿ) − Z(xⁿ) ≤ Σ_{i=1}^n αᵢ(x̂ⁿ) d(x̂ᵢ, xᵢ), we prove bounds for the moment generating function of Z(Xⁿ) − 𝔼Z(Xⁿ) in terms of the moment generating function and the expectation of Σ_{i=1}^n αᵢ²(Xⁿ).
Article
Let X = {Xᵢ}_{i=−∞}^{∞} be a stationary random process with a countable alphabet and distribution q. Let q^∞(·|x_{−k}^0) denote the conditional distribution of X^∞ = (X₁, X₂, …, Xₙ, …) given the k-length past x_{−k}^0 = (x_{−k+1}, …, x₀). Write d(x̄₁, x₁) = 0 if x̄₁ = x₁, and d(x̄₁, x₁) = 1 otherwise. We say that the process X admits a joining with finite distance u if, for any two past sequences x̄_{−k}^0 and x_{−k}^0, there is a joining of q^∞(·|x̄_{−k}^0) and q^∞(·|x_{−k}^0) satisfying a uniform bound on the expected distance between the two futures. The main result of this paper is an inequality for processes that admit a joining with finite distance.
Article
We consider a Markov chain with a spectral gap in the L² space. Assume that f is a bounded function. Then the probabilities of large deviations of the average along a trajectory satisfy Hoeffding-type inequalities. These bounds depend only on the stationary mean, the spectral gap, and the end-points of the support of f.
Article
A bound is given for a reversible Markov chain on the probability that the occupation measure of a set exceeds the stationary probability of the set by a positive quantity.
Article
There is a simple inequality by Pinsker between the variational distance and the informational divergence of probability measures defined on arbitrary probability spaces. We consider probability measures on sequences taken from countable alphabets, and derive, from Pinsker's inequality, bounds on the d̄-distance in terms of informational divergence. Such bounds can be used to prove the "concentration of measure" phenomenon for some nonproduct distributions.
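Pinsker's inequality itself is easy to check numerically, TV(P, Q) ≤ √(D(P‖Q)/2) with the divergence in nats; the distributions below are made up for illustration.

```python
# A quick numerical check (made-up distributions) of Pinsker's inequality:
#   TV(P, Q) <= sqrt(D(P||Q) / 2),  with the KL divergence in nats.
from math import log, sqrt

def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)

P = [0.6, 0.3, 0.1]
Q = [0.3, 0.4, 0.3]
print(tv(P, Q), sqrt(kl(P, Q) / 2))   # left-hand side is the smaller one
```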
Article
We build optimal exponential bounds for the probabilities of large deviations of sums Σ_{k=1}^n f(X_k), where (X_k) is a finite reversible Markov chain and f is an arbitrary bounded function. These bounds depend only on the stationary mean E_π f, the end-points of the support of f, the sample size n, and the second largest eigenvalue λ of the transition matrix.
Article
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Mathematics, 1993, by David Gillman. Includes bibliographical references (pp. 122-130).
Article
The paper deals with the f-divergences of Csiszár, generalizing the discrimination information of Kullback, the total variation distance, the Hellinger divergence, and the Pearson divergence. All basic properties of f-divergences, including relations to the decision errors, are proved in a new manner, replacing the classical Jensen inequality by a new generalized Taylor expansion of convex functions. Some new properties are proved too, e.g., relations to statistical sufficiency and deficiency. The generalized Taylor expansion also shows very easily that all f-divergences are average statistical informations (differences between prior and posterior Bayes errors) mutually differing only in the weights imposed on various prior distributions. The statistical information introduced by De Groot and the classical information of Shannon are shown to be extremal cases corresponding to α = 0 and α = 1 in the class of the so-called Arimoto α-informations introduced in this paper for 0 < α < 1 by means of the Arimoto α-entropies. Some new examples of f-divergences are introduced as well, namely the Shannon divergences and the Arimoto α-divergences, which lead for α ↑ 1 to the Shannon divergences. Square roots of all these divergences are shown to be metrics satisfying the triangle inequality. The last section introduces statistical tests and estimators based on minimizing the f-divergence from the empirical distribution over a family of hypothetical distributions. For the Kullback divergence this leads to the classical likelihood ratio test and estimator.
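A generic f-divergence D_f(P‖Q) = Σ_x q(x) f(p(x)/q(x)) can be instantiated for the divergences named above; the helper function and the distributions below are our own illustration.

```python
# Generic f-divergence D_f(P||Q) = sum_x q(x) f(p(x)/q(x)) (an illustrative
# helper, not the paper's notation), instantiated for total variation and
# the squared Hellinger distance.
def f_div(p, q, f):
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

tv_f = lambda t: 0.5 * abs(t - 1)        # f for total variation
hel2_f = lambda t: (t**0.5 - 1) ** 2     # f for squared Hellinger distance

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(f_div(P, Q, tv_f), f_div(P, Q, hel2_f))
```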
Article
The problem of finding a meaningful measure of the "common information" or "common randomness" of two discrete dependent random variables X, Y is studied. The quantity C(X;Y) is defined as the minimum possible value of I(X,Y;W), where the minimum is taken over all distributions defining an auxiliary random variable W ∈ 𝒲, a finite set, such that X and Y are conditionally independent given W. The main result of the paper is contained in two theorems which show that C(X;Y) is (i) the minimum R₀ such that a sequence of independent copies of (X,Y) can be efficiently encoded into three binary streams W₀, W₁, W₂ with rates R₀, R₁, R₂, respectively (with Σ Rᵢ = H(X,Y)), with X recovered from (W₀, W₁) and Y recovered from (W₀, W₂), i.e., W₀ is the common stream; (ii) the minimum binary rate R of the common input to independent processors that generate an approximation to (X, Y).
Article
We use the martingale method to establish concentration inequalities for a class of dependent random sequences on a countable state space, with the constants in the inequalities expressed in terms of certain mixing coefficients. Along the way, we obtain bounds on certain martingale differences associated with the random sequences, which may be of independent interest. As an application of our result, we also derive a concentration inequality for inhomogeneous Markov chains, and establish an extremal property associated with their martingale difference bounds. This work complements certain concentration inequalities obtained by Marton and Samson, while also providing a different proof of some known results.
A measure concentration inequality for contracting Markov chains
  • K. Marton
Dependency-dependent bounds for sums of dependent random variables
  • C. H. Lampert
  • L. Ralaivola
  • A. Zimin