Publications (214) · 154.77 Total impact

ABSTRACT: We consider the following noninteractive simulation problem: Alice and Bob observe sequences $X^n$ and $Y^n$ respectively, where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $P(x,y)$, and they output $U$ and $V$ respectively, which are required to have a joint law that is close in total variation to a specified $Q(u,v)$. It is known that the maximal correlation of $U$ and $V$ must necessarily be no bigger than that of $X$ and $Y$ if this is to be possible. Our main contribution is to bring hypercontractivity to bear as a tool on this problem. In particular, we show that if $P(x,y)$ is the doubly symmetric binary source, then hypercontractivity provides stronger impossibility results than maximal correlation. Finally, we extend these tools to provide impossibility results for the $k$-agent version of this problem.
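The maximal-correlation condition stated in the abstract above can be illustrated with a minimal sketch. It assumes both variables are binary with full-support marginals, in which case the matrix $B(x,y) = P(x,y)/\sqrt{P_X(x)P_Y(y)}$ has singular values 1 and the maximal correlation, so the latter follows from the Frobenius norm. The function names (`maximal_correlation_binary`, `dsbs`, `passes_maximal_correlation_test`) are hypothetical and do not come from the paper.

```python
from math import sqrt


def maximal_correlation_binary(P):
    """Maximal correlation of a pair of binary random variables.

    P is a 2x2 joint pmf as nested lists. Assumption: both marginals
    have full support (no zero row or column sums).
    """
    px = [sum(row) for row in P]
    py = [P[0][y] + P[1][y] for y in range(2)]
    # Squared Frobenius norm of B[x][y] = P(x,y) / sqrt(P_X(x) * P_Y(y)).
    fro_sq = sum(P[x][y] ** 2 / (px[x] * py[y])
                 for x in range(2) for y in range(2))
    # A 2x2 B has singular values 1 and rho_m, so fro_sq = 1 + rho_m**2.
    return sqrt(max(0.0, fro_sq - 1.0))


def dsbs(eps):
    """Joint pmf of a doubly symmetric binary source with crossover eps."""
    return [[(1 - eps) / 2, eps / 2], [eps / 2, (1 - eps) / 2]]


def passes_maximal_correlation_test(P, Q, tol=1e-12):
    """Necessary condition only: the target Q must not be more correlated than the source P."""
    return maximal_correlation_binary(Q) <= maximal_correlation_binary(P) + tol
```

For a doubly symmetric binary source with crossover probability 0.1 the function returns approximately 0.8, that is, $|1 - 2\epsilon|$. Passing this test does not show that a simulation exists; it only fails to rule one out, which is why the abstract turns to hypercontractivity for stronger impossibility results.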
ABSTRACT: In this paper, we consider the AWGN channel with a power constraint called the $(\sigma, \rho)$-power constraint, which is motivated by energy harvesting communication systems. Given a codeword, the constraint imposes a limit of $\sigma + k \rho$ on the total power of any $k\geq 1$ consecutive transmitted symbols. Such a channel has infinite memory and evaluating its exact capacity is a difficult task. Consequently, we establish an $n$-letter capacity expression and seek bounds for the same. We obtain a lower bound on capacity by considering the volume of ${\cal S}_n(\sigma, \rho) \subseteq \mathbb{R}^n$, which is the set of all length-$n$ sequences satisfying the $(\sigma, \rho)$-power constraints. For a noise power of $\nu$, we obtain an upper bound on capacity by considering the volume of ${\cal S}_n(\sigma, \rho) \oplus B_n(\sqrt{n\nu})$, which is the Minkowski sum of ${\cal S}_n(\sigma, \rho)$ and the $n$-dimensional Euclidean ball of radius $\sqrt{n\nu}$. We analyze this bound using a result from convex geometry known as Steiner's formula, which gives the volume of this Minkowski sum in terms of the intrinsic volumes of ${\cal S}_n(\sigma, \rho)$. We show that as the dimension $n$ increases, the logarithm of the sequence of intrinsic volumes of $\{{\cal S}_n(\sigma, \rho)\}$ converges to a limit function under an appropriate scaling. The upper bound on capacity is then expressed in terms of this limit function. We derive the asymptotic capacity in the low and high noise regime for the $(\sigma, \rho)$-power-constrained AWGN channel, with strengthened results for the special case of $\sigma = 0$, which is the amplitude-constrained AWGN channel.
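As an illustration of the constraint described in the abstract above, the following minimal sketch tests whether a sequence lies in ${\cal S}_n(\sigma, \rho)$ by checking every window of $k \geq 1$ consecutive symbols against the limit $\sigma + k\rho$. The function name `satisfies_sigma_rho` and the tolerance parameter `tol` are assumptions made for this example, and the brute-force check over all windows is chosen for clarity rather than speed.

```python
from itertools import accumulate


def satisfies_sigma_rho(x, sigma, rho, tol=1e-12):
    """Return True if every run of k >= 1 consecutive symbols of x
    has total power at most sigma + k * rho."""
    # prefix[i] holds the total power of the first i symbols.
    prefix = [0.0] + list(accumulate(v * v for v in x))
    n = len(x)
    for start in range(n):
        for end in range(start + 1, n + 1):
            k = end - start
            window_power = prefix[end] - prefix[start]
            if window_power > sigma + k * rho + tol:
                return False
    return True
```

With `sigma = 0` the check reduces to the amplitude constraint $|x_i| \leq \sqrt{\rho}$ on each symbol, matching the special case named in the abstract. With `sigma = 1` and `rho = 1`, the sequence `[0.0, 1.4, 1.4]` is rejected because two consecutive symbols use power 3.92 against a limit of 3, while `[0.0, 1.4, 0.0, 1.4]` is accepted.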
ABSTRACT: We are motivated by applications that need rich model classes to represent them. Examples of rich model classes include distributions over large, countably infinite supports, slow mixing Markov processes, etc. But such rich classes may be too complex to admit estimators that converge to the truth with convergence rates that can be uniformly bounded over the entire model class as the sample size increases (uniform consistency). However, these rich classes may still allow for estimators with pointwise guarantees whose performance can be bounded in a model-dependent way. The pointwise angle of course has the drawback that the estimator performance is a function of the very unknown model that is being estimated, and is therefore unknown. Therefore, even if the estimator is consistent, how well it is doing may not be clear no matter what the sample size is. Departing from the dichotomy of uniform and pointwise consistency, a new analysis framework is explored by characterizing rich model classes that may only admit pointwise guarantees, yet all the information about the model needed to gauge estimator accuracy can be inferred from the sample at hand. To retain focus, we analyze the universal compression problem in this data-driven pointwise consistency framework.
ABSTRACT: In this paper, we revisit the structure of infeasibility results in network information theory, based on a notion of information state. We also discuss ideas for generalizing a known outer bound for lossless transmission of independent sources over a network to one of lossy transmission of dependent sources over the same network. To concretely demonstrate this, we apply our ideas and prove new results for lossy transmission of dependent sources by generalizing: 1) the cut-set bound; 2) the best known outer bound on the capacity region of a general broadcast channel; and 3) the outer bound part of the result of Maric, Yates, and Kramer on strong interference channels with a common message.
IEEE Transactions on Information Theory 10/2014; 60(10):5992-6004. DOI:10.1109/TIT.2014.2347301 · 2.65 Impact Factor
ABSTRACT: Consider a family of Boolean models, indexed by integers $n \ge 1$, where the $n$th model features a Poisson point process in ${\mathbb{R}}^n$ of intensity $e^{n \rho_n}$ with $\rho_n \to \rho$ as $n \to \infty$, and balls of independent and identically distributed radii distributed like $\bar X_n \sqrt{n}$, with $\bar X_n$ satisfying a large deviations principle. It is shown that there exist three deterministic thresholds: $\tau_d$ the degree threshold; $\tau_p$ the percolation threshold; and $\tau_v$ the volume fraction threshold; such that asymptotically as $n$ tends to infinity, in a sense made precise in the paper: (i) for $\rho < \tau_d$, almost every point is isolated, namely its ball intersects no other ball; (ii) for $\tau_d< \rho< \tau_p$, almost every ball intersects an infinite number of balls and nevertheless there is no percolation; (iii) for $\tau_p< \rho< \tau_v$, the volume fraction is 0 and nevertheless percolation occurs; (iv) for $\tau_d< \rho< \tau_v$, almost every ball intersects an infinite number of balls and nevertheless the volume fraction is 0; (v) for $\rho > \tau_v$, the whole space is covered. The analysis of this asymptotic regime is motivated by related problems in information theory, and may be of interest in other applications of stochastic geometry.
Conference Paper: Data dependent weak universal compression
ABSTRACT: We are motivated by applications that need rich model classes to represent the application, such as the set of all discrete distributions over large, countably infinite supports. But such rich classes may be too complex to admit estimators that converge to the truth with convergence rates that can be uniformly bounded over the entire model class as the sample size increases (uniform consistency). However, these rich classes may still allow for estimators with pointwise guarantees whose performance can be bounded in a model-dependent way. But the pointwise angle has a drawback as well: estimator performance is a function of the very unknown model that is being estimated, and is therefore unknown. Therefore, even if an estimator is consistent, how well it is doing may not be clear no matter what the sample size. Departing from the uniform/pointwise dichotomy, a new analysis framework is explored by characterizing rich model classes that may only admit pointwise guarantees, yet all information about the unknown model needed to gauge estimator accuracy can be inferred from the sample at hand. To bring focus, we analyze the universal compression problem in this data-driven pointwise consistency framework. Today, data accumulated in many biological, financial, and other statistical problems stands out not just because of its nature or size, but also because the questions we ask of it are unlike anything we asked before. There is often a tension in these big data problems between the need for rich model classes to better represent the application and our ability to handle these classes at all from a mathematical point of view. Consider an example of insuring the risk of exposure to the Internet as opposed to the simple credit monitoring tools available today. Given the significant number of identity thefts, security breaches, and privacy concerns, insurance of this nature may be highly desirable. How would one model loss here?
After all, losses suffered can range from direct loss of property to more intangible, yet very significant damage resulting from lowered credit scores. Designing insurance policies with ceilings on claim payments keeps us in familiar territory mathematically, but also misses the point of why one may want this sort of insurance. We therefore want a richer set of candidate loss models that do not impose artificial ceilings on loss. But we will run into a fundamental roadblock here. Richness of model classes is often quantified by metrics such as the VC-dimension [1], the Rademacher complexity [2], [3], [4], or the strong compression redundancy [5], [6], [7], [8], [9]. Typically, one looks for estimation algorithms with model-agnostic guarantees based on the sample size; indeed, this is the uniform consistency dogma that underlies most formulations of engineering applications today. But any such guarantee on estimators on a model class depends on the complexity metrics above: the more complex a class, the worse the guarantees. In fact, the insurance problem above and many applications in the "big data" regime force us to consider model classes that are too complex to admit estimators with reasonable model-agnostic guarantees (or uniformly consistent estimators). Instead the best we can often do is to have guarantees dependent on not just the sample size but on the underlying model in addition (pointwise consistent). This is not very helpful either: our gauge of how well the estimator is doing is dependent on the very quantity being estimated! As in [10], we challenge the dichotomy of uniform and pointwise consistency in the analysis of statistical estimators. Neither uniform nor pointwise guarantees are particularly suited to the big data problems we have in mind. The former precludes the desired richness of model classes. While the latter allows for rich model classes, it does not provide practical guarantees that can be used in applications.
Instead, we consider a new paradigm positioned in between these two extremes. This framework modifies the world of pointwise consistent estimators, keeping as far as possible the richness of model classes possible but
IEEE Symposium on Information Theory, Honolulu, HI; 06/2014
Conference Paper: On hypercontractivity and a data processing inequality
ABSTRACT: In this paper we provide the correct tight constant to a data-processing inequality claimed by Erkip and Cover. The correct constant turns out to be a particular hypercontractivity parameter of (X,Y), rather than their squared maximal correlation. We also provide alternate geometric characterizations for both maximal correlation and the hypercontractivity parameter that characterizes the data-processing inequality.
2014 IEEE International Symposium on Information Theory (ISIT); 06/2014
Conference Paper: Data-driven weak universal redundancy
ABSTRACT: In applications involving estimation, the relevant model classes of probability distributions are often too complex to admit estimators that converge to the truth with convergence rates that can be uniformly bounded over the entire model class as the sample size increases (uniform consistency). While it is often possible to get pointwise guarantees, so that the convergence rate of the estimator can be bounded in a model-dependent way, such pointwise guarantees are unsatisfactory: estimator performance is a function of the very unknown quantity that is being estimated. Therefore, even if an estimator is consistent, how well it is doing may not be clear no matter what the sample size. Departing from this traditional uniform/pointwise dichotomy, a new analysis framework is explored by characterizing model classes of probability distributions that may only admit pointwise guarantees, yet where all the information about the unknown model needed to gauge estimator accuracy can be inferred from the sample at hand. To provide a focus to this suggested broad new paradigm, we analyze the universal compression problem in this data-driven pointwise consistency framework.
2014 IEEE International Symposium on Information Theory (ISIT); 06/2014
Conference Paper: An energy harvesting AWGN channel with a finite battery
ABSTRACT: In energy harvesting communication systems, the transmitter is adapted to harvest energy per time slot. The harvested energy is either used right away or is stored in a battery to facilitate future transmissions. We consider the problem of determining the Shannon capacity of an energy harvesting transmitter communicating over an additive white Gaussian noise (AWGN) channel, where the amount of energy harvested per time slot is a constant ρ and the battery has capacity σ. This imposes a new kind of power constraint on the transmitted codewords, and we call the resulting constrained channel a (σ, ρ)-power-constrained AWGN channel. When σ is 0 or ∞, the capacity of this channel is known. For the finite battery case, we obtain an expression for the channel capacity. We obtain bounds on capacity by considering the volume of S_n(σ, ρ) ⊆ ℝ^n, which is the set of all length-n sequences satisfying the (σ, ρ) constraints.
2014 IEEE International Symposium on Information Theory (ISIT); 06/2014
Conference Paper: Convex relative entropy decay in Markov chains
ABSTRACT: We look at irreducible continuous-time Markov chains with a finite or countably infinite number of states, and a unique stationary distribution π. If the Markov chain has distribution μ_t at time t, its relative entropy to stationarity is denoted by h(μ_t‖π). This is a monotonically decreasing function of time, and decays to 0 at an exponential rate in most natural examples of Markov chains arising in applications. In this paper, we focus on the second derivative properties of h(μ_t‖π). In particular we examine when relative entropy to stationarity exhibits convex decay, independent of the starting distribution. It has been shown that convexity of h(μ_t‖π) in a Markov chain can lead to sharper bounds on the rate of relative entropy decay, and thus on the mixing time of the Markov chain. We study certain finite-state Markov chains as well as countable-state Markov chains arising from stable Jackson queueing networks.
2014 48th Annual Conference on Information Sciences and Systems (CISS); 03/2014
ABSTRACT: Hypercontractivity has had many successful applications in mathematics, physics, and theoretical computer science. In this work we use recently established properties of the hypercontractivity ribbon of a pair of random variables to study a recent conjecture regarding the mutual information between binary functions of the individual marginal sequences of a sequence of pairs of random variables drawn from a doubly symmetric binary source.
2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton); 10/2013
ABSTRACT: In this paper we provide a new geometric characterization of the Hirschfeld-Gebelein-Rényi maximal correlation of a pair of random variables $(X,Y)$, as well as of the chordal slope of the nontrivial boundary of the hypercontractivity ribbon of $(X,Y)$ at infinity. The new characterizations lead to simple proofs for some of the known facts about these quantities. We also provide a counterexample to a data processing inequality claimed by Erkip and Cover, and find the correct tight constant for this kind of inequality.
Conference Paper: Improved cardinality bounds on the auxiliary random variables in Marton's inner bound
ABSTRACT: Marton's region is the best known inner bound for a general discrete memoryless broadcast channel. We establish improved bounds on the cardinalities of the auxiliary random variables. We combine the perturbation technique along with a representation using concave envelopes to achieve this improvement. As a corollary of this result, we show that a randomized time division strategy achieves the entire Marton's region for binary input broadcast channels, extending the previously known result for the sum-rate and validating a previous conjecture due to the same authors.
Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on; 01/2013
ABSTRACT: Motivated by problems in insurance, our task is to predict finite upper bounds on a future draw from an unknown distribution $p$ over the set of natural numbers. We can only use past observations generated independently and identically distributed according to $p$. While $p$ is unknown, it is known to belong to a given collection ${\cal P}$ of probability distributions on the natural numbers. The support of the distributions $p \in {\cal P}$ may be unbounded, and the prediction game goes on for \emph{infinitely} many draws. We are allowed to make observations without predicting upper bounds for some time. But we must, with probability 1, start and then continue to predict upper bounds after a finite time irrespective of which $p \in {\cal P}$ governs the data. If it is possible, without knowledge of $p$ and for any prescribed confidence however close to 1, to come up with a sequence of upper bounds that is never violated over an infinite time window with confidence at least as big as prescribed, we say the model class ${\cal P}$ is \emph{insurable}. We completely characterize the insurability of any class ${\cal P}$ of distributions over natural numbers by means of a condition on how the neighborhoods of distributions in ${\cal P}$ should be, one that is both necessary and sufficient. 
ABSTRACT: Shannon's Entropy Power Inequality can be viewed as characterizing the minimum differential entropy achievable by the sum of two independent random variables with fixed differential entropies. The entropy power inequality has played a key role in resolving a number of problems in information theory. It is therefore interesting to examine the existence of a similar inequality for discrete random variables. In this paper we obtain an entropy power inequality for random variables taking values in an abelian group of order 2^n, i.e., for such a group G we explicitly characterize the function f_G(x,y) giving the minimum entropy of the sum of two independent G-valued random variables with respective entropies x and y. Random variables achieving the extremum in this inequality are thus the analogs of Gaussians in this case, and these are also determined. It turns out that f_G(x,y) is convex in x for fixed y and, by symmetry, convex in y for fixed x. This is a generalization to abelian groups of order 2^n of the result known as Mrs. Gerber's Lemma.
07/2012; 60(7). DOI:10.1109/ISIT.2013.6620295
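For orientation, the simplest instance of the function $f_G$ described in the abstract above is the group of order 2, where the entropy of a binary variable fixes its distribution up to relabeling and $f_G(x,y) = h(h^{-1}(x) \star h^{-1}(y))$, with $h$ the binary entropy in bits and $a \star b = a(1-b) + b(1-a)$. The sketch below covers only this classical binary case of Mrs. Gerber's Lemma, not the paper's general characterization. The names `h`, `h_inv`, and `f_z2` are hypothetical, and the inverse is computed by bisection as an implementation choice.

```python
from math import log2


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def h_inv(x, iters=60):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def f_z2(x, y):
    """Entropy of the mod-2 sum of independent bits with entropies x and y."""
    a, b = h_inv(x), h_inv(y)
    # Binary convolution: probability that the mod-2 sum equals 1.
    return h(a * (1 - b) + b * (1 - a))
```

A quick sanity check is that `f_z2(0.0, y)` returns approximately `y` and `f_z2(1.0, y)` returns approximately 1. The convexity in `x` for fixed `y` mentioned in the abstract can be probed numerically by comparing a midpoint value with the average of the endpoint values.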
ABSTRACT: A positive recurrent, aperiodic Markov chain is said to be long-range dependent (LRD) when the indicator function of a particular state is LRD. This happens if and only if the return time distribution for that state has infinite variance. We investigate the question of whether other instantaneous functions of the Markov chain also inherit this property. We provide conditions under which the function has the same degree of long-range dependence as the chain itself. We illustrate our results through three examples in diverse fields: queueing networks, source compression, and finance.
Journal of Applied Probability 06/2012; 49(2012). DOI:10.1239/jap/1339878798 · 0.69 Impact Factor
ABSTRACT: Stability and convergence properties of stochastic approximation algorithms are analyzed when the noise includes a long-range-dependent component (modeled by a fractional Brownian motion) and a heavy-tailed component (modeled by a symmetric stable process), in addition to the usual ‘martingale noise’. This is motivated by the emergent applications in communications. The proofs are based on comparing suitably interpolated iterates with a limiting ordinary differential equation. Related issues such as asynchronous implementations, Markov noise, etc. are briefly discussed.
Queueing Systems 06/2012; 71(1-2). DOI:10.1007/s11134-012-9283-0 · 0.60 Impact Factor
ABSTRACT: Seventeen years ago at the ITW that was held in Moscow, I organized a similar panel on the future of Information Theory with the participation of Dick Blahut, Imre Csiszár, Dave Forney, Prakash Narayan and Mark Pinsker. In preparation for this panel I have asked our panelists to read the transcript of that panel (published in the December 1994 issue of this newsletter) and discuss the ways in which that panel’s predictions were and were not accurate.
ABSTRACT: Marton's inner bound is the best known achievable region for a general discrete memoryless broadcast channel. To compute Marton's inner bound one has to solve an optimization problem over a set of joint distributions on the input and auxiliary random variables. The optimizers turn out to be structured in many cases. Finding properties of optimizers not only results in efficient evaluation of the region, but it may also help one to prove factorization of Marton's inner bound (and thus its optimality). The first part of this paper formulates this factorization approach explicitly and states some conjectures and results along this line. The second part of this paper focuses primarily on the structure of the optimizers. This section is inspired by a new binary inequality that recently resulted in a very simple characterization of the sum-rate of Marton's inner bound for binary input broadcast channels. This prompted us to investigate whether this inequality can be extended to larger cardinality input alphabets. We show that several of the results for the binary input case do carry over for higher cardinality alphabets and we present a collection of results that help restrict the search space of probability distributions to evaluate the boundary of Marton's inner bound in the general case. We also prove a new inequality for the binary skew-symmetric broadcast channel that yields a very simple characterization of the entire Marton inner bound for this channel.
02/2012; DOI:10.1109/ISIT.2012.6284258
Conference Paper: Noninteractive simulation of joint distributions: The Hirschfeld-Gebelein-Rényi maximal correlation and the hypercontractivity ribbon
ABSTRACT: We consider the following problem: Alice and Bob observe sequences X^n and Y^n respectively, where {(X_i, Y_i)}_{i=1}^∞ are drawn i.i.d. from P(x, y), and they output U and V respectively, which are required to have a joint law that is close in total variation to a specified Q(u, v). One important technique to establish impossibility results for this problem is the Hirschfeld-Gebelein-Rényi maximal correlation, which was considered by Witsenhausen [1]. Hypercontractivity studied by Ahlswede and Gács [2] and reverse hypercontractivity recently studied by Mossel et al. [3] provide another approach for proving impossibility results. We consider the tightest impossibility results that can be obtained using hypercontractivity and reverse hypercontractivity and provide a necessary and sufficient condition on the source distribution P(x, y) for when this approach subsumes the maximal correlation approach. We show that the binary pair source distribution with symmetric noise satisfies this condition.
Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on; 01/2012
Publication Stats
6k  Citations  
154.77  Total Impact Points  
Top Journals
Institutions

1984–2014

University of California, Berkeley
 Department of Electrical Engineering and Computer Sciences
Berkeley, California, United States


2008

Massachusetts Institute of Technology
 Department of Electrical Engineering and Computer Science
Cambridge, Massachusetts, United States


1987–2008

Cornell University
 Department of Electrical and Computer Engineering
Ithaca, NY, United States


2004

University of Illinois, Urbana-Champaign
 Department of Electrical and Computer Engineering
Urbana, Illinois, United States


2002

University of Maryland, College Park
 Department of Electrical & Computer Engineering
College Park, MD, United States


1999

Stanford University
 Department of Electrical Engineering
Stanford, CA, United States


1995

University of Texas at Austin
 Department of Electrical & Computer Engineering
Austin, Texas, United States
