Publications (208) · 151.45 Total impact
ABSTRACT: We are motivated by applications that need rich model classes to represent them. Examples of rich model classes include distributions over large, countably infinite supports, slow mixing Markov processes, etc. But such rich classes may be too complex to admit estimators that converge to the truth with convergence rates that can be uniformly bounded over the entire model class as the sample size increases (uniform consistency). However, these rich classes may still allow for estimators with pointwise guarantees whose performance can be bounded in a model-dependent way. The pointwise angle of course has the drawback that the estimator performance is a function of the very unknown model that is being estimated, and is therefore unknown. Therefore, even if the estimator is consistent, how well it is doing may not be clear no matter what the sample size is. Departing from the dichotomy of uniform and pointwise consistency, a new analysis framework is explored by characterizing rich model classes that may only admit pointwise guarantees, yet all the information about the model needed to gauge estimator accuracy can be inferred from the sample at hand. To retain focus, we analyze the universal compression problem in this data-driven pointwise consistency framework.
11/2014
ABSTRACT: In this paper, we revisit the structure of infeasibility results in network information theory, based on a notion of information state. We also discuss ideas for generalizing a known outer bound for lossless transmission of independent sources over a network to one of lossy transmission of dependent sources over the same network. To concretely demonstrate this, we apply our ideas and prove new results for lossy transmission of dependent sources by generalizing: 1) the cutset bound; 2) the best known outer bound on the capacity region of a general broadcast channel; and 3) the outer bound part of the result of Maric, Yates, and Kramer on strong interference channels with a common message.
IEEE Transactions on Information Theory 10/2014; 60(10):5992-6004. · 2.65 Impact Factor
ABSTRACT: Consider a family of Boolean models, indexed by integers $n \ge 1$, where the $n$th model features a Poisson point process in ${\mathbb{R}}^n$ of intensity $e^{n \rho_n}$ with $\rho_n \to \rho$ as $n \to \infty$, and balls of independent and identically distributed radii distributed like $\bar X_n \sqrt{n}$, with $\bar X_n$ satisfying a large deviations principle. It is shown that there exist three deterministic thresholds: $\tau_d$ the degree threshold; $\tau_p$ the percolation threshold; and $\tau_v$ the volume fraction threshold; such that asymptotically as $n$ tends to infinity, in a sense made precise in the paper: (i) for $\rho < \tau_d$, almost every point is isolated, namely its ball intersects no other ball; (ii) for $\tau_d< \rho< \tau_p$, almost every ball intersects an infinite number of balls and nevertheless there is no percolation; (iii) for $\tau_p< \rho< \tau_v$, the volume fraction is 0 and nevertheless percolation occurs; (iv) for $\tau_d< \rho< \tau_v$, almost every ball intersects an infinite number of balls and nevertheless the volume fraction is 0; (v) for $\rho > \tau_v$, the whole space is covered. The analysis of this asymptotic regime is motivated by related problems in information theory, and may be of interest in other applications of stochastic geometry.
08/2014
Conference Paper: Data dependent weak universal compression
ABSTRACT: We are motivated by applications that need rich model classes to represent the application, such as the set of all discrete distributions over large, countably infinite supports. But such rich classes may be too complex to admit estimators that converge to the truth with convergence rates that can be uniformly bounded over the entire model class as the sample size increases (uniform consistency). However, these rich classes may still allow for estimators with pointwise guarantees whose performance can be bounded in a model-dependent way. But the pointwise angle has a drawback as well: estimator performance is a function of the very unknown model that is being estimated, and is therefore unknown. Therefore, even if an estimator is consistent, how well it is doing may not be clear no matter what the sample size. Departing from the uniform/pointwise dichotomy, a new analysis framework is explored by characterizing rich model classes that may only admit pointwise guarantees, yet all information about the unknown model needed to gauge estimator accuracy can be inferred from the sample at hand. To bring focus, we analyze the universal compression problem in this data-driven, pointwise consistency framework. Today, data accumulated in many biological, financial, and other statistical problems stands out not just because of its nature or size, but also because the questions we ask of it are unlike anything we asked before. There is often a tension in these big data problems between the need for rich model classes to better represent the application and our ability to handle these classes at all from a mathematical point of view. Consider an example of insuring the risk of exposure to the Internet as opposed to the simple credit monitoring tools available today. Given the significant number of identity thefts, security breaches, and privacy concerns, insurance of this nature may be highly desirable. How would one model loss here?
After all, losses suffered can range from direct loss of property to more intangible, yet very significant damage resulting from lowered credit scores. Designing insurance policies with ceilings on claim payments keeps us in familiar territory mathematically, but also misses the point of why one may want this sort of insurance. We therefore want a richer set of candidate loss models that do not impose artificial ceilings on loss. But we will run into a fundamental roadblock here. Richness of model classes is often quantified by metrics such as the VC-dimension [1], the Rademacher complexity [2], [3], [4], or the strong compression redundancy [5], [6], [7], [8], [9]. Typically, one looks for estimation algorithms with model-agnostic guarantees based on the sample size; indeed, this is the uniform consistency dogma that underlies most formulations of engineering applications today. But any such guarantee on estimators on a model class depends on the complexity metrics above: the more complex a class, the worse the guarantees. In fact, the insurance problem above and many applications in the "big data" regime force us to consider model classes that are too complex to admit estimators with reasonable model-agnostic guarantees (or uniformly consistent estimators). Instead the best we can often do is to have guarantees dependent on not just the sample size but on the underlying model in addition (pointwise consistent). This is not very helpful either: our gauge of how well the estimator is doing is dependent on the very quantity being estimated! As in [10], we challenge the dichotomy of uniform and pointwise consistency in the analysis of statistical estimators. Neither uniform nor pointwise guarantees are particularly suited to the big data problems we have in mind. The former precludes the desired richness of model classes. While the latter allows for rich model classes, it does not provide practical guarantees that can be used in applications.
Instead, we consider a new paradigm positioned in between these two extremes. This framework modifies the world of pointwise consistent estimators, keeping as far as possible the richness of model classes possible but
IEEE Symposium on Information Theory, Honolulu, HI; 06/2014
Conference Paper: On hypercontractivity and a data processing inequality
ABSTRACT: In this paper we provide the correct tight constant to a data-processing inequality claimed by Erkip and Cover. The correct constant turns out to be a particular hypercontractivity parameter of (X,Y), rather than their squared maximal correlation. We also provide alternate geometric characterizations for both maximal correlation as well as the hypercontractivity parameter that characterizes the data-processing inequality.
2014 IEEE International Symposium on Information Theory (ISIT); 06/2014
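To make the maximal correlation mentioned in this abstract concrete, here is a minimal sketch for the special case of binary X and Y, where every function of a binary variable is affine and so the maximal correlation equals the absolute value of the Pearson correlation coefficient. The function name and the example joint distribution (a doubly symmetric binary source with crossover probability 0.1) are illustrative assumptions, not taken from the paper, and the hypercontractivity parameter itself is not computed here.

```python
import math

def binary_maximal_correlation(joint):
    """Maximal correlation of binary (X, Y) given a 2x2 joint pmf.

    joint[x][y] = P(X = x, Y = y). For binary variables every function is
    affine, so the maximal correlation equals |Pearson correlation|.
    """
    px1 = joint[1][0] + joint[1][1]   # P(X = 1)
    py1 = joint[0][1] + joint[1][1]   # P(Y = 1)
    cov = joint[1][1] - px1 * py1     # Cov(X, Y) for 0/1-valued X and Y
    var_x = px1 * (1.0 - px1)
    var_y = py1 * (1.0 - py1)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0                    # a constant variable has zero correlation
    return abs(cov) / math.sqrt(var_x * var_y)

# Hypothetical example: doubly symmetric binary source with crossover alpha.
alpha = 0.1
dsbs = [[(1 - alpha) / 2, alpha / 2],
        [alpha / 2, (1 - alpha) / 2]]
rho = binary_maximal_correlation(dsbs)
print(rho)  # approximately 1 - 2 * alpha
```

For alphabets larger than binary this shortcut no longer applies, and a singular value computation on the normalized joint distribution matrix would be the usual route.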
Conference Paper: Data-driven weak universal redundancy
ABSTRACT: In applications involving estimation, the relevant model classes of probability distributions are often too complex to admit estimators that converge to the truth with convergence rates that can be uniformly bounded over the entire model class as the sample size increases (uniform consistency). While it is often possible to get pointwise guarantees, so that the convergence rate of the estimator can be bounded in a model-dependent way, such pointwise guarantees are unsatisfactory: estimator performance is a function of the very unknown quantity that is being estimated. Therefore, even if an estimator is consistent, how well it is doing may not be clear no matter what the sample size. Departing from this traditional uniform/pointwise dichotomy, a new analysis framework is explored by characterizing model classes of probability distributions that may only admit pointwise guarantees, yet where all the information about the unknown model needed to gauge estimator accuracy can be inferred from the sample at hand. To provide a focus to this suggested broad new paradigm, we analyze the universal compression problem in this data-driven pointwise consistency framework.
2014 IEEE International Symposium on Information Theory (ISIT); 06/2014
Conference Paper: Convex relative entropy decay in Markov chains
ABSTRACT: We look at irreducible continuous time Markov chains with a finite or countably infinite number of states, and a unique stationary distribution π. If the Markov chain has distribution μ_t at time t, its relative entropy to stationarity is denoted by h(μ_t ‖ π). This is a monotonically decreasing function of time, and decays to 0 at an exponential rate in most natural examples of Markov chains arising in applications. In this paper, we focus on the second derivative properties of h(μ_t ‖ π). In particular we examine when relative entropy to stationarity exhibits convex decay, independent of the starting distribution. It has been shown that convexity of h(μ_t ‖ π) in a Markov chain can lead to sharper bounds on the rate of relative entropy decay, and thus on the mixing time of the Markov chain. We study certain finite state Markov chains as well as countable state Markov chains arising from stable Jackson queueing networks.
2014 48th Annual Conference on Information Sciences and Systems (CISS); 03/2014
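The quantity studied in this abstract can be illustrated numerically on the simplest possible case. The sketch below, which is not from the paper, evaluates the relative entropy to stationarity along the trajectory of a two-state continuous time Markov chain and inspects its discrete second differences on a time grid. The transition rates, the starting distribution, and all function names are assumptions chosen for illustration; the second-difference check is only a numerical indication for this one example, not a proof of convexity.

```python
import math

def two_state_distribution(mu0, a, b, t):
    """Distribution at time t of a two-state chain.

    a is the rate 0 -> 1, b is the rate 1 -> 0, and mu0 = P(X_0 = 0).
    Solves dp0/dt = b - (a + b) * p0 in closed form.
    """
    pi0 = b / (a + b)
    p0 = pi0 + (mu0 - pi0) * math.exp(-(a + b) * t)
    return (p0, 1.0 - p0)

def relative_entropy(mu, pi):
    """Relative entropy of mu with respect to pi, in nats."""
    return sum(m * math.log(m / p) for m, p in zip(mu, pi) if m > 0.0)

# Hypothetical rates and starting point.
a, b = 1.0, 2.0
pi = (b / (a + b), a / (a + b))
times = [0.1 * k for k in range(31)]
h = [relative_entropy(two_state_distribution(0.05, a, b, t), pi) for t in times]

# Discrete second differences; nonnegative values suggest convex decay on the grid.
second_diffs = [h[k + 1] - 2 * h[k] + h[k - 1] for k in range(1, len(h) - 1)]
looks_convex = all(d >= -1e-12 for d in second_diffs)
print(h[0], h[-1], looks_convex)
```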
ABSTRACT: Hypercontractivity has had many successful applications in mathematics, physics, and theoretical computer science. In this work we use recently established properties of the hypercontractivity ribbon of a pair of random variables to study a recent conjecture regarding the mutual information between binary functions of the individual marginal sequences of a sequence of pairs of random variables drawn from a doubly symmetric binary source.
2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton); 10/2013
ABSTRACT: In this paper we provide a new geometric characterization of the Hirschfeld-Gebelein-Rényi maximal correlation of a pair of random variables $(X,Y)$, as well as of the chordal slope of the nontrivial boundary of the hypercontractivity ribbon of $(X,Y)$ at infinity. The new characterizations lead to simple proofs for some of the known facts about these quantities. We also provide a counterexample to a data processing inequality claimed by Erkip and Cover, and find the correct tight constant for this kind of inequality.
04/2013
Conference Paper: Improved cardinality bounds on the auxiliary random variables in Marton's inner bound
ABSTRACT: Marton's region is the best known inner bound for a general discrete memoryless broadcast channel. We establish improved bounds on the cardinalities of the auxiliary random variables. We combine the perturbation technique along with a representation using concave envelopes to achieve this improvement. As a corollary of this result, we show that a randomized time division strategy achieves the entire Marton's region for binary input broadcast channels, extending the previously known result for the sum-rate and validating a previous conjecture due to the same authors.
Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on; 01/2013
ABSTRACT: Motivated by problems in insurance, our task is to predict finite upper bounds on a future draw from an unknown distribution $p$ over the set of natural numbers. We can only use past observations generated independently and identically distributed according to $p$. While $p$ is unknown, it is known to belong to a given collection ${\cal P}$ of probability distributions on the natural numbers. The support of the distributions $p \in {\cal P}$ may be unbounded, and the prediction game goes on for \emph{infinitely} many draws. We are allowed to make observations without predicting upper bounds for some time. But we must, with probability 1, start and then continue to predict upper bounds after a finite time irrespective of which $p \in {\cal P}$ governs the data. If it is possible, without knowledge of $p$ and for any prescribed confidence however close to 1, to come up with a sequence of upper bounds that is never violated over an infinite time window with confidence at least as big as prescribed, we say the model class ${\cal P}$ is \emph{insurable}. We completely characterize the insurability of any class ${\cal P}$ of distributions over natural numbers by means of a condition on the neighborhoods of distributions in ${\cal P}$ that is both necessary and sufficient.
12/2012
ABSTRACT: Shannon's Entropy Power Inequality can be viewed as characterizing the minimum differential entropy achievable by the sum of two independent random variables with fixed differential entropies. The entropy power inequality has played a key role in resolving a number of problems in information theory. It is therefore interesting to examine the existence of a similar inequality for discrete random variables. In this paper we obtain an entropy power inequality for random variables taking values in an abelian group of order 2^n, i.e. for such a group G we explicitly characterize the function f_G(x,y) giving the minimum entropy of the sum of two independent G-valued random variables with respective entropies x and y. Random variables achieving the extremum in this inequality are thus the analogs of Gaussians in this case, and these are also determined. It turns out that f_G(x,y) is convex in x for fixed y and, by symmetry, convex in y for fixed x. This is a generalization to abelian groups of order 2^n of the result known as Mrs. Gerber's Lemma.
07/2012
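For the smallest case, the group of order 2, the function f_G(x,y) reduces to the expression that appears in the classical Mrs. Gerber's Lemma: invert the binary entropy function on [0, 1/2], combine the two parameters by binary convolution, and take the binary entropy again. The sketch below evaluates this binary case only; it is not the paper's construction for general groups of order 2^n, and the function names and the bisection tolerance are assumptions made for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def inverse_binary_entropy(x, tol=1e-12):
    """Return p in [0, 1/2] with binary_entropy(p) close to x, by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f_z2(x, y):
    """Entropy (bits) of X + Y mod 2 for independent binary X, Y with entropies x, y."""
    p = inverse_binary_entropy(x)
    q = inverse_binary_entropy(y)
    return binary_entropy(p * (1 - q) + q * (1 - p))  # binary convolution of p and q

print(f_z2(0.5, 0.5))
```

In the binary case a distribution with a given entropy is unique up to relabeling, so this value is the entropy of the sum rather than merely a lower bound; the minimization only becomes nontrivial for larger groups.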
ABSTRACT: A positive recurrent, aperiodic Markov chain is said to be long-range dependent (LRD) when the indicator function of a particular state is LRD. This happens if and only if the return time distribution for that state has infinite variance. We investigate the question of whether other instantaneous functions of the Markov chain also inherit this property. We provide conditions under which the function has the same degree of long-range dependence as the chain itself. We illustrate our results through three examples in diverse fields: queueing networks, source compression, and finance.
Journal of Applied Probability 06/2012; 49(2012). · 0.69 Impact Factor
ABSTRACT: Stability and convergence properties of stochastic approximation algorithms are analyzed when the noise includes a long range dependent component (modeled by a fractional Brownian motion) and a heavy tailed component (modeled by a symmetric stable process), in addition to the usual ‘martingale noise’. This is motivated by the emergent applications in communications. The proofs are based on comparing suitably interpolated iterates with a limiting ordinary differential equation. Related issues such as asynchronous implementations, Markov noise, etc. are briefly discussed.
Queueing Systems 06/2012; 71(1-2). · 0.60 Impact Factor
ABSTRACT: Seventeen years ago at the ITW that was held in Moscow, I organized a similar panel on the future of Information Theory with the participation of Dick Blahut, Imre Csiszár, Dave Forney, Prakash Narayan and Mark Pinsker. In preparation for this panel I have asked our panelists to read the transcript of that panel (published in the December 1994 issue of this newsletter) and discuss the ways in which that panel’s predictions were and were not accurate.
IEEE Information Theory Society Newsletter. 03/2012
ABSTRACT: Marton's inner bound is the best known achievable region for a general discrete memoryless broadcast channel. To compute Marton's inner bound one has to solve an optimization problem over a set of joint distributions on the input and auxiliary random variables. The optimizers turn out to be structured in many cases. Finding properties of optimizers not only results in efficient evaluation of the region, but it may also help one to prove factorization of Marton's inner bound (and thus its optimality). The first part of this paper formulates this factorization approach explicitly and states some conjectures and results along this line. The second part of this paper focuses primarily on the structure of the optimizers. This section is inspired by a new binary inequality that recently resulted in a very simple characterization of the sum-rate of Marton's inner bound for binary input broadcast channels. This prompted us to investigate whether this inequality can be extended to larger cardinality input alphabets. We show that several of the results for the binary input case do carry over for higher cardinality alphabets and we present a collection of results that help restrict the search space of probability distributions to evaluate the boundary of Marton's inner bound in the general case. We also prove a new inequality for the binary skew-symmetric broadcast channel that yields a very simple characterization of the entire Marton inner bound for this channel.
02/2012
Conference Paper: Prediction over countable alphabets
ABSTRACT: We consider the problem of predicting finite upper bounds on unseen samples of an unknown distribution p over the set of natural numbers, using only observations generated i.i.d. from it. While p is unknown, it belongs to a known collection P of possible models. This problem is motivated from an insurance setup. The distribution p is a probabilistic model for loss, and each sample from p stands for the total loss incurred by the insured at a particular time step. The upper bound plays the role of the total built up reserves of an insurer, including past premiums after paying out past losses, as well as the current premium charged in order to cover future losses. Thus, if an insurer can accurately upper bound future unseen losses, premiums can be set so that the insurer will not be bankrupted. However, is it possible for the insurer to set premiums so that the probability of bankruptcy can be made arbitrarily small, even when the possible loss is unbounded, the underlying loss model unknown, and the game proceeds for an infinitely long time? Equivalently, when is P insurable? We derive a condition that is both necessary and sufficient for any class P of distributions to be insurable.
Information Sciences and Systems (CISS), 2012 46th Annual Conference on; 01/2012
ABSTRACT: We discuss functions of long range dependent Markov chains. We state sufficient conditions under which an instantaneous function of a long range dependent Markov chain has the same Hurst index as the underlying chain. We discuss several applications of the theorem in the fields of information theory, queuing networks, and finance.
01/2012
Conference Paper: Stable, distributed P2P protocols based on random peer sampling
ABSTRACT: In a peer-to-peer file sharing system based on random contacts where the upload capacity of the seed is small, a single chunk of the file may become rare, causing an accumulation of peers who lack the rare chunk. To prevent this from happening, we propose a protocol where each peer samples a small population of peers and makes an intelligent decision to pick which chunk to download based on this sample. We prove that the resulting system is stable under any arrival rate of peers even if the seed has small, bounded upload capacity.
Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on; 01/2012
Conference Paper: Universal algorithms: Building a case for pointwise convergence
ABSTRACT: We consider algorithms for prediction, compression and entropy estimation in a universal setup. In each case, we estimate some function of an unknown distribution p over the set of natural numbers, using only n observations generated i.i.d. from p. While p is unknown, it belongs to a known collection P of possible models. When the supports of distributions in P are uniformly bounded, consistent algorithms exist for each of the problems. Namely, the convergence of the estimate to the true value can be bounded by a function depending only on the sample size, n, and not on the underlying distribution p. However, when the supports of distributions in P are not uniformly bounded, a more natural approach involves algorithms that are pointwise consistent, namely, the convergence to the true value is at a rate that depends on both n and the underlying (unknown) distribution p. The obvious practical difficulty with pointwise convergence is that the asymptotic consistency of the algorithm may indicate nothing about the performance of the algorithm for any fixed sample size, since the underlying distribution is unknown. In this paper, we first note that for many complex model classes P, we can still circumvent the above practical difficulty with pointwise convergence. Secondly, we take here a preliminary step towards characterizing a broad framework establishing the hierarchy of difficulty of problems involving pointwise convergence. We look for connections among the following problems which we define for a pointwise convergence scenario: (i) predicting good upper bounds on the next unseen sample, (ii) weak universal compression, and (iii) entropy estimation. We construct counterexamples to show that no two properties above imply the third.
Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on; 01/2012
Publication Stats
5k  Citations  
151.45  Total Impact Points  
Institutions

1984–2012

University of California, Berkeley
 Department of Electrical Engineering and Computer Sciences
Berkeley, California, United States


2008–2010

Massachusetts Institute of Technology
 Department of Electrical Engineering and Computer Science
Cambridge, MA, United States


1989–2009

University of Michigan
 Department of Electrical Engineering and Computer Science (EECS)
Ann Arbor, MI, United States


1987–2008

Cornell University
 Department of Electrical and Computer Engineering
Ithaca, NY, United States


2007

Tata Institute of Fundamental Research
 School of Technology and Computer Science
Mumbai, Mahārāshtra, India


2006

École Polytechnique Fédérale de Lausanne
Lausanne, Vaud, Switzerland


2004

University of California, Davis
 Department of Electrical and Computer Engineering
Davis, CA, United States 
University of Illinois, Urbana-Champaign
 Department of Electrical and Computer Engineering
Urbana, Illinois, United States


2002

University of Maryland, College Park
 Department of Electrical & Computer Engineering
College Park, MD, United States


1996–1999

Stanford University
 Department of Electrical Engineering
Stanford, CA, United States


1994–1995

University of Texas at Austin
 Department of Electrical & Computer Engineering
Austin, TX, United States


1990

University of Wisconsin, Madison
 Department of Electrical and Computer Engineering
Madison, WI, United States
