# Sergio Verdú's research while affiliated with Princeton University and other places

**What is this page?**

This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.


## Publications (232)

Over the last six decades, the representation of error exponent functions for data transmission through noisy channels at rates below capacity has seen three distinct approaches: (1) Through Gallager’s E0 functions (with and without cost constraints); (2) large deviations form, in terms of conditional relative entropy and mutual information; (3) th...

The Brascamp-Lieb inequality in functional analysis can be viewed as a measure of the “uncorrelatedness” of a joint probability distribution. We define the smooth Brascamp-Lieb (BL) divergence as the infimum of the best constant in the Brascamp-Lieb inequality under a perturbation of the joint probability distribution. An information spectrum upper...

Rényi-type generalizations of entropy, relative entropy and mutual information have found numerous applications throughout information theory and beyond. While there is consensus that the ways A. Rényi generalized entropy and relative entropy in 1961 are the “right” ones, several candidates have been put forth as possible mutual informations of ord...
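Rényi's 1961 definitions mentioned above are concrete enough to sketch numerically. The following is an illustrative implementation (function names are mine, not taken from any paper) of the Rényi entropy and Rényi divergence of order α for discrete distributions, with the Shannon quantities recovered at α = 1:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy H_alpha(P) = log(sum_i p_i^alpha) / (1 - alpha), in nats.
    At alpha = 1 this reduces to the Shannon entropy."""
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p, where=p > 0, out=np.zeros_like(p))))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P||Q) = log(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1),
    in nats; at alpha = 1 it reduces to the relative entropy D(P||Q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q, where=p > 0, out=np.zeros_like(p))))
    return float(np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))
```

For the uniform distribution all orders coincide (H_α = log of the alphabet size), and D_α(P‖Q) is nondecreasing in α, two standard sanity checks on the definitions.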

A fundamental tool in network information theory is the covering lemma, which lower bounds the probability that there exists a pair of random variables, among a given number of independently generated candidates, falling within a given set. We use a weighted-sum trick and Talagrand’s concentration inequality to prove new mutual covering bounds. We...

We introduce a definition of perfect and quasi-perfect codes for discrete symmetric channels based on the packing and covering properties of generalized spheres whose shape is tilted using an auxiliary probability measure. This notion generalizes previous definitions of perfect and quasi-perfect codes and encompasses maximum distance separable code...

Verdú reformulated the covering problem in the non-asymptotic information-theoretic setting as a lower bound on the covering probability for *any* set which has a large probability under a given joint distribution. The covering probability is the probability that there exists a pair of random variables among a given number of independently g...

A strong converse shows that no procedure can beat the asymptotic (as blocklength $n\to\infty$) fundamental limit of a given information-theoretic problem for any fixed error probability. A second-order converse strengthens this conclusion by showing that the asymptotic fundamental limit cannot be exceeded by more than $O(\tfrac{1}{\sqrt{n}})$. Whi...

Inspired by the forward and the reverse channels from the image-size characterization problem in network information theory, we introduce a functional inequality that unifies both the Brascamp-Lieb inequality and Barthe’s inequality, which is a reverse form of the Brascamp-Lieb inequality. For Polish spaces, we prove its equivalent entropic formula...

We introduce a definition of perfect and quasi-perfect codes for symmetric channels parametrized by an auxiliary output distribution. This notion generalizes previous definitions of perfect and quasi-perfect codes and encompasses maximum distance separable codes. The error probability of these codes, whenever they exist, is shown to coincide with t...

Inspired by the forward and the reverse channels from the image-size characterization problem in network information theory, we introduce a functional inequality which unifies both the Brascamp-Lieb inequality and Barthe's inequality, which is a reverse form of the Brascamp-Lieb inequality. For Polish spaces, we prove its equivalent entropic formul...

The redundancy for universal lossless compression of discrete memoryless sources in Campbell’s setting is characterized as a minimax Rényi divergence, which is shown to be equal to the maximal α-mutual information via a generalized redundancy-capacity theorem. Special attention is placed on the analysis of the asymptotics of minimax Rényi divergenc...

In this work we relax the usual separability assumption made in rate-distortion literature and propose f-separable distortion measures, which are well suited to model non-linear penalties. The main insight behind f-separable distortion measures is to define an n-letter distortion measure to be an f-mean of single-letter distortions. We prove a rate...
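The f-mean construction described above is simple to state concretely. Below is a minimal sketch (the names and the choice f(t) = eᵗ are mine, purely illustrative): the n-letter distortion is f⁻¹ applied to the arithmetic mean of f applied to the single-letter distortions, so f(t) = t recovers the usual separable distortion.

```python
import math

def f_separable_distortion(x, y, d, f, f_inv):
    # n-letter distortion as the f-mean of single-letter distortions:
    # d_n(x, y) = f^{-1}( (1/n) * sum_i f(d(x_i, y_i)) )
    vals = [f(d(a, b)) for a, b in zip(x, y)]
    return f_inv(sum(vals) / len(vals))

hamming = lambda a, b: 0.0 if a == b else 1.0

# f(t) = t: the ordinary separable (arithmetic-mean) distortion
avg = f_separable_distortion([0, 0, 1, 1], [0, 1, 1, 0],
                             hamming, lambda t: t, lambda t: t)

# f(t) = exp(t): an f-mean that penalizes large per-letter distortions more
exp_mean = f_separable_distortion([0, 0, 1, 1], [0, 1, 1, 0],
                                  hamming, math.exp, math.log)
```

By Jensen's inequality, a convex f such as the exponential yields an n-letter distortion at least as large as the plain average, which is exactly the "non-linear penalty" effect the abstract describes.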

A basic two-terminal secret key generation model is considered, where the interactive communication rate between the terminals may be limited, and in particular may not be enough to achieve the maximum key rate. We first prove a multiletter characterization of the key-communication rate region (where the number of auxiliary random variables depend...

This paper presents conditional versions of the Lempel-Ziv (LZ) algorithm for settings where the compressor and decompressor have access to the same side information. We propose a fixed-length-parsing LZ algorithm with side information, motivated by the Willems algorithm, and prove its optimality for any stationary process. In addition, we suggest strate...

This paper considers the problem of lossy source coding with a specific distortion measure: logarithmic loss. The focus of this paper is on the single-shot approach, which exposes crisply the connection between lossless source coding with list decoding and lossy source coding with log-loss. Fixed-length and variable-length bounds are presented. Fixe...

This paper quantifies the fundamental limits of variable-length transmission of a general (possibly analog) source over a memoryless channel with noiseless feedback, under a distortion constraint. We consider excess distortion, average distortion and guaranteed distortion (d-semifaithful codes). In contrast to the asymptotic fundamental limit, a ge...

We introduce an inequality which may be viewed as a generalization of both the Brascamp-Lieb inequality and its reverse (Barthe's inequality), and prove its information-theoretic (i.e., entropic) formulation. This result leads to a unified approach to functional inequalities such as the variational formula of Rényi entropy, hypercontractivity and...

This paper gives upper and lower bounds on the minimum error probability of Bayesian $M$-ary hypothesis testing in terms of the Arimoto-Rényi conditional entropy of an arbitrary order $\alpha$. The improved tightness of these bounds over their specialized versions with the Shannon conditional entropy ($\alpha=1$) is demonstrated. In particular, i...

The redundancy for universal lossless compression in Campbell's setting is characterized as a minimax Rényi divergence, which is shown to be equal to the maximal $\alpha$-mutual information via a generalized redundancy-capacity theorem. Special attention is placed on the analysis of the asymptotics of minimax Rényi divergence, which is determin...

This paper develops systematic approaches to obtain f-divergence inequalities, dealing with pairs of probability measures defined on arbitrary alphabets. Functional domination is one such approach, where special emphasis is placed on finding the best possible constant upper bounding a ratio of f-divergences. Another approach used for the derivati...

This paper considers derivation of f-divergence inequalities via the approach of functional domination. Bounds on an f-divergence based on one or several other f-divergences are introduced, dealing with pairs of probability measures defined on arbitrary alphabets. In addition, a variety of bounds are shown to hold under boundedness assumptions on t...

This paper considers derivation of $f$-divergence inequalities via the approach of functional domination. Bounds on an $f$-divergence based on one or several other $f$-divergences are introduced, dealing with pairs of probability measures defined on arbitrary alphabets. In addition, a variety of bounds are shown to hold under boundedness assumption...

We generalize a result by Carlen and Cordero-Erausquin on the equivalence between the Brascamp-Lieb inequality and the subadditivity of relative entropy by allowing for random transformations (a broadcast channel). This leads to a unified perspective on several functional inequalities that have been gaining popularity in the context of proving impo...

We study the infimum of the best constant in a functional inequality, the Brascamp-Lieb-like inequality, over auxiliary measures within a neighborhood of a product distribution. In the finite alphabet and the Gaussian cases, such an infimum converges to the best constant in a mutual information inequality. Implications for strong converse propertie...

The basic two-terminal common randomness (CR) or key generation model is considered, where the communication between the terminals may be limited, and in particular may not be enough to achieve the maximal CR/key rate. We introduce a general framework of $XY$-absolutely continuous distributions and $XY$-concave functions, and characterize the first...

The conventional channel resolvability refers to the minimum rate needed for an input process to approximate an output distribution of a channel in total variation distance. In this paper we study $E_{\gamma}$-resolvability, which replaces total variation distance by the more general $E_{\gamma}$ distance. A general one-shot achievability bound for...

Variable-length channel codes over discrete memoryless channels subject to probabilistic delay guarantees are examined in the non-vanishing error probability regime. Fundamental limits of these codes in several different settings, which depend on the availability of noiseless feedback and a termination option, are investigated. In stark contrast wi...

This paper studies bounds among various $f$-divergences, dealing with arbitrary alphabets and deriving bounds on the ratios of various distance measures. Special attention is placed on bounds in terms of the total variation distance, including "reverse Pinsker inequalities," as well as on the $E_\gamma$ divergence, which generalizes the total varia...

We investigate the minimum transmitted energy required to reproduce k source samples with a given fidelity after transmission over a memoryless Gaussian channel. In particular, we analyze the reduction in transmitted energy that accrues thanks to the availability of noiseless feedback. Allowing a nonvanishing excess distortion probability ε boosts...

In information theory, the packing and covering lemmas are conventionally used in conjunction with the typical sequence approach in order to prove the asymptotic achievability results for discrete memoryless systems. In contrast, the single-shot approach in information theory provides non-asymptotic achievability and converse results, which are use...

We study the amount of randomness needed for an input process to approximate a given output distribution of a channel in the $E_{\gamma}$ distance. A general one-shot achievability bound for the precision of such an approximation is developed. In the i.i.d. setting where $\gamma=\exp(nE)$, a (nonnegative) randomness rate above $\inf_{Q_{\sf U}: D(Q...

A new model of multi-party secret key agreement is proposed, in which one terminal called the communicator can transmit public messages to other terminals before all terminals agree on a secret key. A single-letter characterization of the achievable region is derived in the stationary memoryless case. The new model generalizes some other (old and n...

By developing one-shot mutual covering lemmas, we derive a one-shot achievability bound for broadcast with a common message which recovers Marton's inner bound (with three auxiliary random variables) in the i.i.d. case. The encoder employed is deterministic. Relationship between the mutual covering lemma and a new type of channel resolvability prob...

This paper provides a necessary condition good rate-distortion codes must satisfy. Specifically, it is shown that as the blocklength increases, the distribution of the input given the output of a good lossy code converges to the distribution of the input given the output of the joint distribution achieving the rate-distortion function, in terms of...

We show that for product sources, rate splitting is optimal for secret key agreement using limited one-way communication between two terminals. This yields an alternative proof of the tensorization property of a strong data processing inequality originally studied by Erkip and Cover and amended recently by Anantharam et al. We derive a "water-filli...

This paper gives non-asymptotic converse bounds on the cumulant generating function of the encoded lengths in variable-rate lossy compression and in variable-to-fixed channel coding. The results are given in terms of the Rényi mutual information and the d-tilted Rényi entropy. We also illustrate the application of the non-asymptotic bounds to obtai...

This paper analyzes the distribution of the codeword lengths of the optimal lossless compression code without prefix constraints both in the non-asymptotic regime and in the asymptotic regime. The technique we use is based on upper and lower bounding the cumulant generating function of the optimum codeword lengths. In the context of prefix codes, t...

We show that for product sources, rate splitting is optimal for secret key agreement using limited one-way communication at two terminals. This yields an alternative proof of the tensorization property of a strong data processing inequality originally studied by Erkip and Cover and amended recently by Anantharam et al. We derive a 'water-filling' s...

We give explicit expressions, upper and lower bounds on the total variation distance between P and Q in terms of the distribution of the random variables log dP/dQ(X) and log dP/dQ(Y), where X and Y are distributed according to P and Q, respectively.
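One such expression can be checked directly in the finite-alphabet case. The sketch below (my own illustration, not the paper's notation) computes the total variation distance both from its definition and through the log-likelihood ratio L(x) = log dP/dQ(x), using the identity TV(P, Q) = E_P[(1 − e^(−L(X)))⁺], valid when P and Q have full support:

```python
import numpy as np

def tv_direct(p, q):
    # Total variation distance: (1/2) * sum_i |p_i - q_i|
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

def tv_from_llr(p, q):
    # TV through the log-likelihood ratio L(x) = log dP/dQ(x):
    # TV(P, Q) = E_P[ (1 - exp(-L(X)))^+ ]   (full-support P and Q)
    p, q = np.asarray(p, float), np.asarray(q, float)
    llr = np.log(p / q)
    return float(np.sum(p * np.maximum(1.0 - np.exp(-llr), 0.0)))
```

The two computations agree because E_P[(1 − dQ/dP)⁺] = Σ_x (P(x) − Q(x))⁺, which is exactly the total variation distance.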

This paper provides an extensive study of the behavior of the best achievable rate (and other related fundamental limits) in variable-length strictly lossless compression. In the non-asymptotic regime, the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are shown to be tightly coupled. Several precis...

This paper shows new general nonasymptotic achievability and converse bounds and performs their dispersion analysis for the lossy compression problem in which the compressor observes the source through a noisy channel. While this problem is asymptotically equivalent to a noiseless lossy source coding problem with a modified distortion function, non...

This paper shows the strong converse and the dispersion of memoryless channels with cost constraints and performs refined analysis of the third order term in the asymptotic expansion of the maximum achievable channel coding rate, showing that it is equal to $\frac{1}{2}\frac{\log n}{n}$ in most cases of interest. The analysis is based on a new non-...

The dependence-testing (DT) bound is one of the strongest achievability bounds for the binary erasure channel (BEC) in the finite block length regime. In this paper, we show that maximum likelihood decoded regular low-density parity-check (LDPC) codes with at least 5 ones per column almost achieve the DT bound. Specifically, using quasi-regular LDP...

This work deals with the fundamental limits of strictly-lossless variable-length compression of known sources without prefix constraints. The source dispersion characterizes the time-horizon over which it is necessary to code in order to approach the entropy rate within a pre-specified tolerance. We show that for a large class of sources, the disp...

This paper provides an extensive study of the behavior of the best achievable rate (and other related fundamental limits) in variable-length lossless compression. In the non-asymptotic regime, the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are shown to be tightly coupled. Several precise, quanti...

Invoking random coding, but not typical sequences, we give non-asymptotic achievability results for the major setups in multiuser information theory. No limitations, such as memorylessness or discreteness, on sources/channels are imposed. All the bounds given are powerful enough to yield the constructive side of the (asymptotic) capacity regions in...

We revisit the dilemma of whether one should or should not code when operating under delay constraints. In those curious cases when the source and the channel are probabilistically matched so that symbol-by-symbol coding is optimal in terms of the average distortion achieved, we show that it also achieves the dispersion of joint source-channel codi...

This paper shows new finite-blocklength converse bounds applicable to lossy source coding as well as joint source-channel coding, which are tight enough not only to prove the strong converse, but to find the rate-dispersion functions in both setups. In order to state the converses, we introduce the d-tilted information, a random variable whose expe...

This paper considers the distribution of the optimum rate of fixed-to-variable lossless compression. It shows that in the non-asymptotic regime the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are tightly coupled.

Consider a Bernoulli-Gaussian complex n-vector whose components are X_i B_i, with B_i ~ Bernoulli(q) and X_i ~ CN(0, σ²), i.i.d. across i and mutually independent. This random q-sparse vector is multiplied by a random matrix U, and a randomly chosen subset of the components of average size np, p ∈ [0, 1]...

We give a general formula for the degrees of freedom of the K-user real additive-noise interference channel involving maximization of information dimension. Previous results are recovered, and even generalized in certain cases with simplified proofs. Connections to fractal geometry are drawn.

The backoff from capacity due to finite blocklength can be assessed accurately from the channel dispersion. This paper analyzes the dispersion of a single-user, scalar, coherent fading channel with additive Gaussian noise. We obtain a convenient two-term expression for the channel dispersion which shows that, unlike the capacity, it depends crucial...

This paper studies the minimum achievable source coding rate as a function of blocklength n and tolerable distortion level d. Tight general achievability and converse bounds are derived that hold at arbitrary fixed blocklength. For stationary memoryless sources with separable distortion, the minimum rate achievable is shown to be closely approxim...

If N is standard Gaussian, the minimum mean square error (MMSE) of estimating a random variable X based on √snr·X + N vanishes at least as fast as 1/snr as snr → ∞. We define the MMSE dimension of X as the limit as snr → ∞ of the product of snr and the MMSE. MMSE dimension is also shown to be the asymptotic ratio of nonlinear MMSE to linear MM...

The minimum expected length for fixed-to-variable length encoding of an n-block memoryless source with entropy H grows as nH + O(1), where the term O(1) lies between 0 and 1. However, this well-known performance is obtained under the implicit constraint that the code assigned to the whole n-block is a prefix code. Dropping the prefix constraint, wh...
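The effect of dropping the prefix constraint is easy to see in a small example. The sketch below is my own illustration: without prefix constraints, the optimal ("one-to-one") code assigns the i-th most probable outcome the i-th binary string in length-lexicographic order (counting the empty string as i = 1), whose length is ⌊log₂ i⌋.

```python
import math

def one_to_one_expected_length(probs):
    # Assign the i-th most probable outcome the i-th binary string in
    # length-lexicographic order ("", 0, 1, 00, 01, ...), whose length
    # is floor(log2 i) for i = 1, 2, ...
    ranked = sorted(probs, reverse=True)
    return sum(p * math.floor(math.log2(i)) for i, p in enumerate(ranked, start=1))

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform 4-letter source: codeword lengths 0, 1, 1, 2 give an expected
# length of 1 bit, strictly below the 2-bit entropy -- impossible for
# any prefix code, whose expected length is lower bounded by the entropy.
```

This illustrates why the expected length of the best non-prefix code can fall below the entropy for a single block, which is the phenomenon behind the sub-nH behavior discussed in the abstract.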

We explore the duality between lossy compression and channel coding in the operational sense: whether a capacity-achieving encoder-decoder sequence achieves the rate-distortion function of the dual problem when the channel decoder [encoder] is the source compressor [decompressor, resp.], and vice versa. We show that, if used as a lossy compressor,...

Channel dispersion plays a fundamental role in assessing the backoff from capacity due to finite blocklength. This paper analyzes the channel dispersion for a simple channel with memory: the Gilbert-Elliott communication model in which the crossover probability of a binary symmetric channel evolves as a binary symmetric Markov chain, with and witho...

The objectives of this article are two-fold: First, to present the problem of joint source and channel (JSC) coding from a graphical model perspective and second, to propose a structure that uses a new graphical model for jointly encoding and decoding a redundant source. In the first part of the article, relevant contributions to JSC coding, rangin...

Fano's inequality relates the error probability of guessing a finitely-valued random variable X given another random variable Y and the conditional entropy of X given Y. It is not necessarily tight when the marginal distribution of X is fixed. This paper gives a tight upper bound on the conditional entropy of X given Y in terms of the error probabi...
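For context, the classical form of Fano's inequality mentioned above, H(X|Y) ≤ h(Pₑ) + Pₑ·log₂(M − 1), can be checked numerically on a small joint distribution. The code below is my own illustration using that classical bound, not the tightened bound of the paper:

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conditional_entropy_bits(joint):
    # H(X|Y) in bits; joint is a dict {(x, y): probability}
    py = {}
    for (_, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    return -sum(p * math.log2(p / py[y]) for (_, y), p in joint.items() if p > 0)

def bayes_error(joint):
    # Minimum probability of error in guessing X from Y: 1 - sum_y max_x P(x, y)
    best = {}
    for (_, y), p in joint.items():
        best[y] = max(best.get(y, 0.0), p)
    return 1.0 - sum(best.values())

def fano_bound(pe, m):
    # Classical Fano bound: H(X|Y) <= h(Pe) + Pe * log2(M - 1), for M >= 2
    return binary_entropy(pe) + pe * math.log2(m - 1)
```

For a binary X observed through a symmetric channel with crossover 0.1, the minimum error probability is 0.1 and the bound holds with equality, a standard check that the binary case of Fano is tight.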

Denote by C_m(snr) the Gaussian channel capacity with signal-to-noise ratio snr and input cardinality m. We show that as m grows, C_m(snr) approaches C(snr) = ½ log(1 + snr) exponentially fast. Lower and upper bounds on the exponent are given as functions of snr. We propose a family of input constellations based on the roots o...

A random variable with distribution P is observed in Gaussian noise and is estimated by a minimum mean-square estimator that assumes that the distribution is Q. This paper shows that the integral over all signal-to-noise ratios of the excess mean-square estimation error incurred by the mismatched estimator is twice the relative entropy D(P∥Q). This...
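In the special case where both P and Q are zero-mean scalar Gaussians, the matched and mismatched MMSE estimators are both linear, and the identity stated above can be verified by direct numerical integration. The sketch below is my own check under those assumptions (variable names are illustrative): it compares the integral of the excess MSE over all SNRs with 2·D(P‖Q).

```python
import numpy as np

def excess_mse(snr, sq):
    # True prior P = N(0, 1), observation Y = sqrt(snr)*X + N with N ~ N(0, 1).
    # Matched linear gain: sqrt(snr)/(1 + snr); gain of the estimator
    # designed for Q = N(0, sq): sqrt(snr)*sq/(1 + snr*sq).
    c_p = np.sqrt(snr) / (1.0 + snr)
    c_q = np.sqrt(snr) * sq / (1.0 + snr * sq)
    return (c_p - c_q) ** 2 * (snr + 1.0)   # E[Y^2] = snr + 1 under P

def kl_gauss(sq):
    # D(N(0,1) || N(0,sq)) in nats
    return 0.5 * (1.0 / sq - 1.0 + np.log(sq))

sq = 2.0
snr = np.linspace(0.0, 2000.0, 400001)
vals = excess_mse(snr, sq)
# Trapezoidal rule over the (truncated) SNR axis
integral = float(np.sum((vals[1:] + vals[:-1]) * np.diff(snr)) / 2.0)
# integral is close to 2 * kl_gauss(sq) = log 2 - 1/2 for sq = 2
```

The small residual gap comes from truncating the SNR axis at a finite value; the integrand decays like 1/snr², so the tail contribution shrinks as the upper limit grows.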

We explore the duality between the Gelfand-Pinsker problem of channel coding with side information at the transmitter and the Wyner-Ziv problem of lossy compression with side information at the decompressor in the operational sense: whether a capacity-achieving encoder-decoder sequence achieves the rate distortion function of the dual problem when...

We consider the Wyner-Ziv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by Lempel-Ziv (LZ) compression, while the decom...

In this paper, we investigate the linear precoding and power allocation policies that maximize the mutual information for general multiple-input-multiple-output (MIMO) Gaussian channels with arbitrary input distributions, by capitalizing on the relationship between mutual information and minimum mean-square error (MMSE). The optimal linear precoder...

The energy-distortion tradeoff for lossy transmission of sources over multi-user networks is studied. The energy-distortion function E(D) is defined as the minimum energy required to transmit a source to the receiver within the target distortion D, when there is no restriction on the number of channel uses per source sample. For point-to-point chan...

We find the capacity of discrete-time channels subject to both frequency-selective and time-selective fading, where the channel output is observed in additive Gaussian noise. A coherent model is assumed where the fading coefficients are known at the receiver. Capacity depends on the first-order distributions of the fading processes in frequency and...

The energy-distortion function E(D) for the joint source-channel coding problem in networks is defined and studied. The energy-distortion function E(D) is defined as the minimum energy required to transmit a source to a receiver within the target distortion D, when there is no restriction on the number of channel uses per source sample. For point-t...

This paper considers a three-terminal communication problem with one source node which broadcasts a common message to two destination nodes over a wireless medium. The destination nodes can cooperate over bidirectional wireless links. We study the minimum energy per information bit for this setup when there is no constraint on the available bandwid...

In Shannon theory, lossless source coding deals with the optimal compression of discrete sources. Compressed sensing is a lossless coding strategy for analog sources by means of multiplication by real-valued matrices. In this paper we study almost lossless analog compression for analog memoryless sources in an information-theoretic framework, in wh...

The minimum block-length required to achieve a given rate and error probability can be easily and tightly approximated from two key channel parameters: the capacity and the channel dispersion. The channel dispersion gauges the variability of the channel relative to a deterministic bit pipe with the same capacity. This paper finds the dispersion of...

We consider scaling laws for maximal energy efficiency of communicating a message to all the nodes in a random wireless network, as the number of nodes in the network becomes large. Two cases of large wireless networks are studied: dense random networks and constant-density (extended) random networks. We first establish an information-theoretic lo...

Conventional wisdom states that the minimum expected length for fixed-to-variable length encoding of an n-block memoryless source with entropy H grows as nH+O(1). However, this performance is obtained under the constraint that the code assigned to the whole n-block is a prefix code. Dropping this unnecessary constraint we show that the minimum expe...

We propose a scheme for lossy compression of discrete memoryless sources: The compressor is the decoder of a nonlinear channel code, constructed from a sparse graph. We prove asymptotic optimality of the scheme for any separable (letter-by-letter) bounded distortion criterion. We also present a suboptimal compression algorithm, which exhibits near-...

## Citations

... The seventh paper by Verdú [46] is a research and tutorial paper on error exponents and α-mutual information. Similarly to [23] (the second paper in this Special Issue), it relates to Rényi's generalization of the relative entropy and mutual information. ...

... Previous solutions for the Gaussian [11,12] and the binary symmetric [13] cases were based on rearrangement inequalities or constrained optimal transport in 2-norm, which are specialized to those channel distributions. Other approaches based on measure concentration [14][15][16][17] and reverse hypercontractivity [18] (building on a method of [19][20][21]) apply for general channels but are not strong enough in the regime of interest for Cover's problem. ...

... In recent years, tools from noncommutative analysis have proven fruitful in understanding the primitives of Quantum Information Theory (QIT), where-mostly due to the noncommutative nature of the theory-even finding viable quantum analogs of certain informationtheoretic quantities turns out to be nontrivial, and various challenges arise to be overcome (see, e.g., [11,10,40,5,2]). Analytical methods have been exploited in classical information theory just as well, where the quantum phenomena are nonexistent (see, e.g., [35,23,20,24]). ...

... We note that while the conditional relative entropy in (1.6) is the expectation of $D(P_{Y|X}(\cdot|X)\,\|\,Q_{Y|X}(\cdot|X))$ over $X \sim P_X$, the conditional Rényi divergence in (1.7) depends on $D_{1+s}(P_{Y|X}(\cdot|X)\,\|\,Q_{Y|X}(\cdot|X))$ in a more involved way; indeed, it is a generalized mean of the random variable $D_{1+s}(P_{Y|X}(\cdot|X)\,\|\,Q_{Y|X}(\cdot|X))$ evaluated at $s$. For a more detailed discussion on this point, the reader is referred to Cai and Verdú [32]. We also note that there are other definitions of the conditional Rényi divergence but we will use the definition in (1.7) in this monograph; see [20], [43], [155]. ...

... Further, in order to extend the linear distortion measure as the average distortion to nonlinear distortion measures with respect to the distortion of each data point, f-separable distortion measures using f-mean has been proposed. In particular, for this distortion measure, the rate-distortion function showing the limit of lossy compression was elucidated [17]. ...

... (5.8.19)] for the BSC. The notion of perfect and quasi-perfect codes was generalized beyond binary alphabets in [10]. These codes, whenever they exist, attain the hypothesis-testing bound [7, Th. ...

... , L} are selected upon observing all the random variables $U^M$ and $V^L$. A duality between covering and sampling recently observed in [7] shows that the approximation error in total variation is precisely characterized by the left side of (4). Based on this duality observation, we derive the exact error exponent as well as the second-order rates (for a nonvanishing error) of joint distribution simulation of stationary memoryless sources. ...

Reference: Sharp Bounds for Mutual Covering

... More or less motivated by this, the Gaussian optimality problem spurred a lot of research interests recently, e.g. [5][13] [35][11] [41][26] [3]. It was known that Gaussian inputs are optimal for computing the corners of the region [44][9] [43][5] [11] [24][26] [25], but the full region or even the precise slope at Costa's corner point remains open [26]. ...

... Proof: We start from equation (12). The asymptotics of the α-mutual information term indirectly follows from the proof of Theorem 2 in [17]: from [17,Equation (80)] onwards, it is proved that ...

Reference: Alpha-NML Universal Predictors

... [t] in (36) can be eliminated, and the resulting directed information is zero because different components of the vector $Y_i^k$ are independent due to (18), (19); (38) is by substituting (34); (39) ...

Reference: The CEO problem with inter-block memory