Luc Devroye
McGill University · School of Computer Science
Doctor of Philosophy
Teaching, research.
Papers: https://luc.devroye.org/devs.html
Books: https://luc.devroye.org/books-luc.html
About
533 Publications
37,098 Reads
25,556 Citations
Publications (533)
The Colijn--Plazzotta ranking is a bijective encoding of the unlabeled binary rooted trees with positive integers. We show that the rank $f(t)$ of a tree $t$ is closely related to its height $h$, the length of the longest path from a leaf to the root. We consider the rank $f(\tau_n)$ of a random $n$-leaf tree $\tau_n$ under each of three models: (i...
We consider the problem of structure recovery in a graphical model of a tree where some variables are latent. Specifically, we focus on the Gaussian case, which can be reformulated as a well-studied problem: recovering a semi-labeled tree from a distance metric. We introduce randomized procedures that achieve query complexity of optimal order. Addi...
In many statistical applications, the dimension is too large to handle for standard high-dimensional machine learning procedures. This is particularly true for graphical models, where the interpretation of a large graph is difficult and learning its structure is often computationally impossible either because the underlying graph is not sufficientl...
Recommendation systems are pivotal in aiding users amid vast online content. Broutin, Devroye, Lugosi, and Oliveira proposed Subtractive Random Forests (\textsc{surf}), a model that emphasizes temporal user preferences. Expanding on \textsc{surf}, we introduce a model for a multi-choice recommendation system, enabling users to select from two indep...
We provide uniformly efficient random variate generators for a collection of distributions for the hits of the symmetric stable process in $\mathbb{R}^d$.
Linear probing continues to be one of the best practical hashing algorithms due to its good average performance, efficiency, and simplicity of implementation. However, the worst-case performance of linear probing seems to degrade with high load factors due to a primary-clustering tendency of one collision to cause more nearby collisions. It is know...
We introduce linear probing hashing schemes that construct a hash table of size $n$, with constant load factor $\alpha$, on which the worst-case unsuccessful search time is asymptotically almost surely $O(\log \log n)$. The schemes employ two linear probe sequences to find empty cells for the keys. Matching lower bounds on the maximum cluster size...
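As background for the two linear probing abstracts above, here is a minimal sketch of classical single-sequence linear probing; the table size, hash function, and load factor are illustrative assumptions, and the two-probe-sequence scheme of the paper is not reproduced.

# Minimal classical linear probing (single probe sequence); illustrative only,
# not the two-probe-sequence scheme described in the abstract.
import random

class LinearProbingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size              # None marks an empty cell

    def _hash(self, key):
        return hash(key) % self.size            # illustrative hash function

    def insert(self, key):
        i = self._hash(key)
        probes = 0
        while self.slots[i] is not None:        # walk right until an empty cell
            if self.slots[i] == key:
                return probes                   # key already present
            i = (i + 1) % self.size
            probes += 1
            if probes >= self.size:
                raise RuntimeError("table is full")
        self.slots[i] = key
        return probes

    def search(self, key):
        i = self._hash(key)
        for _ in range(self.size):
            if self.slots[i] is None:
                return False                    # empty cell ends an unsuccessful search
            if self.slots[i] == key:
                return True
            i = (i + 1) % self.size
        return False

# Fill the table to load factor 0.7 and inspect the worst insertion probe count.
table = LinearProbingTable(size=1000)
probe_counts = [table.insert(random.getrandbits(64)) for _ in range(700)]
print(max(probe_counts))

With a single probe sequence the longest probe run at constant load factor is typically of order log n; the abstracts above concern schemes and bounds that improve on this clustering behavior.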
A uniform $k$-{\sc dag} generalizes the uniform random recursive tree by picking $k$ parents uniformly at random from the existing nodes. It starts with $k$ ''roots''. Each of the $k$ roots is assigned a bit. These bits are propagated by a noisy channel. The parents' bits are flipped with probability $p$, and a majority vote is taken. When all node...
We propose a simple algorithm to generate random variables described by densities equaling squared Hermite functions. Using results from random matrix theory, we utilize this to generate a randomly chosen eigenvalue of a matrix from the Gaussian Unitary Ensemble (GUE) in sublinear expected time in the RAM model.
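A brute-force baseline for contrast with the abstract above: sample an explicit n by n GUE matrix and return a uniformly chosen eigenvalue. The entry normalization is one common convention, and the explicit eigendecomposition makes this cubic rather than sublinear.

# Brute-force baseline: build an explicit GUE matrix and return a uniformly
# chosen eigenvalue. The sublinear algorithm of the abstract avoids the O(n^3)
# eigendecomposition below; the scaling convention is just one common choice.
import numpy as np

def random_gue_eigenvalue(n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    g = (a + 1j * b) / np.sqrt(2)            # complex Ginibre matrix
    h = (g + g.conj().T) / 2                 # Hermitian part: a GUE matrix
    eigenvalues = np.linalg.eigvalsh(h)      # the expensive O(n^3) step
    return rng.choice(eigenvalues)           # eigenvalue chosen uniformly at random

print(random_gue_eigenvalue(200))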
Motivated by online recommendation systems, we study a family of random forests. The vertices of the forest are labeled by integers. Each non-positive integer $i\le 0$ is the root of a tree. Vertices labeled by positive integers $n \ge 1$ are attached sequentially such that the parent of vertex $n$ is $n-Z_n$, where the $Z_n$ are i.i.d.\ random var...
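A toy simulation of the attachment rule quoted above, in which vertex n attaches to n - Z_n; the geometric choice for the Z_n below is an illustrative assumption rather than the model studied in the paper.

# Toy simulation of the attachment rule described above: vertex n attaches to
# n - Z_n, and non-positive labels are the roots of the forest. The geometric
# distribution for Z_n is only an illustrative assumption.
import random

def simulate_forest(num_vertices, p=0.5, seed=0):
    rng = random.Random(seed)
    parent = {}
    for n in range(1, num_vertices + 1):
        z = 1
        while rng.random() > p:              # Z_n ~ Geometric(p) on {1, 2, ...}
            z += 1
        parent[n] = n - z                    # may be <= 0, i.e., a root
    return parent

print(simulate_forest(10))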
We study several parameters of a random Bienaymé–Galton–Watson tree $T_n$ of size $n$ defined in terms of an offspring distribution $\xi$ with mean $1$ and nonzero finite variance $\sigma^2$. Let $f(s)=\mathbb{E}\{s^\xi\}$ be the generating function of the random variable $\xi$. We show that the independence number is in probability asymptotic...
This note defines a notion of multiplicity for nodes in a rooted tree and presents an asymptotic calculation of the maximum multiplicity over all leaves in a Bienaym\'e-Galton-Watson tree with critical offspring distribution $\xi$, conditioned on the tree being of size $n$. In particular, we show that if $S_n$ is the maximum multiplicity in a condi...
We propose a novel, simple density estimation algorithm for bounded monotone densities with compact support under a cellular restriction. We show that its expected error ($L_1$ distance) converges at a rate of $n^{-1/3}$, that its expected runtime is sublinear and, in doing so, find a connection to the theory of Galton--Watson processes.
Given only the free-tree structure of a tree, the root estimation problem asks if one can guess which of the free tree's nodes is the root of the original tree. We determine the maximum-likelihood estimator for the root of a free tree when the underlying tree is a size-conditioned Galton–Watson tree and calculate its probability of being correct.
We revisit the problem of the estimation of the differential entropy $H(f)$ of a random vector $X$ in $R^{d}$ with density $f$, assuming that $H(f)$ exists and is finite. In this note, we study the consistency of the popular nearest neighbor estimate $H_{n}$ of Kozachenko and Leonenko. Without any smoothness condition we show that the...
We study several parameters of a random Bienaymé-Galton-Watson tree $T_n$ of size $n$ defined in terms of an offspring distribution $\xi$ with mean $1$ and nonzero finite variance $\sigma^2$. Let $f(s) = \mathbb{E}\{s^\xi\}$ be the generating function of the random variable $\xi$. We show that the independence number is in probability asymptotic to $qn$, where $q$ is the unique solution...
This note defines a notion of multiplicity for nodes in a rooted tree and presents an asymptotic calculation of the maximum multiplicity over all leaves in a Bienaym\'e-Galton-Watson tree with critical offspring distribution $\xi$, conditioned on the tree being of size $n$. In particular, we show that if $S_n$ is the maximum multiplicity in a condi...
We provide a uniformly efficient and simple random variate generator for the truncated negative gamma distribution restricted to any interval.
We revisit the problem of the estimation of the differential entropy $H(f)$ of a random vector $X$ in $R^d$ with density $f$, assuming that $H(f)$ exists and is finite. In this note, we study the consistency of the popular nearest neighbor estimate $H_n$ of Kozachenko and Leonenko. Without any smoothness condition we show that the estimate is consi...
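One commonly quoted form of the Kozachenko–Leonenko nearest neighbor estimate (the k = 1 case), given here as a sketch; the additive constants should be checked against the paper's exact normalization.

# A sketch of the k = 1 Kozachenko-Leonenko nearest neighbor entropy estimate;
# the additive normalization below is one commonly quoted form and should be
# checked against the paper.
import numpy as np
from math import gamma, log, pi

def kozachenko_leonenko(x):
    n, d = x.shape
    # distance from each point to its nearest neighbor among the other points
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)
    rho = dists.min(axis=1)
    unit_ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)
    euler_gamma = 0.5772156649015329
    return (d / n) * np.log(rho).sum() + log(unit_ball_volume) + log(n - 1) + euler_gamma

# Sanity check against the closed form for a standard Gaussian in d = 2:
# H = log(2 * pi * e).
rng = np.random.default_rng(0)
sample = rng.standard_normal((1000, 2))
print(kozachenko_leonenko(sample), log(2 * pi * np.e))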
We study the problem of estimating the common mean $\mu$ of $n$ independent symmetric random variables with different and unknown standard deviations $\sigma_1 \le \sigma_2 \le \cdots \le\sigma_n$. We show that, under some mild regularity assumptions on the distribution, there is a fully adaptive estimator $\widehat{\mu}$ such that it is invariant...
The Horton-Strahler number of a tree is a measure of its branching complexity; it is also known in the literature as the register function. We show that for critical Galton-Watson trees with finite variance conditioned to be of size $n$, the Horton-Strahler number grows as $\frac{1}{2}\log_2 n$ in probability. We further define some generalizations...
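The Horton-Strahler number itself has a short bottom-up computation; the sketch below uses the convention that leaves have rank 1 (the register-function convention that starts at 0 differs by one), and the adjacency-list representation is an illustrative choice.

# Horton-Strahler number of a rooted tree, computed bottom-up. Leaves are given
# rank 1 here; the register-function convention that starts at 0 differs by one.
def horton_strahler(children, root=0):
    def rank(v):
        kids = children.get(v, [])
        if not kids:
            return 1                         # a leaf has rank 1
        ranks = sorted((rank(c) for c in kids), reverse=True)
        top = ranks[0]
        # the rank increases only when the maximum is attained at least twice
        return top + 1 if len(ranks) > 1 and ranks[1] == top else top
    return rank(root)

# A complete binary tree of height 2 has Horton-Strahler number 3.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
print(horton_strahler(tree))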
We introduce and study a family of random processes on trees we call hipster random walks, special instances of which we heuristically connect to the min-plus binary trees introduced by Robin Pemantle and studied by Auffinger and Cable (Pemantle’s Min-Plus Binary Tree, 2017. arXiv:1709.07849 [math.PR]), and to the critical random hierarchical latti...
Given only the free-tree structure of a tree, the root estimation problem asks if one can guess which of the free tree's nodes is the root of the original tree. We determine the maximum-likelihood estimator for the root of a free tree when the underlying tree is a size-conditioned Galton-Watson tree and calculate its probability of being correct.
We study the broadcasting problem when the underlying tree is a random recursive tree. The root of the tree has a random bit value assigned. Every other vertex has the same bit value as its parent with probability $1-q$ and the opposite value with probability $q$, where $q \in [0,1]$. The broadcasting problem consists in estimating the value of the...
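A simulation of the noisy broadcast process on a random recursive tree as described above; the plain majority vote used at the end is just one natural estimator of the root bit, not necessarily the one analyzed in the paper.

# Simulation of the broadcast process above: vertex i >= 1 picks a uniform
# parent among 0..i-1 (a random recursive tree), copies the parent's bit, and
# flips it with probability q. The majority vote over all vertices is only one
# natural estimator of the root bit.
import random

def broadcast_on_recursive_tree(n, q, rng):
    bits = [rng.randrange(2)]                # random bit at the root (vertex 0)
    for i in range(1, n):
        parent = rng.randrange(i)
        flip = rng.random() < q
        bits.append(bits[parent] ^ flip)
    return bits

rng = random.Random(1)
bits = broadcast_on_recursive_tree(10000, q=0.1, rng=rng)
estimate = int(sum(bits) > len(bits) / 2)
print(estimate == bits[0])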
This thesis presents analysis of the properties and run-time of the Rapidly-exploring Random Tree (RRT) algorithm. It is shown that the time for the RRT with stepsize $\epsilon$ to grow close to every point in the $d$-dimensional unit cube is $\Theta\left(\frac1{\epsilon^d} \log \left(\frac1\epsilon\right)\right)$. Also, the time it takes for the t...
Recently Avis and Jordan have demonstrated the efficiency of a simple technique called budgeting for the parallelization of a number of tree search algorithms. The idea is to limit the amount of work that a processor performs before it terminates its search and returns any unexplored nodes to a master process. This limit is set by a critical budget...
A recursive function on a tree is a function in which each leaf has a given value, and each internal node has a value equal to a function of the number of children, the values of the children, and possibly an explicitly specified random element U. The value of the root is the key quantity of interest in general. In this study, all node values and f...
We introduce and study a family of random processes on trees we call hipster random walks, special instances of which we heuristically connect to the min-plus binary trees introduced by Robin Pemantle and studied by Auffinger and Cable (2017; arXiv:1709.07849), and to the critical random hierarchical lattice studied by Hambly and Jordan (2004). We...
We define the (random) $k$-cut number of a rooted graph to model the difficulty of the destruction of a resilient network. The process is as the cut model of Meir and Moon [14] except now a node must be cut $k$ times before it is destroyed. The first order terms of the expectation and variance of $\mathcal{X}_n$, the $k$-cut number of a path of length $n$, are proved. We also...
We show how to sample exactly discrete probability distributions whose defining parameters are distributed among remote parties. For this purpose, von Neumann’s rejection algorithm is turned into a distributed sampling communication protocol. We study the expected number of bits communicated among the parties and also exhibit a trade-off between th...
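For background, a sketch of the classical single-party von Neumann rejection algorithm that the distributed protocol above builds on; the target density and envelope constant below are illustrative.

# Classical single-party von Neumann rejection sampling, the building block of
# the distributed protocol described above. Target, proposal, and envelope
# constant are illustrative choices.
import random

def rejection_sample(target_pdf, proposal_sampler, proposal_pdf, c, rng):
    # assumes target_pdf(x) <= c * proposal_pdf(x) for all x
    while True:
        x = proposal_sampler(rng)
        if rng.random() * c * proposal_pdf(x) <= target_pdf(x):
            return x

# Example: sample from the density 2x on [0, 1] with a uniform proposal (c = 2).
rng = random.Random(0)
samples = [rejection_sample(lambda x: 2 * x,
                            lambda r: r.random(),
                            lambda x: 1.0,
                            c=2.0, rng=rng) for _ in range(5)]
print(samples)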
We study the height of a spanning tree T of a graph G obtained by starting with a single vertex of G and repeatedly selecting, uniformly at random, an edge of G with exactly one endpoint in T and adding this edge to T.
We propose a simple recursive data-based partitioning scheme which produces piecewise-constant or piecewise-linear density estimates on intervals, and show how this scheme can determine the optimal $L_1$ minimax rate for some discrete nonparametric classes.
We prove a lower bound and an upper bound for the total variation distance between two high-dimensional Gaussians, which are within a constant factor of one another.
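A numerical illustration of the quantity being bounded, restricted to one dimension: the total variation distance (1/2) times the integral of |f - g| between two Gaussian densities, computed by a Riemann sum. This only illustrates the definition, not the bound of the paper.

# Numerical illustration (d = 1 only) of the quantity bounded above: the total
# variation distance 0.5 * integral |f - g| between two Gaussian densities,
# computed with a Riemann sum. Not the paper's bound.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def tv_distance_1d(mu1, sigma1, mu2, sigma2, half_width=50.0, points=200001):
    x = np.linspace(-half_width, half_width, points)
    f = gaussian_pdf(x, mu1, sigma1)
    g = gaussian_pdf(x, mu2, sigma2)
    return 0.5 * np.sum(np.abs(f - g)) * (x[1] - x[0])

# Unit-variance Gaussians with means 0 and 1: the distance is about 0.383.
print(tv_distance_1d(0.0, 1.0, 1.0, 1.0))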
We investigate the size of vertex confidence sets for including part of (or the entirety of) the seed in seeded uniform attachment trees, given knowledge of some of the seed's properties, and with a prescribed probability of failure. We also study the problem of identifying the leaves of a seed in a seeded uniform attachment tree, given knowledge o...
We show how to sample exactly discrete probability distributions whose defining parameters are distributed among remote parties. For this purpose, von Neumann's rejection algorithm is turned into a distributed sampling communication protocol. We study the expected number of bits communicated among the parties and also exhibit a trade-off between th...
Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multi...
A recursive function on a tree is a function in which each leaf has a given value, and each internal node has a value equal to a function of the number of children, the values of the children, and possibly an explicitly specified random element $U$. The value of the root is the key quantity of interest in general. In this first study, all node valu...
We define the (random) $k$-cut number of a rooted graph to model the difficulty of the destruction of a resilient network. The process is as the cut model of Meir and Moon except now a node must be cut $k$ times before it is destroyed. The $k$-cut number of a path of length $n$, $\mathcal{X}_n$, is a generalization of the concept of records in perm...
We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of Y on X ∈ ℝd. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when X has an absolutely continuous distri...
We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.
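A brute-force companion to the abstract above: count the one-spin-flip local maxima of a small Sherrington-Kirkpatrick instance by exhaustive enumeration; the coupling normalization is an illustrative choice and nothing here scales beyond tiny n.

# Brute-force count of one-spin-flip local maxima of a small Sherrington-
# Kirkpatrick Hamiltonian H(s) = sum_{i<j} g_ij s_i s_j; the coupling
# normalization is an illustrative choice, and this only works for tiny n.
import itertools
import numpy as np

def count_local_maxima(n, rng):
    g = np.triu(rng.standard_normal((n, n)), 1)   # couplings g_ij for i < j
    def energy(s):
        return float(s @ g @ s)
    count = 0
    for bits in itertools.product([-1.0, 1.0], repeat=n):
        s = np.array(bits)
        e = energy(s)
        is_local_max = True
        for i in range(n):
            s[i] = -s[i]                           # flip one spin
            if energy(s) > e:
                is_local_max = False
            s[i] = -s[i]                           # flip it back
            if not is_local_max:
                break
        count += is_local_max
    return count

print(count_local_maxima(10, np.random.default_rng(0)))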
Bousquet, Lochet and Thomass\'e recently gave an elegant proof that for any integer $n$, there is a least integer $f(n)$ such that any tournament whose arcs are coloured with $n$ colours contains a subset of vertices $S$ of size $f(n)$ with the property that any vertex not in $S$ admits a monochromatic path to some vertex of $S$. In this note we pr...
We study the height of a spanning tree $T$ of a graph $G$ obtained by starting with a single vertex of $G$ and repeatedly selecting, uniformly at random, an edge of $G$ with exactly one endpoint in $T$ and adding this edge to $T$.
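The growth process described above is short to simulate: keep a boundary list of edges leaving the current tree, draw uniformly, and discard stale entries. The grid graph in the example is an illustrative choice.

# Simulation of the spanning tree growth above: start from one vertex and
# repeatedly add a uniformly random edge with exactly one endpoint in the tree.
# Stale boundary entries (both endpoints already inside) are rejected, which
# keeps the draw uniform over the valid boundary edges.
import random

def grown_tree_height(adjacency, start, rng):
    depth = {start: 0}
    boundary = [(start, v) for v in adjacency[start]]
    while boundary:
        u, v = boundary.pop(rng.randrange(len(boundary)))
        if v in depth:
            continue                          # stale edge, reject and redraw
        depth[v] = depth[u] + 1
        boundary.extend((v, w) for w in adjacency[v] if w not in depth)
    return max(depth.values())

# Example: a 20 x 20 grid graph.
n = 20
adj = {(x, y): [(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < n and 0 <= y + dy < n]
       for x in range(n) for y in range(n)}
print(grown_tree_height(adj, (0, 0), random.Random(0)))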
In this paper we study the problem of estimating a function from n noiseless observations of function values at randomly chosen points. These points are independent copies of a random variable whose density is bounded away from zero on the unit cube and vanishes outside. The function to be estimated is assumed to be (p,C)-smooth, i.e., (roughly spe...
In 1952, von Neumann introduced the rejection method for random variate generation. We revisit this algorithm when we have a source of perfect bits at our disposal. In this random bit model, there are universal lower bounds for generating a random variate with a given density to within an accuracy $\epsilon$ derived by Knuth and Yao, and refined by...
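A small worked example in the random bit model discussed above: a Bernoulli(p) variate can be produced from fair bits by comparing the digits of a uniform number with the binary expansion of p, using two fair bits on average. This is a standard construction, not the algorithm of the paper.

# A small example in the random bit model: simulate Bernoulli(p) from fair coin
# flips by comparing the binary digits of a uniform number with those of p.
# The expected number of fair bits consumed is 2.
import random

def bernoulli_from_fair_bits(p, rng):
    bits_used = 0
    while True:
        fair_bit = rng.randrange(2)          # next bit of a uniform U in [0, 1]
        bits_used += 1
        p *= 2
        p_bit = int(p >= 1.0)                # next bit of the binary expansion of p
        p -= p_bit
        if fair_bit != p_bit:                # first disagreement decides U < p or not
            return int(fair_bit < p_bit), bits_used

rng = random.Random(0)
draws = [bernoulli_from_fair_bits(0.3, rng)[0] for _ in range(100000)]
print(sum(draws) / len(draws))               # close to 0.3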
Ionizing radiation interacts with the water molecules of the tissues mostly by ionizations and excitations, which result in the formation of the radiation track structure and the creation of radiolytic species such as H•, •OH, H₂, H₂O₂, and e⁻aq. After their creation, these species diffuse and may chemically react with the neighboring species and wi...
Let $T$ be an infinite rooted tree with weights $w_e$ assigned to its edges. Denote by $m_n(T)$ the minimum weight of a path from the root to a node of the $n$th generation. We consider the possible behaviour of $m_n(T)$ with focus on the two following cases: we say $T$ is explosive if \[ \lim_{n\to \infty}m_n(T) < \infty, \] and say that $T$ exhib...
We study the heavy path decomposition of conditional Galton-Watson trees. In a standard Galton-Watson tree conditional on its size $n$, we order all children by their subtree sizes, from large (heavy) to small. A node is marked if it is among the $k$ heaviest nodes among its siblings. Unmarked nodes and their subtrees are removed, leaving only a tr...
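A generic sketch of the pruning step described above: at every node keep only the k heaviest children, ordered by subtree size. With k = 1 this reduces to the heavy path from the root; the adjacency-list representation is an illustrative choice and nothing here is specific to the Galton-Watson setting.

# Generic sketch of the pruning above: keep only the k heaviest children
# (by subtree size) at every node of a rooted tree. With k = 1 this keeps the
# heavy path from the root.
def subtree_sizes(children, root):
    sizes = {}
    def size(v):
        sizes[v] = 1 + sum(size(c) for c in children.get(v, []))
        return sizes[v]
    size(root)
    return sizes

def keep_k_heaviest(children, root, k):
    sizes = subtree_sizes(children, root)
    pruned = {}
    stack = [root]
    while stack:
        v = stack.pop()
        kids = sorted(children.get(v, []), key=lambda c: sizes[c], reverse=True)[:k]
        pruned[v] = kids
        stack.extend(kids)
    return pruned

tree = {0: [1, 2, 3], 1: [4, 5], 3: [6]}
print(keep_k_heaviest(tree, 0, k=1))          # {0: [1], 1: [4], 4: []}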
We study the conditions for families of subtrees to exist with high probability (whp) in a Galton-Watson tree of size $n$. We first give a Poisson approximation of fringe subtree counts, which yields the height of the maximal complete $r$-ary fringe subtree. Then we determine the maximal $K_n$ such that every tree of size at most $K_n$ appears as f...
$n$ independent random points drawn from a density $f$ in $R^d$ define a random Voronoi partition. We study the measure of a typical cell of the partition. We prove that the asymptotic distribution of the probability measure of the cell centered at a point $x \in R^d$ is independent of $x$ and the density $f$. We determine all moments of the asympt...
$n$ independent random points drawn from a density $f$ in $R^d$ define a random Voronoi partition. We study the measure of a typical cell of the partition. We prove that the asymptotic distribution of the probability measure of the cell centered at a point $x \in R^d$ is independent of $x$ and the density $f$. We determine all moments of the asympto...
If \(\mathbf{X}_{i}\) and \(\mathbf{X}_{j}\) are equidistant from x, i.e., if \(\|\mathbf{X}_{i} -\mathbf{x}\| =\| \mathbf{X}_{j} -\mathbf{x}\|\) for some i ≠ j, then we have a distance tie. By convention, ties are broken by comparing indices, that is, by declaring that \(\mathbf{X}_{i}\) is closer to x than \(\mathbf{X}_{j}\) whenever i < j.
This chapter is devoted to the study of the uniform consistency properties of the k-nearest neighbor density estimate \(f_{n}\). Before embarking on the supremum norm convergence, it is useful to understand the behavior of \(f_{n}\) on bounded densities. We denote the essential supremum (with respect to the Lebesgue measure \(\lambda\)) of the density f by
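For concreteness, here is a direct implementation of the textbook k-nearest neighbor density estimate that this chapter studies, \(f_{n}(x) = k / (n\,\lambda(B(x, R_{k}(x))))\), where \(R_{k}(x)\) is the distance from x to its k-th nearest data point; the sample and the choice of k below are illustrative.

# The k-nearest neighbor density estimate: f_n(x) = k / (n * volume of the ball
# of radius R_k(x)), where R_k(x) is the distance from x to the k-th nearest
# data point. Sample and k are illustrative.
import numpy as np
from math import gamma, pi

def knn_density(x, data, k):
    n, d = data.shape
    distances = np.sort(np.linalg.norm(data - x, axis=1))
    r_k = distances[k - 1]                    # distance to the k-th nearest neighbor
    ball_volume = pi ** (d / 2) / gamma(d / 2 + 1) * r_k ** d
    return k / (n * ball_volume)

rng = np.random.default_rng(0)
sample = rng.standard_normal((5000, 1))
# the standard normal density at 0 is about 0.3989
print(knn_density(np.zeros(1), sample, k=100))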
We start with some basic properties of uniform order statistics. For a general introduction to probability, see Grimmett and Stirzaker (2001). Some of the properties of order statistics presented in this chapter are covered by Rényi (1970), Galambos (1978), and Devroye (1986).
A random vector X taking values in \(\mathbb{R}^{d}\) has a (probability) density f with respect to the Lebesgue measure if, for all Borel sets \(A \subseteq \mathbb{R}^{d}\), \(\mathbb{P}\{\mathbf{X} \in A\} =\int _{A}f(\mathbf{x})\mbox{ d}\mathbf{x}\). In other words, if A is a small ball about x, the probability that X falls in A is about f(x) t...
Differential entropy, or continuous entropy, is a concept in information theory related to the classical (Shannon) entropy (Shannon, 1948). For a random variable with density f on \(\mathbb{R}^{d}\), it is defined by $$\displaystyle{ \mathcal{E}(f) = -\int _{\mathbb{R}^{d}}f(\mathbf{x})\log f(\mathbf{x})\mbox{ d}\mathbf{x}, }$$ (7.1) when this inte...
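A quick numerical check of definition (7.1) in one dimension, against the Gaussian closed form \(\tfrac{1}{2}\log (2\pi e\sigma ^{2})\); the grid and the example density are illustrative.

# Numerical check of the differential entropy definition above for a centered
# Gaussian density on the real line; the closed form is 0.5 * log(2*pi*e*sigma^2).
import numpy as np

def differential_entropy_1d(pdf, grid):
    values = pdf(grid)
    safe = np.where(values > 0, values, 1.0)          # treat 0 * log 0 as 0
    return np.sum(-values * np.log(safe)) * (grid[1] - grid[0])

sigma = 2.0
pdf = lambda x: np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
grid = np.linspace(-40.0, 40.0, 400001)
print(differential_entropy_1d(pdf, grid), 0.5 * np.log(2 * np.pi * np.e * sigma ** 2))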
Let (X, Y ) be a pair of random variables taking values in \(\mathbb{R}^{d} \times \mathbb{R}\). The goal of regression analysis is to understand how the values of the response variable Y depend on the values of the observation vector X. The objective is to find a Borel measurable function g such that | Y − g(X) | is small, where “small” could be d...
Supervised classification (also called pattern recognition, discrimination, or class prediction) is a specific regression problem, where the observation X takes values in \(\mathbb{R}^{d}\) and the random response Y takes values in {0, 1}. Given X, one has to guess the value of Y (also termed the label or class), and this guess is called a decisio...
In this chapter, \((\mathbf{X},Y ) \in \mathbb{R}^{d} \times \{ 0,1\}\), and \((\mathbf{X}_{1},Y _{1}),\mathop{\ldots },(\mathbf{X}_{n},Y _{n})\) are reordered according to increasing values of \(\|\mathbf{X}_{i} -\mathbf{x}\|\). Ties are broken as for regression. The reordered sequence is denoted by \((\mathbf{X}_{(1)}(\mathbf{x}),Y _{(1)}(\mathbf...
Our objective in this short chapter is to analyze some elementary consistency properties of the 1-nearest neighbor regression function estimate. This will also offer the opportunity to familiarize the reader with concepts that will be encountered in the next few chapters. Recall that this very simple estimation procedure is defined by setting $$\di...
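The 1-nearest neighbor regression estimate discussed in this chapter is one line of code: return the response attached to the data point closest to x. The noisy sine data below are illustrative.

# The 1-nearest neighbor regression estimate: r_n(x) is the response Y_(1)(x)
# attached to the data point nearest to x. The data below are illustrative.
import numpy as np

def one_nn_regression(x, data_x, data_y):
    index = np.argmin(np.linalg.norm(data_x - x, axis=1))
    return data_y[index]

rng = np.random.default_rng(0)
data_x = rng.uniform(-3.0, 3.0, size=(2000, 1))
data_y = np.sin(data_x[:, 0]) + 0.1 * rng.standard_normal(2000)
print(one_nn_regression(np.array([1.0]), data_x, data_y), np.sin(1.0))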
The supremum creates two problems—first of all, by moving x about \(\mathbb{R}^{d}\), the data ordering changes. We will count the number of possible data permutations in the second section. Second, we need a uniform condition on the “noise” Y − r(X) so that the averaging done by the weights \(v_{ni}\) is strong enough. This is addressed in the third sec...
Various properties of \(U_{(1)},\mathop{\ldots },U_{(n)}\), uniform [0, 1] order statistics, will be needed in the analysis that follows. These are collected in the present chapter. The first group of properties is directly related to \(U_{(i)}\) (1 ≤ i ≤ n), while the second group deals with random linear combinations of them.
There are different ways to weigh or smooth the k-nearest neighbor density estimate. Some key ideas are surveyed in this chapter. For some of them, consistency theorems are stated.
No study of a density estimate is complete without a discussion of the local behavior of it. That is, given a certain amount of smoothness at x, how fast does \(f_{n}(x)\) tend to f(x)? It is clear that for any sequence of density estimates, and any sequence \(a_{n} \downarrow 0\), however slow, there exists a density f with x a Lebesgue point of f, suc...
Classical function estimation deals with the estimation of a function r on \(\mathbb{R}^{d}\) from a finite number of points \(\mathbf{x}_{1},\mathop{\ldots },\mathbf{x}_{n}\). Some applications are concerned with \(L_{p}\) errors with respect to the Lebesgue measure on compacts. Others use it for Monte Carlo purposes, wanting to estimate \(\int _{A}r(\m...
Selecting the estimate within a class of estimates that is optimal in a certain sense is perhaps the ultimate goal of nonparametric estimation. It assumes that the class of estimates is sufficiently rich within the universe of all possible estimates. That the nearest neighbor regression function estimate is rich as a class follows not only from the...
In this chapter, we study the local rate of convergence of \(r_{n}(x)\) to r(x). We obtain full information on the first asymptotic term of \(r_{n}(x) - r(x)\), and are rewarded with (i) a central limit theorem for \(r_{n}(x) - r(x)\), and (ii) a way of helping the user decide how to choose the weights \(v_{ni}\) of the estimate.
Given weights \((v_{n1},\mathop{\ldots },v_{nn})\) satisfying \(\sum _{i=1}^{n}v_{ni} = 1\), the nearest neighbor classifier is defined for \(\mathbf{x} \in \mathbb{R}^{d}\) by $$\displaystyle{ g_{n}(\mathbf{x}) = \left \{\begin{array}{ll} 1&\mbox{ if $\sum _{i=1}^{n}v_{ni}Y _{(i)}(\mathbf{x}) > 1/2$} \\ 0&\mbox{ otherwise.} \end{array} \right. }$$
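A direct implementation of the classifier just defined: reorder the labels by distance to x and threshold the weighted vote at 1/2. The k-nearest neighbor rule is the special case of uniform weights 1/k on the first k neighbors; the data below are illustrative.

# The weighted nearest neighbor classifier defined above: reorder labels by
# distance to x and threshold the weighted vote at 1/2.
import numpy as np

def nn_classifier(x, data_x, data_y, weights):
    order = np.argsort(np.linalg.norm(data_x - x, axis=1))   # nearest first
    return int(np.dot(weights, data_y[order]) > 0.5)

# The k-NN rule: uniform weights 1/k on the first k neighbors (illustrative data).
rng = np.random.default_rng(0)
data_x = rng.standard_normal((500, 2))
data_y = (data_x[:, 0] + data_x[:, 1] > 0).astype(float)
k = 25
weights = np.concatenate([np.full(k, 1.0 / k), np.zeros(500 - k)])
print(nn_classifier(np.array([0.5, 0.5]), data_x, data_y, weights))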
Theorem 11.1 below is a slight extension of a theorem due to Devroye (1981a). It offers sufficient conditions on the probability weight vector guaranteeing that the (raw) nearest neighbor estimate (8.2) satisfies, for all p ≥ 1.
We know that, whenever \(\mathbb{E}Y ^{2} < \infty \), the regression function \(r(\mathbf{x}) = \mathbb{E}[Y \vert \mathbf{X} = \mathbf{x}]\) achieves the minimal value \(L^{\star } = \mathbb{E}\vert Y - r(\mathbf{X})\vert ^{2}\) of the \(L_{2}\) risk over all square-integrable functions of X. It is also easy to show, using the independence of (X, Y ) a...
We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a non-asymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results...
We study online combinatorial optimization problems that a learner is interested in minimizing its cumulative regret in the presence of switching costs. To solve such problems, we propose a version of the follow-the-perturbed-leader algorithm in which the cumulative losses are perturbed by independent symmetric random walks. In the general setting,...
In this paper we explore maximal deviations of large random structures from their typical behavior. We introduce a model for a high-dimensional random graph process and ask analogous questions to those of Vapnik and Chervonenkis for deviations of averages: how "rich" does the process have to be so that one sees atypical behavior. In particular, we...
Several computer codes simulating chemical reactions in particles systems are based on the Green's functions of the diffusion equation (GFDE). Indeed, many types of chemical systems have been simulated using the exact GFDE, which has also become the gold standard for validating other theoretical models. In this work, a simulation algorithm is prese...
A deterministic finite automaton (DFA) of $n$ states over a $k$-letter alphabet can be seen as a digraph with $n$ vertices which all have exactly $k$ labeled out-arcs ($k$-out digraph). In 1973 Grusho first proved that with high probability (whp) in a random $k$-out digraph there is a strongly connected component (SCC) of linear size that is reacha...
A deterministic finite automaton (DFA) of $n$ states over a $k$-letter alphabet can be seen as a digraph with $n$ vertices which all have exactly $k$ labeled out-arcs ($k$-out digraph). In 1973 Grusho first proved that with high probability (whp) in a random $k$-out digraph there is a strongly connected component (SCC) of linear size that is reacha...
We study the problem of the generation of a continuous random variable when a source of independent fair coins is available. We first motivate the choice of a natural criterion for measuring accuracy, the Wasserstein $L_\infty$ metric, and then show a universal lower bound for the expected number of required fair coins as a function of the accuracy...
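A minimal instance of the setting above: to approximate a uniform [0, 1] variable within Wasserstein $L_\infty$ accuracy $\epsilon$ it suffices to emit $\lceil \log_2(1/\epsilon) \rceil$ fair bits as binary digits, since truncating the binary expansion moves the value by at most $2^{-k}$. This is only the trivial upper bound, not the lower bound of the paper.

# Minimal instance of the setting above: approximate a uniform [0, 1] variate
# to Wasserstein L-infinity accuracy eps with ceil(log2(1/eps)) fair coin flips,
# by emitting that many binary digits. Only the trivial upper bound.
import math
import random

def approximate_uniform(eps, rng):
    k = math.ceil(math.log2(1.0 / eps))       # number of fair bits
    value = sum(rng.randrange(2) / 2 ** i for i in range(1, k + 1))
    return value, k

print(approximate_uniform(1e-3, random.Random(0)))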
Consider the convex set $R_n$ of semi-positive definite matrices of order $n$ with diagonal $(1,\ldots,1)$. If $\mu$ is a distribution in $\mathbb{R}^n$ with second moments, denote by $R(\mu) \in R_n$ its correlation matrix. Denote by $C_n$ the set of distributions in $[0,1]^n$ with all margins uniform on $[0,1]$ (called copulas). The paper proves that (Formula presented.) is a surjection fr...
John Bell has shown that the correlations entailed by quantum mechanics cannot be reproduced by a classical process involving non-communicating parties. But can they be simulated with the help of bounded communication? This problem has been studied for more than two decades, and it is now well understood in the case of bipartite entanglement. Howev...