Chapter

Lectures on random geometric graphs


Abstract

The theory of random graphs is a vital part of the education of any researcher entering the fascinating world of combinatorics. However, due to their diverse nature, the geometric and structural aspects of the theory often remain an obscure part of the formative study of young combinatorialists and probabilists. Moreover, the theory itself, even in its most basic forms, is often considered too advanced to be part of undergraduate curricula, and those who are interested usually learn it mostly through self-study, covering a lot of its fundamentals but little of the more recent developments. This book provides a self-contained and concise introduction to recent developments and techniques for classical problems in the theory of random graphs. Moreover, it covers geometric and topological aspects of the theory and introduces the reader to the diversity and depth of the methods that have been devised in this context.


... The notion of coverage threshold is analogous to that of connectivity threshold in the theory of random geometric graphs [21]. Our results show that the threshold for full coverage by the balls $B(X_i, r)$ is asymptotically twice the threshold for the union of these balls to be connected, if $A^o$ is connected, at least when $A$ has a smooth boundary or $A$ is a convex polytope. ...
... Our results show that the threshold for full coverage by the balls $B(X_i, r)$ is asymptotically twice the threshold for the union of these balls to be connected, if $A^o$ is connected, at least when $A$ has a smooth boundary or $A$ is a convex polytope. This can be seen from comparison of Theorem 4.1 above with [21, Theorem 13.7], and comparison of Corollary 4.5 above with [22, Theorem 2.5]. ...
... We shall repeatedly use the following lemma. It is based on what in [21] was called the 'subsequence trick'. This result says that if an array of random variables $U_{n,k}$ is monotone in $n$ and $k$, and $U_{n,k(n)}$, properly scaled, converges in probability to a constant at rate $n^{-\varepsilon}$, one may be able to improve this to almost sure convergence. ...
Article
Full-text available
Let $X_1, X_2, \ldots$ be independent random uniform points in a bounded domain $A \subset \mathbb{R}^d$ with smooth boundary. Define the coverage threshold $R_n$ to be the smallest $r$ such that $A$ is covered by the balls of radius $r$ centred on $X_1, \ldots, X_n$. We obtain the limiting distribution of $R_n$ and also a strong law of large numbers for $R_n$ in the large-$n$ limit. For example, if $A$ has volume 1 and perimeter $|\partial A|$, if $d = 3$ then $\mathbb{P}[n\pi R_n^3 - \log n - 2\log(\log n) \le x]$ converges to $\exp(-2^{-4}\pi^{5/3}|\partial A| e^{-2x/3})$ and $(n\pi R_n^3)/(\log n) \to 1$ almost surely, and if $d = 2$ then $\mathbb{P}[n\pi R_n^2 - \log n - \log(\log n) \le x]$ converges to $\exp(-e^{-x} - |\partial A|\pi^{-1/2} e^{-x/2})$. We give similar results for general $d$, and also for the case where $A$ is a polytope. We also generalize to allow for multiple coverage. The analysis relies on classical results by Hall and by Janson, along with a careful treatment of boundary effects. For the strong laws of large numbers, we can relax the requirement that the underlying density on $A$ be uniform.
... To prove the CLT, we rely on dependency graphs as in [28, Theorem 2.4]. More recently, [23] developed the Malliavin-Stein calculus to get bounds for a normal approximation. ...
... Thus, it remains to establish the restricted multivariate CLT in Lemma 4.3. The key idea is to proceed as in [27] and rely on Stein's method in the form of [28, Theorem 2.4]. Proof of Lemma 4.3. ...
... Therefore, we can assume without loss of generality, that Var[cA ′ d,t ] → Var[cY t ] = 0. It remains to show that cA ′ d,t ⇒ cY t , which is equivalent to showing that (cA d,t,k0 − E cA d,t,k0 )/ Var[cA d,t,k0 ] converges in distribution to a standard normal random variable. For this we will write cA d,t,k0 as a sum of local contributions and apply Stein's method as presented in [28, Theorem 2.4]. First, by additivity, ...
Preprint
We study topological and geometric functionals of $\ell_\infty$-random geometric graphs on the high-dimensional torus in a sparse regime, where the expected number of neighbors decays exponentially in the dimension. More precisely, we establish moment asymptotics, functional central limit theorems and Poisson approximation theorems for certain functionals that are additive under disjoint unions of graphs. For instance, this includes simplex counts and Betti numbers of the Rips complex, as well as general subgraph counts of the random geometric graph. We also present multi-additive extensions that cover the case of persistent Betti numbers of the Rips complex.
... In particular, the ER graph fails to describe the clustering properties of graphs in which geographical distance is a critical factor, as for example in wireless ad-hoc networks [7], sensor networks [8], and the study of the dynamics of viral spreading in a specific network of interactions [9], [10]. To properly model such networks, we consider a special class of graphs known as random geometric graphs (RGGs) [11]. Another very important motivation for the study of RGGs is their applications to statistics and learning. ...
... When $p = \infty$, we obtain the Chebyshev distance, i.e., the maximum over the dimensions of the absolute coordinate differences between the two points. Such graphs, denoted by $G(X_n, r_n)$, are called RGGs and are extensively discussed in [11]. Typically, the function $r_n$ is chosen such that $r_n \to 0$ when $n \to \infty$. ...
... Let $\theta^{(d)}$ denote the volume of the $d$-dimensional unit hypersphere in $T^d$. Then, the average vertex degree in $G(X_n, r_n)$ is equal to $a_n = \theta^{(d)} n r_n^d$ [11]. The properties of RGGs, including spectral properties, often depend on the average vertex degree $a_n$. ...
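The average-degree formula quoted above can be checked numerically. The following is a minimal sketch, assuming the Euclidean norm on the unit torus in dimension 2 (the citing paper allows general p-norms); the function names, the seed and the values of n and r are illustrative choices and do not come from the cited works.

```python
import math
import random


def torus_distance(p, q):
    """Euclidean distance on the unit torus [0, 1)^d (coordinates wrap around)."""
    total = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b)
        delta = min(delta, 1.0 - delta)  # take the shorter way around
        total += delta * delta
    return math.sqrt(total)


def rgg_degrees(points, r):
    """Degree of every vertex when points at torus distance <= r are joined."""
    n = len(points)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if torus_distance(points[i], points[j]) <= r:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in dimension d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n, d, r = 400, 2, 0.1   # illustrative values (assumption)
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
degrees = rgg_degrees(points, r)
empirical = sum(degrees) / n
predicted = unit_ball_volume(d) * n * r ** d
print(f"empirical mean degree {empirical:.2f}, theta * n * r^d = {predicted:.2f}")
```

On the torus there are no boundary effects, so the two numbers should agree up to sampling fluctuations; in a square with a hard boundary the empirical value would be somewhat smaller.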
Article
Full-text available
In this work, we study the spectrum of the normalized Laplacian and its regularized version for random geometric graphs (RGGs) in various scaling regimes. Two scaling regimes are of special interest, the connectivity and the thermodynamic regime. In the connectivity regime, the average vertex degree grows logarithmically in the graph size or faster. In the thermodynamic regime, the average vertex degree is a constant. We introduce a deterministic geometric graph (DGG) with nodes in a grid and provide an upper bound to the probability that the Hilbert–Schmidt norm of the difference between the normalized Laplacian matrices of the RGG and DGG is greater than a certain threshold in both the connectivity and thermodynamic regime. Using this result, we show that the RGG and DGG normalized Laplacian matrices are asymptotically equivalent with high probability (w.h.p.) in the full range of the connectivity regime. The equivalence is even stronger and holds almost surely when the average vertex degree $a_n$ satisfies the inequality $a_n > 24 \log(n)$. Therefore, we use the regular structure of the DGG to show that the limiting eigenvalue distribution of the RGG normalized Laplacian matrix converges to a distribution with a Dirac atomic measure at zero. In the thermodynamic regime, we approximate the eigenvalues of the regularized normalized Laplacian matrix of the RGG by the eigenvalues of the DGG regularized normalized Laplacian and we provide an error bound which is valid w.h.p. and depends upon the average vertex degree.
... If connectivity in a planner's sampled approximation is too low, then the graph maintained by the planner almost never (i.e., with probability zero) contains a solution [6], [10], [11], but if it is too high, the graph becomes expensive to search due to the high branching factor and resulting high number of edges. As such, although incremental sampling avoids the need for a priori approximations, sampling-based ASAO planners still require a user-defined connectivity metric between samples (e.g., connection radius) for efficiency. ...
... This reduces computational costs by considering fewer edges but will not fully exploit the current approximation and can lead to lower-quality solutions for a given set of samples. If this connectivity is too low then the planner will quickly exploit the approximation but almost never find a valid path [6], [10], [11], and if it is too high then the graph will almost surely contain a high quality solution but becomes increasingly expensive to search as the number of edges increases. ...
Preprint
Improving the performance of motion planning algorithms for high-degree-of-freedom robots usually requires reducing the cost or frequency of computationally expensive operations. Traditionally, and especially for asymptotically optimal sampling-based motion planners, the most expensive operations are local motion validation and querying the nearest neighbours of a configuration. Recent advances have significantly reduced the cost of motion validation by using single instruction/multiple data (SIMD) parallelism to improve solution times for satisficing motion planning problems. These advances have not yet been applied to asymptotically optimal motion planning. This paper presents Fully Connected Informed Trees (FCIT*), the first fully connected, informed, anytime almost-surely asymptotically optimal (ASAO) algorithm. FCIT* exploits the radically reduced cost of edge evaluation via SIMD parallelism to build and search fully connected graphs. This removes the need for nearest-neighbours structures, which are a dominant cost for many sampling-based motion planners, and allows it to find initial solutions faster than state-of-the-art ASAO (VAMP, OMPL) and satisficing (OMPL) algorithms on the MotionBenchMaker dataset while converging towards optimal plans in an anytime manner.
... Thus, for a fixed m ∈ L, the events of finding a Poisson point with a stabilization radius exceeding P n in a box Q are independent for different choices of Q ∈ Q (m) . Therefore, a binomial concentration bound [14,Lemma 1.1] gives that for each m ∈ L, Proof of Lemma 9. Let m ≥ 1 and τ > 0, and let us assume that we are under the event that we would like to bound in Lemma 9. Note that by (7), under {R 3n (X) ≤ n} we have that ...
... where in the last line we used [14, Lemma 1.3]. By (48) we can assume n large enough so that P (H n (W n , W n ) > μ α − ε/2 | X(W n ) = b d n ≥ 1/2. Hence, combining this with (44), we get that − n d−1 e −( log n) d ( log n) dc max +2 + e − 1 2 ( log n) 2d , as asserted. ...
Article
We study the large-volume asymptotics of the sum of power-weighted edge lengths $\sum_{e \in E}|e|^\alpha$ in Poisson-based spatial random networks. In the regime $\alpha > d$, we provide a set of sufficient conditions under which the upper-large-deviation asymptotics are characterized by a condensation phenomenon, meaning that the excess is caused by a negligible portion of Poisson points. Moreover, the rate function can be expressed through a concrete optimization problem. This framework encompasses in particular directed, bidirected, and undirected variants of the $k$-nearest-neighbor graph, as well as suitable $\beta$-skeletons.
... • The random geometric model (GEO) represents proximity relationships between uniformly distributed points in a space (Penrose, 2003). GEO networks are generated by uniformly distributing n points (nodes) in 3-dimensional space and by connecting nodes by edges if the Euclidean distances between the corresponding points are lower than or equal to a threshold r, which is set so as to obtain an edge density similar to that of the real network. ...
... Erdős-Rényi random graphs (ER) (Erdős Paul and Rényi Alfréd, 1959), generalized random graphs with the degree distribution matching that of the input graph (ER-DD) (Newman, 2010), Barabási-Albert scale-free networks (SF-BA) (Barabási and Albert, 1999), scale-free networks that model gene duplication and mutation events (SF-GD) (Vazquez et al., 2001), geometric random graphs (GEO) (Penrose, 2003), geometric graphs that model gene duplications and mutations (GEO-GD) (Pržulj et al., 2010), and stickiness-index based networks (Sticky) (Pržulj and Higham, 2006). As the real biological networks have power-law degree distributions (Jeong et al., 2001; Tong et al., 2004), the set of model networks contains four types of networks with power-law degree distribution: ER-DD, SF-BA, SF-GD and Sticky. ...
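The GEO construction described in the first excerpt above (uniform points in 3-dimensional space, with the threshold r tuned to a target edge density) can be sketched as follows. This is only one possible way to pick r, assumed here for illustration: take the m-th smallest pairwise distance, so that the graph has exactly m edges when no two distances tie. The function name, the unit cube as sampling domain, and the values of n and m are hypothetical.

```python
import math
import random


def geo_graph_matching_edges(n, m, seed=0):
    """Sample n uniform points in the unit cube and pick the radius r that
    yields exactly m edges (assuming no two pairwise distances tie)."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((math.dist(points[i], points[j]), i, j))
    pairs.sort()
    r = pairs[m - 1][0]  # m-th smallest pairwise distance
    edges = [(i, j) for dist, i, j in pairs if dist <= r]
    return points, r, edges


# Illustrative target: 300 edges on 100 nodes (assumption, not from the source).
points, r, edges = geo_graph_matching_edges(n=100, m=300)
density = len(edges) / (100 * 99 / 2)
print(f"r = {r:.3f}, edges = {len(edges)}, density = {density:.4f}")
```

Sorting all pairs costs quadratic memory, which is acceptable for a sketch but would need a spatial index for networks with many thousands of nodes.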
Thesis
Recent biotechnological advances have led to a wealth of biological network data. Topological analysis of these networks (i.e., the analysis of their structure) has led to breakthroughs in biology and medicine. The state-of-the-art topological node and network descriptors are based on graphlets, induced connected subgraphs of different shapes (e.g., paths, triangles). However, current graphlet-based methods ignore neighbourhood information (i.e., what nodes are connected). Therefore, to capture topology and connectivity information simultaneously, I introduce graphlet adjacency, which considers two nodes adjacent based on their frequency of co-occurrence on a given graphlet. I use graphlet adjacency to generalise spectral methods and apply these on molecular networks. I show that, depending on the chosen graphlet, graphlet spectral clustering uncovers clusters enriched in different biological functions, and graphlet diffusion of gene mutation scores predicts different sets of cancer driver genes. This demonstrates that graphlet adjacency captures topology-function and topology-disease relationships in molecular networks. To further detail these relationships, I take a pathway-focused approach. To enable this investigation, I introduce graphlet eigencentrality to compute the importance of a gene in a pathway either from the local pathway perspective or from the global network perspective. I show that pathways are best described by the graphlet adjacencies that capture the importance of their functionally critical genes. I also show that cancer driver genes characteristically perform hub roles between pathways. Given the latter finding, I hypothesise that cancer pathways should be identified by changes in their pathway-pathway relationships. Within this context, I propose pathway-driven non-negative matrix tri-factorisation (PNMTF), which fuses molecular network data and pathway annotations to learn an embedding space that captures the organisation of a network as a composition of subnetworks. In this space, I measure the functional importance of a pathway or gene in the cell and its functional disruption in cancer. I apply this method to predict genes and the pathways involved in four major cancers. By using graphlet-adjacency, I can exploit the tendency of cancer-related genes to perform hub roles to improve the prediction accuracy.
... In fact, in order to simplify some of the proofs, we will work with the random geometric graph G ∈ T (n, r) equipped with the torus metric d T (·, ·) instead of d E (·, ·). For more details about these models see, for example, [17]. ...
... In order to simplify some of our proofs, we will make use of a technique known as de-Poissonization, which has many applications in geometric probability (see [17] for a detailed account of the subject). Here we only roughly sketch the idea behind it. ...
Article
Full-text available
The localization game is a two player combinatorial game played on a graph G=(V,E). The cops choose a set of vertices S1⊆V with |S1|=k. The robber then chooses a vertex v∈V whose location is hidden from the cops, but the cops learn the graph distance between the current position of the robber and the vertices in S1. If this information is sufficient to locate the robber, the cops win immediately; otherwise the cops choose another set of vertices S2⊆V with |S2|=k, and the robber may move to a neighboring vertex. The new distances to the robber are presented, and if the cops can deduce the new location of the robber based on all information they accumulated thus far, then they win; otherwise, a new round begins. If the robber has a strategy to avoid being captured, then she wins. The localization number is defined to be the smallest integer k so that the cops win the game. In this paper we determine the localization number (up to poly-logarithmic factors) of the random geometric graph G∈G(n,r) slightly above the connectivity threshold.
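The first round of the game described above can be illustrated with a short sketch: the cops win immediately, wherever the robber hides, exactly when the vector of graph distances to the probed set is different for every vertex. The code below checks that condition only and does not model later rounds or robber moves; the function names and the two small example graphs are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Graph distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def locates_in_one_round(adj, probes):
    """True if the distance vector to `probes` is distinct for every vertex,
    i.e. the cops identify the robber after the first round wherever she hides."""
    tables = [bfs_distances(adj, s) for s in probes]
    seen = set()
    for v in adj:
        signature = tuple(t.get(v) for t in tables)
        if signature in seen:
            return False
        seen.add(signature)
    return True


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(locates_in_one_round(path, [0]))      # one probe at an end of a path
print(locates_in_one_round(cycle, [0]))     # vertices 1 and 3 look identical
print(locates_in_one_round(cycle, [0, 1]))  # two adjacent probes
```

A single probe at the end of the path separates all four vertices, whereas on the 4-cycle one probe cannot distinguish the two neighbours of the probed vertex and a second probe is needed.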
... A geometric graph [11] G(V, r) is a graph whose nodes are points in a metric space which are connected by an edge if their distance is below a threshold value r, called radius. Formally, let u, v ∈ V; the edge set is ...
... Clearly, these points are distributed uniformly and independently. The properties of these graphs have been studied when n → ∞ [11]. Surprisingly, certain properties of these graphs appear only when a specific number of nodes is reached. ...
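The remark that some properties appear only once enough nodes are present can be explored with a small simulation. The sketch below is an illustration under assumed settings (uniform points in the unit square, Euclidean distance, radius 0.15, 10 trials per size); none of these values or names come from the cited works. It tests connectivity with a union-find structure over all pairs at distance below r.

```python
import math
import random


def is_connected(points, r):
    """Check connectivity of the geometric graph on `points` with radius r,
    using union-find over all pairs at distance below r."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components == 1


rng = random.Random(0)  # fixed seed, chosen arbitrarily
r, trials = 0.15, 10    # illustrative values (assumption)
for n in (50, 100, 200, 400):
    hits = sum(
        is_connected([(rng.random(), rng.random()) for _ in range(n)], r)
        for _ in range(trials)
    )
    print(f"n = {n:3d}: connected in {hits}/{trials} samples")
```

For a fixed radius the fraction of connected samples typically rises sharply over a narrow range of n, which is the kind of threshold behaviour the excerpt alludes to.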
Article
Full-text available
High-Throughput technologies are producing an increasing volume of data that needs large amounts of data storage, effective data models and efficient, possibly parallel analysis algorithms. Pathway and interactomics data are represented as graphs and add a new dimension of analysis, allowing, among other features, graph-based comparison of organisms’ properties. For instance, in biological pathway representation, the nodes can represent proteins, RNA and fat molecules, while the edges represent the interaction between molecules. Otherwise, biological networks such as Protein–Protein Interaction (PPI) Networks, represent the biochemical interactions among proteins by using nodes that model the proteins from a given organism, and edges that model the protein–protein interactions, whereas pathway networks enable the representation of biochemical-reaction cascades that happen within the cells or tissues. In this paper, we discuss the main models for standard representation of pathways and PPI networks, the data models for the representation and exchange of pathway and protein interaction data, the main databases in which they are stored and the alignment algorithms for the comparison of pathways and PPI networks of different organisms. Finally, we discuss the challenges and the limitations of pathways and PPI network representation and analysis. We have identified that network alignment presents a lot of open problems worthy of further investigation, especially concerning pathway alignment.
... In the case with $\phi = \mathbf{1}_{[0,1]}$, these results were already proved in [7], but the method here provides an alternative and possibly shorter proof (the proof in [7] relies on a lengthy RSW argument from [6]). When $\phi = \mathbf{1}_{[0,1]}$ it is known [6] that $\theta(\phi, \lambda_c(\phi)) = 0$ since this case is equivalent to a Boolean model. ...
Preprint
Full-text available
Consider a 2-dimensional soft random geometric graph $G(\lambda,s,\phi)$, obtained by placing a Poisson($\lambda s^2$) number of vertices uniformly at random in a square of side $s$, with edges placed between each pair $x, y$ of vertices with probability $\phi(\|x-y\|)$, where $\phi: \mathbf{R}_+ \to [0,1]$ is a finite-range connection function. This paper is concerned with the asymptotic behaviour of the graph $G(\lambda,s,\phi)$ in the large-$s$ limit with $(\lambda,\phi)$ fixed. We prove that the proportion of vertices in the largest component converges in probability to the percolation probability for the corresponding random connection model, which is a random graph defined similarly for a Poisson process on the whole plane. We do not cover the case where $\lambda$ equals the critical value $\lambda_c(\phi)$.
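The construction in this abstract translates fairly directly into a simulation. The following is a minimal sketch, not the authors' code: the Poisson sampler (Knuth's multiplication method, suitable only for moderate means), the helper names, and the parameter values are all assumptions made for illustration. The connection function used in the demo is the indicator of [0, 1], the hard-disc case mentioned in the excerpt above.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for moderate means only."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def soft_rgg(lam, s, phi, rng):
    """Vertices: Poisson(lam * s^2) uniform points in [0, s]^2.
    Each pair x, y is joined independently with probability phi(|x - y|)."""
    n = poisson(lam * s * s, rng)
    points = [(rng.uniform(0, s), rng.uniform(0, s)) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < phi(math.dist(points[i], points[j]))
    ]
    return points, edges


def largest_component_fraction(n, edges):
    """Proportion of vertices lying in the largest connected component."""
    if n == 0:
        return 0.0
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        best = max(best, size)
    return best / n


def hard_disc(t):
    """phi = indicator of [0, 1]: the hard-disc connection function."""
    return 1.0 if t <= 1.0 else 0.0


rng = random.Random(1)  # fixed seed, chosen arbitrarily
points, edges = soft_rgg(lam=2.0, s=8.0, phi=hard_disc, rng=rng)
print(len(points), largest_component_fraction(len(points), edges))
```

Repeating the experiment for growing s with the other parameters fixed gives an empirical view of the convergence stated in the abstract; any other finite-range function with values in [0, 1] can be passed as `phi`.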
... The Random Geometric Graph model was extended to other latent spaces such as the hypercube $[0, 1]^d$, the Euclidean sphere or compact Lie groups (Méliot, 2019). A large body of literature has been devoted to studying the properties of low-dimensional Random Geometric Graphs (Penrose et al., 2003; Dall and Christensen, 2002; Bollobás, 2001). RGGs have found applications in a very large span of fields. ...
... In this direction, several works tried to identify structure in networks through testing procedures, see for example Bresler and Nagaraj (2018), Ghoshdastidar et al. (2020) or Gao and Lafferty (2017). Regarding RGGs, most of the results have been established in the low dimensional regime $d \le 3$ (Ostilli and Bianconi, 2015; Penrose, 2016; Penrose et al., 2003; Barthélemy, 2011). Goel et al. (2005) proved in particular that all monotone graph properties (i.e. ...
Preprint
The Random Geometric Graph (RGG) is a random graph model for network data with an underlying spatial representation. Geometry endows RGGs with a rich dependence structure and often leads to desirable properties of real-world networks such as the small-world phenomenon and clustering. Originally introduced to model wireless communication networks, RGGs are now very popular, with applications ranging from network user profiling to protein-protein interactions in biology. RGGs are also of purely theoretical interest since the underlying geometry gives rise to challenging mathematical questions. Their resolution involves results from probability, statistics, combinatorics or information theory, placing RGGs at the intersection of a large span of research communities. This paper surveys the recent developments in RGGs through the lens of high dimensional settings and non-parametric inference. We also explain how this model differs from classical community based random graph models and we review recent works that try to take the best of both worlds. As a by-product, we expose the scope of the mathematical tools used in the proofs.
... For instance, one could consider non-homogeneous Erdős-Rényi random graphs (the appearance of each edge is still independent, but with different probabilities) or more general graph models where the independence can be somehow localized. In particular, we think that geometric random graphs are an appealing direction to explore [16,45]. However, since our methods heavily rely on the combinatorial analysis of homogeneous Erdős-Rényi random graphs, such extension would require a different approach. ...
Preprint
Full-text available
Opinion and belief dynamics are a central topic in the study of social interactions through dynamical systems. In this work, we study a model where, at each discrete time, all the agents update their opinion as an average of their intrinsic opinion and the opinion of their neighbors. While it is well-known how to compute the stable opinion state for a given network, studying the dynamics becomes challenging when the network is uncertain. Motivated by the task of finding optimal policies by a decision-maker that aims to incorporate the opinion of the agents, we address the question of how well the stable opinions can be approximated when the underlying network is random. We consider Erdős-Rényi random graphs to model the uncertain network. Under the connectivity regime and an assumption of minimal stubbornness, we show the expected value of the stable opinion $\mathbf{E}(x(G,\infty))$ concentrates, as the size of the network grows, around the stable opinion $\bar{x}(\infty)$ obtained by considering a mean-field dynamical system, i.e., averaging over the possible network realizations. For both the directed and undirected graph model, the concentration holds under the $\ell_\infty$-norm to measure the gap between $\mathbf{E}(x(G,\infty))$ and $\bar{x}(\infty)$. We deduce this result by studying a mean-field approximation of general analytic matrix functions. The approximation result for the directed graph model also holds for any $\ell_\rho$-norm with $\rho \in (1,\infty)$, under a slightly enhanced expected average degree.
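The update rule in this abstract is described only in words, so the sketch below fixes one concrete reading as an assumption: each agent replaces its opinion by the unweighted average of its own intrinsic opinion and its neighbours' current opinions. The exact weights used in the paper may differ; the function name, tolerance and the two-agent example are illustrative.

```python
def stable_opinions(adj, intrinsic, tol=1e-12, max_iter=10_000):
    """Iterate x_i <- (intrinsic_i + sum of neighbours' opinions) / (1 + deg_i)
    synchronously until successive iterates differ by less than tol."""
    x = list(intrinsic)
    for _ in range(max_iter):
        new_x = [
            (intrinsic[i] + sum(x[j] for j in adj[i])) / (1 + len(adj[i]))
            for i in range(len(x))
        ]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Two agents joined by an edge, with intrinsic opinions 0 and 1 (assumption).
adj = [[1], [0]]
opinions = stable_opinions(adj, [0.0, 1.0])
print(opinions)
```

Under this reading the fixed point solves x1 = x2 / 2 and x2 = (1 + x1) / 2, so the two agents settle near 1/3 and 2/3. Because every row of the update has total neighbour weight deg / (1 + deg) < 1, the iteration is a contraction in the sup norm and converges from any starting point.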
... A classical result of Steele [36] shows the convergence of the total length of the minimum spanning tree built from an independent and identically distributed (i.i.d.) sample of n points in the unit cube. There are several generalizations of this work; for notable contributions see McGivney and Yukich [22], Yukich [43], Penrose and Yukich [33], and the monograph of Penrose [29]. ...
Article
Persistent Betti numbers are a major tool in persistent homology, a subfield of topological data analysis. Many tools in persistent homology rely on the properties of persistent Betti numbers considered as a two-dimensional stochastic process $(r,s) \mapsto n^{-1/2} \left( \beta^{r,s}_q(\mathcal{K}(n^{1/d} \mathcal{X}_n)) - \mathbb{E}[\beta^{r,s}_q(\mathcal{K}(n^{1/d} \mathcal{X}_n))] \right)$. So far, pointwise limit theorems have been established in various settings. In particular, the pointwise asymptotic normality of (persistent) Betti numbers has been established for stationary Poisson processes and binomial processes with constant intensity function in the so-called critical (or thermodynamic) regime; see Yogeshwaran et al. (Prob. Theory Relat. Fields 167, 2017) and Hiraoka et al. (Ann. Appl. Prob. 28, 2018). In this contribution, we derive a strong stabilization property (in the spirit of Penrose and Yukich, Ann. Appl. Prob. 11, 2001) of persistent Betti numbers, and we generalize the existing results on their asymptotic normality to the multivariate case and to a broader class of underlying Poisson and binomial processes. Most importantly, we show that multivariate asymptotic normality holds for all pairs $(r, s)$, $0 \le r \le s < \infty$, and that it is not affected by percolation effects in the underlying random geometric graph.
... To emulate role-based trophic grouping structure in food webs, e.g., between predators, herbivores, and primary producers, the direction of the generated edges was chosen to replicate the expected hierarchical structure in food webs (see Materials & Methods). To generate anchor networks whose structure is fully determined by node attributes, we used the random geometric graph model (RGG) [80] to generate networks with assortative or disassortative structure. Assortative structure is found in food webs when, e.g., node attributes related to environmental conditions correlate with interaction probability (e.g., fish swimming depth [37]). ...
Preprint
Full-text available
Networks are a powerful way to represent the complexity of complex ecological systems. However, most ecological networks are incompletely observed, e.g., food webs typically contain only partial lists of species interactions. Computational methods for inferring such missing links from observed networks can facilitate field work and investigations of the ecological processes that shape food webs. Here, we describe a stacked generalization approach to predicting missing links in food webs that can learn to optimally combine both structural and trait-based predictions, while accounting for link direction and ecological assumptions. Tests of this method on synthetic food webs show that it performs very well on networks with strong group structure, strong trait structure, and various combinations thereof. Applied to a global database of 290 food webs, the method often achieves near-perfect performance for missing link prediction, and performs better when it can exploit both species traits and patterns in connectivity. Furthermore, we find that link predictability varies with ecosystem type, correlates with certain network characteristics like size, and is principally driven by a subset of ecologically-interpretable predictors. These results indicate broad applicability of stacked generalization for studying ecological interactions and understanding the processes that drive link formation in food webs.
... For the node deletion strategies we consider three types of graphs as initial graphs: 1) a geometric graph; 2) a graph generated by preferential attachment (PA); 3) a graph generated by clustering attachment (CA) with parameters ( , ) = (1, 1), see Fig. 1. Recall that a geometric graph is an undirected graph whose nodes are placed uniformly in a square with side length one, and in which an edge joins two nodes only if the distance between them is below a threshold [29]. ...
Article
The evolution of a random network is studied under preferential, clustering and mixed attachment models, which form links between newly appended nodes and existing nodes. The following node deletion strategies at each step of network evolution are considered: 1) no deletion of nodes or edges; 2) deletion of the least influential node among the 'oldest' ones, where the node's PageRank is used as the measure of its influence; 3) deletion of a node with probability inversely proportional to its degree. For these deletion strategies, two characteristics of random networks, namely the node degrees and the node triangle counts (that is, the number of triples of interconnected nodes in which the node is involved), as well as the behavior of the clustering coefficients of nodes, are compared by simulation. The heaviness of the distribution tails of the node degrees and the node triangle counts is estimated. The mixed clustering-preferential attachment is proposed here for the first time.
... Assumption 4.1 requires that the cluster covariates $\{\xi_j\}_{j=1}^m$ are independent and identically distributed, similar to the assumptions made for random geometric graphs (Penrose, 2003). The moment condition, i.e., Assumption 4.1.3, ...
Article
Full-text available
The performance of A/B testing in both online and offline experimental settings hinges on mitigating network interference and achieving covariate balancing. These experiments often involve an observable network with identifiable clusters, and measurable cluster-level and individual-level attributes. Exploiting these inherent characteristics holds potential for refining experimental design and subsequent statistical analyses. In this article, we propose a novel cluster-adaptive network A/B testing procedure, which contains a cluster-adaptive randomization (CLAR) and a cluster-adjusted estimator (CAE) to facilitate the design of the experiment and enhance the performance of ATE estimation. The CLAR sequentially assigns clusters to minimize the Mahalanobis distance, which further leads to the balance of the cluster-level covariates and the within-cluster-averaged individual-level covariates. The cluster-adjusted estimator (CAE) is tailored to offset biases caused by network interference. The proposed procedure has two desirable properties. First, we show that the Mahalanobis distance calculated for the two levels of covariates is $O_p(m^{-1})$, where $m$ represents the number of clusters. This result justifies the simultaneous balance of the cluster-level and individual-level covariates. Second, under mild conditions, we derive the asymptotic normality of CAE and demonstrate the benefit of covariate balancing on improving the precision for estimating ATE. The proposed A/B testing procedure is easy to calculate, consistent, and achieves higher accuracy. Extensive numerical studies are conducted to demonstrate the finite sample property of the proposed network A/B testing procedure.
... Another common model involves selecting a fixed number of points independently and identically distributed on the space. The two models are closely connected, as observed by Penrose in [24,Section 1.7]. Because it is somewhat more commonly studied in the context of stochastic topology, here we will use the latter model. ...
Preprint
In this paper we explore the connection between the ranks of the magnitude homology groups of a graph and the structure of its subgraphs. To this end, we introduce variants of magnitude homology called eulerian magnitude homology and discriminant magnitude homology. Leveraging the combinatorics of the differential in magnitude homology, we illustrate a close relationship between the ranks of the eulerian magnitude homology groups on the first diagonal and counts of subgraphs which fall in specific classes. We leverage these tools to study limiting behavior of the eulerian magnitude homology groups for Erdős-Rényi random graphs and random geometric graphs, producing for both models a vanishing threshold for the eulerian magnitude homology groups on the first diagonal. This in turn provides a characterization of the generators for the corresponding magnitude homology groups. Finally, we develop an explicit asymptotic estimate of the expected rank of eulerian magnitude homology along the first diagonal for these random graph models.
... It is implemented through a k-D tree, $T_k$, built using configurations. The radius of the ball decreases at a 'shrinking rate' derived using percolation theory (Penrose et al. 2003). ...
Article
Computing kinodynamically feasible motion plans and repairing them on-the-fly as the environment changes is a challenging, yet relevant problem in robot navigation. We propose an online single-query sampling-based motion re-planning algorithm using finite-time invariant sets, commonly referred to as “funnels”. We combine concepts from nonlinear systems analysis, sampling-based techniques, and graph-search methods to create a single framework that enables feedback motion re-planning for any general nonlinear dynamical system in dynamic workspaces. A volumetric network of funnels is constructed in the configuration space using sampling-based methods and invariant set theory, and an optimal sequencing of funnels from robot configuration to a desired goal region is then determined by computing the shortest-path subgraph (tree) in the network. Analyzing and formally quantifying the stability of trajectories using Lyapunov level-sets ensures kinodynamic feasibility and guaranteed set-invariance of the solution paths. Though not required, our method is capable of using a pre-computed library of motion primitives to speed up online computation of controllable motion plans that are volumetric in nature. We introduce a novel directed-graph data structure to represent the funnel-network and its inter-sequencibility, helping us leverage discrete graph-based incremental search to quickly rewire feasible and controllable motion plans on-the-fly in response to changes in the environment. We validate our approach on a simulated cart-pole, car-like robot, and 6DOF quadrotor platform in a variety of scenarios within a maze and a random forest environment. Using Monte Carlo methods, we evaluate the performance in terms of algorithm success, length of traversed trajectory, and runtime.
... Using Mecke's formula (cf. [26]), ((0, y), 1) is connected}dy, ...
Article
Persistent homology is a natural tool for probing the topological characteristics of weighted graphs, essentially focusing on their 0-dimensional homology. While this area has been thoroughly studied, we present a new approach to constructing a filtration for cluster analysis via persistent homology. The key advantages of the new filtration are that (a) it provides richer signatures for connected components by introducing non-trivial birth times, and (b) it is robust to outliers. The key idea is that nodes are ignored until they belong to sufficiently large clusters. We demonstrate the computational efficiency of our filtration and its practical effectiveness, and explore its properties when applied to random graphs.
... Much more is known about Gilbert random geometric graphs (which, together with the closely related k-nearest-neighbour model, have been applied in a variety of contexts, for example to model sensor networks [8] and ad hoc wireless networks [40], and for cluster analysis in spatial statistics [24]). We refer the interested reader to the monograph of Penrose [37] devoted to the topic. ...
Article
Following Bradonjić and Saniee, we study a model of bootstrap percolation on the Gilbert random geometric graph on the 2-dimensional torus. In this model, the expected number of vertices of the graph is $n$, and the expected degree of a vertex is $a \log n$ for some fixed $a > 1$. Each vertex is added with probability $p$ to a set $A_0$ of initially infected vertices. Vertices subsequently become infected if they have at least $\theta a \log n$ infected neighbours. Here $p, \theta \in [0,1]$ are taken to be fixed constants. We show that if $\theta < (1+p)/2$, then a sufficiently large local outbreak leads with high probability to the infection spreading globally, with all but $o(n)$ vertices eventually becoming infected. On the other hand, for $\theta > (1+p)/2$, even if one adversarially infects every vertex inside a ball of radius $O(\sqrt{\log n})$, with high probability the infection will spread to only $o(n)$ vertices beyond those that were initially infected. In addition we give some bounds on the $(a, p, \theta)$ regions ensuring the emergence of large local outbreaks or the existence of islands of vertices that never become infected. We also give a complete picture of the (surprisingly complex) behaviour of the analogous 1-dimensional bootstrap percolation model on the circle. Finally we raise a number of problems, and in particular make a conjecture on an ‘almost no percolation or almost full percolation’ dichotomy which may be of independent interest.
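The infection rule described in this abstract can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not taken from the paper: exactly n points are placed (rather than a Poisson number with mean n) on a torus of area n, the radius is chosen so that the expected degree is roughly a log n, and the function names `torus_distance` and `bootstrap_percolation` are hypothetical.

```python
import math
import random


def torus_distance(p, q, side):
    """Euclidean distance on a square torus of the given side length."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, side - dx)  # wrap around horizontally
    dy = min(dy, side - dy)  # wrap around vertically
    return math.hypot(dx, dy)


def bootstrap_percolation(n, a, p, theta, seed=None):
    """Run the infection rule to completion; return (initial, final) infected counts.

    Assumption: exactly n uniform points on a torus of area n, so that a disc
    of area a*log(n) contains about a*log(n) other points on average.
    """
    rng = random.Random(seed)
    side = math.sqrt(n)
    radius = math.sqrt(a * math.log(n) / math.pi)
    threshold = theta * a * math.log(n)

    points = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]

    # Quadratic-time neighbour search; fine for small illustrative n.
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if torus_distance(points[i], points[j], side) <= radius:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Each vertex is initially infected independently with probability p.
    infected = [rng.random() < p for _ in range(n)]
    initial = sum(infected)

    # The rule is monotone, so sweeping until nothing changes reaches the
    # same final set as synchronous rounds would.
    changed = True
    while changed:
        changed = False
        for v in range(n):
            if not infected[v]:
                count = sum(infected[u] for u in neighbours[v])
                if count >= threshold:
                    infected[v] = True
                    changed = True
    return initial, sum(infected)
```

Calling `bootstrap_percolation(500, 2.0, 0.3, 0.4, seed=1)` returns the number of initially infected vertices and the number infected once the process stops. At such small n the asymptotic dichotomy around $\theta = (1+p)/2$ should not be expected to show clearly.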
... The well-connectedness property is not satisfied by spatial graphs such as random geometric graphs [30]. The edges in a spatial graph are 'local', and hence dispatchers in one location cannot assign tasks to servers in spatially distant locations. ...
Preprint
The analysis of large-scale, parallel-server load balancing systems has relied heavily on mean-field analysis. A pivotal assumption for this framework is that the servers are exchangeable. However, modern data-centers have data locality constraints, where tasks of a particular type can only be routed to a small subset of servers. An emerging line of research, therefore, considers load balancing algorithms on bipartite graphs where vertices in the two partitions represent the task types and servers, respectively, and an edge represents the server's ability to process the corresponding task type. Due to the lack of exchangeability in this model, the mean-field techniques fundamentally break down. Recent progress has been made by considering graphs with strong edge-expansion properties, i.e., where any two large subsets of vertices are well-connected. However, data locality often leads to spatial constraints, where edges are local. As a result, these bipartite graphs do not have strong expansion properties. In this paper, we consider the power-of-d choices algorithm and develop a novel coupling-based approach to establish mean-field approximation for a large class of graphs that includes spatial graphs. As a corollary, we also show that, as the system size becomes large, the steady-state occupancy process for arbitrary regular bipartite graphs with diverging degrees, is indistinguishable from a fully flexible system on a complete bipartite graph. The method extends the scope of mean-field analysis far beyond the classical full-flexibility setup. En route, we prove that, starting from suitable states, the occupancy process becomes close to its steady state in a time that is independent of N. Such a large-scale mixing-time result might be of independent interest. Numerical experiments are conducted, which positively support the theoretical results.
... Drawbacks of this model are its locally tree-like nature and the slow convergence to the large-network limit [25]. This motivates considering extended models such as random geometric graphs [36] or generalizations of the popular hyperbolic random graph [7,28], where the connection probability of two vertices scales as the product of their weights, divided by their distance. What is the maximal value of c(k) for given mean and variance on the degrees and given mean and variance on the inter-distances? ...
Article
Complex network theory crucially depends on the assumptions made about the degree distribution, while fitting degree distributions to network data is challenging, in particular for scale-free networks with power-law degrees. We present a robust assessment of complex networks that does not depend on the entire degree distribution, but only on its mean, range, and dispersion: summary statistics that are easy to obtain for most real-world networks. By solving several semi-infinite linear programs, we obtain tight (the sharpest possible) bounds for correlation and clustering measures, for all networks with degree distributions that share the same summary statistics. We identify various extremal random graphs that attain these tight bounds as the graphs with specific three-point degree distributions. We leverage the tight bounds to obtain robust laws that explain how degree-degree correlations and local clustering evolve as a function of node degrees and network size. These robust laws indicate that power-law networks with diverging variance are among the most extreme networks in terms of correlation and clustering, building a further theoretical foundation for the widely reported scale-free network phenomena such as correlation and clustering decay.
... It has been treated in many works under various names: geometric or proximity graph, interval graph (when d = 1) or disk graph (when d = 2). The book [Pen03] provides a vast background and literature review and we also refer to [LRP13a,LRP13b] for central limit theorems of generalisations of the Gilbert graph and [RS13] for a quantitative CLT on a sum of weighted edge-lengths. For a comprehensive overview of the Gilbert graph in the context of U-statistics, see [LRR16], especially Section 4.3. ...
Preprint
We establish new explicit bounds on the Gaussian approximation of Poisson functionals based on novel estimates of moments of Skorohod integrals. Combining these with the Malliavin-Stein method, we derive bounds in the Wasserstein and Kolmogorov distances whose application requires minimal moment assumptions on add-one cost operators, thereby extending the results from (Last, Peccati and Schulte, 2016). Our applications include a CLT for the Online Nearest Neighbour graph, whose validity was conjectured in (Wade, 2009; Penrose and Wade, 2009). We also apply our techniques to derive quantitative CLTs for edge functionals of the Gilbert graph, of the k-Nearest Neighbour graph and of the Radial Spanning Tree, both in cases where qualitative CLTs are known and unknown.
... Poisson convergence for the number of such isolated vertices in different regimes has been extensively studied, see, e.g., [12,Ch. 8]. ...
... Random geometric graphs. In sensor network localization, it is common to model the network as a random geometric graph [44]. Such a graph has node set representing points that are drawn iid from some distribution on $\mathbb{R}^p$ and edges between any two of these points within distance $r$. ...
Preprint
While classical scaling, just like principal component analysis, is parameter-free, most other methods for embedding multivariate data require the selection of one or several parameters. This tuning can be difficult due to the unsupervised nature of the situation. We propose a simple, almost obvious, approach to supervise the choice of tuning parameter(s): minimize a notion of stress. We substantiate this choice by reference to rigidity theory. We extend a result by Aspnes et al. (IEEE Mobile Computing, 2006), showing that general random geometric graphs are trilateration graphs with high probability. And we provide a stability result à la Anderson et al. (SIAM Discrete Mathematics, 2010). We illustrate this approach in the context of the MDS-MAP(P) algorithm of Shang and Ruml (IEEE INFOCOM, 2004). As a prototypical patch-stitching method, it requires the choice of patch size, and we use the stress to make that choice data-driven. In this context, we perform a number of experiments to illustrate the validity of using the stress as the basis for tuning parameter selection. In so doing, we uncover a bias-variance tradeoff, a phenomenon that may have been overlooked in the multidimensional scaling literature. By turning MDS-MAP(P) into a method for manifold learning, we obtain a local version of Isomap for which the minimization of the stress may also be used for parameter tuning.
... Random geometric graphs (RGG) [22,3] in 2D are obtained by sprinkling $N$ points uniformly at random into the unit square $[0, 1]^2$, then connecting every pair of points that are within a prescribed (Euclidean) distance $R$. The random network created that way will have an average degree of $k = \pi R^2 N$ and a Poisson degree distribution just like the ER graphs. However, unlike the ER graphs, RGGs are spatially embedded, have a high clustering coefficient and have no shortcuts. ...
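The construction quoted above is short enough to sketch directly. The code below is a minimal illustration, not code from the cited work; the function names `random_geometric_graph` and `mean_degree` are hypothetical, and the pairwise search is quadratic in the number of points, which is acceptable only for small examples.

```python
import math
import random
from itertools import combinations


def random_geometric_graph(n, radius, seed=None):
    """Sample n uniform points in the unit square and join pairs within `radius`.

    Returns (points, edges), where edges is a list of index pairs (i, j), i < j.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= radius
    ]
    return points, edges


def mean_degree(n, edges):
    """Average degree: every edge contributes to the degree of two vertices."""
    return 2 * len(edges) / n
```

For example, `points, edges = random_geometric_graph(2000, 0.05, seed=0)` followed by `mean_degree(2000, edges)` can be compared with the heuristic value `math.pi * 0.05**2 * 2000`. The empirical value is typically somewhat lower, because points near the boundary of the square have part of their disc outside the domain.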
Preprint
To better understand the temporal characteristics and the lifetime of fluctuations in stochastic processes in networks, we investigated diffusive persistence in various graphs. Global diffusive persistence is defined as the fraction of nodes for which the diffusive field at a site (or node) has not changed sign up to time $t$ (or in general, that the node remained active/inactive in discrete models). Here we investigate disordered and random networks and show that the behavior of the persistence depends on the topology of the network. In two-dimensional (2D) disordered networks, we find that above the percolation threshold diffusive persistence scales similarly as in the original 2D regular lattice, according to a power law $P(t,L) \sim t^{-\theta}$ with an exponent $\theta \simeq 0.186$, in the limit of large linear system size $L$. At the percolation threshold, however, the scaling exponent changes to $\theta \simeq 0.141$, as the result of the interplay of diffusive persistence and the underlying structural transition in the disordered lattice at the percolation threshold. Moreover, studying finite-size effects for 2D lattices at and above the percolation threshold, we find that at the percolation threshold, the long-time asymptotic value obeys a power-law $P(t,L) \sim L^{-z\theta}$ with $z \simeq 2.86$ instead of the value of $z = 2$ normally associated with finite-size effects on 2D regular lattices. In contrast, we observe that in random networks without a local regular structure, such as Erdős-Rényi networks, no simple power-law scaling behavior exists above the percolation threshold.
... EIT* is an almost-surely asymptotically optimal anytime sampling-based path planning algorithm that is based on an asymmetric search which simultaneously calculates and exploits problem-specific heuristics. Both EIT* and EIRM* sample batches of states, and view these states as a series of edge-implicit random geometric graphs (RGGs) [26], as in BIT* [13]. The edges in each RGG are processed in a reverse search informed by an a priori heuristic. ...
Preprint
Multiquery planning algorithms find paths between various different starts and goals in a single search space. They are designed to do so efficiently by reusing information across planning queries. This information may be computed before or during the search and often includes knowledge of valid paths. Using known valid paths to solve an individual planning query takes less computational effort than finding a completely new solution. This allows multiquery algorithms, such as PRM*, to outperform single-query algorithms, such as RRT*, on many problems but their relative performance depends on how much information is reused. Despite this, few multiquery planners explicitly seek to maximize path reuse and, as a result, many do not consistently outperform single-query alternatives. This paper presents Effort Informed Roadmaps (EIRM*), an almost-surely asymptotically optimal multiquery planning algorithm that explicitly prioritizes reusing computational effort. EIRM* uses an asymmetric bidirectional search to identify existing paths that may help solve an individual planning query and then uses this information to order its search and reduce computational effort. This allows it to find initial solutions up to an order-of-magnitude faster than state-of-the-art planning algorithms on the tested abstract and robotic multiquery planning problems.
... Contrary to the rumor spreading literature, we do not want to impose a specific structure on the graph (e.g. to be regular). We thus chose to use Random Geometric Graphs [35], which are a large class of graphs that is very well suited for modeling WSNs, as we will see in the following. We will first define and give intuitions about connectivity in RGGs in Section 2.5.1, then we will examine the algorithms' performance and the tightness of our bounds in Section 2.5.2. ...
Thesis
This thesis addresses the distributed estimation and optimization of a global value of interest over a network using only local and asynchronous (sometimes wireless) communications. Motivated by many different applications ranging from cloud computing to wireless sensor networks via machine learning, we design new algorithms and theoretically study three problems of very different nature: the propagation of the maximal initial value, the estimation of their average and finally distributed optimization.
... In many real world settings, spatial factors such as the average distance between individuals and their mobility play a crucial part in deciding the structure of a contact network and will influence any dynamic process defined on such a network. The networks where the connectivity is decided by a distance-dependent measure are called random geometric graphs [13,14]. Such spatial factors, which are normally not considered in epidemic models on topological networks, have gained increased attention in the wake of the COVID19 pandemic [15][16][17][18][19][20]. ...
Preprint
Our recent experience with COVID19 amply shows that spatial effects like mobility and average interpersonal distance are very important in deciding the outcome of an epidemic dynamics. Spatial connectivity structure of a population is usually modelled via Random Geometric Graphs which are networks generated from spatially distributed components where connections between components are made based on a distance-dependent probability measure. Structural and dynamical aspects of such graphs are important in describing processes with a spatial dependence such as the spread of an airborne disease. In this work, using a simple computational model of an epidemic, we investigate how spatial factors like average separation between individuals and their mobility affect the spread of a disease. We show that such spatial factors can give rise to oscillatory prevalence in a society of adaptive individuals. We also show that delays in executing non-pharmaceutical spatial mitigation strategies can accentuate oscillatory prevalence and can have non-monotonic effects on the peak prevalence. In both cases, we characterize the effects of different parameters on the prevalence of the disease and peak infection and obtain threshold values.
... We use a random geometric graph (RGG) as the connectivity pattern between the nodes [36,37]. The RGG is constructed by placing M nodes at random in a square box with size z. ...
Article
In this work, we study the topological transition on the associated networks in a model proposed by Saeedian et al. (Scientific Reports 2019 9:9726), which considers a coupled dynamics of node and link states. Our goal was to better understand the two observed phases, so we use another network structure (the so called random geometric graph - RGG) together with other metrics borrowed from network science. We observed a topological transition on the two associated networks, which are subgraphs of the full network. As the links have two possible states (friendly and non-friendly), we defined each associated network as composed of only one type of link. The (non) friendly associated network has (non) friendly links only. This topological transition was observed from the domain distribution of each associated network between the two phases of the system (absorbing and active). We also showed that another metric from network science called modularity (or assortative coefficient) can also be used as order parameter, giving the same phase diagram as the original order parameter from the seminal work. On the absorbing phase the absolute value of the modularity for each associated network reaches a maximum value, while on the active phase they fall to the minimum value.
... More generally, we establish the convergence of the rescaled degree sequences of the random graphs to a certain Poisson process. The behaviour of the maximum degree for the random graphs considered in this paper is significantly different from that of the maximum degree of an Erdős-Rényi graph or a random geometric graph, which is concentrated with a high probability on at most two consecutive numbers (see [3,Theorem 3.7] and [17,Theorem 6.6] or [15], respectively). More recently, such a concentration phenomenon was also shown for the maximum degree of a Poisson-Delaunay graph in [4]. ...
... It is implemented through a k-D tree built using configurations. The radius of the ball decreases at a "shrinking rate" derived using percolation theory [44]. ...
Preprint
Computing kinodynamically feasible motion plans and repairing them on-the-fly as the environment changes is a challenging, yet relevant problem in robot-navigation. We propose a novel online single-query sampling-based motion re-planning algorithm - PiP-X, using finite-time invariant sets - funnels. We combine concepts from sampling-based methods, nonlinear systems analysis and control theory to create a single framework that enables feedback motion re-planning for any general nonlinear dynamical system in dynamic workspaces. A volumetric funnel-graph is constructed using sampling-based methods, and an optimal funnel-path from robot configuration to a desired goal region is then determined by computing the shortest-path subtree in it. Analysing and formally quantifying the stability of trajectories using Lyapunov level-set theory ensures kinodynamic feasibility and guaranteed set-invariance of the solution-paths. The use of incremental search techniques and a pre-computed library of motion-primitives ensure that our method can be used for quick online rewiring of controllable motion plans in densely cluttered and dynamic environments. We represent traversability and sequencibility of trajectories together in the form of an augmented directed-graph, helping us leverage discrete graph-based replanning algorithms to efficiently recompute feasible and controllable motion plans that are volumetric in nature. We validate our approach on a simulated 6DOF quadrotor platform in a variety of scenarios within a maze and random forest environment. From repeated experiments, we analyse the performance in terms of algorithm-success and length of traversed-trajectory.
... The recipe for generating a random geometric graph on a two-dimensional Euclidean space with $n$ nodes and radius $d_{\max}$ is as follows [27]. First, $n$ points are distributed uniformly at random on a unit square, by sampling both their horizontal and vertical coordinates uniformly at random. ...
Article
We consider the problem of deploying a quantum network on an existing fiber infrastructure, where quantum repeaters and end nodes can only be housed at specific locations. We propose a method based on integer linear programming (ILP) to place the minimal number of repeaters on such an existing network topology, such that requirements on end-to-end entanglement-generation rate and fidelity between any pair of end-nodes are satisfied. While ILPs are generally difficult to solve, we show that our method performs well in practice for networks of up to 100 nodes. We illustrate the behavior of our method both on randomly-generated network topologies, as well as on a real-world fiber topology deployed in the Netherlands.
... Theorem 2.4 (Palm Theory [24]). Let (X, ) be a metric space, f ∶ X → R a probability density and  n a Poisson process on X with intensity n = nf . ...
Article
Let M be a compact, unit volume, Riemannian manifold with boundary. We study the homology of a random Čech complex generated by a homogeneous Poisson process in M. Our main results are two asymptotic threshold formulas, an upper threshold above which the Čech complex recovers the kth homology of M with high probability, and a lower threshold below which it almost certainly does not. These thresholds share the same leading term. This extends work of Bobrowski–Weinberger and Bobrowski–Oliveira who establish similar formulas when M has no boundary. The cases with and without boundary differ: the common leading term of the upper and lower thresholds is log(n) when M is closed and (2−2/d)log(n) when M has boundary; here n is the expected number of sample points. Our analysis identifies a special type of homological cycle occurring close to the boundary.
Preprint
We study internal diffusion limited aggregation on $\mathbb{Z}$, where a cluster is grown by sequentially adding the first site outside the cluster visited by each random walk dispatched from the origin. We assume that the increment distribution $X$ of the driving random walks has $\mathbb{E} X = 0$, but may be neither simple nor symmetric, and can have $\mathbb{E}(X^2) = \infty$, for example. For the case where $\mathbb{E}(X^2) < \infty$, we prove that after $m$ walks have been dispatched, all but $o(m)$ sites in the cluster form an approximately symmetric contiguous block around the origin. This extends known results for simple random walk. On the other hand, if $X$ is in the domain of attraction of a symmetric $\alpha$-stable law, $1 < \alpha < 2$, we prove that the cluster contains a contiguous block of $\delta m + o(m)$ sites, where $0 < \delta < 1$, but, unlike the finite-variance case, one may not take $\delta = 1$.
Preprint
We study the intersection of a random geometric graph with an Erdős-Rényi graph. Specifically, we generate the random geometric graph $G(n, r)$ by choosing $n$ points uniformly at random from $D = [0, 1]^2$ and joining any two points whose Euclidean distance is at most $r$. We let $G(n, p)$ be the classical Erdős-Rényi graph, i.e. it has $n$ vertices and every pair of vertices is adjacent with probability $p$ independently. In this note we study $G(n, r, p) := G(n, r) \cap G(n, p)$. One way to think of this graph is that we take $G(n, r)$ and then randomly delete edges with probability $1-p$ independently. We consider the clique number, independence number, connectivity, Hamiltonicity, chromatic number, and diameter of this graph where both $p(n) \to 0$ and $r(n) \to 0$; the same model was studied by Kahle, Tian and Wang (2023) for $r(n) \to 0$ but $p$ fixed.
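The "delete edges independently" description of the intersection model translates directly into a two-step sampler. The sketch below is illustrative only and is not taken from the paper; the function names `sample_rgg_edges`, `thin_edges` and `sample_intersection_graph` are hypothetical. It relies on the observation stated in the abstract: since each pair belongs to the Erdős-Rényi graph independently with probability p, intersecting amounts to keeping each geometric edge independently with probability p.

```python
import math
import random
from itertools import combinations


def sample_rgg_edges(n, r, rng):
    """Edges of a random geometric graph on n uniform points in the unit square."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= r
    ]


def thin_edges(edges, p, rng):
    """Keep each edge independently with probability p (delete with 1 - p)."""
    return [edge for edge in edges if rng.random() < p]


def sample_intersection_graph(n, r, p, seed=None):
    """Return (geometric_edges, kept_edges) for one sample of the thinned graph."""
    rng = random.Random(seed)
    geometric_edges = sample_rgg_edges(n, r, rng)
    kept_edges = thin_edges(geometric_edges, p, rng)
    return geometric_edges, kept_edges
```

Returning both edge lists makes it easy to compare a property of the geometric graph with the same property after thinning, for instance the number of edges or the number of connected components.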
Article
A rooted tree is balanced if the degree of a vertex depends only on its distance to the root. In this paper we determine the sharp threshold for the appearance of a large family of balanced spanning trees in the random geometric graph. In particular, we find the sharp threshold for balanced binary trees. More generally, we show that all sequences of balanced trees with uniformly bounded degrees and height tending to infinity appear above a sharp threshold, and none of these appears below the same value. Our results hold more generally for geometric graphs satisfying a mild condition on the distribution of their vertex set, and we provide a polynomial time algorithm to find such trees.
Conference Paper
Consider the random geometric graph $\mathcal{G}(n,r)$ obtained by independently assigning a uniformly random position in $[0,1]^2$ to each of the $n$ vertices of the graph and connecting two vertices by an edge whenever their Euclidean distance is at most $r$. We study the event that $\mathcal{G}(n,r)$ contains a spanning copy of a balanced tree $T$ and obtain sharp thresholds for these events. Our methods provide a polynomial-time algorithm for finding a copy of such trees $T$ above the threshold.
Thesis
With the increase in data acquisition and storage capabilities, developing efficient methods for processing graph-structured data has become a crucial issue in data science. We introduce and study new methods based on heat diffusion to compare graphs. The novelty of our approach essentially lies in the introduction of the concept of distance processes, where we consider the family of all distances computed over a continuous range of diffusion times for a given pair of graphs. This allows us to develop a multi-scale analysis of graphs. Moreover, by representing graphs via tools borrowed from topological data analysis, we are able to compare graphs of different sizes or unaligned graphs. The statistical properties of these processes are studied with the theory of empirical processes. We prove a functional central limit theorem (CLT), as well as a Gaussian approximation result allowing us to show that the convergence rate in the CLT is independent of the graphs' sizes. These results are general and can be applied to other processes. Moreover, they guarantee the asymptotic validity of resampling methods for constructing confidence bands and two-sample tests comparing graph populations. We study the performance of these tests on simulated data sets and apply them to the problem of distribution shift detection in the context of neural network learning.
Article
We study Hamiltonicity in graphs obtained as the union of a deterministic $n$-vertex graph $H$ with linear degrees and a $d$-dimensional random geometric graph $G^d(n,r)$, for any $d \ge 1$. We obtain an asymptotically optimal bound on the minimum $r$ for which a.a.s. $H \cup G^d(n,r)$ is Hamiltonian. Our proof provides a linear time algorithm to find a Hamilton cycle in such graphs.
Article
In this work we study statistical properties of graph-based clustering algorithms that rely on the optimization of balanced graph cuts, the main example being the optimization of Cheeger cuts. We consider proximity graphs built from data sampled from an underlying distribution supported on a generic smooth compact manifold $\mathcal{M}$. In this setting, we obtain high probability convergence rates for both the Cheeger constant and the associated Cheeger cuts towards their continuum counterparts. The key technical tools are careful estimates of interpolation operators which lift empirical Cheeger cuts to the continuum, as well as continuum stability estimates for isoperimetric problems. To the best of our knowledge the quantitative estimates obtained here are the first of their kind.
Article
In this paper we improve the spectral convergence rates for graph-based approximations of weighted Laplace-Beltrami operators constructed from random data. We utilize regularity of the continuum eigenfunctions and strong pointwise consistency results to prove that spectral convergence rates are the same as the pointwise consistency rates for graph Laplacians. In particular, for an optimal choice of the graph connectivity $\varepsilon$, our results show that the eigenvalues and eigenvectors of the graph Laplacian converge to those of a weighted Laplace-Beltrami operator at a rate of $O(n^{-1/(m+4)})$, up to log factors, where $m$ is the manifold dimension and $n$ is the number of vertices in the graph. Our approach is general and allows us to analyze a large variety of graph constructions that include $\varepsilon$-graphs and k-NN graphs. We also present the results of numerical experiments analyzing convergence rates on the two dimensional sphere.
Article
We establish both uniform and nonuniform error bounds of the Berry-Esseen type in normal approximation under local dependence. These results are of an order close to the best possible if not best possible. They are more general or sharper than many existing ones in the literature. The proofs couple Stein's method with the concentration inequality approach.
Article
1. Introduction
2. Probabilistic ingredients
3. Subgraph and component counts
4. Typical vertex degrees
5. Geometrical ingredients
6. Maximum degree, cliques and colourings
7. Minimum degree: laws of large numbers
8. Minimum degree: convergence in distribution
9. Percolative ingredients
10. Percolation and the largest component
11. The largest component for a binomial process
12. Ordering and partitioning problems
13. Connectivity and the number of components
References
Index