François Caron’s research while affiliated with University of Oxford and other places


Publications (96)


Asymptotic analysis of statistical estimators related to MultiGraphex processes under misspecification
  • Article

November 2024

Bernoulli

Zacharie Naulet · Judith Rousseau · François Caron


On sparsity, power-law, and clustering properties of graphex processes

June 2023 · 21 Reads · 9 Citations

Advances in Applied Probability

This paper investigates properties of the class of graphs based on exchangeable point processes. We provide asymptotic expressions for the number of edges, number of nodes, and degree distributions, identifying four regimes: (i) a dense regime, (ii) a sparse, almost dense regime, (iii) a sparse regime with power-law behaviour, and (iv) an almost extremely sparse regime. We show that, under mild assumptions, both the global and local clustering coefficients converge to constants which may or may not be the same. We also derive a central limit theorem for subgraph counts and for the number of nodes. Finally, we propose a class of models within this framework where one can separately control the latent structure and the global sparsity/power-law properties of the graph.


Bayesian Nonparametrics for Sparse Dynamic Networks

March 2023 · 7 Reads · 7 Citations

Lecture Notes in Computer Science

Cian Naik · François Caron · Judith Rousseau · [...]

In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated to each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time, and are modelled via a dynamic point process model. The model is able to capture long term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurrences of words in Reuters news articles after the September 11th attacks.

Keywords: Bayesian nonparametrics, Poisson random measures, Networks, Random graphs, Sparsity, Point processes


Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

February 2023 · 8 Reads

We consider the optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter. We focus on the case where the node scalings are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum AND can learn features, unlike in the NTK regime. We also provide experiments on synthetic and real-world datasets illustrating our theoretical results and showing the benefit of such scaling in terms of pruning and transfer learning.


Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility

May 2022 · 23 Reads

This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and feature learning is possible. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.



Fig. 1 Trace plots of (a) the number of active communities K_n and (b) σ, on a synthetic example
Fig. 2 Posterior density of the sizes of the small communities for the dataset generated from a Stochastic Block Model
Fig. 4 Posterior of K_n and σ for the Wiki-topcats dataset
Fig. 5 Reordered adjacency matrix of the Wikipedia topcats dataset
Proportion of the interactions of the features in each block for different values of overlapping

Nonnegative Bayesian nonparametric factor models with completely random measures
  • Article
  • Full-text available

September 2021 · 52 Reads · 7 Citations

Statistics and Computing

We present a Bayesian nonparametric Poisson factorization model for modeling dense network data with an unknown and potentially growing number of overlapping communities. The construction is based on completely random measures and allows the number of communities to either increase with the number of nodes at a specified logarithmic or polynomial rate, or be bounded. We develop asymptotics for the number and size of the communities of the network and derive a Markov chain Monte Carlo algorithm for targeting the exact posterior distribution for this model. The usefulness of the approach is illustrated on various real networks.


Asymptotic Analysis of Statistical Estimators related to MultiGraphex Processes under Misspecification

July 2021 · 9 Reads

This article studies the asymptotic properties of Bayesian or frequentist estimators of a vector of parameters related to structural properties of sequences of graphs. The estimators studied originate from a particular class of graphex model introduced by Caron and Fox. The analysis is, however, performed here under very weak assumptions on the underlying data generating process, which may be different from the model of Caron and Fox or from a graphex model. In particular, we consider generic sparse graph models, with unbounded degree, whose degree distribution satisfies some assumptions. We show that one can relate the limit of the estimator of one of the parameters to the sparsity constant of the true graph generating process. When taking a Bayesian approach, we also show that the posterior distribution is asymptotically normal. We discuss situations where classical random graph models such as configuration models, sparse graphon models, edge exchangeable models or graphon processes satisfy our assumptions.



Citations (48)


... Thus, for fixed τ, the number of edges is essentially of order the number of vertices squared and the graph is dense. The two parameter CF model provides much more flexible behavior; see Caron and Fox (2017) and Caron, Panero, and Rousseau (2023), but is difficult to analyze. We have argued that our two parameter SICRP(α, θ; τ^{-1}δ_2) is a simpler alternative, and the next result, which follows in a straightforward way from Pitman and Yor (1997, Proposition 21), shows how this SICRP and the two-parameter CF networks are related. ...

Reference:

Network and interaction models for data with hierarchical granularity via fragmentation and coagulation
On sparsity, power-law, and clustering properties of graphex processes
  • Citing Article
  • June 2023

Advances in Applied Probability

... and Miele (2017)) and related nonparametric graphon-based methods (Pensky (2019)) as well as nonparametric methods for dynamic link prediction (Sarkar, Chakrabarti and Jordan (2014)) and methods from Bayesian nonparametrics (Palla, Caron and Teh (2016)). Other related work includes sparse graphical models that can take account of different time points (Kalaitzis et al. (2013)). ...

Bayesian Nonparametrics for Sparse Dynamic Networks
  • Citing Chapter
  • March 2023

Lecture Notes in Computer Science

... Let $M = \sum_i \delta_{(\theta_i, \vartheta_i)}$ be a unit-rate Poisson random measure on $(0, +\infty)^2$, and let $W : [0, +\infty)^2 \to [0, 1]$ be a symmetric measurable function such that $\lim_{x \to \infty} W(x, x)$ and $\lim_{x \to 0} W(x, x)$ both exist (by (3), this implies that $\lim_{x \to \infty} W(x, x) = 0$) and ...

The Normal-Generalised Gamma-Pareto Process: A Novel Pure-Jump Lévy Process with Flexible Tail and Jump-Activity Properties
  • Citing Article
  • January 2022

Bayesian Analysis

... These initial investigations sparked a rich stream of research, particularly within the machine learning community. IBP models have found widespread applicability across various domains, including Bayesian factor analysis and nonnegative matrix factorization (Griffiths and Ghahramani, 2006; Knowles and Ghahramani, 2011; Ayed and Caron, 2021), topic modeling (Williamson et al., 2010), relational models (Miller et al., 2009; Palla et al., 2012), and object recognition (Broderick et al., 2015). We refer to Teh and Jordan (2010) and Griffiths and Ghahramani (2011) for a comprehensive review of the early contributions. ...

Nonnegative Bayesian nonparametric factor models with completely random measures

Statistics and Computing

... Given the current estimate of the cluster assignments, the conditional intensities are then estimated using a non-parametric M-step, consisting of either a histogram or kernel based estimate. A similar model has been proposed elsewhere (Miscouridou et al. 2018), where edge exchangeable models for binary graphs are extended to this setting. Here, the baseline of a Hawkes process encodes the affiliation of each node to the K latent communities, with a common exponential kernel for all interactions. ...

Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data
  • Citing Article
  • March 2018

... Apart from the prior works outlined above, the literature on the microclustering property is scarce but diverse. Previous work includes models that sacrifice finite exchangeability to handle data with a temporal component (e.g., arrival times; Di Benedetto et al. 2021), general finite mixture models with constraints on cluster sizes (Klami and Jitta 2016; Jitta and Klami 2018; Silverman and Silverman 2017), and models for sparse networks based on random partitions with power-law distributed cluster sizes (Bloem-Reddy et al. 2018). Recently, Lee and Sang (2022) considered the question of balance in cluster sizes, as encoded by majorization of cluster size vectors. ...

Non-exchangeable random partition models for microclustering
  • Citing Article
  • November 2017

The Annals of Statistics

... Another type of exchangeability for graph-valued stochastic processes is that of Caron and Fox's exchangeable measures [13]. These models can also produce sparse graphs, but it is much less clear what their exchangeability corresponds to in terms of modeling the real world and hence when it can be expected (see Crane's comment in [13]). ...

Sparse Graphs Using Exchangeable Random Measures

Journal of the Royal Statistical Society Series B (Statistical Methodology)

... Probabilistic generative models have for many years been key tools in the analysis of network data [1,2]. Recent work in the area [3,4,5,6,7,8,9,10,11,12,13] has begun to incorporate the use of nonparametric discrete measures, in an effort to address the limitations of traditional models in capturing the sparsity of real large-scale networks [14]. These models construct a discrete random measure Θ (often a completely random measure, or CRM [15]) on a space Ψ, associate each atom of the measure with a vertex in the network, and then use the self-product of the measure (i.e., the measure Θ × Θ on Ψ²) to represent the magnitude of interaction between vertices. ...

On sparsity and power-law properties of graphs based on exchangeable point processes
  • Citing Article
  • August 2017