Thesis

Inferring Community-driven Structure in Complex Networks

Abstract

Despite a long tradition in the study of graphs and relational data, the analysis of complex networks was limited for decades by difficulties in data collection and by computational burdens. New technologies in the life sciences and in daily life have brought to light the many interconnections that our world features, from friendships and collaborations between individuals or organizations to functional couplings between cellular molecules. This has greatly facilitated the collection of relational data and fostered an unprecedented interest in network science. Understanding the relations encoded in complex networks nevertheless remains a challenging task, and statistical methods that help to summarize and simplify complex networks are needed. In this thesis we show that one can often gain deep insight into a network by focusing on communities, i.e. on clusters of nodes, and on the relations that exist between them. We begin by presenting NEAT, a network-based test for assessing relations between gene sets in a gene interaction network. NEAT extends traditional gene enrichment analysis tests by incorporating information on interactions between genes, and it overcomes some limitations of existing network enrichment analysis approaches. We then propose two extended stochastic blockmodels that make it possible to infer the relations between communities from relations between pairs of individuals in a social network. We advocate the use of penalized inference to estimate these models, with the aim of deriving a sparse reduced graph between communities. Applying these models to bill cosponsorship networks in the Italian Chamber of Deputies allows us to reconstruct the pattern of collaborations between Italian political parties from 2001 to 2015. Finally, we propose a novel clustering strategy for sequences of graphs, based on mixtures of generalized linear models. We show that the proposed clustering method can not only retrieve subpopulations of networks within a cross-sectional or longitudinal sequence of networks, but also characterize them directly through the components of the mixture model.
Data
Supplementary material associated with: Signorelli, M., Wit, E. C. (2017), A penalized inference approach to stochastic blockmodelling of community structure in the Italian Parliament. Journal of the Royal Statistical Society, Series C.
Article
We analyse bill cosponsorship networks in the Italian Chamber of Deputies. In comparison with other parliaments, a distinguishing feature of the Chamber is its large number of political groups. Our analysis aims to infer the pattern of collaborations between these groups from data on bill cosponsorships. We propose an extension of stochastic block models for edge-valued graphs and derive measures of group productivity and of collaboration between political parties. As the proposed model involves a large number of parameters, we pursue a penalized likelihood approach that enables us to infer a sparse reduced graph displaying collaborations between political parties.
Conference Paper
Network enrichment analysis (NEA) integrates gene enrichment analysis with information on dependences between genes. Existing tests for NEA rely on normality assumptions, can deal only with undirected networks, and are computationally slow. We propose NEAT, an alternative test based on the hypergeometric distribution. NEAT can also be applied to directed and mixed networks, and it is faster and more powerful than existing NEA tests.
Article
Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow, and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected networks, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by testing for associations between GO Slim categories themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN.
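To illustrate how a hypergeometric null distribution can be used to test for links between two node sets in a directed network, here is a minimal Python sketch. It is not the neat package's implementation. The parameterisation (draws = total out-degree of set A, successes = total in-degree of set B, population = total number of arrows) is an assumption of this sketch rather than something stated in the abstract, and the function names and toy network are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n items are drawn without replacement from a
    population of N items, K of which are 'successes'."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def set_link_test(edges, set_a, set_b):
    """Count arrows from set_a to set_b and return the observed count
    together with one-sided hypergeometric p-values.

    Assumed null (illustrative only): each arrow leaving set_a reaches
    one of the N arrow endpoints in the network, K of which are in set_b.
    """
    observed = sum(1 for u, v in edges if u in set_a and v in set_b)
    n = sum(1 for u, _ in edges if u in set_a)   # total out-degree of A
    K = sum(1 for _, v in edges if v in set_b)   # total in-degree of B
    N = len(edges)                               # total number of arrows
    lo, hi = max(0, n - (N - K)), min(n, K)
    p_over = sum(hypergeom_pmf(k, N, K, n) for k in range(observed, hi + 1))
    p_under = sum(hypergeom_pmf(k, N, K, n) for k in range(lo, observed + 1))
    return observed, p_over, p_under

# Toy directed network (hypothetical data).
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 1), (2, 4)]
observed, p_over, p_under = set_link_test(toy_edges, {1, 2}, {3, 4})
```

In this toy network three arrows go from A = {1, 2} to B = {3, 4}, and the upper-tail probability under the assumed null is 9/15 = 0.6, so there is no evidence of over-enrichment. Because the p-value is a closed-form tail sum, no resampling of the network is needed.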
Article
The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components). The implications of the approach in designing statistics courses are discussed.
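As a concrete illustration of iterative weighted linear regression, the following Python sketch fits a Poisson model with log link, one intercept and one covariate. It is a minimal sketch under these assumptions, not code from the paper. The function name, starting values and toy data are hypothetical, and the 2x2 weighted least-squares system is solved in closed form to keep the example free of external libraries.

```python
import math

def irls_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares.

    At each step the working response z and the weights w are recomputed
    from the current fit, then a weighted linear regression of z on x
    gives the updated coefficients.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]                  # linear predictor
        mu = [math.exp(e) for e in eta]                   # fitted means
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        w = mu                                            # Poisson/log-link weights
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det      # weighted normal equations
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Toy data (hypothetical): the means double with each unit increase in x.
xs = [0, 1, 2, 3, 4]
ys = [1, 2, 4, 8, 16]
b0_hat, b1_hat = irls_poisson(xs, ys)
```

For these toy data the fitted means can reproduce the observations exactly, so the estimates should approach an intercept of 0 and a slope of log 2. Other exponential-family models follow the same loop with a different link, working response and weights.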