Introduction to papers on the modeling and analysis of network data---II
Published in The Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), available at http://dx.doi.org/10.1214/10-AOAS365
arXiv:1011.1717v1 [stat.AP] 8 Nov 2010
The Annals of Applied Statistics
2010, Vol. 4, No. 2, 533–534
© Institute of Mathematical Statistics, 2010
INTRODUCTION TO PAPERS ON THE MODELING AND
ANALYSIS OF NETWORK DATA—II
By Stephen E. Fienberg
Carnegie Mellon University
This issue of The Annals of Applied Statistics (Volume 4, No. 2) contains
the second part of a Special Section on the topic of network modeling.
The first part consisted of seven papers and appeared with a general
introduction [Fienberg (2010)] in Volume 4, No. 1. In Part II we include a
diverse collection of eight additional papers with applications spanning
biological, informational and social networks, using techniques such as
kriging, anomaly detection and variational approximations, as well as the
study of latent structure in both static and dynamical networks:
• In A State-Space Mixed Membership Blockmodel for Dynamic Network
Tomography, Xing, Fu and Song combine earlier approaches involving mixed
membership stochastic blockmodels for static networks with state-space
models for trajectories and use the new dynamic modeling approach to
analyze Sampson's network of novices in a monastery, the email
communication network among Enron employees and a rewiring gene
interaction network over the life cycle of the fruit fly.
• In Maximum Likelihood Estimation for Social Network Dynamics, Snijders,
Koskinen and Schweinberger develop a likelihood-based approach to
network panel data with an underlying Markov continuous-time stochastic
actor-oriented process. They use the new methods to reanalyze a
friendship network between 32 freshman students in a given discipline at a
Dutch university, observed over six waves at three-week intervals beginning
at the start of the academic year.
• Xu, Dyer and Owen use semi-supervised learning on network graphs in
which response variables observed at one node are used to estimate missing
values at other nodes, by exploiting an underlying correlation structure
among nearby nodes. The methods they employ in Empirical Stationary
Correlations for Semi-supervised Learning on Graphs are rooted in ideas
about kriging emanating from geostatistics, and they compare their
methods to ones proposed earlier using a data set containing the number of
web links between UK universities in 2002, and the WebKB data set
containing webpages collected from computer science departments of various
US universities in 1997.
Received May 2010.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Statistics,
2010, Vol. 4, No. 2, 533–534. This reprint differs from the original in pagination
and typographic detail.
• In Ranking Relations Using Analogies in Biological and Information
Networks, Silva, Heller, Ghahramani and Airoldi explore the problem of
ranking relations in network-like settings based on a similarity criterion
underlying Bayesian sets, drawing on ideas of analogy items in test batteries
such as the SAT. They too analyze the WebKB collection, as well as the
problem of ranking protein–protein interactions using the MIPS database
for the proteins in budding yeast.
• Heard, Weston, Platanioti and Hand fuse discrete time counting models
to carry out Bayesian Anomaly Detection Methods for Social Networks
using data from the European Commission Joint Research Centre's
European Media Monitor web intelligence service, which provides real-time
press and media summaries to Commission cabinets and services, including a
breaking news and alerting service. They also study simulated cell phone
data from the VAST Mini Challenge covering a fictional ten-day period
on an island, narrowed to 400 unique cell phones during this period.
• James, Zhou, Zhu and Sabatti study Sparse Regulation Networks in
genetic contexts, using prior information about the network structure in
conjunction with observed gene expression data to estimate the transcription
regulatory network for E. coli. Their approach uses L1 penalties on the
network to ensure a sparse structure.
• Zanghi, Picard, Miele and Ambroise explore Strategies for Online
Inference of Model-Based Clustering in Large and Growing Networks. Their
online EM-based algorithms offer a good trade-off between precision and
speed when estimating parameters for mixture distributions applied to
data from the political websphere during the 2008 US political campaign.
• Mariadassou, Robin and Vacher, in Uncovering Latent Structure in
Valued Graphs: A Variational Approach, use variational approximations to
likelihood mixture models where the network connections are weighted
values instead of simple 0–1 entries. They use their method to analyze
interaction networks of tree and fungal species.
Fienberg, S. E. (2010). Introduction to papers on the modeling and analysis of network
data. Ann. Appl. Statist. 4 1–4.
Department of Statistics
and Machine Learning Department
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213