On Scale-Free Prior Distributions and Their Applicability in Large-Scale Network Inference with Gaussian Graphical Models.
ABSTRACT This paper concerns the specification and performance of scale-free prior distributions with a view toward large-scale network
inference from small-sample data sets. We devise three scale-free priors and implement them in the framework of Gaussian graphical
models. Gaussian graphical models are used in gene network inference where high-throughput data describing a large number
of variables with comparatively few samples are frequently analyzed. Although there is a consensus that
many such networks are scale-free, the usual practice is to assign a random network prior. Simulations demonstrate that the scale-free priors outperform the random network prior
at recovering scale-free trees with degree exponents near 2, such as are characteristic of many real-world systems. On the
other hand, the random network prior compares favorably at recovering scale-free trees characterized by larger degree exponents.
- Source available from: Tjeerd Dijkstra
ABSTRACT: Transcription control networks have a scale-free topological structure: while most genes are involved in a small number of links, a few hubs or key regulators are connected to a significantly larger number of nodes. Several methods have been developed for the reconstruction of these networks from gene expression data, e.g. ARACNE. However, few of them take into account the scale-free structure of transcription networks. In this paper, we focus on the hubs that commonly appear in scale-free networks. First, three feature selection methods are proposed for identifying the genes that are likely to be hubs. Second, we introduce an improvement to ARACNE so that this technique can take into account the list of hub genes generated by the feature selection methods. Experiments with synthetic gene expression data validate the accuracy of the feature selection methods in the task of identifying hub genes. When ARACNE is combined with the output of these methods, we achieve up to a 62% improvement in performance over the original reconstruction algorithm. Finally, the best method for identifying hub genes is validated on a set of expression profiles from yeast.
Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part I; 01/2010