On Scale-Free Prior Distributions and Their Applicability in Large-Scale Network Inference with Gaussian Graphical Models
DOI: 10.1007/978-3-642-02466-5_9
Conference: Complex Sciences, First International Conference, Complex 2009, Shanghai, China, February 23-25, 2009. Revised Papers, Part 1
This paper concerns the specification, and performance, of scale-free prior distributions with a view toward large-scale network
inference from small-sample data sets. We devise three scale-free priors and implement them in the framework of Gaussian graphical
models. Gaussian graphical models are used in gene network inference, where practitioners frequently analyze high-throughput data describing a large number
of variables with comparatively few samples. Although there is a consensus that
many such networks are scale-free, the usual practice is to assign a random network prior. Simulations demonstrate that the scale-free priors outperform the random network prior
at recovering scale-free trees with degree exponents near 2, such as are characteristic of many real-world systems. On the
other hand, the random network prior compares favorably at recovering scale-free trees characterized by larger degree exponents.
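To make the contrast between the two kinds of prior concrete, the following is a minimal sketch, not one of the three priors devised in the paper. It assumes a random network prior in which each edge is present independently with probability `p`, and an illustrative, unnormalised scale-free prior that multiplies a power-law term k^(-gamma) over node degrees. The function names, `p`, and `gamma` are hypothetical choices made for this example.

```python
import math


def degrees(n_nodes, edges):
    """Count the degree of each node in an undirected graph."""
    deg = [0] * n_nodes
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def log_prior_random(n_nodes, edges, p=0.1):
    """Log prior when every possible edge is included independently
    with probability p (assumed form of a random network prior)."""
    n_pairs = n_nodes * (n_nodes - 1) // 2
    m = len(edges)
    return m * math.log(p) + (n_pairs - m) * math.log(1 - p)


def log_prior_scale_free(n_nodes, edges, gamma=2.0):
    """Unnormalised log prior: sum of -gamma * log(degree) over nodes.
    Isolated nodes are skipped. This is an illustrative assumption,
    not the paper's definition."""
    return sum(-gamma * math.log(k) for k in degrees(n_nodes, edges) if k > 0)


# Two trees on six nodes with the same number of edges.
star = [(0, j) for j in range(1, 6)]   # one hub connected to five leaves
path = [(j, j + 1) for j in range(5)]  # a chain with no hub

print(log_prior_random(6, star), log_prior_random(6, path))
print(log_prior_scale_free(6, star), log_prior_scale_free(6, path))
```

Under these assumptions the random network prior cannot distinguish the two trees, because it depends only on the edge count, whereas the degree-based prior assigns the hub-dominated star a higher score than the chain.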
Available from: Tjeerd Dijkstra
ABSTRACT: Transcription control networks have a scale-free topological structure: while most genes are involved in a small number
of links, a few hubs or key regulators are connected to a large number of nodes. Several methods have been developed
for the reconstruction of these networks from gene expression data, e.g. ARACNE. However, few of them take into account the
scale-free structure of transcription networks. In this paper, we focus on the hubs that commonly appear in scale-free networks.
First, we propose three feature selection methods for identifying genes that are likely to be hubs. Second,
we introduce an improvement to ARACNE so that it can take into account the list of hub genes generated by the
feature selection methods. Experiments with synthetic gene expression data validate the accuracy of the feature selection
methods in the task of identifying hub genes. When ARACNE is combined with the output of these methods, we achieve up to a
62% improvement in performance over the original reconstruction algorithm. Finally, the best method for identifying hub genes
is validated on a set of expression profiles from yeast.
Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part I; 01/2010