Article

GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network.

BMC Bioinformatics (impact factor: 2.75). 08/2012; 13(1):197. DOI:10.1186/1471-2105-13-197
Source: PubMed

ABSTRACT

BACKGROUND: Gene Set Analysis (GSA) has proven to be a useful approach to microarray analysis. However, most method development for GSA has focused on the statistical tests to be used rather than on the generation of the sets to be tested. Existing methods of set generation are often overly simplistic: creating sets from individual pathways in isolation is a poor reflection of the complexity of the underlying metabolic network. We have developed a novel approach to set generation that applies Principal Component Analysis to the Laplacian matrix of a metabolic network. We have analysed a relatively simple data set to show the difference in results between our method and the current state-of-the-art pathway-based sets.

RESULTS: The sets generated with this method are semi-exhaustive and capture much of the topological complexity of the metabolic network. The semi-exhaustive nature of the method has also allowed us to design a hypergeometric enrichment test to determine which genes are likely responsible for set significance. We show that our method finds significant aspects of biology that simple pathway-based sets would miss (i.e. false negatives) and addresses the false positive rates associated with such sets.

CONCLUSIONS: The set generation step for GSA is often neglected but is a crucial part of the analysis, as it defines the full context for the analysis. As such, set generation methods should be robust and yield as complete a representation of the extant biological knowledge as possible. The method reported here achieves this goal and is demonstrably superior to previous set analysis methods.
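The abstract does not spell out how sets are derived from the principal components, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an undirected toy network, the unnormalised Laplacian L = D - A, and a hypothetical rule that a node joins a set when the absolute value of its loading on a component reaches a chosen threshold. The function names (`laplacian`, `pca_sets`, `hypergeom_pvalue`), the threshold, and the toy edge list are all illustrative assumptions. The enrichment test is a standard upper-tail hypergeometric probability.

```python
import math
import numpy as np


def laplacian(n_nodes, edges):
    """Unnormalised graph Laplacian L = D - A of an undirected graph."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def pca_sets(L, n_components, threshold):
    """PCA on the columns of L; one candidate set per leading component.

    Assumption (not from the paper): a node belongs to a set when the
    absolute value of its loading is at least `threshold`.
    """
    centered = L - L.mean(axis=0)  # column-centre the matrix
    cov = centered.T @ centered / (L.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    sets = []
    for k in order:
        loadings = eigvecs[:, k]
        members = np.flatnonzero(np.abs(loadings) >= threshold)
        sets.append({int(i) for i in members})
    return sets


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return hits / math.comb(N, n)


# Toy network: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
L = laplacian(6, edges)
print(pca_sets(L, n_components=2, threshold=0.4))

# Enrichment of a set: 10 genes in total, 4 of them significant,
# a set of 3 genes containing 2 significant ones.
print(hypergeom_pvalue(N=10, K=4, n=3, k=2))
```

In a real analysis the nodes would be mapped to genes through the metabolic network annotation, and the threshold and number of components would need to be chosen from the data; both are placeholders here.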


Keywords

analysis methods
crucial part
current state
extant biological knowledge
false negatives
false positive rates
Gene Set Analysis
generation methods
hypergeometric enrichment test
metabolic network
microarray analysis
poor reflection
Principal Component Analysis
semi-exhaustive nature
set generation step
significant aspects
statistical tests
topological complexity
underlying metabolic network
useful approach

Dan Jacobson