ABSTRACT: Recent studies of cellular networks have revealed modular organizations of genes and proteins. For example, in interactome networks, a module refers to a group of interacting proteins that form molecular complexes and/or biochemical pathways and together mediate a biological process. However, it is still poorly understood how biological information is transmitted between different modules. We have developed information flow analysis, a new computational approach that identifies proteins central to the transmission of biological information throughout the network. In information flow analysis, we represent an interactome network as an electrical circuit, where interactions are modeled as resistors and proteins as interconnecting junctions. Construing the propagation of biological signals as flow of electrical current, our method calculates an information flow score for every protein. Unlike previous metrics of network centrality such as degree or betweenness, which consider only topological features, our approach incorporates confidence scores of protein-protein interactions and automatically considers all possible paths in a network when evaluating the importance of each protein. We apply our method to the interactome networks of Saccharomyces cerevisiae and Caenorhabditis elegans. We find that the likelihood of observing lethality and pleiotropy when a protein is eliminated is positively correlated with the protein's information flow score. Even among proteins of low degree or low betweenness, high information flow scores serve as a strong predictor of loss-of-function lethality or pleiotropy. The correlation between information flow scores and phenotypes supports our hypothesis that proteins of high information flow reside in central positions in interactome networks.
We also show that the ranks of information flow scores are more consistent than those of betweenness when a large amount of noisy data is added to an interactome. Finally, we combine gene expression data with interaction data in C. elegans and construct an interactome network for muscle-specific genes. We find that genes that rank high in terms of information flow in the muscle interactome network but not in the entire network tend to play important roles in muscle function. This framework for studying tissue-specific networks by the information flow model can be applied to other tissues and other organisms as well.
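The circuit analogy described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact scoring formula, so the code below assumes that each interaction's confidence score acts as a conductance, that a unit current is injected for every source/sink pair of proteins, and that a protein's score is the total current passing through it summed over all pairs. The function name `information_flow_scores` and the input format are hypothetical, and the network is assumed to be connected.

```python
import itertools
import numpy as np


def information_flow_scores(nodes, edges):
    """Current-flow style score per node (illustrative assumption, see lead-in).

    nodes: list of protein identifiers.
    edges: list of (u, v, confidence) tuples; confidence > 0 is used
           as the conductance of the resistor joining u and v.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Symmetric conductance matrix; repeated edges add up like parallel resistors.
    W = np.zeros((n, n))
    for u, v, conf in edges:
        i, j = index[u], index[v]
        W[i, j] += conf
        W[j, i] += conf

    # Weighted graph Laplacian and its pseudo-inverse (the Laplacian is singular).
    laplacian = np.diag(W.sum(axis=1)) - W
    laplacian_pinv = np.linalg.pinv(laplacian)

    scores = np.zeros(n)
    for s, t in itertools.combinations(range(n), 2):
        # Inject one unit of current at s and withdraw it at t.
        injected = np.zeros(n)
        injected[s] = 1.0
        injected[t] = -1.0
        potentials = laplacian_pinv @ injected

        # Ohm's law: current magnitude on edge (i, j) is conductance times
        # the potential difference between its endpoints.
        edge_currents = W * np.abs(potentials[:, None] - potentials[None, :])

        # For an intermediate node, inflow equals outflow, so half of the
        # summed incident current is the amount passing through it.
        through = 0.5 * edge_currents.sum(axis=1)

        # Assumption: source and sink are each credited the full unit current.
        through[s] = 1.0
        through[t] = 1.0
        scores += through

    return dict(zip(nodes, scores))


# Toy usage: a three-protein chain A - B - C with equal confidences.
example = information_flow_scores(
    ["A", "B", "C"],
    [("A", "B", 1.0), ("B", "C", 1.0)],
)
```

In the toy chain, the middle protein B carries current for every pair and therefore receives the highest score. Because potentials are solved over the whole circuit, every path between a pair contributes in proportion to its conductance, which is the sense in which such a model considers all paths rather than only shortest ones.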
ABSTRACT: Author Summary
In a biological system, some genes play single roles while others perform multiple functions. How can we determine which genes are multi-functional? An informative way to probe gene function is to eliminate the expression of a given gene and observe the phenotypic consequences. RNAi techniques have enabled the generation of genome-wide phenotypic data. Conventionally, genes are clustered into mutually exclusive categories according to the defects observed following RNAi. However, assigning genes that may play multiple roles exclusively to a single category is arbitrary. This paper develops a computational approach that categorizes genes while allowing genes with complex phenotypes to be assigned to multiple categories. We apply this approach to genes involved in cell division in early C. elegans embryos, and find that about half of these genes can be assigned to more than one functional category. This approach has allowed the identification of previously undiscovered gene functions. We also find that genes playing many roles in early embryos tend to reside in central positions in protein networks. Our approach can be used to perform functional annotations based on phenotypic data in other systems and to identify genes that coordinate multiple biological functions.