ABSTRACT: Recent studies of cellular networks have revealed modular organizations of genes and proteins. For example, in interactome networks, a module refers to a group of interacting proteins that form molecular complexes and/or biochemical pathways and together mediate a biological process. However, it is still poorly understood how biological information is transmitted between different modules. We have developed information flow analysis, a new computational approach that identifies proteins central to the transmission of biological information throughout the network. In the information flow analysis, we represent an interactome network as an electrical circuit, where interactions are modeled as resistors and proteins as interconnecting junctions. Construing the propagation of biological signals as flow of electrical current, our method calculates an information flow score for every protein. Unlike previous metrics of network centrality such as degree or betweenness that only consider topological features, our approach incorporates confidence scores of protein-protein interactions and automatically considers all possible paths in a network when evaluating the importance of each protein. We apply our method to the interactome networks of Saccharomyces cerevisiae and Caenorhabditis elegans. We find that the likelihood of observing lethality and pleiotropy when a protein is eliminated is positively correlated with the protein's information flow score. Even among proteins of low degree or low betweenness, high information flow scores serve as a strong predictor of loss-of-function lethality or pleiotropy. The correlation between information flow scores and phenotypes supports our hypothesis that the proteins of high information flow reside in central positions in interactome networks.
We also show that the ranks of information flow scores are more consistent than those of betweenness when a large amount of noisy data is added to an interactome. Finally, we combine gene expression data with interaction data in C. elegans and construct an interactome network for muscle-specific genes. We find that genes that rank high in terms of information flow in the muscle interactome network but not in the entire network tend to play important roles in muscle function. This framework for studying tissue-specific networks by the information flow model can be applied to other tissues and other organisms as well.
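As an illustration of the circuit analogy in this abstract, the following Python sketch treats each interaction as a resistor whose conductance equals its confidence score, injects one unit of current at a source protein, withdraws it at a sink protein, and accumulates the current passing through every protein over all source/sink pairs. It is not the authors' implementation. The function name `information_flow_scores`, the use of confidence directly as conductance, the summation over all unordered pairs, and the toy network are assumptions made for this example, and the sketch assumes a connected network.

```python
import itertools

import numpy as np


def information_flow_scores(n_proteins, interactions):
    """Hypothetical helper: current-flow score for every protein.

    interactions: iterable of (i, j, confidence) with confidence > 0.
    Confidence is used as the conductance of the resistor between
    proteins i and j (an assumption of this sketch).
    """
    # Symmetric conductance matrix of the circuit.
    conductance = np.zeros((n_proteins, n_proteins))
    for i, j, confidence in interactions:
        conductance[i, j] += confidence
        conductance[j, i] += confidence

    # Graph Laplacian; its pseudo-inverse maps injected currents to
    # node potentials (valid when the network is connected).
    laplacian = np.diag(conductance.sum(axis=1)) - conductance
    laplacian_pinv = np.linalg.pinv(laplacian)

    scores = np.zeros(n_proteins)
    for source, sink in itertools.combinations(range(n_proteins), 2):
        injected = np.zeros(n_proteins)
        injected[source] = 1.0   # one unit of current enters here
        injected[sink] = -1.0    # and leaves here
        potential = laplacian_pinv @ injected

        # Ohm's law: current on edge i -> j is conductance * voltage drop.
        edge_current = conductance * (potential[:, None] - potential[None, :])

        # At an interior junction inflow equals outflow, so half of the
        # total absolute edge current is the current passing through it.
        through = 0.5 * np.abs(edge_current).sum(axis=1)
        through[source] = 1.0    # the full unit passes through the endpoints
        through[sink] = 1.0
        scores += through
    return scores


# Toy chain of three proteins, 0 - 1 - 2, with equal confidence.
toy_scores = information_flow_scores(3, [(0, 1, 1.0), (1, 2, 1.0)])
print(np.round(toy_scores, 6))  # the middle protein receives the highest score
```

Because every edge current is obtained from the full set of node potentials, all paths between a source and a sink contribute in proportion to their conductance, which is how a circuit formulation accounts for both interaction confidence and alternative routes. Degree and shortest-path betweenness do not do this.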
ABSTRACT: Pleiotropy refers to the phenomenon in which a single gene controls several distinct, and seemingly unrelated, phenotypic effects. We use C. elegans early embryogenesis as a model to conduct systematic studies of pleiotropy. We analyze high-throughput RNA interference (RNAi) data from C. elegans and identify "phenotypic signatures", which are sets of cellular defects indicative of certain biological functions. By matching phenotypic profiles to our identified signatures, we assign genes with complex phenotypic profiles to multiple functional classes. Overall, we observe that pleiotropy occurs extensively among genes involved in early embryogenesis, and a small proportion of these genes are highly pleiotropic. We hypothesize that genes involved in early embryogenesis are organized into partially overlapping functional modules, and that pleiotropic genes represent "connectors" between these modules. In support of this hypothesis, we find that highly pleiotropic genes tend to reside in central positions in protein-protein interaction networks, suggesting that pleiotropic genes act as connecting points between different protein complexes or pathways.