Three-dimensional reconstruction of protein networks provides insight into human genetic disease.

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA.
Nature Biotechnology (Impact Factor: 39.08). 01/2012; 30(2):159-64. DOI: 10.1038/nbt.2106
Source: PubMed

ABSTRACT To better understand the molecular mechanisms and genetic basis of human disease, we systematically examine relationships between 3,949 genes, 62,663 mutations and 3,453 associated disorders by generating a three-dimensional, structurally resolved human interactome. This network consists of 4,222 high-quality binary protein-protein interactions with their atomic-resolution interfaces. We find that in-frame mutations (missense point mutations and in-frame insertions and deletions) are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and that the disease specificity for different mutations of the same gene can be explained by their location within an interface. We also predict 292 candidate genes for 694 unknown disease-to-gene associations with proposed molecular mechanism hypotheses. This work indicates that knowledge of how in-frame disease mutations alter specific interactions is critical to understanding pathogenesis. Structurally resolved interaction networks should be valuable tools for interpreting the wealth of data being generated by large-scale structural genomics and disease association studies.

  • Source
    ABSTRACT: Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.
    Molecular Systems Biology 12/2014; 10(12). · 14.10 Impact Factor
  • Source
    ABSTRACT: Mutations in genes potentially lead to a number of genetic diseases with differing severity. These disease genes have been the focus of research in recent years showing that the disease gene population as a whole is not homogeneous, and can be categorized according to their interactions. Locus heterogeneity describes a single disorder caused by mutations in different genes each acting individually to cause the same disease. Using datasets of experimentally derived human disease genes and protein interactions, we created a protein interaction network to investigate the relationships between the products of genes associated with a disease displaying locus heterogeneity, and use network parameters to suggest properties that distinguish these disease genes from the overall disease gene population. Through the manual curation of known causative genes of 100 diseases displaying locus heterogeneity and 397 single-gene Mendelian disorders, we use network parameters to show that our locus heterogeneity network displays distinct properties from the global disease network and a Mendelian network. Using the global human proteome, through random simulation of the network we show that heterogeneous genes display significant interconnectivity. Further topological analysis of this network revealed clustering of locus heterogeneity genes that cause identical disorders, indicating that these disease genes are involved in similar biological processes. We then use this information to suggest additional genes that may contribute to diseases with locus heterogeneity.
    Frontiers in Genetics 12/2014; 5:434.
  • Source
    ABSTRACT: The promise of personalized cancer medicine cannot be fulfilled until we gain better understanding of the connections between the genomic makeup of a patient's tumor and its response to anticancer drugs. Several datasets that include both pharmacologic profiles of cancer cell lines as well as their genomic alterations have been recently developed and extensively analyzed. However, most analyses of these datasets assume that mutations in a gene will have the same consequences regardless of their location. While this assumption might be correct in some cases, such analyses may miss subtler, yet still relevant, effects mediated by mutations in specific protein regions. Here we study such perturbations by separating effects of mutations in different protein functional regions (PFRs), including protein domains and intrinsically disordered regions. Using this approach, we have been able to identify 171 novel associations between mutations in specific PFRs and changes in the activity of 24 drugs that couldn't be recovered by traditional gene-centric analyses. Our results demonstrate how focusing on individual protein regions can provide novel insights into the mechanisms underlying the drug sensitivity of cancer cell lines. Moreover, while these new correlations are identified using only data from cancer cell lines, we have been able to validate some of our predictions using data from actual cancer patients. Our findings highlight how gene-centric experiments (such as systematic knock-out or silencing of individual genes) are missing relevant effects mediated by perturbations of specific protein regions. All the associations described here are available from
    PLoS Computational Biology 01/2015; 11(1):e1004024. · 4.83 Impact Factor

