About
41
Publications
7,983
Reads
1,372
Citations
Introduction
I'm interested in the evolution and molecular detail of biological systems. I use computational methods to answer biological questions born of these interests.
Additional affiliations
August 2013 - May 2018
January 2010 - July 2013
Education
September 2013 - May 2018
September 2009 - May 2013
Publications (41)
Optimal growth temperature is a complex trait involving many cellular components, and its physiology is not yet fully understood. Evolution of continuous characters, such as optimal growth temperature, is often modeled as a one-dimensional random walk, but such a model may be an oversimplification given the complex processes underlying the evolutio...
Protein-protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. B...
Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. This approach requires extensive computational pipelines that integrate multiple tools, databases and extensive data processing steps. We present the EVcouplings framework, a fully integ...
Clustered protocadherins are a large family of paralogous proteins that play important roles in neuronal development. The more than 50 clustered protocadherin isoforms have remarkable homophilic specificity for interactions between cellular surfaces that is controlled by a large antiparallel dimer interface formed by the first four extracellular ca...
Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutation...
Bacteria from the orders Bacillales and Clostridiales differentiate into stress-resistant spores that can remain dormant for years, yet rapidly germinate upon nutrient sensing. How spores monitor nutrients is poorly understood but in most cases requires putative membrane receptors. The prototypical receptor from Bacillus subtilis consists of three...
Proteins often accumulate neutral mutations that do not affect current functions [1] but can profoundly influence future mutational possibilities and functions [2–4]. Understanding such hidden potential has major implications for protein design and evolutionary forecasting [5–7], but has been limited by a lack of systematic efforts to identify potentia...
Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale int...
The shape, elongation, division and sporulation (SEDS) proteins are a highly conserved family of transmembrane glycosyltransferases that work in concert with class B penicillin-binding proteins (bPBPs) to build the bacterial peptidoglycan cell wall [1–6]. How these proteins coordinate polymerization of new glycan strands with their crosslinkin...
The majority of protein interactions in most organisms are unknown, and experimental methods for determining protein interactions can yield divergent results. Here we use an orthogonal, purely computational method based on sequence coevolution to discover protein interactions at large scale. In the model organism Escherichia coli, 53% of protein pa...
Clustered protocadherins, a large family of paralogous proteins that play important roles in neuronal development, provide an important case study of interaction specificity in a large eukaryotic protein family. A mammalian genome has more than 50 clustered protocadherin isoforms, which have remarkable homophilic specificity for interactions betwee...
Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments,...
The shape, elongation, division and sporulation (SEDS) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was for a long time poorly understood, but recent work has revealed that the prototypical SEDS family member RodA is a peptidogly...
Bacteria can evolve rapidly under positive selection owing to their vast numbers, allowing their genes to diversify by adapting to different environments. We asked whether the same genes that evolve rapidly in the long-term evolution experiment with Escherichia coli (LTEE) have also diversified extensively in nature. To make this comparison, we ide...
Bacteria can evolve rapidly under positive selection owing to their vast numbers, allowing their genes to diversify by adapting to different environments. We asked whether the same genes that are fast evolving in the long-term evolution experiment with Escherichia coli (LTEE) have also diversified extensively in nature. We identified ~2000 core gen...
Background:
Treatment of Neisseria gonorrhoeae infection is empiric and based on population-wide susceptibilities. Increasing antimicrobial resistance underscores the potential importance of rapid diagnostics, including sequence-based tests, to guide therapy. However, the utility of sequence-based diagnostics depends on the prevalence and dynamics...
Evolutionary couplings from the clustered Pcdh alignment. DOI: http://dx.doi.org/10.7554/eLife.18449.016
Alignment of non-clustered Pcdhs EC1-4. DOI: http://dx.doi.org/10.7554/eLife.18449.013
Statistics for PcdhγB3 EC1-4 structure. DOI: http://dx.doi.org/10.7554/eLife.18449.004
Evolutionary couplings from the non-clustered Pcdh alignment. DOI: http://dx.doi.org/10.7554/eLife.18449.014
Alignment of clustered Pcdhs EC1-4. DOI: http://dx.doi.org/10.7554/eLife.18449.015
Significance
Parasitic interactions can result in changes to the host’s behavior in a way that promotes the distribution or life cycle of the parasite. Inteins are molecular parasites found in all three domains of life. Here we look at the influence of an intein in the DNA polymerase on a population of halophilic archaea in simulations, in experime...
Protocadherins (Pcdhs) are cell adhesion and signaling proteins used by neurons to develop and maintain neuronal networks, relying on trans homophilic interactions between their extracellular cadherin (EC) repeat domains. We present the structure of the antiparallel EC1-4 homodimer of human PcdhγB3, a member of the γ subfamily of clustered Pcdhs. S...
Benchmark data set and results. DOI: http://dx.doi.org/10.7554/eLife.03430.024
PDB identifiers used for comparison of predicted evolutionary couplings to known 3D structures. DOI: http://dx.doi.org/10.7554/eLife.03430.030
De novo prediction data set and results. DOI: http://dx.doi.org/10.7554/eLife.03430.025
Docking results. DOI: http://dx.doi.org/10.7554/eLife.03430.026
Predicted inter-ECs for complexes in de novo prediction data set with EVcomplex score ≥0.8. DOI: http://dx.doi.org/10.7554/eLife.03430.027
ATP synthase interaction predictions. DOI: http://dx.doi.org/10.7554/eLife.03430.028
Comparison of ATP synthase EVcomplex predictions of a and b subunit with cross-linking studies. DOI: http://dx.doi.org/10.7554/eLife.03430.029
The bacterial genomes of Thermotoga species show evidence of significant interdomain horizontal gene transfer from the Archaea. Members of this genus acquired many genes from the Thermococcales, which grow at higher temperatures than Thermotoga species. In order to study the functional history of an interdomain horizontally acquired gene we used an...
Here we describe the genome of Mesotoga prima MesG1.Ag4.2, the first genome of a mesophilic Thermotogales bacterium. Mesotoga prima was isolated from a polychlorinated biphenyl (PCB)-dechlorinating enrichment culture from Baltimore Harbor sediments. Its 2.97 Mb genome is considerably larger than any previously sequenced Thermotogales genomes, which...
Phylogenetic reconstruction using DNA and protein sequences has allowed the reconstruction of evolutionary histories encompassing all life. We present and discuss a means to incorporate much of this rich narrative into a single model that acknowledges the discrete evolutionary units that constitute the organism. Briefly, this Rooted Net of Life gen...
Minimum parsimony counts supporting each of the possible trees (A) and rings (B). The lowest count is used to determine whether the data support a tree or a ring [3]. In the original analyses by Lake [2], the best ring had a minimum parsimony count of 581 versus 625 for the best supported tree (first column). Best supported trees or rings for each test...
List of all possible trees and rings for a five-taxon sampling. Each possible tree and ring is listed with the compatible presence-absence pattern of gene families (Pfam) given in Figure 1. For example, the tree and ring corresponding to ABCDR are shown at the left of each table. A corresponds to Actinobacteria, B to Bacilli, C to Clostridia, D for do...
In 2009, James Lake introduced a new hypothesis in which reticulate phylogeny reconstruction is used to elucidate the origin of Gram-negative bacteria (Nature 460: 967-971). The data presented supported an origin of Gram-negative bacteria in an ancient endosymbiosis between the Actinobacteria and Clostridia. His conclusion was based on a prese...