About the lab

Modern biology increasingly relies on high-throughput techniques. This trend challenges computational biologists to quickly extract as much useful information from the data as possible. On the genomic side, this primarily means correlating phenotypic differences with observed nucleotide sequence variations. On the protein side, the challenge is generally to annotate protein function at reasonable accuracy levels. The whole-organism level then incorporates all types of evidence to annotate evolutionary history, current health conditions, and predicted phenotypic changes.

We believe that nucleic and amino acid sequences contain a large portion of the information necessary to address both of these directions. However, we are always willing to supplement this data with other sources availa...

Featured projects (3)

Project
Building a more accurate predictor to evaluate the impact of amino acid variant effects on overall protein function
Project
HPC solutions and infrastructure to compute large numbers of jobs in parallel on multiple, heterogeneous cluster environments.

Featured research (15)

Motivation: Metal binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability. Results: We developed a novel machine learning-based method, mebipred, for identifying metal binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of eleven ubiquitously-present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal binding capabilities from short sequence stretches, e.g. translated sequencing reads, and, thus, may be useful for the annotation of metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediments, and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal binding proteins. These results highlight mebipred’s utility in analysing microbiome metal requirements. Availability: mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/. Supplementary information: Supplementary data are available from Bioinformatics online repository.
Additional data is available from http://dx.doi.org/10.5281/zenodo.5722730 and http://dx.doi.org/10.5281/zenodo.6332940.
Biological redox reactions drive planetary biogeochemical cycles. Using a novel, structure-guided sequence analysis of proteins, we explored the patterns of evolution of enzymes responsible for these reactions. Our analysis reveals that the folds that bind transition metal–containing ligands have similar structural geometry and amino acid sequences across the full diversity of proteins. Similarity across folds reflects the availability of key transition metals over geological time and strongly suggests that transition metal–ligand binding had a small number of common peptide origins. We observe that structures central to our similarity network come primarily from oxidoreductases, suggesting that ancestral peptides may have also facilitated electron transfer reactions. Last, our results reveal that the earliest biologically functional peptides were likely available before the assembly of fully functional protein domains over 3.8 billion years ago. "Thus, life is a special, very complex form of motion of matter, but this form did not always exist, and it is not separated from inorganic nature by an impassable abyss; rather, it arose from inorganic nature as a new property in the process of evolution of the world. We must study the history of this evolution if we want to solve the problem of the origin of life." [A. I. Oparin (1)]
Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability. We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is nearly 90% accurate in recognizing proteins that bind metal ions and ion-containing ligands. Moreover, the identity of ten ubiquitously present metal ions and ion-containing ligands can be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and outperforms other prediction methods, both in speed and accuracy. mebipred can also identify protein metal-binding capabilities from short sequence stretches and, thus, may be useful for annotating the metal requirements of metagenomic samples, as inferred from translated sequencing reads. We performed an analysis of microbiome data and found that ocean, hot spring sediments and soil microbiomes use a more diverse set of metals than human host-related ones. For human-hosted microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred's utility in analyzing microbiome metal requirements. mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/
Non-synonymous Single Nucleotide Variants (nsSNVs), resulting in single amino acid variants (SAVs), are important drivers of evolutionary adaptation across the tree of life. Humans carry on average over 10,000 SAVs per individual genome, many of which likely have little to no impact on the function of the protein they affect. Experimental evidence for protein function changes as a result of SAVs remains sparse – a situation that can be somewhat alleviated by predicting their impact using computational methods. Here, we used SNAP to examine both observed and in silico generated human variation in a set of 1,265 proteins that are consistently found across a number of diverse species. The number of SAVs that are predicted to have any functional effect on these proteins is smaller than expected, suggesting sequence/function optimization over evolutionary timescales. Additionally, we find that only a few of the yet-unobserved SAVs could drastically change the function of these proteins, while nearly a quarter would have only a mild functional effect. We observed that variants common in the human population localized to less conserved protein positions and carried mild to moderate functional effects more frequently than rare variants. As expected, rare variants carried severe effects more frequently than common variants. In line with current assumptions, we demonstrated that the change of the human reference sequence amino acid to the reference of another species (a cross-species variant) is unlikely to significantly impact protein function. However, we also observed that many cross-species variants may be weakly non-neutral for the purposes of quick adaptation to environmental changes, but may not be identified as such by current state-of-the-art methodology.

Lab head

Yana Bromberg
Department
  • Department of Biochemistry and Microbiology
About Yana Bromberg
  • I study how function is encoded in genetic data, whether by a single gene, a genome, or a metagenome. I use computational approaches to correlate genome variation to phenotype, identify the specifics of molecular functions, and elucidate complex system interactions. My lab researches: (1) pathogenesis pathways via mutational load analysis; (2) origins of electron-transfer reactions by looking for functional similarity of metal-binding folds; and (3) emergent microbiome functionality.

Members (5)

Maximilian Miller
  • Rutgers, The State University of New Jersey
Yannick Mahlich
  • Technische Universität München
Ariel Alejandro Aptekmann
  • Rutgers, The State University of New Jersey
Zishuo Zeng
  • Rutgers, The State University of New Jersey