About
69 Publications
50,361 Reads
62,236 Citations
Additional affiliations
November 2012 - March 2013
April 2009 - October 2012
October 2008 - April 2009
Publications (69)
Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but they have proven challenging to use in machine learning, especially in a cross-species setting. To address this, we leveraged the STRING dat...
Proteins cooperate, regulate and bind each other to achieve their functions. Understanding the complex network of their interactions is essential for a systems-level description of cellular processes. The STRING database compiles, scores and integrates protein–protein association information drawn from experimental assays, computational predictions...
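STRING scores evidence per channel (experiments, predictions, and so on) and then integrates the channels into one combined score. The sketch below is a minimal illustration and makes several assumptions. Channel scores are treated as independent probabilities between 0 and 1. Each score is corrected for a prior probability of a random association before the channels are combined, and the prior is added back once at the end. The prior value of 0.041 and the function name are assumptions made for illustration, not a definitive statement of STRING's current scoring code.

```python
PRIOR = 0.041  # assumed prior probability that two random proteins are associated


def combine_channel_scores(channel_scores: list[float], prior: float = PRIOR) -> float:
    """Combine per-channel probabilities (0..1) into one score, treating channels as independent."""
    all_spurious = 1.0  # probability that every channel's evidence is spurious
    for score in channel_scores:
        # Remove the prior from each channel so it is not counted several times.
        corrected = max((score - prior) / (1.0 - prior), 0.0)
        all_spurious *= 1.0 - corrected
    combined = 1.0 - all_spurious
    # Add the prior back once for the combined score.
    return combined + prior * (1.0 - combined)
```

Under this scheme, a single channel returns its own score unchanged (for scores at or above the prior). Agreeing channels push the combined score up: two channels at 0.5 combine to roughly 0.74.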
Protein–protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable iden...
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper we present the 11 databases which currently o...
The "Protein Abundances Across Organisms" database (PaxDb) is an integrative meta-resource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDb focuses on computing best-estimate abundances for proteins in normal/healthy contexts, and expresses abundance values for each protein in "parts per million" (ppm) in...
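Parts per million expresses each protein's abundance as a share of the whole proteome, scaled so that the proteome sums to one million. This makes values from different datasets comparable. The sketch below shows only this final scaling step. It assumes a mapping of hypothetical protein IDs to raw abundance estimates (for example, normalized spectral counts) and does not reproduce PaxDb's own data processing.

```python
def to_ppm(abundances: dict[str, float]) -> dict[str, float]:
    """Scale raw per-protein abundances so that they sum to one million (ppm)."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Each protein's share of the total, expressed per million molecules.
    return {protein: value / total * 1_000_000 for protein, value in abundances.items()}
```

For example, two proteins with raw values 3 and 1 receive 750,000 ppm and 250,000 ppm, respectively.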
Background: Protein-protein interactions play essential roles in almost all biological processes. The binding interfaces between interacting proteins impose evolutionary constraints, leading to co-evolutionary signals that have successfully been employed to predict protein interactions from multiple sequence alignments (MSAs). During the constructi...
The "Protein Abundances Across Organisms" database (PaxDB) is an integrative meta-resource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDB focuses on computing best-estimate abundances for proteins in normal/healthy contexts, and expresses abundance values for each protein in "parts per million" (ppm) in...
Motivation
Alternative splicing, as an essential regulatory mechanism in normal mammalian cells, is frequently disturbed in cancer and other diseases. Switches in the expression of the most dominant alternative isoforms can alter protein interaction networks of associated genes, giving rise to disease and disease progression. Here, we present CanIsoNet,...
Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING databas...
The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 53...
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING...
A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existi...
Motivation
Alternative splicing, as an essential regulatory mechanism in normal mammalian cells, is frequently disturbed in cancer. Switches in the expression of alternative isoforms can alter protein interaction networks of associated genes giving rise to cancer progression and metastases. We have recently analyzed the pathogenic impact of switchi...
Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genome assemblies remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC compleme...
Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genomes remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existi...
Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interac...
Under normal conditions, cells of almost all tissue types express the same predominant canonical transcript isoform at each gene locus. In cancer, however, splicing regulation is often disturbed, leading to cancer-specific switches in the most dominant transcripts (MDT). To address the pathogenic impact of these switches, we have analyzed isoform-s...
The identification of orthologs (genes in different species that descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficu...
Background
An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in...
Kinase and phosphatase overexpression drives tumorigenesis and drug resistance. We previously developed a mass-cytometry-based single-cell proteomics approach that enables quantitative assessment of overexpression effects on cell signaling. Here, we applied this approach in a human kinome- and phosphatome-wide study to assess how 649 individually o...
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING da...
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral...
As viruses continue to pose risks to global health, having a better understanding of virus–host protein–protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein–protein interaction database specifically catering to virus–virus and virus–host interactions. This database combines evidence f...
Background: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant i...
As viruses continue to pose risks to global health, having a better understanding of virus-host protein-protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein-protein interaction database specifically catering to virus-virus and virus-host interactions. This database combines evidence f...
Kinase and phosphatase overexpression drives tumorigenesis and drug resistance in many cancer types. Signaling networks reprogrammed by protein overexpression remain largely uncharacterized, hindering discovery of paths to therapeutic intervention. We previously developed a single cell proteomics approach based on mass cytometry that enables quanti...
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g. new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome anno...
A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physi...
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functiona...
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess t...
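The precision-recall trade-off mentioned above can be made concrete by scoring predicted ortholog pairs against a reference set. The sketch below is minimal. It assumes that both the predictions and the reference are given as unordered pairs of hypothetical gene IDs. Real orthology benchmarks rely on curated reference data and more elaborate tests.

```python
def precision_recall(predicted, reference) -> tuple[float, float]:
    """Compare predicted ortholog pairs with reference pairs; pair order is ignored."""
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in reference}
    true_pos = len(pred & ref)  # predicted pairs confirmed by the reference
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall
```

A stricter method typically predicts fewer pairs, which raises precision at the cost of recall. Which balance is acceptable depends on the downstream application.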
Interactions between proteins and small molecules are an integral part of biological processes in living organisms. Information on these interactions is dispersed over many databases, texts and prediction methods, which makes it difficult to get a comprehensive overview of the available evidence. To address this, we have developed STITCH (‘Search T...
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows...
Years of meticulous curation of scientific literature and increasingly reliable computational predictions have resulted in the creation of vast databases of protein interaction data. Over the years, these repositories have become a basic framework in which experiments are analyzed and new directions of research are explored. Here we present an overview...
Protein quantification at proteome-wide scale is an important aim, enabling insights into fundamental cellular biology and serving to constrain experiments and theoretical models. While proteome-wide quantification is not yet fully routine, many datasets approaching proteome-wide coverage are becoming available through biophysical and mass-spectrom...
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms...
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehe...
STITCH is a database of protein-chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the pr...
The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr bp). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude....
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Current...
Gene duplicates generated via retroposition were long thought to be pseudogenized and consequently decayed. However, a significant number of these genes escaped their evolutionary destiny and evolved into functional genes. Despite multiple studies, the number of functional retrogenes in human and other genomes remains unclear. We performed a compar...
Transcription factors (TFs) have long been known to be principally activators of transcription in eukaryotes and prokaryotes. The growing awareness of the ubiquity of microRNAs (miRNAs) as suppressive regulators in eukaryotes suggests the possibility of a mutual, preferential, self-regulatory connectivity between miRNAs and TFs. Here we investigat...
human_miRNA_TF_net_ensg.tdf. Tab-delimited text file containing all links in the human predicted miRNA:TF network. The first column is the start node, the second column is the end node, and the last column is 1 or 2, depending on whether the link is activating (1, transcription factor binding site) or suppressing (2, miRNA binding site). Gene identifiers are ENSEMBL IDs and miR...
mouse_miRNA_TF_net_enmusg.tdf. Tab-delimited text file containing all links in the mouse predicted miRNA:TF network. The first column is the start node, the second column is the end node, and the last column is 1 or 2, depending on whether the link is activating (1, transcription factor binding site) or suppressing (2, miRNA binding site). Gene identifiers are ENSEMBL IDs and m...
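The network files described above can be read with the Python standard library. The sketch below is minimal and rests on the following assumptions:

- each line holds tab-separated fields;
- the start node comes first, the end node second, and the 1/2 link code last;
- any line whose last field is not 1 or 2 (such as a header, if one is present) can be skipped.

The `Link` type and `parse_network` function are hypothetical names.

```python
import csv
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    source: str  # start node (first column)
    target: str  # end node (second column)
    kind: str    # "activating" or "suppressing"


# Link codes as described for the last column of the network files.
LINK_KINDS = {"1": "activating", "2": "suppressing"}


def parse_network(lines: Iterable[str]) -> list[Link]:
    """Parse tab-delimited miRNA:TF network lines into Link records."""
    links = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        code = row[-1].strip()
        if code not in LINK_KINDS:
            continue  # e.g. a header line, if the file has one
        links.append(Link(row[0], row[1], LINK_KINDS[code]))
    return links
```

For example, if you pass an open handle to `human_miRNA_TF_net_ensg.tdf` (opened with `newline=""`), the function should return one `Link` per network edge.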
supplementary_tables.xlsx. This file contains source data, additional graphs, and calculated values for this work in 15 Excel sheets: Text-mining data, text mining by eggNOG categories, text mining by GO categories, TarBase scoring system, TarBase data scored, TarBase score vs. TF enrichment, TarBase by eggNOG categories, TarBase by GO categories, p...
We used high-sensitivity, high-resolution tandem mass spectrometry to shotgun sequence ancient protein remains extracted from a 43,000-year-old woolly mammoth (Mammuthus primigenius) bone preserved in the Siberian permafrost. For the first time, 126 unique protein accessions, mostly low-abundance extracellular matrix and plasma proteins, were con...
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assig...
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increas...
The tip of a projectile point made of mastodon bone is embedded in a rib of a single disarticulated mastodon at the Manis site in the state of Washington. Radiocarbon dating and DNA analysis show that the rib is associated with the other remains and dates to 13,800 years ago. Thus, osseous projectile points, common to the Beringian Upper Paleolithi...
There is a growing recognition of the importance of protein kinases in the control of alternative splicing. To define the underlying regulatory mechanisms, highly selective inhibitors are needed. Here, we report the discovery and characterization of the dichloroindolyl enaminonitrile KH-CB19, a potent and highly specific inhibitor of the CDC2-like...
The covalent attachment of ubiquitin to proteins regulates numerous processes in eukaryotic cells. Here we report the identification of 753 unique lysine ubiquitylation sites on 471 proteins using higher-energy collisional dissociation on the LTQ Orbitrap Velos. In total 5756 putative ubiquitin substrates were identified. Lysine residues targeted b...
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public effo...
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete ge...
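The reciprocal-best-hit and triangular-linkage procedure named in the eggNOG abstract above can be illustrated with a simplified sketch in the spirit of the COG approach. It works in two stages:

1. Find reciprocal best hits between genes in different genomes.
2. Find triangles of such hits spanning three genomes, then merge triangles that share a side into one group.

The sketch assumes precomputed pairwise similarity scores (for example, BLAST bit scores) keyed by (query, subject). It also assumes a hypothetical `genome:gene` ID convention. It is not eggNOG's actual pipeline.

```python
from collections import defaultdict
from itertools import combinations


def genome_of(gene: str) -> str:
    # Hypothetical ID convention: "genome:gene", e.g. "A:1".
    return gene.split(":", 1)[0]


def reciprocal_best_hits(scores: dict[tuple[str, str], float]) -> set[frozenset[str]]:
    """Return gene pairs that are each other's best hit in the other's genome."""
    best: dict[tuple[str, str], tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if genome_of(query) == genome_of(subject):
            continue  # ignore within-genome hits
        key = (query, genome_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, genome_of(query)))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def triangle_clusters(pairs: set[frozenset[str]]) -> list[set[str]]:
    """Group genes via triangles of reciprocal best hits that share a side."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    found = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            if len({genome_of(a), genome_of(b), genome_of(c)}) == 3:
                found.add(frozenset((a, b, c)))  # triangle spanning three genomes
    triangles = list(found)

    # Union-find over triangles; two triangles are linked when they share an edge.
    parent = list(range(len(triangles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_edge = defaultdict(list)
    for idx, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            by_edge[edge].append(idx)
    for members in by_edge.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(set)
    for idx, tri in enumerate(triangles):
        clusters[find(idx)] |= tri
    return list(clusters.values())
```

Gene pairs that are reciprocal best hits but never form a triangle (for example, genes present in only two genomes) remain unclustered in this simplified version.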
In recent years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the...