Damian Szklarczyk
Swiss Institute of Bioinformatics · Institute of Molecular and Life Sciences, University of Zurich

PhD

Publications: 69
Reads: 50,361
Citations: 62,236
Additional affiliations
November 2012 - March 2013
University of Copenhagen
Position
  • PostDoc Position
April 2009 - October 2012
University of Copenhagen
Position
  • PhD Student
October 2008 - April 2009
European Molecular Biology Laboratory
Position
  • Research Assistant

Publications (69)
Preprint
Full-text available
Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but they have proven challenging to use in machine learning, especially in a cross-species setting. To address this, we leveraged the STRING dat...
Article
Proteins cooperate, regulate and bind each other to achieve their functions. Understanding the complex network of their interactions is essential for a systems-level description of cellular processes. The STRING database compiles, scores and integrates protein–protein association information drawn from experimental assays, computational predictions...
Article
Full-text available
Protein–protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable iden...
Article
Full-text available
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper we present the 11 databases which currently o...
Article
The "Protein Abundances Across Organisms" database (PaxDb) is an integrative meta-resource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDb focuses on computing best-estimate abundances for proteins in normal/healthy contexts, and expresses abundance values for each protein in "parts per million" (ppm) in...
Preprint
Full-text available
Background: Protein-protein interactions play essential roles in almost all biological processes. The binding interfaces between interacting proteins impose evolutionary constraints, leading to co-evolutionary signals that have successfully been employed to predict protein interactions from multiple sequence alignments (MSAs). During the constructi...
Preprint
Full-text available
The "Protein Abundances Across Organisms" database (PaxDB) is an integrative meta-resource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDB focuses on computing best-estimate abundances for proteins in normal/healthy contexts, and expresses abundance values for each protein in "parts per million" (ppm) in...
Article
Full-text available
Motivation: Alternative splicing, as an essential regulatory mechanism in normal mammalian cells, is frequently disturbed in cancer and other diseases. Switches in the expression of most dominant alternative isoforms can alter protein interaction networks of associated genes giving rise to disease and disease progression. Here, we present CanIsoNet,...
Article
Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING databas...
Article
Full-text available
The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 53...
Article
Full-text available
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING...
Article
Full-text available
A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existi...
Preprint
Full-text available
Motivation: Alternative splicing, as an essential regulatory mechanism in normal mammalian cells, is frequently disturbed in cancer. Switches in the expression of alternative isoforms can alter protein interaction networks of associated genes giving rise to cancer progression and metastases. We have recently analyzed the pathogenic impact of switchi...
Article
Full-text available
Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genome assemblies remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC compleme...
Preprint
Full-text available
Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genomes remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existi...
Article
Full-text available
Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interac...
Article
Full-text available
Under normal conditions, cells of almost all tissue types express the same predominant canonical transcript isoform at each gene locus. In cancer, however, splicing regulation is often disturbed, leading to cancer-specific switches in the most dominant transcripts (MDT). To address the pathogenic impact of these switches, we have analyzed isoform-s...
Article
Full-text available
The identification of orthologs (genes in different species which descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficu...
Article
Full-text available
Background: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in...
Article
Full-text available
Kinase and phosphatase overexpression drives tumorigenesis and drug resistance. We previously developed a mass-cytometry-based single-cell proteomics approach that enables quantitative assessment of overexpression effects on cell signaling. Here, we applied this approach in a human kinome- and phosphatome-wide study to assess how 649 individually o...
Article
Full-text available
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING da...
Article
Full-text available
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral...
Article
Full-text available
As viruses continue to pose risks to global health, having a better understanding of virus–host protein–protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein–protein interaction database specifically catering to virus–virus and virus–host interactions. This database combines evidence f...
Preprint
Full-text available
Background: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant i...
Preprint
As viruses continue to pose risks to global health, having a better understanding of virus-host protein-protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein-protein interaction database specifically catering to virus-virus and virus-host interactions. This database combines evidence f...
Preprint
Full-text available
Kinase and phosphatase overexpression drives tumorigenesis and drug resistance in many cancer types. Signaling networks reprogrammed by protein overexpression remain largely uncharacterized, hindering discovery of paths to therapeutic intervention. We previously developed a single cell proteomics approach based on mass cytometry that enables quanti...
Article
Full-text available
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g. new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome anno...
Article
Full-text available
A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physi...
Preprint
Full-text available
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functiona...
Article
Full-text available
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess t...
Article
Full-text available
Interactions between proteins and small molecules are an integral part of biological processes in living organisms. Information on these interactions is dispersed over many databases, texts and prediction methods, which makes it difficult to get a comprehensive overview of the available evidence. To address this, we have developed STITCH (‘Search T...
Article
Full-text available
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows...
Article
Years of meticulous curation of scientific literature and increasingly reliable computational predictions have resulted in the creation of vast databases of protein interaction data. Over the years, these repositories have become a basic framework in which experiments are analyzed and new directions of research are explored. Here we present an overview...
Article
Full-text available
Protein quantification at proteome-wide scale is an important aim, enabling insights into fundamental cellular biology and serving to constrain experiments and theoretical models. While proteome-wide quantification is not yet fully routine, many datasets approaching proteome-wide coverage are becoming available through biophysical and mass-spectrom...
Article
Full-text available
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms...
Article
Full-text available
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehe...
Article
Full-text available
STITCH is a database of protein-chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the pr...
Article
Full-text available
The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr bp). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude....
Article
Full-text available
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made—particularly for certain model organisms and functional systems. Current...
Article
Full-text available
Gene duplicates generated via retroposition were long thought to be pseudogenized and consequently decayed. However, a significant number of these genes escaped their evolutionary destiny and evolved into functional genes. Despite multiple studies, the number of functional retrogenes in human and other genomes remains unclear. We performed a compar...
Article
Full-text available
Transcription factors (TFs) have long been known to be principally activators of transcription in eukaryotes and prokaryotes. The growing awareness of the ubiquity of microRNAs (miRNAs) as suppressive regulators in eukaryotes, suggests the possibility of a mutual, preferential, self-regulatory connectivity between miRNAs and TFs. Here we investigat...
Data
human_miRNA_TF_net_ensg.tdf. Tab-delimited text file containing all links in the human predicted miRNA:TF network. First column is start node, second column is end node, last column is 1 or 2 depending on whether the link is activating (1 transcription factor binding site) or suppressing (2 miRNA binding site). Gene identifiers are ENSEMBL IDs and miR...
Data
mouse_miRNA_TF_net_enmusg.tdf. Tab-delimited text file containing all links in the mouse predicted miRNA:TF network. First column is start node, second column is end node, last column is 1 or 2 depending on whether the link is activating (1 transcription factor binding site) or suppressing (2 miRNA binding site). Gene identifiers are ENSEMBL IDs and m...
Data
supplementary_tables.xlsx. This file contains source data, additional graphs, and calculated values for this work in 15 Excel sheets: Text-mining data, text mining by eggNOG categories, text mining by GO categories, TarBase scoring system, TarBase data scored, TarBase score vs. TF enrichment, TarBase by eggNOG categories, TarBase by GO categories, p...
Article
Full-text available
We used high-sensitivity, high-resolution tandem mass spectrometry to shotgun sequence ancient protein remains extracted from a 43 000-year-old woolly mammoth (Mammuthus primigenius) bone preserved in the Siberian permafrost. For the first time, 126 unique protein accessions, mostly low-abundance extracellular matrix and plasma proteins, were con...
Article
Full-text available
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assig...
Article
Full-text available
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increas...
Article
Full-text available
The tip of a projectile point made of mastodon bone is embedded in a rib of a single disarticulated mastodon at the Manis site in the state of Washington. Radiocarbon dating and DNA analysis show that the rib is associated with the other remains and dates to 13,800 years ago. Thus, osseous projectile points, common to the Beringian Upper Paleolithi...
Article
Full-text available
There is a growing recognition of the importance of protein kinases in the control of alternative splicing. To define the underlying regulatory mechanisms, highly selective inhibitors are needed. Here, we report the discovery and characterization of the dichloroindolyl enaminonitrile KH-CB19, a potent and highly specific inhibitor of the CDC2-like...
Article
Full-text available
The covalent attachment of ubiquitin to proteins regulates numerous processes in eukaryotic cells. Here we report the identification of 753 unique lysine ubiquitylation sites on 471 proteins using higher-energy collisional dissociation on the LTQ Orbitrap Velos. In total 5756 putative ubiquitin substrates were identified. Lysine residues targeted b...
Article
Full-text available
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public effo...
Article
Full-text available
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete ge...
Article
Full-text available
Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the...
