Amos Bairoch

University of Geneva | UNIGE · Department of Microbiology and Molecular Medicine (MIMOL)

PhD

About

Publications: 303
Reads: 110,515
Citations: 87,224
Introduction
Amos Bairoch currently works at the Faculty of Medicine of the University of Geneva and at the Swiss Institute of Bioinformatics (SIB). He does research in biocuration. His current projects are neXtProt and the Cellosaurus.
Additional affiliations
April 2001 - present
University of Geneva
Position
  • Professor (Full)
April 1998 - present
Swiss Institute of Bioinformatics
Position
  • Group Leader

Publications (303)
Article
About 10% of human proteins have no annotated function in protein knowledge bases. A workflow to generate hypotheses for the function of these uncharacterized proteins has been developed, based on predicted and experimental information on protein properties, interactions, tissue expression, subcellular localization, conservation in other organism...
Article
The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, creating an international framework for global collaboration, data sharing, quality assurance and enhancing accurate annotation of the genome-encoded proteome. During the subsequent decade, the HPP established collaborations, developed guidelines and metrics,...
Article
Central nervous system (CNS), notably brain, metastases are most prevalent in lung cancer (20–56% of patients), breast cancer (5–20%) and melanoma (7–16%). Lesions occur in both the brain parenchyma and the meninges. To mechanistically understand CNS metastasis formation and develop preventive and therapeutic strategies, it is essential to use mode...
Article
Spread of cancer to the brain remains an unmet clinical need despite the increasing number of cases among patients with, most notably, lung cancer, breast cancer and melanoma. Although research on brain metastasis was considered a minor aspect in the past due to its untreatable nature and invariable lethality, nowadays limited but encouraging examples h...
Article
The Feature-Viewer is a lightweight library for the visualization of biological data mapped to a protein or nucleotide sequence. It is designed for ease of use while allowing full customization. The library is already used by several biological data resources and allows intuitive visual mapping of a full spectrum of sequence features for diffe...
Article
The neXtProt knowledgebase (https://www.nextprot.org) is an integrative resource providing both data on human proteins and the tools to explore them. In order to provide comprehensive and up-to-date data, we evaluate and add new data sets. We describe the incorporation of three new data sets that provide expression, function, protein-protein binary...
Article
The ABCD (for AntiBodies Chemically Defined) database is a repository of sequenced antibodies, integrating curated information about the antibody and its antigen with cross-links to standardized databases of chemical and protein entities. It is freely available to the academic community, accessible through the ExPASy server (https://web.expasy.org/...
Article
Despite an increased awareness of the problem of cell line cross-contamination and misidentification, it remains a major source of erroneous experimental results in biomedical research. To prevent it, researchers are expected to frequently test the authenticity of the cell lines they are working on. STR profiling was selected as the in...
Preprint
This micro-review is intended to provide an up-to-date list of the cell lines that have flown in space, starting with HeLa in 1960. We coin a new term, cellonaut, for cell lines that have been sent into Earth orbit. As of June 2019, we were able to identify 52 different cellonauts. Cell lines that are cellonauts are annotated as such in the Cellosaur...
Article
Research in toxicology relies on in vitro models such as cell lines. These living models are prone to change and may be described in publications with insufficient information or quality control testing. This article sets out recommendations to improve the reliability of cell-based research.
Article
The use of misidentified and contaminated cell lines continues to be a problem in biomedical research. Research Resource Identifiers (RRIDs) should reduce the prevalence of misidentified and contaminated cell lines in the literature by alerting researchers to cell lines that are on the list of problematic cell lines, which is maintained by the Inte...
Data
Data on number of misidentified cell lines per year.
Data
Curator-SciScore-disagreement - the false negatives (33 papers) found by the curator.
Data
Data on problematic cell lines for all journals.
Data
List of problematic cell lines extracted from Cellosaurus Version 25 (March 2018).
Article
20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them are still lacking functional annotation, either predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/protei...
Article
The practice of data sharing in the proteomics field took off and quickly spread in recent years as a result of collective effort. Nowadays, most journal editors mandate the submission of the original raw mass spectra to one of the member databases of the ProteomeXchange consortium. With the exception of large institutional initiatives such as Pept...
Article
Background: Germline pathogenic variants in the breast cancer type 1 susceptibility gene BRCA1 are associated with a 60% lifetime risk for breast and ovarian cancer. This overall risk estimate is for all BRCA1 variants; obviously, not all variants confer the same risk of developing a disease. In cancer patients, loss of BRCA1 function in tumor tissu...
Article
The Cellosaurus is a knowledge resource on cell lines. It aims to describe all cell lines used in biomedical research. Its scope encompasses both vertebrates and invertebrates. Currently, information for >100,000 cell lines is provided. For each cell line, it provides a wealth of information, cross-references, and literature citations. The Cellosau...
Article
Protein kinases are a large family of enzymes catalyzing protein phosphorylation. The human genome contains 518 protein kinase genes, 478 of which belong to the classical protein kinase family and 40 are atypical protein kinases [...].
Article
Unambiguous cell line authentication is essential to avoid loss of association between data and cells. The risk of losing references increases with the speed at which new human pluripotent stem cell (hPSC) lines are generated, exchanged, and implemented. Ideally, a single name should be used as a generally applied reference for each cell line to a...
Article
Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour-intensive. Although text mining is gaining impetus among curators, its integration into curation workflows has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Minin...
Article
Voltage-gated sodium channels are pore-forming transmembrane proteins that selectively allow sodium ions to flow across the plasma membrane according to the electrochemical gradient, thus mediating the rising phase of action potentials in excitable cells and playing key roles in physiological processes such as neurotransmission, skeletal muscle con...
Article
The neXtProt human protein knowledgebase (https://www.nextprot.org) continues to add new content and tools, with a focus on proteomics and genetic variation data. neXtProt now has proteomics data for over 85% of the human proteins, as well as new tools tailored to the proteomics community. Moreover, the neXtProt release 2016-08-25 includes over 800...
Article
Within the C-HPP, the French and Swiss teams are responsible for the annotation of proteins from chromosomes 14 and 2, respectively. neXtProt currently reports 1231 entries on chromosome 2 and 624 entries on chromosome 14; of these, 134 and 93 entries are still not experimentally validated and are thus considered as “missing proteins” (PE2-4), resp...
Data
The Cellosaurus is a knowledge resource on cell lines. It attempts to describe all cell lines used in biomedical research. - Immortalized cell lines - Naturally immortal cell lines (e.g. stem cell lines) - Finite life cell lines when those are distributed and used widely - Vertebrate cell lines with an emphasis on human, mouse and rat cell lines - In...
Article
Ion channels are transmembrane proteins that selectively allow ions to flow across the plasma membrane and play key roles in diverse biological processes. A multitude of diseases, called channelopathies, such as epilepsies, muscle paralysis, pain syndromes, cardiac arrhythmias or hypoglycemia are due to ion channel mutations. A wide corpus of liter...
Article
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of S...
Data
INTRODUCTION The Cellosaurus is a thesaurus of cell lines. It attempts to list all cell lines used in biomedical research. Its scope includes: - Immortalized cell lines - Naturally immortal cell lines (example: stem cell lines) - Finite life cell lines when those are distributed and used widely - Vertebrate cell lines with an emphasis on human, mouse...
Article
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this context, question-answering (QA) engines are designed to display answers, which were automatically extracted from th...
Article
During 11–12 August 2014, a Protein Bioinformatics and Community Resources Retreat was held at the Wellcome Trust Genome Campus in Hinxton, UK. This meeting brought together the principal investigators of several specialized protein resources (such as CAZy, TCDB and MEROPS) as well as those from protein databases from the large Bioinformatics centr...
Data
The Cellosaurus is a thesaurus of cell lines. It attempts to list all cell lines used in biomedical research. Its scope includes: - Immortalized cell lines - Naturally immortal cell lines (e.g. stem cell lines) - Finite life cell lines when those are distributed and used widely - Vertebrate cell lines with an emphasis on human, mouse and rat cell lines -...
Article
The high-throughput characterization of protein N-termini is an emerging challenge in the proteomics and proteogenomics fields. The present study describes the free N-terminome analysis of human mitochondria-enriched samples using trimethoxyphenyl phosphonium (TMPP) labeling approaches. Owing to the extent of protein import and cleavage fo...
Article
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, thus increasing the importance of good access to these data, which range from the supplemental material of individual papers all the way to major reference databases with professional staff and long-term funding. Specialist protein...
Article
neXtProt (http://www.nextprot.org) is a human protein-centric knowledgebase developed at the SIB Swiss Institute of Bioinformatics. Focused solely on human proteins, neXtProt aims to provide a state-of-the-art resource for the representation of human biology by capturing a wide range of data, precise annotations, fully traceable data provenance and...
Article
Mammalian mitochondria may contain up to 1,500 different proteins, and many of them have neither been confidently identified nor characterized. In this study, we demonstrate that C11orf83, which was lacking experimental characterization, is a mitochondrial inner membrane protein facing the intermembrane space. This protein is specifically associate...
Article
neXtProt provides a comprehensive knowledgebase on human proteins complemented by an extensive cross-incorporation of annotations from many databases. With the diversity of published data, provenance information becomes critical to providing reliable and trustworthy services to scientists; thus the tracking of provenance in open, decentralized syst...
Article
Deoxyribose-phosphate aldolase (EC 4.1.2.4), which converts 2-deoxy-D-ribose-5-phosphate into glyceraldehyde-3-phosphate and acetaldehyde, belongs to the core metabolism of living organisms. It was previously shown that human cells harbor deoxyribose phosphate aldolase activity but the protein responsible for this activity has never been formally id...
Article
Understanding how genetic differences between individuals impact the regulation, expression, and ultimately function of proteins is an important step toward realizing the promise of personal medicine. There are several technical barriers hindering the transition of biological knowledge into the applications relevant to precision medicine. One impor...
Article
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL, STRING, etc., which are all accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This art...
Article
Data on enzyme activities and kinetics have often been reported with insufficient experimental detail to allow their repetition. This paper discusses the objectives and recommendations of the Standards for Reporting Enzyme Data (STRENDA) project to define minimal experimental standards for reporting enzyme functional data.
Data
The Cellosaurus is a thesaurus of cell lines. It attempts to list all cell lines used in biomedical research. Its scope includes: - Immortalized cell lines - Naturally immortal cell lines (e.g. stem cell lines) - Finite life cell lines when those are distributed and used widely - Vertebrate cell lines with an emphasis on human, mouse and rat cell lines...
Article
Vertebrate genomes contain around 20,000 protein-encoding genes, of which a large fraction is still not associated with specific functions. A major task in future genomics will thus be to assign physiological roles to all open reading frames revealed by genome sequencing. Here we show that C2orf62, a highly conserved protein with little homology to...
Article
One year ago the Human Proteome Project (HPP) leadership designated the baseline metrics for the Human Proteome Project to be based on neXtProt with a total of 13,664 proteins validated at protein evidence level 1 (PE1) by mass spectrometry, antibody-capture, Edman sequencing, or 3D structures. Corresponding chromosome-specific data were provided f...
Article
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-...
Article
The methionine salvage pathway is widely distributed among some eubacteria, yeast, plants and animals and recycles the sulfur-containing metabolite 5-methylthioadenosine (MTA) to methionine. In eukaryotic cells, the methionine salvage pathway takes place in the cytosol and usually involves six enzymatic activities: MTA phosphorylase (MTAP, EC 2.4.2...
Article
We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of...
Article
About 5000 (25%) of the ∼20400 human protein-coding genes currently lack any experimental evidence at the protein level. For many others, there is little information on their abundance, distribution, subcellular localization, interactions, or cellular functions. The aim of the HUPO Human Proteome Project (HPP, www.thehpp.org) is to c...
Article
The objective of the international Chromosome-Centric Human Proteome Project (C-HPP) is to map and annotate all proteins encoded by the genes on each human chromosome. The C-HPP consortium was established to organize a collaborative network among the research teams responsible for protein mapping of individual chromosomes and to identify compelling...
Article
neXtProt (http://www.nextprot.org/) is a new human protein-centric knowledge platform. Developed at the Swiss Institute of Bioinformatics (SIB), it aims to help researchers answer questions relevant to human proteins. To achieve this goal, neXtProt is built on a corpus containing both curated knowledge originating from the UniProtKB/Swiss-Prot know...
Article
UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the ba...
Article
After the successful completion of the Human Genome Project, the Human Proteome Organization has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the lack of protein-level evidence for about 30% of the estimated 20,300 protein-coding genes, a systematic global effort wi...
Conference Paper
In the human proteome, about 5,000 proteins lack experimentally validated functional information. In this work we propose to tackle the problem of human protein function prediction by three distinct supervised learning schemes: one-versus-all classification, tournament learning and multi-label learning. Target values of supervised learning models are...
Article
After successful completion of the Human Genome Project (HGP), HUPO has recently officially launched a global Human Proteome Project (HPP) which is designed to map the entire human protein set. Given that about 30% of the 20,300 protein gene products remain undisclosed, a systematic global effort is necessary to achieve this goal with...