About
367
Publications
106,396
Reads
91,857
Citations
Publications (367)
With the evermore emphasis put on open science and its invaluable benefits to the scientific community, it is no longer the case where a research project simply ends with a scientific publication. The benefits of data sharing and reproducibility of results have taken the centerpiece within the life science research supported by FAIR principles that...
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing reso...
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic will be remembered as one of the defining events of the 21st century. The rapid global outbreak has had significant impacts on human society and is already responsible for millions of deaths. Understanding and tackling the impact of the virus has required a worldwide mobilisa...
The General Data Protection Regulation (GDPR) became binding law in the European Union Member States in 2018, as a step toward harmonizing personal data protection legislation in the European Union. The Regulation governs almost all types of personal data processing, hence, also, those pertaining to biomedical research. The purpose of this article...
The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effect...
Data resources at the European Bioinformatics Institute (EMBL-EBI, https://www.ebi.ac.uk/) archive, organize and provide added-value analysis of research data produced around the world. This year's update for EMBL-EBI focuses on data exchanges among resources, both within the institute and with a wider global infrastructure. Within EMBL-EBI, data r...
Technological advances have continuously driven the generation of bio-molecular data and the development of bioinformatics infrastructure, which enables data reuse for scientific discovery. Several types of data management resources have arisen, such as data deposition databases, added-value databases or knowledgebases, and biology-driven portals....
The European Bioinformatics Institute (https://www.ebi.ac.uk/) archives, curates and analyses life sciences data produced by researchers throughout the world, and makes these data available for re-use globally (https://www.ebi.ac.uk/). Data volumes continue to grow exponentially: total raw storage capacity now exceeds 160 petabytes, and we manage t...
The European Bioinformatics Institute (EMBL-EBI) supports life-science research throughout the world by providing open data, open-source software and analytical tools, and technical infrastructure (https://www.ebi.ac.uk). We accommodate an increasingly diverse range of data types and integrate them, so that biologists in all disciplines can explore...
The core mission of ELIXIR is to build a stable and sustainable infrastructure for biological information across Europe. At the heart of this are the data resources, tools and services that ELIXIR offers to the life-sciences community, providing stable and sustainable access to biological data. ELIXIR aims to ensure that these resources are availab...
Integr8 (http://www.ebi.ac.uk/integr8/) has been developed to provide an integration layer for the exploitation of genomic and proteomic data. High-quality databases from major bioinformatics centres in Europe are included, and some core data and the relationships of biological entities to each other and to entries in other databases are stored. Th...
New technologies are revolutionising biological research and its applications by making it easier and cheaper to generate ever-greater volumes and types of data. In response, the services and infrastructure of the European Bioinformatics Institute (EMBL-EBI, www.ebi.ac.uk) are continually expanding: total disk capacity increases significantly every...
The Cardiovascular Gene Annotation Initiative is focused on the submission of functional gene annotation data to major public biological databases. We are creating Gene Ontology annotations, as well as capturing protein-protein interaction data.
Gene Ontology (GO) is a key resource for researchers wishing to understand the biological role of a gen...
UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists i...
Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). Here we describe collaboration between the renal biomedical research community and the GO Consortium to improve the quality and qua...
There is a growing trend toward public dissemination of proteomics data, which is facilitating the assessment, reuse, comparative analyses and extraction of new findings from published data [1, 2]. This process has been mainly driven by journal publication guidelines and funding agencies. However, there is a need for better integration of public repos...
Molecular Biology has been at the heart of the ‘big data’ revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff’s ‘Atlas of Protein Sequence and Structure’ through the Human Genome Project in the late 1990s and early 2000s to today’s population-scale sequencing...
The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequences and functional annotation. It integrates, interprets and standardizes data from literature and numerous resources to achieve the most comprehen...
Mitochondria are a common energy source for organs and organisms; their diverse functions are specialized according to the unique phenotypes of their hosting environment. Perturbation of mitochondrial homeostasis accompanies significant pathological phenotypes. However, the connections between mitochondrial proteome properties and function remain t...
Rationale:
Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts.
Obj...
Transcriptional control ensures genes are expressed in the right amounts at the correct times and locations. Understanding quantitatively how regulatory systems convert input signals to appropriate outputs remains a challenge. For the first time, we successfully model even skipped (eve) stripes 2 and 3+7 across the entire fly embryo at cellular res...
Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Taken largely for granted, similar mature resources such as UniProt are not available...
Supplementary file containing the following. Figure S1: The structural hierarchy of lipid identifications measurable by mass spectrometry. Figure S2: Database schema of the LipidHome database. Figure S3: The “Category” -> “Main Class” -> “Sub Class” hierarchy of lipids currently stored in the LipidHome database. Figure S4: Diagram that shows how li...
The Gene Ontology (GO) is the de facto standard for the functional description of gene products, providing a consistent, information-rich terminology applicable across species and information repositories. The UniProt Consortium uses both manual and automatic GO annotation approaches to curate UniProt Knowledgebase (UniProtKB) entries. The selectio...
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the...
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation...
The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is h...
The community working on model organisms is growing steadily and the number of model organisms for which proteome data are being generated is continuously increasing. To standardize efforts and to make optimal use of proteomics data acquired from model organisms, a new Human Proteome Organisation (HUPO) initiative on model organism proteomes (iMOP)...
Biological process GO-Elite MAPPFinder Results.
(DOC)
We constructed the Cardiac Organellar Peptide Atlas Library (COPa library) as a targeted and interactive resource to the cardiovascular community. Annotated peptide spectra are hosted using a relational database in a modular fashion based on species (e.g. human, mouse) and organelles (e.g. mitochondria, proteasome). Within this release of COPa libr...
We developed a peptide spectral library search engine for the cardiovascular community. Over 50,000,000 spectra obtained with LTQ-Orbitrap instrument on cardiac mitochondria and proteasome were analyzed, and 108,268 representative spectra were included in this organelle-based library. An improved dot product algorithm, slide dot product, was coded...
The Gene Ontology (GO) resource provides dynamic controlled vocabularies to provide an information-rich resource to aid in the consistent description of the functional attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). System-focused projects, such as the Renal and Cardiovascular GO Annotation In...
The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidenced-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360 000 taxa, this resource has increased 2-fol...
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to de...
The International Protein Index (IPI) database has been one of the most widely used protein databases in MS proteomics approaches. Recently, the closure of IPI in September 2011 was announced. Its recommended replacement is the new UniProt Knowledgebase (UniProtKB) "complete proteome" sets, launched in May 2011. Here, we analyze the consequences of...
Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date much effort has been devoted to achieving the highest possible coverage of proteomes with the aim to inform future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in p...
Mitochondrial functions are dynamically regulated in the heart. In particular, protein phosphorylation has been shown to be a key mechanism modulating mitochondrial function in diverse cardiovascular phenotypes. However, site-specific phosphorylation information remains scarce for this organ. Accordingly, we performed a comprehensive characterizati...
The Gene Ontology (GO) is a controlled vocabulary that represents knowledge about the functional attributes of gene products in a structured manner and can be used in both computational and human analyses. This vocabulary has been used by diverse curation groups to associate functional information to individual gene products in the form of annotati...
The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniP...
Molecular Biology research projects produced vast amounts of data, part of which has been preserved in a variety of public databases. However, a large portion of the data contains a significant number of errors and therefore requires careful verification by curators, a painful and costly task, before being reliable enough to derive valid conclusion...
One of the essential requirements of the proteomics community is a high quality annotated nonredundant protein sequence database with stable identifiers and an archival service to enable protein identification and characterization. The scope of this chapter is to illustrate how Universal Protein Resource (UniProt) (The UniProt Consortium, Nucleic A...
Putting data into the public domain is not the same thing as making those data accessible for intelligent analysis. A distinguished group of editors and experts who were already engaged in one way or another with the issues inherent in making research data public came together with statisticians to initiate a dialogue about policies and practicalit...
The rate at which data is acquired frequently outstrips the capacity of the human mind to house it. Instead, we mine it. The ability to electronically cull the majority of mankind's knowledge of the functioning of a particular biomolecule at the push of a button would be an acutely effective, efficient research tool. Consider the benefits of crossi...
The "4D Biology Workshop for Health and Disease", held on 16-17th of March 2010 in Brussels, aimed at finding the best organising principles for large-scale proteomics, interactomics and structural genomics/biology initiatives, and setting the vision for future high-throughput research and large-scale data gathering in biological and medical scienc...
Clinical proteomics has yielded some early positive results (the identification of potential disease biomarkers), indicating the promise for this analytical approach to improve the current state of the art in clinical practice. However, the inability to verify some candidate molecules in subsequent studies has led to skepticism among many clinicians a...
Background / Purpose:
The UniProt Knowledgebase (UniProtKB) is the central access point for extensively curated protein information. UniProtKB is a protein-centric, non-redundant database aiming to provide everything that is known about a protein. UniProtKB provides an integrated and uniform presentation of disparate data, including annotations s...
Introduction; Sources of data in bioinformatics knowledge bases; Design of knowledge bases; Implementation of knowledge bases; Updating of knowledge bases; Conclusions; References
Recent studies on cardiovascular progenitors have led to a new appreciation that paracrine factors may support the regeneration of damaged tissues.
We used a shotgun proteomics strategy to compare the secretome of peripheral blood-derived smooth muscle progenitors (SPCs) with human aortic smooth muscle cells. The late-outgrowth SPCs produced fewer...
The Gene Ontology (GO) resource provides dynamic controlled vocabularies to aid in the description of the functional attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). A renal-focused curation initiative, funded by Kidney Research UK and supported by the GO Consortium, has started at the European...
Unique substrings in genomes may indicate a high level of specificity, which is crucial and fundamental to many genetics studies, such as PCR, microarray hybridization, Southern and Northern blotting, RNA interference (RNAi), and genome (re)sequencing. However, being a unique sequence in the genome alone is not adequate to guarantee high specificity. For examp...
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists...
The literature on protein function prediction is currently dominated by works aimed at maximizing predictive accuracy, ignoring the important issues of validation and interpretation of discovered knowledge, which can lead to new insights and hypotheses that are biologically meaningful and advance the understanding of protein functions by biologists...
Protein variants, which vary in their exact chemical composition, are termed protein species regardless of whether they are coded by one gene, by a paralogous or orthologous gene, or by alleles of that gene. The term protein species covers splicing variants, truncated proteins and posttranslational modified proteins and is defined chemically in contrast t...
The Gene Ontology (GO) Consortium (http://www.geneontology.org) (GOC) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with...
Recent developments in proteomics technology offer new opportunities for clinical applications in hospital or specialized laboratories including the identification of novel biomarkers, monitoring of disease, detecting adverse effects of drugs, and environmental hazards. Advanced spectrometry technologies and the development of new protein array for...
Our knowledge of proteins has greatly improved in recent years, driven by new technologies in the fields of molecular biology and proteome research. It has become clear that from a single gene not only one single gene product but many different ones - termed protein species - are generated, all of which may be associated with different functions. N...
Rapid release of prepublication data has served the field of genomics well. Attendees at a workshop in Toronto recommend extending the practice to other biological data sets.
QuickGO is a web-based tool that allows easy browsing of the Gene Ontology (GO) and all associated electronic and manual GO annotations provided by the GO Consortium annotation groups. QuickGO has been a popular GO browser for many years, but after a recent redevelopment it is now able to offer a greater range of facilities including bulk downloads...
The Gene Ontology (GO) has proven to be a valuable resource for functional annotation of gene products. At well over 27 000 terms, the descriptiveness of GO has increased rapidly in line with the biological data it represents. Therefore, it is vital to be able to easily and quickly mine the functional information that has been made available throug...
Much of modern biological research can be organised under unifying concepts such as ‘Network Biology’ or ‘Systems Biology’. These provide frameworks for discussion and evaluation, which is particularly necessary given the large number of interconnected components being measured in the genomic era. Conversely, they embody simplifications and assumpt...
Annotations of enzyme function provide critical starting points for generating and testing biological hypotheses, but the quality of functional annotations is hindered by uncertain assignments for uncharacterized sequences and by the relative sparseness of validated experimental data. Given the relentless increase in genomic data, new thinking and...
In proteomics, rapid developments in instrumentation led to the acquisition of increasingly large data sets. Correspondingly, ProDaC was founded in 2006 as a Coordination Action project within the 6th European Union Framework Programme to support data sharing and community-wide data collection. The objectives of ProDaC were the development of docum...
There is a strong demand in the genomic community to develop effective algorithms to reliably identify genomic variants. Indel detection using next-gen data is difficult and identification of long structural variations is extremely challenging.
We present Pindel, a pattern growth approach, to detect breakpoints of large deletions and medium-sized i...
The Proteomics Data Collection (ProDaC) consortium, a "Coordination Action" funded by the 6th EU Framework Programme, started in October 2006. Its aim was to facilitate the collection and distribution of proteomics data and the public availability of data sets from proteomics experiments. Within the consortium standard formats are created and tools...
Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the Nat...
The Gene Ontology (GO) is a well-established, structured vocabulary that has been successfully used for 10 years in the annotation of proteins. GO terms, created in consultation with the biology community, are used to replace the multiple nomenclatures used by scientific databases that can hamper data integration. Currently GO consists of more than...
Gene Ontology (GO) provides a controlled vocabulary, which is used by several groups around the world to provide functional annotation to proteins across a wide range of species (http://www.geneontology.org). The BHF-UCL team is funded by the British Heart Foundation to supply GO annotation specifically for human proteins inv...
The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extensi...
The field of clinical proteomics offers opportunities to identify new disease biomarkers in body fluids, cells and tissues. These biomarkers can be used in clinical applications for diagnosis, stratification of patients for specific treatment, or therapy monitoring. New protein array formats and improved spectrometry technologies have brought these...
The capacity of proteomics methods and mass spectrometry instrumentation to generate data has grown substantially over the past years. This data volume growth has in turn led to an increased reliance on software to identify peptide or protein sequences from the recorded mass spectra. Diverse algorithms can be applied for the processing of these dat...
ProDaC (Proteomics Data Collection), a "Coordination Action" within the 6th EU Framework Programme, was created to support the collection, distribution and public availability of data from proteomics experiments. Within the consortium, standards are created and maintained, enabling an extensive data collection within the proteomics community. Impor...
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information that is essential for modern biological research. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute, the Protei...
The proteome of human salivary fluid has the potential to open new doors for disease biomarker discovery. A recent study to comprehensively identify and catalog the human ductal salivary proteome led to the compilation of 1166 proteins. The protein complexity of both saliva and plasma is large, suggesting that a comparison of these two proteomes wi...
Until recently, biologists have concentrated on studying specific pathways or individual molecules as an approach to unravelling the intricate details of cellular events. However recent advances in high-throughput proteomic methodologies have made it possible to profile the global compositions of entire tissues, organelles or interactomes at specif...
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately...
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly a...
The “Coordination Action” ProDaC (Proteomics Data Collection) – funded by the EU within the 6th framework programme – was created to support the dissemination, utilization and publication of proteomics data. Within this international consortium, standards are developed and maintained to support extensive data collection by the proteomics community....
Assignment of function to protein sequence is a task of growing importance in the life sciences, as new high-throughput sequencing DNA technologies generate ever increasing quantities of genomic and meta-genomic data. Patterns within the sequence space, caused by the evolutionary conservation and assembly of protein domains, make possible the infer...
In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publica...
Chris F Taylor, Dawn Field, Susanna-Assunta Sansone, Jan Aerts, Rolf Apweiler, Michael Ashburner, Pierre-Alain Binz, Molly Bogue, Tim Booth, Alvis Brazma, Ryan R Brinkman, Catherine A Ball, Eric W Deutsch, Oliver Fiehn, Jennifer Fostel, Peter Ghazal, Frank Gibson, Adam Michael Clark, Graeme Grimes, John M Hancock, Nigel W Hardy, Henning Hermjakob,...
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) provides interactive and programmatic interfaces to query, browse and navigate an ever increasing number of biomedical ontologies and controlled vocabularies. The volume of data available for querying has more than quadrupled since it went into production and OLS functionality has been in...
In proteomics, a paradoxical situation has developed in recent years. On the one hand, it is basic knowledge that proteins are post-translationally modified and occur in different isoforms. On the other hand, the protein expression concept disclaims post-translational modifications by connecting protein names directly with function.
Optimal proteome coverage is...
Myocardial ischemia induces mitochondrial dysfunction and may lead to cardiac cell death. However, our ability to understand mitochondrial dysfunction in ischemia has been hindered by an absence of molecular markers defining the various degrees of injury. We sought to characterize the impact of ischemic damage on mitochondrial proteome biology usin...
Programmatic access to the UniProt Knowledgebase (UniProtKB) is essential for many bioinformatics applications dealing with protein data. We have created a Java library named UniProtJAPI, which facilitates the integration of UniProt data into Java-based software applications. The library supports queries and similarity searches that return UniProtK...
Gene Ontology (GO) vocabularies are an established standard for linking functional information to genes and gene products (www.geneontology.org/). A recent collaboration between University College London and the European Bioinformatics Institute is providing GO annotation to human cardiovascular-associated genes (http://www.ucl.ac.uk/medicine/cardi...