Database: The Journal of Biological Databases and Curation

Publisher: Oxford University Press (Oxford Journals)

Description

  • Impact factor
    4.20
  • 5-year impact
    4.19
  • Cited half-life
    2.20
  • Immediacy index
    0.73
  • Eigenfactor
    0.00
  • Article influence
    1.80
  • Other titles
    Journal of biological databases and curation
  • ISSN
    1758-0463
  • OCLC
    319891682
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Oxford University Press

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 12-month embargo on science, technology and medicine articles
    • 2-year embargo on arts and humanities articles
    • Some titles may have different embargoes
  • Conditions
    • Pre-print can only be posted prior to acceptance
    • Pre-print must be accompanied by set statement (see link)
    • Pre-print must not be replaced with the post-print; instead, a link to the published version with an amended set statement should be made
    • Pre-print on author's personal website, employer website, free public server or pre-prints in subject area
    • Post-print in Institutional repositories or Central repositories
    • Publisher version cannot be used except for Nucleic Acids Research articles
    • Published source must be acknowledged
    • Must link to publisher version
    • Set phrase to accompany archived copy (see policy)
    • Articles in some journals can be made Open Access on payment of additional charge
    • Eligible UK authors may deposit in OpenDepot
    • Publisher will deposit on behalf of NIH-funded authors to PubMed Central; Nucleic Acids Research authors must pay their fee first
    • Some titles may use different policies
  • Classification
    yellow

Publications in this journal

  • ABSTRACT: Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014.
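    As an illustration of the format-based interoperability described above, the sketch below posts a minimal BioC collection to a BioC-compliant annotation Web service and reads back the returned annotations. It is a hedged Python example: the endpoint URL is a placeholder and the response handling is an assumption, not Argo's actual interface.

      import urllib.request
      import xml.etree.ElementTree as ET

      SERVICE_URL = "http://example.org/bioc-ner"  # placeholder endpoint, not a real Argo service

      def build_bioc_collection(doc_id, text):
          """Serialize a single-passage document as a minimal BioC XML collection."""
          collection = ET.Element("collection")
          ET.SubElement(collection, "source").text = "example"
          document = ET.SubElement(collection, "document")
          ET.SubElement(document, "id").text = doc_id
          passage = ET.SubElement(document, "passage")
          ET.SubElement(passage, "offset").text = "0"
          ET.SubElement(passage, "text").text = text
          return ET.tostring(collection, encoding="utf-8")

      def annotate(text):
          """POST the BioC payload and return (annotation text, annotation type) pairs."""
          request = urllib.request.Request(
              SERVICE_URL,
              data=build_bioc_collection("1", text),
              headers={"Content-Type": "application/xml"},
              method="POST",
          )
          with urllib.request.urlopen(request) as response:
              root = ET.fromstring(response.read())
          return [(a.findtext("text", default=""), a.findtext("infon[@key='type']", default="unknown"))
                  for a in root.iter("annotation")]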
  • ABSTRACT: Often papers are published where the underlying data supporting the research are not made available because of the limitations of making such large data sets publicly and permanently accessible. Even if the raw data are deposited in public archives, the essential analysis intermediaries, scripts or software are frequently not made available, meaning the science is not reproducible. The GigaScience journal is attempting to address this issue with the associated data storage and dissemination portal, the GigaScience database (GigaDB). Here we present the current version of GigaDB and reveal plans for the next generation of improvements. However, most importantly, we are soliciting responses from you, the users, to ensure that future developments are focused on the data storage and dissemination issues that still need resolving. Database URL: http://www.gigadb.org.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014:bau018.
  • ABSTRACT: BCL2DB (http://bcl2db.ibcp.fr) is a database designed to integrate data on BCL-2 family members and BH3-only proteins. These proteins control the mitochondrial apoptotic pathway and probably many other cellular processes as well. This large protein group is formed by a family of pro-apoptotic and anti-apoptotic homologs that have phylogenetic relationships with BCL-2, and by a collection of evolutionarily and structurally unrelated proteins characterized by the presence of a region of local sequence similarity with BCL-2, termed the BH3 motif. BCL2DB is rebuilt monthly by an automated procedure relying on a set of in-house profile HMMs computed from seed reference sequences representative of the various BCL-2 homologs and BH3-only proteins. The BCL2DB entries integrate data from the Ensembl, Ensembl Genomes, European Nucleotide Archive and Protein Data Bank databases and are enriched with specific information such as protein classification into orthology groups and the distribution of BH motifs along the sequences. The Web interface allows for easy browsing of the site and fast access to data, as well as sequence analysis with generic and specific tools. BCL2DB provides a helpful and powerful tool to both 'BCL-2-ologists' and researchers working in the various fields of physiopathology. Database URL: http://bcl2db.ibcp.fr.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014:bau013.
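    The profile-HMM step of such an automated build procedure could look like the sketch below, assuming HMMER's hmmbuild and hmmsearch are installed; the file names and the wrapper are illustrative assumptions, not BCL2DB's actual pipeline.

      import subprocess

      def build_profile(seed_alignment, hmm_out):
          """Build a profile HMM from a seed alignment of reference sequences."""
          subprocess.run(["hmmbuild", hmm_out, seed_alignment], check=True)

      def search_profile(hmm_file, sequence_db, table_out):
          """Scan a protein database with the profile; hits are written to a tabular file."""
          subprocess.run(["hmmsearch", "--tblout", table_out, hmm_file, sequence_db], check=True)

      if __name__ == "__main__":
          build_profile("bcl2_seed.sto", "bcl2_family.hmm")  # hypothetical seed alignment
          search_profile("bcl2_family.hmm", "proteome.fasta", "bcl2_hits.tbl")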
  • ABSTRACT: Whereas N-glycosylation is a posttranslational modification performed across evolution, the archaeal version of this protein-processing event presents a degree of diversity not seen in either bacteria or eukarya. Accordingly, archaeal N-glycosylation relies on a large number of enzymes that are often species-specific or restricted to a select group of species. As such, there is a need for an organized platform upon which amassing information about archaeal glycosylation (agl) genes can rest. Accordingly, the aglgenes database provides detailed descriptions of experimentally characterized archaeal N-glycosylation pathway components. For each agl gene, genomic information, supporting literature and relevant external links are provided via a functional, intuitive Web interface designed for data browsing. Routine updates ensure that novel experimental information on genes and proteins contributing to archaeal N-glycosylation is incorporated into aglgenes in a timely manner. As such, aglgenes represents a specialized resource for sharing validated experimental information online, providing support for workers in the field of archaeal protein glycosylation. Database URL: www.bgu.ac.il/aglgenes.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014.
  • ABSTRACT: The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms, using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions of a second to over a minute per article. We present a description of the challenge and a summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/
    Database: The Journal of Biological Databases and Curation 01/2014; 2014.
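    For reference, the precision, recall and balanced F-score figures quoted above can be computed from gold and predicted entity mentions as in the sketch below; the matching key is an illustrative assumption, and the official criteria are those defined by the task organizers.

      def precision_recall_f1(gold, predicted):
          """Return (precision, recall, balanced F-score) for exact set matching."""
          true_positives = len(gold & predicted)
          precision = true_positives / len(predicted) if predicted else 0.0
          recall = true_positives / len(gold) if gold else 0.0
          f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
          return precision, recall, f1

      # Hypothetical mentions keyed by (document id, start offset, end offset, entity type).
      gold = {("PMID1", 0, 4, "Chemical"), ("PMID1", 10, 15, "Gene")}
      pred = {("PMID1", 0, 4, "Chemical"), ("PMID1", 20, 25, "Disease")}
      print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)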
  • ABSTRACT: Emerging infectious diseases remain a significant threat to public health. Most emerging infectious disease agents in humans are of zoonotic origin. Bats are important reservoir hosts of many highly lethal zoonotic viruses and have been implicated in numerous emerging infectious disease events in recent years. It is essential to enhance our knowledge and understanding of the genetic diversity of the bat-associated viruses to prevent future outbreaks. To facilitate further research, we constructed the database of bat-associated viruses (DBatVir). Known viral sequences detected in bat samples were manually collected and curated, along with the related metadata, such as the sampling time, location, bat species and specimen type. Additional information concerning the bats, including common names, diet type, geographic distribution and phylogeny, were integrated into the database to bridge the gap between virologists and zoologists. The database currently covers >4100 bat-associated animal viruses of 23 viral families detected from 196 bat species in 69 countries worldwide. It provides an overview and snapshot of the current research regarding bat-associated viruses, which is essential now that the field is rapidly expanding. With a user-friendly interface and integrated online bioinformatics tools, DBatVir provides a convenient and powerful platform for virologists and zoologists to analyze the virome diversity of bats, as well as for epidemiologists and public health researchers to monitor and track current and future bat-related infectious diseases. Database URL: http://www.mgc.ac.cn/DBatVir/
    Database: The Journal of Biological Databases and Curation 01/2014; 2014:bau021.
  • ABSTRACT: The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014.
  • ABSTRACT: Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between 'total body fat' abnormalities and genes expressed in the 'brain', which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list.
    Database: The Journal of Biological Databases and Curation 01/2014; 2014:bau017.
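    The re-ranking idea described above, in which tissue evidence boosts disease gene candidates, can be sketched as follows; the scores, boost weight and data structures are illustrative assumptions rather than the resource's actual method.

      def rerank(candidates, gene_tissues, phenotype_tissues, boost=0.5):
          """Re-rank candidate genes, boosting those expressed in the affected tissues."""
          rescored = {}
          for gene, score in candidates.items():
              overlap = gene_tissues.get(gene, set()) & phenotype_tissues
              rescored[gene] = score + (boost if overlap else 0.0)
          return sorted(rescored.items(), key=lambda item: item[1], reverse=True)

      # Toy example: tissue evidence lifts GENE_B above the higher-scoring GENE_A.
      candidates = {"GENE_A": 0.8, "GENE_B": 0.7}
      gene_tissues = {"GENE_A": {"liver"}, "GENE_B": {"bone marrow"}}
      print(rerank(candidates, gene_tissues, phenotype_tissues={"bone marrow"}))
      # [('GENE_B', 1.2), ('GENE_A', 0.8)]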