The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*

Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan. .
Journal of Biomedical Semantics (Impact Factor: 2.26). 08/2010; 1(1):8. DOI: 10.1186/2041-1480-1-8
Source: PubMed


Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands for efficient systems without the need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues arisen from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields, however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

Download full-text


Available from: Bruno Aranda,
39 Reads
  • Source
    • "The Biodiversity Data Enrichment Hackathon followed a use case-driven model, i.e. a model where effort before and during the hackathon was prioritised on the basis of compelling end user scenarios that could be enabled by the combined contributions of people that otherwise, outside of the hackathon, do not collaborate. This is an often-followed model for such events: past examples of this include the DBCLS BioHackathons (Katayama et al. 2010, Katayama et al. 2011, Katayama et al. 2013) and the NESCent phyloinformatics hackathons (Lapp et al. 2007, Stoltzfus et al. 2013). The approach is appropriate in cases where participants operate in overlapping domains but do not necessarily collaborate on the same projects or are familiar with the same code bases. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon. Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals that gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain. Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from their silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.
    Biodiversity Data Journal 06/2014; 2(2). DOI:10.3897/BDJ.2.e1125
  • Source
    • "It is used to describe complete experimental workflows and it relies heavily on inheritance and ontologies. Solutions for integrating resources and solving databases interoperability often consist in combining web services [15-18] with data warehousing systems and federated databases, using sophisticated tools like BioMart, MOLGENIS and Taverna Workbench [19]. BioMOBY is an open source ontology-based integration system for accessing distributed and heterogeneous data sources via web services [20]. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Motivation Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research is evolving in to international multi-disciplinary collaborations and increasing data sharing among institutions. Single standardization is not feasible and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobanks management. Results We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata, and may have one or more associated files. We integrated the model in a web based digital repository with a data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples of over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. Conclusions Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata, for information sharing in specific research projects and purposes. This approach can improve sensitively interdisciplinary research collaboration and allows to track patients' clinical records, sample management information, and genomic data. The web interface allows the operators to easily manage, query, and annotate the files, without dealing with the technicalities of the data grid.
    BMC Genomics 05/2014; 15(Suppl 3):S3. DOI:10.1186/1471-2164-15-S3-S3 · 3.99 Impact Factor
  • Source
    • "Nevertheless, certain choices must be made, in a harmonized manner, to maximize interoperability. The yearly BioHackathon series [1-3] of events attempts to provide the environment within which these choices can be explored, evaluated, and then implemented on a collaborative and community-guided basis. These BioHackathons were hosted by the National Bioscience Database Center (NBDC) [4] and the Database Center for Life Science (DBCLS) [5] as a part of the Integrated Database Project to integrate life science databases in Japan. "
    [Show abstract] [Hide abstract]
    ABSTRACT: The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
    Journal of Biomedical Semantics 02/2014; 5(1):5. DOI:10.1186/2041-1480-5-5 · 2.26 Impact Factor
Show more