Calling on a million minds for community annotation in WikiProteins

Erasmus Medical Centre, Department of Medical Informatics, Rotterdam, the Netherlands.
Genome Biology (Impact Factor: 10.81). 02/2008; 9(5):R89. DOI: 10.1186/gb-2008-9-5-r89
Source: PubMed


WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at
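
The two semantic steps mentioned in the abstract, sentence-level co-occurrence and indirect association, can be illustrated with a minimal Python sketch. This is not the WikiProteins implementation: the concept list, the regex-based sentence splitter, and the function names are all assumptions made for illustration, and "indirect" is taken here to mean two concepts that never share a sentence but share at least one co-occurring neighbour.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy vocabulary; a real system would use a curated thesaurus.
CONCEPTS = {"brca1", "tp53", "apoptosis", "breast cancer"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def concepts_in(sentence, concepts):
    """Return the concepts that appear as whole words in the sentence."""
    lowered = sentence.lower()
    return {c for c in concepts
            if re.search(r"\b" + re.escape(c) + r"\b", lowered)}


def cooccurrence_counts(text, concepts):
    """Count how often each pair of concepts shares a sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        found = sorted(concepts_in(sentence, concepts))
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


def indirect_pairs(counts):
    """Pairs that never co-occur directly but share at least one neighbour."""
    neighbours = {}
    for a, b in counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    indirect = set()
    for a, b in combinations(sorted(neighbours), 2):
        if b not in neighbours[a] and neighbours[a] & neighbours[b]:
            indirect.add((a, b))
    return indirect


if __name__ == "__main__":
    sample = ("BRCA1 is linked to breast cancer. "
              "TP53 regulates apoptosis. "
              "BRCA1 interacts with TP53.")
    direct = cooccurrence_counts(sample, CONCEPTS)
    print(dict(direct))
    print(indirect_pairs(direct))
```

Each direct pair is only a candidate factual statement, and each indirect pair only a hypothesis; in the workflow the abstract describes, community annotators would be the ones to confirm or reject them.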

  • "So, in science, the ‘long tail’ is about collecting and connecting these missing links. In recent years, wiki-based annotation platforms, such as WikiProteins (17), WikiPathways (18) and WikiGenes (19), have enjoyed broad community participation. WikiProteins is a semantic web-based (20–22) portal modeled on wiki pages with connected knowlets of >1 million biomedical concepts."
    ABSTRACT: A large repertoire of gene-centric data has been generated in the field of zebrafish biology. Although the bulk of these data are available in the public domain, most of them are not readily accessible or available in nonstandard formats. One major challenge is to unify and integrate these widely scattered data sources. We tested the hypothesis that active community participation could be a viable option to address this challenge. We present here our approach to create standards for assimilation and sharing of information and a system of open standards for database intercommunication. We have attempted to address this challenge by creating a community-centric solution for zebrafish gene annotation. The Zebrafish GenomeWiki is a ‘wiki’-based resource, which aims to provide an altruistic shared environment for collective annotation of the zebrafish genes. The Zebrafish GenomeWiki has features that enable users to comment, annotate, edit and rate this gene-centric information. The credits for contributions can be tracked through a transparent microattribution system. In contrast to other wikis, the Zebrafish GenomeWiki is a ‘structured wiki’ or rather a ‘semantic wiki’. The Zebrafish GenomeWiki implements a semantically linked data structure, which in the future would be amenable to semantic search. Database URL:
    Database: The Journal of Biological Databases and Curation 02/2014; 2014:bau011. DOI:10.1093/database/bau011 · 3.37 Impact Factor
  • "As in other fields, community efforts, such as data annotation and curation, are progressively enabled by the growing support of social information and communication technologies. The technical environments that are available for community annotation, data publishing and integration play an increasingly important role in the life sciences [31-34]. Yet, some factors are still limiting the possible valuable contributions arising from social efforts."
    ABSTRACT: Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of the recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. Especially the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists. 
    We point out as well how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.
    BMC Bioinformatics 01/2014; 15 Suppl 1(Suppl 1):S2. DOI:10.1186/1471-2105-15-S1-S2 · 2.58 Impact Factor
  • "But it was shown that curators working in small, focused groups (like institutes) don’t have the capacity to keep up with the enormous growth of new findings [3]. This calls for a crowdsourced setup: a large, distributed community of scientists that collectively curates on a part-time, volunteer basis [4-7]."
    ABSTRACT: Background: Ideally each Life Science article should get a ‘structured digital abstract’. This is a structured summary of the paper’s findings that is both human-verified and machine-readable. But articles can contain a large variety of information types and contextual details that all need to be reconciled with appropriate names, terms and identifiers, which poses a challenge to any curator. Current approaches mostly use tagging or limited entry-forms for semantic encoding. Findings: We implemented a ‘controlled language’ as a more expressive representation method. We studied how usable this format was for wet-lab-biologists that volunteered as curators. We assessed some issues that arise with the usability of ontologies and other controlled vocabularies, for the encoding of structured information by ‘untrained’ curators. We take a user-oriented viewpoint, and make recommendations that may prove useful for creating a better curation environment: one that can engage a large community of volunteer curators. Conclusions: Entering information in a biocuration environment could improve in expressiveness and user-friendliness, if curators would be enabled to use synonymous and polysemous terms literally, whereby each term stays linked to an identifier.
    BMC Research Notes 10/2012; 5(1):601. DOI:10.1186/1756-0500-5-601