Sustaining the Data and Bioresource Commons
Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3EG, UK.
Science
10/2010; 330(6004):592-3. DOI: 10.1126/science.1191506
Globalization of biomedical research requires sustained investment for databases and biorepositories.
Available from: Mariane S. Sousa-Baena
- "Documenting biological diversity requires open exchange of data and tools across disciplines and national borders (Mittermeier et al. 1997). The traditional paradigm of sharing scientific data and results only through publications in books and specialized journals is therefore not sufficient (Schofield et al. 2010). Lack of taxonomic knowledge is frequently cited as an impediment to biodiversity studies and to defining conservation plans (Wheeler et al. 2004b)."
ABSTRACT: Scientists from megadiverse countries, such as Brazil, face huge challenges in gathering and analyzing information about species richness and abundance. In Brazil, speciesLink is an e-infrastructure that offers free and open access to data from more than 300 biological collections. SpeciesLink's thematic network INCT-Virtual Herbarium of Plants and Fungi and the List of Species of the Brazilian Flora are used as primary data sources to develop Lacunas, an information system with a public web interface that generates detailed reports on the status of plant species occurrence data. Lacunas also integrates information about endemism, conservation status, and collecting effort over time. Here we describe the motivation for and functionality of this system, showing how it can be useful in detecting under-sampled plant species and geographic areas. We give examples of how knowledge can be extracted from primary biodiversity data using Lacunas. For instance, a Lacunas report revealed that 111 angiosperm species (10.3%), currently considered Data Deficient (DD) in the Official List of Threatened Brazilian Flora, have well-characterized distributions. In addition, the situation of Attalea funifera, a native palm classified as DD, was analyzed in detail, together with other use cases. Information presented in Lacunas reports can thus be used by scientists and policy-makers to help evaluate the status of species occurrence data, to prioritize digitization and collecting efforts, and to assess aspects of a species' conservation status. As Lacunas offers a public online interface, it may also become a valuable tool for helping decision-making processes become more dynamic and transparent.
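The gap-detection idea behind a Lacunas-style report can be sketched very simply: count occurrence records per species and flag those below a sampling threshold. The species names, localities, record counts, and threshold below are invented for illustration; the real system works over curated speciesLink data and richer criteria (endemism, conservation status, collecting effort over time).

```python
# Hypothetical sketch of the gap analysis a Lacunas-style report performs.
# All names, records, and the threshold are invented for illustration.
from collections import Counter

def gap_report(occurrences, min_records=20):
    """Flag species whose occurrence data fall below a sampling threshold.

    occurrences: iterable of (species_name, locality) records.
    Returns a dict mapping each under-sampled species to its record count.
    """
    counts = Counter(species for species, _ in occurrences)
    return {sp: n for sp, n in counts.items() if n < min_records}

records = [
    ("Attalea funifera", "Ilheus"),
    ("Attalea funifera", "Valenca"),
    ("Paubrasilia echinata", "Porto Seguro"),
] + [("Paubrasilia echinata", f"site-{i}") for i in range(25)]

print(gap_report(records))  # Attalea funifera is flagged with only 2 records
```

A report like this, aggregated over all collections and cross-referenced with the official flora list, is what lets curators prioritize digitization and collecting efforts.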
Available from: Anne Thessen
- "It is estimated that 80% of scientific output comes from these small providers. Generally called 'small science,' these data are rarely preserved. Scientific publication, a narrative explanation derived from primary data, is often the only lasting record of this work."
ABSTRACT: Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science.
A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from the systematic literature. NLP algorithms have been used for decades, but they require special development for the biological realm because of the distinctive nature of its language. Many tools exist for biological information extraction (of cellular processes, taxonomic names, and morphological characters), but none has been applied life-wide, and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This paper briefly discusses the key steps in applying information-extraction tools to enhance biodiversity science.
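One of the subtasks the review mentions, identification of taxonomic names in text, can be illustrated with a deliberately naive pattern matcher. Real name-finding tools use dictionaries and machine learning and are far more robust; this regex sketch and its example sentence are illustrative assumptions only.

```python
import re

# Naive sketch of taxonomic name recognition: a candidate Latin binomial is a
# capitalized genus word followed by a lowercase epithet of 3+ letters.
# Real tools are dictionary- or ML-based; this pattern is illustrative only.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s([a-z]{3,})\b")

def candidate_names(text):
    """Return candidate genus-species pairs matched by the naive pattern."""
    return [" ".join(m) for m in BINOMIAL.findall(text)]

sentence = ("Specimens of Attalea funifera were compared with "
            "Saccharomyces cerevisiae cultures.")
print(candidate_names(sentence))
# → ['Attalea funifera', 'Saccharomyces cerevisiae']
```

Even this toy version shows why biological language needs special handling: ordinary capitalized phrases ("Specimens of") must be filtered out, and production systems additionally resolve abbreviations, synonyms, and misspellings.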
Available from: Rama Balakrishnan
- "In this era of increased data generation coupled with decreased funding for curation efforts (5, 26), it is critical to develop innovative and efficient strategies for prioritizing annotations for review, in order to maintain the extremely high quality of literature-based annotation sets. We report here the first steps in establishing a procedure for leveraging computational predictions in order to improve literature-based GO annotation consistency and quality. "
ABSTRACT: Annotation using Gene Ontology (GO) terms is one of the most important ways in which biological information about specific gene products can be expressed in a searchable, computable form that may be compared across genomes and organisms. Because literature-based GO annotations are often used to propagate functional predictions between related proteins, their accuracy is critically important. We present a strategy that employs a comparison of literature-based annotations with computational predictions to identify and prioritize genes whose annotations need review. Using this method, we show that comparison of manually assigned ‘unknown’ annotations in the Saccharomyces Genome Database (SGD) with InterPro-based predictions can identify annotations that need to be updated. A survey of literature-based annotations and computational predictions made by the Gene Ontology Annotation (GOA) project at the European Bioinformatics Institute (EBI) across several other databases shows that this comparison strategy could be used to maintain and improve the quality of GO annotations for other organisms besides yeast. The survey also shows that although GOA-assigned predictions are the most comprehensive source of functional information for many genomes, a large proportion of genes in a variety of different organisms entirely lack these predictions but do have manual annotations. This underscores the critical need for manually performed, literature-based curation to provide functional information about genes that are outside the scope of widely used computational methods. Thus, the combination of manual and computational methods is essential to provide the most accurate and complete functional annotation of a genome.
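The core of the comparison strategy the abstract describes, intersecting manually curated 'unknown' annotations with genes that do have computational predictions, can be sketched in a few lines. The gene identifiers, GO terms, and set contents below are invented for illustration and are not drawn from SGD or GOA.

```python
# Hedged sketch of the annotation-review prioritization strategy:
# genes a curator marked 'unknown' but for which a computational pipeline
# (e.g. InterPro-derived GO predictions) has assigned terms are candidates
# for manual review. Gene names and annotations here are invented.

def genes_needing_review(manual_unknown, computational_predictions):
    """Return genes annotated 'unknown' by curators but covered by predictions."""
    return sorted(manual_unknown & set(computational_predictions))

manual_unknown = {"YBR123W", "YDL045C", "YGR210C"}
predictions = {
    "YBR123W": ["GO:0016740"],  # predicted transferase activity
    "YGR210C": ["GO:0005524"],  # predicted ATP binding
    "YML099C": ["GO:0003677"],  # predicted DNA binding
}

print(genes_needing_review(manual_unknown, predictions))
# → ['YBR123W', 'YGR210C']
```

The complementary gap the abstract highlights runs the other way as well: genes present in `manual_unknown` but absent from `predictions` (here the hypothetical YDL045C) are exactly those for which only literature-based curation can supply functional information.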
Database URL: http://www.yeastgenome.org