Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining.

Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
Journal of Cheminformatics (Impact Factor: 3.59). 01/2010; 2(1):4. DOI: 10.1186/1758-2946-2-4
Source: PubMed

ABSTRACT: Previously, we developed a combined dictionary, dubbed Chemlist, for the identification of small molecules and drugs in text. It was based on a number of publicly available databases and tested on an annotated corpus. To achieve acceptable recall and precision, we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships.
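As a rough illustration of the evaluation described in the abstract, the sketch below matches a toy dictionary against a sentence and scores the matches against hand-marked spans. It is not the Chemlist pipeline: the dictionary, the sentence, the gold annotations, and the helper names (`find_terms`, `precision_recall`, `span_of`) are all hypothetical, and no disambiguation rules are applied.

```python
import re


def find_terms(text, dictionary):
    """Return the set of (start, end, term) spans where a dictionary term occurs in text."""
    spans = set()
    for term in dictionary:
        # Match whole tokens only, case-insensitively.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end(), term.lower()))
    return spans


def precision_recall(predicted, gold):
    """Compute precision and recall from sets of predicted and gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def span_of(text, term):
    """Locate the first occurrence of term in text (used to build toy gold spans)."""
    start = text.lower().index(term)
    return (start, start + len(term), term)


# Hypothetical toy data: "lead" is used as a verb here, and "caffeine"
# is missing from the dictionary.
toy_dictionary = {"aspirin", "ethanol", "lead"}
text = "Aspirin was dissolved in ethanol; these results lead to a new assay for caffeine."

predicted = find_terms(text, toy_dictionary)
gold = {span_of(text, t) for t in ("aspirin", "ethanol", "caffeine")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the verb "lead" is a false positive, which is the kind of ambiguity disambiguation rules or curation would target, while "caffeine" is a false negative caused by incomplete dictionary coverage.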

  • ABSTRACT: Advances made in "-Omics" technologies in the last 20 years have made available to researchers huge amounts of data spanning a wide variety of biological processes, from gene sequences to the metabolites present in a cell at a particular time. The management, analysis and representation of these data have been facilitated by means of the advances made by biomedical informatics in areas such as data architecture and integration systems. However, despite the efforts made by biologists in this area, research in drug design adds a new level of information by incorporating data related to small molecules, which increases the complexity of these integration systems. Current knowledge in molecular biology has shown that it is possible to use comprehensive and integrative approaches to understand biological processes from a systems perspective and that pathological processes can be mapped into biological networks. Therefore, current strategies for drug design are focusing on how to interact with or modify those networks to achieve the desired effects, in what is called systems chemical biology. In this review several approaches for data integration in systems chemical biology will be analysed and described. Furthermore, because of the increasing relevance of the development and use of nanomaterials and their expected impact in the near future, the requirements of integration systems that incorporate these new data types associated with nanomaterials will also be analysed.
    Current topics in medicinal chemistry 03/2013; · 4.47 Impact Factor
  • ABSTRACT: In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas. We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.
    Drug discovery today 11/2013; · 6.63 Impact Factor
  • ABSTRACT: Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
    Journal of biomedical semantics. 03/2014; 5(1):15.
