Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining.
ABSTRACT: Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text, based on a number of publicly available databases, and tested it on an annotated corpus. To achieve acceptable recall and precision, we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships.
ABSTRACT: Advances made in "-Omics" technologies over the last 20 years have made available to researchers huge amounts of data spanning a wide variety of biological processes, from gene sequences to the metabolites present in a cell at a particular time. The management, analysis and representation of these data have been facilitated by means of the advances made by biomedical informatics in areas such as data architecture and integration systems. However, despite the efforts made by biologists in this area, research in drug design adds a new level of information by incorporating data related to small molecules, which increases the complexity of these integration systems. Current knowledge in molecular biology has shown that it is possible to use comprehensive and integrative approaches to understand biological processes from a systems perspective and that pathological processes can be mapped onto biological networks. Therefore, current strategies for drug design focus on how to interact with or modify those networks to achieve the desired effects, in what is called systems chemical biology. In this review several approaches for data integration in systems chemical biology will be analysed and described. Furthermore, because of the increasing relevance of the development and use of nanomaterials and their expected impact in the near future, the requirements of integration systems that incorporate these new data types associated with nanomaterials will also be analysed.
Current Topics in Medicinal Chemistry 03/2013; · 4.47 Impact Factor
ABSTRACT: In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas. We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.
Drug Discovery Today 11/2013; · 6.63 Impact Factor
ABSTRACT: Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Journal of Biomedical Semantics 03/2014; 5(1):15.
Hettne et al. Journal of Cheminformatics 2010, 2:4
Automatic vs. manual curation of a multi-source
chemical dictionary: the impact on text mining
© 2010 Hettne et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Kristina M Hettne*1,2, Antony J Williams3, Erik M van Mulligen1, Jos Kleinjans2, Valery Tkachenko3 and Jan A Kors1
In 'Automatic vs. manual curation of a multi-source
chemical dictionary: the impact on text mining' (Hettne
et al. Journal of Cheminformatics 2010, 2:3), the name
of the automatically curated dictionary is identified as
'Chemlist'. CHEMLIST is a trademark that the American
Chemical Society has used for many years to identify its
Regulated Chemicals Listing (CAS) database. To avoid
future confusion, the 'Chemlist' dictionary mentioned in
this article has been renamed to 'Jochem.'
1Department of Medical Informatics, Erasmus University Medical Center,
Rotterdam, The Netherlands, 2Department of Health Risk Analysis and
Toxicology, Maastricht University, Maastricht, The Netherlands and 3Royal
Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC-27587, USA
1. Hettne KM, Williams AJ, van Mulligen EM, Kleinjans J, Tkachenko V, Kors
JA: Automatic vs. manual curation of a multi-source chemical
dictionary: the impact on text mining. J Cheminform 2010, 2:3.
Cite this article as: Hettne et al., Automatic vs. manual curation of a
multi-source chemical dictionary: the impact on text mining. Journal of
Cheminformatics 2010, 2:4
Received: 1 June 2010 Accepted: 3 June 2010
Published: 3 June 2010
This article is available from: http://www.jcheminf.com/content/2/1/4
* Correspondence: email@example.com
1 Department of Medical Informatics, Erasmus University Medical Center,
Rotterdam, The Netherlands
Full list of author information is available at the end of the article