Article

caTIES: a grid based system for coding and retrieval of surgical pathology reports and tissue specimens in support of translational research.

Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15232, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.57). 05/2010; 17(3):253-64. DOI: 10.1136/jamia.2009.002295
Source: PubMed

ABSTRACT The authors report on the development of the Cancer Tissue Information Extraction System (caTIES)--an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research such as tissue-banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several Cancer Centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.

0 Bookmarks
 · 
67 Views
  • [Show abstract] [Hide abstract]
    ABSTRACT: Purpose This paper reviews the research literature on text mining (TM) with the aim to find out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research. Methods A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. Results A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.
    International Journal of Medical Informatics 09/2014; · 2.72 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Epilepsy is a common serious neurological disorder with a complex set of possible phenotypes ranging from pathologic abnormalities to variations in electroencephalogram. This paper presents a system called Phenotype Exaction in Epilepsy (PEEP) for extracting complex epilepsy phenotypes and their correlated anatomical locations from clinical discharge summaries, a primary data source for this purpose. PEEP generates candidate phenotype and anatomical location pairs by embedding a named entity recognition method, based on the Epilepsy and Seizure Ontology, into the National Library of Medicine's MetaMap program. Such candidate pairs are further processed using a correlation algorithm. The derived phenotypes and correlated locations have been used for cohort identification with an integrated ontology-driven visual query interface. To evaluate the performance of PEEP, 400 de-identified discharge summaries were used for development and an additional 262 were used as test data. PEEP achieved a micro-averaged precision of 0.924, recall of 0.931, and F1-measure of 0.927 for extracting epilepsy phenotypes. The performance on the extraction of correlated phenotypes and anatomical locations shows a micro-averaged F1-measure of 0.856 (Precision: 0.852, Recall: 0.859). The evaluation demonstrates that PEEP is an effective approach to extracting complex epilepsy phenotypes for cohort identification.
    Journal of Biomedical Informatics 06/2014; · 2.48 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: To facilitate the process of extracting information from narrative medical reports and transforming extracted data into standardized structured forms, we present an interactive, incrementally learning based information extraction system - ASLForm. ASLForm provides users a convenient interface that can be used as a simple data extraction and data entry system. It is unique, however, in its ability to transparently analyze and quickly learn, from users' interactions with a small number of reports, the desired values for the data fields. Additional user feedback (through acceptance decision or edits on the generated values) can incrementally refine the decision model in real-time, which further reduces users' interaction effort thereafter. The system eventually achieves high accuracy on data extraction with minimal effort from users. ASLForm requires no special configuration or training sets, and is not constrained to specific domains, thus it is easy to use and highly portable. Our experiments demonstrate the effectiveness of the system.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2013; 2013:1590-9.

Full-text (2 Sources)

Download
12 Downloads
Available from
Jun 1, 2014