Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC)

Massachusetts Veterans Epidemiology Research and Information Center Cooperative Studies Coordinating Center, VA Boston Healthcare System, Jamaica Plain, Massachusetts 02130, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.93). 07/2010; 17(4):375-82. DOI: 10.1136/jamia.2009.001412
Source: PubMed

ABSTRACT Reducing custom software development effort is an important goal in information retrieval (IR). This study evaluated a generalizable approach involving no custom software or rules development. The study used documents "consistent with cancer" to evaluate system performance in the domains of colorectal (CRC), prostate (PC), and lung (LC) cancer. Using an end-user-supplied reference set, the automated retrieval console (ARC) iteratively calculated the performance of combinations of natural language processing-derived features and supervised classification algorithms. Training and testing involved 10-fold cross-validation for three sets of 500 documents each. Performance metrics included recall, precision, and F-measure. Annotation time for five physicians was also measured. Top performing algorithms had recall, precision, and F-measure values as follows: for CRC, 0.90, 0.92, and 0.89, respectively; for PC, 0.97, 0.95, and 0.94; and for LC, 0.76, 0.80, and 0.75. In all but one case, conditional random fields outperformed maximum entropy-based classifiers. Algorithms had good performance without custom code or rules development, but performance varied by specific application.
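The evaluation loop summarized in the abstract can be sketched in Python. For every combination of feature set and classifier, it runs 10-fold cross-validation against an end-user-labelled reference set and then compares recall, precision, and F-measure. This is a minimal illustration under stated assumptions and is not ARC's implementation. ARC's real features come from an NLP pipeline, and its learners include conditional random fields and maximum entropy models. Here, the feature extractors (`unigrams`, `bigrams`), the learners (`vote_learner`, `majority_learner`) and the `search`/`cross_validate` helpers are all hypothetical stand-ins. The sketch pools predictions across folds before computing the metrics. The abstract does not state how fold-level results were averaged.

```python
# Hypothetical sketch of an ARC-style model search; not ARC's actual code.
import random
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Sequence, Set, Tuple

Features = Set[str]
Predictor = Callable[[Features], bool]
Learner = Callable[[Sequence[Features], Sequence[bool]], Predictor]


def unigrams(text: str) -> Features:
    """Hypothetical feature set: lower-cased tokens."""
    return set(text.lower().split())


def bigrams(text: str) -> Features:
    """Hypothetical feature set: adjacent token pairs."""
    toks = text.lower().split()
    return {f"{a}_{b}" for a, b in zip(toks, toks[1:])}


def vote_learner(xs: Sequence[Features], ys: Sequence[bool]) -> Predictor:
    """Stand-in for a CRF/MaxEnt learner: predicts positive if a document has any
    feature seen more often in positive than in negative training documents."""
    pos: Counter = Counter()
    neg: Counter = Counter()
    for x, y in zip(xs, ys):
        (pos if y else neg).update(x)
    cues = {f for f in pos if pos[f] > neg[f]}
    return lambda x: bool(x & cues)


def majority_learner(xs: Sequence[Features], ys: Sequence[bool]) -> Predictor:
    """Baseline: always predict the majority training label (ties -> positive)."""
    label = 2 * sum(ys) >= len(ys)
    return lambda x: label


def prf(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (harmonic mean of precision and recall)."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def cross_validate(texts, labels, extract, learn, k=10, seed=0):
    """k-fold CV: train on k-1 folds, predict the held-out fold, pool predictions."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    feats = [extract(t) for t in texts]
    gold: List[bool] = []
    pred: List[bool] = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = learn([feats[i] for i in train], [labels[i] for i in train])
        for i in fold:
            gold.append(labels[i])
            pred.append(model(feats[i]))
    return prf(gold, pred)


def search(texts, labels, extractors: Dict[str, Callable[[str], Features]],
           learners: Dict[str, Learner], k: int = 10):
    """Score every (feature set, learner) pair; return the best by F-measure."""
    results = {
        (fname, lname): cross_validate(texts, labels, fx, lx, k)
        for (fname, fx), (lname, lx) in product(extractors.items(), learners.items())
    }
    best = max(results, key=lambda key: results[key][2])
    return best, results
```

Consider a toy reference set of ten "consistent with cancer" snippets and ten "no evidence of malignancy" snippets. On that set, `search` scores both `vote_learner` configurations above the majority-class baseline. With the paper's 500-document sets, the same loop would simply run ten train/test splits per combination. The exhaustive search over combinations is what removes the need for custom rules: the end user supplies only labels, and the loop picks the configuration.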



Available from: Wildon R Farwell, Jan 02, 2014
    • "The next step of the analysis was performance of a supervised classification of retrieved, filtered affirmative 400-character document snippets as true or false positive using the Automated Retrieval Console developed by D'Avolio [12] [13]. This classifier utilizes the Mayo cTAKES (2) toolset for linguistic feature extraction, the UIMA pipeline architecture and the MALLET conditional random fields classifier."
    ABSTRACT: To fulfill the promise of electronic health records to support the study of disease in populations, efficient techniques are required to search large clinical corpora. The authors describe a hybrid system that combines a search engine and a natural language feature extraction and classification system to estimate the annual incidence of suicide attempts and demonstrate an association of adverse childhood experiences with suicide attempt risk in a cohort of 250,000 patients. The methodology replicated a previous finding that a positive association between suicide attempt incidence and a history of childhood abuse, neglect or family dysfunction exists, and that the association is stronger when multiple adverse events are reported.
    Proceedings of the 2014 47th Hawaii International Conference on System Sciences; 01/2014
    • "Utilizing controlled terminology could have further increased recall, as previously demonstrated in retrieving liver cysts using iSCOUT [24]. Finally, supervised classification algorithms, previously implemented for information retrieval, were not available in either application [31]. Incorporating these algorithms into information retrieval applications could further enhance precision and recall of these tools. "
    ABSTRACT: Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. This paper: 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications, an open-source toolkit, A Nearly New Information Extraction system (ANNIE), and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT), to illustrate the varying levels of customization required for different disease entities; and 2) evaluates each application's performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases: pulmonary nodule, pneumothorax, and pulmonary embolus. Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90 and 0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks.
    The Open Medical Informatics Journal 08/2012; 6:28-35. DOI:10.2174/1874431101206010028
    • "There was a special section focused on CRI papers in the December 2011 supplement issue. Much of the increase can be attributed to publications from awardees of the CTSA, since publication rate is related to funding [38]. JAMIA publications acknowledging CTSA funding rose from three in 2009 [39–41] to four in 2010 [14, 42–44] and 15 in 2011 [15, 17, 19, 36, 45–55]. Some of the articles were not exclusively focused on CRI, but were directly related, covering many different topics that are highly relevant to CRI: data models and terminologies [27, 56–68], natural language processing (NLP) [16, 50, 61, 69–99], surveillance systems [48, 65, 80, 100–110], and privacy technology and policy [33, 111–117]. This 2012 CRI supplement adds 18 new publications to this growing field."
    ABSTRACT: Clinical research informatics is the rapidly evolving sub-discipline within biomedical informatics that focuses on developing new informatics theories, tools, and solutions to accelerate the full translational continuum: basic research to clinical trials (T1), clinical trials to academic health center practice (T2), diffusion and implementation to community practice (T3), and 'real world' outcomes (T4). We present a conceptual model based on an informatics-enabled clinical research workflow, integration across heterogeneous data sources, and core informatics tools and platforms. We use this conceptual model to highlight 18 new articles in the JAMIA special issue on clinical research informatics.
    Journal of the American Medical Informatics Association 04/2012; 19(e1):e36-e42. DOI:10.1136/amiajnl-2012-000968 · 3.93 Impact Factor