A practical method for transforming free-text eligibility criteria into computable criteria

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA.
Journal of Biomedical Informatics (Impact Factor: 2.48). 04/2011; 44(2):239-50. DOI: 10.1016/j.jbi.2010.09.007
Source: PubMed

ABSTRACT Formalizing eligibility criteria in a computer-interpretable language would facilitate eligibility determination for study subjects and the identification of studies on similar patient populations. Because such formalization is extremely labor intensive, we transform the problem from one of fully capturing the semantics of criteria directly in a formal expression language to one of annotating free-text criteria in a format called ERGO annotation. The annotation can be done manually, or it can be partially automated using natural-language processing techniques. We evaluated our approach in three ways. First, we assessed the extent to which ERGO annotations capture the semantics of 1000 eligibility criteria randomly drawn from ClinicalTrials.gov. Second, we demonstrated the practicality of the annotation process in a feasibility study. Finally, we demonstrated the computability of ERGO annotation by using it to (1) structure a library of eligibility criteria, (2) search for studies enrolling specified study populations, and (3) screen patients for potential eligibility for a study. We therefore demonstrate a new and practical method for incrementally capturing the semantics of free-text eligibility criteria into computable form.
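To make the idea of screening patients against annotated criteria concrete, here is a minimal Python sketch. It does not reproduce the ERGO annotation format, which the abstract does not specify; the `AnnotatedCriterion` class, its field names, the patient attribute names, and the three-way outcome are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical structure, NOT the actual ERGO annotation format:
# a free-text criterion paired with a simple computable comparison.
@dataclass(frozen=True)
class AnnotatedCriterion:
    text: str               # original free-text criterion
    attribute: str          # assumed patient attribute name
    comparator: str         # one of "<", "<=", ">", ">=", "=="
    value: float            # threshold to compare against
    inclusion: bool = True  # False marks an exclusion criterion

_OPS = {"<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge, "==": operator.eq}

def meets(criterion: AnnotatedCriterion,
          patient: Dict[str, float]) -> Optional[bool]:
    """Return True/False if decidable, None if the patient data is missing."""
    observed = patient.get(criterion.attribute)
    if observed is None:
        return None
    matched = _OPS[criterion.comparator](observed, criterion.value)
    # An exclusion criterion is satisfied when the condition does NOT hold.
    return matched if criterion.inclusion else not matched

def screen(criteria: List[AnnotatedCriterion],
           patient: Dict[str, float]) -> str:
    """Combine per-criterion results into a screening outcome."""
    results = [meets(c, patient) for c in criteria]
    if any(r is False for r in results):
        return "ineligible"
    if any(r is None for r in results):
        return "potentially eligible"  # some data still needs checking
    return "eligible"

# Illustrative criteria; the wording and thresholds are made up.
criteria = [
    AnnotatedCriterion("Age 18 years or older", "age_years", ">=", 18),
    AnnotatedCriterion("Serum creatinine above 2.0 mg/dL",
                       "creatinine_mg_dl", ">", 2.0, inclusion=False),
]

print(screen(criteria, {"age_years": 54}))
print(screen(criteria, {"age_years": 54, "creatinine_mg_dl": 1.1}))
print(screen(criteria, {"age_years": 16, "creatinine_mg_dl": 1.1}))
```

Keeping the original text next to the computable part mirrors the incremental approach described above: a criterion that cannot yet be expressed as a comparison can remain as free text, and missing patient data yields "potentially eligible" rather than a hard rejection.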


Available from: Mor Peleg, Apr 28, 2015
  • ABSTRACT: Cancer is responsible for approximately 7.6 million deaths per year worldwide. A 2012 survey in the United Kingdom found dramatic improvement in survival rates for childhood cancer because of increased participation in clinical trials. Unfortunately, overall patient participation in cancer clinical studies is low. A key logistical barrier to patient and physician participation is the time required for identification of appropriate clinical trials for individual patients. We introduce the Trial Prospector tool that supports end-to-end management of the cancer clinical trial recruitment workflow with (a) structured entry of trial eligibility criteria, (b) automated extraction of patient data from multiple sources, (c) a scalable matching algorithm, and (d) an interactive user interface (UI) for physicians with both matching results and a detailed explanation of causes for ineligibility of available trials. We report the results from deployment of Trial Prospector at the National Cancer Institute (NCI)-designated Case Comprehensive Cancer Center (Case CCC) with 1,367 clinical trial eligibility evaluations performed with 100% accuracy.
    Cancer informatics 01/2014; 13:157-66. DOI:10.4137/CIN.S19454
  • ABSTRACT: Background: Implementing semi-automated processes to efficiently match patients to clinical trials at the point of care requires both detailed patient data and authoritative information about open studies. Objective: To evaluate the utility of the ClinicalTrials.gov registry as a data source for semi-automated trial eligibility screening. Methods: Eligibility criteria and metadata for 437 trials open for recruitment in four different clinical domains were identified in ClinicalTrials.gov. Trials were evaluated for up-to-date recruitment status, and eligibility criteria were evaluated for obstacles to automated interpretation. Finally, phone or email outreach to coordinators at a subset of the trials was made to assess the accuracy of contact details and recruitment status. Results: 24% (104 of 437) of trials declaring an open recruitment status list a study completion date in the past, indicating out-of-date records. Substantial barriers to automated eligibility interpretation in free-form text are present in 81% to 94% of all trials. We were unable to contact coordinators at 31% (45 of 146) of the trials in the subset, either by phone or by email. Only 53% (74 of 146) would confirm that they were still recruiting patients. Conclusion: Because ClinicalTrials.gov has entries on most US and many international trials, the registry could be repurposed as a comprehensive trial matching data source. Semi-automated point-of-care recruitment would be facilitated by matching the registry's eligibility criteria against clinical data from electronic health records. But the current entries fall short. Ultimately, improved techniques in natural language processing will facilitate semi-automated complex matching. As immediate next steps, we recommend augmenting data entry forms to capture key eligibility criteria in a simple, structured format.
    PLoS ONE 10/2014; 9(10):e111055. DOI:10.1371/journal.pone.0111055 · 3.53 Impact Factor
  • ABSTRACT: Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". Objectives: The increasing availability of electronic clinical data provides great potential for finding eligible patients for clinical research. However, data heterogeneity makes it difficult for clinical researchers to interrogate sources consistently. Existing standard query languages are often not sufficient to query across diverse representations. Thus, a higher-level domain language is needed so that queries become data-representation agnostic. To this end, we define a clinician-readable computational language for querying whether patients meet eligibility criteria (ECs) from clinical trials. This language is capable of implementing the temporal semantics required by many ECs, and can be automatically evaluated on heterogeneous data sources. Methods: By reference to standards and examples of existing ECs, a clinician-readable query language was developed. Using a model-based approach, it was implemented to transform captured ECs into queries that interrogate heterogeneous data warehouses. The query language was evaluated on two types of data sources, each different in structure and content. Results: The query language abstracts the level of expressivity so that researchers construct their ECs with no prior knowledge of the data sources. It was evaluated on two types of semantically and structurally diverse data warehouses. This query language is now used to express ECs in the EHR4CR project. A survey shows that it was perceived by the majority of users to be useful, easy to understand and unambiguous. Discussion: An EC-specific language enables clinical researchers to express their ECs as a query such that the user is isolated from complexities of different heterogeneous clinical data sets. More generally, the approach demonstrates that a domain query language has potential for overcoming the problems of semantic interoperability and is applicable where the nature of the queries is well understood and the data is conceptually similar but in different representations. Conclusions: Our language provides a strong basis for use across different clinical domains for expressing ECs by overcoming the heterogeneous nature of electronic clinical data whilst maintaining semantic consistency. It is readily comprehensible by target users. This demonstrates that a domain query language can be both usable and interoperable.
    Methods of Information in Medicine 07/2014; 53(4). DOI:10.3414/ME13-02-0027 · 1.08 Impact Factor