A practical method for transforming free-text eligibility criteria into computable criteria

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA.
Journal of Biomedical Informatics (Impact Factor: 2.19). 04/2011; 44(2):239-50. DOI: 10.1016/j.jbi.2010.09.007
Source: PubMed


Formalizing eligibility criteria in a computer-interpretable language would facilitate eligibility determination for study subjects and the identification of studies on similar patient populations. Because such formalization is extremely labor intensive, we transform the problem from one of fully capturing the semantics of criteria directly in a formal expression language to one of annotating free-text criteria in a format called ERGO annotation. The annotation can be done manually, or it can be partially automated using natural-language processing techniques. We evaluated our approach in three ways. First, we assessed the extent to which ERGO annotations capture the semantics of 1000 eligibility criteria randomly drawn from ClinicalTrials.gov. Second, we demonstrated the practicality of the annotation process in a feasibility study. Finally, we demonstrate the computability of ERGO annotation by using it to (1) structure a library of eligibility criteria, (2) search for studies enrolling specified study populations, and (3) screen patients for potential eligibility for a study. We therefore demonstrate a new and practical method for incrementally capturing the semantics of free-text eligibility criteria into computable form.

Available from: Mor Peleg
    • "A number of screening methods for EMRs for eligible patients have been developed [7,8,10,11,18-20]. The search method for eligible patients used in our study was based on the patient treatment information rather than the plain text description of the disease in the EMRs. "
    ABSTRACT: A number of clinical trials have encountered difficulties enrolling a sufficient number of patients upon initiating the trial. Recently, many screening systems that search clinical data warehouses for patients who are eligible for clinical trials have been developed. We aimed to estimate the number of eligible patients using routine electronic medical records (EMRs) and to predict the difficulty of enrolling sufficient patients prior to beginning a trial. Investigator-initiated clinical trials that were conducted at Kyoto University Hospital between July 2004 and January 2011 were included in this study. We searched the EMRs for eligible patients and calculated the eligible EMR patient index by dividing the number of eligible patients in the EMRs by the target sample size. Additionally, we divided the trial eligibility criteria into corresponding data elements in the EMRs to evaluate the completeness of mapping clinical manifestation in trial eligibility criteria into structured data elements in the EMRs. We evaluated the correlation between the index and the accrual achievement with Spearman's rank correlation coefficient. Thirteen of 19 trials did not achieve their original target sample size. Overall, 55% of the trial eligibility criteria were mapped into data elements in EMRs. The accrual achievement demonstrated a significant positive correlation with the eligible EMR patient index (r = 0.67, 95% confidence interval (CI), 0.42 to 0.92). The receiver operating characteristic analysis revealed an eligible EMR patient index cut-off value of 1.7, with a sensitivity of 69.2% and a specificity of 100.0%. Our study suggests that the eligible EMR patient index remains exploratory but could be a useful component of the feasibility study when planning a clinical trial. 
    Establishing a step to check whether there are likely to be a sufficient number of eligible patients enables sponsors and investigators to concentrate their resources and efforts on more achievable trials.
    Trials 12/2013; 14(1):426. DOI:10.1186/1745-6215-14-426 · 1.73 Impact Factor
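    The index described in the abstract above is a simple ratio, so it can be illustrated in a few lines. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the patient counts are invented for illustration, and the rule that an index below the reported cut-off of 1.7 signals likely under-enrollment is an assumption about how the cut-off is applied (the abstract does not state whether the boundary is inclusive).

```python
# Hypothetical helper names; only the ratio and the 1.7 cut-off come from the abstract.
CUTOFF = 1.7  # cut-off value reported in the abstract


def eligible_emr_patient_index(eligible_in_emr: int, target_sample_size: int) -> float:
    """Number of eligible patients found in the EMRs divided by the target sample size."""
    if target_sample_size <= 0:
        raise ValueError("target_sample_size must be positive")
    return eligible_in_emr / target_sample_size


def flags_accrual_risk(index: float, cutoff: float = CUTOFF) -> bool:
    """Assumed decision rule: an index below the cut-off suggests enrollment may fall short."""
    return index < cutoff


if __name__ == "__main__":
    # Invented example trials: (eligible patients in EMRs, target sample size).
    for eligible, target in [(120, 60), (30, 40)]:
        idx = eligible_emr_patient_index(eligible, target)
        print(eligible, target, idx, flags_accrual_risk(idx))
```

    Under these assumptions, a trial with twice as many eligible EMR patients as its target would not be flagged, while one with fewer eligible patients than its target would be.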
    • "Still, none of the languages seems to be able to accurately represent all eligibility criteria. Tu et al. [8] estimate up to 60% criteria coverage for the ad hoc expression language ERGO. Wang et al. [9] estimate that their advanced Arden Syntax is able to describe up to 90% of all eligibility criteria. "
    ABSTRACT: The necessity to translate eligibility criteria from free text into decision rules that are compatible with data from the electronic health record (EHR) constitutes the main challenge when developing and deploying clinical trial recruitment support systems. Recruitment decisions based on case-based reasoning, i.e. using past cases rather than explicit rules, could dispense with the need for translating eligibility criteria and could also be implemented largely independently from the terminology of the EHR's database. We evaluated the feasibility of predictive modeling to assess the eligibility of patients for clinical trials and report on a prototype's performance for different system configurations. The prototype worked by using existing basic patient data of manually assessed eligible and ineligible patients to induce prediction models. Performance was measured retrospectively for three clinical trials by plotting receiver operating characteristic curves and comparing the area under the curve (ROC-AUC) for different prediction algorithms, different sizes of the learning set and different numbers and aggregation levels of the patient attributes. Random forests were generally among the best performing models with a maximum ROC-AUC of 0.81 (CI: 0.72-0.88) for trial A, 0.96 (CI: 0.95-0.97) for trial B and 0.99 (CI: 0.98-0.99) for trial C. The full potential of this algorithm was reached after learning from approximately 200 manually screened patients (eligible and ineligible). Neither block- nor category-level aggregation of diagnosis and procedure codes influenced the algorithms' performance substantially. Our results indicate that predictive modeling is a feasible approach to support patient recruitment into clinical trials. Its major advantages over the commonly applied rule-based systems are its independency from the concrete representation of eligibility criteria and EHR data and its potential for automation.
    BMC Medical Informatics and Decision Making 12/2013; 13(1):134. DOI:10.1186/1472-6947-13-134 · 1.83 Impact Factor
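    The abstract above compares prediction models by ROC-AUC. As a reminder of what that number measures, the sketch below computes it directly from its pairwise definition: the fraction of (eligible, ineligible) patient pairs in which the eligible patient receives the higher score, with ties counted as one half. This is an illustration with invented labels and scores, not the study's evaluation code, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 1 for a manually assessed eligible patient, 0 for an ineligible one.
    scores: model output, higher meaning "more likely eligible".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one eligible and one ineligible patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # eligible patient ranked above ineligible one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: two eligible and two ineligible patients.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

    The quadratic loop is fine for a learning set of a few hundred patients; a rank-based implementation would be preferable for larger data.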
    • "Ross et al. conducted a survey of 1,000 criteria randomly selected from ClinicalTrials.gov and found that 80% of them had a significant semantic complexity [13], with 40% involving some temporal reasoning. Tu et al. proposed an approach to convert free text eligibility criteria into the computable ERGO formalism [14]. O’Connor et al. developed a solution based on OWL and SWRL that supports temporal reasoning and bridges the gap between patients specific data and more general eligibility criteria [15]. "
    ABSTRACT: Clinical trials are important for patients, for researchers and for companies. One of the major bottlenecks is patient recruitment. This task requires the matching of a large volume of information about the patient with numerous eligibility criteria, in a logically-complex combination. Moreover, some of the patient's information necessary to determine the status of the eligibility criteria may not be available at the time of pre-screening. We showed that the classic approach based on negation as failure over-estimates rejection when confronted with partially-known information about the eligibility criteria because it ignores the distinction between a trial for which patient eligibility should be rejected and trials for which patient eligibility cannot be asserted. We have also shown that 58.64% of the values were unknown in the 286 prostate cancer cases examined during the weekly urology multidisciplinary meetings at Rennes' university hospital between October 2008 and March 2009. We propose an OWL design pattern for modeling eligibility criteria based on the open world assumption to address the missing information problem. We validate our model on a fictitious clinical trial and evaluate it on two real clinical trials. Our approach successfully distinguished clinical trials for which the patient is eligible, clinical trials for which we know that the patient is not eligible and clinical trials for which the patient may be eligible provided that further pieces of information (which we can identify) can be obtained. OWL-based reasoning based on the open world assumption provides an adequate framework for distinguishing those patients who can confidently be rejected from those whose status cannot be determined. The expected benefits are a reduction of the workload of the physicians and a higher efficiency by allowing them to focus on the patients whose eligibility actually requires expertise.
    Journal of Biomedical Semantics 09/2013; 4(1):17. DOI:10.1186/2041-1480-4-17 · 2.26 Impact Factor
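    The contrast drawn in the abstract above, between negation as failure and reasoning under the open world assumption, can be illustrated without OWL. The sketch below swaps in plain three-valued (Kleene) logic, with Python's `None` standing for "unknown"; it is not the authors' OWL design pattern, and all function names are hypothetical. It only shows why treating unknown values as false over-estimates rejection.

```python
from typing import List, Optional, Sequence

Tri = Optional[bool]  # True, False, or None for "unknown"


def and3(values: Sequence[Tri]) -> Tri:
    """Kleene conjunction: False dominates, then unknown, else True."""
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


def not3(value: Tri) -> Tri:
    """Kleene negation: unknown stays unknown."""
    return None if value is None else (not value)


def screen_open_world(inclusion: Sequence[Tri], exclusion: Sequence[Tri]) -> str:
    """Eligible only if all inclusion criteria hold and no exclusion criterion holds."""
    terms: List[Tri] = list(inclusion) + [not3(e) for e in exclusion]
    result = and3(terms)
    if result is True:
        return "eligible"
    if result is False:
        return "not eligible"
    return "undetermined"


def screen_closed_world(inclusion: Sequence[Tri], exclusion: Sequence[Tri]) -> str:
    """Negation as failure: an unknown inclusion criterion is treated as not met."""
    met = all(v is True for v in inclusion) and not any(v is True for v in exclusion)
    return "eligible" if met else "not eligible"


if __name__ == "__main__":
    # One inclusion criterion is unknown at pre-screening time.
    print(screen_closed_world([True, None], [False]))
    print(screen_open_world([True, None], [False]))
```

    With one inclusion criterion unknown, the closed-world screen rejects the patient outright, whereas the three-valued screen reports the case as undetermined, which is the distinction the abstract argues for.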