A practical method for transforming free-text eligibility criteria into computable criteria.

Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA.
Journal of Biomedical Informatics (Impact Factor: 2.13). 04/2011; 44(2):239-50. DOI: 10.1016/j.jbi.2010.09.007
Source: PubMed

ABSTRACT: Formalizing eligibility criteria in a computer-interpretable language would facilitate eligibility determination for study subjects and the identification of studies on similar patient populations. Because such formalization is extremely labor intensive, we transform the problem from one of fully capturing the semantics of criteria directly in a formal expression language to one of annotating free-text criteria in a format called ERGO annotation. The annotation can be done manually, or it can be partially automated using natural-language processing techniques. We evaluated our approach in three ways. First, we assessed the extent to which ERGO annotations capture the semantics of 1000 eligibility criteria randomly drawn from [...]. Second, we demonstrated the practicality of the annotation process in a feasibility study. Finally, we demonstrated the computability of ERGO annotation by using it to (1) structure a library of eligibility criteria, (2) search for studies enrolling specified study populations, and (3) screen patients for potential eligibility for a study. We thus present a new and practical method for incrementally capturing the semantics of free-text eligibility criteria in computable form.
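
The abstract does not spell out the ERGO annotation format. As a rough illustration of the general idea behind use (3), turning a free-text criterion into a structure that a program can evaluate against a patient record, here is a toy Python sketch. The node types (Comparison, Present, And, Not), the patient-record keys (age_years, conditions), and the two-valued evaluation are all hypothetical simplifications of our own, not the ERGO annotation schema or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical, simplified annotation node types (NOT the ERGO annotation schema).

@dataclass
class Comparison:
    """Annotated criterion of the form <attribute> <operator> <value>."""
    attribute: str  # hypothetical patient-record key, e.g. "age_years"
    operator: str   # one of ">=", "<=", "==", ">", "<"
    value: float

@dataclass
class Present:
    """Annotated criterion asserting that a condition is recorded for the patient."""
    condition: str  # plain-text label; a real system would use terminology codes

@dataclass
class And:
    parts: List["Criterion"]

@dataclass
class Not:
    part: "Criterion"

Criterion = Union[Comparison, Present, And, Not]

OPS: Dict[str, Callable[[float, float], bool]] = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

def evaluate(c: Criterion, patient: dict) -> bool:
    """Return True if the patient record satisfies the annotated criterion.

    Uses two-valued logic: missing data makes a Comparison False.
    """
    if isinstance(c, Comparison):
        if c.attribute not in patient:
            return False  # simplifying choice: unknown counts as not satisfied
        return OPS[c.operator](patient[c.attribute], c.value)
    if isinstance(c, Present):
        return c.condition in patient.get("conditions", set())
    if isinstance(c, And):
        return all(evaluate(p, patient) for p in c.parts)
    if isinstance(c, Not):
        return not evaluate(c.part, patient)
    raise TypeError(f"unknown criterion type: {type(c).__name__}")

# Hypothetical free text: "Adults aged 18 or older with type 2 diabetes, not pregnant."
criterion = And([
    Comparison("age_years", ">=", 18),
    Present("type 2 diabetes"),
    Not(Present("pregnancy")),
])

patient = {"age_years": 54, "conditions": {"type 2 diabetes", "hypertension"}}
print(evaluate(criterion, patient))  # True
```

Because the sketch uses two-valued logic, negating a Comparison whose attribute is missing yields True. A screener that reports only potential eligibility would likely need a third "unknown" value instead.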

  • ABSTRACT: Medical progress depends on the evaluation of new diagnostic and therapeutic interventions within clinical trials. Clinical trial recruitment support systems (CTRSS) aim to improve the recruitment process in terms of effectiveness and efficiency.
    Journal of Medical Internet Research 01/2014; 16(7):e161. · 4.67 Impact Factor
  • ABSTRACT: Implementing semi-automated processes to efficiently match patients to clinical trials at the point of care requires both detailed patient data and authoritative information about open studies.
    PLoS ONE 10/2014; 9(10):e111055. · 3.53 Impact Factor
  • ABSTRACT:
    Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on "Managing Interoperability and Complexity in Health Systems".
    Objectives: The increasing availability of electronic clinical data provides great potential for finding eligible patients for clinical research. However, data heterogeneity makes it difficult for clinical researchers to interrogate sources consistently. Existing standard query languages are often not sufficient to query across diverse representations. Thus, a higher-level domain language is needed so that queries become data-representation agnostic. To this end, we define a clinician-readable computational language for querying whether patients meet eligibility criteria (ECs) from clinical trials. This language is capable of implementing the temporal semantics required by many ECs, and can be automatically evaluated on heterogeneous data sources.
    Methods: By reference to standards and examples of existing ECs, a clinician-readable query language was developed. Using a model-based approach, it was implemented to transform captured ECs into queries that interrogate heterogeneous data warehouses. The query language was evaluated on two types of data sources, each different in structure and content.
    Results: The query language abstracts the level of expressivity so that researchers construct their ECs with no prior knowledge of the data sources. It was evaluated on two types of semantically and structurally diverse data warehouses. This query language is now used to express ECs in the EHR4CR project. A survey shows that it was perceived by the majority of users to be useful, easy to understand and unambiguous.
    Discussion: An EC-specific language enables clinical researchers to express their ECs as a query such that the user is isolated from complexities of different heterogeneous clinical data sets. More generally, the approach demonstrates that a domain query language has potential for overcoming the problems of semantic interoperability and is applicable where the nature of the queries is well understood and the data is conceptually similar but in different representations.
    Conclusions: Our language provides a strong basis for use across different clinical domains for expressing ECs by overcoming the heterogeneous nature of electronic clinical data whilst maintaining semantic consistency. It is readily comprehensible by target users. This demonstrates that a domain query language can be both usable and interoperable.
    Methods of Information in Medicine 07/2014; 53(4). · 1.08 Impact Factor
