Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface.

Department of Informatics, University of Sussex, Brighton, UK.
Journal of the American Medical Informatics Association (Impact Factor: 3.57). 11/2013; DOI: 10.1136/amiajnl-2013-001847
Source: PubMed

ABSTRACT: UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. To exploit the potential of these large and complex databases more fully, our interdisciplinary team developed generic methods that allow different types of user to access them.
Using the Clinical Practice Research Datalink database, we have developed an online, user-focused system (TrialViz) that enables users to select suitable general practices interactively, based on two criteria: the suitability of the patient base for the intended study (phenotyping) and measures of data quality.
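To make the two selection criteria concrete, here is a minimal Python sketch that filters practices on a data-quality threshold and then ranks them by how many patients match a phenotype code list. It is not TrialViz's implementation: the abstract does not say how data quality is measured, so the `Practice` structure, its precomputed `quality_score`, the placeholder codes (`COPD_1`, `ASTHMA_1`, ...) and the thresholds `min_patients` and `min_quality` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Practice:
    """A general practice; all fields are hypothetical illustrations."""
    practice_id: str
    quality_score: float  # assumed precomputed data-quality measure in [0, 1]
    patient_codes: dict[str, set[str]] = field(default_factory=dict)  # patient id -> coded entries


def count_matching(practice: Practice, phenotype: set[str]) -> int:
    """Number of patients with at least one code from the phenotype code list."""
    return sum(1 for codes in practice.patient_codes.values() if codes & phenotype)


def select_practices(practices, phenotype, min_patients, min_quality):
    """Keep practices meeting both thresholds, ranked by matching patients, then quality."""
    eligible = []
    for p in practices:
        if p.quality_score < min_quality:
            continue  # fails the data-quality criterion
        n = count_matching(p, phenotype)
        if n >= min_patients:  # passes the phenotype (suitability) criterion
            eligible.append((p.practice_id, n, p.quality_score))
    return sorted(eligible, key=lambda row: (-row[1], -row[2]))


practices = [
    Practice("A", 0.9, {"p1": {"COPD_1"}, "p2": {"COPD_1", "ASTHMA_1"}, "p3": {"ASTHMA_1"}}),
    Practice("B", 0.5, {"p4": {"COPD_1"}}),
    Practice("C", 0.8, {"p5": {"COPD_2"}}),
]
ranked = select_practices(practices, {"COPD_1", "COPD_2"}, min_patients=1, min_quality=0.7)
print(ranked)  # [('A', 2, 0.9), ('C', 1, 0.8)]
```

Practice B is dropped by the quality filter even though it has a matching patient, which mirrors the idea that both criteria must be satisfied before patient counts are compared.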
An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results.
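The abstract does not describe how its search algorithm works. One common way to answer phenotype queries over coded records in near real time is an inverted index from each clinical code to the set of patients recorded with it, so that a query reduces to set unions and intersections. The sketch below assumes that technique; `CodeIndex`, `any_of`, `query` and the example codes are hypothetical names, not part of TrialViz or CPRD.

```python
from collections import defaultdict


class CodeIndex:
    """Inverted index from clinical code to the set of patient ids recorded with it."""

    def __init__(self, records):
        # records: iterable of (patient_id, code) pairs, one per coded entry
        self._index = defaultdict(set)
        for patient_id, code in records:
            self._index[code].add(patient_id)

    def any_of(self, codes):
        """Union: patients with at least one of the given codes."""
        result = set()
        for code in codes:
            result |= self._index.get(code, set())
        return result

    def query(self, include_lists, exclude_codes=()):
        """AND across include lists (each an OR of codes), minus excluded patients."""
        if not include_lists:
            return set()
        result = self.any_of(include_lists[0])
        for codes in include_lists[1:]:
            result &= self.any_of(codes)
        return result - self.any_of(exclude_codes)


records = [
    ("p1", "COPD_1"), ("p1", "INHALER_1"),
    ("p2", "COPD_2"),
    ("p3", "COPD_1"), ("p3", "INHALER_1"), ("p3", "ASTHMA_1"),
]
idx = CodeIndex(records)
# COPD code AND inhaler prescription, excluding anyone with an asthma code
matches = idx.query([["COPD_1", "COPD_2"], ["INHALER_1"]], exclude_codes=["ASTHMA_1"])
print(matches)  # {'p1'}
```

Building the index is a single pass over the records; each query then touches only the patient sets for the codes it names rather than rescanning every record.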
We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research.
Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases.

  • ABSTRACT: Background: Pharmaceutical clinical trials are primarily conducted across many countries, yet recruitment targets are frequently not met in time. Electronic health records (EHRs) store large amounts of potentially useful data that could aid in this process. The EHR4CR project aims to re-use EHR data for clinical research purposes. Objective: To evaluate whether the protocol feasibility platform produced by the Electronic Health Records for Clinical Research (EHR4CR) project can be installed and set up in accordance with local technical and governance requirements to execute protocol feasibility queries uniformly across national borders. Methods: We installed specifically engineered software and data warehouses at local sites. Approvals for data access and for use of the platform were obtained, and local site codes were mapped to central platform terminology codes. A test data set, or real EHR data where approvals were in place, was loaded into the data warehouses. Test feasibility queries were created on a central component of the platform and sent to the local components at eleven university hospitals. Results: To use real, de-identified EHR data, we obtained permissions and approvals from 'data controllers' and ethics committees. Through the platform we were able to create feasibility queries, distribute them to eleven university hospitals and retrieve aggregated patient counts for both the test data and the de-identified EHR data. Conclusion: It is possible to install a uniform piece of software in university hospitals in five European countries and configure it to the requirements of the local networks while complying with local data protection regulations. We were also able to set up ETL processes and data warehouses to re-use EHR data for feasibility queries distributed over the EHR4CR platform.
    Methods of Information in Medicine 06/2014; · 1.08 Impact Factor
  • ABSTRACT: To evaluate risk factors associated with exacerbation frequency in primary care. Information on exacerbations of chronic obstructive pulmonary disease (COPD) has mainly been generated by secondary care-based clinical cohorts. Retrospective observational cohort study. Electronic medical records database (England and Wales). 58 589 patients aged ≥40 years with a COPD diagnosis recorded between 1 April 2009 and 30 September 2012, and with at least 365 days of follow-up before and after the COPD diagnosis, were identified in the Clinical Practice Research Datalink. Mean age: 69 years; 47% female; mean forced expiratory volume in 1 s: 60% predicted. Data on moderate or severe exacerbation episodes, defined by diagnosis and/or medication codes, in the 12 months following cohort entry were retrieved, together with demographic and clinical characteristics. Associations between patient characteristics and the odds of having none versus one, none versus frequent (≥2) and one versus frequent exacerbations over 12 months of follow-up were evaluated using multivariate logistic regression models. During follow-up, 23% of patients had evidence of frequent moderate-to-severe COPD exacerbations (24% one; 53% none). Independent predictors of increased odds of having exacerbations during follow-up, either frequent episodes or one episode, included prior exacerbations, increasing dyspnoea score, increasing grade of airflow limitation, female sex and a prior or current history of several comorbidities (eg, asthma, depression, anxiety, heart failure and cancer). Primary care-managed patients with COPD at the highest risk of exacerbations can be identified by exploring their medical history for prior exacerbations, greater COPD disease severity and co-occurring medical conditions.
    BMJ Open 12/2014; 4(12):e006171. · 2.06 Impact Factor
  • ABSTRACT: The optimal method of identifying people with chronic obstructive pulmonary disease (COPD) from electronic primary care records is not known. We assessed the accuracy of different approaches using the Clinical Practice Research Datalink, a UK electronic health record database.
    BMJ Open 07/2014; 4(7):e005540. · 2.06 Impact Factor
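The exacerbation outcome in the BMJ Open (12/2014) abstract above places each patient in one of three groups (none, one, or frequent, meaning two or more moderate-to-severe episodes) over the 12 months after cohort entry. Below is a minimal sketch of that grouping, assuming episode dates have already been derived from diagnosis and/or medication codes. The function name, the 365-day window and the choice to exclude the entry date itself are assumptions for illustration, not the study's published definitions.

```python
from datetime import date, timedelta

FOLLOW_UP = timedelta(days=365)  # 12 months of follow-up, approximated as 365 days


def exacerbation_category(cohort_entry: date, episode_dates: list[date]) -> str:
    """Classify exacerbation frequency in the 12 months after cohort entry.

    Assumes episodes are already derived from diagnosis and/or medication codes
    and that an episode on the entry date itself is not counted.
    """
    end = cohort_entry + FOLLOW_UP
    n = sum(1 for d in episode_dates if cohort_entry < d <= end)
    if n == 0:
        return "none"
    if n == 1:
        return "one"
    return "frequent"  # two or more episodes


entry = date(2010, 1, 1)
print(exacerbation_category(entry, []))                  # none
print(exacerbation_category(entry, [date(2010, 3, 1)]))  # one
print(exacerbation_category(
    entry, [date(2009, 12, 1), date(2010, 3, 1), date(2010, 9, 1), date(2012, 1, 1)]
))  # frequent
```

Episodes before entry (2009-12-01) or after the window closes (2012-01-01) are ignored, so only the two in-window episodes are counted in the last call.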


Available from May 20, 2014