Detailed Clinical Modelling Approach to Data Extraction from Heterogeneous
Data Sources for Clinical Research
Sarah N. Lim Choi Keung, PhD1, Lei Zhao, MSc1, James Rossiter, PhD1, Mark
McGilchrist, PhD2, Frank Culross, MSc, BSc2, Jean-François Ethier, MD3, Anita Burgun,
MD, PhD3, Robert A. Verheij, PhD4, Nasra Khan, MSc4, Adel Taweel, PhD5, Vasa Curcin,
PhD6, Brendan C. Delaney, BM BCh, MD5, Theodoros N. Arvanitis, DPhil1
1Institute of Digital Healthcare, WMG, University of Warwick, UK;
2Health Informatics Centre, University of Dundee, UK;
3INSERM UMR_S 872, France;
4NIVEL Netherlands Institute for Health Services Research, The Netherlands;
5Department of Primary Care and Health Sciences, King’s College London, UK;
6Department of Computing, Imperial College London, UK
The reuse of routinely collected clinical data for clinical research is being explored as part of the drive to reduce
duplicate data entry and to start making full use of the big data potential in the healthcare domain. Clinical
researchers often need to extract data from patient registries and other patient record datasets for data analysis as
part of clinical studies. In the TRANSFoRm project, researchers define their study requirements via a Query
Formulation Workbench. We use a standardised approach to data extraction to retrieve relevant information from
heterogeneous data sources, using semantic interoperability enabled via detailed clinical modelling. This approach
is used for data extraction from data sources for analysis and for pre-population of electronic Case Report Forms
from electronic health records in primary care clinical systems.
Introduction

One of the challenges in healthcare is the efficient reuse of routinely collected data for secondary purposes, such as
clinical research. The main uses of electronic health records (eHRs) from patient registries or eHR systems in
clinical research are for data analysis and for pre-population of electronic Case Report Forms (eCRFs). While
existing patient records can sometimes fulfil all the requirements of a retrospective study analysis, the pre-
population of eCRFs from eHRs can cover between 30% and 50% of the requirements1, and integrated electronic
data capture for eCRFs and eHRs can have an even higher overlap, depending on the study2. These findings highlight the potential of reusing clinical data to reduce redundant data entry, since data recorded in clinical care can be used directly for clinical research. Our research aims to support interoperability between the clinical
researcher tools and the clinical data within patient registries and eHR systems.
The TRANSFoRm project3 aims to develop rigorous and generic methods for the integration of primary care clinical
and research activities, to support patient safety and clinical research. The two clinical research support tools for
researchers are the Query Formulation Workbench (QFW) and the eCRF Data Collection Tool. The QFW helps
researchers to define studies with eligibility criteria sets for participants, build queries to identify eligible
participants, flag patients, and extract data for analysis. The eCRF Data Collection Tool will support primary care practitioners in collecting clinical study data, and will support the collection of patient-reported outcome measures (PROMs)
via web and mobile methods. In TRANSFoRm, the challenge is to bridge the gap between user requirements in
terms of clinical study data items, and the execution of actual queries based on these requirements at the data
sources. We adopt a two-level modelling approach4-6 to separate the more stable domain information from the various schemas implemented by the heterogeneous data sources. The detailed clinical modelling (DCM) approach captures this separation and is described further in this paper.
The workflow and the involvement of the TRANSFoRm tools (specifically the QFW) and components are shown in
Figure 1, from the definition of the study data extraction requirements to the actual queries at the data sources. In
this paper, we focus on cohort identification. Taking the case of a researcher using the QFW to define a retrospective
study of patients with Diabetes Mellitus, Step 1 involves defining the data to be extracted from the data sources,
without needing to know the format or coding system used in individual data sources. In Steps 2 to 4, a number of
TRANSFoRm components are involved to convert the data extract definition into semantically interoperable queries
that can be executed at the respective data sources to return the requested data in the format defined by the user.
The remaining sections of this paper are structured as follows to describe the DCM approach for semantic interoperability. The Methods section describes the DCM approach as two-level modelling, based on an
information model and archetypes to constrain it. The Results section then demonstrates with examples how user
requirements are mapped to a specific patient registry schema for data extraction. Finally, we discuss the use of the
DCM approach in other TRANSFoRm tools, and finish with some conclusions and future work.
Figure 1. Conceptual workflow, from user definition of data extract requirements to actual queries at data source.
Methods

Detailed Clinical Models (DCM) organise health information by combining knowledge, data element specification,
relationships between elements, and terminology into information models that allow deployment in different
technical formats7,8. DCM enables semantic interoperability by formalising or standardising clinical data elements
which are modelled independently of their technical implementations. The data elements and models can then be
applied in various technical contexts, such as eHR, messaging, data warehouses and clinical decision support
systems. Work on DCM is still at an early stage, with a number of groups involved in developing an ISO standard for DCM9.
Within the TRANSFoRm project, the first level of the DCM two-level modelling approach consists of an information model, the Clinical Research Information Model (CRIM), which defines the workflow and data requirements of the clinical research task, combined with the Clinical Data Integration Model (CDIM), an ontology of the primary care clinical domain that captures the structural and semantic variability of data representations across data sources. This separation of the information model from the reference ontology has been previously described by Smith and Ceusters10. At the second level, archetypes are used to constrain the domain concepts and specify the
implementation aspects of the data elements within eHR systems or patient registries. We use the Archetype
Definition Language (ADL) to define the constraints and combine them with CDIM concepts in specifying the
appropriate data types and range values. The two-level modelling approach, using the concept of archetype for
detailed clinical content modelling, has been adopted by ISO/CEN 1360611,12. This approach makes it possible to
separate specific clinical content from the software implementation. The technical design of the software is driven
by the first-level information model, which specifies the generic information structure of the domain. The archetype defines the data elements that are required by specific application contexts, e.g. different clinical studies.
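The separation between the two levels can be sketched in code. The following Python fragment is purely illustrative (all class and field names are hypothetical, not part of TRANSFoRm): a stable information-model element is constrained at run time by an archetype-style specification, so the study-specific restriction never touches the generic model.

```python
# Illustrative sketch of two-level modelling (all names hypothetical).
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    """First level: generic information model, knows nothing about any study."""
    concept: str      # reference to a CDIM domain concept
    value_type: type

@dataclass(frozen=True)
class Constraint:
    """Second level: archetype-style restriction for one application context."""
    element: DataElement
    terminology: str
    allowed_codes: frozenset

    def accepts(self, code: str) -> bool:
        return code in self.allowed_codes

# A study-specific archetype constrains the generic Medication element
# to Metformin (ATC A10BA02) without modifying the information model.
medication = DataElement(concept="medication agent", value_type=str)
metformin_only = Constraint(medication, terminology="ATC",
                            allowed_codes=frozenset({"A10BA02"}))

print(metformin_only.accepts("A10BA02"))  # True
print(metformin_only.accepts("A10BB01"))  # False
```

Defining a new study then means writing a new `Constraint`, not changing `DataElement`, which mirrors how archetypes can be added while the information model stays stable.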
The distributed query and data extraction infrastructure is a central component of the TRANSFoRm software
platform. This infrastructure facilitates patient identification and reuse of routine healthcare data for research
analysis. The TRANSFoRm platform interacts with disparate patient registries and eHR systems via the Data Node Connector, which uses the Semantic Mediator to translate user queries expressed as archetypes, such as a data extraction definition for a retrospective study, into data source queries. The Semantic Mediator ensures the semantic translation of queries from the Query Formulation Workbench to individual data source schemas with the help
of data source models (DSM) and mappings to CDIM (CDIM-DSM)13,14. The transformed query can then be
executed at the data source side and results are returned to the user. While specific DSM and CDIM-DSM mappings
are required for each data source, these have to be built only once per data source. Additionally, the detailed clinical
model is flexible enough to enable researchers to query heterogeneous datasets without any knowledge of the
underlying structure, as they themselves do not use the DSM and CDIM-DSM mappings directly.
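As a rough illustration of this once-per-source mapping idea, the following Python sketch (source names, concepts and schema details are all invented, not taken from any actual DSM) shows how the same CDIM-level concept could resolve to different schema elements in two registries, while the researcher only ever names the concept:

```python
# Hypothetical CDIM-DSM mapping registry: each data source declares, once,
# how a CDIM concept maps onto its own (table, column) layout.
CDIM_DSM_MAPPINGS = {
    "registry_a": {
        "medication agent":  ("prescriptions", "atc_code"),
        "prescription date": ("prescriptions", "presc_date"),
    },
    "registry_b": {
        "medication agent":  ("rx_events", "drug_atc"),
        "prescription date": ("rx_events", "event_date"),
    },
}

def resolve(source: str, cdim_concept: str) -> tuple:
    """Translate a CDIM concept into a (table, column) pair for one source."""
    return CDIM_DSM_MAPPINGS[source][cdim_concept]

# The same researcher-level concept resolves differently per source:
print(resolve("registry_a", "medication agent"))  # ('prescriptions', 'atc_code')
print(resolve("registry_b", "medication agent"))  # ('rx_events', 'drug_atc')
```

Adding a third data source would mean adding one more mapping entry, leaving researcher queries untouched.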
Results

The data extraction for analysis was carried out for a Diabetes study, using a patient registry sample. In this section,
we demonstrate how the data extract definition was processed, from the user at the Query Formulation Workbench,
via the TRANSFoRm DCM to the data source. Following the steps in the conceptual workflow in Figure 1, we
describe one specific data extract requirement, prescription dates for Metformin, for illustration. The
clinical researcher defines what data to extract using the Query Formulation Workbench. In the case where the
researcher wants to extract all the instances when patients have been prescribed Metformin (Figure 2), the data
elements Medication and Prescription date are selected for extraction, and the constraint on the Medication concept
is specified as part of the archetype specification. For example, the researcher can choose Metformin with the ATC
code ‘A10BA02’ from the TRANSFoRm terminology service15. The resulting archetype definition in ADL is shown
in Figure 3.
Figure 2: Data extract definition using the Query Formulation Workbench
Figure 3: Medication archetype definition in ADL.
The translation of archetypes into a computable form at the data source includes the use of a DSM (Figure 4a) and
the CDIM-DSM mappings for the data source (Figure 4b). The DSM defines how the data source organises the
medication prescription information, while the CDIM-DSM mappings express information in the form of triplets
(CDIM concept; operator; terminology code). For instance, for Metformin with ATC code ‘A10BA02’, the
information triplet is represented as (medication agent; =; A10BA02). Following the transformations, an SQL
query is generated to enable the specified data to be extracted from the data source (Figure 5).
Figure 4: (a) Part of DSM definition (b) Part of CDIM-DSM for medication.
Figure 5: SQL query generated for data source schema.
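The final translation step can be approximated in code. This Python sketch uses invented table and column names rather than the registry's actual DSM, but mirrors the (CDIM concept; operator; terminology code) triplet structure described above:

```python
# Hypothetical DSM fragment: how one particular source organises
# medication prescription information (table, column per CDIM concept).
DSM = {
    "medication agent":  ("prescription", "atc_code"),
    "prescription date": ("prescription", "presc_date"),
}

def triplet_to_sql(select_concepts, triplet):
    """Combine a CDIM-DSM information triplet with the DSM layout
    to produce an executable SQL query for this source."""
    concept, operator, code = triplet
    table, column = DSM[concept]
    select_cols = ", ".join(DSM[c][1] for c in select_concepts)
    return (f"SELECT {select_cols} FROM {table} "
            f"WHERE {column} {operator} '{code}'")

sql = triplet_to_sql(["medication agent", "prescription date"],
                     ("medication agent", "=", "A10BA02"))
print(sql)
# SELECT atc_code, presc_date FROM prescription WHERE atc_code = 'A10BA02'
```

A production translator would of course also handle joins, date ranges and terminology subsumption; the sketch only shows the shape of the triplet-to-query step.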
Discussion

Different solutions have been developed internationally to support a more rapid translation of scientific discoveries
into clinical practice, notably i2b216. i2b2 is a data warehousing system that extracts, transforms and loads data
into a common schema. In comparison, the TRANSFoRm infrastructure adopts a model-based mediation
approach, allowing the querying of heterogeneous data repositories without needing them to be in a single
common schema. The TRANSFoRm project also aims to support clinical research with the reuse of eHR data
within eCRFs, to avoid duplicate data collection. Reduced transcription errors and time savings are added benefits of reusing routinely-collected clinical data. For instance, Köpcke et al.17 report that the pre-population
of case report forms decreased the time for data collection by nine-fold, from a median of 255 to 30 s. The DCM
approach can be used in a similar way for the automatic pre-population of eCRFs from eHR systems as for the data
extraction for retrospective studies from patient registries. The pre-populated data can be exported in the Operational
Data Model (ODM) format18, a standard for the interchange of data and metadata for clinical research, especially
data collected from multiple sources. This will make the pre-populated data compatible with the remaining eCRF
and PROM data that are collected as part of a study.
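A minimal sketch of such an export, using only Python's standard library, might look as follows. The OIDs and values are invented for illustration; a real export would follow the study's ODM metadata definitions and the full CDISC ODM schema.

```python
# Hedged sketch: serialise pre-populated eCRF values as a CDISC ODM-style
# XML fragment (ClinicalData > SubjectData > FormData > ItemGroupData >
# ItemData). All OIDs below are hypothetical.
import xml.etree.ElementTree as ET

def to_odm(subject_key: str, form_oid: str, items: dict) -> str:
    odm = ET.Element("ODM")
    clinical = ET.SubElement(odm, "ClinicalData",
                             StudyOID="S.DIABETES", MetaDataVersionOID="v1")
    subject = ET.SubElement(clinical, "SubjectData", SubjectKey=subject_key)
    form = ET.SubElement(subject, "FormData", FormOID=form_oid)
    group = ET.SubElement(form, "ItemGroupData", ItemGroupOID="IG.MEDICATION")
    for oid, value in items.items():
        ET.SubElement(group, "ItemData", ItemOID=oid, Value=value)
    return ET.tostring(odm, encoding="unicode")

odm_xml = to_odm("PT-001", "F.ECRF1",
                 {"IT.ATC": "A10BA02", "IT.PRESC_DATE": "2013-06-14"})
print(odm_xml)
```

Because the pre-populated values and the manually collected eCRF/PROM values share this one interchange format, they can be merged downstream without format conversion.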
TRANSFoRm uses archetypes in the current implementation as ADL is a user-friendly language and can be easily
understood by clinical researchers. HL7 templates, which constrain the HL7 clinical statement pattern, provide an
alternative way to implement DCM in the context of HL78. Future improvements to the TRANSFoRm GUI tools
can include an authoring tool to assist users in defining new data elements. Referring to the medication archetype definition in Figure 3, a user currently cannot directly update the archetype structure, for example to add the
constraint of the dosage of the medication. Additionally, the tool can support various data element specification
formats, such as HL7 templates and archetypes, for interoperation with systems that use these technologies.
Conclusion

The reuse of routinely collected data from clinical care in clinical research is an important goal of the TRANSFoRm
project. The approach is to retrieve relevant data elements from the data sources (patient registries and eHR systems) without forcing them into a common structure, while still enabling interoperability. Researchers can use the TRANSFoRm tools to define
their studies without being aware of the underlying structure of the heterogeneous datasets. We have presented how
a detailed clinical modelling approach is used to enable semantic interoperability between the researcher-defined
queries and the individual data sources. The two-level modelling supports the flexible specification of new archetypes, as well as the addition of new data sources, while keeping the information model stable. Therefore, the DCM
approach facilitates the bridging of the gap between clinical research and clinical care. The next steps include the
validation of this approach and the related TRANSFoRm tools and components. Validation is being planned based
on two use cases, a retrospective genotype-phenotype diabetes study and a prospective study for a gastro-oesophageal reflux disease randomised controlled trial.
Acknowledgements

The TRANSFoRm project is partially funded by the European Commission under the 7th Framework Programme
(Grant Agreement 247787).
References

1. El Fadly A, Rance B, Lucas N, Mead C, Chatellier G, Lastic P-Y, et al. Integrating clinical research with the Healthcare
Enterprise: From the RE-USE project to the EHR4CR platform. J Biomed Inform. 2011 Dec;44, Supplement 1:S94–S102.
2. Zahlmann G, Harzendorf N, Shwarz-Boeger U, Paepke S, Schmidt M, Harbeck N, et al. EHR and EDC Integration in
Reality [Internet]. Appl. Clin. Trials. 2009 [cited 2013 Oct 1]. Available from:
3. TRANSFoRm [Internet]. [cited 2013 Sep 30]. Available from:
4. Rector AL, Nowlan WA, Kay S, Goble CA, Howkins TJ. A framework for modelling the electronic medical record.
Methods Inf Med. 1993 Apr;32(2):109–19.
5. Johnson SB. Generic data modeling for clinical repositories. J Am Med Inform Assoc. 1996;3(5):328–39.
6. Beale T. Archetypes: Constraint-based domain models for future-proof information systems. Seattle, Washington, USA,
November 4, 2002; 2002. Available from:
7. Goossen W, Goossen-Baremans A, van der Zel M. Detailed Clinical Models: A Review. Health Informatics Res.
8. Goossen WTF, Goossen-Baremans A. Bridging the HL7 template - 13606 archetype gap with detailed clinical models. Stud
Health Technol Inform. 2010;160(Pt 2):932–6.
9. European Committee for Standardization CEN. CEN/TC 251 - Standards under development [Internet]. [cited 2013 Oct 1].
Available from:
10. Smith B, Ceusters W. HL7 RIM: an incoherent standard. Stud Health Technol Inform. 2006;124:133–8.
11. EN 13606 Association. The CEN/ISO EN13606 standard [Internet]. [cited 2013 Oct 2]. Available from:
12. Muñoz P, Trigo J, Martínez I, Muñoz A, Escayola J, García J. The ISO/EN 13606 Standard for the Interoperable Exchange
of Electronic Health Records. J Healthc Eng. 2011 Mar 1;2(1):1–24.
13. Ethier J-F, Dameron O, Curcin V, McGilchrist MM, Verheij RA, Arvanitis TN, et al. A unified structural/terminological
interoperability framework based on LexEVS: application to TRANSFoRm. J Am Med Inform Assoc. 2013 Jan
14. Ethier J, McGilchrist M, Burgun A, Sullivan F. D6.3 Data Integration Models [Internet]. 2013. Available from:
15. Lim Choi Keung SN, Zhao L, Tyler E, Arvanitis TN. Integrated Vocabulary Service for Health Data Interoperability.
Fourth International Conference on eHealth, Telemedicine and Social Medicine (eTELEMED 2012). Valencia, Spain:
IARIA; 2012. p. 124–7.
16. Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the
bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124–30.
17. Köpcke F, Kraus S, Scholler A, Nau C, Schüttler J, Prokosch H-U, et al. Secondary use of routinely collected patient data in
a clinical trial: an evaluation of the effects on patient recruitment and data acquisition. Int J Med Inf. 2013;82(3):185–92.
18. CDISC. ODM: Operational data Model [Internet]. [cited 2013 Oct 2]. Available from: