Data standards for clinical research data collection
forms: current status and challenges
Rachel L Richesson,1 Prakash Nadkarni2
Case report forms (CRFs) are used for structured-data
collection in clinical research studies. Existing CRF-
related standards encompass structural features of
forms and data items, content standards, and
specifications for using terminologies. This paper reviews
existing standards and discusses their current
limitations. Because clinical research is highly protocol-
specific, forms-development processes are more easily
standardized than is CRF content. Tools that support
retrieval and reuse of existing items will enable
standards adoption in clinical research applications. Such
tools will depend upon formal relationships between
items and terminological standards. Future standards
adoption will depend upon standardized approaches for
bridging generic structural standards and domain-specific
content standards. Clinical research informatics can help
define tool requirements in terms of workflow support
for research activities, reconcile the perspectives of
varied clinical research stakeholders, and coordinate
standards efforts toward interoperability across
healthcare and research data collection.
Data collection for clinical research involves gathering
variables relevant to research hypotheses.
These variables (‘patient parameters,’ ‘data items,’
‘data elements,’ or ‘questions’) are aggregated into
data-collection forms (‘Case Report Forms’ or
CRFs) for study implementation. The International
Organization for Standardization/International Electrotechnical
Commission (ISO/IEC) 11179 technical standard1 defines a data element as ‘a
unit of data for which the definition, identification,
representation, and permissible values are specified
through a set of attributes.’ Such attributes include:
the element’s internal name, data type, caption
presented to users, detailed description, and basic
validation information such as range checks or set
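The attributes named above can be pictured as a small record with a validation routine. The following Python sketch is illustrative only: the class `DataElement`, its field names, and the example elements `SYSBP` and `SEX` are assumptions made for this example, not part of ISO/IEC 11179 or of any CRF standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union

Value = Union[int, float, str]


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record of one CRF data element's attributes."""

    name: str                        # internal name
    data_type: type                  # expected Python type, e.g. int or str
    caption: str                     # text presented to users on the form
    description: str = ""            # detailed description
    minimum: Optional[float] = None  # lower bound for a range check
    maximum: Optional[float] = None  # upper bound for a range check
    permissible_values: Optional[Sequence[Value]] = None  # enumerated values

    def validate(self, value: Value) -> List[str]:
        """Return a list of problems; an empty list means the value passed."""
        if not isinstance(value, self.data_type):
            return [f"{self.name}: expected {self.data_type.__name__}"]
        problems = []
        if self.minimum is not None and value < self.minimum:
            problems.append(f"{self.name}: {value} is below {self.minimum}")
        if self.maximum is not None and value > self.maximum:
            problems.append(f"{self.name}: {value} is above {self.maximum}")
        if self.permissible_values is not None and value not in self.permissible_values:
            problems.append(f"{self.name}: {value!r} is not a permissible value")
        return problems


# Illustrative elements; the names and limits are invented for this sketch.
systolic = DataElement(
    name="SYSBP",
    data_type=int,
    caption="Systolic blood pressure (mm Hg)",
    description="Systolic pressure recorded at the study visit.",
    minimum=0,
    maximum=300,
)
sex = DataElement(
    name="SEX",
    data_type=str,
    caption="Sex",
    permissible_values=("F", "M", "U"),
)

print(systolic.validate(120))
print(sex.validate("X"))
```

Keeping the validation rules alongside the caption and description is one way an item could be retrieved and reused across forms without re-entering its checks.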
Data element and CRF reuse can reduce study
implementation time, and facilitate sharing and
analyzability of data aggregated from multiple
sources.2 3 In this paper, we summarize relevant
CRF standards and their limitations, and highlight
important unaddressed informatics-standardization
challenges in optimizing research processes and
facilitating interoperability of research and health-
BACKGROUND AND SIGNIFICANCE
CRFs support either primary (real-time) data
collection or secondary recording of data originating
elsewhere (eg, the electronic health record (EHR) or
paper records). EHR and research data capture
differ in that the latter records a subset of patient
parameters (the research protocol’s variables) in
much greater depth and in maximally structured
form; narrative text is de-emphasized except to
record unanticipated information.
Although the use of primary electronic data capture (EDC) has steadily
increased,4 paper is still used when EDC is infeasible
for logistic or financial reasons. The existence
of secondary EDC also influences manual workflow
processes related to verification of paper-based
primary data, for example, checks for completeness,
legibility, and valid codes. The present limbo
between paper and EDC complicates standardiza-
CRF standards: current activities
Currently, no universal CRF-design standards exist,
though conventions and some ‘best’ practices
do.5–9 The Clinical Data Interchange Standards
Consortium (CDISC), which focuses primarily on regulated studies, has
proposed such standards. However, these proposals,
while valuable for general areas such as drug safety,
do not address broader issues of clinical research,
including observational research, genetic studies,
and studies using patient-reported experience as
key study endpoints.
Clinical Data Acquisition Standards Harmonization
In response to the Food and Drug Administration
(FDA)’s 2004 report, ‘Challenge and Opportunity on the Critical Path to
New Medical Products,’ a CDISC project, Clinical
Data Acquisition Standards Harmonization
(CDASH), addresses data collection standards
standards focused on cross-specialty areas such as
clinical-trials safety. Disease- or therapeutic-specific
standards are now being considered, along with
tools and process development to facilitate data-
element reuse across diseases.
The openEHR Foundation has proposed archetypes11 12
as a basis for HL7 Clinical Document
Architecture templates.13 Archetypes are agreed-upon
specifications that support rigorous computable
definitions of clinical concepts. For example,
the archetype for Blood Pressure measurement
includes type of measure (eg, diastolic, systolic,
activity level, position), body site where measured,
time of day when measured, and measurement
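To make the idea of a computable definition concrete, the blood pressure example can be approximated as a typed structure with explicit constraints. This is a rough Python sketch and not openEHR's archetype formalism: the class `BloodPressureReading`, the value sets `POSITIONS` and `BODY_SITES`, and the rule that systolic must exceed diastolic are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumed value sets for illustration only; not taken from any published archetype.
POSITIONS = {"sitting", "standing", "lying"}
BODY_SITES = {"left arm", "right arm", "left leg", "right leg"}


@dataclass
class BloodPressureReading:
    """Hypothetical grouping of the items a blood pressure definition might carry."""

    systolic_mm_hg: int
    diastolic_mm_hg: int
    measured_at: datetime            # time of day when measured
    position: Optional[str] = None   # patient position during measurement
    body_site: Optional[str] = None  # body site where measured

    def violations(self) -> List[str]:
        """Return the constraints this reading breaks (empty if none)."""
        found = []
        if self.systolic_mm_hg <= self.diastolic_mm_hg:
            found.append("systolic must exceed diastolic")
        if self.position is not None and self.position not in POSITIONS:
            found.append(f"unknown position: {self.position}")
        if self.body_site is not None and self.body_site not in BODY_SITES:
            found.append(f"unknown body site: {self.body_site}")
        return found


reading = BloodPressureReading(
    systolic_mm_hg=120,
    diastolic_mm_hg=80,
    measured_at=datetime(2011, 2, 8, 9, 30),
    position="sitting",
    body_site="left arm",
)
print(reading.violations())
```

The point of the sketch is that related items travel together as one unit with shared constraints, rather than as independent form fields.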
1Department of Pediatrics,
University of South Florida
College of Medicine, Tampa,
2Yale University, New Haven,
Dr Rachel L Richesson,
University of South Florida
College of Medicine,
Department of Pediatrics, 3650
Spectrum Boulevard, Suite 100,
Tampa, FL 33612, USA;
Received 5 November 2008
Accepted 8 February 2011
J Am Med Inform Assoc 2011;18:341–346. doi:10.1136/amiajnl-2011-000107
T Patrick of University of Wisconsin-Milwaukee for his stimulating comments on
Funding Funding and/or programmatic support for this project was provided by Grant
Numbers RR019259-01 and RR019259-02 from the National Center for Research
Resources and National Institute of Neurological Disorders and Stroke, respectively,
both National Institutes of Health components, and the National Institutes of Health
Office of Rare Diseases Research.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
ISO/IEC. Information Technology - Metadata Registries (MDR), Part 1: Framework.
Geneva: International Organization for Standardization/International Electrotechnical
Commission, 2004. http://metadata-stds.org/11179/#A1.
Souza T, Kush R, Evans JP. Global clinical data interchange standards are here! Drug
Discov Today 2007;12:174–81.
Levin R. Data Standards for Regulated Clinical Trials: FDA Perspective. 2005. http://
www.cdisc.org/pdf/2004_06_14_cdisc.pdf (accessed 21 Jul 2006).
Welker JA. Implementation of electronic data capture systems: barriers and
solutions. Contemp Clin Trials 2007;28:329–36.
ClinfoSource I. E-Training for Clinical Trials. On-line Training Session. 2008. https://
www.clinfosource.com/documents/brochureII.pdf (accessed 14 Mar 2011).
Pocock SJ. Clinical Trials: A Practical Approach. New York: Wiley and Sons,
Gore SM. Assessing clinical trials - record sheets. BMJ (Clin Res Ed)
Crewson PE, Applegate KE. Fundamentals of clinical research for radiologists. Am
J Roentgenol 2001;177:755–61.
Lu Z. Technical challenges in designing post-marketing eCRFs to address clinical
safety and pharmacovigilance needs. Contemp Clin Trials 2010;31:108–18.
CDISC. Clinical Data Acquisition Standards Harmonization: Basic Data Collection
Fields for Case Report Forms. Draft version 1.0. http://www.cdisc.org/cdash
(accessed 1 Sep 2010).
Leslie H. International developments in openEHR archetypes and templates. HIM J
Kalra D, Beale T, Heard S. The openEHR foundation. Stud Health Technol Inform
Browne E. Archetypes for HL7 CDA Documents. 2008. http://www.openehr.org/wiki/
download/attachments/3440870/Archetypes_in_CDA_4.pdf (accessed 14 Mar 2011).
McNamara RL, Brass LM, Drozda JP, et al. ACC/AHA key data elements and
definitions for measuring the clinical management and outcomes of patients with
atrial fibrillation: A report of the American College of Cardiology/American Heart
Association task force on clinical data standards (Writing Committee to Develop Data
Standards on Atrial Fibrillation). J Am Coll Cardiol 2004;44:475–95.
Buxton AE, Calkins H, Callans DJ, et al. ACC/AHA/HRS 2006 key data elements and
definitions for electrophysiological studies and procedures: a report of the American
College of Cardiology/American Heart Association Task Force on Clinical Data
Standards (ACC/AHA/HRS Writing Committee to Develop Data Standards on
Electrophysiology). Circulation 2006;114:2534–70.
Ohmann C, Kuchinke W. Future developments of medical informatics from the
viewpoint of networked clinical research. Interoperability and integration. Methods Inf
NCI. caBIG. Cancer Biomedical Informatics Grid. Data Standards. 2006 01-04-2008.
25 May 2006).
Nahm M, McCourt B, Walden A, et al. Cardiovascular and Tuberculosis Data
Standards, Release 1.0, Package 1. 2008. http://www.cdisc.org/standards/cardio/
index.html (accessed 23 Nov 2010).
Stone K. NINDS common data element project: a long-awaited breakthrough in
streamlining trials. Ann Neurol 68:A11–13.
NINDS. NINDS Common Data Elements. Harmonizing Information. Streamlining
Research. Project Overview. 2010 Aug 19 http://www.commondataelements.ninds.
nih.gov/ProjReview.aspx (accessed 20 Aug 2010).
Stover PJ, Harlan WR, Hammond JA, et al. PhenX: a toolkit for interdisciplinary
genetics research. Curr Opin Lipidol 2010;21:136–40.
Richesson RL, Mon D, Kallem C, et al. A Strategy for Defining Common Data
Elements to Support Clinical Care and Secondary Use in Clinical Research, in 2010
AMIA Clinical Research Informatics Summit. San Francisco, 2010. http://
crisummit2010.amia.org/files/symposium2008/100_richesson.pdf. (accessed 14 Mar
HL7. Diabe-DS Project Wiki - ‘EHR Diabetes Data Strategy.’ 2010 http://wiki.hl7.org/
index.php?title=EHR_Diabetes_Data_Strategy (accessed 23 Nov 2010).
ANSI. Healthcare Information Technology Standards Panel (HITSP). Enabling
Healthcare Interoperability. 2010. http://www.hitsp.org/ (accessed 23 Nov 2010).
Kuperman GJ, Blair JS, Franck RA, et al. Developing data content specifications for
the nationwide health information network trial implementations. J Am Med Inform
Nadkarni PM, Brandt CA. The common data elements for cancer research: remarks
on functions and structure. Methods Inf Med 2006;45:594–601.
Warzel DB, Andonaydis C, McCurry B, et al. Common data element (CDE)
management and deployment in clinical trials. AMIA Annu Symp Proc 2003:1048.
Covitz PA, Hartel F, Schaefer C, et al. caCORE: a common infrastructure for cancer
informatics. Bioinformatics 2003;19:2404–12.
AHRQ. The United States Health Information Knowledgebase (USHIK). 2010. http://
ushik.ahrq.gov/index.html?Referer=Index (accessed 23 Nov 2010).
Richesson RL, Krischer JP. Data standards in clinical research: gaps, overlaps,
challenges and future directions. J Am Med Inform Assoc 2007;14:687–96.
Fridsma DB, Evans J, Hastak S, et al. The BRIDG project: a technical report. J Am
Med Inform Assoc 2008;15:130–7.
Phillips L. The Double Metaphone Search Algorithm, in C/C++ Users Journal.
2000. http://portal.acm.org/citation.cfm?id=349132 (accessed 18 Mar 2011).
Blaha M. Data store models are different from data interchange models. Electr
Notes Theor Comp Sci 2004;94:51–8.
Hamilton M. Rating depressive patients. J Clin Psychiatry 1980;41
(12 Pt 2):21–4.
Williams JBW, Link MJ, Rosenthal NE, et al. Structured Interview Guide for the
Hamilton Depression Rating Scale, Seasonal Affective Disorders Version (SIGHSAD).
New York: New York Psychiatric Institute, 1988.
EMA. ICH Topic E 6 (R1) Guideline for Good Clinical Practice; CPMP/ICH/135/95. 59.
2002. http://www.emea.europa.eu/pdfs/human/ich/013595en.pdf (accessed 14 Mar
Aday LA. Designing and Conducting Health Surveys. 2nd edn. San Francisco, CA:
Poksinska B, Dahlgaard JJ, Antoni M. The state of ISO 9000 certification: A study
of Swedish organisations. TQM Mag 14, 2002. doi:10.1108/09544780210439734.
Tsim YC, Yeung VWS, Leung ETC. An adaptation to ISO 9001:2000 for certified
organisations. Managerial Auditing Journal 17, http://dx.doi.org/10.1108/
02686900210429669. 2002. http://www.emeraldinsight.com/journals.htm?
Donabedian A. Quality assurance. Structure, process and outcome. Nurs Stand
1992;7(11 Suppl QA):4–5.
Donabedian A. Explorations in Quality Assessment and Monitoring: Vol. 1. The
Definition of Quality and Approaches to its Assessment. Ann Arbor, MI: Health
Administration Press, 1980.
Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q
FDA. Guidance for Industry. Qualification Process for Drug Development Tools. DRAFT
GUIDANCE. Food and Drug Administration, Center for Drug Evaluation and Research (CDER). 2010. http://www.fda.gov/downloads/Drugs/
14 Mar 2011).
Stausberg J, Löbe M, Verplancke P, et al. Foundations of a metadata
repository for databases of registers and trials. Stud Health Technol Inform
Brandt CA, Cohen DB, Shifman MA, et al. Approaches and informatics tools to
assist in the integration of similar clinical research questionnaires. Methods Inf Med
Richesson R, Shereff D, Andrews J. [RD] PRISM Library: patient registry
item specifications and metadata for rare diseases. J Libr Metadata
Chute CG. Medical concept representation. In: Chen H, Fuller SS, Friedman C, et al,
eds. Medical Informatics. Knowledge Management and Data Mining in Biomedicine.
New York: Springer, 2005:163–82.
Mead CN. Data interchange standards in healthcare IT: computable semantic
interoperability: now possible but still difficult, do we really need a better mousetrap?
J Healthcare Inf Manag 2006;20:71–8.
Rector AL. The Interface Between Information, Terminology, and Inference Models.
In: Tenth World Conference on Medical and Health Informatics: MedInfo-2001.
IHTSDO. SNOMED CT Style Guide: Situations with Explicit Context? International
Healthcare Terminology Standards Development Organization, Copenhagen, 2008.
Radloff LS. The CES-D scale: a self-report depression scale for research in the general
population. Appl Psychol Measure 1977;1:385–401.
Bakken S, Cimino JJ, Haskell R, et al. Evaluation of the clinical LOINC (Logical
Observation Identifiers, Names, and Codes) semantic structure as a terminology
model for standardized assessment measures. J Am Med Inform Assoc