Page 1 of 10
The Promise of the CCD: Challenges and Opportunity for Quality
Improvement and Population Health
John D. D’Amore, MS1, Dean F. Sittig, PhD1, Adam Wright, PhD2,
M. Sriram Iyengar, PhD1, Roberta B. Ness, MD, MPH3
1School of Biomedical Informatics and 3School of Public Health at the University of Texas
Health Science Center, Houston, TX; 2Division of General Medicine, Brigham and
Women’s Hospital, Boston, MA
Interoperability is a requirement of recent electronic health record (EHR) adoption incentive programs in the
United States. One approved structure for clinical data exchange is the continuity of care document (CCD). While
primarily designed to promote communication between providers during care transitions, coded data in the CCD
can be re-used to aggregate data from different EHRs. This provides an opportunity for provider networks to
measure quality and improve population health from a consolidated database. To evaluate such potential, this
research collected CCDs from 14 organizations and developed a computer program to parse and aggregate them.
In total, 139 CCDs were parsed, yielding 680 data elements in the core content modules of problems, medications, allergies
and results. Challenges to interoperability were catalogued and potential quality metrics evaluated based on
available content. This research highlights the promise of CCDs for population health and recommends changes
for future interoperability standards.
Recent incentives and policy decisions are promoting the rapid adoption of electronic health records (EHRs) in the
United States. The American Recovery and Reinvestment Act of 2009 commits up to $27 billion in payments,
beginning in 2011, to eligible professionals and hospitals that meaningfully use EHRs1. Those reimbursements will
come in three stages and are expected to propel ambulatory and hospital EHR adoption to over 70% by 20202. The
rapid timeline for uptake, however, will lead to a heterogeneous environment of technology. With over 400 EHRs
certified for the first stage of ‘Meaningful Use,’ interoperability will remain a concern.3 As providers seek to
improve quality and population health, technology standards advanced by this federal legislation will enable new
methods for data aggregation.
In July 2010, the Department of Health and Human Services adopted the Continuity of Care Document (CCD) as an
option to meet the goals of clinical data exchange for ‘Meaningful Use’4. Using an extensible markup language
(XML) based structure, the CCD was collaboratively developed in 2006 by harmonizing standards from the
American Society for Testing and Materials and Health Level 7 (HL7)5. The CCD provides a flexible format for the
communication of free-text and codified data. Given the recent emergence of the standard, most health information
exchanges are not routinely using CCDs today, although select institutions have launched pilots to explore their
potential6, 7. The lack of widespread use means that EHR developers must rely on guidance from standards
organizations, such as the Health Information Technology Standards Panel (HITSP), Integrating the Healthcare
Enterprise (IHE) and HL7, on how to create and exchange CCDs.
HITSP released for implementation its first CCD patient summary construct, named C32, in 20078. That construct is
directly referenced in the final federal rule for Stage 1 of ‘Meaningful Use’4. The most recent C32 specification
references two other constructs developed by HITSP as well as technical frameworks previously released by IHE
and HL79, 10. Naturally, documents and specifications from different organizations and developed at different times
may lead to varying interpretations about requirements. For Stage 1 of ‘Meaningful Use,’ the National Institute of
Standards and Technology (NIST) has released the definitive testing procedures to determine whether an EHR-
generated CCD meets the standards for ‘Meaningful Use’ certification11. These procedures focus on the ability of
EHRs to generate, receive and display four categories of coded patient data with specific vocabularies: problem lists,
diagnostic test results, medication lists and medication allergy lists. Although the CCD can encode additional
clinical content, these four sections plus patient demographic information form the foundation for what certified
EHRs will be capable of exchanging for the first two years of the federal incentive program.
New models of care integration being advanced by private insurers and federal payers require data from multiple
clinical entities to determine if patients are receiving appropriate care12. While the primary intent of clinical data
exchange is provider-to-provider communication, the structured data of the CCD has potential re-use. Specifically,
provider based care networks could create CCD extracts for all patients and consolidate this information into a
longitudinal clinical data warehouse.
This strategy overcomes four significant barriers facing providers in the United States: 1) providers avoid the costs
and competitive threats associated with joining a health information exchange13, 2) this strategy works in regions
that do not have an existing health information exchange, 3) networks do not need to consolidate independent care
providers onto a single EHR technology14, and 4) quality measurement exempts data aggregation from privacy
restrictions since protected health information may be shared among clinicians for quality improvement15.
This research examines the promise of using C32 compliant CCDs to create a normalized database for quality
improvement and population health management. The context for such data aggregation would be a care model
where providers with different certified EHRs have entered into a data use agreement to share identified medical
records. This research focuses on the clinical content modules of the CCD being tested for Stage 1 EHR certification
as part of ‘Meaningful Use’ (see Table 1 for all CCD sections and associated vocabularies). To examine the
potential for quality measurement, each of the ambulatory measures endorsed as part of Stage 1 ‘Meaningful Use’
was evaluated based on the parsed CCD clinical content modules.
Table 1. Sections and Vocabularies of the CCD and Modules Examined in this Research.
Section | C32 Standard for Clinical Content | Required by Stage 1 EHR Testing Procedures
Directive | Systematized Nomenclature of Medicine (SNOMED CT)
Drug Sensitivity | Unique Ingredient Identifier (UNII) & RxNorm | Medication Allergies Only
Comment | Free Text
(Problems) | SNOMED CT & International Classification of Diseases (ICD)*
Encounters | Uniform Billing (UB) Standard
Healthcare Provider | National Provider Identifier
Immunizations | Vaccine Value Sets
Insurance Provider | X12 Billing Standard
Language Spoken | Language Value Set
Person Information | Free Text & HL7
Plan of Care | Free Text
SNOMED CT, ICD and Current Procedural Terminology (CPT)*
Logical Observation Identifiers Names and Codes (LOINC)
Free Text & HL7
Implied, but not specified
Required for hospital but not ambulatory
* While the HITSP preferred vocabulary for problems and procedures is SNOMED, the Final Rule for EHR certification
provides flexibility for other vocabularies as specified in Table 1. Note: Information Source and Pregnancy status are modules
not included in the above table since they overlap with content provided in the CCD header and condition modules.
Implied, but not specified
To assess the feasibility of a CCD-based aggregation strategy, samples needed to be collected from multiple EHR
vendor products. No large collection of sample CCDs from multiple sources has been made available to the public,
so the research team contacted vendors and healthcare organizations for EHR-generated samples of fictitious patient
data conformant to the HITSP C32 standard. A request for sample CCDs was included in the February eNews
distribution email sent from the Certification Commission for Health Information Technology to approximately
12,000 recipients. Additional contacts were made in person during HIMSS 2011 at the Interoperability Showcase
and with individuals in the exhibit hall. As an incentive to submit, organizations providing sample CCDs were
offered feedback on the parsing results of their samples. Organizations that submitted samples were also assured that
this research would not identify specific EHRs and that their CCDs would not be publicly released. These two
methods yielded the majority of participating organizations, but additional CCDs were also examined from
standards organizations and a research library of synthetic EHR data from ExactData (Rochester, NY).
To aggregate and analyze the collected CCDs, a program was written in Python 3.1 (Python Software Foundation,
Wolfeboro Falls, NH) to parse clinical content modules and patient demographic details. The Python program
utilized the document object model (DOM) library for XML parsing and used HITSP documentation for the
identification of relevant sections in the CCD. Any data provided outside the modules identified for this research
were ignored. String-based lists encoded clinical content for each of the modules. Data elements were imported
based on the tags identified in the C32 construct. Separate lists were created to store the primary content of a module
(e.g. medication code), its associated vocabulary (e.g. RxNorm) and other relevant content (e.g. dose, dosing interval
and brand name). At least one clinical content module was imported from each CCD.
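The parsing approach described above can be illustrated with a minimal sketch. It is not the study's program: it assumes Python's standard xml.dom.minidom module, and the function name parse_medications and the heavily trimmed sample fragment are hypothetical, introduced only for illustration.

```python
from xml.dom import minidom

# Hypothetical, heavily trimmed CCD fragment (not one of the study's samples).
SAMPLE_CCD = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody><component><section>
    <entry><substanceAdministration><consumable><manufacturedProduct>
      <manufacturedMaterial>
        <code code="309362" codeSystemName="RxNorm"
              displayName="clopidogrel 75 mg oral tablet"/>
      </manufacturedMaterial>
    </manufacturedProduct></consumable></substanceAdministration></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""


def parse_medications(xml_text):
    """Return parallel string lists: medication codes and their vocabularies."""
    dom = minidom.parseString(xml_text)
    codes, vocabularies = [], []
    for material in dom.getElementsByTagName("manufacturedMaterial"):
        for code in material.getElementsByTagName("code"):
            # Missing attributes come back as empty strings, which keeps
            # the two lists aligned even when a value is absent.
            codes.append(code.getAttribute("code"))
            vocabularies.append(code.getAttribute("codeSystemName"))
    return codes, vocabularies


codes, vocabularies = parse_medications(SAMPLE_CCD)
print(codes, vocabularies)
```

Keeping the code and its vocabulary in separate, index-aligned lists mirrors the string-based lists described in the text; a production parser would more likely use one record per data element.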
The Python program included a timestamp on each of the CCDs as they were imported to report efficiency of XML
parsing. All processing was performed on a quad-core Intel Core2 computer running Windows 7 (Microsoft, Redmond, WA)
with 3 GB of RAM. Results included the processing time of each CCD as well as total counts of data elements for
each of the primary content modules. Generic medication codes were counted for the medication module, problem
codes for the condition module, laboratory result codes for the results module and allergy codes for the allergy
module. Null values were only counted if the section tags were present or corresponding detail was provided as
The programming included error traps and empty string notifications to catalog the challenges of CCDs imported
from various EHRs. These warnings prompted formative evaluation of approximately 30 XML files to manually
identify the causes for unsuccessful parsing. At least 1 CCD was selected for manual inspection from each
submitting organization. All inspections were conducted by a single reviewer to maintain evaluative consistency,
although no formal instrument was used given the wide range of issues encountered. In addition, the data were
examined for potential conflicts that could arise if imported into a single relational database. Since the imported
content was under 1,000 clinical content modules, those data were reviewed in Microsoft Excel 2007 (Redmond, WA).
To determine if the collected content was adequate for objective quality assessment, each of the 44 ambulatory
quality measures included in Stage 1 ‘Meaningful Use’ was evaluated16. If coded data in patient detail and clinical
modules were sufficient to calculate the measure, then that quality metric was recorded as possible using the CCD
parser. If not, the fact that content was missing was recorded. The electronic standards for ambulatory quality
measures have been approved through the National Quality Forum and retrieved from the federal website for
In total, 196 CCDs were collected from 14 different organizations representing at least 10 different EHRs (Table 2).
Several CCDs (n=57) were excluded from analysis since no clinical data were present for parsing or they were
redundant with other submitted CCDs. This left 139 CCDs which were successfully parsed with at least one clinical
data element. Parsing time averaged 80 ms (SD 53 ms) per CCD and the mean number of clinical data elements was
4.9 (SD 5.4) per CCD (Figure 1 shows timing and content distribution of CCDs). The counts of parsed data elements
by clinical module were: 109 for allergies, 220 for problems, 168 for medications and 183 for diagnostic results.
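The reported counts can be cross-checked with simple arithmetic. The short sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Counts of parsed data elements by clinical module, as reported in the text.
module_counts = {"allergies": 109, "problems": 220, "medications": 168, "results": 183}

collected_ccds = 196   # CCDs collected from 14 organizations
excluded_ccds = 57     # no clinical data present, or redundant with other samples
parsed_ccds = collected_ccds - excluded_ccds

total_elements = sum(module_counts.values())
mean_per_ccd = total_elements / parsed_ccds

print(parsed_ccds, total_elements, round(mean_per_ccd, 1))
```

The module counts sum to the 680 data elements cited in the abstract, and dividing by the 139 parsed CCDs reproduces the reported mean of 4.9 elements per CCD.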
Table 2. CCD Collection by Source
Source | Organizations | CCDs collected | CCDs parsed
 | 6 | 93 | 87
 | 5 | 93 | 49
 | 3 | 10 | 3
Total | 14 | 196 | 139
Figure 1. Parsing Time and Content for 139 CCDs.*
*Chart whiskers show absolute minima and maxima; the blue box represents lower and upper quartile around the
median center line.
Challenges were categorized into three major themes: 1) Problematic CCD hierarchy and organization, 2)
Inconsistency in data representation and 3) Data conflict or redundancy within the CCD (Illustrative examples
included in Table 3).
1. Problematic CCD Hierarchy and Organization
One of the common issues encountered when working through the CCDs was the lack of consistent template root
identifiers for the clinical content modules. These identifiers are critical for extracting codified data since they reference the
technical specification applied in XML formatting. Specifically, a majority of EHR samples did not include the root
template identifiers for HITSP C83 content modules even though these identifiers are referenced in the 2009 HITSP
C32 specification and CCD samples8,10. To accommodate the missing identifiers, the Python program was modified
to include checks so that any template root identifier from a standards organization would be sufficient to parse the
content modules analyzed in this research.
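One way such a fallback check might look is sketched below. This is an assumption-laden illustration rather than the study's code: the function name classify_section is hypothetical, and the template root OIDs are those believed to identify the medication and problem sections in HITSP C83, IHE and HL7 CCD templates; they should be verified against the specifications before use.

```python
from xml.dom import minidom

# Template roots believed to identify each module; verify before relying on them.
ACCEPTED_ROOTS = {
    "medications": {
        "2.16.840.1.113883.3.88.11.83.112",  # HITSP C83 (assumed)
        "1.3.6.1.4.1.19376.1.5.3.1.3.19",    # IHE (assumed)
        "2.16.840.1.113883.10.20.1.8",       # HL7 CCD (assumed)
    },
    "problems": {
        "2.16.840.1.113883.3.88.11.83.103",  # HITSP C83 (assumed)
        "1.3.6.1.4.1.19376.1.5.3.1.3.6",     # IHE (assumed)
        "2.16.840.1.113883.10.20.1.11",      # HL7 CCD (assumed)
    },
}


def classify_section(section):
    """Return the module name for a <section> node, or None if unrecognized."""
    roots = set()
    for child in section.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == "templateId":
            roots.add(child.getAttribute("root"))
    # Any single accepted root is sufficient; the HITSP C83 root is not required.
    for module, accepted in ACCEPTED_ROOTS.items():
        if roots & accepted:
            return module
    return None


# Hypothetical section that carries only the HL7 CCD root.
SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <templateId root="2.16.840.1.113883.10.20.1.8"/>
</section>"""
section = minidom.parseString(SAMPLE).documentElement
print(classify_section(section))
```

Matching on any accepted root trades strictness for coverage: a section is recognized even when the EHR omits the HITSP C83 identifier, at the cost of accepting documents that would fail a strict C32 conformance check.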
Next, the use of tabs and line breaks was inconsistent between different EHRs. Some EHRs do not use line breaks or
spacing, some use tabs without consistent line breaks and others use a combination to provide formatting similar to
the examples from standards organizations. While line breaks and spacing do not affect programs that use the DOM
library to parse XML, their absence makes human review of the CCD to identify problematic sections and data
elements significantly more difficult.
The optionality of data elements within each clinical content module presented additional difficulty in creating a
normalized database. Relevant data elements associated with a problem, medication, allergy or result include information on
the time of onset, current status, units of measurement, dosing interval, severity and result interpretation.
Unfortunately, the C32 construct leaves most of the associated data as optional, so these elements can be omitted in a
compliant CCD. Our parsing results showed a large number of records that omit data such as the date of a problem’s onset
or drug dosing interval.
2. Inconsistency in Data Representation
Since EHR data typically undergo a translation between the source system and the normalized vocabularies for
Stage 1 ‘Meaningful Use’, some source terms may map to no known code. Incomplete mappings were
observed in this research and have been previously identified in medication interoperability research and
harmonization discussions for the CCD6,17. An example of this occurs when a null value is set for a clinical content
code such as generic medication in RxNorm, while a translation code is populated, such as brand name in the
National Drug Code (NDC).
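A parser can detect this pattern by falling back to the translation element when the primary code is null. The following is a minimal sketch under stated assumptions: the function name read_code is hypothetical, the sample element is invented, and the NDC value shown is a made-up placeholder rather than a real product code.

```python
from xml.dom import minidom

# Hypothetical <code> element: the primary code is null, but a translation exists.
SAMPLE = """<code xmlns="urn:hl7-org:v3" nullFlavor="UNK">
  <translation code="00000-0000-00" codeSystemName="NDC"
               displayName="Example brand"/>
</code>"""


def read_code(code_node):
    """Return (code, vocabulary, primary_missing) for a CDA <code> element."""
    primary = code_node.getAttribute("code")
    if primary:
        return primary, code_node.getAttribute("codeSystemName"), False
    # Primary code absent: use the first populated translation, if any.
    for translation in code_node.getElementsByTagName("translation"):
        if translation.getAttribute("code"):
            return (translation.getAttribute("code"),
                    translation.getAttribute("codeSystemName"),
                    True)
    return "", "", True


node = minidom.parseString(SAMPLE).documentElement
print(read_code(node))
```

Returning the primary_missing flag alongside the fallback code lets an aggregation database record that the mapping was incomplete instead of silently mixing vocabularies.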
In addition to missing values, non-uniformity was observed in the associated data content for each code. One
example repeatedly noted in our research was inconsistency of effective time for problems and laboratory results.
While the examples provided by HITSP are generally eight characters in YYYYMMDD format, some EHRs adapt
this format to include six more characters for time in HHMMSS format. Others also append a hyphen and four digits
in HHMM format, consistent with a time zone offset. One EHR did not comply with the character date format at all,
instead inserting values such as ‘% 2m %’, presumably meaning two months. Inconsistency was also noted in the dose quantity tag for solid oral
medications, which included alternative unit labels such as ‘tablet,’ ‘Tablet,’ ‘tab,’ ‘tab(s),’ ‘mg,’ and ‘g.’ Only the
mass terms qualify as standard units, although some flexibility is permitted in specifications to identify non-units,
like tablet or red blood cell count, when deemed important18.
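The variations described above could be normalized before loading into a shared database. The sketch below is one possible approach, not the study's implementation: the function names and the synonym table are assumptions, and only the date portion of each timestamp is retained.

```python
import re
from datetime import datetime

# YYYYMMDD, optionally followed by HHMMSS and a signed four-digit offset.
_TIMESTAMP = re.compile(r"^(\d{8})(\d{6})?([+-]\d{4})?$")

# Illustrative synonym table for the non-unit tablet labels seen in the samples.
UNIT_SYNONYMS = {"tablet": "tablet", "tab": "tablet", "tab(s)": "tablet"}


def normalize_effective_time(value):
    """Return the date as YYYY-MM-DD, or None if the value is not a timestamp."""
    match = _TIMESTAMP.match(value.strip())
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date().isoformat()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def normalize_unit(unit):
    """Map known tablet spellings to one label; leave other units unchanged."""
    cleaned = unit.strip()
    return UNIT_SYNONYMS.get(cleaned.lower(), cleaned)


for raw in ["20110215", "20110215103000", "20110215103000-0500", "% 2m %"]:
    print(repr(raw), normalize_effective_time(raw))
for raw in ["tablet", "Tablet", "tab", "tab(s)", "mg", "g"]:
    print(repr(raw), normalize_unit(raw))
```

Units outside the synonym table are returned unchanged rather than lowercased, since standard unit codes can be case-sensitive. Values such as ‘% 2m %’ yield None so that they can be logged for manual review rather than guessed at.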
Another data representation challenge identified in this research includes content relation between laboratory results.
One example is that corresponding procedures coded within the same module could not always be related to
appropriate results. While the published example CCD from NIST approaches this by collapsing a single procedure,
such as a blood draw, with all corresponding results into a single XML entry, other EHRs included all procedures
and results within the same entry. This thereby eliminates any codified relation between the procedure and result.
Another example was the use of a comprehensive code for a testing panel with multiple lab values. Since multiple
lab results are returned for comprehensive panels, such as blood urea nitrogen and transaminases within a metabolic
panel, the generic label of a lab panel was insufficient to interpret the content of each test result.
3. Data Conflict or Redundancy
The final rule for Stage 1 ‘Meaningful Use’ specifies either SNOMED-CT or ICD-9 for the codification of problems
and both vocabularies were observed in the sample CCDs collected. In addition to conflict generated from two
acceptable vocabularies for the same information, many sample CCDs utilized code systems that did not conform to
standards. Observed misapplied terminologies included NDC for generic drug code, NDC for medication allergy
and Multum Drug Allergy codes for medication allergy. In addition, a few CCDs had codes that did not correspond
to any recognized standard vocabulary.
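Misapplied terminologies of this kind can be flagged automatically by comparing each record's code system against the vocabularies expected for its module. The following sketch is illustrative only: the function name nonconforming and the record layout are assumptions, the code system OIDs are those believed to be used in HL7 CDA documents and should be verified, and the NDC value is a made-up placeholder.

```python
# Code system OIDs believed to be used in HL7 CDA documents; verify before use.
SNOMED_CT = "2.16.840.1.113883.6.96"
ICD9_CM = "2.16.840.1.113883.6.103"
RXNORM = "2.16.840.1.113883.6.88"
UNII = "2.16.840.1.113883.4.9"
LOINC = "2.16.840.1.113883.6.1"
NDC = "2.16.840.1.113883.6.69"

# Vocabularies expected per module, following the Stage 1 requirements
# summarized in the text.
EXPECTED = {
    "problems": {SNOMED_CT, ICD9_CM},
    "medications": {RXNORM},
    "allergies": {UNII, RXNORM},
    "results": {LOINC},
}


def nonconforming(records):
    """Return the (module, code, code_system) records with an unexpected vocabulary."""
    return [r for r in records if r[2] not in EXPECTED.get(r[0], set())]


records = [
    ("problems", "38341003", SNOMED_CT),
    ("problems", "401.9", ICD9_CM),
    ("medications", "309362", RXNORM),
    ("medications", "00000-0000-00", NDC),  # made-up NDC placeholder
]
flagged = nonconforming(records)
print(flagged)
```

Both problem records pass because either vocabulary is acceptable; reconciling SNOMED CT with ICD-9 for the same condition would still require a separate mapping step that this check does not attempt.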
Lastly, there was a level of redundancy of content transmitted in portions of the CCD. One example is the RxNorm
vocabulary, using codes like ‘309362’ for ‘clopidogrel 75 mg oral tablet.’ This code contains information on drug
dose and route of administration, but many CCDs included additional XML elements that also code such
information. In other cases, some CCDs omitted redundant content by skipping optional data elements altogether.