Practical Issues in Using SNOMED CT as a Reference Terminology
Senthil K. Nachimuthu, MD, Lee Min Lau, MD, PhD
3M Health Information Systems, Salt Lake City, Utah, USA
Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
Abstract
SNOMED CT® was created by the merger of SNOMED RT
(Reference Terminology) and Read Codes Version 3 (also
known as Clinical Terms Version 3). SNOMED CT is consid-
ered to be among the most extensive and comprehensive bio-
medical vocabularies available today. It is considered for use
as the Reference Terminology of various institutions. We re-
view the adequacy of SNOMED CT as a Reference Terminol-
ogy and discuss the issues in its use as such. We discuss issues
with content coverage of various clinical domains, data integ-
rity and validity, and the update frequency of SNOMED CT,
and why SNOMED CT alone is not adequate to serve as the
Reference Terminology of a healthcare organization.
Keywords:
SNOMED CT, reference terminology, implementation issues.
Introduction
SNOMED CT® (Systematized Nomenclature of Medicine –
Clinical Terms) is one of the most extensive and comprehen-
sive biomedical vocabularies currently available, with 370,702
concepts (US Edition, core distribution, July 2006). It is
recommended for use as the data dictionary provider for
several electronic health information systems. The SNOMED CT
Technical Reference Guide January 2006 release mentioned
that “SNOMED CT is the most comprehensive clinical refer-
ence terminology available in the world.”[1] The July 2006
release of this document redefines it as “SNOMED CT is a
comprehensive clinical terminology that provides clinical con-
tent and expressivity for clinical documentation and report-
ing.”[2]
We examine the issues concerning use of SNOMED CT as the
exclusive reference terminology for an electronic clinical in-
formation system. These issues include the coverage of con-
cepts in various clinical domains included in the terminology,
data consistency and the frequency of update. We examine
these issues, and provide comparisons of several clinical do-
mains of content present in SNOMED CT and another refer-
ence terminology. We use a set of objective criteria to evaluate
SNOMED CT, and these criteria and methods may be used to
evaluate any clinical vocabulary for its use in a clinical infor-
mation system.
These three analyses have different objectives and assess dif-
ferent aspects of SNOMED CT. Hence, the materials and
methods, results and discussion are presented individually for
these three analyses for the sake of coherence.
1. Coverage of Content
Materials and Methods
All the active concepts in SNOMED CT (US Edition, July
2006 release) were included in the first experiment. A
reference terminology created by 3M Health Information Systems,
namely the 3M Healthcare Data Dictionary (3M HDD), was
used to aid these comparisons.
The 3M HDD is a concept-based reference terminology that
serves as the interlingua for multiple biomedical vocabularies
which are mapped to it. The content included in the HDD is
derived by importing various standardized terminologies such
as SNOMED CT, LOINC, ICD-9-CM, CPT, HCPCS, First
Databank NDDF, etc. The HDD also includes various legacy
(homegrown) vocabularies such as the US Department of De-
fense lab, pharmacy and other clinical codesets and those from
various other community and university hospitals.
Two subsets of the HDD were created for these comparisons –
one including content provided by SNOMED CT alone, and
another including content from the rest of the HDD excluding
SNOMED CT. These two HDD subsets are hereafter
referred to as the “SNOMED CT ONLY dataset” and the “NON
SNOMED CT dataset”. The reason that the rest of the HDD
was used for comparison is to provide a real world equivalent
of concepts present in reference terminologies of very large
enterprises. Similar results may be expected by comparing the
content coverage of SNOMED CT with that of the enterprise
reference terminology of any large multi-facility multi-
specialty hospital organization, or to a standard reference ter-
minology such as the UMLS.
SNOMED CT includes various domains of clinical concepts.
The content coverage of SNOMED CT is organized into 19
top-level hierarchies shown in Table 1. Various subhierarchies
of concepts are organized under these nineteen top level hier-
archies. The fully specified name of each SNOMED CT con-
cept includes a suffix which roughly corresponds to the clini-
cal domain that concept belongs to.
The suffix helps to disambiguate between two concepts that
might belong in different top level hierarchies, but have the
same textual name. For example, the suffix can help to differ-
entiate between a disease and a finding at a glance without
having to query its location in the SNOMED CT hierarchy.
Two such examples are Endometriosis (disorder) vs. Endome-
triosis (body structure) and Aspirin (substance) vs. Aspirin
(product).
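The role of the suffix can be illustrated with a minimal sketch. It assumes the suffix is always the last parenthesised group of the fully specified name; the function name `split_fully_specified_name` is hypothetical and is not part of any SNOMED CT tooling.

```python
import re

# Matches a trailing parenthesised tag, e.g. "Endometriosis (disorder)".
# Assumption: the suffix is the final "(...)" group of the name.
_FSN_PATTERN = re.compile(r"^(?P<term>.*\S)\s*\((?P<suffix>[^()]+)\)\s*$")


def split_fully_specified_name(fsn):
    """Return (term, suffix), or (fsn, None) when no trailing suffix is found."""
    match = _FSN_PATTERN.match(fsn)
    if match is None:
        return fsn, None
    return match.group("term"), match.group("suffix")


# The four names below are the examples given in the text.
for name in [
    "Endometriosis (disorder)",
    "Endometriosis (body structure)",
    "Aspirin (substance)",
    "Aspirin (product)",
]:
    print(split_fully_specified_name(name))
```

Two names that share the same term but differ in suffix can then be told apart without a hierarchy lookup.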
Table 1. SNOMED CT top-level hierarchies
Body structure
Procedure
Clinical finding
Qualifier value
Environment or geographical location
Situation with explicit context
Event
Record artifact
Linkage concept
Social context
Observable entity
Special concept
Organism
Specimen
Substance
Staging and scales
Physical force
Pharmaceutical / biologic product
Physical object
Table 2 provides a partial listing of suffixes. Each top-level
hierarchy may contain concepts with one or more suffixes.
Table 2. SNOMED CT Concept Suffixes (partial list)
body structure
organism
disorder
specimen
procedure
substance
event
occupation
finding
product
We chose some of the clinical domains denoted by the top-
level hierarchies or concept suffixes listed in tables 1 and 2 to
compare the extent of coverage. We compared the concepts
present in SNOMED CT ONLY dataset with the NON
SNOMED CT dataset. The following results reflect compari-
sons of several domains of content included in SNOMED CT
with NON SNOMED CT content included in the real world
clinical information systems of various hospitals, both home-
grown and vendor-developed. Examples of concepts present in
these datasets are given for some of the domains to better ex-
plain the results.
Results
a. Pharmaceutical Ingredients
The SNOMED CT Substance top-level domain contains
pharmaceutical ingredients under the ‘Drug or medicament
(substance)’ subhierarchy. This is a clearly definable clinical
domain without any ambiguity, and hence this domain was
used for comparison. It is more constrained and universal, and
it is included in the core SNOMED CT data. This is in contrast
with pharmaceutical products which vary from one country to
another. In the case of the US, pharmaceutical products are
included in the SNOMED CT US Drugs Extension, which is not
available free of cost, rather than in the core US Edition.
Comparison was done by querying the specific domain of con-
tent in the NON SNOMED CT dataset and the SNOMED CT
ONLY dataset. The results of comparison of various ‘levels’
of pharmaceutical ingredients are presented in Table 3. First
Databank’s National Drug Data File (NDDF) was one of the
primary pharmacy content providers for the 3M HDD.
Comparisons were made at four levels: Therapeutic Class
(Antibiotics, Antipyretics, Sedatives, etc.), Drug Class
(Penicillins, Aminoglycosides, Calcium Channel Blockers, etc.),
Ingredient (Diclofenac, Ampicillin, etc.) and Ingredient Salt
(Diclofenac Sodium, Diclofenac Potassium, Ampicillin Sodium,
etc.). At all these levels, the NON SNOMED CT dataset had
substantially better coverage of pharmaceutical ingredients
than the SNOMED CT ONLY dataset. For example, ‘Ketanserin’
and ‘Moxaverine’ are two concepts present in the NON SNOMED
CT dataset but not in the SNOMED CT ONLY dataset.
Table 3. Pharmaceutical Ingredients
DOMAIN | NON SNOMED CT | SNOMED CT ONLY
Therapeutic Class | 134 | 22
Drug Class | 873 | 118
Ingredient | 7,265 | 2,647
Ingredient Salt | 3,167 | 1,135
b. Microorganisms
Microorganisms constitute another domain of content that can
be clearly defined, and hence the coverage of this domain was
compared between SNOMED CT ONLY dataset and NON
SNOMED CT dataset. We compared coverage of the entire
microorganisms domain as well as some subdomains.
The results in Table 4 show that the SNOMED CT ONLY dataset
had about 80% to 90% of the content coverage of the
microorganisms domain of the NON SNOMED CT dataset. For
example, ‘Heterosporium species’ is a concept present in the
NON SNOMED CT dataset but not in the SNOMED CT ONLY dataset.
Table 4. Microorganisms
DOMAIN | NON SNOMED CT | SNOMED CT ONLY
Microorganism | 13,766 | 11,516
Bacterium | 8,080 | 7,393
Fungus | 1,545 | 1,321
Parasite | 3,128 | 2,899
Virus | 1,959 | 1,835
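The relative coverage figures can be checked with a short calculation. This sketch assumes that coverage is taken as the simple ratio of the two concept counts in Table 4. That ratio is only a proxy, since it does not account for concepts that appear in one dataset but not the other. The function name `count_ratio` is hypothetical.

```python
# Concept counts copied from Table 4: (NON SNOMED CT, SNOMED CT ONLY).
TABLE_4 = {
    "Microorganism": (13766, 11516),
    "Bacterium": (8080, 7393),
    "Fungus": (1545, 1321),
    "Parasite": (3128, 2899),
    "Virus": (1959, 1835),
}


def count_ratio(non_snomed, snomed_only):
    """SNOMED CT ONLY count as a percentage of the NON SNOMED CT count."""
    return 100.0 * snomed_only / non_snomed


ratios = {domain: count_ratio(*counts) for domain, counts in TABLE_4.items()}
for domain, pct in ratios.items():
    print(f"{domain}: {pct:.1f}%")
```

Under this assumption the ratios fall between roughly 84% and 94%, with the whole microorganism domain at the low end and viruses at the high end.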
c. Problems
The clinical problems domain is among those that are not
standardized and are highly variable. This domain will under-
go much refinement over time. However, this domain is often
used for picklists or dictionaries to manage electronic problem
lists. This clinical domain is becoming increasingly important
as clinical information systems try to implement problem-
oriented documentation and maintain a longitudinal or histori-
cal record of patients’ problems. Several authors have found
that SNOMED CT has around 80% to 90% coverage of prob-
lems included in medical problem list domains of individual
enterprise reference terminologies.[3][4][5]
We compared the problems domain of SNOMED CT with that
of our reference terminology, which is the union of multiple
enterprise reference terminologies, and our results indicate
approximately 75% coverage. This domain is highly subjective
and variable; however, the results still show very significant
differences. The high variability of the domain is reflected in
the concepts. For example, the NON SNOMED CT dataset
contains the concept ‘Intercostal bulging’ which is not present
in the SNOMED CT ONLY dataset.
Table 5. Problems
DOMAIN | NON SNOMED CT | SNOMED CT ONLY
Problem | 48,971 | 36,633
d. Specimens
Clinical specimens are an important part of medical records,
and are an essential component of most biochemistry, patholo-
gy, microbiology and immunology lab orders. The following
table shows the numbers of clinical specimens present in the
NON SNOMED CT dataset and the number of specimens in
SNOMED CT ONLY dataset. SNOMED CT ONLY dataset
contained a significantly larger number of specimens com-
pared to the NON SNOMED CT dataset, which is not surpris-
ing considering that SNOMED was created by the College of
American Pathologists. The overlap is not complete in either
direction, however. For example, the concept ‘Liver cyst
fluid’ is present in the NON SNOMED CT dataset but not in the
SNOMED CT ONLY dataset. Conversely, the concept ‘BCG site
swab’ is present in the SNOMED CT ONLY dataset but not in the
NON SNOMED CT dataset.
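The two-way nature of this comparison can be sketched with set differences. The sets below are toy data: ‘Liver cyst fluid’ and ‘BCG site swab’ come from the text, while ‘Blood specimen’ is a hypothetical shared concept added only for illustration. Matching on names is an assumption made for brevity; a real comparison would match on mapped concepts rather than strings.

```python
# Toy specimen sets (see the assumptions stated above).
non_snomed_ct = {"Liver cyst fluid", "Blood specimen"}
snomed_ct_only = {"BCG site swab", "Blood specimen"}

# Concepts one dataset has that the other lacks, in each direction.
missing_from_snomed = non_snomed_ct - snomed_ct_only
missing_from_non_snomed = snomed_ct_only - non_snomed_ct
shared = non_snomed_ct & snomed_ct_only

print(sorted(missing_from_snomed))
print(sorted(missing_from_non_snomed))
print(sorted(shared))
```

A larger total count in one dataset therefore does not imply that it contains every concept in the other.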
Table 6. Specimens
DOMAIN | NON SNOMED CT | SNOMED CT ONLY
Clinical specimen | 823 | 1,048
e. Body Structure
The body structure domain contains concepts denoting various
normal and abnormal anatomical structures at the organ, organ
system or tissue level. The results in Table 7 show that the
NON SNOMED CT dataset has significantly higher coverage
compared to the SNOMED CT ONLY dataset.
Table 7. Body Structure
DOMAIN | NON SNOMED CT | SNOMED CT ONLY
Body Structure | 34,257 | 31,999
f. Procedures
The procedures domain includes medical and surgical
procedures performed for diagnostic or therapeutic purposes,
administrative procedures (admission, discharge, transfer,
billing, etc.), and so on. Procedures form an important part of
several orders, clinical or administrative. The results
in Table 8 show that the NON SNOMED CT dataset contained
a significantly higher number of procedures than the
SNOMED CT ONLY dataset.
Table 8. Procedures
DOMAIN | NON SNOMED CT | SNOMED CT ONLY
Procedure | 59,311 | 53,052
Discussion
The results above show that SNOMED CT contains an exten-
sive coverage of concepts but still lags behind a reference ter-
minology which serves as the superset of several standardized
and legacy vocabularies. Part of this difference is due to the
differences in granularity. But a reference terminology will
need to provide concepts with varying granularities, and should
not expect the users to perform the composition-decomposition
themselves. Lack of concepts with differing
levels of granularity causes problems with internal data storage
as well as external data exchange. Some of the differences are
due to the absence of several concepts in SNOMED CT which
are present in other standard or local vocabularies. These re-
sults show that continuous authoring and updating of concepts
will be required to keep SNOMED CT comprehensive and up
to date for its use as a reference terminology.
Studies by other authors have also shown that SNOMED CT
has anywhere between 30% and 90% coverage for various
clinical domains.[6][7][8] These studies have each compared a
single domain of content between SNOMED CT and a single
enterprise reference terminology. Comparing multiple domains
of content between SNOMED CT and the superset of multiple
reference terminologies yields similar results, as this
experiment shows.
Furthermore, we have not discussed several domains of
content which are not covered by SNOMED CT, such as billing
and reimbursement vocabularies, lab test names (provided by
LOINC), etc. This indicates that several vocabularies need to be
used together to complement each other and to build the
enterprise reference terminology of a healthcare organization.
2. Data Consistency
Materials and Methods
The SNOMED CT ‘core’ consists of three tables – concepts,
descriptions and relationships. We used the core tables provid-
ed by SNOMED CT and the US English dialect subset (both
obtained from the US Edition, July 2006 release) to assess
data consistency in SNOMED CT. We examined the compliance
of the descriptions table with the constraints defined in the
SNOMED CT Technical Reference Guide and User Guide.
Table 9. SNOMED CT Representations
Type | Representation
Fully Specified Name | Scar (morphologic abnormality)
Preferred Term | Scar
Synonym | Scar tissue
Synonym | Fibrous scar
Synonym | Cicatrix
These constraints guide the importing of SNOMED CT into
the reference terminology of an institution, use of SNOMED
CT as the exclusive reference terminology of an institution,
creating subsets, applying extensions to the core content, ver-
sioning and updating, and so on. The subset mechanism is es-
pecially important as SNOMED CT is too large to be used
directly for a clinical application. SNOMED CT Subsets are
often created for specific clinical domains or applications.
According to the SNOMED CT User Guide, each concept has
one fully specified name, one preferred term and zero or more
synonyms.[9] The fully specified name gives the definition of
a concept, and is more explanatory than its preferred term.
Preferred term is a more concise name which is used in clinical
records. Synonyms are concise forms as well, and may be used
as alternatives for the preferred term. An example is given in
Table 9 – all these representations denote a single concept. We
encountered some problems and inconsistencies while import-
ing the SNOMED CT representations (textual names or sur-
face forms of concepts) into the 3M HDD. We present the
results below.
Results
The US Edition of SNOMED CT contains both American and
British English representations for several concepts. The rep-
resentations table in the core distribution denotes both the US
(en-US) and UK (en-GB) English representations with an “un-
specified type” for several concepts and a preferred term is not
defined in these cases. The SNOMED CT US Dialect Subset
redefines one of these representations as the Preferred Term
and the rest as synonyms (often including the en-GB represen-
tation) for US English.[10] We first imported the core repre-
sentations table into a relational database, and then applied the
US Dialect Subset to it.
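The import step described above can be sketched as an overlay of the subset onto the core table. The sketch is illustrative only. The record layout, the identifiers `d1` to `d3` and `c1`, the type labels, and the function name `apply_dialect_subset` are all hypothetical and do not reflect the actual SNOMED CT release file format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Description:
    description_id: str  # hypothetical identifier, not a real SNOMED CT id
    concept_id: str      # hypothetical identifier
    term: str
    desc_type: str       # "unspecified", "preferred", "synonym" or "fsn"


def apply_dialect_subset(descriptions, subset):
    """Return a new list with desc_type overridden wherever the subset has an entry."""
    return [
        replace(d, desc_type=subset[d.description_id])
        if d.description_id in subset
        else d
        for d in descriptions
    ]


# Illustrative core rows: US and UK spellings both start as "unspecified".
core = [
    Description("d1", "c1", "Anemia", "unspecified"),
    Description("d2", "c1", "Anaemia", "unspecified"),
    Description("d3", "c1", "Anemia (disorder)", "fsn"),
]
# Illustrative dialect subset: one representation becomes preferred, the other a synonym.
us_dialect_subset = {"d1": "preferred", "d2": "synonym"}

localized = apply_dialect_subset(core, us_dialect_subset)
for d in localized:
    print(d.description_id, d.term, d.desc_type)
```

The core rows are left untouched, so the same core table could be overlaid with a different dialect subset.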
After applying the US dialect subset to the core descriptions
table for the July 2006 US Edition, we found that 10,852 con-
cepts had multiple preferred terms, often with different status-
es (active, retired, limited, etc). The correct preferred term
needs to be selected based on the concept status in these cases.
Not all of these 10,852 concepts’ preferred terms could be
automatically selected based on the concept status and/or de-
scription status alone. 335 concepts had multiple preferred
terms with the same status. For example, ConceptId 103497003,
“Streptococcus penumoniae 3 (organism)” [sic], has two
preferred terms in “current” status.
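A consistency check of this kind can be sketched as follows. It is not the authors' actual procedure. The rows, the status labels, and the function name `find_preferred_term_conflicts` are made up for illustration. The check reports concepts with more than one preferred term, and separately those where two or more preferred terms share the same status and so cannot be resolved by status alone.

```python
from collections import defaultdict

# Each row: (concept_id, term, desc_type, status). All rows are made-up examples.
rows = [
    ("c1", "Term A", "preferred", "current"),
    ("c1", "Term B", "preferred", "current"),  # tie: same status
    ("c2", "Term C", "preferred", "current"),
    ("c2", "Term D", "preferred", "retired"),  # differing statuses
    ("c3", "Term E", "preferred", "current"),
    ("c3", "Term F", "synonym", "current"),
]


def find_preferred_term_conflicts(rows):
    """Return (concepts with >1 preferred term, subset of those with a same-status tie)."""
    statuses_by_concept = defaultdict(list)
    for concept_id, _term, desc_type, status in rows:
        if desc_type == "preferred":
            statuses_by_concept[concept_id].append(status)

    multiple = {c for c, s in statuses_by_concept.items() if len(s) > 1}
    # A tie exists when at least two preferred terms of a concept share one status.
    same_status = {
        c
        for c in multiple
        if len(set(statuses_by_concept[c])) < len(statuses_by_concept[c])
    }
    return multiple, same_status


multiple, same_status = find_preferred_term_conflicts(rows)
print(sorted(multiple))
print(sorted(same_status))
```

Concepts in the second set are the ones that would need a manual decision.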
Several of these occurrences are in direct violation of the data
integrity constraints defined in the SNOMED CT User Guide.
They caused problems when automatically importing SNOMED CT
data into our reference terminology. One of the authors
overcame this by manually designating one of the preferred
terms or synonyms as the preferred term for the affected
concepts. This task was labor intensive, and required several
hours of the domain expert’s time.
Discussion
The above results show that SNOMED CT needs stricter qual-
ity analysis before it is released to the end users. Failure of
data consistency and violation of such constraints will lead to
problems in importing and using the terminology by the target
organizations. More studies of data consistency and integrity
need to be done which involve other components of SNOMED
CT. However, the authors wish to mention that SNOMED CT
is one of the better designed terminologies, and has fewer data
consistency issues than many other standard vocabularies.
3. Frequency of Update
Updates to SNOMED CT content are done through the regu-
larly scheduled versioning mechanism. A new version is pub-
lished once every six months, in January and July of every
year. SNOMED has a web-based request mechanism for users
and contributors to request changes. These changes include
new concepts, representations, subsets, or extension
namespaces through which third parties can extend the content
in SNOMED CT. The users can also request modifications to
existing content. However, these changes are incorporated and
published once every six months through the regular release
mechanism. Each new version of SNOMED CT has intro-
duced anywhere between 2,500 and 8,000 new concepts. Sta-
tistics of changes in various core SNOMED CT tables over the
years have been studied by other authors.[11]
Most organizations update their data dictionaries on a continu-
ous basis, and some enterprise reference terminologies are
updated as often as every day. This makes it harder to use
SNOMED CT as the exclusive reference terminology for an
organization. This can be overcome by creating local exten-
sions to SNOMED CT via the SNOMED CT extension mech-
anism. The extension may be reconciled with the subsequent
SNOMED CT version by inactivating the concepts in the ex-
tension and superseding them with equivalent concepts that are
added to SNOMED CT. This process needs specialized exper-
tise in vocabulary and ontology creation and maintenance, and
requires contribution from subject matter experts in several
content domains. An alternative is to use an enterprise refer-
ence terminology which includes SNOMED CT as one of the
vocabulary sources along with other vocabularies.
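The reconciliation step can be sketched as follows. This is a simplified illustration under loud assumptions. The identifiers `LOCAL-1`, `LOCAL-2` and `CORE-1` are hypothetical, the premise that a later release adds ‘Ketanserin’ is invented for the example, and matching on lowercased term text stands in for the expert review of equivalence that the text says is required.

```python
def reconcile_extension(extension, new_release_terms):
    """Inactivate extension concepts that have an equivalent in the new release.

    extension: local concept id -> {"term", "active", "superseded_by"}
    new_release_terms: lowercased term -> core concept id in the new release
    """
    reconciled = {}
    for local_id, concept in extension.items():
        core_id = new_release_terms.get(concept["term"].lower())
        if concept["active"] and core_id is not None:
            # Retire the local concept and point it at its core replacement.
            reconciled[local_id] = {**concept, "active": False, "superseded_by": core_id}
        else:
            reconciled[local_id] = dict(concept)
    return reconciled


# Hypothetical local extension holding two concepts missing from the core.
extension = {
    "LOCAL-1": {"term": "Ketanserin", "active": True, "superseded_by": None},
    "LOCAL-2": {"term": "Intercostal bulging", "active": True, "superseded_by": None},
}
# Hypothetical content of the next release (invented for this example).
new_release_terms = {"ketanserin": "CORE-1"}

reconciled = reconcile_extension(extension, new_release_terms)
for local_id, concept in reconciled.items():
    print(local_id, concept["active"], concept["superseded_by"])
```

Concepts without a core equivalent stay active in the extension until a later release covers them.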
Conclusion
The above analyses show that SNOMED CT does not have the
necessary coverage to be the exclusive content provider for the
enterprise reference terminology of an organization.
SNOMED CT has among the most extensive coverage of any
biomedical vocabulary, and has a well defined ontology and
semantic relationships. These features make SNOMED CT
an essential component of an enterprise reference terminology.
However, it needs to be complemented with other standard
vocabularies, and in most cases, ‘homegrown’ vocabularies to
build a reference terminology.
SNOMED CT also has some data inconsistency issues which
need manual intervention for implementation. We expect that
these issues will be addressed in future releases. SNOMED CT
also has a slower update schedule than would be required for
the maintenance of a production enterprise reference terminol-
ogy.
For these reasons, we conclude that most healthcare
organizations will need further work to implement or adapt
SNOMED CT for their requirements and use it in combination
with other biomedical vocabularies, and may not be able to use
it as a plug-and-play vocabulary.
Acknowledgements
The authors thank Shaun C. Shakib, MPH for his valuable help with
this study.
References
[1] SNOMED CT® Technical Reference Guide. January 2006
Release. Jan 2006: 22.
[2] SNOMED CT® Technical Reference Guide. July 2006
Release. Jul 2006: 22.
[3] Elkin PL, Brown SH, Husser CS, Bauer BA, Wahner-
Roedler D, Rosenbloom ST, Speroff T. Evaluation of the
content coverage of SNOMED CT: ability of SNOMED
clinical terms to represent clinical problem lists. Mayo Clin
Proc. 2006 Jun;81(6):741-8.
[4] Penz JF, Brown SH, Carter JS, Elkin PL, Nguyen VN,
Sims SA, Lincoln MJ. Evaluation of SNOMED coverage
of Veterans Health Administration terms. Medinfo.
2004;11(Pt 1):540-4.
[5] Wasserman H, Wang J. An applied evaluation of
SNOMED CT as a clinical vocabulary for the computer-
ized diagnosis and problem list. AMIA Annu Symp Proc.
2003;699-703.
[6] Brown SH, Elkin PL, Bauer BA, Wahner-Roedler D,
Husser CS, Temesgen Z, Hardenbrook SP, Fielstein EM,
Rosenbloom ST. SNOMED CT®: Utility for a General
Medical Evaluation Template. Proc AMIA Symp 2006:
101-105.
[7] van der Kooij J, Goossen WT, Goossen-Baremans AT, de
Jong-Fintelman M, van Beek L. Using SNOMED CT
codes for coding information in electronic health records
for stroke patients. Stud Health Technol Inform.
2006;124:815-23.
[8] Kim H, Harris MR, Savova G, Chute CG. Content cover-
age of SNOMED-CT toward the ICU nursing flowsheets
and the acuity indicators. Stud Health Technol Inform.
2006;122:722-6.
[9] SNOMED CT® User Guide. January 2006 Release. Sec-
tion 3.2; Jan 06: 11.
[10] SNOMED CT® Technical Reference Guide. July 2006
Release. Jul 06: 83.
[11] Spackman KA. Rates of change in a large clinical termi-
nology: three years experience with SNOMED Clinical
Terms. AMIA Annu Symp Proc. 2005:714-8.
Address for correspondence
Senthil K. Nachimuthu, MD. Email: snachimuthu@mmm.com