Stud Health Technol Inform. 2006;124:801-6.
ClaML: A standard for the electronic publication of classification coding schemes
E.J. van der HARING (a,1), S. BROËNHORST (b), H. ten NAPEL (c), S. WEBER (b), M. SCHOPEN (b) and P.E. ZANSTRA (a)
(a) Medical Informatics UMCN, Nijmegen, Netherlands
(b) DIMDI, Cologne, Germany
(c) WHO-FIC, Bilthoven, Netherlands
Abstract. This paper proposes a number of revisions to CEN/TS 14463 (ClaML),
which is a pre-standard mark-up language for the electronic publication of
classification coding schemes. A CEN Taskforce in close collaboration with the
WHO network carefully analysed 70 classifications from the healthcare domain.
All were transformed into ClaML using a dedicated classification management tool.
The proposal removes all formatting elements and adds a number of layout
structuring elements. Several elements have been replaced by attributes to enforce
internal consistency. A modest number of extensions are proposed to help users
and authors in maintenance and version control. A pilot implementation has shown
that ICD-10, as one of the most complex traditional classifications, can be
adequately represented to produce quality printed output.
Keywords: Medical Informatics, Classification, ICD-10, Standard
1. Introduction
In 2002 the European Committee for Standardization (CEN/TC251) published an
XML-based [1] Technical Specification (TS) to represent the content of medical classification
systems (ClaML) [2]. The goal of this TS is to simplify the electronic maintenance and
publication of classification coding schemes for use in information systems, and to
create a common ground of exchange formats for classifications to enable users to
compare data in a convenient and standardized way.
Rossi Mori recognizes three generations of classification schemes: (1) traditional
paper-based systems (first generation); (2) compositional systems built according to a
categorical structure and a cross-thesaurus (second generation) and (3) formal models
(third generation) [3]. First generation classifications are still used widely and will be
used far into the future, if only for statistical purposes, whilst third generation
classifications have not yet reached maturity [4]. The WHO maintains and publishes a
family of first generation classification schemes (ICD, ICF, ICHI) that is widely used
internationally and throughout all branches of medicine. The ICD is one of the oldest
medical classification schemes and very important for statistical purposes. Although
originally arising for historical reasons, a continuing and primary characteristic of ICD
is that the layout of the texts of the rubrics in the classification follows certain rules [5].
(1) Corresponding Author: E.J. van der Haring, MI-UMCN, Nijmegen, Netherlands, e.vanderharing@mi.umcn.nl
WHO has expressed the need for a standard mark-up for their classifications [6].
The format should allow the generation of selections for EDP as well as printing
publications from one single source. The WHO collaborating centres network has
assessed [7] the suitability of the TS for these purposes. Among others, the following
problems in the current TS were identified. It is not possible to include meta
information about a classification in the same file. The representation of information
elements is not always consistent. The texts cannot be uniquely identified. The TS
completely ignores the layout of texts within classifications, which makes it very hard to
use the same source both for computer applications and for printing hardcopy. It contains
formatting elements, contradicting the goal of separating presentation and
representation. It lacks constructs that reduce redundancy present within a
classification. It is not possible to record the history of changes made to a
classification.
The question is whether the present TS can be augmented to serve the stated WHO needs.
Due to space limitations this paper focuses on the core revisions.
2. Material and Methods
To revise the current TS, CEN convened a taskforce comprising the developers of the
TS and interested members of the terminology workgroup. The TS was analysed in
order to improve its internal consistency and completeness.
Further inputs to the analysis included 70 classifications held by different
organisations already represented to some extent in the current TS and a corpus of
experimental representations involving extensions to the current TS [7]. The taskforce
also undertook a detailed analysis of several classifications, with the main focus on the
ICD-10, in cooperation with the national WHO collaborating centres of the
Netherlands and Germany. The formatting and layout of texts within these
classifications was carefully analysed and the layout constructs were categorised.
All changes to the TS were discussed within the taskforce and the WHO
collaborating centres via email, web forums, and personal communication to reach
consensus. Every change was illustrated by examples from ICD-10 and for the main
part by experimental implementations in the Kermanog Classification Manager [8].
3. Results
3.1. Meta information
The only metadata about a classification that can be included in the current TS consists
of name, title, version and date of publication. The revision extends this with the
possibility to include any kind of meta information concerning the classification by the
introduction of the element Meta (Figure 1). Although this cannot be enforced, it is
suggested to format the date of publication according to ISO 8601:2004 [9], i.e.
YYYY-MM-DD. For versioning information the well-known major.minor.patch scheme is
suggested. The meta information can be easily extended and converted to the format
specified by the Dublin Core Metadata Registry [10].
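As a minimal sketch of the two formatting suggestions above, the following Python checks a publication date against the YYYY-MM-DD form and splits a version string into major, minor and patch numbers. The function names are illustrative assumptions and not part of the TS; the sample values are those shown in Figure 1.

```python
import re
from datetime import date
from typing import Tuple

# Hypothetical helpers; the TS only *suggests* these formats and cannot enforce them.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_publication_date(text: str) -> date:
    """Accept only YYYY-MM-DD; raise ValueError for anything else."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # also rejects impossible calendar dates


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split major.minor.patch into a tuple of integers."""
    match = VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in major.minor.patch form: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


published = parse_publication_date("2006-01-03")
version = parse_version("10.1.0")
```

Comparing the resulting tuples orders versions numerically, whereas comparing the raw strings would place "10.10.0" before "10.9.0".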
A number of standards or semi-standards exist defining unique identifiers that may
be used to uniquely identify classification coding schemes, for example Health Coding
Scheme Designator (HCD) in the CEN standard Registration of Coding Schemes [11]
and ISO Object Identifiers [12] used in HL7 Version 3. The revision introduces an
optional element CodingSchemeId that may be used to refer to any of these unique
identifiers.
A comparison of the header information in the current TS and the proposed
revision is shown in Figure 1.
<CodingScheme>
  <Name>icd10</Name>
  <Title>International Classification of Diseases, version 10</Title>
  <Version>10.1.0</Version>
  <Date>03-01-2006</Date>
......................................................................
<ClaML>
  <Meta name="css" value="default.css"/>
  <Meta name="xslt" value="default.xslt"/>
  <Meta name="state" value="in production"/>
  <CodingSchemeId authority="HL7" uid="2.16.840.1.113883.6.3"/>
  <CodingSchemeId authority="CEN" uid="AA123456"/>
  <Title name="icd10" date="2006-01-03" version="10.1.0">
    International Classification of Diseases, version 10
  </Title>
  <Authors>
    <Author id="who">World Health Organisation</Author>
  </Authors>
Figure 1. Comparison of header information in the current TS (top) and the proposed revision (bottom).
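To illustrate how an application might read the revised header, the sketch below parses the lower half of Figure 1 with Python's standard xml.etree.ElementTree module. The closing </ClaML> tag is added here only so that the excerpt parses on its own, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Header excerpt from Figure 1 (revision), closed so that it is well-formed.
HEADER = """<ClaML>
  <Meta name="css" value="default.css"/>
  <Meta name="xslt" value="default.xslt"/>
  <Meta name="state" value="in production"/>
  <CodingSchemeId authority="HL7" uid="2.16.840.1.113883.6.3"/>
  <CodingSchemeId authority="CEN" uid="AA123456"/>
  <Title name="icd10" date="2006-01-03" version="10.1.0">
    International Classification of Diseases, version 10
  </Title>
  <Authors>
    <Author id="who">World Health Organisation</Author>
  </Authors>
</ClaML>"""

root = ET.fromstring(HEADER)

# Free-form meta information as name/value pairs.
meta = {m.get("name"): m.get("value") for m in root.findall("Meta")}

# Unique identifiers of the coding scheme, keyed by issuing authority.
scheme_ids = {c.get("authority"): c.get("uid") for c in root.findall("CodingSchemeId")}

title = root.find("Title")
title_text = " ".join(title.text.split())  # collapse layout whitespace
```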
3.2. Improving consistency and completeness
In the current TS a code in the classification may occur as the content of several
elements, for example Symbol, SuperClass, etcetera. The revision introduces the
attribute code at every place where it is possible to refer to a code in the classification
(Figure 2).
<Class kind="category">
  <Symbol>A00</Symbol>
  <SuperClass>A00-A09</SuperClass>
  <Rubric xml:lang="en" kind="preferred">Cholera</Rubric>
</Class>
......................................................................
<Class code="A00" kind="category">
  <SuperClass code="A00-A09"/>
  <SubClass code="A00.0"/>
  <SubClass code="A00.1"/>
  <SubClass code="A00.9"/>
  <Rubric id="_006-0103-1136-4093" kind="preferred">
    <Label xml:lang="en">Cholera</Label>
  </Rubric>
</Class>
Figure 2. Comparison of the representation of a class.
Figure 2 also shows the introduction of the element SubClass. In the current TS
only the parent classes of a class are explicitly included, whilst subclasses of a class are
only implied. The revision proposes to make the subclasses explicit as well.
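The following sketch shows what "implied" means in practice for the current TS: the subclasses of a class can only be found by inverting the SuperClass entries of all other classes. The wrapping element Classes and the function name are assumptions made for this example; the two classes are abbreviated from Figures 2 and 3.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two classes in the current TS form; <Classes> is a hypothetical wrapper.
CURRENT_TS = """<Classes>
  <Class kind="category">
    <Symbol>A00</Symbol>
    <SuperClass>A00-A09</SuperClass>
  </Class>
  <Class kind="category">
    <Symbol>A04</Symbol>
    <SuperClass>A00-A09</SuperClass>
  </Class>
</Classes>"""


def implied_subclasses(root):
    """Map each superclass code to the codes of the classes that name it."""
    children = defaultdict(list)
    for cls in root.iter("Class"):
        code = cls.findtext("Symbol")
        for sup in cls.findall("SuperClass"):
            children[sup.text].append(code)
    return dict(children)


subclasses = implied_subclasses(ET.fromstring(CURRENT_TS))
```

Every class in the file has to be visited to answer the question for a single parent, and the resulting lists simply follow the order in which the classes happen to occur in the file.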
3.3. Uniquely identified rubrics
Most classifications contain different kinds of rubrics. Usually each class has a
preferred meaning, which may be further defined by inclusions and exclusions, or be
explained by notes, definitions, descriptions, etcetera. In the current TS the different
kinds of rubrics are introduced when they are first used. The revision adds an element
RubricKinds at the beginning of the file, where each kind of rubric is defined, and only
allows rubric kinds that are defined within the RubricKinds element.
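A check of this rule could look like the sketch below. The paper does not show the content of RubricKinds, so the child element RubricKind with an attribute name is an assumption, as are the function name and the second rubric, which is invented purely to trigger the check.

```python
import xml.etree.ElementTree as ET

# Assumed structure: <RubricKinds> holds <RubricKind name="..."/> children.
DOC = """<ClaML>
  <RubricKinds>
    <RubricKind name="preferred"/>
    <RubricKind name="excludes"/>
  </RubricKinds>
  <Class code="A04" kind="category">
    <Rubric kind="preferred">
      <Label xml:lang="en">Other bacterial intestinal infections</Label>
    </Rubric>
    <Rubric kind="note">
      <Label xml:lang="en">Illustrative rubric of an undeclared kind</Label>
    </Rubric>
  </Class>
</ClaML>"""


def undeclared_rubric_kinds(root):
    """Return (class code, rubric kind) pairs whose kind was never declared."""
    declared = {k.get("name") for k in root.findall("RubricKinds/RubricKind")}
    problems = []
    for cls in root.iter("Class"):
        for rubric in cls.findall("Rubric"):
            if rubric.get("kind") not in declared:
                problems.append((cls.get("code"), rubric.get("kind")))
    return problems


problems = undeclared_rubric_kinds(ET.fromstring(DOC))
```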
In the current TS a rubric may be represented in different languages, indicated by
the attribute xml:lang. However, those rubrics that are mutual translations cannot be
explicitly related to each other, as in the excerpt from ICD-10 in Figure 3. The
introduction of the element Label, containing the text in a specific language for the
rubric, allows the grouping of translations of the same rubric.
<Class kind="category">
  <Symbol>A04</Symbol>
  <SuperClass>A00-A09</SuperClass>
  <Rubric xml:lang="en" kind="preferred">Other bacterial intestinal infections</Rubric>
  <Rubric xml:lang="en" kind="excludes">foodborne intoxications, bacterial (A05.-)</Rubric>
  <Rubric xml:lang="en" kind="excludes">tuberculous enteritis (A18.3)</Rubric>
  <Rubric xml:lang="de" kind="preferred">Sonstige bakterielle Darminfektionen</Rubric>
  <Rubric xml:lang="de" kind="excludes">Bakteriell bedingte Lebensmittelvergiftun..</Rubric>
  <Rubric xml:lang="de" kind="excludes">Tuberkulöse Enteritis (A18.3)</Rubric>
</Class>
......................................................................
<Class code="A04" kind="category">
  <SuperClass code="A00-A09"/>
  <Rubric id="_006-0103-1532-0589" kind="preferred">
    <Label xml:lang="en">Other bacterial intestinal infections</Label>
    <Label xml:lang="de">Sonstige bakterielle Darminfektionen</Label>
  </Rubric>
  <Rubric id="_006-0103-1532-3700" kind="excludes">
    <Label xml:lang="en">foodborne intoxications, bacterial (A05.-)</Label>
    <Label xml:lang="de">Bakteriell bedingte Lebensmittelvergiftungen (A05.-)</Label>
  </Rubric>
  <Rubric id="_006-0103-1533-1825" kind="excludes">
    <Label xml:lang="en">tuberculous enteritis (A18.3)</Label>
    <Label xml:lang="de">Tuberkulöse Enteritis (A18.3)</Label>
  </Rubric>
</Class>
Figure 3. Representation of rubrics in different languages in the current TS and the revision. In the revision
the translations are explicitly related and rubrics are (optionally) uniquely identified.
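Because each Rubric in the revision groups its own Label elements, the translations of one rubric can be collected without any guessing. The sketch below does this for part of the revised class in Figure 3; the function and variable names are illustrative assumptions. ElementTree exposes the xml:lang attribute under the fixed XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Two rubrics of class A04 in the revised form (from Figure 3).
CLASS_A04 = """<Class code="A04" kind="category">
  <SuperClass code="A00-A09"/>
  <Rubric id="_006-0103-1532-0589" kind="preferred">
    <Label xml:lang="en">Other bacterial intestinal infections</Label>
    <Label xml:lang="de">Sonstige bakterielle Darminfektionen</Label>
  </Rubric>
  <Rubric id="_006-0103-1532-3700" kind="excludes">
    <Label xml:lang="en">foodborne intoxications, bacterial (A05.-)</Label>
    <Label xml:lang="de">Bakteriell bedingte Lebensmittelvergiftungen (A05.-)</Label>
  </Rubric>
</Class>"""


def labels_by_rubric(cls):
    """Map rubric id -> {language code: label text}."""
    return {
        rubric.get("id"): {
            label.get(XML_LANG): label.text for label in rubric.findall("Label")
        }
        for rubric in cls.findall("Rubric")
    }


translations = labels_by_rubric(ET.fromstring(CLASS_A04))
```

In the current TS form the same pairing could only be inferred from the order of the rubrics, which is exactly the ambiguity the Label element removes.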
3.4. History
During the development and maintenance of a classification it is important to document
all changes, including when the change was made, who made the change and for what
reason. For this purpose the revision introduces an element History, which records the
author and date of a change. Again, for the date it is suggested to use the format
defined in ISO 8601:2004 [9].
3.5. Layout
The revision introduces a number of elements allowing layout information within
rubrics to be specified, including paragraphs, tables, lists and fragments. The first three
are self-explanatory and are not further described. The element Fragment may be used
to represent ‘repetition’-layout that is especially seen in ICD-10 (Figure 4).
A06.8 Amoebic infection of other sites
      Amoebic:
      · appendicitis
      · balanitis† (N51.2*)
......................................................................
<Class code="A06.8" kind="category">
  <SuperClass code="A06"/>
  <Rubric id="_006-0105-1030-1314" kind="preferred">
    <Label xml:lang="en">Amoebic infection of other sites</Label>
  </Rubric>
  <Rubric id="_006-0105-1030-3423" kind="includes">
    <Label xml:lang="en">
      <Fragment type="lhead1">Amoebic</Fragment>
      <Fragment type="litem">appendicitis</Fragment>
    </Label>
  </Rubric>
  <Rubric id="_006-0105-1031-3611" kind="includes">
    <Label xml:lang="en">
      <Fragment type="lhead1">Amoebic</Fragment>
      <Fragment type="litem" usage="etiology">balanitis</Fragment>
      <Reference usage="manifestation">N51.2</Reference>
    </Label>
  </Rubric>
</Class>
Figure 4. Example illustrating layout in ICD-10. (The fragment attribute usage is used here to
describe dagger/asterisk. This attribute is outside the scope of this paper.)
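The sketch below processes the two "includes" rubrics of Figure 4 in two ways: it flattens each label to plain text by ignoring the layout elements, and it regroups the fragments into the 'repetition' layout of the printed page. Reading the type values lhead1 and litem as list head and list item is an interpretation made for this example, and the function names are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# The two "includes" rubrics of class A06.8 from Figure 4.
CLASS_A06_8 = """<Class code="A06.8" kind="category">
  <Rubric id="_006-0105-1030-3423" kind="includes">
    <Label xml:lang="en">
      <Fragment type="lhead1">Amoebic</Fragment>
      <Fragment type="litem">appendicitis</Fragment>
    </Label>
  </Rubric>
  <Rubric id="_006-0105-1031-3611" kind="includes">
    <Label xml:lang="en">
      <Fragment type="lhead1">Amoebic</Fragment>
      <Fragment type="litem" usage="etiology">balanitis</Fragment>
      <Reference usage="manifestation">N51.2</Reference>
    </Label>
  </Rubric>
</Class>"""


def plain_text(label):
    """Concatenate all text in a label, ignoring the layout elements."""
    return " ".join("".join(label.itertext()).split())


def repetition_layout(cls):
    """Print a shared head once, followed by the items that repeat it."""
    lines, last_head = [], None
    for label in cls.iter("Label"):
        head = label.findtext("Fragment[@type='lhead1']")
        item = label.findtext("Fragment[@type='litem']")
        if head is None or item is None:
            continue  # label without head/item fragments: not part of a list
        if head != last_head:
            lines.append(head + ":")
            last_head = head
        lines.append("- " + item)
    return lines


cls = ET.fromstring(CLASS_A06_8)
flat = [plain_text(label) for label in cls.iter("Label")]
layout = repetition_layout(cls)
```

The flattened strings are what an index or search application would use, while the grouped lines approximate the printed page; both come from the same source.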
4. Discussion
Hoelzer et al. [15] describe an extension to the current TS. Unfortunately their
effort has not been contributed to CEN and seems to have been used for their own
experiments and applications only. Although they published a paper about their
approach, the full specification of their extension is not publicly available. A
disadvantage of their approach is that it divides the contents of the ICD-10 between
more than 1600 XML documents, greatly increasing the risk of consistency problems
and undermining the exchange of classifications, which is the main purpose of the
standard.
In this proposal several elements from the TS have been replaced by attributes in order
to make the specification more internally consistent. The revision also removes the two
formatting elements, <i> and <b>, and instead introduces a number of layout structuring
elements. Subsequent formatting of these elements for display should be achieved by
using separately defined style sheets, for example Cascading Style Sheets [13] or
XSLT [14]. The layout structure of texts has been added in such a way that each text is
meaningful even when the layout elements are ignored.
Although the subclasses of a class may be derived by examination of other classes,
the order in which those subclasses should appear cannot. This is especially a
problem with multi-hierarchical classifications in which a class has multiple
superclasses. In the revision the element Class contains the complete environment of
the class in the classification, with both immediate super- and subclasses explicitly
stated, including ordering. Further, making the complete environment of a class explicit
means the information contained within the element Class is sufficient to completely
define a class.
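Stating both directions of the hierarchy explicitly introduces redundancy that a tool can verify. The following sketch reads the ordered subclasses directly from a class and reports parent/child links that are stated in only one direction. The sample classes are reduced to their hierarchy elements, class A04 deliberately lacks its SuperClass entry, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Revised form, reduced to hierarchy elements; A04 deliberately has no SuperClass.
DOC = """<ClaML>
  <Class code="A00-A09">
    <SubClass code="A00"/>
    <SubClass code="A04"/>
  </Class>
  <Class code="A00">
    <SuperClass code="A00-A09"/>
  </Class>
  <Class code="A04"/>
</ClaML>"""


def ordered_subclasses(cls):
    """Subclass codes in the order stated within the class itself."""
    return [sub.get("code") for sub in cls.findall("SubClass")]


def hierarchy_mismatches(root):
    """Return (parent, child) links that are stated in only one direction."""
    classes = {c.get("code"): c for c in root.iter("Class")}
    down = {(code, sub.get("code"))
            for code, c in classes.items() for sub in c.findall("SubClass")}
    up = {(sup.get("code"), code)
          for code, c in classes.items() for sup in c.findall("SuperClass")}
    return sorted(down ^ up)  # symmetric difference of the two link sets


root = ET.fromstring(DOC)
block = root.find("Class[@code='A00-A09']")
order = ordered_subclasses(block)
mismatches = hierarchy_mismatches(root)
```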
5. Conclusion
Relatively modest revisions in the TS have proven to drastically improve expressive
power and internal consistency of the representation. First experimental
implementations of the revised TS show that it can represent the layout of texts within
the ICD-10 and format them in a way comparable to the printed book.
References
[1] Extensible Markup Language (XML) 1.0 (Third Edition). W3C 2004 February,
http://www.w3.org/TR/2004/REC-xml-20040204
[2] CEN/TS 14463 Health Informatics - A syntax to represent the content of medical classification systems.
Brussels: CEN; 2002.
[3] Rossi Mori A, Consorti F, Galeazzi E. Standards to support development of terminological systems for
healthcare telematics. Methods Inf Med 1998 Nov;37(4-5):551-63.
[4] Zanstra PE, van der Haring EJ, Cornet R. Introduction of a Clinical Terminology in The Netherlands -
Needs, Constraints, Opportunities. nictiz 2003,
http://www.nictiz.nl/kr_nictiz/uploaddb/downl_object.asp?atoom=2128&VolgNr=1
[5] History of the development of the ICD. WHO 2000,
http://www.who.int/entity/classifications/icd/en/HistoryOfICD.pdf
[6] WHO business plan for medical classifications 2005-2010. WHO 2005,
http://www.who.int/classifications/BuisinessPlan.pdf
[7] ten Napel H, van der Haring EJ. Dutch ICD-10 and ICF in a CEN Technical Standard Format for
version control and maintenance. WHO 2003 October,
http://www.rivm.nl/who-fic/Colognepapers/cologne35.rtf
[8] Classification Manager. Kermanog 2005, http://www.kermanog.com/clam/index.html
[9] ISO TC154. Data elements and interchange formats -- Information interchange -- Representation of
dates and times. Geneva: ISO; 2004
[10] The Dublin Core Metadata Registry. Dublin Core Metadata Initiative 2006, http://dublincore.org
[11] CEN EN 1068:2005 Health Informatics - Health care Information Interchange - Registration of Coding
Schemes. Brussels: CEN; 2005.
[12] Mealling M. A URN Namespace of Object Identifiers, RFC 3061. NN 2001 February,
http://rfc.sunsite.dk/rfc/rfc3061.html
[13] Cascading Style Sheets, level 1, W3C Recommendation. W3C 1999 January 11,
http://www.w3.org/TR/1999/REC-CSS1-19990111
[14] Extensible Stylesheet Language (XSL) Version 1.0. W3C 2001 October,
http://www.w3.org/TR/2001/REC-xsl-20011015/
[15] Hoelzer S, Schweiger R, Dudeck J. Transparent ICD and DRG coding using information technology:
linking and associating information sources with the eXtensible Markup Language. J Am Med Inform
Assoc 2003;10(5):463-9.