All content in this area was uploaded by Anees Ul Hassan on Jun 26, 2018
Recommendation statements identification in clinical
practice guidelines using heuristic patterns
Musarrat Hussain
Department of Computer Science and
Engineering
Kyung Hee University
Yongin, South Korea
musarrat.hussain@oslab.khu.ac.kr
Anees Ul Hassan
Department of Computer Science and
Engineering
Kyung Hee University
Yongin, South Korea
anees@oslab.khu.ac.kr
Jamil Hussain
Department of Computer Science and
Engineering
Kyung Hee University
Yongin, South Korea
jamil@oslab.khu.ac.kr
Sungyoung Lee*
Department of Computer Science and
Engineering
Kyung Hee University
Yongin, South Korea
sylee@oslab.khu.ac.kr
Muhammad Sadiq
Department of Computer Science and
Engineering
Kyung Hee University
Yongin, South Korea
sadiq@oslab.khu.ac.kr
Abstract—Clinical Practice Guidelines (CPGs) are considered an
effective tool for improving and standardizing healthcare. A
number of organizations develop and maintain clinical guidelines
to provide state-of-the-art healthcare services. However, the
guidelines contain background information along with
disease-specific information, which makes them difficult to use
during actual practice and to transform into a
machine-interpretable format. The most relevant information needs
to be isolated from irrelevant information. In this study, we
propose a methodology that separates relevant information, known
as recommendation statements, from irrelevant information by
using heuristic patterns. We extracted 10 patterns in a
semi-automatic manner from a hypertension guideline, evaluated
them for identifying recommendation statements in the guideline,
and achieved 85.54% accuracy. The extracted patterns help domain
experts obtain disease-specific information in real time during
the clinical workflow. Moreover, the approach can also serve as a
preprocessing step during the transformation of guidelines to
computer-interpretable models.
Keywords—Heuristic patterns, Recommendation Statements
Identification, Clinical Practice Guidelines, Information Extraction
(IE)
I. INTRODUCTION
Clinical Practice Guidelines (CPGs) are defined as
“systematically developed statements to assist practitioners and
patient decisions about appropriate healthcare for specific
circumstances” [1]. The emergence of and ease of access to the
internet have increased the influence of CPGs. A CPG is an
essential medium for standardizing and disseminating medical
knowledge. The primary aim of CPGs is to reduce practice
variations and provide evidence-based treatment [2]. The services
that CPGs can offer include retrieval of specific
decision-relevant information, patient data summarization,
context-specific notification generation, and patient-relevant
clinical option selection [3]. CPG services employed in clinical
workflows can improve healthcare quality [4]. Along with other
information, CPGs contain recommendation statements that specify
the process flow based on the patient's condition and provide
details of “what to do” with the patient [5]. For example, the
statement “the evidence statements supporting the
recommendations are in the online supplement” is an informative
sentence, while “in the general population aged ≥ 60 years,
initiate pharmacologic treatment to lower blood pressure (BP) at
systolic blood pressure (SBP) ≥ 150 mm Hg or diastolic blood
pressure (DBP) ≥ 90 mm Hg and treat to a goal SBP < 150 mm Hg and
goal DBP < 90 mm Hg” is a recommendation statement that describes
in detail the actions to be taken for the target users. Both
statements are taken from the hypertension guideline [6] used in
this study.
The goal of CPGs can be accomplished by integrating them into
clinical workflows. Publishing CPGs in medical journals faces
dissemination issues and is ineffective for changing clinical
practice behaviors [7]. Most healthcare practitioners are unaware
of the existence of CPGs, and even when directed toward the
relevant CPG they experience difficulties in understanding it
[8]. Those who know about CPGs usually do not use them during
real practice. One primary reason may be the structure and format
of CPGs. They are written in natural language and contain other
related information along with disease-specific information.
Identifying disease-specific and scenario-based information in
real time therefore seems infeasible. This deficiency can be
overcome by automatically identifying recommendation statements
in CPGs.
CPGs can either be used by human experts during the healthcare
flow or be transformed to a machine-understandable format to
become part of recommendation systems. Both cases require
understanding the CPG and identifying the relevant information.
Manually separating recommendation statements from other
information is a time-consuming and error-prone task. Because
CPGs contain disease-specific information, the task also requires
domain knowledge. The human burden needs to be reduced by
automating this process.
Complete automation of the recommendation statement
identification process faces many issues. The major hurdle is the
variation in linguistic expression. Not all recommendation
statements follow the “if condition then action” format. CPGs are
written by different organizations, and most organizations have
different guideline writing formats. However, in all formats, the
recommendation statements contain some hidden patterns and
specifier phrases. These patterns need to be identified so that
they can later be used for automatic identification of
recommendation statements.
The objective of this study is to identify heuristic patterns
used in the recommendation statements of CPGs. These patterns can
later be used for extracting recommendation statements from CPGs.
This kind of study provides a twofold advantage. It can be used
as a preprocessing step for transforming CPGs to a
computer-interpretable format. It can also help healthcare
providers find disease-specific information in CPGs in real time
during the clinical workflow. We used the hypertension guideline
[6] for heuristic pattern identification and evaluation. We
extracted a total of 10 patterns for recommendation statement
identification, which achieved 85.54% accuracy on the test
statements.
The rest of the paper is organized as follows. Section II
presents related work. Section III describes our proposed
recommendation statement identification methodology. Section IV
presents the results achieved, and Section V concludes the paper
and presents future directions.
II. RELATED WORK
The history of CPGs begins in the late 1970s with the National
Institutes of Health Consensus Development Program, which issued
consensus statements for improving healthcare quality through
identification and adoption of best practices [9]. The ease of
access to CPGs has attracted researchers to help healthcare
providers by integrating best practices and up-to-date research
into clinical workflows. CPGs can be incorporated into the
clinical workflow more easily by transforming them into a
machine-understandable format.
R. Serban et al. [10] developed a methodology that uses
linguistic patterns to assist human modelers in CPG formalization
and reduce manual effort. The authors used templates and examined
the role of knowledge templates in the formalization and
modularization of CPGs. The methodology used a medical domain
ontology for generating linguistic templates. The activities
involved are: extract patterns, select core patterns, apply
patterns, generate an executable model, and evaluate the
executable model. This methodology produced reusable guideline
blocks/templates for authoring and formalizing CPGs. However, the
approach needs a customized domain ontology for mapping concepts
while generating templates. The authors used classes from the
Unified Medical Language System (UMLS) together with customized
classes for generalization. Generating a customized ontology and
generalizing the UMLS classes is a tedious task, and erroneous
class generalization may lead to incorrect templates and
incorrect guideline modeling.
K. Kaiser et al. [11] proposed a system to analyze the
formulation of activities in CPGs. The authors used UMLS classes
to identify the patterns employed for representing activities and
the semantic relations among them. The study consists of four
steps. In the first step, they analyze the CPG with regard to
actions and procedures. In the second step, they explore the
relationships between actions and procedures. In the third step,
they expand the semantic types of the identified relations for
generalization. Finally, they generate a dictionary of the
identified actions, procedures, and their relations. They used
the “Induction of labour” [12] guideline, which contains 48
action statements among 120 statements, for extracting the
patterns and their relations. The experiment was performed on the
“Management of labor” guideline and achieved a recall of 67% and
a precision of 97%. Despite these results, the system is limited
to identifying actions and procedures. Other dimensions of
information, such as intentions, effects, or parameters, are
ignored, and the system has no capability to detect them.
R. Wenzina et al. [5] proposed a rule-based method that combines
linguistic information with the semantic information of UMLS
semantic types. The authors hypothesized that each guideline
statement has its own domain-dependent linguistic and semantic
patterns. They also introduced a weighting coefficient called the
relevance rate, which indicates how relevant a statement is for
modeling, that is, whether the statement is crucial for the
clinical pathway. The relevance rate enabled the authors to
identify condition-action combinations. An asthma guideline was
used for pattern extraction. The patterns extracted from the
guideline consisted of 12 “if” and 4 “should” statements. The
analysis showed that rules of type “if” gave better results than
those of type “should”.
H. Hematialam et al. [13] proposed an automatic technique for
finding and extracting recommendation statements in CPGs. The
authors used supervised machine learning models (Naïve Bayes,
J48, and Random Forest) that classify guideline statements into
three categories: NC (no condition), CA (condition-action), and
CC (condition-consequence). A domain expert annotated three
guidelines (hypertension, chapter 4 of asthma, and
rhinosinusitis), and the authors used these guidelines as the
training set for the machine learning models. The authors used
Part of Speech (POS) tags as features to make the models more
domain independent. Each condition-action statement has a
modifier, and the most used modifiers in the guidelines studied
by the authors were “if”, “in”, “for”, “to”, “which”, and “when”.
The statements were parsed using the CoreNLP Shift-Reduce
Constituency parser, and the candidate statements were found
using regular expressions. The identified candidate statements
were transformed/paraphrased to the “if condition, then
consequences” format to be used for rule generation. The models
used by the authors are one-shot models that require retraining
each time the training dataset changes.
W. Gad El-Rab et al. [14] proposed a framework for active
dissemination of CPGs and automatic knowledge extraction for
Clinical Decision Support Systems (CDSS). The framework automates
some manual activities to reduce the effort of human modelers. It
follows a multi-step approach and uses the Unstructured
Information Management Architecture (UIMA) for identifying
medical concepts. The tasks performed in the pipeline include XML
parsing, text cleansing, medical concept tagging, medical tag
disambiguation, clinical context pattern detection, clinical
context filtering, and clinical context mapping. The framework
achieved good outcomes; however, its primary obstacle is that it
requires the clinical context type. Due to the lack of a
pre-annotated test data set, the framework was tested on only two
clinical context types.
Clinicians can exploit CPGs by integrating and using them in real
practice. However, existing CPGs are published in an unstructured
format that machines cannot directly understand, which diminishes
the usefulness of CPGs; as a result, most of the latest research
remains confined to academic use. CPGs need to be transformed to
a machine-interpretable format so that they can be used in real
clinical workflows and by CDSSs for up-to-date recommendation
generation.
III. PROPOSED METHODOLOGY
CPGs can be transformed to a machine-interpretable model in three
steps: recommendation statement identification, recommendation
statement understanding, and rule generation, as depicted in
Fig. 1. Recommendation statement identification focuses on
separating the statements that represent recommendations in CPGs
from the other statements. Recommendation statement understanding
concerns the analysis of the identified statements to find key
features (conditions and actions). In rule generation, the
identified key features are transformed into a
machine-interpretable model (in our case, if-then rules). In this
work, our main focus is on the first step, i.e., recommendation
statement identification.
In the CPG conversion process, recommendation statement
identification, being the first step, has a pivotal role, and all
subsequent steps depend on its result. Erroneous statement
identification or missing relevant information ultimately leads
to incorrect model/rule generation. To identify these statements
accurately, we thoroughly analyzed the recommendation statements
of the hypertension guideline [6] annotated by a domain expert.
This analysis led us to the conclusion that each recommendation
statement contains some clue word(s), also known as a heuristic
pattern, through which recommendation statements can be
differentiated from non-recommendation statements.
The functional flow of the proposed methodology is depicted in
Fig. 2. The proposed approach identifies and filters out
recommendation statements using a two-step process: preprocessing
and recommendation identification. Preprocessing deals with
reading the guideline and formatting it appropriately for the
successive steps, while recommendation identification identifies
the intended recommendation statements. The details of these two
steps are as follows.
Fig. 1. Steps required for CPGs conversion
A. Preprocessing
Earlier research has reported that nearly 50% to 80% of the time
of the entire process is spent on preprocessing, which shows the
importance of this step [15]. The purpose of the preprocessing
phase is to convert the original input data into a
data-mining-ready structure. Preprocessing in the proposed
workflow comprises three sub-steps. First, the document reader
reads the guideline documents. Second, the Word documents are
transformed into a document object (DOM object). Finally, the
document is split into sentences by the Sentence Extractor. These
sentences are then passed to the Recommendation Identification
component for filtering the required statements.
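The sentence-splitting sub-step can be illustrated with a minimal
Python sketch. The paper does not say how the Sentence Extractor is
implemented, so the boundary rule and the function name
`extract_sentences` below are assumptions, and the sketch assumes the
guideline text has already been read into a plain string.

```python
import re
from typing import List

# Assumed boundary rule: sentence-final punctuation, whitespace, then a capital letter.
# This is a simplification; abbreviations such as "e.g. Drug" would be split wrongly.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def extract_sentences(text: str) -> List[str]:
    """Split raw guideline text into a list of sentences (hypothetical helper)."""
    # Collapse line breaks and repeated spaces left over from the document layout.
    normalized = " ".join(text.split())
    if not normalized:
        return []
    return [part.strip() for part in _BOUNDARY.split(normalized) if part.strip()]


if __name__ == "__main__":
    sample = "Treat to a goal SBP below 150 mm Hg.\nThe panel reviewed\nthe evidence."
    for sentence in extract_sentences(sample):
        print(sentence)
```

A production pipeline would more likely rely on a trained sentence
segmenter, since clinical text is rich in abbreviations and units that
a single regular expression handles poorly.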
B. Recommendation Identification
The recommendation statement identification task is formulated as
a classification task that assigns each guideline statement to
one of two categories: Recommendation Sentences (RS) and
Non-Recommendation Sentences (NRS). The classification is
performed based on the extracted heuristic patterns. The
dictionary-based tagger matches the input statements against the
patterns stored in the heuristic pattern base.
A statement is tagged as RS if any pattern matches; otherwise, it
is tagged as NRS. The output of the dictionary-based tagger is
the guideline statements with their corresponding tags. The Tag
Filter component then keeps the statements tagged as RS and
discards the sentences tagged as NRS.
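The tagger and filter described above can be sketched in a few lines of
Python. This is an illustration under stated assumptions rather than the
authors' implementation: the function names `tag_statements` and
`filter_recommendations` are hypothetical, the pattern base is passed in
as a list of regular-expression strings, and patterns are applied as
written because the paper does not state whether matching is
case-sensitive.

```python
import re
from typing import Iterable, List, Tuple

RS, NRS = "RS", "NRS"


def tag_statements(statements: Iterable[str],
                   pattern_base: Iterable[str]) -> List[Tuple[str, str]]:
    """Tag a statement RS if any pattern matches it, otherwise NRS."""
    compiled = [re.compile(pattern) for pattern in pattern_base]
    return [
        (statement, RS if any(rx.search(statement) for rx in compiled) else NRS)
        for statement in statements
    ]


def filter_recommendations(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep the RS-tagged statements and discard the NRS-tagged ones."""
    return [statement for statement, tag in tagged if tag == RS]


if __name__ == "__main__":
    # Two of the paper's extracted patterns, used here as a small pattern base.
    patterns = [r".*treatment (should|with|to).*", r".*regardless of.*"]
    # Both sentences are quoted (the second one shortened) from the introduction.
    sentences = [
        "the evidence statements supporting the recommendations are in the online supplement",
        "initiate pharmacologic treatment to lower blood pressure",
    ]
    print(filter_recommendations(tag_statements(sentences, patterns)))
```

Keeping tagging and filtering as separate functions mirrors the two
components named in the text, and lets the tagged output be inspected
before NRS sentences are dropped.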
The proposed approach is beneficial in two ways. It can assist
healthcare professionals in identifying patient-specific
information in the guideline during real clinical scenarios. It
can also work as a preprocessing step for the transformation of
CPGs to a machine-interpretable format and for other knowledge
extraction systems that extract knowledge from CPGs and store it
in a machine-interpretable format.
Fig. 2. Proposed system architecture
IV. RESULT AND DISCUSSION
We analyzed the published hypertension guideline [6], which
contains a total of 278 statements, including 78 recommendation
statements annotated by a domain expert. The same guideline was
used and annotated by H. Hematialam et al. [13] in their study,
as discussed earlier. They trained and evaluated Naïve Bayes,
J48, and Random Forest algorithms and achieved accuracies of 74%,
74%, and 81%, respectively. In our study, we considered all CA
(Condition-Action), CC (Condition-Consequence), and A (Action)
statements as recommendation statements. We divided the guideline
into a training set of approximately 70% (195 statements,
including 58 recommendation statements) and a test set of
approximately 30% (83 statements, including 20 recommendation
statements). We analyzed the training set and identified 10
heuristic patterns from it. The extracted patterns are given in
Table I.
TABLE I. EXTRACTED HEURISTIC PATTERNS
No   Pattern
1    .*treatment (should|with|to).*
2    .*(recommend(ed)?) treatment.*
3    .*should (include|continue).*
4    .*(increase|decrease) .*dose.*
5    .*(add|remove) (.*) drug.*
6    .*Recommendation \d+\s+:.*
7    .*(dis)?continu(e|ed|ing|ation).*
8    .*to improve.*
9    .*(patient(s)?)?with (disease).*
10   .*regardless of.*
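To make the patterns concrete, the following Python sketch copies the
ten expressions from Table I and reports which pattern numbers fire for
a given statement. The helper name `matching_pattern_numbers` is
hypothetical. Pattern 9 is kept exactly as printed, so `(disease)`
matches only the literal word; it may have been intended as a
placeholder for disease names, which is an assumption the sketch does
not act on.

```python
import re
from typing import List

# Patterns copied verbatim from Table I; list index 0 holds pattern No. 1.
HEURISTIC_PATTERNS = [
    r".*treatment (should|with|to).*",
    r".*(recommend(ed)?) treatment.*",
    r".*should (include|continue).*",
    r".*(increase|decrease) .*dose.*",
    r".*(add|remove) (.*) drug.*",
    r".*Recommendation \d+\s+:.*",
    r".*(dis)?continu(e|ed|ing|ation).*",
    r".*to improve.*",
    r".*(patient(s)?)?with (disease).*",  # "(disease)" is matched literally here
    r".*regardless of.*",
]

_COMPILED = [re.compile(pattern) for pattern in HEURISTIC_PATTERNS]


def matching_pattern_numbers(statement: str) -> List[int]:
    """Return the 1-based Table I numbers of every pattern that matches."""
    return [number for number, rx in enumerate(_COMPILED, start=1)
            if rx.search(statement)]


if __name__ == "__main__":
    # Shortened form of the recommendation quoted in the introduction.
    print(matching_pattern_numbers(
        "initiate pharmacologic treatment to lower blood pressure"))  # [1]
    # The informative sentence from the introduction matches no pattern.
    print(matching_pattern_numbers(
        "the evidence statements supporting the recommendations "
        "are in the online supplement"))  # []
```

Reporting the matching pattern numbers, rather than only a yes/no tag,
makes it easier to see which heuristics are responsible for false
positives when the pattern base is revised.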
We evaluated the extracted patterns on the remaining 83
statements of the guideline and achieved an accuracy of 85.54%.
We compare the proposed system with that of H. Hematialam et al.
[13] because both use the same hypertension guideline. The
comparison is depicted in Fig. 3. The proposed system achieved
higher accuracy than the models of Hematialam et al. The
confusion matrix and detailed measures of the proposed approach
are given in Table II and Table III, respectively.
TABLE II. CONFUSION MATRIX OF PROPOSED APPROACH
Actual   Predicted RS   Predicted NRS
RS       14 (TP)        6 (FN)
NRS      6 (FP)         57 (TN)
TABLE III. DETAILED MEASURES OF THE PROPOSED APPROACH
Measure                     Value    Derivation
Sensitivity                 0.7000   TPR = TP / (TP + FN)
Specificity                 0.9048   SPC = TN / (FP + TN)
Precision                   0.7000   PPV = TP / (TP + FP)
Negative Predictive Value   0.9048   NPV = TN / (TN + FN)
False Positive Rate         0.0952   FPR = FP / (FP + TN)
False Discovery Rate        0.3000   FDR = FP / (FP + TP)
False Negative Rate         0.3000   FNR = FN / (FN + TP)
Accuracy                    0.8554   ACC = (TP + TN) / (P + N)
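The reported measures can be re-derived from the confusion matrix with a
short Python check. The sketch reads the Table II counts as TP = 14,
FN = 6, FP = 6, and TN = 57 (rows taken as actual classes, which is the
reading consistent with the reported sensitivity and specificity) and
uses the standard definition FDR = FP / (FP + TP). The function name
`detail_measures` is hypothetical.

```python
from typing import Dict


def detail_measures(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute the Table III measures from confusion-matrix counts."""
    positives = tp + fn  # actual recommendation statements (P)
    negatives = fp + tn  # actual non-recommendation statements (N)
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "precision": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "false_positive_rate": fp / negatives,
        "false_discovery_rate": fp / (fp + tp),
        "false_negative_rate": fn / positives,
        "accuracy": (tp + tn) / (positives + negatives),
    }


if __name__ == "__main__":
    for name, value in detail_measures(tp=14, fn=6, fp=6, tn=57).items():
        print(f"{name}: {value:.4f}")
```

With these counts the accuracy is (14 + 57) / 83 = 71 / 83, which rounds
to 0.8554 and matches the 85.54% reported in the text.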
Fig. 3. Comparison of proposed vs existing approaches
V. CONCLUSION AND FUTURE WORK
In this paper, we developed a methodology that classifies
guideline statements into recommendation and non-recommendation
statements using heuristic patterns. Using our approach, we
achieved higher accuracy than the existing work on the same
guideline. The proposed filtering of recommendation statements
can reduce the burden on healthcare practitioners and assist them
in identifying scenario-based and disease-specific evidence
during real practice. It can also facilitate guideline-based
machine learning model generation. In the future, we want to
generalize this work by using POS tags and the UMLS semantic
network for the extracted patterns to reduce domain dependency.
ACKNOWLEDGMENT
This research was supported by the MSIT (Ministry of Science and
ICT), Korea, under the ITRC (Information Technology Research
Center) support program (IITP-2017-0-01629) supervised by the
IITP (Institute for Information & communications Technology
Promotion). This research was also supported by the Basic Science
Research Program through the National Research Foundation of
Korea (NRF) funded by the Ministry of Science, ICT & Future
Planning (2011-0030079).
REFERENCES
[1] K. N. Lohr and M. J. Field, Clinical practice guidelines:
directions for a new program, vol. 90. National Academies Press, 1990.
[2] D. A. Davis and A. Taylor-Vaisey, “Translating guidelines into
practice: a systematic review of theoretic concepts, practical experience
and research evidence in the adoption of clinical practice guidelines,”
Can. Med. Assoc. J., vol. 157, no. 4, pp. 408–416, 1997.
[3] J. Fox, V. Patkar, I. Chronakis, and R. Begent, “From practice
guidelines to clinical decision support: closing the loop,” J. R. Soc.
Med., vol. 102, no. 11, pp. 464–473, 2009.
[4] K. Kaiser, S. Miksch, and S. W. Tu, Computer-based support for
clinical guidelines and protocols: proceedings of the symposium on
computerized guidelines and protocols (CGP 2004). 2004.
[5] R. Wenzina and K. Kaiser, “Identifying condition-action sentences
using a heuristic-based information extraction method,” in Process
Support and Knowledge Representation in Health Care, Springer, 2013,
pp. 26–38.
[6] P. A. James et al., “2014 evidence-based guideline for the
management of high blood pressure in adults: report from the panel
members appointed to the Eighth Joint National Committee (JNC 8),”
JAMA, vol. 311, no. 5, pp. 507–520, 2014.
[7] D. A. Davis, M. A. Thomson, A. D. Oxman, and R. B. Haynes,
“Evidence for the effectiveness of CME,” JAMA, vol. 268, no. 9, pp.
1111–7, 1992.
[8] E. Kilsdonk, L. W. Peute, R. J. Riezebos, L. C. Kremer, and M. W.
Jaspers, “From an expert-driven paper guideline to a user-centred
decision support system: A usability comparison study,” Artif. Intell.
Med., vol. 59, no. 1, pp. 5–13, 2013.
[9] P. B. Jacobsen, “Clinical practice guidelines for the psychosocial
care of cancer survivors,” Cancer, vol. 115, no. S18, pp. 4419–4429,
2009.
[10] R. Serban, A. ten Teije, F. van Harmelen, M. Marcos, and C.
Polo-Conde, “Extraction and use of linguistic patterns for modelling
medical guidelines,” Artif. Intell. Med., vol. 39, no. 2, pp. 137–149,
2007.
[11] K. Kaiser, A. Seyfang, and S. Miksch, “Identifying actions
described in clinical practice guidelines using semantic relations,” in
KR4HC 2010-2nd International Workshop on Knowledge
Representation for Health Care, 2010, pp. 99–108.
[12] National Collaborating Centre for Women’s and Children’s Health
(UK), Induction of Labour. London: RCOG Press, 2008.
[13] H. Hematialam and W. Zadrozny, “Identifying Condition-Action
Statements in Medical Guidelines Using Domain-Independent Features,”
arXiv preprint arXiv:1706.04206, 2017.
[14] W. Gad El-Rab, O. R. Zaïane, and M. El-Hajj, “Formalizing
clinical practice guideline for clinical decision support systems,” Health
Informatics J., vol. 23, no. 2, pp. 146–156, 2017.
[15] V. Srividhya and R. Anitha, “Evaluating preprocessing techniques
in text categorization,” Int. J. Comput. Sci. Appl., vol. 47, no. 11, pp.
49–51, 2010.