
Problem-oriented patient record summary: An early report on a Watson application


Murthy Devarakonda, Dongyang Zhang, Ching-Huei Tsou, Mihaela Bornea
IBM Research and Watson Group
Yorktown Heights, NY
Abstract As the use of Electronic Medical Records (EMRs)
becomes widespread, the amount of data in an EMR becomes a
challenge for its comprehension. As a part of a larger effort of
adapting IBM Watson to the medical domain, we developed
problem-oriented EMR summarization to address this issue. The
problem-orientation refers to the central role of a patient's
medical problems in the summary. The summarization uses a
generated problem list, relates these generated medical problems
with relevant clinical data, and organizes the clinical data in a
medically meaningful manner. Watson analytics are used for
creating the summarization. This is a step in building the next
generation EMR, one that is based not just on record keeping but
on a conceptual understanding of medicine, thereby
crossing the threshold from record storage to an intelligent entity
for clinical decision making.
Keywords—Electronic Medical Records; Problem-oriented
patient record summary; Summarization; Clinical summarization;
Medical concepts; Watson; UMLS; Text analysis;
I. INTRODUCTION
As Electronic Medical Records (EMRs) are widely
adopted in patient care, the data they store for a patient has also
grown accordingly. A typical EMR contains several hundred
unstructured plain text clinical notes, as well as large
amounts of semi-structured data, such as medications ordered,
lab test values, procedures, and vitals. So, the very technology
that allows recording every aspect of patient care is also
(quite unintentionally) making the record difficult to comprehend
quickly. Since manual summarization is time consuming and
prone to errors, there is a pressing need for automatic methods.
Summarization, in particular text summarization, is a well-
known problem in Artificial Intelligence. The task is one of
maximizing the information coverage while minimizing the
redundancy within a limited amount of space. Developing
accurate patient record summaries requires sophisticated
medical semantic analysis of EMR data and is a fertile ground
for applying the IBM Watson technology.
Watson effectively analyzed vast amounts of unstructured
text to answer natural language questions in defeating two all-
time winning champions on the American TV quiz show
Jeopardy! [1] [2]. Since then, we have been adapting Watson to
the medical domain. The value Watson provides in EMR
summarization is in identifying key relationships among
clinical concepts with a granularity that matches clinical
decision making, e.g. inferring the purpose of specific
medications that a patient is taking for curing a disease or
palliative relief of symptoms.
Text summarization research goes back to the 1950s [3].
Today, it is generally accepted that a good summary should
include the most important information and it should be short
[4] [5]. While text summarization is researched extensively,
clinical summarization, developing a summary of a patient’s
clinical data, is at a nascent stage. The key difference is in the
nature of data from which the summary is produced. Unlike in
text summarization, a patient’s clinical data is a mix of
unstructured plain text and semi-structured data. While the
purpose of text summarization is often amorphous, clinical
summarization has one clear goal, that is, to help a physician
care for a patient, which is the goal of our summarization.
The cognitive process in manually summarizing a patient
record sheds some light on the requirements for automatic
summarization. When asked to create a summary from a
previously unseen EMR, it was reported [6] that physicians
spend significant time studying clinical notes and labs.
Diagnostic procedures and medications are the next most
reviewed items. Physicians used a strategy of identify, validate,
and ascertain status, as a way to understand patient problems.
An automated summary should efficiently provide the
information accessed in the manual process, and indeed that is
a part of our summarization.
In the seminal paper on keeping effective patient records,
Weed [7] suggested that medical records should be organized
by patient problems. He called records organized this way
problem-oriented medical records. Diagnosing, treating, and
managing a patient’s medical problems should be central to
keeping a patient record. Therefore, it makes sense to organize
the patient summary around patient problems.
Succinct visualization of a patient record can be
considered a form of summarization [8]. The AnamneVis [9]
framework uses the journalistic approach of the Five W’s (who,
when, what, where, and why) to show a patient record. A
medical incident is shown as a connected chain of symptoms,
tests, diagnoses, and treatment. Our goal is to develop
information content for summary, but not its visualization per
se, and therefore, our summary can drive this or other similar
visualization techniques [10].
To appear in IEEE HealthCom 2014 (16th Int’l Conf. on E-health: Networking, Applications & Services), Natal, RN, Brazil.
What should the summarization model be, given that its purpose
is to provide a clinician with a quick and easy way to grasp the
most important information about a patient? What are the
semantic elements in this model where the Watson technology
plays an important role? This section discusses these topics.
An approach to clinical summarization involving
increasingly sophisticated abstractions of aggregation,
organization, reduction and/or transformation, interpretation,
and synthesis is proposed in [11]. Such a linear abstraction
works well for a lab or a single patient problem, but a model
for the extensive collection of data types found in a typical
EMR should include semantic relationships that exist among
various data types. For instance, a lab may be associated with a
problem in the sense that it is indicative of the problem status.
So, our model consists of multiple types of clinical data, as
well as relationships among the data. We group the elements of
the data aggregates in a clinically meaningful way. Numerical
data is interpreted and presented concisely, and detailed data is
only one or two clicks away. Details are described below.
A. Summarization Model
Since a patient record contains various collections of data
about a patient and their care, i.e. problems, medications, labs,
procedures, allergies, and so on, the natural way to achieve the
coverage and brevity as needed for summarization is to start
with aggregates of these collections, which we call clinical
data aggregates of a patient.
Elements of each of these aggregates may themselves be
summarized to some level of abstraction as conceptualized in
[11]. For example, results of a lab test may be organized,
transformed and interpreted such that the summary shows the
latest value and an indication as to whether it is now, or has
ever been, out of the normal range. By clicking on it (as
explained later) a detailed timeline can be seen with abnormal
values highlighted.
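The lab abstraction just described (latest value plus an out-of-range indicator) can be sketched in a few lines. This is a minimal illustration and not the application's actual code: the `LabResult` type, the `summarize_lab` function, and the reference range used in the example are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabResult:
    taken_on: date
    value: float


def summarize_lab(results, ref_low, ref_high):
    """Reduce a lab's history to its latest value plus abnormality flags."""
    ordered = sorted(results, key=lambda r: r.taken_on)

    def abnormal(result):
        return not (ref_low <= result.value <= ref_high)

    latest = ordered[-1]
    return {
        "latest_value": latest.value,
        "latest_date": latest.taken_on,
        "abnormal_now": abnormal(latest),
        "abnormal_ever": any(abnormal(r) for r in ordered),
    }


# Hypothetical Hemoglobin A1C history and reference range, for illustration only.
a1c_history = [
    LabResult(date(2013, 7, 2), 6.9),
    LabResult(date(2013, 1, 10), 8.1),
    LabResult(date(2014, 1, 15), 5.4),
]
a1c_summary = summarize_lab(a1c_history, ref_low=4.0, ref_high=5.6)
```

The full `ordered` list is what a detailed timeline view would draw from, with the same `abnormal` test deciding which points to highlight.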
The next key part of our summarization is the clinical
relationships, which identify semantic relations between the
elements of the aggregates. For example, a problem is treated
by one or more medications. Neither the problem data
aggregate nor the medications data aggregate contains this
important semantic association. These relationships are not
directly present in an EMR, but they are the result of a
physician’s judgment. As described later, we apply the Watson
technology to identify such semantic relations.
The next element of the model is the similarity of elements
in a data aggregate. The nearness attribute identifies how
closely an element is related to the other elements of the
aggregate. For example, for the medications aggregate, the
clinically relevant feature space for determining the nearness
consists of the pharmacologic mechanisms of a medication and
the classes of pharmacologic effects on human physiology.
This is an example of how our summarization determines the
clinically meaningful grouping of aggregates.
One of the key data aggregates is the patient encounter
clinical notes, i.e. clinician written notes for patient contact
points. A clinician may be a primary care physician, specialist,
emergency medicine doctor, or a nurse. Each contact results in
a clinical note being written. Thus clinical notes and patient
encounters correspond one to one. The encounters, and therefore
the clinical notes, need to be categorized by practice for
subsequent reference, e.g. to help answer the question,
when did the patient last see a cardiologist? While the clinical
notes are a significant part of an EMR, the practice and
specialty data is often missing from the header of a clinical note,
especially when the service is provided by a physician from an
outside clinic. So, our summarization involves analytics to
identify this missing data and then use it to categorize clinical
notes (and thus encounters).
Yet another element of the summarization which we have
partially implemented is a filter that determines the data to
show and/or prioritize based on the specialty of the clinician.
For example, a cardiologist may want to see only heart related
problems, medications, labs, and so on, or may want this data
prioritized over the rest.
B. Problem-Oriented Summary
The central aggregate of this summarization is a generated
problem list, and hence we refer to this summarization as the
problem-oriented patient summary. The problem list, which is
a list of the most important medical disorders of a patient that
require care and treatment [7], is abstracted or “generated” by
our application from the clinical notes text and other data in the
patient’s EMR. This is different from (and more accurate than)
the data in the problems section of an EMR, which is typically
entered by the clinical staff (and not curated by physicians,
hence not consistently reliable). The details of our problem list
generation are beyond the scope of this paper, but we note that
the recall and precision of the generated problem list are far
higher than the entered problem list based on the ground truth
created by medical experts on a set of actual patient records.
Navigation to other clinical aggregates works best from
the problems list aggregate because all the clinical relationships
start with it. For navigational purposes, the other aggregates are
secondary to the problem list. It is expected that a physician
would start with the problem list and then explore the other
data aggregates.
The problem-oriented summarization model described so
far is shown in Figure 1. Notice the clinical data aggregates of
the summary, the centrality of the problem list, and the clinical
relationships of a problem to other clinical data. The value of
such a summarization is the ability to see the most relevant
patient data from a problem perspective. It is, however,
possible to consider more than one problem at a time, and in
that case, the relationships would represent the “union” of
Figure 1 Summarization model showing generated problems list,
the other data aggregates, and clinical relationships among them.
Our patient record summarization consists of the following
data aggregates:
• Generated problem list
• Lab tests
• Timeline of patient encounters
• Social history, allergies, and demographics
Summarization automatically generates the following clinical
• Relationships between the problem list entries and the elements of the other clinical data aggregates
• Clinically meaningful grouping of elements in each data aggregate
• Categorization of patient encounters based on the physician specialty
• Filtered and/or prioritized summary data based on the specialty of the physician using the summary
C. Visualization of Patient Record Summarization
Figure 2 shows the visualization of the patient record
summarization. Each table in the view holds a data aggregate,
and it has a default presentation based on the clinical grouping,
but can also be re-ordered by date, alphabetically, or by other
aggregate-specific characteristics. For example, the generated
problems list table is shown with clinical grouping by default;
however, the table can be re-ordered to show problems by
diagnosis date.
The patient encounters are shown in a timeline and they
are categorized by the clinician type. The Specialties category
can be expanded to see the most frequently visited specialists.
The timeline can be narrowed to focus on a shorter period of
time, rather than the entire time range.
Selecting one or more problems changes the visualization
of several data aggregates in order to highlight elements in
them that are clinically related to the problem(s). As shown in
Figure 3, when Diabetes Mellitus, Non-insulin Dependent is
selected, the related medications that the patient is taking,
Metformin and Glipizide, are highlighted and shown at the top
of the list. Similarly, related labs, procedures, and clinical
encounters are highlighted when a problem is selected. A
physician viewing this summary can therefore quickly grasp
this patient’s treatments and labs for the selected problem(s)
and quickly find relevant notes from previous encounters.
Figure 3 When a medical problem is selected, the dashboard
highlights related patient medications and brings them to the top.
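The highlight-and-reorder behavior for a selected problem can be sketched as a stable sort over a table's rows. This is only an illustration under stated assumptions: the function name, the `relation_scores` dictionary, its score values, and the 0.5 threshold are hypothetical and are not taken from the application.

```python
def reorder_for_selection(medications, selected_problems, relation_scores, threshold=0.5):
    """Return (medication, highlighted) pairs with related medications first."""

    def is_related(med):
        return any(
            relation_scores.get((problem, med), 0.0) >= threshold
            for problem in selected_problems
        )

    flagged = [(med, is_related(med)) for med in medications]
    # sorted() is stable, so the existing order is kept within each group.
    return sorted(flagged, key=lambda pair: not pair[1])


# Hypothetical problem-medication association scores, for illustration only.
diabetes = "Diabetes Mellitus, Non-insulin Dependent"
scores = {
    (diabetes, "Metformin"): 0.9,
    (diabetes, "Glipizide"): 0.8,
    (diabetes, "Aspirin"): 0.1,
}
rows = reorder_for_selection(
    ["Aspirin", "Metformin", "Prednisone", "Glipizide"], [diabetes], scores
)
```

Because a medication counts as related if it is associated with any selected problem, selecting several problems highlights the combined set of their related medications.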
D. One or Two Click Access to Raw Data
If a physician needs to access detailed clinical data about a
patient, in our summary visualization, he/she can do so rapidly
without unnecessary mouse clicks and mouse movement. For
instance, if a physician needs to see the history of a lab,
clicking on the specific lab in the labs table opens a new
window that shows the historical values of the lab (see Figure
4). Similarly, clicking on a medication in the medications table
will bring up the timeline for it.
Figure 2 A dashboard-style visualization of a patient record summary, showing clinical data in tables and patient contacts as a timeline.
Reviewing clinical notes from previous encounters is
sometimes necessary. Clicking on the markers in the
encounters timeline in the summary view opens a window
showing the corresponding clinical note. Relevant clinical
notes for a problem can also be accessed by clicking on the
problem. A list of relevant clinical notes appears, each with a
brief synopsis. The physician can preview the synopsis and
then click to fully open the corresponding clinical note. In the
clinical note, references to the problem are highlighted.
Figure 4 One click access to lab test results (Hemoglobin A1C) to
see data, as well as a plot with reference high and low.
The summarization described above depends on natural
language processing, information retrieval, and semantic
reasoning techniques from the Watson system. The foundation
of the analysis is the identification of medical concepts in an
EMR’s clinical notes and in its metadata, which we
describe next.
A. UMLS concepts extraction
Our analyses use Unified Medical Language System
(UMLS) [12] defined Concept Unique Identifiers (CUIs) to
reason about medical concepts in the EMR data. UMLS
concepts are now commonly used in medical text analytics, as
they facilitate reasoning in a standardized vocabulary. Published
literature often cites the UMLS MetaMap software [12] for
mapping plain text to UMLS concepts; however, we use the
Watson NLP and medical concept analytics, which offer
significant functional refinement and runtime improvement.
Figure 5 shows a typical clinical note and how the text is
annotated to identify UMLS concepts. The natural language
processing component of Watson includes an English language
parser, a concept mapper, a negation detector, and related
technologies. As seen in the figure, we identify various UMLS
concepts (e.g. Diabetes Mellitus) and their semantic types (e.g.
Disease or Syndrome) [13].
Figure 5 Medical concepts in the EMR clinical notes are identified
as UMLS concepts in preparation for reasoning about the EMR
contents using the UMLS standardized vocabulary
In addition to the clinical notes text, we identify UMLS
concepts for the entries in the EMR semi-structured data, such
as the name of a medication. Here, there is no sentence
structure and the term represents a certain clinical entity (e.g. a
medication). Therefore, we can directly find the term’s UMLS
concepts in the corresponding semantic type. This helps to find
accurate concept identifiers for the term.
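The idea of resolving a semi-structured entry only within the semantic type implied by its table can be sketched with a small dictionary lookup. This is a toy illustration, not the Watson concept mapper: the lexicon, the mapping from aggregate to expected semantic types, and the CUIs (placeholders, not real UMLS identifiers) are all assumptions.

```python
# Hypothetical lexicon: term -> list of (placeholder CUI, semantic type).
LEXICON = {
    "iron": [
        ("C0000001", "Pharmacologic Substance"),
        ("C0000002", "Laboratory Procedure"),
    ],
    "metformin": [("C0000003", "Pharmacologic Substance")],
}

# Hypothetical mapping from an EMR table to the semantic types expected in it.
EXPECTED_TYPES = {
    "medications": {"Pharmacologic Substance"},
    "labs": {"Laboratory Procedure"},
}


def lookup_concepts(term, aggregate):
    """Return the CUIs for a term, restricted to the aggregate's semantic types."""
    allowed = EXPECTED_TYPES[aggregate]
    candidates = LEXICON.get(term.strip().lower(), [])
    return [cui for cui, semantic_type in candidates if semantic_type in allowed]


iron_as_medication = lookup_concepts("Iron", "medications")
iron_as_lab = lookup_concepts("Iron", "labs")
```

An ambiguous term such as "Iron" resolves to a different concept depending on the table it came from, which is why knowing the entity type helps find an accurate identifier.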
B. Relationship Scoring
As mentioned earlier, an important part of the
summarization is to establish clinically meaningful
relationships between the generated medical problems and the
elements of the other clinical data aggregates. In order to do so,
the summarization needs to quantify pair-wise clinical
association between the problems and medications, labs, and
Watson used a combination of rule-based and statistical
approaches to learn relations between entities from the broad-
domain corpora for the Jeopardy game [14]. This approach was
later extended to relations between medical concepts in
adapting Watson to the medical domain [15] and was also
enhanced using the UMLS relations between medical concepts
[12] [16]. In addition, Latent Semantic Analysis [17] applied to
the medical corpus can also provide an association score
between medical concepts. An even more accurate approach
called Distributional Relation Detection, incorporating
Distributional Semantics [18], is being developed for scoring
associations between medical concepts in Watson.
We applied two of these methods, Latent Semantic
Analysis and Distributional Semantics, to score relations
between problems and elements from the other clinical
aggregates (e.g. medications). We measured the accuracy of
the two methods by testing with the “ground truth” created by
medical experts for twenty de-identified medical records of
actual patients made available to us by Cleveland Clinic under
an IRB protocol for the study. The medical experts reviewed
the patient medical records and identified the relationships.
Table 1 shows the accuracy of the relations scoring algorithms
for problems and medications compared to the ground truth.
While the accuracy improvement is still in progress, the
preliminary results are encouraging for the Distributional
Semantics approach.
Table 1 The analysis accuracy that determines if a medication treats
a problem is shown for two different analysis methods we tried. The
area under the curve (1.0 is the best) is calculated from the
precision-recall curve at different threshold values for positive
association.
Relationship Detection Algorithm    Area Under the Precision-Recall Curve
Latent Semantic Analysis (LSA)      0.36
Distributional Semantics            0.54
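For readers unfamiliar with the metric in Table 1, the following sketch shows one way to compute the area under a precision-recall curve from association scores and ground-truth labels. The paper does not state how the curve was integrated, so the trapezoidal rule used here is an assumption, as are the function names and the toy inputs.

```python
def precision_recall_points(scores, labels):
    """Sweep the decision threshold from high to low; return (recall, precision) pairs."""
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every pair tied at this threshold before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((true_pos / total_positive, true_pos / (true_pos + false_pos)))
    return points


def area_under_pr(points):
    """Trapezoidal area under the curve, starting from recall 0."""
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


# Toy example: a ranking that places both true relations first.
perfect = area_under_pr(precision_recall_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

A scorer that ranks every true problem-medication relation above every false one reaches an area of 1.0, the best value noted in the Table 1 caption.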
C. Relating Problems to Notes
To show the clinical notes relevant to a problem, we
identify UMLS disorders (i.e. medical concepts that belong to
the semantic type disorders in UMLS) in a clinical note and
match them with (meaning equal to or close variants of) the
concept unique identifiers of the problem. For example, for
Diabetes Mellitus from the problem list, clinical notes that
contain one or more UMLS concept identifiers matching that
of Diabetes Mellitus are identified as relevant to this problem.
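The matching step can be sketched as a set intersection between the problem's concept identifiers and the disorder concepts found in each note. This is a minimal sketch under assumptions: the CUIs are placeholders, the function name is hypothetical, and the set of "close variants" is assumed to be supplied by some upstream step that is not shown.

```python
def relevant_notes(problem_cuis, note_disorders):
    """Return ids of notes whose disorder CUIs overlap the problem's CUIs.

    problem_cuis: the problem's CUI plus any close variants.
    note_disorders: dict mapping note id -> set of disorder CUIs in that note.
    """
    return [
        note_id
        for note_id, cuis in note_disorders.items()
        if cuis & problem_cuis
    ]


# Placeholder identifiers, for illustration only.
diabetes_cuis = {"C0000010", "C0000011"}
notes = {
    "note-1": {"C0000010", "C0000020"},
    "note-2": {"C0000030"},
    "note-3": {"C0000011"},
}
matched = relevant_notes(diabetes_cuis, notes)
```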
D. Grouping Analysis
The clinical grouping analysis for medications starts with
an unordered list of medications from an EMR, and ends with a
clinically ordered medications list in which related medications
are together. The analysis first maps each medication to a set of
general classes from The National Drug File Reference
Terminology (NDF-RT) [19], which models each drug in terms
of various classes including its ingredients, chemical structure,
dose form, mechanism of action and pharmacokinetics. The
next step in the analysis clusters the medications based on the
similarity of their classes. The clustering is a bottom-up
hierarchical method using cosine similarity of their class
vectors. The resulting hierarchical clustering is shown for a
patient’s medications in Figure 6.
Notice that in the clinically grouped medications list, the
patient’s steroidal asthma treatments - Prednisone,
Dexamethasone, and Medrol - are close to each other, but as a
group they are distant from the patient’s antipyretics and
analgesics - Aspirin, Acetaminophen, and Motrin.
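The clustering step can be sketched in pure Python as bottom-up merging driven by cosine similarity of class vectors. This is an illustrative sketch only: the paper does not name the linkage criterion, so average linkage is an assumption, and the four-column class vectors below are invented for the example and are not NDF-RT data.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_order(vectors):
    """Agglomerative (bottom-up) clustering; returns names in grouped order."""
    clusters = [[name] for name in vectors]

    def linkage(c1, c2):
        # Average linkage: mean pairwise similarity between two clusters.
        sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = max(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Invented class vectors: [glucocorticoid, anti-inflammatory, analgesic, antipyretic].
class_vectors = {
    "Prednisone": [1, 1, 0, 0],
    "Aspirin": [0, 1, 1, 1],
    "Dexamethasone": [1, 1, 0, 0],
    "Acetaminophen": [0, 0, 1, 1],
}
grouped = cluster_order(class_vectors)
```

Reading the leaves of the merge tree in order yields a list in which medications with similar classes sit next to each other, which is the property the clinically grouped list relies on.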
A similar grouping analysis is conducted for the medical
problems using MeSH [20] Class 1 descriptors, under diseases
and mental disorders, from UMLS to create class vectors, and
then using the same clustering method used for the
medications. The process yields a clinically meaningful
grouping of the problems list of each patient.
E. Note Type Categorization
Another analysis we used categorizes each clinical note by
the type of practice that created it, i.e. whether it was
created by a primary care physician, a specialist, a nurse, or
an Emergency Department doctor. We call this note
categorization for short. The clinical note metadata
(description) in the EMR is not a reliable means of identifying
the note category. However, in presenting the timeline of a
patient’s encounters with clinicians, it is useful to correctly
categorize the encounters by practice because such a
categorized timeline allows a physician viewing the summary
to easily find the note from a particular type of previous
encounter. Once such a note is identified in the timeline, using
the one click access function described in section III.D, the
physician can quickly open the needed note.
We use a machine learning algorithm to identify the note
category. Machine learning features extracted from each note
for this purpose include UMLS medical concepts occurring in
the note text, whether there are certain informal sections (e.g.
previous medical history, assessment & plan) in the note, and
any physician specialty information in the note. We developed
the training and test data sets for about 2100 notes with the
help of medical experts - they categorized the notes by
practice. We used 1300 notes from the ground truth to train a
maximum entropy model, and used the remaining 800 to test
the model. Results as shown in Table 2 indicate reasonable
accuracy (overall F1 score of 0.782) for the model.
Table 2 Accuracy of note categorization analysis is shown here; each
note is categorized as one of the five shown types using a maximum
entropy model; the overall F1 score is reasonably high.
Note Type      Precision   Recall   F1 Score
Primary Care   0.636       0.677    0.656
Specialties    0.804       0.830    0.817
Emergency      0.824       0.737    0.778
Nursing        1.000       0.500    0.667
Other          0.746       0.798    0.771
Total          0.782       0.782    0.782
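As a reading aid for Table 2, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall columns of the table; the function and variable names are illustrative, and the only data used are the values reported in Table 2.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in Table 2.
TABLE_2 = {
    "Primary Care": (0.636, 0.677, 0.656),
    "Specialties": (0.804, 0.830, 0.817),
    "Emergency": (0.824, 0.737, 0.778),
    "Nursing": (1.000, 0.500, 0.667),
    "Other": (0.746, 0.798, 0.771),
    "Total": (0.782, 0.782, 0.782),
}

recomputed = {
    note_type: round(f1_score(p, r), 3)
    for note_type, (p, r, _reported) in TABLE_2.items()
}
```

The Nursing row shows why both columns matter: perfect precision with a recall of 0.500 still gives an F1 of only 0.667.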
Figure 6 Clinically related medications are grouped together.
V. F
The application and analytics described here are the
beginning of an effort to apply the Watson technologies to
analysis of a patient record. The patient record summary
described here includes a generated problem list and clinical
data aggregates such as medications, lab tests, procedures, and
clinical encounter notes. The Watson analytics provide
clinically relevant relationships between problems and the
other clinical data. The analytics also provide a means to group
data aggregates semantically, and to categorize clinical notes
(and therefore, encounters). The Watson analytics are also used
for the problem list generation, but the method is not described
in this paper. The summary can be visualized in a dashboard of
clinical data aggregates and clinical note timelines. The
dashboard also shows semantic relations, grouping, and clinical
note categorization. In addition, it provides rapid access to
actual notes, and to the current and historical values of
medications and labs, via a single click in the application. The
intent of the summarization is to help physicians quickly grasp
all of the important aspects of a patient record, with easy access
to details as needed.
The larger goal of this research is to apply Watson
technology to build a clinical decision support system that
works directly with a complete Electronic Medical Record of a
patient. As a near term goal, we will further improve patient
record summarization and conduct experiments to assess the
effectiveness of this record summary in patient care. Improving
patient record summarization is the process of establishing
increasingly richer clinical relationships, including disease
progression and causal associations, in a patient’s EMR. Many
of the Watson technologies, including Deep Question and
Answering, can help develop the necessary algorithms.
We thank the physicians and IT staff at Cleveland Clinic
who guided definition of the requirements for this application
and provided de-identified EMRs under an IRB protocol for
the study. We also acknowledge the groundbreaking work of
our Watson team colleagues, past and present, which made this
application possible.
[1] D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer and C. Welty, "Building Watson: An overview of the DeepQA project," AI Magazine, vol. 31, no. 3, pp. 59-79,
[2] "This Is Watson," IBM Journal of Research and Development, vol. 56, no. 3.4, pp. 1:1 - 1:15, 2012.
[3] D. Das and A. F. T. Martins, "A Survey on Automatic Text Summarization," Carnegie Mellon University, 2007.
[4] R. Alterman, "Understanding and Summarization," Artificial Intelligence Review, vol. 5, no. 4, pp. 239-254, 1991.
[5] D. R. Radev, E. Hovy and K. McKeown, "Introduction to the special issue on text summarization," Computational Linguistics, vol. 28, no. 4, December 2002.
[6] D. Reichert, D. Kaufman, B. Bloxham, H. Chase and N. Elhadad, "Cognitive Analysis of the Summarization of Longitudinal Patient Records," in AMIA Annu Symp Proc
[7] L. L. Weed, "Medical Records That Guide and Teach," New England Journal of Medicine, pp. 652-657, March 1968.
[8] C. Plaisant, R. Mushlin, A. Snyder, J. Li, D. Heller and B. Shneiderman, "LifeLines: Using Visualization to Enhance Navigation and Analysis of Patient Records," in Symp Proc, 1998.
[9] Z. Zhang, F. Ahmed, A. Mittal, I. Ramakrishnan, R. Zhao, A. Viccellio and K. Mueller, "AnamneVis: A Framework for the Visualization of Patient History and Medical Diagnostics Chains," in Workshop on Visual Analytics in Healthcare: Understanding the Physician Perspective, Providence, RI,
[10] T. D. Wang, C. Plaisant, A. J. Quinn, R. Stanchak and B. Shneiderman, "Aligning temporal data by sentinel events: discovering patterns in electronic health records," in Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI '08), 2008.
[11] J. C. Feblowitz, A. Wright, H. Singh, L. Samal and D. F. Sittig, "Summarization of clinical information: A conceptual model," Journal of Biomedical Informatics, vol. 44, pp. 688-699, 2011.
[12] "UMLS Reference Manual," National Library of Medicine (US), September 2009. [Online]. Available: [Accessed 15 04 2014].
[13] "UMLS Semantic Groups," National Library of Medicine (US), [Online]. Available: xt. [Accessed 15 4 2014].
[14] C. Wang, A. A. Kalyanpur, J. Fan, B. Boguraev and D. Gondek, "Relation Extraction and Scoring in DeepQA," IBM Journal of Research and Development, 2012.
[15] D. Ferrucci, A. Levas, S. Bagchi, D. Gondek and R. T. Mueller, "Watson: Beyond Jeopardy!," Artificial Intelligence, pp. 93-105, 2013.
[16] C. Wang and J. Fan, "Medical Relation Extraction with Manifold Models," in The 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), 2014.
[17] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407, September 1990.
[18] A. Gliozzo, "Beyond Jeopardy! Adapting Watson to New Domains Using Distributional Semantics," [Online]. k_20121109_gliozzo.pdf. [Accessed 18 04 2014].
[19] "National Drug File - Reference Terminology (NDF-RT)," National Library of Medicine (US), [Online]. Available: ent/NDFRT. [Accessed 15 04 2014].
[20] "MeSH," National Library of Medicine (US), [Online]. Available: htt [Accessed 16 04 2014].
... The reasoning chain is visualized through multi-stage flow charts enriched with examination data. Devarakonda et al. developed a visualization method based on summarisation of Electronic Medical Records (EMRs) created by Watson analytics, which relates a patient's problem to relevant clinical data [13]. Dabek et al. [14] described methods for aggregating and summarizing of electronic health records. ...
Conference Paper
Full-text available
Since pathology is supported by information tech- nology new opportunities and questions have arisen. The digital age enables analyzing histopathological data with artificial intel- ligence methods to reveal further information and correlations. In this paper existing approaches to visualization of medical decision processes are presented as well as the relevance of explainability in decision making. The first step for implementing decision-paths in systems is to retrace an experienced patholo- gist’s diagnosis finding process. Recording a route through a landscape composed of human tissue in terms of a roadbook is one possible approach to collect information on how diagnoses are found. Choosing the roadbook metaphor provides a simple schema, that holds basic directions enriched with metadata regarding landmarks on a rally - in the context of pathology such landmarks provide information on the decision finding process.
... In [16] it has been used to parse medical texts through the combination of deep linguistic learning analysis and background resources to detect and match entities and relations [16]. In [8] it has been used to build a system able to summarize the great amount of information contained into medical texts to create a new generation of Electronic Medical Records (EMR). ...
... For the training of their system, they use an active learning methodology, in which the user interactively provides the desired output. In [13], the authors highlight the power of IBM Watson in identifying key relationships among clinical concepts. They aggregate data by type, e.g. ...
Conference Paper
Clinical summarization means the collection and synthesis of a patient's significant data, undertaken in order to support health-care providers in the process of patient care. Considering that medical information comes from multiple sources, a system for the automatic generation of problem lists could prove to be very effective in terms of saving time in the analysis of large amounts of medical data. In this paper, we propose a system able to acquire and present relevant references to medical disorders from a patient's history, producing a subject-oriented summary. The implemented system relies on an NLP pipeline, for the extraction of relevant medical entities contained in narrative health records, and on several queries, necessary for the scanning of structured documents. The tool aggregates any medical problems, performed procedures, and prescribed medications, providing the healthcare practitioner with a visual summary of the patient's data.
... Electronic Health Records (EHRs 2 ) are expected to improve patient outcomes by providing the most important patient information in a single location [1]. A common way to provide the holistic information is in the form of a patient-centered problem list [2], by itself or as part of a summary [3,4]. Ideally, the problem list would include all clinically significant issues that a care provider should consider when managing a patient. ...
Objective: An accurate, comprehensive and up-to-date problem list can help clinicians provide patient-centered care. Unfortunately, problem lists created and maintained in electronic health records by providers tend to be inaccurate, duplicative and out of date. With advances in machine learning and natural language processing, it is possible to automatically generate a problem list from the data in the EHR and keep it current. In this paper, we describe an automated problem list generation method and report on insights from a pilot study of physicians’ assessment of the generated problem lists compared to existing provider-curated problem lists in an institution’s EHR system.
Materials and methods: The natural language processing and machine learning-based Watson method models clinical thinking in identifying a patient’s problem list using clinical notes and structured data. This pilot study assessed the Watson method and included 15 randomly selected, de-identified patient records from a large healthcare system that were each planned to be reviewed by at least two internal medicine physicians. The physicians created their own problem lists, and then evaluated the overall usefulness of their own problem lists (P), Watson generated problem lists (W), and the existing EHR problem lists (E) on a 10-point scale. The primary outcome was pairwise comparisons of P, W, and E.
Results: Six out of the 10 invited physicians completed 27 assessments of P, W, and E, and in the process evaluated 732 Watson generated problems and 444 problems in the EHR system. As expected, physicians rated their own lists, P, highest. However, W was rated higher than E. In 89% of assessments, Watson identified at least one important problem that physicians missed.
Conclusion: Cognitive computing systems like this Watson system hold the potential for accurate, problem-list-centered summarization of patient records, potentially leading to increased efficiency, better clinical decision support, and improved quality of patient care.
... With vast amounts of data being recorded in a patient record, manual retrieval of relevant information for a specific clinical task is challenging, often causing cognitive overload and inefficiency for physicians [1]. Various systems have been developed to support physician decision making by automatically generating clinical summaries [1][2] [3], problem lists [4], and treatment performance measures [5] based on EHR data. An important insight that can be automatically extracted from EHRs and provided to a physician is the status of active problems. ...
Conference Paper
This paper presents a natural language processing (NLP) based cognitive decision support system that automatically identifies the status of a disease from the clinical notes of a patient record. The system relies on IBM Watson Patient Record NLP analytics and supervised or semi-supervised learning techniques. It uses unstructured text in clinical notes, data from the structured part of a patient record, and disease control targets from the clinical guidelines. We evaluated the system using de-identified patient records of 414 hypertensive patients from a multi-specialty hospital system in the U.S. The experimental results show that, using supervised learning methods, our system can achieve an average 0.86 F1-score in identifying disease status passages and average accuracy of 0.77 in classifying the status as controlled or not. To the best of our knowledge, this is the first system to automatically identify disease control status from clinical notes.
... Furthermore, it could analyze in detail every medical record of each patient by searching for similarities. This is possible by developing an Electronic Medical Record (EMR) problem-oriented summary (Devarakonda et al., 2014) based on analytics services, which can extract and summarize only the important information useful to the medical analysis in a specific field of investigation. ...
Cognitive computing is the new wave of Artificial Intelligence (AI), relying on traditional techniques based on expert systems and also exploiting statistics and mathematical models. In particular, cognitive computing systems can be regarded as a “more human” artificial intelligence. In fact, they mimic human reasoning methodologies, showing special capabilities in dealing with uncertainties and in solving problems that typically entail computation-intensive processes. Moreover, they can evolve, exploiting the accumulated experience to learn from the past, both from errors and from successful findings. From a theoretical point of view, cognitive computing could replace existing calculators in many fields of application, but hardware requirements are still high, even if the cloud infrastructure, which is expected to keep growing rapidly in the near future, can support their diffusion and ease the penetration of such a novel variety of systems, fostering new services as well as changes in many settled paradigms. In this paper, we focus on the benefits that this technology can bring when applied in the education field and we make a short review of relevant experiences.
This article describes the use of the PI ProcessBook software tool for visualization and indirect monitoring of occupancy of SHC rooms from the measured operational and technical quantities, for monitoring of daily living activities in support of independent life of elderly persons. The proposed method for data processing (predicting the CO2 course using neural networks from the measured indoor temperature Ti (°C), outdoor temperature To (°C) and indoor relative humidity rHi (%)) was implemented, verified and compared in the MATLAB SW tool and the IBM SPSS SW tool with IoT platform connectivity. Within the proposed method, the Stationary Wavelet Transform denoising algorithm was used to remove the noise of the resulting predicted course. In order to verify the method, two long-term experiments were performed (specifically from February 8 to February 15, 2015, and from June 8 to June 15, 2015) and two short-term experiments (from February 8, 2015 and from June 8, 2015). For the best results of the trained ANN BRM within the prediction of CO2, the correlation coefficient R for the proposed method was up to 90%. The verification of the proposed method confirmed the possibility of using the presence of persons in the monitored SHC premises for room ADL monitoring.
We present a new model of patient record search, called SemanticFind, which goes beyond traditional textual and medical synonym matches by locating patient data that a clinician would want to see rather than just what they ask for. The new model is implemented by making extensive use of the UMLS semantic network, distributional semantics, and NLP, to match query terms along several dimensions in a patient record with the returned matches organized accordingly. The new approach finds all clinically related concepts without the user having to ask for them. An evaluation of the accuracy of SemanticFind shows that it found twice as many relevant matches compared to those found by literal (traditional) search alone, along with very high precision and recall. These results suggest potential uses for SemanticFind in clinical practice, retrospective chart reviews, and in automated extraction of quality metrics.
Conference Paper
We present CEST, a generic method for detection and rich summarization of events occurring in a city. CEST exploits Twitter metadata, does not need prior information on events, and is event category and structure agnostic. We developed CEST to process unstructured documents and take advantage of shorthand notations, hashtags, keywords, geographical and temporal data, as well as sentiment within tweets to both detect and summarize arbitrary events without prior knowledge. We also introduce a novel strategy that analyzes sentiment and tweeting behavior over time to create a qualitative score that captures events' overall appeal to attendees.
The medical history or anamnesis of a patient is the factual information obtained by a physician for the medical diagnostics of a patient. This information includes current symptoms, history of present illness, previous treatments, available data, current medications, past history, family history, and others. Based on this information the physician follows through a medical diagnostics chain that includes requests for further data, diagnosis, treatment, follow-up, and eventually a report of treatment outcome. Patients often have rather complex medical histories, and visualization and visual analytics can offer large benefits for the navigation and reasoning with this information. Here we present AnamneVis, a system where the patient is represented as a radial sunburst visualization that captures all health conditions of the past and present to serve as a quick overview to the interrogating physician. The patient's body is represented as a stylized body map that can be zoomed into for further anatomical detail. On the other hand, the reasoning chain is represented as a multi-stage flow chart, composed of date, symptom, data, diagnosis, treatment, and outcome.
This paper presents a vision for applying the Watson technology to health care and describes the steps needed to adapt and improve performance in a new domain. Specifically, it elaborates upon a vision for an evidence-based clinical decision support system, based on the DeepQA technology, that affords exploration of a broad range of hypotheses and their associated evidence, as well as uncovers missing information that can be used in mixed-initiative dialog. It describes the research challenges, the adaptation approach, and finally reports results on the first steps we have taken toward this goal.
The increasing availability of online information has necessitated intensive research in the area of automatic text summarization within the Natural Language Processing (NLP) community. Over the past half a century, the problem has been addressed from many different perspectives, in varying domains and using various paradigms. This survey intends to investigate some of the most relevant approaches both in the areas of single-document and multiple-document summarization, giving special emphasis to empirical methods and extractive techniques. Some promising approaches that concentrate on specific details of the summarization problem are also discussed. Special attention is devoted to automatic evaluation of summarization systems, as future research on summarization is strongly dependent on progress in this area.
Conference Paper
In this paper, we present a manifold model for medical relation extraction. Our model is built upon a medical corpus containing 80M sentences (11 gigabytes of text) and designed to accurately and efficiently detect the key medical relations that can facilitate clinical decision making. Our approach integrates domain-specific parsing and typing systems, and can utilize labeled as well as unlabeled examples. To provide users with more flexibility, we also take label weight into consideration. Effectiveness of our model is demonstrated both theoretically, with a proof that the solution is a closed-form solution, and experimentally, with positive results in experiments.
Detecting semantic relations in text is an active problem area in natural-language processing and information retrieval. For question answering, there are many advantages of detecting relations in the question text because it allows background relational knowledge to be used to generate potential answers or find additional evidence to score supporting passages. This paper presents two approaches to broad-domain relation extraction and scoring in the DeepQA question-answering framework, i.e., one based on manual pattern specification and the other relying on statistical methods for pattern elicitation, which uses a novel transfer learning technique, i.e., relation topics. These two approaches are complementary; the rule-based approach is more precise and is used by several DeepQA components, but it requires manual effort, which allows for coverage of only a small targeted set of relations (approximately 30). Statistical approaches, on the other hand, automatically learn how to extract semantic relations from the training data and can be applied to detect a large number of relations (approximately 7,000). Although the precision of the statistical relation detectors is not as high as that of the rule-based approach, their overall impact on the system through passage scoring is statistically significant because of their broad coverage of knowledge.
Conference Paper
Electronic Health Records (EHRs) and other temporal databases contain hidden patterns that reveal important cause-and-effect phenomena. Finding these patterns is a challenge when using traditional query languages and tabular displays. We present an interactive visual tool that complements query formulation by providing operations to align, rank and filter the results, and to visualize estimates of the intervals of validity of the data. Display of patient histories aligned on sentinel events (such as a first heart attack) enables users to spot precursor, co-occurring, and aftereffect events. A controlled study demonstrates the benefits of providing alignment (with a 61% speed improvement for complex tasks). A qualitative study and interviews with medical professionals demonstrate that the interface can be learned quickly and seems to address their needs.
This article is an overview of the literature on narrative summarization. The capacity to summarize is a fundamental property of intelligence and has significance for several areas of artificial intelligence research and development. The first part of the paper includes a description of four critical features of a summary. The bulk of this review is concerned with sorting available summarization frameworks and techniques. A latter section of the paper describes the significance of summarization technology to three current topics in artificial intelligence: explanation-based learning, case-based reasoning, and plan evaluation.