The Practice of Informatics
Application of Information Technology
Essie: A Concept-based Search Engine for
Structured Biomedical Text
NICHOLAS C. IDE, MS, RUSSELL F. LOANE, PHD, DINA DEMNER-FUSHMAN, MD, PHD
Abstract
This article describes the algorithms implemented in the Essie search engine that is currently
serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and
concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that
query terms are often conceptually related to terms in a document, without actually occurring in the document
text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text
REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC
Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC
Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching,
and concept-based query expansion is a useful approach for information retrieval in the biomedical domain.
J Am Med Inform Assoc. 2007;14:253–263. DOI 10.1197/jamia.M2233.
A rapidly increasing amount of biomedical information in
electronic form is readily available to researchers, health
care providers, and consumers. However, readily available
does not mean conveniently accessible. The large volume of
literature makes finding specific information ever more
difficult. Development of effective search strategies is time
consuming,1 requires experienced and educated searchers,2
well versed in biomedical terminology,3 and is beyond the
capability of most consumers.4
Essie, a search engine developed and used at the National
Library of Medicine, incorporates a number of strategies
aimed at alleviating the need for sophisticated user queries.
These strategies include a fine-grained tokenization algorithm
that preserves punctuation, concept searching utilizing
synonymy, and phrase searching based on the user’s query.
This article describes related background work, the Essie
search system, and the evaluation of that system. The
Essie search system is described in detail, including its
indexing strategy, query interpretation and expansion,
and ranking of search results.
The Essie search engine was originally developed in
2000 at the National Library of Medicine to support
ClinicalTrials.gov,5,6 an online registry of clinical research
studies. From the beginning, Essie was designed to use
synonymy derived from the Unified Medical Language
System (UMLS)7 to facilitate consumers’ access to information
about clinical trials. The UMLS Metathesaurus
contains concepts (meanings) from more than 100 medical
vocabularies. Each UMLS concept can have several
names, referred to as terms in this article.
Many consumers searching for medical information are
unlikely to be familiar with the medical terminology found
in medical documents and tend to use more common language in
their queries. For example, most of the ClinicalTrials.gov documents
about heart attacks do not contain the phrase “heart attack,”
but instead use the clinical term “myocardial infarction.”
Concept-based searching, which utilizes the UMLS-derived
synonymy, has the potential to bridge this terminology gap.
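
The terminology gap described above can be illustrated with a minimal sketch of synonymy-based query expansion. The concept identifiers, the terms in the table, and the function names are hypothetical and chosen for illustration only; they are not UMLS data and not Essie's implementation.

```python
# Toy synonymy table: concept identifier -> terms (names) for that concept.
# Identifiers and terms are invented for illustration; they are not UMLS data.
CONCEPT_TERMS = {
    "C_HEART_ATTACK": ["heart attack", "myocardial infarction"],
    "C_STROKE": ["stroke", "cerebrovascular accident"],
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def expand_query(query):
    """Return the normalized query followed by its synonyms, if any."""
    term = normalize(query)
    expanded = [term]
    for terms in CONCEPT_TERMS.values():
        if term in terms:
            expanded.extend(t for t in terms if t != term)
    return expanded


def matches(document, query):
    """True if the document contains the query or one of its synonyms.

    Plain substring matching is used here only to keep the sketch short.
    """
    text = normalize(document)
    return any(term in text for term in expand_query(query))


doc = "Patients with acute myocardial infarction were enrolled."
print(matches(doc, "heart attack"))  # the consumer term finds the clinical one
```

In this sketch a consumer query for "heart attack" retrieves a document that only uses "myocardial infarction", because both terms are names of the same toy concept.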
One of the first retrieval systems that implemented auto-
matic concept-based indexing and extraction of the UMLS
concepts from users’ requests was SAPHIRE.8 SAPHIRE
utilized the UMLS Metathesaurus by breaking free text
into words and mapping them into UMLS terms and
concepts. Documents were indexed with concepts, queries
were mapped to concepts, and standard term frequency
and inverse document frequency weighting was applied.
When measured with combined recall and precision,
SAPHIRE searches performed as well as physicians using
Medline, but not as well as experienced librarians.9 The
core functionality of mapping free text to UMLS concepts
is generally useful and is freely available in the NLM

This article is written by an employee of the US Government and is
in the public domain. This article may be republished and distributed
without penalty.
The views expressed in this paper do not necessarily represent those
of any U.S. government agency, but rather reflect the opinions of the
Affiliations of the authors: Lister Hill National Center for Biomedical
Communications, National Library of Medicine, Bethesda, MD,
and Thoughtful Solutions, Inc., McLean, VA.
The authors thank Dr. Alexa McCray, Dr. Deborah Zarin, and the
research community at the Lister Hill Center for their support and
Correspondence and reprints: Nicholas C. Ide, Lister Hill National
Center for Biomedical Communications, National Library of Medicine,
8600 Rockville Pike, Bethesda, MD 20894; e-mail: ide@nlm
Received for review: 7/31/2006; accepted for publication: 1/26/
Journal of the American Medical Informatics Association Volume 14 Number 3 May / June 2007

Other experiments with synonymy have produced mixed
results. Voorhees11 found a 13.6% decrease in average
precision comparing effectiveness of conceptual indexing
with baseline indexing of single words for 30 queries and
1,033 medical documents. Srinivasan12 demonstrated an
overall improvement of 16.4% in average precision, primarily
due to controlled vocabulary feedback (expanding queries
by adding controlled vocabulary terms). Aronson and
Rindflesch13 achieved 14% improvement in average precision
through query expansion using automatically identified
controlled vocabulary terms that were expanded using
inflectional variants (gender, tense, number, or person) from
the SPECIALIST lexicon14 and synonyms encoded in the
UMLS. Finally, concept-based indexing of Medline citations
based on manual and semiautomatic indexing15 is utilized in
Similar to concept-based indexing, phrase indexing is believed
to be useful in improving precision.16 However,
adding phrase indexing to otherwise good ranking schemes
has not been demonstrated to improve performance dramatically.17
Regardless of this ambiguity, phrases are necessary
for identifying UMLS concepts.18 Essie searches for phrases
and maps them to UMLS concepts for synonymy expansion.
Queries can be further expanded with morphological vari-
ants (inflectional and derivational) of individual words.19
Alternatively, queries and documents can be normalized
using stemming.19 In experiments with newswire text, Hull
and Grefenstette20 obtained approximately 5% improvement
by stemming. Bilotti et al21 revisited the exploration of
stemming vs. morphological expansion, and found that
morphological expansion resulted in higher recall.
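
The difference between the two strategies can be sketched as follows. Stemming normalizes both queries and documents to a shared root, whereas morphological expansion leaves the documents untouched and adds variants to the query. The suffix list, the variant table, and the function names below are assumptions made for illustration; real systems use far richer resources such as the SPECIALIST lexicon.

```python
def crude_stem(word):
    """Strip a few common English suffixes (a deliberately naive stemmer)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical table of inflectional variants, keyed by base form.
INFLECTIONS = {
    "treat": ["treat", "treats", "treated", "treating"],
}


def expand_inflections(word):
    """Return the set of listed inflectional variants of a query word."""
    for variants in INFLECTIONS.values():
        if word in variants:
            return set(variants)
    return {word}


# Stemming conflates every form that shares a stripped root,
# including unrelated words such as "news" and "new".
print(crude_stem("treated"), crude_stem("news"), crude_stem("new"))

# Expansion only adds the variants that are explicitly listed.
print(sorted(expand_inflections("treated")))
print(sorted(expand_inflections("news")))
```

The sketch shows the mechanics only. It does not reproduce the experiments cited above, and the over-conflation of "news" with "new" is a property of this toy stemmer rather than a claim about any particular published stemmer.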
Another factor affecting search is the strategy used to decide
what constitutes a unit of text.22 This tokenization determines
what bits of text can be found. In many systems, one
must specify whether certain characters, like hyphens, are
part of a word or not part of a word. Such decisions
frequently limit the ability of a system to deal properly with
words and phrases that contain punctuation characters. The
importance of tokenization in biomedical domains was
demonstrated repeatedly in the Text REtrieval Conference
(TREC) Genomics track evaluations. For example, much of
Essie’s success in the 2003 evaluation can be attributed to
tokenization. Further, the best average precision in the TREC
2005 evaluation was achieved by a system that broke text at
hyphens, letter–digit transitions, and lower/upper case
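
A breaking rule of the kind just described can be sketched with a single regular expression. This is an illustrative interpretation, not the code of Essie or of the TREC 2005 system: it handles ASCII text only, treats all punctuation as a break, and interprets the case rule as a break where a lowercase letter is followed by an uppercase letter. The names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Alternatives are tried left to right:
#   [A-Z]*[a-z]+  optional uppercase prefix followed by lowercase letters
#   [A-Z]+        a run of uppercase letters
#   [0-9]+        a run of digits
# Hyphens and other punctuation match nothing, so they act as break points.
TOKEN_PATTERN = re.compile(r"[A-Z]*[a-z]+|[A-Z]+|[0-9]+")


def tokenize(text):
    """Break text at punctuation, letter-digit and lower-to-upper transitions."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("IL-2Ra"))
print(tokenize("p53 and BRCA1"))
print(tokenize("MyoD"))
```

With such a rule, a query for a gene symbol written as "IL2Ra", "IL-2Ra", or "IL 2 Ra" yields the same token sequence, which is one way tokenization choices affect what can be found.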
Equally important to retrieving relevant information is the
order of presentation of the search results. With the excep-
tion of Boolean systems, presentation order is typically
determined using a scoring function that sorts documents in
descending order of relevance. A survey of relevance ranking
methods can be found in Singhal24 and Baeza-Yates.25
Essie adopts many of the ideas explored in earlier work.
Essie implements concept-based searching by expanding
queries with synonymy derived from UMLS concepts. Un-
like SAPHIRE, Essie includes phrase searches of the original
text and inflectional variants in addition to concepts; thus it
is less reliant on concept mapping and should be more
robust when concept mapping fails. Essie searches for
phrases from the user’s query by preserving word adjacency
as specified in the query rather than indexing terms from a
controlled vocabulary. Queries are further expanded to
include a restricted set of inflectional variants, as opposed to
many search engines that rely on stemming.25 Tokenization
decisions in Essie are driven by characteristics of biomedical
language3 in which punctuation is significant. Phrase-based
searching of synonymy, which equates dramatically differ-
ent phrases, has forced a new approach to document scor-
ing. Essie scoring is based primarily on where concepts are
found in the document, rather than on their frequency of
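
Phrase searching that preserves word adjacency, as described in this paragraph, is commonly supported by an index that records token positions. The following is a minimal sketch under simplifying assumptions (whitespace tokenization, lowercasing, no synonymy or variants); the function names and data layout are hypothetical and do not describe Essie's actual index.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            index[token][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the words at adjacent positions."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every following word must occur exactly i positions after the start.
            if all(
                start + i in index.get(word, {}).get(doc_id, ())
                for i, word in enumerate(words)
            ):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "acute heart attack in adults",
    2: "attack on the heart",
    3: "heart attack",
}
index = build_index(docs)
print(phrase_search(index, "heart attack"))
```

Document 2 contains both words but not in adjacent order, so a phrase search excludes it, whereas a search on individual words would not.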
The Essie system was formally validated in the context of the
TREC Genomics track. Essie participated in the 2003 and
2006 evaluations. The 2003 ad hoc retrieval evaluation was
conducted on a document collection consisting of 525,938
Medline citations. The task was based on the definition of a
Gene Reference Into Function (GeneRIF)26: For gene X, find
all Medline references that focus on the basic biology of the
gene or its protein products from the designated organism.
Randomly selected gene names distributed across the spec-
trum of organisms served as queries (50 for training and 50
for testing of the systems). The available GeneRIFs were
used as relevance judgments.
The 2006 Genomics track collection consists of 162,259
full-text documents subdivided into 12,641,127 paragraphs.
The task for participating systems was to extract passages
providing answers and context for 28 questions formed from
four genomic topic templates.27 Each question contains
terms that define: (1) biological objects (genes, proteins, gene
mutations, etc.), (2) biological processes (physiological pro-
cesses or diseases), and (3) a relationship between the objects
and the processes. Relevance judges determined the rele-
vance of passages to each question and grouped them into
aspects identified by one or more Medical Subject Headings
(MeSH) terms. Document relevance was defined by the
presence of one or more relevant aspects. The performance
of submitted runs was scored using mean average precision
(MAP) at the passage, aspect, and document level.
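
For readers unfamiliar with the measure, the standard document-level definition of mean average precision can be sketched as below. This is the textbook formulation and assumes each ranked list contains no duplicates; the track's passage-level and aspect-level variants use their own definitions, which are not reproduced here. Function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at the rank of each relevant item found.

    Relevant items that are never retrieved contribute zero.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Relevant documents at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision(["a", "b", "c", "d"], {"a", "c"}))
```

Because precision is sampled at the rank of every relevant item, the measure rewards systems that place relevant documents near the top of the list.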
Essie approached the task as document retrieval, indexing
each paragraph as a document and applying the standard
Essie retrieval strategies to queries created for each question.
The goal was to use Essie “as is” in a new retrieval task to
explore the applicability of the underlying algorithms. The
results of the evaluation are presented in the Validation
A detailed description of the Essie algorithms for tokeniza-
tion, morphological variation, concept expansion, and doc-
ument scoring follows.
The Essie search system consists of two distinct phases:
indexing and searching. The indexing phase identifies and
records the position of every token occurrence in the corpus.
The searching phase uses query expansions to produce a set
IDE et al., Essie: A Concept-based Search Engine