Natural Language Processing in the Electronic Medical Record:
Assessing Clinician Adherence to Tobacco Treatment Guidelines
Brian Hazlehurst, PhD, Dean F. Sittig, PhD, Victor J. Stevens, PhD, K. Sabina Smith, BA, CCRP,
Jack F. Hollis, PhD, Thomas M. Vogt, MD, MPH, Jonathan P. Winickoff, MD, MPH, Russ Glasgow, PhD,
Ted E. Palen, PhD, MD, Nancy A. Rigotti, MD
Background: Comprehensively assessing care quality with electronic medical records (EMRs) is not
currently possible because much data reside in clinicians’ free-text notes.
We evaluated the accuracy of MediClass, an automated, rule-based classifier of the EMR
that incorporates natural language processing, in assessing whether clinicians: (1) asked if
the patient smoked; (2) advised them to stop; (3) assessed their readiness to quit;
(4) assisted them in quitting by providing information or medications; and (5) arranged
for appropriate follow-up care (i.e., the 5A’s of smoking-cessation care).
Methods: We analyzed 125 medical records of known smokers at each of four HMOs in 2003 and
2004. One trained abstractor at each HMO manually coded all 500 records according to
whether or not each of the 5A’s of smoking-cessation care was addressed during routine care.
Measurements: For each patient’s record, we compared the presence or absence of each of the 5A’s as
assessed by each human coder and by MediClass. We measured the chance-corrected
agreement between the human raters and MediClass using the kappa statistic.
Results: For “ask” and “assist,” agreement among human coders was indistinguishable from
agreement between humans and MediClass (p > 0.05). For “assess” and “advise,” the
human coders agreed more with each other than they did with MediClass (p < 0.01);
however, MediClass performance was sufficient to assess quality in these areas. The
frequency of “arrange” was too low to be analyzed.
Conclusions: MediClass performance appears adequate to replace human coders of the 5A’s of
smoking-cessation care, allowing for automated assessment of clinician adherence to one
of the most important, evidence-based guidelines in preventive health care.
(Am J Prev Med 2005;29(5):434–439) © 2005 American Journal of Preventive Medicine
Interest and support for widespread implementation and adoption of electronic medical records (EMRs) is increasing, fueled in part by the hope that these systems will improve care quality through faster and more accurate clinical data analyses.1–5 A significant portion of these electronic data, however, is unusable by available automated analysis methods because it is not systematically coded. These so-called
“free-text” portions of medical records often contain
critical information that would allow more comprehen-
sive assessment of evidence-based care. One recent
analysis concluded that, of the information necessary
for a comprehensive quality assessment of a health plan
with a modern EMR, a maximum of 50% could be
obtained from administrative data and the clinical
codes for lab results, procedure results, vital signs, and
signs and symptoms.6 This upper-bound estimate was
calculated by considering which quality measures
could be addressed by these coding schemes. It does
not account for the known usability and process
challenges of actually achieving structured data entry
using these coding schemes that further reduce this
coverage in practice.7–9 These problems are often
amplified for preventive care activities, such as coun-
seling about smoking cessation, which are based on
the content of complex discussions between provider and patient.

From the Center for Health Research, Kaiser Permanente (Hazlehurst, Sittig, Stevens, Smith, Hollis, Vogt), Portland, Oregon; Tobacco Research and Treatment Center, Massachusetts General Hospital, Harvard Medical School (Winickoff, Rigotti); Department of Ambulatory Care and Prevention, Harvard Medical School (Rigotti), Boston, Massachusetts; and Kaiser Permanente Clinical Research Unit (Glasgow, Palen), Denver, Colorado

Address correspondence and reprint requests to: Brian L. Hazlehurst, PhD, Center for Health Research, Kaiser Permanente, 3800 N. Interstate Ave., Portland OR 97227. E-mail: brian.hazlehurst@

Am J Prev Med 2005;29(5)
© 2005 American Journal of Preventive Medicine • Published by Elsevier Inc.
0749-3797/05/$–see front matter
One potential solution to this dilemma would be to
replace all narrative sections of an EMR with structured
data entry. However, clinical notes in the record play an
important role in communications both within and
between providers.10,11 They have not proven to be
“replaceable” with structured data captured at the user
interface to the EMR. For complex conversations in-
volving multiple topics, the myriad of alternative
choices required to create codes is simply impractical.
An alternative solution is to develop computer systems
capable of automatically processing the free-text por-
tions of the medical record.12–14 Development of natu-
ral language processing (NLP) systems has become
more feasible with increased availability of electroni-
cally recorded (but uncoded) clinical data,15 as well as
recent advances in data storage capacity, increases in
computational power, and new programming tech-
niques.16–22 This paper reports on the evaluation of an NLP system, called MediClass (MC, a “medical classifier”), configured to automatically assess delivery of evidence-based smoking-cessation care, using information in both the coded and free-text portions of the medical record.

The 5A’s of Smoking-Cessation Care
This test of the MediClass system uses smoking cessation
because tobacco use is the leading preventable cause of
death in the United States,23–25 and because evidence-based guidelines for delivering tobacco-cessation treatments in primary care settings have been developed.26–30
The recommended treatment model involves five
steps, that is, the “5A’s”: (1) ask patients about smoking
status at every visit; (2) advise all tobacco users to quit;
(3) assess a patient’s willingness to try to quit; (4) assist the
patient’s quitting efforts (provide smoking-cessation treat-
ments or referrals); and (5) arrange follow-up (provide
or arrange for supportive follow-up contacts). The 5A’s
approach has been widely endorsed by healthcare
organizations and used by regulatory organizations to
assess healthcare quality (e.g., HEDIS). It has become
the national model for tobacco treatment (Table 1).26
Although some of the 5A steps are easily coded on
entry into the EMR (e.g., identification of smoking
status or prescriptions for smoking-cessation drugs),
other steps are typically recorded in free-text progress
notes in the EMR (e.g., assessment of readiness to
change, provision of behavior change counseling).
Some of these free-text clinical notes are not suitable
for coding at the EMR user interface, yet provide the
information essential to continuity of patient care that
is critical to tobacco cessation and relapse prevention.
The MediClass system was designed and developed
by members of the research team. A complete technical
description of the system is reported elsewhere.31 In essence, MediClass maps the contents of each encounter to a controlled set of clinical concepts based on (1) phrases detected in free-text sections, and (2) codes detected in structured sections of the medical record.
Classifications are performed by context-sensitive rules
that select for the clinical concepts of interest for the
application. For assessing delivery of the 5A’s, these
concepts include smoking-cessation medications, dis-
cussions, referral activities, and quitting activities, as
well as smoking and readiness-to-quit assessments doc-
umented by the care provider. This knowledge was
encoded into the MediClass system; the process began
with the guideline definitions of the 5A’s.26 A subgroup
of the research team (including clinicians and tobacco-
cessation experts) met over several weeks to operation-
alize these definitions by defining the concepts in-
volved and the types of phrases that provide evidence
for each concept. Finally, health plan–specific details
about smoking-cessation care were incorporated into
the definitions and the system.
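The phrase-and-code matching described above can be sketched in miniature. This is an illustrative sketch only, not the MediClass implementation: the phrase patterns, concept names, and the reason-for-visit code below are hypothetical stand-ins for the real knowledge base described in reference 31.

```python
import re

# Hypothetical phrase lists mapping free-text evidence to clinical concepts.
CONCEPT_PHRASES = {
    "smoking_status": [r"\bsmokes?\b", r"\bppd\b", r"\btobacco use\b"],
    "advice_to_quit": [r"\badvised .{0,40}quit\b", r"\bimportant .{0,40}quit smoking\b"],
    "readiness_assessment": [r"\b(not )?interested in quitting\b", r"\bready to quit\b"],
    "cessation_assistance": [r"\bcounseled for smoking cessation\b", r"\bstarted patient on\b"],
    "follow_up": [r"\bfollow[- ]?up .{0,40}quit\b"],
}

# Hypothetical structured-code mappings (e.g., a reason-for-visit code).
CODE_CONCEPTS = {"RFV:TOBACCO_CESSATION_DISCUSSION": "cessation_assistance"}

def detect_concepts(free_text: str, codes: list[str]) -> set[str]:
    """Map an encounter's free text and coded entries to clinical concepts."""
    found = set()
    lowered = free_text.lower()
    for concept, patterns in CONCEPT_PHRASES.items():
        if any(re.search(p, lowered) for p in patterns):
            found.add(concept)
    found.update(CODE_CONCEPTS[c] for c in codes if c in CODE_CONCEPTS)
    return found

def classify_5as(concepts: set[str]) -> dict[str, bool]:
    """Rules selecting the concepts of interest for each of the 5A's."""
    return {
        "ask": "smoking_status" in concepts,
        "advise": "advice_to_quit" in concepts,
        "assess": "readiness_assessment" in concepts,
        "assist": "cessation_assistance" in concepts,
        "arrange": "follow_up" in concepts,
    }

note = "Patient smokes 1 ppd. Advised to quit. Not interested in quitting smoking."
result = classify_5as(detect_concepts(note, []))
```

The real system's rules are context sensitive (for example, distinguishing negated or historical mentions), which simple keyword matching like this cannot do; the sketch only shows the two-source (phrase plus code) mapping step.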
Methods

This research was conducted in 2003 and 2004 in four
nonprofit HMOs: Harvard Pilgrim Health Care in Massachu-
setts, and three regions of Kaiser Permanente (Northwest,
Colorado, and Hawaii). Following Institutional Review Board
approval from each of the four health plans, the project
requested electronic copies of medical records for about 1000
known smokers at each institution. These EMRs include the
relevant data from single office visits with primary care
clinicians. The data were extracted from the offline data warehouse at each institution and saved as structured files, which included progress notes, patient instructions, medications ordered, referrals ordered, reason for visit, and other smoking-related data in the EMR. Some EMR systems included fields for assessing smoking status, and one supported structured entry for indicating provision of cessation advice. However, two of the four data systems included no smoking-specific data fields.

Table 1. The “Five A’s” recommended by the current U.S. Public Health Service Clinical Practice Guideline for tobacco treatment and prevention

5A step   Description                                              Example in free-text section of EMR
Ask       Identify tobacco user status at every visit              “Patient smokes 1 ppd”
Advise    Advise all tobacco users to quit                         “It is important for you to quit smoking now”
Assess    Assess willingness to make a quit attempt                “Patient not interested in quitting smoking”
Assist    Aid the patient in quitting                              “Started patient on ...”
Arrange   Arrange follow-up contact, in person or via telephone    “Follow-up in 2 weeks for quit progress”

Example text segments shown could appear in clinical notes or patient instructions generated for the encounter.
EMR, electronic medical records; ppd, pack per day.
In preliminary work, 125 records were randomly selected
from each of the four sites, and coded by four trained chart
abstractors (see below) and by MediClass (MC). The study
assessed disagreements between MC and the trained human
raters. MediClass was improved and re-run against these
records to ensure that the revised system did not inadver-
tently introduce new misclassifications. The records used in
this preliminary work were then removed from the data pool,
leaving 875 records from each health plan in the data pool.
This preliminary work, as well as previous studies, showed
that several of the “A’s” are typically infrequent in the data.27
Therefore, the validation study used a sample composed of
both random and “enriched” portions, as follows. The en-
riched portions included records with “Assist” (a subgroup
with target size of 15 records) and “Arrange” (a subgroup with
target size of five records) from each of the four health plans.
The MediClass system was used to locate these records for
inclusion in the enriched portion of the total sample. If the
system located more records than were needed for a sub-
group, then the final records were randomly selected to
produce the subgroup. In some cases, there were not enough
records to fill a subgroup, in which case all of the located
records were used and the subgroup was smaller than the
target size. The final size of the enriched portion of our
sample was 77 records. These records were then removed
from the data pool. The remainder of our sample (423
records) was then randomly drawn from the data pool,
stratified by health plan. The final sample of 500 records
contained 125 records from each of the four health plans (see Figure 1).
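The sampling procedure above can be expressed as a short routine. This is a hedged sketch of one health plan's sample assembly, not the study's actual code: the `detector` function stands in for MediClass, the record pool is a toy, and only the subgroup targets (15 "Assist", 5 "Arrange") and the 125-record plan total come from the study.

```python
import random

def build_sample(pool, detector, target_sizes, per_plan=125, seed=42):
    """Assemble one plan's validation sample: an 'enriched' portion of records
    the classifier flags for rare A's, topped up by random selection.
    If fewer hits exist than a subgroup's target, all hits are used."""
    rng = random.Random(seed)
    pool = list(pool)
    enriched = []
    for step, target in target_sizes.items():
        hits = [r for r in pool if step in detector(r)]
        chosen = hits if len(hits) <= target else rng.sample(hits, target)
        enriched.extend(chosen)
        for r in chosen:          # remove from pool so no record is sampled twice
            pool.remove(r)
    remainder = rng.sample(pool, per_plan - len(enriched))
    return enriched + remainder

# Toy pool of 1000 record ids; the detector "finds" assist in 10 records
# and arrange in 2, mimicking the rarity of these events in real data.
pool = list(range(1000))
detector = lambda r: {"assist"} if r < 10 else ({"arrange"} if r < 12 else set())
sample = build_sample(pool, detector, {"assist": 15, "arrange": 5})
```

Here both subgroups fall short of their targets, so all flagged records enter the enriched portion and random selection fills the sample to 125, mirroring the study's handling of undersized subgroups.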
These 500 records, each representing a single primary care
visit, were automatically transformed into HTML files for
convenient viewing. Four trained medical record chart re-
viewers coded all 500 records, using a standard web browser
application to view the HTML files.32 Abstractors were trained
to look for evidence of documentation of the 5A’s of smoking-
cessation counseling. For each record, they were asked to
identify whether each “A” was absent or present. They were
encouraged to use information from all parts of the record
(i.e., both the coded portions, such as medications ordered,
and the free-text or progress notes section). Each site submit-
ted the results of this work back to our coordinating center as
anonymous database records that indicated, for each clinical
encounter, the presence or absence of each of the 5A’s.
Training Human Abstractors to Identify the 5A’s
A 2-day training meeting was attended by one medical record
coder from each of the four participating study sites. Before
the meeting, each coder was given the evidence-based clinical
practice guidelines to learn the content area,26 and immediately before training took a brief true/false and multiple-choice test to evaluate their understanding of some of the basic
concepts in tobacco cessation. All coders scored at least 80%
on the test.
The training included a review of the development of the
5A’s tobacco-cessation guidelines by a tobacco expert from
the research team (JFH). The instructor (KSS) reviewed the
coding manual, coding definitions, common tobacco terms,
and common medical record terms and abbreviations. The
coders were issued the coding manual, and worked on
computers set up with data review and entry utilities.
An intensive case review of five examples—segments of
progress note narrative relevant to smoking cessation—was
conducted. The proper coding of each case was discussed in
detail. All participants coded ten new examples and entered
their results into their respective databases. Results from all
coders were compared with each other and discussed until a
common understanding was reached. The second day con-
sisted of additional case reviews, and included some example
records in HTML format, to familiarize the coders with the
actual form of data for the validation study. Several weeks
later, a preliminary study was run that allowed abstractors to
try out the entire process on an initial set of 500 records.
Significant differences among coders were noted and fol-
low-up training was conducted during site visits by the instruc-
tor (KSS), over the next several weeks. Finally, an additional 500 records were abstracted for the validation study reported here.

Results
Table 2 contains the mean agreement on each of the
A’s between MC and each of the human raters (four
pairs of raters in “mean MC agreement” group) along
with the mean agreement among all pairs of human
raters (six pairs of raters in “mean non-MC agreement”
group). These means are averages of the kappa statistic
(known as Light’s kappa33) for each group. The difference between these two groups was tested for significance using Student’s t-test.
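For readers unfamiliar with Light's kappa, the computation reduces to averaging Cohen's kappa over all rater pairs. A minimal sketch for binary (present/absent) codes follows; the three-rater toy data are hypothetical, not study values.

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Chance-corrected agreement between two binary raters."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    p_yes = (sum(a) / n) * (sum(b) / n)               # chance both code "present"
    p_no = (1 - sum(a) / n) * (1 - sum(b) / n)        # chance both code "absent"
    pe = p_yes + p_no                                 # total chance agreement
    return (po - pe) / (1 - pe)

def lights_kappa(ratings):
    """Light's kappa: mean of Cohen's kappa over all rater pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Toy data: three raters coding presence/absence of one "A" on six records.
r1 = [1, 1, 0, 0, 1, 0]
r2 = [1, 1, 0, 0, 1, 0]
r3 = [1, 0, 0, 0, 1, 0]
```

The study compared two such group means (MC-versus-human pairs and human-versus-human pairs) with a t-test, as described above.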
For two of the A’s (“ask” and “assist”), the mean agreement between the human coders is indistinguishable from agreement between the humans and MediClass (Student’s t-test, p > 0.05). For two other A’s (“assess” and “advise”), the humans agreed more often with each other than they did with MediClass (Student’s t-test, p < 0.01). The fifth A (“arrange”) was
coded too infrequently by the human abstractors to be compared, and was dropped from further consideration in our analyses.

Figure 1. The validation study sample. An “enrichment” process (see text for explanation) was used to select up to 20 records from each health plan, and random selection was then used to fill out each plan’s sample to 125 records. The total validation study sample comprised 500 primary care visit records of known smokers.
Another way to analyze these data is to create a “gold
standard” using the majority opinion of the human
raters (i.e., three or four of the human raters agree on
either the presence or absence of a particular “A”), and
then compute the accuracy of MediClass against this
gold standard. Because there were four human abstrac-
tors rating the cases, “ties” had to be adjudicated. To
break the ties, an expert on the team who had been
involved in training the humans (KSS) independently
coded just these cases. There were a total of 72 out of
2000 cases (500 records times four possible codes) that
required adjudication (4 of “ask,” 24 of “advise,” 14 of
“assess,” and 30 of “assist”).
With the coding ties broken, MediClass agreed with
the gold standard 91% of the time. Table 3 shows
MediClass performance as measured by two common metrics, sensitivity and specificity, in comparison against a gold standard. As shown in Table
3, point estimates of sensitivity were found to be 0.97,
0.68, 0.64, and 1.0, while for specificity, they were 0.95,
1.0, 0.96, and 0.82, respectively, across the four A’s
(“ask,” “advise,” “assess,” and “assist”) for which mea-
surement was possible.
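The gold-standard construction and accuracy metrics reported above can be illustrated with a small sketch: a majority vote of four raters with expert tie-breaking, then sensitivity and specificity of the classifier against the result. The five-record toy data are hypothetical, not study values.

```python
def gold_standard(votes, tie_breaker):
    """Majority opinion of four raters; 2-2 ties adjudicated by an expert."""
    yes = sum(votes)
    if yes > 2:
        return 1
    if yes < 2:
        return 0
    return tie_breaker

def sens_spec(pred, gold):
    """Sensitivity and specificity of classifier output against the gold standard."""
    tp = sum(p and g for p, g in zip(pred, gold))
    tn = sum((not p) and (not g) for p, g in zip(pred, gold))
    fn = sum((not p) and g for p, g in zip(pred, gold))
    fp = sum(p and (not g) for p, g in zip(pred, gold))
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: five records, four human votes each; one 2-2 tie adjudicated to 1.
votes = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
gold = [gold_standard(v, tie_breaker=1) for v in votes]
pred = [1, 0, 1, 1, 1]
sensitivity, specificity = sens_spec(pred, gold)
```

In the study this procedure was applied per "A" across the 500 records, with 72 of 2000 codes requiring adjudication.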
Discussion

This study is the first to evaluate an automated system
that abstracts data from the coded and free-text fields
of the EMR to assess clinicians’ adherence to tobacco
treatment guidelines. Even in state-of-the-art EMR sys-
tems, most data are entered as free-text narrative, and
are therefore not amenable to currently available auto-
mated assessment methods. This hinders the potential
of the EMR for assessing healthcare quality. This study
evaluated an automated EMR classifier that incorpo-
rates natural language processing techniques and han-
dles both free-text and coded record data. The Medi-
Class system was encoded with smoking-cessation
guideline knowledge, and performance in automated
coding for the 5A’s was compared with that of trained
human record abstractors. The system performed sim-
ilarly to human abstractors in determining whether
clinicians at four different health plans performed the
five tasks recommended by evidence-based smoking-
cessation clinical practice guidelines.
There are strengths and weaknesses in both manual
and automated means of quality assessment through
chart abstraction. For humans performing manual abstraction, the 5A’s coding task is difficult for several reasons: the data can be buried in the middle of clinical notes; clinicians’ shorthand makes it hard to spot the relevant portions of a note; and variability in word usage and sentence construction makes it problematic to determine whether what was written documents effective counseling behaviors. Importantly, the human raters in this study
had a difficult time consistently applying the 5A’s
guidelines to all 500 records, despite careful training.
This inconsistency shows up in reduced levels of inter-
rater agreement among the human raters (see Table 2,
fourth column). The difficulty of this task, possibly in
combination with relatively low prevalence of events in
the data, created lower magnitudes in our agreement
measure (kappa) than is ideal.34 In particular, the low
frequency of “assess” and “assist” in the data may be
partly responsible for lower kappas for these events.
MediClass was entirely consistent (as would be ex-
pected from a computer program), yet lacked the
ability to detect subtle differences in how clinicians
record the encounter, some of which may represent
important distinctions in the efficacy of care delivered.
The case of “smoking-cessation discussion” provides an
example. MediClass was quite literal in interpreting a
“smoking-cessation discussion” as a form of “assist.”
However, the human raters appeared to sometimes use
implicit social knowledge to decide that a tersely
worded note, although it clearly documented a “discussion” about smoking cessation, did not really count as assistance. The humans would often agree about these cases, but MediClass would not make the necessary distinctions. In principle, if this tacit knowledge were made explicit, it could be encoded into the system to improve performance.

Table 2. MC agreement with human raters compared with agreement among human raters for each of the 5A’s (n = 500). Columns report the frequency of each “A” (mean across all four human coders), mean MC agreement, and mean non-MC agreement (Light’s kappa); individual cell values were not recoverable from this copy.
Notes: Mean MC agreement, n = 4 pairs; mean non-MC agreement, n = 6 pairs. Two-tailed Student’s t-test of difference between MC agreement and non-MC agreement, p < 0.01 (bolded). MC, MediClass; NA, not applicable.

Table 3. MediClass performance against gold standard created from human raters (n = 500)

5A step   Sensitivity   Specificity
Ask       0.97          0.95
Advise    0.68          1.0
Assess    0.64          0.96
Assist    1.0           0.82
Arrange   NA            NA

NA, not applicable.
Relatively low values of specificity on “assist” and
sensitivity on “advise” (Table 3) reveal a significant
difference in the classifications of MediClass and the
human raters. Upon review, 12 cases were found coded
as “assist” (and not “advise”) by MediClass, but coded as
“advise” (and not “assist”) in the gold standard. Look-
ing in detail at these 12 cases revealed four reasons for
the discrepancy: (1) in five cases, language in the
progress note such as “counseled for smoking cessa-
tion” qualified (for MediClass) as assist and not advise,
but was coded conversely by the human raters in the
gold standard; (2) similarly, in three cases, the “reason
for visit” code for “tobacco-cessation discussion” quali-
fied as assist and not advise for MediClass, but con-
versely for the humans; (3) in two cases, MediClass
made context errors (e.g., interpreting the text seg-
ment “URI education done RTC PRN quit smoking” as
delivery of smoking-cessation education); and (4) in
two cases, MediClass erred by incorrectly spell-correct-
ing words to the word “counsel,” which combined
(unfortunately) with local context to indicate that
smoking-cessation counseling was given when no such
evidence exists. Getting just 8 of the 12 cases (reasons 1
and 2 above) converted to “assist” in the gold standard
would increase MediClass sensitivity for “advise” from
0.68 to 0.72, while simultaneously increasing MediClass
specificity for “assist” from 0.82 to 0.83.
Finally, there are significant “cost” differences be-
tween MediClass and human abstractors. Even with the
data-system peculiarities that would need to be accom-
modated at each new installation site, it would be
relatively inexpensive to use MediClass to assess smoking-
cessation care delivery at additional health plans. Once
installed, MediClass runs more than two orders of magnitude (>100 times) faster than a trained human abstractor, and it operates at negligible cost. Furthermore, the MediClass system is easily replicated on additional computers, at the cost of purchasing, installing, and maintaining each additional desktop personal computer, multiplying throughput with each replication.
Although coded data (e.g., ICD and CPT codes) can
provide unambiguous records of diagnoses and treat-
ment, not all important diagnostic and treatment activ-
ities fit easily into coded categories. Particularly for
counseling treatments for patients, free-text notes may
be much more valuable for documenting and facilitat-
ing continuity of care between visits and between care
providers. This test of a natural language application
shows that automated coding of free-text notes can be
practical and potentially provide high-quality measures
of health and treatment patterns in large populations.
Informative and essential data within uncoded clinical
notes and other text portions of the EMR are unavail-
able to current automated health and care assessment
methods. Due to the clinical value of narrative, and the
poor acceptance thus far of structured data entry, it is
unlikely that wholesale replacement of the EMR narra-
tive with structured data entry will succeed. This study
demonstrates the feasibility of an automated coding
system for processing the entire EMR, enabling assess-
ment of smoking-cessation care delivery. Such a system
can be similar in accuracy to that of trained human
coders. Systems such as MediClass can help bridge the gap between the promise and the realization of value in the EMR.
We would like to acknowledge the work of Jessica Warmoth
(Kaiser Permanente-Hawaii), William Franson (Harvard Pil-
grim Health Care), Marilyn Pearson (Kaiser Permanente-Colorado), and Kim Olson (Kaiser Permanente-Northwest) for
their help in reviewing and manually coding all of the
medical records used in this study. This work was supported
by a grant from the National Cancer Institute (U19 CA79689)
for The HMO Cancer Research Network (CRN2). The Can-
cer Research Network (CRN) consists of the research pro-
grams, enrollee populations, and databases of ten HMOs that
are members of the HMO Research Network, including
Group Health Cooperative, Harvard Pilgrim Health Care,
Henry Ford Health System, HealthPartners Research Foun-
dation, the Meyers Primary Care Institute of the Fallon
Healthcare System/University of Massachusetts, and Kaiser
Permanente in five regions (Colorado, Hawaii, Northwest
[Oregon and Washington], Northern California, and South-
ern California). The overall goal of the CRN is to increase the
effectiveness of preventive, curative, and supportive interven-
tions that span the natural history of major cancers among
What This Study Adds . . .
Although evidenced-based guidelines for tobacco-
cessation treatment have been developed, effi-
cient methods for assessing adherence to these
guidelines have been lacking.
This study is the first to evaluate an automated
system that abstracts data from coded and free
text fields of the electronic medical record to
assess smoking-cessation treatment.
The automated system performed similarly to
human abstractors in determinations of whether
clinicians at four different health plans per-
formed the tasks recommended by evidence-
American Journal of Preventive Medicine, Volume 29, Number 5
diverse populations and health systems through a program of Download full-text
No financial conflict of interest was reported by the authors
of this paper.
References

1. Institute of Medicine, Dick RS, Steen EB, Detmer DE. The computer-based patient record: an essential technology for health care. Rev. ed. Washington DC: National Academy Press, 1997.
2. Corrigan JM, Donaldson MS, Kohn LT, eds. Crossing the quality chasm: a
new health system for the 21st century. Washington DC: National Academy
3. Schneider EC, Riehl V, Courte-Wienecke S, Eddy DM, Sennett C. Enhanc-
ing performance measurement: NCQA’s road map for a health informa-
tion framework. JAMA 1999;282:1184–90.
4. Thompson TG, Brailer DJ. The decade of health information technology:
delivering consumer-centric and information-rich health care—framework
for strategic action. Washington DC: U.S. Department of Health and
Human Services, July 21, 2004.
5. Vogt TM, Aickin M, Ahmed F, Schmidt M. The Prevention Index: using
6. Hicks J. The potential of claims data to support the measurement of health
care quality. PhD diss. RAND Graduate School, 2003. (Available as of
December 20, 2004, at: www.rand.org/cgibin/Abstracts/ordi/getabbydoc.
7. McDonald C. Quality measures and electronic medical systems. JAMA
8. McDonald C. The barriers to electronic medical record systems and how to
overcome them. J Am Med Inform Assoc 1997;4:213–21.
9. Kaplan B. Reducing barriers to physician data entry for computer-based
patient records. Top Health Info Manag 1994;15:24–34.
10. Walsh SH. The clinician’s perspective on electronic health records and how
they can affect patient care. BMJ 2004;328:1184–7.
11. Coeira E. When conversation is better than computation. J Am Med Inform
12. Rottger P, Sunkel H, Reul H, Klein I. New possibilities of statistical
evaluation of autopsy records. Computer free text analysis. Methods Info
13. Fenichel RR, Barnett GO. An application-independent subsystem for
free-text analysis. Comput Biomed Res 1976;9:159–67.
14. Sager N, Wong R. Developing a database from free-text clinical data. J Clin
15. Hripcsak G, Austin JH, Alderson PO, Friedman C. Use of natural language
processing to translate clinical information from a database of 889,921
chest radiographic reports. Radiology 2002;224:157–63.
16. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected
tuberculosis patients based on natural language processing of chest radio-
graph reports. Proc AMIA Symp 1996:542–6.
17. Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB,
Clayton PD. Unlocking clinical data from narrative reports: a study of
natural language processing. Ann Intern Med 1995;122:681–8.
18. Friedman C, Knirsch C, Shagina L, Hripcsak G. Automating a severity score
guideline for community-acquired pneumonia employing medical lan-
guage processing of discharge summaries. Proc AMIA Symp 1999:256–60.
19. Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of
clinical documents based on natural language processing. J Am Med
Inform Assoc 2004;11:392–402.
20. Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G. Coding
neuroradiology reports for the Northern Manhattan Stroke Study: a
comparison of natural language processing and manual review. Comput
Biomed Res 2000;33:1–10.
21. Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ. Natural language
processing and the representation of clinical data. J Am Med Inform Assoc
22. Honigman B, Lee J, Rothschild J, et al. Using computerized data to identify
adverse drug events in outpatients. J Am Med Inform Assoc 2001;8:254–66.
23. Centers for Disease Control and Prevention. Cigarette smoking-attributable mor-
bidity—United States, 2000. MMWR Morb Mortal Wkly Rep 2003;52:842–4.
24. U.S. Department of Health and Human Services. The health consequences
of smoking: a report of the Surgeon General. Atlanta GA: Centers for
Disease Control and Prevention, National Center for Chronic Disease
Prevention and Health Promotion, Office on Smoking and Health, 2004.
25. Mokdad AH, Marks JS, Stroup DF, Gerberding JL. Actual causes of deaths
in the United States, 2000. JAMA 2004;291:1238–45.
26. Fiore MC, Bailey WC, Cohen SJ, et al. Treating tobacco use and depen-
dence: a clinical practice guideline. Rockville MD: U.S. Department of
Health and Human Services, 2000 (also available at: www.surgeongeneral.
27. Quinn VP, Stevens VJ, Hollis JF, et al. Tobacco-cessation services and
patient satisfaction in nine non-profit health plans. Am J Prev Med
28. Hollis JF, Bills R, Whitlock E, Stevens VJ, Mullooly J, Lichtenstein E.
Implementing tobacco interventions in the real world of managed care.
Tobacco Control 2000;9(suppl 1):i18–i24.
29. Lancaster T, Stead L, Silagy C, Sowden A. Effectiveness of interventions to
help people stop smoking: findings from the Cochrane Library. BMJ
30. U.S. Public Health Service. Tobacco Use and Dependence Clinical Practice
Guideline Panel. A clinical practice guideline for treating tobacco use and
dependence. JAMA 2000;283:3244–54.
31. Hazlehurst B, Frost HR, Sittig DF, Stevens VJ. MediClass: a system for
detecting and classifying encounter-based clinical events in any EMR. J Am
Med Inform Assoc 2005;12 (in press) (electronic preprint available at:
32. Hripcsak G, Kuperman GJ, Friedman C, Heitjan DF. A reliability study for
evaluating information extraction from radiology reports. J Am Med
Inform Assoc 1999;6:143–50.
33. Conger A. Integration and generalization of kappas for multiple raters.
Psychol Bull 1980;88:322–8.
34. Di Eugenio B, Glass M. The kappa statistic: a second look. Computational