CLIN. CHEM. 25/12, 2034-2037 (1979)
2034 CLINICAL CHEMISTRY, Vol. 25, No. 12, 1979
The Performance of Delta Check Methods
Lewis B. Sheiner, Lawrence A. Wheeler,1 and John K. Moore
The percentage of mislabeled specimens detected (true-positive rate) and the percentage of correctly labeled specimens misidentified (false-positive rate) were computed for three previously proposed delta check methods and two linear discriminant functions. The true-positive rate was computed from a set of pairs of specimens, each having one member replaced by a member from another pair chosen at random. The relationship between true-positive and false-positive rates was similar among the delta check methods tested, indicating equal performance for all of them over the range of false-positive rates of interest. At a practical false-positive operating rate of about 5%, delta check methods detect only about 50% of mislabeled specimens; even if the actual mislabeling rate is moderate (e.g., 1%), only about 10% of specimens flagged by a delta check will actually have been mislabeled.

Additional Keyphrases: statistics • quality control • computers

Delta check methods have been proposed for detecting erroneous laboratory test results (1-3). Because of the importance of accurate laboratory test results for patient care and the increasing pressure on hospitals and laboratories by government and accrediting agencies to improve quality-control techniques, it is appropriate to evaluate the effectiveness of delta check methods.

For a given patient, delta check methods compare the differences (deltas) between today's test values and corresponding previous test values with given thresholds. If a delta exceeds its threshold, the test value for "today" fails the delta check and is suspected of being erroneous. Any source of laboratory error may cause one or more of a set of test values to fail a delta check. The method is particularly interesting as a method for detecting two important types of error, specimen mislabeling (i.e., assigning "today's" test values to a patient different from the one from whom the sample was actually taken) and error in reporting test results, because these types of errors are not detectable by any other a posteriori means, such as checking control specimens against quality-control limits.

This study reports an evaluation of the ability of three previously suggested delta check methods (1-3) and two linear discriminant functions to detect mislabeled specimens. The evaluation is based on simulated specimen identification errors with almost 3000 pairs of actual continuous-flow (SMA 6) determinations.
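
The basic delta check rule described above can be illustrated with a minimal sketch. The function name, the test names, and every threshold and test value below are hypothetical and chosen only for illustration; they are not the thresholds of any of the methods evaluated in this study.

```python
from typing import Dict, List


def failed_deltas(previous: Dict[str, float],
                  today: Dict[str, float],
                  thresholds: Dict[str, float]) -> List[str]:
    """Return the tests whose absolute delta (today minus previous) exceeds its threshold."""
    failed = []
    for test, limit in thresholds.items():
        delta = today[test] - previous[test]
        if abs(delta) > limit:
            failed.append(test)
    return failed


# Hypothetical thresholds and values, for illustration only.
thresholds = {"Na": 8.0, "K": 1.0, "creatinine": 0.5}
previous = {"Na": 140.0, "K": 4.1, "creatinine": 1.0}
today = {"Na": 141.0, "K": 5.6, "creatinine": 1.1}

flagged = failed_deltas(previous, today, thresholds)
print(flagged)  # only the K delta (1.5) exceeds its threshold (1.0)
```

In this sketch a specimen would be suspected whenever the returned list is non-empty; stricter or looser rules correspond to requiring more or fewer failing tests.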

Department of Laboratory Medicine, University of California, S.F., San Francisco, CA 94143.
1 Present address: Department of Clinical Pathology, Indiana University, 1100 W. Michigan St., Indianapolis, IN 46202.
Received May 10, 1979; accepted Sept. 10, 1979.

The essential finding of the study is that the previously suggested delta check methods for the usual SMA 6 tests are approximately equivalent when performance is judged by their relative abilities to correctly identify mislabeled specimens (true-positive rate) while operating at similar false-positive rates. However, the highest true-positive rate attainable by these delta check methods when operating at acceptable false-positive rates (about 5%) is modest, only about 50%. The delta check is therefore no panacea for lax specimen identification procedures.

Materials and Methods
The test results used in this study were collected in the clinical laboratory of the University of California Medical Center, San Francisco, with use of its Community Health Computing Laboratory computer system. The system's patient history file, as of autumn 1977 (containing all test results for the preceding 60 days), was searched to identify all pairs of SMA 6 (Technicon Instruments Corp., Tarrytown, NY 10591) results (electrolytes, serum urea nitrogen, creatinine) for which the second determination was made between 0.9 and 2.5 days after the first. No test result was used more than once. All patients in the history file were considered for inclusion in this study. Therefore, the data arise from an unselected sample of patients who vary with respect to age, sex, inpatient or outpatient status, and clinical state. The only basis for selection was that the physicians caring for these patients had ordered two or more SMA 6 tests within the specified period.

A total of 2988 pairs of SMA 6 results, representing data from 749 patients, was available. An examination of the pairs revealed that in 19 pairs one or more obviously nonphysiological results existed (e.g., the test value was zero or impossibly large). In each such case that pair of SMA 6 results was excluded from the study because we sought to evaluate only detection of specimen mislabeling.

The three delta check methods investigated differ in which of the six SMA 6 tests they check and in the magnitude of their thresholds. The method of Wheeler and Sheiner (3) checks all six SMA 6 values (Na, K, Cl, HCO3, blood urea nitrogen, and creatinine), plus the serum urea nitrogen/creatinine ratio and the "anion gap" (Na+ + K+ - Cl- - HCO3-); this method will hereafter be designated method A. The method of Whitehurst et al. (2), which checks only the six SMA 6 values, will be designated method B. Our adaptation of the Ladenson method (1), method C, checks only Na, K, serum urea nitrogen, and creatinine.

The threshold values of methods B and C, given in the appropriate references, were selected intuitively by the methods' authors. The threshold values of method A are based on the empirically observed distribution of deltas in an unselected group of hospital patients. Method A defines three thresholds for each test value, the most stringent being so high that only 1% of (presumably) correctly identified specimen pairs will exceed it as a result of physiological shifts in test values; the second will be exceeded by physiological shifts 2% of the time; and the least stringent, 5% of the time. These three sets of thresholds will be designated as methods A01, A02, and A05, respectively.

Because each method checks a number of deltas simultaneously, one may define a hierarchy of rules for deciding that a specimen has failed the delta check. For example, one may fail the specimen only if one particular delta (e.g., Na) exceeds its threshold, if at least one of all the deltas checked exceeds its threshold, if at least two deltas exceed their thresholds, etc. Each such distinct rule defines a submethod for each parent method.

We computed the false-positive rate for each of the submethods of each of the parent delta check methods we studied by simply applying each submethod to the 2969 pairs of SMA 6 results. All pairs failing by a submethod are, by definition, "false positives" because we assume that the original data include at most very few mislabeled specimens.

The true-positive rates, the proportion of mislabeled specimens detected by the various submethods, are computed from a "mislabeled" data set of 2969 pairs, each including a "mislabeled" specimen. Each of these pairs consists of the first SMA 6 value from one of the original 2969 pairs and the second SMA 6 value from another pair, selected randomly from among the pairs for other patients.

Obviously any submethod that has a higher false-positive rate and lower true-positive rate than some other submethod can be ignored in assessing the performance of a method. The remaining submethods of the method, whose performance cannot be bettered by any other submethod of that method at a given false-positive rate, are called "optimal." The Receiver Operating Characteristic (ROC) curve of a method depicts its performance. An ROC curve is constructed by plotting the true-positive rates of optimal submethods on the vertical axis against their false-positive rates on the horizontal axis and connecting these points to form a curve (see Figure 1).

ROC curves make one kind of comparison of methods especially simple: one method is in all circumstances superior to another if its ROC curve everywhere lies to the left of and above the ROC curve of the latter, for then the former method has a higher true-positive rate than the latter at any false-positive rate.

Linear discriminant functions, a familiar statistical technique for classifying multivariate observations into two groups (4), have not to our knowledge been used previously for delta checks. Our linear discriminant function is a formula that is linear in a set of variables derived from one SMA 6 pair. (One of the linear discriminant functions for which results are reported is given by equation 1, below.) Pairs are assigned to the "correctly labeled" and "mislabeled" groups according to whether their discriminant function value exceeds a certain threshold. Each different discriminant function can be considered a method. Adjustment of the threshold yields submethods with specified false-positive rates.

Given a set of variables derived from an SMA 6 pair, an "index set" of pairs assumed to be correctly labeled, and an index set of pairs known to be mislabeled, the "best" linear discriminant function of these variables has coefficients that minimize the total probability of misclassification of the two index sets into the two groups. Choice of the appropriate set of variables to include in the discriminant function is aided by computer algorithms that determine which set of variables of a given size best differentiates the two groups (5). If many variables are available, there usually exist numerous discriminant functions with almost identical properties; the choice between them is essentially arbitrary.
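
The simulation and ROC construction described in this section can be sketched as follows. This is a minimal illustration on synthetic data, not the computation used in the study: the synthetic patients, the two thresholds, the "at least k deltas exceed" submethods, and all function names are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

# (patient id, first determination, second determination)
Pair = Tuple[int, Sequence[float], Sequence[float]]


def make_mislabeled(pairs: List[Pair], rng: random.Random) -> List[Pair]:
    """Keep each pair's first member; take the second member from a random pair of another patient."""
    out = []
    for pid, first, _ in pairs:
        while True:
            other = rng.choice(pairs)
            if other[0] != pid:
                break
        out.append((pid, first, other[2]))
    return out


def fail_rate(pairs: List[Pair], thresholds: Sequence[float], k: int) -> float:
    """Fraction of pairs in which at least k absolute deltas exceed their thresholds."""
    n_fail = 0
    for _, first, second in pairs:
        exceed = sum(abs(b - a) > t for a, b, t in zip(first, second, thresholds))
        if exceed >= k:
            n_fail += 1
    return n_fail / len(pairs)


def optimal_points(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Drop (FP, TP) points for which another point has FP no higher and TP no lower."""
    keep = []
    for fp, tp in points:
        dominated = any(fp2 <= fp and tp2 >= tp and (fp2, tp2) != (fp, tp)
                        for fp2, tp2 in points)
        if not dominated:
            keep.append((fp, tp))
    return sorted(set(keep))


# Synthetic data: one pair per hypothetical patient, two hypothetical tests.
rng = random.Random(0)
pairs: List[Pair] = []
for pid in range(200):
    base = [rng.gauss(140, 4), rng.gauss(4.2, 0.5)]
    first = [v + rng.gauss(0, s) for v, s in zip(base, (1.0, 0.15))]
    second = [v + rng.gauss(0, s) for v, s in zip(base, (1.0, 0.15))]
    pairs.append((pid, first, second))

mislabeled = make_mislabeled(pairs, rng)
thresholds = (4.0, 0.6)  # hypothetical thresholds

# One (false-positive rate, true-positive rate) point per submethod.
points = [(fail_rate(pairs, thresholds, k), fail_rate(mislabeled, thresholds, k))
          for k in (1, 2)]
roc = optimal_points(points)
print(roc)
```

The list `roc` holds the points that would be connected to form the ROC curve of this toy method. The dominance test here uses "no higher" and "no lower" rather than strictly higher and lower, a slightly broader pruning rule than the one stated in the text.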

Fig. 1. ROC curves (true-positive rate as a percentage vs false-positive rate as a percentage) for the various methods discussed in the text
A, the composite curve for the methods A01, A02, and A05 of Wheeler and Sheiner (3); B, the method of Whitehurst et al. (2); C, the method of Ladenson (1); D30, the 30-variable discriminant function; D7, the 7-variable discriminant function. The ROC curves of methods A, B, and C connect the ROC points of their optimal submethods (see text)

A discriminant function is always influenced by the idiosyncracies of the data set from which it is computed. The more variables that are included in the function, the stronger is this influence. Hence, the discriminant functions reported here probably would not perform so well on other independent data sets.

Altogether, 56 variables computed from each SMA 6 pair were considered for use in the linear discriminant functions. Seven variables were computed from each of the eight pairs of values tested by delta check method A: the value of the analyte in the first SMA 6 set, the delta, the absolute value of the delta, the delta divided by the analyte value in the first SMA 6, the delta divided by the time elapsed between specimens, and the absolute values of the two quotients.

Results for two discriminant functions are reported in Results: the best function involving only seven variables, denoted D7, and the best function involving 30 variables, denoted D30. The D7 discriminant function is:

$$F = -0.0552\,|\Delta \mathrm{Na}| - 0.0775\,|\Delta \mathrm{HCO_3}| - 0.471\,|\Delta \mathrm{K}| - 0.0471\,|\Delta \mathrm{Cl}| - 0.0247\,|\Delta \mathrm{urea\,N}| - 0.0959\,|\Delta \mathrm{Cr}| + 0.0159\,\mathrm{urea\,N}_1 + 1.29 \tag{1}$$

where $|\Delta x|$ is the absolute value of the delta for test x, and urea N1 is the value of the first blood urea nitrogen of a pair. Given a cutoff value, F0 (see Results), pairs with F greater than or equal to F0 are classified as properly labeled, and pairs with F less than F0 fail the delta check, and are classified as mislabeled.

Results

Figure 1 presents the portions of the ROC curves corresponding to false-positive rates less than 20% for all the methods of this paper. There are only minor differences among the curves of methods A, B, and C. These differences become even less interesting when we note that 95% confidence limits of the estimated rates are on the order of ±2%, even for a data set as large as ours. To get an idea of the variability of these rates, we created a second "mislabeled" data set in exactly the same way as the first (see Methods) and reestimated all the true-positive rates from it. Indeed, the standard deviation of the differences between corresponding true-positive estimates was 0.9, which yields an approximate 95% confidence interval of ±2%.

At a false-positive rate of about 2% the D30 method does have about a 10% higher true-positive rate than any previously suggested method, but recall that the performance of the discriminant function on the index sets is likely to be better than can be expected on any other data set. No such bias exists for methods A, B, and C.

Table 1. ROC Values for Discriminant Functions, and Comparisons with Delta Check Methods

D7, D30 a     D7        D7      D30     Comparable values for methods A, B, or C b
FP%           F0        TP%     TP%     TP%     Method (submethod)
 1.0          -1.495    23      36      21      A02 (A/4)
 2.5          -0.765    41      51      38      A01 (CR)
 5.0          -0.345    56      60      49      A05 (B/3)
 7.5          -0.115    65      68      57      B (CR)
10.0          -0.025    71      75      70      B (B/2)
12.5           0.125    76      79
15.0           0.215    79      81      76      A05 (B/2)
17.5           0.295    82      83
20.0           0.365    85      86      84      A05 (A/2)

a D7 is the discriminant function using seven variables (see Materials and Methods) and D30 is that using 30 variables. FP% and TP% denote the false-positive and true-positive rates expressed as percentages. F0 is the cutoff value for the D7 function; an SMA 6 pair for which the discriminant function value exceeds F0 is classified as properly labeled, and one whose discriminant function value is less than F0 is classified as mislabeled.
b B (CR) names the submethod that only compares the creatinine delta to the threshold of method B; A05 (B/3) names the submethod that compares with the thresholds of method A05 the deltas checked by method B and fails the specimen if any three of these deltas fail these checks. Other methods and submethods are named similarly. The method and submethod in this column have the greatest TP% among those for any submethods of Method A, B, or C for which FP% exceeds the value in the first column by less than 0.1%; there are no such submethods with FP% close to 12.5% or 17.5%.

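
As an illustration of how the D7 rule of equation 1 might be applied with a cutoff from Table 1, here is a minimal sketch. It rests on stated assumptions: the signs of the delta coefficients and of the lower F0 cutoffs are taken to be negative (larger deltas lower F, and a pair with F below F0 is classified as mislabeled), the function and variable names are hypothetical, and the specimen values are invented and given in assumed conventional units.

```python
from typing import Dict

# Magnitudes of the delta coefficients of equation 1; treating them as
# penalties (negative signs) is an assumption of this sketch.
D7_COEF: Dict[str, float] = {
    "Na": 0.0552, "HCO3": 0.0775, "K": 0.471,
    "Cl": 0.0471, "ureaN": 0.0247, "Cr": 0.0959,
}


def d7_score(first: Dict[str, float], second: Dict[str, float]) -> float:
    """Evaluate F: constant, plus the first urea N term, minus the weighted absolute deltas."""
    penalty = sum(c * abs(second[t] - first[t]) for t, c in D7_COEF.items())
    return 1.29 + 0.0159 * first["ureaN"] - penalty


def fails_delta_check(first: Dict[str, float], second: Dict[str, float],
                      cutoff: float) -> bool:
    """A pair with F below the cutoff F0 is classified as mislabeled."""
    return d7_score(first, second) < cutoff


# Cutoff listed for a 5% false-positive rate, with the assumed negative sign.
F0_AT_5_PERCENT = -0.345

# Invented specimen values, for illustration only.
first = {"Na": 140.0, "K": 4.0, "Cl": 103.0, "HCO3": 25.0, "ureaN": 15.0, "Cr": 1.0}
similar = dict(first)
different = {"Na": 150.0, "K": 6.0, "Cl": 95.0, "HCO3": 18.0, "ureaN": 40.0, "Cr": 3.0}

print(fails_delta_check(first, similar, F0_AT_5_PERCENT))    # identical values: passes
print(fails_delta_check(first, different, F0_AT_5_PERCENT))  # large deltas: fails
```

Raising the cutoff toward the values listed for higher FP% flags more pairs, which is how adjusting the threshold yields submethods of the same discriminant function.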

Figure 1 reminds us of a universal characteristic of ROC curves: increased sensitivity (increased true-positive rate) is possible only if one is willing to pay the price of decreased specificity (increased false-positive rate). At a reasonable operating point on the ROC curve of 5% false-positive rate, Figure 1 indicates that one may expect to detect only 50-60% of truly mislabeled specimens.

Table 1 presents the performances of certain of the discriminant function submethods and compares them with the performances of comparable submethods of Methods A, B, and C. The above findings are again obvious.

Discussion
Laboratory error may be defined as reporting a value that differs from that which would have been obtained if the test were correctly performed on a specimen from the correct patient. Therefore, mistakes in labeling of specimens, performance of tests, and reporting of results all contribute to laboratory error. In this study, we evaluate the effectiveness of three previously described delta check methods for detecting "mislabeled" specimens. Although this is not the only function of delta checks, it is an important one, and our findings therefore merit some consideration.

Our results lead us to two conclusions. First, with respect to identifying mislabeled specimens, the particular method used for the delta check is relatively unimportant: all currently suggested methods are capable of approximately the same performance. Second, no currently available delta check method offers a truly substantial guarantee of detecting mislabeled specimens; a practical expectation is correct identification of only half the truly mislabeled specimens.

The first conclusion is supported by inspection of Figure 1: the eight tests used for method A (and the D7 method) add little to the performance possible with the four tests used for Method C. This is undoubtedly because of the high correlation between the various tests, especially among the electrolyte values. Although we did not specifically test for the possibility, we speculate that deleting serum urea nitrogen or creatinine (but not both) from the four Method C tests would have little detrimental effect, because these two tests also are highly correlated.

As mentioned in Results, only the 30-variable discriminant function appears to improve upon the performance of the other methods, and this result is highly suspect. Indeed, we have deliberately not given the equation for the 30-variable discriminant function because we have so little faith that its performance could be duplicated on a data set gathered at another institution. This is because the 30-variable function involves so many coefficients that it very likely achieves its performance by exploiting idiosyncracies of the data set. If, then, we regard the 30-variable discriminant function as an overly optimistic upper bound on the performance of any (linear) function, we are led directly to our second conclusion: one must not expect too much from the delta check.

Any practical delta check system must take costs as well as benefits into account. The actual computation of the delta and its comparison to a threshold by a laboratory computer system has a negligible cost. The investigation of a specimen that fails the check is another matter. If mislabeling is suspected, simply rerunning the specimen does not constitute adequate investigation. Failure of a delta check should logically require drawing and running a new specimen. Operating this way at a false-positive rate of 5% will raise costs at least 5% and delay reporting of 5% of results. Although we think it unlikely that laboratory directors would permit operation at a false-positive rate higher than 5%, the exact rate at which to operate is, of course, a matter of individual choice, involving a trade-off between operating costs and the benefits of identifying some fraction of mislabeled specimens.

When one considers the actual fraction of all tests failing the delta check that actually will have been mislabeled, the true- and false-positive figures of Table 1 are seen in another light. This fraction depends, of course, on the actual mislabeling rate. For example, if the mislabeling rate is 1% (one of each 100 specimens is incorrectly identified), operating at a 5% false-positive rate means that (a) 5.5% of all specimens will fail the check, (b) almost half of the truly mislabeled specimens will not be detected, and (c) about 90% of the 5.5% of specimens identified as possibly mislabeled will, on subsequent investigation, prove to have been false alarms. Of course, because the test values of the mislabeled specimens that are not detected by the delta check differ little from the first day's values, they will likely differ little from the second day's values for the patient; this lowers the "cost" of not detecting mislabeled specimens by an unknown amount.
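
The arithmetic behind the 1% mislabeling example can be checked with a short sketch. The function name is hypothetical, and the true-positive rate of 50% is an assumed round figure at the lower end of the 50-60% range reported for a 5% false-positive rate.

```python
from typing import Tuple


def flag_breakdown(mislabel_rate: float, fp_rate: float,
                   tp_rate: float) -> Tuple[float, float, float]:
    """Return (fraction of all specimens flagged,
               fraction of flagged specimens truly mislabeled,
               fraction of mislabeled specimens missed)."""
    true_flags = mislabel_rate * tp_rate          # mislabeled and flagged
    false_flags = (1.0 - mislabel_rate) * fp_rate  # correctly labeled but flagged
    flagged = true_flags + false_flags
    return flagged, true_flags / flagged, 1.0 - tp_rate


flagged, truly_mislabeled, missed = flag_breakdown(
    mislabel_rate=0.01, fp_rate=0.05, tp_rate=0.50)

print(f"flagged: {flagged:.2%}")                              # about 5.5% of all specimens
print(f"flagged and truly mislabeled: {truly_mislabeled:.1%}")  # about 9%
print(f"mislabeled but missed: {missed:.0%}")                 # 50%
```

With these inputs roughly nine of every ten flagged specimens are false alarms, which matches point (c) of the example.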

Further, although we feel that the major conclusions discussed above are warranted, we caution our readers not to place too much reliance on the exact values of our ROC figures, or to use them as more than a guideline for the choice of the point on the ROC curve at which to operate their own delta check system. There are three reasons for our caution. First, the false-positive rates we find are upper bounds on the true values because our data set probably contains a few mislabeled specimens. Second, the true-positive rates we find may be lower bounds because they refer only to detection of specimen mislabeling. Both of these considerations affect all delta check methods equally, and do not, therefore, weaken our first conclusion. We further doubt their being of sufficient quantitative importance to seriously weaken our second conclusion.

A third reason for some caution is that the particular pattern of clinical characteristics in our group of patients may not be representative of the characteristics of patients at other institutions. This could change the ROC curves for the various delta check methods. All delta check methods depend on the truth of two propositions: (a) serial specimens from a single patient are apt to be more "alike" than specimens selected at random from two different patients, and (b) "large" short-term changes in the test results of a patient are unlikely. A homogeneous population of patients (e.g., "normals" coming to a screening clinic) may violate the first proposition, and a population of desperately ill patients, undergoing calamitous and abrupt shifts in clinical status (e.g., patients in an intensive care unit) may violate the second. We cannot predict the delta check ROC curves for such patient groups. However, it seems reasonable that they would still display the same relative positions as in our study. If so, our first conclusion would be insensitive to differences in groups of patients. Either circumstance would likely shift the ROC curves to the right (poorer discrimination) so that our second conclusion still appears conservative.

Thus, despite some cautions concerning quantitative rather than qualitative matters, we have shown that all currently suggested delta check methods perform equivalently in detecting mislabeled specimens, but none perform so well that proper laboratory vigilance concerning specimen identification can be relaxed.

This work was supported in part by NIH Grant GM 16496, GM
00001.
References
1. Ladenson, J. H., Patients as their own controls: Use of the computer to identify "laboratory error." Clin. Chem. 21, 1648-1653 (1975).
2. Whitehurst, P., Di Silvio, T. V., and Boyadjian, G., Evaluation of discrepancies in patients' results - an aspect of computer-assisted quality control. Clin. Chem. 21, 87-92 (1975).
3. Wheeler, L. A., and Sheiner, L. B., Delta check tables for the Technicon SMA 6 continuous-flow analyzer. Clin. Chem. 23, 216-219 (1977).
4. Anderson, T. W., An Introduction to Multivariate Statistical Analysis. John Wiley and Sons, Inc., New York, NY, 1958.
5. Nie, N. H., Hull, C. H., Jenkins, J. G., et al., SPSS (Statistical Package for the Social Sciences), 2nd ed., McGraw-Hill Book Co., New York, NY, 1975.