CLIN. CHEM. 25/12, 2034-2037 (1979)
2034 CLINICAL CHEMISTRY, Vol. 25, No. 12, 1979
The Performance of Delta Check Methods
Lewis B. Sheiner, Lawrence A. Wheeler,1 and John K. Moore
The percentage of mislabeled specimens detected (true-positive rate) and the percentage of correctly labeled specimens misidentified (false-positive rate) were computed for three previously proposed delta check methods and two linear discriminant functions. The true-positive rate was computed from a set of pairs of specimens, each having one member replaced by a member from another pair chosen at random. The relationship between true-positive and false-positive rates was similar among the delta check methods tested, indicating equal performance for all of them over the range of false-positive rates of interest. At a practical false-positive operating rate of about 5%, delta check methods detect only about 50% of mislabeled specimens; even if the actual mislabeling rate is moderate (e.g., 1%), only about 10% of specimens flagged by a delta check will actually have been mislabeled.
Additional Keyphrases: statistics • quality control • computers

Delta check methods have been proposed for detecting erroneous laboratory test results (1-3). Because of the importance of accurate laboratory test results for patient care and the increasing pressure on hospitals and laboratories by government and accrediting agencies to improve quality control techniques, it is appropriate to evaluate the effectiveness of delta check methods.

For a given patient, delta check methods compare the differences (deltas) between today's test values and corresponding previous test values with given thresholds. If a delta exceeds its threshold, the test value for "today" fails the delta check and is suspected of being erroneous. Any source of laboratory error may cause one or more of a set of test values to fail a delta check. The method is particularly interesting as a method for detecting two important types of error, specimen mislabeling (i.e., assigning "today's" values to a patient different from the one from whom the sample was actually taken) and error in reporting test results, because these types of errors are not detectable by any other a posteriori means, such as checking control specimens against quality control limits.

This study reports an evaluation of the ability of three previously suggested delta check methods (1-3) and two linear discriminant functions to detect mislabeled specimens. The evaluation is based on simulated specimen identification errors with almost 3000 pairs of actual continuous-flow (SMA 6) determinations.
Department of Laboratory Medicine, University of California, S.F., San Francisco, CA 94143.
1 Present address: Department of Clinical Pathology, Indiana University, 1100 W. Michigan St., Indianapolis, IN 46202.
Received May 10, 1979; accepted Sept. 10, 1979.
The essential finding of the study is that the previously suggested delta check methods for the usual SMA 6 tests are approximately equivalent when performance is judged by their relative abilities to correctly identify mislabeled specimens (true-positive rate) while operating at similar false-positive rates. However, the highest true-positive rate attainable by these delta check methods operating at acceptable false-positive rates (about 5%) is modest, only about 50%. The delta check is therefore no panacea for lax specimen identification procedures.
Materials and Methods
The test results used in this study were collected in the clinical laboratory of the University of California Medical Center, San Francisco, with use of its Community Health Computing Laboratory computer system. The system's patient history file, as of autumn 1977 (containing all test results for the preceding 60 days), was searched to identify all pairs of SMA 6 (Technicon Instruments Corp., Tarrytown, NY 10591) results (electrolytes, serum urea nitrogen, creatinine) for which the second determination was made between 0.9 and 2.5 days after the first. No test result was used more than once. All patients in the history file were considered for inclusion in this study. Therefore, the data arise from an unselected sample of patients who vary with respect to age, sex, inpatient or outpatient status, and clinical state. The only basis for selection was that the physicians caring for these patients had ordered two or more SMA 6 tests within the specified period.

A total of 2988 pairs of SMA 6 results, representing data from 749 patients, was available. An examination of the pairs revealed that in 19 pairs one or more obviously nonphysiological results existed (e.g., the test value was zero or impossibly large). In each such case that pair of SMA 6 results was excluded from the study because we sought to evaluate only detection of specimen mislabeling.

The three delta check methods investigated differ in which of the six SMA 6 tests they check and in the magnitude of their thresholds. The method of Wheeler and Sheiner (3) checks all six SMA 6 values (Na, K, Cl, HCO3, blood urea nitrogen, and creatinine), plus the serum urea nitrogen/creatinine ratio and the "anion gap" (Na+ + K+ - Cl- - HCO3-); this method will hereafter be designated method A. The method of Whitehurst et al. (2), which only checks the six SMA 6 values, will be designated method B. Our adaptation of the Ladenson method (1), method C, checks only Na, K, serum urea nitrogen, and creatinine.

The threshold values of methods B and C, given in the appropriate references, were selected intuitively by the methods' authors. The threshold values of method A are based on the empirically observed distribution of deltas in an unselected group of hospital patients. Method A defines three thresholds for each test value, the most stringent being so high that only 1% of (presumably) correctly identified specimen pairs will exceed it as a result of physiological shifts in test values;
the second will be exceeded by physiological shifts 2% of the time; and the least stringent, 5% of the time. These three sets of thresholds will be designated as methods A01, A02, and A05, respectively.

Because each method checks a number of deltas simultaneously, one may define a hierarchy of rules for deciding that a specimen has failed the delta check. For example, one may fail the specimen only if one particular delta (e.g., Na) exceeds its threshold, if at least one of all the deltas checked exceeds its threshold, if at least two deltas exceed their thresholds, etc. Each such distinct rule defines a submethod for each parent method.

We computed the false-positive rate for each of the submethods of each of the parent delta check methods we studied, by simply applying each submethod to the 2969 pairs of SMA 6 results. All pairs failing by a submethod are, by definition, "false positives" because we assume that the original data include at most very few mislabeled specimens.

The true-positive rates, the proportion of mislabeled specimens detected by the various submethods, are computed from a "mislabeled" data set of 2969 pairs, each including a "mislabeled" specimen. Each of these pairs consists of the first SMA 6 value from one of the original 2969 pairs and the second SMA 6 value from another pair, selected randomly from among the pairs for other patients.

Obviously any submethod that has a higher false-positive rate and lower true-positive rate than some other submethod can be ignored in assessing the performance of a method. The remaining submethods of the method, whose performance cannot be bettered by any other submethod of that method at a given false-positive rate, are called "optimal." The Receiver Operating Characteristic (ROC) curve of a method depicts its performance. An ROC curve is constructed by plotting the true-positive rates of optimal submethods on the vertical axis against their false-positive rates on the horizontal axis and connecting these points to form a curve (see Figure 1).

ROC curves make one kind of comparison of methods especially simple: one method is in all circumstances superior to another if its ROC curve everywhere lies to the left of and above the ROC curve of the latter, for then the former method has a higher true-positive rate than the latter at any false-positive rate.

Linear discriminant functions, a familiar statistical technique for classifying multivariate observations into two groups (4), have not to our knowledge been used previously for delta checks. Our linear discriminant function is a formula that is linear in a set of variables derived from one SMA 6 pair. (One of the linear discriminant functions for which results are reported is given by equation 1, below.) Pairs are assigned to the "correctly labeled" and "mislabeled" groups according to whether their discriminant function value exceeds a certain threshold. Each different discriminant function can be considered a method. Adjustment of the threshold yields submethods with specified false-positive rates.

Given a set of variables derived from an SMA 6 pair, an "index set" of pairs assumed to be correctly labeled, and an index set of pairs known to be mislabeled, the "best" linear discriminant function of these variables has coefficients that minimize the total probability of misclassification of the two index sets into the two groups. Choice of the appropriate sets of variables to include in the discriminant function is aided by computer algorithms that determine which set of variables of a given size best differentiates the two groups (5). If many variables are available, there usually exist numerous discriminant functions with almost identical properties; the choice between them is essentially arbitrary.

A discriminant function is always influenced by the idiosyncrasies of the data set from which it is computed.
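
The submethod rules and the two rate computations described in this section can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names (`fails`, `rate`, `make_mislabeled`), the dictionary layout of an SMA 6 result, and any threshold values passed in are hypothetical and are not taken from methods A, B, or C.

```python
import random


def fails(first, second, thresholds, k):
    """Submethod 'at least k of the checked deltas exceed their thresholds'."""
    exceeded = sum(
        abs(second[test] - first[test]) > limit
        for test, limit in thresholds.items()
    )
    return exceeded >= k


def rate(pairs, thresholds, k):
    """Fraction of (first, second) result pairs that fail the submethod."""
    return sum(fails(a, b, thresholds, k) for a, b in pairs) / len(pairs)


def make_mislabeled(pairs, patient_ids, rng):
    """Build a simulated 'mislabeled' set: keep each first result and take
    the second result from a randomly chosen pair of a different patient."""
    mislabeled = []
    for i, (first, _) in enumerate(pairs):
        others = [j for j, pid in enumerate(patient_ids) if pid != patient_ids[i]]
        mislabeled.append((first, pairs[rng.choice(others)][1]))
    return mislabeled
```

Applied to the original pairs, `rate` estimates the false-positive rate of a submethod; applied to the output of `make_mislabeled` (for example with `rng = random.Random(0)`), it estimates the true-positive rate.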
Fig. 1. ROC curves (true-positive rate as a percentage vs false-positive rate as a percentage) for the various methods discussed in the text
A, the composite curve for the methods A01, A02, and A05 of Wheeler and Sheiner (3); B, the method of Whitehurst et al. (2); C, the method of Ladenson (1); D30, the 30-variable discriminant function; D7, the 7-variable discriminant function. The ROC curves of methods A, B, and C connect the ROC points of their optimal submethods (see text)
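
The selection of "optimal" submethods whose points are connected to form an ROC curve can be sketched as a simple dominance filter. This is a minimal sketch, not the authors' program: the function name `optimal_submethods` is hypothetical, and the filter is slightly stricter than the wording in the text in that it also drops a point that merely ties another on one rate while being worse on the other.

```python
def optimal_submethods(points):
    """Return the (fp, tp) points that no other point betters.

    points: iterable of (false_positive_rate, true_positive_rate) tuples,
    one per submethod. A point is kept only if its true-positive rate is
    higher than that of every point with an equal or lower false-positive
    rate.
    """
    best = []
    # Ascending false-positive rate; for equal rates, highest true-positive first.
    for fp, tp in sorted(set(points), key=lambda p: (p[0], -p[1])):
        if not best or tp > best[-1][1]:
            best.append((fp, tp))
    return best
```

Connecting the returned points in order of increasing false-positive rate gives the kind of curve the caption describes.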
The more variables that are included in the function, the stronger is this influence. Hence, the discriminant functions reported here probably would not perform so well on other independent data sets.

Altogether, 56 variables were considered for use in the linear discriminant functions. Seven variables were computed from each SMA 6 pair of values for each of the eight analytes tested by delta check method A: the value of the analyte in the first SMA 6 set, the delta, the absolute value of the delta, the delta divided by the analyte value in the first SMA 6, the delta divided by the time elapsed between specimens, and the absolute values of the two quotients.

Results for two discriminant functions are reported in Results: the best function involving only seven variables, denoted D7, and the best function involving 30 variables, denoted D30.

The D7 discriminant function is:
$$F = -0.0552\,|\Delta\mathrm{Na}| - 0.471\,|\Delta\mathrm{K}| - 0.0471\,|\Delta\mathrm{Cl}| - 0.0775\,|\Delta\mathrm{HCO_3}| - 0.0247\,|\Delta\mathrm{ureaN}| - 0.0959\,|\Delta\mathrm{Cr}| + 0.0159\,\mathrm{ureaN}_1 + 1.29 \qquad (1)$$

where $|\Delta x|$ is the absolute value of the delta for test $x$, and $\mathrm{ureaN}_1$ is the value of the blood urea nitrogen of the first of a pair. Given a cutoff value, $F_0$ (see Results), pairs with $F$ greater than or equal to $F_0$ are classified as properly labeled, and pairs with $F$ less than $F_0$ fail the delta check, and are classified as mislabeled.

Results

Figure 1 presents the portions of the ROC curves corresponding to false-positive rates less than 20% for all the methods of this paper. There are only minor differences among the curves of methods A, B, and C. These differences become even less interesting when we note that 95% confidence limits of the estimated rates are on the order of ±2%, even for a data set as large as ours. To get an idea of the variability of these rates, we created a second "mislabeled" data set in exactly the same way as the first (see Methods) and reestimated all the true-positive rates from it. Indeed, the standard deviation of the differences between corresponding true-positive estimates was 0.9, which yields an approximate 95% confidence interval of ±2%.

At a false-positive rate of about 2% the D30 method does have about a 10% higher true-positive rate than any previously suggested method, but recall that the performance of the discriminant function on the index sets is likely to be better
than can be expected on any other data set. No such bias exists for methods A, B, and C.

Figure 1 reminds us of a universal characteristic of ROC curves: increased sensitivity (increased true-positive rate) is only possible if one is willing to pay the price of decreased specificity (increased false-positive rate). At a reasonable operating point on the ROC curve of 5% false-positive rate, Figure 1 indicates that one may expect to detect only 50-60% of truly mislabeled specimens.

Table 1 presents the performances of certain of the discriminant function submethods and compares them with the performances of comparable submethods of Methods A, B, and C. The above findings are again obvious.

Table 1. ROC Values for Discriminant Functions, and Comparisons with Delta Check Methods

         D7, D30 (a)                      Comparable values for methods A, B, or C (b)
FP%      D7 F0     D7 TP%    D30 TP%      TP%
 1.0     -1.495    23        36           21
 2.5     -0.765    41        51           38
 5.0     -0.345    56        60           49
 7.5     -0.115    65        68           57
10.0      0.025    71        75           70
12.5      0.125    76        79
15.0      0.215    79        81           76
17.5      0.295    82        83
20.0      0.365    85        86           84

Method (submethod) entries for the comparable values: methods A02, A01, A05, B, B, A05, A05; submethods (CR), (B/3), (CR), (B/2), (B/2), (A/2), (A/4).

(a) D7 is the discriminant function using seven variables (see Materials and Methods) and D30 is that using 30 variables. FP% and TP% denote the false-positive and true-positive rates expressed as percentages. F0 is the cutoff value for the D7 function; an SMA 6 pair for which the discriminant function value exceeds F0 is classified as properly labeled, and one whose discriminant function value is less than F0 is classified as mislabeled.
(b) B (CR) names the submethod that only compares the creatinine delta to the threshold of method B; A05 (B/3) names the submethod that compares with the thresholds of method A05 the deltas checked by method B and fails the specimen if any three of these deltas fail these checks. Other methods and submethods are named similarly. The method and submethod in this column have the greatest TP% among those for any submethods of Method A, B, or C for which FP% exceeds the value in the first column by less than 0.1%; there are no such submethods with FP% close to 12.5% or 17.5%.
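
How equation 1 and a cutoff F0 from Table 1 combine into a delta check can be sketched as follows. This is a hedged illustration rather than the authors' program: the names `COEF`, `d7`, and `fails_delta_check` and the dictionary layout are hypothetical, and the minus signs on the six delta terms are an assumption made here, on the grounds that larger deltas must lower F if pairs with F below F0 are the ones that fail.

```python
# Coefficients of the six |delta| terms of equation 1.
# Assumption: each enters F with a minus sign.
COEF = {
    "Na": 0.0552,
    "K": 0.471,
    "Cl": 0.0471,
    "HCO3": 0.0775,
    "ureaN": 0.0247,
    "Cr": 0.0959,
}


def d7(first, second):
    """D7 discriminant value for one pair of SMA 6 results (dicts by analyte)."""
    penalty = sum(c * abs(second[t] - first[t]) for t, c in COEF.items())
    return -penalty + 0.0159 * first["ureaN"] + 1.29


def fails_delta_check(first, second, f0):
    """A pair fails (is classified as mislabeled) when F is less than F0."""
    return d7(first, second) < f0
```

With this sketch, choosing `f0` from the F0 column of Table 1 selects the submethod with the corresponding false-positive rate on the authors' data; the same cutoff would not necessarily give that rate elsewhere.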
Discussion
Laboratory error may be defined as reporting a value that differs from that which would have been obtained if the test were correctly performed on a specimen from the correct patient. Therefore, mistakes in labeling of specimens, performance of tests, and reporting of results all contribute to laboratory error. In this study, we evaluate the effectiveness of three previously described delta check methods for detecting "mislabeled" specimens. Although this is not the only function of delta checks, it is an important one, and our findings therefore merit some consideration.

Our results lead us to two conclusions. First, with respect to identifying mislabeled specimens, the particular method used for the delta check is relatively unimportant: all currently suggested methods are capable of approximately the same performance. Second, no currently available delta check method offers a truly substantial guarantee of detecting mislabeled specimens; a practical expectation is correct identification of only half the truly mislabeled specimens.

The first conclusion is supported by inspection of Figure 1: the eight tests used for method A (and the D7 method) add little to the performance possible with the four tests used for Method C. This is undoubtedly because of the high correlation between the various tests, especially among the electrolyte values. Although we did not specifically test for the possibility, we speculate that deleting serum urea nitrogen or creatinine (but not both) from the four Method C tests would have little detrimental effect, because these two tests also are highly correlated.

As mentioned in Results, only the 30-variable discriminant function appears to improve upon the performance of the other methods, and this result is highly suspect. Indeed, we have deliberately not given the equation for the 30-variable discriminant function because we have so little faith that its performance could be duplicated on a data set gathered at another institution. This is because the 30-variable function involves so many coefficients that it very likely achieves its performance by exploiting idiosyncrasies of the data set. If, then, we regard the 30-variable discriminant function as an overly optimistic upper bound on the performance of any (linear) discriminant function, we are led directly to our second conclusion: one must not expect too much from the delta check.

Any practical delta check system must take actual costs as well as benefits into account. The computation of the delta and its comparison to a threshold by a laboratory computer system has a negligible cost. The investigation of a specimen that fails the check is another matter. If mislabeling is suspected, simply rerunning the specimen does not constitute adequate investigation. Failure of a delta check should logically require drawing and running a new specimen. Operating this way at a false-positive rate of 5% will raise costs at least 5% and delay reporting of 5% of results. Although we think it unlikely that laboratory directors would permit operation at a false-positive rate higher than 5%, the exact rate at which to operate is, of course, a matter of individual choice, involving a tradeoff between operating costs and the benefits of identifying some fraction of mislabeled specimens.

When one considers the actual fraction of all tests failing the delta check that actually will have been mislabeled, the true- and false-positive figures of Table 1 are seen in another light. This fraction depends, of course, on the actual mislabeling rate. For example, if the mislabeling rate is 1% (one of each 100 specimens is incorrectly identified), operating at a 5% false-positive rate means that (a) 5.5% of all specimens will fail the check, (b) almost half of the truly mislabeled specimens will not be detected, and (c) about 90% of the 5.5% of specimens identified as possibly mislabeled will, on subsequent investigation, prove to have been false alarms. Of course, because the test values of the mislabeled specimens that are not detected by the delta check differ little from the first day's values, they will likely differ little from the second day's values for the patient; this lowers the "cost" of not detecting mislabeled specimens by an unknown amount.

Further, although we feel that the major conclusions discussed above are warranted, we caution our readers not to place too much reliance on the exact values of our ROC figures, or to use them as more than a guideline for the choice of the point on
the ROC curve at which to operate their own delta check system. There are three reasons for our caution. First, the false-positive rates we find are upper bounds on the true values because our data set probably contains a few mislabeled specimens. Second, the true-positive rates we find may be lower bounds because they refer only to detection of specimen mislabeling. Both of these considerations affect all delta check methods equally, and do not, therefore, weaken our first conclusion. We further doubt their being of sufficient quantitative importance to seriously weaken our second conclusion.

A third reason for some caution is that the particular pattern of clinical characteristics in our group of patients may not be representative of the characteristics of patients at other institutions. This could change the ROC curves for the various delta check methods. All delta check methods depend on the truth of two propositions: (a) serial specimens from a single patient are apt to be more "alike" than specimens selected at random from two different patients, and (b) "large" short-term changes in the test results of a patient are unlikely. A homogeneous population of patients (e.g., "normals" coming to a screening clinic) may violate the first proposition, and a population of desperately ill patients, undergoing calamitous and abrupt shifts in clinical status (e.g., patients in an intensive care unit) may violate the second. We cannot predict the delta check ROC curves for such patient groups. However, it seems reasonable that they would still display the same relative positions as in our study. If so, our first conclusion would be insensitive to differences in groups of patients. Either circumstance would likely shift the ROC curves to the right (poorer discrimination) so that our second conclusion still appears conservative.

Thus, despite some cautions concerning quantitative rather than qualitative matters, we have shown that all currently suggested delta check methods perform equivalently in detecting mislabeled specimens, but none perform so well that proper laboratory vigilance concerning specimen identification can be relaxed.
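
The arithmetic behind the worked example in the Discussion (a 1% mislabeling rate, a 5% false-positive rate, and a true-positive rate of roughly 50%) can be checked with a few lines. This is a minimal sketch; the function name `flagged_breakdown` is hypothetical, and the 50% true-positive rate is the rounded figure quoted in the text rather than an exact value from Table 1.

```python
def flagged_breakdown(mislabel_rate, tp_rate, fp_rate):
    """Return (fraction of all specimens flagged,
               fraction of flagged specimens that are truly mislabeled)."""
    true_alarms = mislabel_rate * tp_rate          # mislabeled and flagged
    false_alarms = (1 - mislabel_rate) * fp_rate   # correctly labeled but flagged
    flagged = true_alarms + false_alarms
    return flagged, true_alarms / flagged


flagged, truly_mislabeled = flagged_breakdown(0.01, 0.50, 0.05)
print(f"{flagged:.2%} flagged, {truly_mislabeled:.1%} of those truly mislabeled")
```

With these inputs about 5.45% of specimens are flagged and about 9.2% of the flagged specimens are truly mislabeled, which agrees with the rounded figures in the text (5.5% flagged, about 90% false alarms).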
This work was supported in part by NIH Grant GM 16496, GM 00001.
References
1. Ladenson, J. H., Patients as their own controls: Use of the computer to identify "laboratory error." Clin. Chem. 21, 1648-1653 (1975).
2. Whitehurst, P., DiSilvio, T. V., and Boyadjian, G., Evaluation of discrepancies in patients' results: an aspect of computer-assisted quality control. Clin. Chem. 21, 87-92 (1975).
3. Wheeler, L. A., and Sheiner, L. B., Delta check tables for the Technicon SMA 6 continuous-flow analyzer. Clin. Chem. 23, 216-219 (1977).
4. Anderson, T. W., An Introduction to Multivariate Statistical Analysis. John Wiley and Sons, Inc., New York, NY, 1958.
5. Nie, N. H., Hull, C. H., Jenkins, J. G., et al., SPSS (Statistical Package for the Social Sciences), 2nd ed., McGraw-Hill Book Co., New York, NY, 1975.