© 2012 Wijeysundera et al, publisher and licensee Dove Medical Press Ltd. This is an Open Access
article which permits unrestricted noncommercial use, provided the original work is properly cited.
ClinicoEconomics and Outcomes Research 2012:4 145–155
ClinicoEconomics and Outcomes Research
Techniques for estimating health care costs with censored data: an overview for the health services researcher
Harindra C Wijeysundera (1–5), Xuesong Wang (5), George Tomlinson (2,4), Dennis T Ko (1,3–5), Murray D Krahn (2–4,6)
(1) Division of Cardiology, Schulich Heart Centre and Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto; (2) Toronto Health Economics and Technology Assessment (THETA) Collaborative, University of Toronto; (3) Department of Medicine, University of Toronto; (4) Institute of Health Policy, Management and Evaluation, University of Toronto; (5) Institute for Clinical Evaluative Sciences; (6) Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, Ontario, Canada
Correspondence: Harindra C Wijeysundera, 2075 Bayview Avenue, Suite A209D, Toronto, Ontario, Canada M4N 3M5
Tel +1 416 480 4527
Fax +1 416 480 4657
Email harindra.wijeysundera@sunnybrook.ca
Objective: The aim of this study was to review statistical techniques for estimating the mean population cost using health care cost data that, because of the inability to achieve complete follow-up until death, are right censored. The target audience is health services researchers without an advanced statistical background.
Methods: Data were drawn from a longitudinal study of heart failure costs in Ontario, Canada, in which costs were estimated from administrative databases. The dataset consisted of 43,888 patients, with follow-up periods ranging from 1 to 1538 days (mean 576 days). Mean health care costs over 1080 days of follow-up were calculated using naïve estimators (the full-sample and uncensored case estimators). Reweighted estimators (specifically, the inverse probability weighted estimator) were also calculated, as was phase-based costing. Costs were adjusted to 2008 Canadian dollars using the Bank of Canada consumer price index (http://www.bankofcanada.ca/en/cpi.html).
Results: Over the restricted follow-up of 1080 days, 32% of patients were censored. The full-sample estimator underestimated mean cost ($30,420) compared with the reweighted estimators ($36,490). The phase-based costing estimate of $37,237 was similar to that of the simple reweighted estimator.
Conclusion: The authors recommend against the use of full-sample or uncensored case estimators when censored data are present. In the presence of heavy censoring, phase-based costing is an attractive alternative approach.
Keywords: health care costing, heart failure, incomplete data, statistical techniques, phase-based costing
Review
Open Access Full Text Article
http://dx.doi.org/10.2147/CEOR.S31552
This article was published in the following Dove Press journal: ClinicoEconomics and Outcomes Research, 30 May 2012
Introduction
Accurate estimates of health care costs have a wide range of applications and are of
growing importance to both policy makers and clinicians, given the burgeoning costs
of health care delivery, budgetary constraints, and the aging population. Therefore,
it is important for health services researchers to be familiar with robust methods for
description, inference, and prediction using costing data.
A number of statistical properties of costing data preclude the use of traditional statistical tools.1,2 There is a rich econometric and statistical literature focused predominantly on three specific properties of cost data: first, a substantial proportion of the general population may be healthy, requiring little medical care and having zero costs; second, the distribution of health care costs for those who do incur costs is usually heavily right skewed, with a few very high-cost individuals in the tail; third, investigators have shown that the assumption of homoscedasticity (ie, constant variance in the error term) is often violated, so alternative modeling techniques are required.2–6
A fourth obstacle is incomplete data when health care
expenses are not available for all participants for the entire
period of interest. Although this area is one of active research,
much of this work has been presented in health economics or
statistical journals.3,7–13 The objective of the present review is
to examine this fourth obstacle in detail, targeting an audience
of health services researchers without an advanced statistical
background. The authors will focus on the basic operation
of estimating mean health care costs, using both simulations
and a case study to illustrate these concepts. In the process,
the goal is to provide some of the necessary background to
make this important area of study more accessible.
The case study was of patients with heart failure (HF) in Ontario, Canada.14 Briefly, all patients with an admission for HF, based on International Classification of Diseases, 10th Revision (ICD-10) code I50, during the period 2004–2006 were identified in the Canadian Institute for Health Information's Discharge Abstract Database. Costs for hospital admissions, same-day surgeries, physician services, ambulatory care, and HF medications were estimated in 30-day intervals until March 31, 2008.14 Throughout the text, the example of cumulative 3-year costs, approximated as 1080 days (36 costing intervals of 30 days), will be used. Costs were adjusted to 2008 Canadian dollars using the Bank of Canada consumer price index (http://www.bankofcanada.ca/en/cpi.html). The dataset consisted of 43,888 patients, with follow-up periods ranging from 1 day to 1538 days (mean 576 days). Mean age was 76 years (range 25–106 years); 51% were female and 72% had an ischemic cardiomyopathy.
Cumulative cost functions
For a longitudinal health care costing study, the costing value of greatest interest is the mean health care cost (also known as the incidence-based cost), defined as the cumulative cost from the index event over some interval. Incidence-based costs must be contrasted with prevalence-based costs, where the costs for the entire population are assessed in a cross-sectional fashion and are then divided by the number of members. Incidence-based cumulative cost functions for an individual can be complex, as illustrated in Figure 1A. The rate of cost accumulation tends to increase around index events such as hospitalizations and death, as shown by the dashed line and the varying slope of the solid curve in Figure 1A. Moreover, the pattern of cost accumulation may differ between any two individuals. One could theoretically follow all participants until death; however, because study horizons are short, death will rarely be observed for every participant. Indeed, the portion of health care cost that is unobserved in this setting may be especially important, because health care costs tend to rise dramatically in the period prior to death.2,15–17 To avoid this issue, a study may instead focus on the mean total costs for a restricted time period (eg, 1080-day total health care costs).18 This creates two major issues.
Figure 1 (A) Cumulative costs and flow of costs in complete case; (B) cumulative costs in censored case.
[Figure: both panels plot total cost against t. (A) Complete case: observed costs $C_i^{\text{total}} = A_i(t_i) = A_i(T_i^L)$, where $T_i^L$ is the time until death or complete follow-up. (B) Censored case: observed costs $C_i^{\text{total}} = A_i(t_i) = A_i(T_i^C)$, where $T_i^C$ is the time until censoring; unobserved costs are shaded.]
Notes: S(t) is probability of survival; Sc(t) is probability of being uncensored; t is follow-up time in days; C indicates censored time; L indicates the restricted time limit; the solid line shows cumulative costs over time; the dashed/dotted line shows the rate of cost accumulation or flow of costs at a particular time; shaded area represents unobserved costs accrued from the time of being censored to either death or the full time period of interest.
First, among the participants who die, death drives
up costs in the period before death as seen in Figure 1A.
Conversely, cumulative costs may in fact be driven down
because no costs are accrued after death. The accepted
method of dealing with this is to consider death as a terminal
event.7,9,11,12,18 Subjects will accrue costs until they die, or until
they reach the time horizon of the analysis. A complete case
is defined as one in which death occurs, or where follow-up
is complete until the end of the restricted time period. In each
of these situations, participants are no longer accumulating
relevant costs.
The second issue is how to deal with the individuals who are not complete cases. A portion of the relevant health costs for these participants will be unobserved, as illustrated by the shaded area in Figure 1B.18 Such data are said to be right censored: an observation is right censored when it ends prematurely, before the outcome of interest has occurred (death or 1080 days, in the present example).18 Right censoring of health care costs can arise from a number of mechanisms. Patients may be lost to follow-up at varying times; alternatively, a study may enroll patients over a period of time but discontinue follow-up on a fixed calendar date. In both of these cases, the censoring is usually assumed to occur completely at random, and the observed health care costs represent a lower bound on the relevant costs. One way of adjusting cumulative cost estimates for censoring is to develop a function that describes the way in which data are censored and to use that function to reweight the observed cost data. Kaplan–Meier techniques are a well-established method to achieve such reweighting.
Kaplan–Meier estimates for survival and censoring
First, the traditional Kaplan–Meier estimator for survival will be reviewed, and then an analogous estimator for censoring will be introduced.12 Please see Table 1 for an explanation of the nomenclature in this section. The traditional Kaplan–Meier estimator, S(t), estimates the probability of surviving beyond a time, t. In this method, patients who are censored are no longer observed to be at risk for death and are therefore removed from the risk set from the time of censoring onward. The probability of survival for any interval is equal to the proportion surviving among those still at risk of death at the beginning of the interval (ie, those who have neither died nor been censored). The Kaplan–Meier estimator at time t is calculated by multiplying the probabilities of surviving each time interval up to time t – hence, it is also referred to as the product-limit estimator.
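Written as a formula, a sketch of the product-limit estimator under notation introduced here only for illustration (not from the original): $t_k$ is the end of interval k, $n_k$ the number of participants at risk at the start of interval k, and $d_k$ the number of deaths during it.

```latex
S(t) = \prod_{k:\, t_k \le t} \left( 1 - \frac{d_k}{n_k} \right)
```

Each factor is the within-interval probability of survival; a censored participant simply drops out of $n_k$ for all later intervals.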
The Kaplan–Meier estimate for censoring, Sc(t), is defined as the probability of being uncensored beyond time t.12 Here, the roles of death and censoring are reversed relative to a conventional survival analysis. Censoring is the outcome of interest, and death simply means that the patient is excluded from further observation. The probability of remaining uncensored in a particular interval is calculated for those who are "at risk" of being censored at the beginning of the interval. These are the patients who have not been removed or excluded – that is, those who have not died or been censored. Again, Sc(t) for time t is the product of all probabilities of being uncensored across intervals up to time t.
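The censoring estimate has the same product form with the roles swapped. In illustrative notation (an assumption, not from the original), $c_k$ is the number censored during interval k and $r_k$ is the number still at risk of censoring, that is, neither dead nor censored, at the start of interval k:

```latex
S_c(t) = \prod_{k:\, t_k \le t} \left( 1 - \frac{c_k}{r_k} \right)
```

A participant who dies leaves $r_k$ for all later intervals, just as a censored participant leaves $n_k$ in the survival estimate.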
To illustrate these concepts, four hypothetical patients followed over 6 months are presented in Table 2. Patients A and B are followed for all 6 months, while patient C dies in month 3 and patient D is censored in month 4. The components of both the Kaplan–Meier estimate for survival, S(t), and the Kaplan–Meier estimate for censoring conditional on being alive, Sc(t), are shown on the right of Table 2. When calculating the Kaplan–Meier estimate for survival, it is necessary to determine the probability of death and of survival for each month. These are shown with the number of patients at risk at the beginning of the month in the denominator. Importantly, patients who are censored are removed from the denominator of later months. For example, in the third month, four patients are at risk for death at the beginning of the month, with three alive at the end of the month (probability of survival is 3/4 = 0.75). In month 5, only two are at risk for death at the beginning of the interval, because one patient was censored in the previous month (probability of survival is 2/2 = 1). S(t) is the product across the months of the probability of survival: S(4) = 1*1*0.75*1 = 0.75.
Table 1 Nomenclature
Term | Definition
S(t) | Probability of being alive beyond time t
Sc(t) | Probability of being uncensored beyond time t
i | Individual
N | Total number of individuals in study
j | Cost interval (ie, 30 days)
K | Total number of costing intervals
$C_i^{\text{total}}$ | Accumulated cost for individual i
$t_i$ | Period of observation for individual i
$T_i^L$ | Time of observation until death/cure/end of relevant period for an individual who is considered a complete observation
$T_i^C$ | Time of observation until censoring for an individual who is censored
$A_i(t_i)$ | The cost function used to estimate cumulative cost until time t for patient i
$M_i^j$ | The total cost for each subinterval j for each patient i
R | Rate of cost accumulation
The corresponding calculations for Sc(t) are shown on the far right side of Table 2. Here, the denominator for each interval contains only patients at risk for censoring at the beginning of the interval; patients who died in the preceding interval are removed. For example, at the beginning of the fourth month, only three patients continue to be at risk for censoring. By the end of the fourth month, one patient has been censored, so the probability of being uncensored is 2/3 = 0.67. The Kaplan–Meier estimate Sc(t) is the product across intervals of the probability of remaining uncensored: Sc(4) = 1*1*1*0.67 = 0.67.
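The month-by-month arithmetic of Table 2 can be checked with a short sketch. It is an illustration only, not the authors' R code: each patient is encoded (by assumption) as the last month observed plus a status, and exact fractions are used so that 2/3 is not rounded to 0.67.

```python
from fractions import Fraction

# Hypothetical cohort of Table 2: (last month observed, status).
patients = {"A": (6, "complete"), "B": (6, "complete"),
            "C": (3, "died"), "D": (4, "censored")}

def km_tables(patients, n_months):
    """Return (month, number at risk, S(t), Sc(t)) for each month as exact fractions."""
    s, sc = Fraction(1), Fraction(1)
    rows = []
    for m in range(1, n_months + 1):
        # At risk at the start of month m: not yet died or censored in an earlier month.
        at_risk = [p for p, (last, _) in patients.items() if last >= m]
        n = len(at_risk)
        deaths = sum(1 for p in at_risk if patients[p] == (m, "died"))
        censored = sum(1 for p in at_risk if patients[p] == (m, "censored"))
        s *= Fraction(n - deaths, n)      # probability of survival within month m
        sc *= Fraction(n - censored, n)   # probability of remaining uncensored within month m
        rows.append((m, n, s, sc))
    return rows

for m, n, s, sc in km_tables(patients, 6):
    print(f"month {m}: at risk {n}, S(t) = {float(s):.2f}, Sc(t) = {float(sc):.2f}")
```

The printed S(t) and Sc(t) columns match the Table 2 products, for example S(4) = 0.75 and Sc(4) = 0.67.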
In Figure 2A, the Kaplan–Meier survival curve is constructed from the HF study over a follow-up period of 1080 days, with the probability of survival, S(t), at the end of follow-up being 43%. The probability of dying (the complement of S(t)) increases with larger values of t, after accounting for censoring.
Over the full follow-up period of 1080 days, 14,107 of the original 43,888 patients (32.1%) were censored and therefore were no longer available for observation. In Figure 2B, the corresponding Kaplan–Meier curve for censoring is constructed, with the probability of being uncensored, Sc(t), decreasing at greater values of t. Equivalently, the cumulative probability of having been censored by time t, which is the complement of Sc(t), increases with t.
Restricted time period total costs
First, the issues related to censoring in a restricted time period will be tackled. In order to understand the techniques, some nomenclature is necessary (see Table 1). Let N be the total sample size of the study, including both censored and uncensored cases. For each participant, i, there is an observed accumulated medical cost, denoted by $C_i^{\text{total}}$. Each individual has an observation time, denoted by $t_i$. For complete cases who are observed until death or until the end of the restricted time period, $t_i$ is equal to the time to death/restricted time limit, denoted by $T_i^L$. For a censored case, $t_i$ is equal to the time to censoring, denoted by $T_i^C$. Finally, an indicator variable, $\Delta_i$, is defined for each participant, which takes the value of 0 for censored cases and of 1 for complete cases. $C_i^{\text{total}}$ for each participant will be expressed as a function $A_i$:
$$C_i^{\text{total}} = A_i(t_i) \tag{1}$$
Each of these terms is illustrated in Figure 1A and B. Figure 1A shows the cumulative costs over time for a complete case, defined as a participant who is observed until $T_i^L$. Figure 1B shows a censored patient, observed only until the censoring time, $T_i^C$. As illustrated by the shaded area, a censored patient will continue to accumulate relevant costs (ie, until $T_i^L$), and these will be unobserved.
Table 2 Hypothetical patient cohort to illustrate Kaplan–Meier techniques
Month | Patient A | Patient B | Patient C | Patient D | Probability of death within interval | Probability of survival within interval | S(t) | Probability of censoring within interval | Probability of being uncensored within interval | Sc(t)
1 | x | x | x | x | 0/4 | 4/4 = 1 | 1 | 0/4 | 4/4 = 1 | 1
2 | x | x | x | x | 0/4 | 4/4 = 1 | 1*1 = 1 | 0/4 | 4/4 = 1 | 1*1 = 1
3 | x | x | Died | x | 1/4 | 3/4 = 0.75 | 1*1*0.75 = 0.75 | 0/4 | 4/4 = 1 | 1*1*1 = 1
4 | x | x | | Censored | 0/3 | 3/3 = 1 | 1*1*0.75*1 = 0.75 | 1/3 | 2/3 = 0.67 | 1*1*1*0.67 = 0.67
5 | x | x | | | 0/2 | 2/2 = 1 | 1*1*0.75*1*1 = 0.75 | 0/2 | 2/2 = 1 | 1*1*1*0.67*1 = 0.67
6 | x | x | | | 0/2 | 2/2 = 1 | 1*1*0.75*1*1*1 = 0.75 | 0/2 | 2/2 = 1 | 1*1*1*0.67*1*1 = 0.67
Notes: The patient columns give the data; the next three columns give the survival calculations and the last three the censoring calculations. S(t) represents the Kaplan–Meier estimate for survival, defined as the probability of survival beyond time t; Sc(t) represents the Kaplan–Meier estimate for censoring, defined as the probability of being uncensored beyond time t; x indicates that the patient was observed in that month.
Full-sample and uncensored case estimators
Two potential estimators for mean restricted time total costs ($C_i^{\text{total}}$) in the face of censored data are the full-sample and uncensored case estimators.1,9,13 In the full-sample estimator, the accumulated cost for each participant is averaged, irrespective of whether the patient died, was observed for the full follow-up period, or was censored.1,13 As censored patients continue to accumulate relevant costs while unobserved (shaded portion in Figure 1B), the full-sample estimator includes only a portion of their relevant costs, and therefore it will always be an underestimate.1
In the uncensored case estimator, only the values from complete cases are used.13 As illustrated in Figure 2B, the probability of remaining uncensored, Sc(t), is not uniform at all values of t. Instead, as t increases, the probability of being uncensored, Sc(t), decreases. Therefore, the uncensored case estimator is biased toward the costs of participants who died early, that is, those who had smaller values of $t_i$.1,13
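A minimal sketch of the two naïve estimators, using the four hypothetical patients of Table 2. The dollar amounts are invented for illustration and do not come from the HF case study.

```python
# (patient, observed cumulative cost in $, complete case?) -- hypothetical values.
observed = [
    ("A", 6000.0, True),   # followed for all 6 months
    ("B", 9000.0, True),   # followed for all 6 months
    ("C", 4000.0, True),   # died in month 3: a complete case
    ("D", 2000.0, False),  # censored in month 4: later costs are unobserved
]

def full_sample_mean(data):
    """Average observed cost over everyone, censored or not."""
    return sum(cost for _, cost, _ in data) / len(data)

def uncensored_case_mean(data):
    """Average observed cost over complete cases only."""
    complete = [cost for _, cost, is_complete in data if is_complete]
    return sum(complete) / len(complete)

print(round(full_sample_mean(observed), 2))      # 5250.0
print(round(uncensored_case_mean(observed), 2))  # 6333.33
```

Neither number accounts for the costs patient D accrued after censoring: the full-sample mean simply includes D's truncated total, and the uncensored case mean drops D entirely.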
Reweighted estimators
One approach to estimating mean health care costs when censoring is present is to reweight each complete case so that it represents not only itself but also some number of incomplete/censored cases. In this setting, the cumulative cost of each participant who died or reached the full period of observation must represent not only the cost of that participant but also the costs of the censored cases that would have been observed had there been no censoring. The number of censored cases that a complete case observed to time t must represent grows with the probability of having been censored by t: the complete case stands for 1/Sc(t) participants in total, that is, itself plus 1/Sc(t) − 1 censored participants.18,19 It follows that costs for complete cases with a short follow-up should be weighted less than those for cases with a longer observation period, accounting for the higher probability of censoring with longer observation periods.
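As a worked illustration of our own, reusing the Sc(t) values from Table 2, the weight $w_i = \Delta_i / S_c(t_i)$ for each hypothetical patient is:

```latex
\begin{aligned}
w_A = w_B &= \frac{1}{2/3} = 1.5, \qquad w_C = \frac{1}{1} = 1, \qquad w_D = 0 \quad (\Delta_D = 0) \\
w_A + w_B + w_C + w_D &= 1.5 + 1.5 + 1 + 0 = 4 = N
\end{aligned}
```

Patients A and B, observed for the full 6 months, each absorb half of the censored patient D. Patient C died before any censoring occurred and represents only itself. Together, the complete cases stand in for the whole cohort.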
Different reweighted estimators have been developed.1,9,13,18,20,21 These are conceptually similar and are equivalent under certain conditions.12,21 The Lin 1997 estimator was the first to be described and is based on dividing observation time into a number of equal intervals.9 Lin et al9 described two alternative methods: one if cost histories are available, and a second if only total cumulative costs are available for all individuals. In the latter, more basic scenario, the mean cost for each interval is calculated, based only on the costs of patients who die during the interval. The cumulative cost for the entire period of observation is the sum of the mean costs for each interval, weighted by the Kaplan–Meier probability of surviving to the beginning of each interval.9 A limitation of the Lin 1997 estimator is the assumption of discrete censoring times that coincide with the beginning of the costing intervals.22 Bang and Tsiatis7 described an inverse probability weighted (IPW) estimator that does not require interval costs and accommodates continuous censoring times. As an illustration, the IPW method of Bang and Tsiatis7 will be worked through in detail here. Interested readers are encouraged to refer to the source documentation for a full description of the other estimators, and for recommendations as to their appropriate use.1,9,12,13,18,20,21
Figure 2 (A) Kaplan–Meier survival curve; (B) Kaplan–Meier curve for censoring.
[Figure: each panel plots a probability (0 to 1) against t in days (axis 0 to 1000); panel A shows S(t) and panel B shows Sc(t).]
Notes: S(t) is probability of survival; Sc(t) is probability of being uncensored; t is follow-up time in days.
In the IPW estimator, sample weighting is done using the Kaplan–Meier estimate for censoring, $S_c(t_i)$.1,21 Each uncensored participant ($\Delta_i$ = 1) with $T_i^L$ of observation time has probability $S_c(T_i^L)$ of being uncensored, as seen in Figure 2B. Each uncensored observation therefore stands in for $1/S_c(T_i^L)$ participants: itself plus, on average, $1/S_c(T_i^L) - 1$ participants who are censored ($\Delta_i$ = 0).12 Because uncensored observations are weighted by the inverse of $S_c(t_i)$, patients who die early in the study (smaller values of $t_i$), and who therefore have smaller values of $T_i^L$, are weighted less than those who die at longer follow-up times or who are followed up until the restricted time limit. The mean IPW total cost is estimated as:
$$\hat{\mu}_{\mathrm{IPW}} = \frac{1}{N} \sum_{i=1}^{N} \frac{\Delta_i \, A_i(t_i)}{S_c(t_i)} \tag{2}$$
Several key points from this merit discussion. All individuals are counted in the denominator, as N is the full sample. However, the costs of the censored participants are multiplied by the indicator variable of "0," so only the costs of complete participants contribute, each reweighted accordingly. An important limitation of this estimator is inefficiency, because only data from uncensored/complete cases inform the final value.13 Using simulation, Raikou and McGuire13 found that in the presence of very heavy censoring (>50%), the simple IPW estimator becomes unstable.
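Equation 2 can be sketched in a few lines. This is an illustration, not the authors' R implementation: the costs are invented, the cohort is that of Table 2, and the censoring curve is evaluated just before $t_i$. That left-limit convention is our assumption, commonly used for the Bang–Tsiatis estimator. For these data it gives the same values as Sc(t) in Table 2.

```python
import numpy as np

def sc_left(times, complete, t):
    """Kaplan-Meier probability of remaining uncensored, evaluated just before time t."""
    times = np.asarray(times, dtype=float)
    complete = np.asarray(complete)
    sc = 1.0
    for u in np.unique(times[complete == 0]):   # distinct censoring times, ascending
        if u >= t:
            break
        at_risk = np.sum(times >= u)             # still under observation at time u
        n_censored = np.sum((times == u) & (complete == 0))
        sc *= 1.0 - n_censored / at_risk
    return sc

def ipw_mean_cost(costs, times, complete):
    """Simple IPW estimate of mean restricted-time cost (equation 2)."""
    total = 0.0
    for cost, t, delta in zip(costs, times, complete):
        if delta == 1:                            # censored participants add 0
            total += cost / sc_left(times, complete, t)
    return total / len(costs)                     # divide by N, the full sample

costs = [6000, 9000, 4000, 2000]   # patients A, B, C, D (hypothetical $)
times = [6, 6, 3, 4]               # months observed
complete = [1, 1, 1, 0]            # Delta_i: only D is censored
print(round(ipw_mean_cost(costs, times, complete), 2))  # 6625.0
```

Patients A and B are weighted by 1/(2/3) = 1.5 and patient C by 1, while D contributes nothing to the numerator but is still counted in N.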
An alternative "partitioned estimator" is possible when cumulative cost histories are available for each participant. This is shown in Figure 3, where costs are available for subintervals of the full period of observation.12 Censored patients are likely to have full costs for some of the subintervals. For example, in Figure 3, patient 2 is a complete case over the entire restricted time period (shaded area), and therefore patient 2 has complete costs for all four subintervals; patient 1 is censored in subinterval 3 but has full costs for subintervals 1 and 2 (shaded area). Because a censored patient is likely to have complete costs for some intervals, it is possible to make use of these data to further inform the estimator of mean cost.
Bang and colleagues7,21 developed a partitioned extension of their IPW estimator, in which the total time period is divided into K partitions or subintervals. For each subinterval, denoted as j, a participant will either be censored or have full observation, defined as dying within the subinterval or being observed for the full subinterval. Thus, one can define variables $\Delta_i^j$ and $t_i^j$, together with the analogues of $T_i^C$ and $T_i^L$, specific to each subinterval j of interest. $M_i^j$ designates the total cost for each subinterval j. $M_i^j$ is calculated as the difference between the cumulative cost up to the end of subinterval j and the cumulative cost up to the end of the preceding subinterval. This is given by the formula:
$$M_i^j = A_i^j(t_i^j) - A_i^{j-1}(t_i^{j-1}) \tag{3}$$
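Equation 3 amounts to differencing a cumulative cost history. In this minimal sketch, we assume that $A_i$ has already been evaluated at the end of each of four subintervals for one hypothetical participant who dies during subinterval 3, so no costs accrue in subinterval 4.

```python
import numpy as np

# Hypothetical cumulative cost A_i at the end of subintervals j = 1..4 (in $).
cumulative = np.array([3000.0, 4500.0, 9000.0, 9000.0])

# M_i^j = A_i^j(t_i^j) - A_i^{j-1}(t_i^{j-1}); cumulative cost is 0 before subinterval 1.
interval_costs = np.diff(cumulative, prepend=0.0)
print(interval_costs.tolist())  # [3000.0, 1500.0, 4500.0, 0.0]
```

Summing the $M_i^j$ over j recovers the cumulative cost at the end of the last subinterval.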
Figure 3 Partitioned cost histories: the full period of observation is subdivided into four partitions. Patient 1 is censored in partition 3, while patient 2 is a complete case.
[Figure: total cost plotted against t, with t divided into partitions 1 to 4; cumulative cost curves are shown for patient 1 and patient 2.]
Notes: Shaded area represents partitions for patients 1 and 2, where full data are available; t is follow-up time.
For illustration, in Figure 3, the cost for patient 1 for subinterval 2 is the difference between the entire shaded area (the first term in equation 3, $A_i^j(t_i^j)$) and the shaded area to the left of the line separating the first and second intervals (the second term in equation 3, $A_i^{j-1}(t_i^{j-1})$). By summing the cost estimate for each subinterval, the mean total cost can be determined. The mean partitioned IPW estimator for total restricted time costs will then be:
$$\hat{\mu}_{\mathrm{part}} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} \frac{\Delta_i^j \, M_i^j}{S_c(t_i^j)} \tag{6}$$
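A sketch of the partitioned estimator follows. It rests on stated assumptions: invented costs for the Table 2 cohort split into two 3-month subintervals, $t_i^j$ taken as the smaller of $t_i$ and the end of subinterval j, and the censoring curve evaluated just before $t_i^j$. The function names are hypothetical.

```python
import numpy as np

def sc_left(times, complete, t):
    """Kaplan-Meier probability of remaining uncensored, evaluated just before time t."""
    times = np.asarray(times, dtype=float)
    complete = np.asarray(complete)
    sc = 1.0
    for u in np.unique(times[complete == 0]):   # distinct censoring times, ascending
        if u >= t:
            break
        at_risk = np.sum(times >= u)             # still under observation at time u
        n_censored = np.sum((times == u) & (complete == 0))
        sc *= 1.0 - n_censored / at_risk
    return sc

def partitioned_ipw_mean_cost(interval_costs, times, complete, bounds):
    """Partitioned IPW estimate of mean restricted-time cost.

    interval_costs[i][j]: observed cost M_i^j of participant i in subinterval j
    times, complete: observed time t_i and indicator Delta_i for each participant
    bounds: right end points of the K subintervals; the last one is the limit L
    """
    n = len(times)
    total = 0.0
    for i in range(n):
        for j, a_j in enumerate(bounds):
            t_ij = min(times[i], a_j)                          # t_i^j
            delta_ij = complete[i] == 1 or times[i] >= a_j     # Delta_i^j
            if delta_ij:
                total += interval_costs[i][j] / sc_left(times, complete, t_ij)
    return total / n

# Hypothetical costs per 3-month subinterval for patients A, B, C, D.
costs = [[2000, 4000], [5000, 4000], [4000, 0], [1500, 500]]
times = [6, 6, 3, 4]
complete = [1, 1, 1, 0]
print(round(partitioned_ipw_mean_cost(costs, times, complete, bounds=[3, 6]), 2))  # 6125.0
```

Unlike the simple IPW estimator, the censored patient D now contributes the cost of the first subinterval, which was fully observed. Only D's partial second-subinterval cost is discarded.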
Investigators have shown that the Lin 1997 method and the IPW estimator are equivalent when the intervals for the Lin 1997 method become infinitesimally small (ie, approach continuous censoring time).12 In order to extend beyond estimation of the mean and make formal inferences, both the Lin 1997 and Bang–Tsiatis methods allow for the calculation of variances. These calculations are necessarily complex; readers are encouraged to review the source documentation on this area and are strongly encouraged to involve a statistician. Moreover, using the simple IPW or the partitioned IPW as response variables, these methods can be expanded within a regression framework to control for covariates.10,11,18 However, the IPW techniques have a number of limitations, especially when evaluating covariate effects, as the effects on cost accumulation cannot be distinguished from the effects on survival.22 In addition, these techniques do not account for the differential rates of health care cost accumulation near death, as seen in Figure 1A and B. Alternative models have been developed to deal with these issues.22
Simulations
The authors used a simulation method similar to that of Basu and Manning22 to generate a cohort of 1000 patients, evaluated over a maximum of ten equally spaced intervals. Patients who died or who completed observation until the end of the ten periods were considered to be complete observations. Survival and censoring times were generated from an exponential distribution and a uniform distribution, respectively.22 Following previous investigators, the present authors generated a cumulative cost profile for individuals with an increased initial cost reflecting diagnosis and an increased terminal cost in the event of death.
The authors used combinations of censoring and survival times to create datasets with increasing degrees of censoring. Using 500 simulations per dataset, the authors then compared the full-sample, uncensored case, and simple IPW estimators with the true mean costs. These results are shown in Table 3.
As expected, the full-sample estimator underestimated the true costs, and the underestimate grew with increasing censoring. The simple IPW estimator performed well with mild to moderate degrees of censoring in the simulated datasets; however, with heavy censoring (53%) it substantially overestimated true costs. This is consistent with reports from other investigators of its instability in the presence of heavy censoring.
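A minimal Python sketch of this kind of simulation follows. It is not the authors' code. The parameter values are hypothetical choices: hazard 0.1 per interval, censoring uniform on 0 to 20 intervals, a lump initial cost, a constant per-interval rate, and a lump terminal cost at death (a simplification of "increased terminal cost"). For brevity, the simple IPW here divides by the known censoring survival of the design rather than by a Kaplan-Meier estimate.

```python
import random


def simulate_once(rng, n=1000, horizon=10.0, hazard=0.1, cens_max=20.0,
                  initial_cost=5.0, rate=0.5, terminal_cost=3.0):
    """One simulated cohort. Returns the true mean restricted cost and the
    full-sample, uncensored case, and simple IPW estimates."""
    true_sum = full_sum = unc_sum = ipw_sum = 0.0
    n_complete = 0
    for _ in range(n):
        death = rng.expovariate(hazard)        # exponential survival time
        censor = rng.uniform(0.0, cens_max)    # uniform censoring time
        end = min(death, horizon)              # end of the restricted period
        # Cost without censoring: initial lump + steady accrual + terminal lump on death
        true_cost = initial_cost + rate * end + (terminal_cost if death <= horizon else 0.0)
        complete = censor >= end               # record not cut short by censoring
        observed = true_cost if complete else initial_cost + rate * censor
        true_sum += true_cost
        full_sum += observed                   # full-sample: treats censored costs as final
        if complete:
            n_complete += 1
            unc_sum += observed                # uncensored case: complete records only
            # Known censoring survival of this design: P(C >= end) = 1 - end / cens_max
            ipw_sum += observed / (1.0 - end / cens_max)
    return true_sum / n, full_sum / n, unc_sum / n_complete, ipw_sum / n


def compare(n_sims=500, seed=2012, **design):
    """Average each quantity over repeated simulated cohorts."""
    rng = random.Random(seed)
    totals = [0.0] * 4
    for _ in range(n_sims):
        for k, value in enumerate(simulate_once(rng, **design)):
            totals[k] += value
    names = ("true", "full_sample", "uncensored", "simple_ipw")
    return {name: t / n_sims for name, t in zip(names, totals)}
```

For example, `compare(n_sims=100)` returns the four averages. The full-sample mean is always below the true mean, because every censored record is cut short. Moving `cens_max` closer to `horizon` raises the censoring rate and the largest weights, 1 / (1 − end / cens_max), which is one simple way to see why the simple IPW becomes unstable under heavy censoring. `cens_max` should stay above `horizon` for the weights to recover the true mean.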
HF case study
Using data from the 43,888 patients in the HF case study, the authors calculated estimators for the mean 1080-day total cost. Cost histories were available for 180-day partitions. Statistical models were created using R software (v 2.9.0; R Foundation for Statistical Computing, Vienna, Austria) and are available upon request. Of the 43,888 patients, 32.1% were censored over the 1080-day restricted time period, with 50.9% of patients dying and 17% having complete follow-up to 1080 days. In Table 4, the full-sample estimator, uncensored case estimator, simple IPW estimator, and the partitioned 180-day estimator are shown. In addition, the authors estimated costs using the Lin 1997 method based on total accumulated costs. Two versions of the Lin 1997 method, using 180- and 30-day intervals, were utilized to highlight issues that may arise from the choice of time interval.
As anticipated, the full-sample estimator, at $30,420 for the 3-year (1080-day) period, was lower than the uncensored case and IPW estimates; it is a biased underestimate. A total of 14,107 patients were censored within the restricted time period and had costs that would otherwise have accrued in the absence of censoring (ie, the shaded portion in Figure 2B). The uncensored case estimate is higher, at $33,940, but it over-represents patients with short survival times, who in this dataset have higher costs. The simple IPW estimate of $36,490 uses only the 67.9% of patients who were not censored. With the partitioned IPW estimator, which makes use of data from all the subjects, the estimate for mean 1080-day cumulative cost was $33,230. In contrast, the Lin 1997 method based on 180-day intervals provides a substantially lower mean estimate of $20,059, while the Lin 1997 estimate using 30-day intervals ($37,042) closely approximates the simple IPW estimate. This highlights the differences between the Lin 1997 method and the IPW estimator when longer time intervals are used.
Table 3 Simulations to evaluate impact of censoring
Censoring level and estimator | Mean ten-interval cumulative costs ($)ᵃ | Interquartile range
7% censoring
  True costs | 8.29 | 8.21–8.38
  Full-sample estimator | 7.49 | 7.41–7.56
  Uncensored case estimator | 7.68 | 7.61–7.77
  Simple IPW | 8.06 | 7.97–8.15
18% censoring
  True costs | 8.29 | 8.20–8.37
  Full-sample estimator | 7.03 | 6.96–7.10
  Uncensored case estimator | 7.50 | 7.42–7.58
  Simple IPW | 8.49 | 8.39–8.59
21% censoring
  True costs | 9.07 | 9.00–9.16
  Full-sample estimator | 7.57 | 7.49–7.65
  Uncensored case estimator | 8.20 | 8.12–8.28
  Simple IPW | 9.35 | 9.24–9.45
53% censoring
  True costs | 7.45 | 7.37–7.53
  Full-sample estimator | 4.90 | 4.89–5.04
  Uncensored case estimator | 5.28 | 5.18–5.38
  Simple IPW | 9.87 | 9.64–10.1
Note: ᵃCosts adjusted to 2008 Canadian dollars using the Bank of Canada consumer price index (http://www.bankofcanada.ca/en/cpi.html).
Abbreviation: IPW, inverse probability weighting.
ClinicoEconomics and Outcomes Research 2012:4
Lifetime costs
Although using a restricted time period allows one to circumvent the issue of extrapolating lifetime costs and is often used in practice, a restricted time period cost has important limitations.18 For example, two patients may have the same lifetime cumulative costs but, because of differences in survival times (ie, one patient dies at 3 years and the other at 5 years), may have substantially different time-restricted costs at 3 years.18 When studying interventions with significant influences on mortality, having the same distribution of lifetime costs in the control and study groups is not synonymous with having the same distribution of time-restricted costs, because the survival distributions in the groups may be different.
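The 3- versus 5-year example can be made concrete with hypothetical numbers. Purely for illustration, the sketch below assumes that each patient accrues cost at a constant annual rate and that both patients have a lifetime cost of $60,000.

```python
def restricted_cost(annual_cost, death_year, horizon_years=float("inf")):
    """Cumulative cost up to min(death, horizon), assuming a constant annual cost rate."""
    return annual_cost * min(death_year, horizon_years)


# Hypothetical patients with the same lifetime cost ($60,000) but different survival
a_rate, a_death = 60_000 / 3, 3    # patient A dies at 3 years
b_rate, b_death = 60_000 / 5, 5    # patient B dies at 5 years

print(restricted_cost(a_rate, a_death), restricted_cost(b_rate, b_death))        # 60000.0 60000.0
print(restricted_cost(a_rate, a_death, 3), restricted_cost(b_rate, b_death, 3))  # 60000.0 36000.0
```

The lifetime costs are identical, yet the 3-year restricted costs differ by $24,000 purely because of survival.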
Given the critical relationship between survival time and health care costs, it is tempting to use Kaplan-Meier techniques, replacing time to death with cost to death as the dependent variable. However, investigators have found that this results in biased estimates.3,8,18,23 A fundamental requirement for a Kaplan-Meier survival curve is independent censoring.3,8,18,23 For survival time, this requires that the time to censoring be independent of the time to death. In most cases this is true; however, in the parallel form for costs, the cumulative cost to censoring for a particular participant will not be independent of the cumulative cost to death, because both are related to the participant's unique pattern of cost accumulation (Figure 1A).3,8,18,23 This is most obvious in the situation of a constant rate of cost accumulation, $R$, where the cumulative cost at censoring time $T_i^C$ is simply $R \times T_i^C$, while that at time of death $T_i^L$ is $R \times T_i^L$. The two values are clearly not independent, because both are proportional to the same $R$.3,8,18,23
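The constant-rate argument can be checked numerically. In this hypothetical Python sketch, death and censoring times are drawn independently, but each patient's cost rate $R$ varies. The log-normal distribution for $R$ is an arbitrary choice. The times come out essentially uncorrelated, while the corresponding costs $R \times T_i^C$ and $R \times T_i^L$ are clearly positively correlated. That is the dependent censoring that invalidates a Kaplan-Meier analysis on the cost scale.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
n = 20_000
rates = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]   # patient-specific R (hypothetical)
t_death = [rng.expovariate(1.0) for _ in range(n)]         # T_i^L
t_cens = [rng.uniform(0.0, 2.0) for _ in range(n)]         # T_i^C, drawn independently of T_i^L
cost_death = [r * t for r, t in zip(rates, t_death)]       # R * T_i^L
cost_cens = [r * t for r, t in zip(rates, t_cens)]         # R * T_i^C

print(round(pearson(t_cens, t_death), 3))        # close to 0: the times are independent
print(round(pearson(cost_cens, cost_death), 3))  # clearly positive: both costs share R
```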
Phase-based costing
An alternative method for estimating cumulative costs is a phase-based modeling approach.14,24–26 This is particularly attractive for estimating lifetime costs or costs in the presence of heavy censoring. The steps for the phase-based approach are as follows:14,24–26
1. Define a priori clinically important phases of disease. Examples are the phase immediately after diagnosis, associated with higher costs; a stable phase, with constant and low costs; and the phase prior to death, which again has high costs.
2. Determine the inflection points in cumulative cost that define the duration of each phase. These will be disease specific.
3. Allocate observation time and costs for each patient to the phases.
4. Once the costs for all patients have been assigned, determine the mean cost per phase (or per subdivision of each phase).
5. Using both the data on cost per phase and time to death, determine the cumulative lifetime costs.
Each of these steps will now be worked through in the HF example. First, based on input from content experts, the authors expected that HF would be characterized by at least three phases: (1) a post-discharge phase after the index hospitalization, (2) a pre-death phase, and (3) a relatively stable phase (Step 1). To confirm this hypothesis, the authors evaluated the cost per 30 days for patient subgroups that survived 9–12, 21–24, and 33–36 months post discharge (Appendix Figure A1). The mean 30-day cost curves confirmed the hypothesis of discrete cost phases, with the inflection points separating the post-discharge and stable phases, and the stable and pre-death phases, estimated at 3 months post discharge and 6 months prior to death, respectively (Step 2).
Table 4 Mean 1080-day costs using different estimating methods
Estimating method | Mean 1080-day cumulative costs ($)ᵃ | Interquartile range
Full-sample estimator | 30,420 | 10,060–37,850
Uncensored case estimator | 33,940 | 11,480–42,890
Simple IPW | 36,490 | 0–44,620
Partitioned IPW | 33,230 | 10,260–40,550
Lin 1997 (180-day interval) | 20,059 | NAᵇ
Lin 1997 (30-day interval) | 37,042 | NAᵇ
Notes: ᵃCosts adjusted to 2008 Canadian dollars using the Bank of Canada consumer price index (http://www.bankofcanada.ca/en/cpi.html); ᵇthe Lin 1997 method produces a single mean value for the sample, as opposed to a reweighted estimate for each individual; as such, an interquartile range is not available.
Abbreviations: NA, not applicable; IPW, inverse probability weighting.
Wijeysundera et al
The cumulative cost history for each individual over the 1080-day period of the study was partitioned and sequentially allocated to phases (Step 3). For example, for each patient the cumulative costs for the first 3 months of observation were assigned to the post-discharge phase, the costs associated with the 6 months prior to death were assigned to the pre-death phase, and the remainder were assigned to the stable phase. Once the entire cohort had been analyzed in this manner, a mean cost was calculated for each of the phases (Step 4). In the present study, the mean costs were determined for each 30-day block within each phase (Appendix Table A1). Other investigators have used a simpler approach in which a single mean cost is determined per phase.27 Costs should also be adjusted to the current year to account for health care inflation, using a multiplier such as the consumer price index.
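Steps 3 and 4 can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation. Each patient is assumed to be a list of 30-day block costs plus a flag saying whether the history ends in death. The 3- and 6-month boundaries come from the HF example. Pre-death blocks are numbered as we read Table A1, with block 1 being the 30 days nearest death. When the post-discharge and pre-death windows overlap (death within 9 months), we assume the pre-death phase takes precedence; the text does not specify this rule.

```python
from collections import defaultdict

POST_DISCHARGE_MONTHS = 3   # boundary after discharge in the HF example
PRE_DEATH_MONTHS = 6        # boundary before death in the HF example


def allocate(block_costs, died):
    """Step 3: label each 30-day block of one patient's history with (phase, block).
    block_costs[0] is the first 30 days after discharge; if died is True, the last
    element is the 30 days ending at death. Assumption: pre-death wins on overlap."""
    n = len(block_costs)
    labelled = []
    for m, cost in enumerate(block_costs):
        blocks_before_death = n - m             # 1 = final 30 days before death
        if died and blocks_before_death <= PRE_DEATH_MONTHS:
            key = ("pre-death", blocks_before_death)
        elif m < POST_DISCHARGE_MONTHS:
            key = ("post-discharge", m + 1)
        else:
            key = ("stable", "all")             # one pooled mean for the stable phase
        labelled.append((key, cost))
    return labelled


def mean_cost_per_block(cohort):
    """Step 4: mean 30-day cost for every (phase, block) across the cohort."""
    sums, counts = defaultdict(float), defaultdict(int)
    for block_costs, died in cohort:
        for key, cost in allocate(block_costs, died):
            sums[key] += cost
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Patients not observed to die contribute only post-discharge and stable blocks, which follows the allocation rule as described in the text.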
To calculate cumulative costs, one uses both the mean costs per phase and a survival function that spans the time horizon of the study (lifetime or shorter) (Step 5). Whereas the survival and cost data came from the same cohort in the earlier techniques, this need not be the case in the phase-based approach.28 In the present study, the authors used a survival curve from a separate HF cohort that had been followed for 12 years, over which period 99% of patients died.
First, the survival curve is divided into intervals; in the present example, the authors used 30-day intervals. For any time interval on the survival curve, the proportion of the original cohort in each phase is determined, and this proportion is multiplied by the mean cost for that phase. In Figure 4, for example, at the 120- to 150-day interval on the survival curve, 68.4% of the original HF cohort were in the stable phase, so the cost for this phase was 0.684 × $617 = $422. None of the patients were in the post-discharge phase, and 10.5% were in the pre-death phase (for a cost of $614). Thus, the cost for the 120- to 150-day interval is $422 + $614 = $1036. The costs for all time intervals are calculated in this manner and summed to produce the mean cost for the entire time horizon.
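Step 5 can be sketched in Python as below. It is a hypothetical illustration under stated assumptions. Survival is given as the proportion of the original cohort alive at the end of each 30-day interval. A patient who dies during an interval incurs that interval's full block cost. Patients still alive at the end of the curve are never placed in the pre-death phase. The pre-death phase takes precedence over the post-discharge phase when the two overlap. The block costs are those of Table A1. The text does not supply the survival curve itself, so any curve passed in is the user's own.

```python
def _interval_cost(k, death_interval, block_costs, post_months, pre_months):
    """Mean cost of 30-day interval k for a patient dying in `death_interval`
    (None = still alive at the end of the curve). Pre-death takes precedence."""
    if death_interval is not None and death_interval - k + 1 <= pre_months:
        return block_costs[("pre-death", death_interval - k + 1)]
    if k <= post_months:
        return block_costs[("post-discharge", k)]
    return block_costs[("stable", "all")]


def expected_cumulative_cost(survival, block_costs, post_months=3, pre_months=6):
    """survival[k] = proportion of the original cohort alive at the end of 30-day
    interval k, with survival[0] = 1.0. Keys of block_costs are ("post-discharge", b),
    ("stable", "all") and ("pre-death", b), where pre-death block 1 is nearest death.
    Each interval's cost is the proportion-weighted mean block cost; intervals are summed."""
    k_max = len(survival) - 1
    total = 0.0
    for k in range(1, k_max + 1):
        for d in range(k, k_max + 1):
            p_die_in_d = survival[d - 1] - survival[d]   # share of cohort dying in interval d
            total += p_die_in_d * _interval_cost(k, d, block_costs, post_months, pre_months)
        # Share still alive at the end of the curve: never assigned to pre-death
        total += survival[k_max] * _interval_cost(k, None, block_costs, post_months, pre_months)
    return total


# Mean 30-day block costs from Table A1 (2008 Canadian dollars)
HF_BLOCK_COSTS = {
    ("post-discharge", 1): 10675, ("post-discharge", 2): 2961, ("post-discharge", 3): 2172,
    ("stable", "all"): 617,
    ("pre-death", 6): 3062, ("pre-death", 5): 3501, ("pre-death", 4): 4077,
    ("pre-death", 3): 5119, ("pre-death", 2): 8716, ("pre-death", 1): 8308,
}
```

For instance, `expected_cumulative_cost([0.97 ** k for k in range(147)], HF_BLOCK_COSTS)` applies the Table A1 costs to a made-up curve with 3% mortality per 30 days. The result is illustrative only and is not the $61,870 reported for the real HF survival curve.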
The authors found that over a mean life expectancy of 3.87 years, HF patients had a mean lifetime cost of $61,870.14 To provide a comparison with the methods already mentioned, the authors also calculated the mean cost at 1080 days using a phase-based approach. The phase-based estimate of $37,237 was similar to those from the simple IPW ($36,490) and the 30-day-interval Lin 1997 ($37,042) methods.
[Figure 4 plot: S(t) against t (0–1000 days); annotations for the 120–150-day interval: stable phase 68.4%, post-discharge phase 0%, pre-death phase 10.5%.]
Figure 4 Merging phase-based costs on the survival curve. For the time interval 120–150 days, 68.4% of the original cohort was in the stable phase, with 10.5% in the pre-death phase. To determine the cost for the time interval of 120–150 days, the proportion of patients in each phase is multiplied by the mean cost per phase, as shown in Appendix Table A1.
Notes: S(t) is probability of survival; t is follow-up time in days.
Data comparing such phase-based estimates with those from IPW methods are sparse, but investigators to date have found them to be comparable.26 The benefit of the phase-based approach is that actual costs for the cohort over the entire period of interest (ie, lifetime) do not need to be observed, thereby overcoming the major limitation of the previous methods.14,24–26 Using these methods, investigators have been able to produce widely used estimates of the lifetime costs of cancer.26,29 However, a better understanding of when one technique should be favored over another is needed and should be a focus of further methodological study.
Conclusion and recommendations
This review has provided an overview for the uninitiated reader who wishes to tackle the literature on health care costing with data that are incomplete because follow-up was incomplete. The authors offer the following recommendations:
1. Censoring will have a substantial methodological impact on a study, and investigators must evaluate their data to determine whether any cases are right censored.
2. If censoring is present, using either a full-sample estimator or an uncensored case estimator to estimate mean cost is potentially inaccurate.
3. The choice of estimator when censoring is present is not clear-cut. Options include a weighted estimator (preferably a partitioned estimator, to make efficient use of all the data) or a phase-based approach.
Given the importance of health care costing for comparative effectiveness research and in shaping future health policy, the authors believe that further work on developing accurate yet transparent techniques should be a priority; the authors hope that this review serves as a stimulus for such work.
Disclosure
The authors report no conflicts of interest in this work.
References
1. Bang H. Medical cost analysis: application to colorectal cancer data from the SEER Medicare database. Contemp Clin Trials. 2005;26(5):586–597.
2. Diehr P, Yanez D, Ash A, Hornbrook M, Lin DY. Methods for analyzing health care utilization and costs. Annu Rev Public Health. 1999;20:125–144.
3. Austin PC, Ghali WA, Tu JV. A comparison of several regression models for analysing cost of CABG surgery. Stat Med. 2003;22(17):2799–2815.
4. Barber J, Thompson S. Multiple regression of cost data: use of generalised linear models. J Health Serv Res Policy. 2004;9(4):197–204.
5. Blough DK, Ramsey SD. Using generalized linear models to assess medical care costs. Health Serv Outcomes Res Methodol. 2000;1(2):185–202.
6. Mihaylova B, Briggs A, O'Hagan A, Thompson SG. Review of statistical methods for analysing healthcare resources and costs. Health Econ. 2011;20(8):897–916.
7. Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika. 2000;87(2):329–343.
8. Etzioni RD, Feuer EJ, Sullivan SD, Lin D, Hu C, Ramsey SD. On the use of survival analysis techniques to estimate medical care costs. J Health Econ. 1999;18(3):365–380.
9. Lin DY, Feuer EJ, Etzioni R, Wax Y. Estimating medical costs from incomplete follow-up data. Biometrics. 1997;53(2):419–434.
10. Lin DY. Linear regression analysis of censored medical costs. Biostatistics. 2000;1(1):35–47.
11. Lin DY. Regression analysis of incomplete medical cost data. Stat Med. 2003;22(7):1181–1200.
12. O'Hagan A, Stevens JW. On estimators of medical costs with censored data. J Health Econ. 2004;23(3):615–625.
13. Raikou M, McGuire A. Estimating medical care costs under conditions of censoring. J Health Econ. 2004;23(3):443–470.
14. Wijeysundera HC, Machado M, Wang X, et al. Cost-effectiveness of specialized multidisciplinary heart failure clinics in Ontario, Canada. Value Health. 2010;13(8):915–921.
15. Scitovsky AA, Capron AM. Medical care at the end of life: the interaction of economics and ethics. Annu Rev Public Health. 1986;7:59–75.
16. Scitovsky AA. "The high cost of dying" revisited. Milbank Q. 1994;72(4):561–591.
17. Scitovsky AA. "The high cost of dying": what do the data show? 1984. Milbank Q. 2005;83(4):825–841.
18. Huang Y. Cost analysis with censored data. Med Care. 2009;47(7 Suppl 1):S115–S119.
19. Etzioni R, Riley GF, Ramsey SD, Brown M. Measuring costs: administrative claims data, clinical trials, and beyond. Med Care. 2002;40(Suppl 6):III63–III72.
20. Zhao H, Tian L. On estimating medical cost and incremental cost-effectiveness ratios with censored data. Biometrics. 2001;57(4):1002–1008.
21. Zhao H, Bang H, Wang H, Pfeifer PE. On the equivalence of some medical cost estimators with censored data. Stat Med. 2007;26(24):4520–4530.
22. Basu A, Manning WG. Estimating lifetime or episode-of-illness costs under censoring. Health Econ. 2010;19(9):1010–1028.
23. Lipscomb J, Ancukiewicz M, Parmigiani G, Hasselblad V, Samsa G, Matchar DB. Predicting the cost of illness: a comparison of alternative models applied to stroke. Med Decis Making. 1998;18(Suppl 2):S39–S56.
24. Brown ML, Riley GF, Potosky AL, Etzioni RD. Obtaining long-term disease specific costs of care: application to Medicare enrollees diagnosed with colorectal cancer. Med Care. 1999;37(12):1249–1259.
25. Brown ML, Riley GF, Schussler N, Etzioni R. Estimating health care costs related to cancer treatment from SEER-Medicare data. Med Care. 2002;40(Suppl 8):IV104–IV117.
26. Yabroff KR, Warren JL, Schrag DM, et al. Comparison of approaches for estimating incidence costs of care for colorectal cancer patients. Med Care. 2009;47(7 Suppl 1):S56–S63.
27. Krahn MD, Zagorski B, Laporte A, et al. Healthcare costs associated with prostate cancer: estimates from a population-based study. BJU Int. 2010;105(3):338–346.
28. Etzioni R, Urban N, Baker M. Estimating the costs attributable to a disease with application to ovarian cancer. J Clin Epidemiol. 1996;49(1):95–103.
29. Yabroff KR, Warren JL, Knopf K, Davis WW, Brown ML. Estimating patient time costs associated with colorectal cancer care. Med Care. 2005;43(7):640–648.
Appendix
[Figure A1 plot: mean cost per 30 patient-days ($, 0–12,000) against the Nth month from the index date (0–40), with separate curves for patients who died between 9 and 12 months, between 21 and 24 months, and between 33 and 36 months.]
Figure A1 Exploratory analysis on phases of long-term costsᵃ associated with heart failure care.
Note: ᵃCosts adjusted to 2008 Canadian dollars using the Bank of Canada consumer price index (http://www.bankofcanada.ca/en/cpi.html).
Table A1 Phase-based costing example using heart failure cohort
30-day block | Observed costs ($)ᵃ
Post-discharge phase
  Block 1 | 10,675
  Block 2 | 2961
  Block 3 | 2172
Stable phase
  All blocks | 617
Pre-death phase
  Block 6 | 3062
  Block 5 | 3501
  Block 4 | 4077
  Block 3 | 5119
  Block 2 | 8716
  Block 1 | 8308
Mean lifetime cost | 61,870
Note: ᵃCosts adjusted to 2008 Canadian dollars using the Bank of Canada consumer price index (http://www.bankofcanada.ca/en/cpi.html).