ARTHRITIS & RHEUMATISM
Vol. 58, No. 10, October 2008, pp 3166–3171
© 2008, American College of Rheumatology
The Effects of Total Knee Arthroplasty on Physical Functioning
in the Older Population
Linda K. George, David Ruiz, Jr., and Frank A. Sloan
Objective. Clinical research provides convincing
evidence that total knee arthroplasty (TKA) is safe and
improves joint-specific outcomes. However, higher-level
functioning associated with self care and independent
living has not been studied. Furthermore, most previous
studies of the effects of TKA relied on relatively small
clinical samples. We undertook this study to estimate
the effects of TKA on 3 levels of physical functioning in
a national sample of older adults.
Methods. Data were obtained from the Medicare
Current Beneficiary Survey from 1992 to 2003. Medicare
claims data identified participants with osteoarthritis
of the knee who received TKA (n = 259) or no
TKA (n = 1,816). Propensity scores were used to match
treatment and no-treatment groups according to demo-
graphic characteristics, comorbid conditions, and base-
line functioning. Three levels of physical functioning
were examined as outcomes of TKA. These levels were
represented by items on the Nagi Disability Scale, the
Instrumental Activities of Daily Living (IADL) Scale,
and the Activities of Daily Living (ADL) Scale. These
items were measured after TKA and at comparable
intervals for the no-treatment group. Average treatment
effects were calculated for relevant Nagi Disability
Scale, IADL Scale, and ADL Scale tasks.
Results. Between baseline and outcome assess-
ments, TKA recipients improved on all 3 levels of
physical functioning; the no-treatment group declined.
Statistically significant average treatment effects for
TKA were observed for one or more tasks for each
measure of physical functioning.
Conclusion. TKA is associated with sizeable im-
provements in 3 levels of physical functioning among
elderly Medicare beneficiaries.
Clinical trials and observational studies convinc-
ingly demonstrate that total knee arthroplasty (TKA) is
safe and improves joint-specific outcomes, including
pain and range of motion (1–3). Levels of patient
satisfaction are also high, exceeding 75% (4,5). It is less
clear whether TKA improves overall physical function-
ing. In a clinical study, Jones et al examined both
joint-specific outcomes and general physical functioning
6 months after TKA (4). They reported significant
improvements in joint pain and mobility, but improve-
ments in physical functioning were less common and
were not highly correlated with joint-specific outcomes.
Noble et al compared multiple indicators of physical
functioning between TKA recipients who were at least 1
year postsurgery and age- and sex-matched adults with
no history of knee disorders (6). They reported no
differences between groups for most activities, but TKA
recipients reported significantly greater dysfunction on
tasks that place greater demands on the knee (e.g.,
kneeling, carrying loads). Note, however, that this study
did not compare functioning before and after TKA.
Similar results were reported in a survey-based study in
France (7).
TKA may be one factor accounting for declining
disability rates in the US (8,9). To document this,
however, improvements in physical functioning resulting
from TKA must be observed at the population level.
Given the high cost of TKA, a question also remains
whether its benefits exceed joint-specific outcomes and
translate into improved capacity for self care and main-
tenance activities, although studies based on health-
related quality of life show that TKA is cost-effective
(10,11). The present study examined the effects of TKA
on 3 levels of physical functioning that vary in severity of
impairment in a national sample of Medicare beneficiaries.
To our knowledge, this is the first study to examine
the effects of TKA using national data, although similar
samples have been used in studies of the epidemiology
of TKA (12–14).
Supported by a grant from the Institute for Health Technol-
Linda K. George, PhD, David Ruiz, Jr., MA, Frank A. Sloan,
PhD: Duke University, Durham, North Carolina.
Address correspondence and reprint requests to Linda K.
George, PhD, Duke University, Box 90088, Durham, NC 27708-0088.
Submitted for publication July 8, 2007; accepted in revised
form June 17, 2008.
Rates of TKA have increased dramatically since
1990 (12). Kurtz and colleagues reported that rates of
primary TKA nearly tripled between 1990 and 2002 (13).
Despite the dramatic increase in primary TKAs, the
National Institutes of Health Consensus Statement on
Total Knee Replacement estimated that only 13% of
women and 9% of men who are candidates for this
surgical procedure choose TKA (5). Similarly, in a
county-level study in Ontario, Canada, only 33.5% of
persons for whom joint replacement was clinically ap-
propriate reported that they would “definitely” or
“probably” be willing to consider joint replacement as a
treatment option (15). Increasing rates of TKA and the
very large potential pool of candidates for TKA high-
light the importance of documenting the effects of TKA
on physical functioning.
PATIENTS AND METHODS
Sample. This study used data from the Medicare
Current Beneficiary Survey (MCBS) Cost and Use files. The
MCBS sample is selected from Medicare beneficiaries using a
multistage, stratified random sampling procedure. Disabled
persons ages ≤64 years and beneficiaries ages ≥80 years are
oversampled. The Medicare program covers 96% of the US
population ages ≥65 years as well as seriously disabled persons
younger than this. Participants are interviewed 3 times each
year. The interviews focus on demographic and socioeconomic
characteristics, health, cognitive status, physical functioning,
and health service use. However, the physical functioning items
are asked only once a year. Medicare claims data are merged
with the survey data. The MCBS uses a rotating panel design
in which one-third of the sample is replaced annually. Study
participants are interviewed for 4 years or until they die or
drop out. The sample size in any given year is ~12,500 (16).
The present study used MCBS and claims data col-
lected between 1992 and 2003. National data indicate that
?95% of TKAs in the older population are performed because
of arthritis of the knee (17). The sample for this study consisted
of individuals who 1) reported no TKA prior to entering an
MCBS cohort and 2) had been diagnosed as having osteoar-
thritis of the knee (International Classification of Diseases,
Ninth Revision, Clinical Modification codes 715.16, 715.96,
715.06, 715.26, 715.36, and 715.86). The sample was further
classified into members who did and did not undergo unilateral
TKA between their baseline and final interviews (Current
Procedural Terminology code 27447). After eliminating re-
spondents for whom data were missing (including respondents
enrolled in health maintenance organizations for whom infor-
mation about medical diagnoses and procedures was lacking),
there were 259 persons in the treatment group and 1,816
persons in the no-treatment group (see Figure 1 for the types
and numbers of losses to attrition). The vast majority of sample
loss was due to lack of an interview within the appropriate time
frame, resulting in part from the fact that the physical func-
tioning items were asked only once a year. Creation of
propensity scores is described below. The Institutional Review
Board at Duke University approved this study.
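The diagnosis and procedure criteria above can be sketched as a simple claims filter. This is a minimal illustration, not the authors' code: the record layout (dictionaries with `id`, `icd9`, and `cpt` keys) and the function name are assumptions, and only the code values come from the text. The sketch also omits the other stated criteria (no TKA before cohort entry, unilateral surgery between baseline and final interviews, complete data).

```python
# ICD-9-CM codes for knee osteoarthritis and the CPT code for TKA, as listed in the text.
KNEE_OA_ICD9 = {"715.16", "715.96", "715.06", "715.26", "715.36", "715.86"}
TKA_CPT = "27447"


def classify_beneficiaries(claims):
    """Split beneficiaries with a knee OA diagnosis into TKA and no-TKA groups.

    `claims` is a hypothetical list of dicts with keys 'id', 'icd9', and 'cpt'
    (either code may be None). Returns (treated_ids, untreated_ids) as sets.
    """
    has_oa, has_tka = set(), set()
    for claim in claims:
        if claim.get("icd9") in KNEE_OA_ICD9:
            has_oa.add(claim["id"])
        if claim.get("cpt") == TKA_CPT:
            has_tka.add(claim["id"])
    treated = has_oa & has_tka    # knee OA diagnosis and a TKA claim
    untreated = has_oa - has_tka  # knee OA diagnosis, no TKA claim
    return treated, untreated


# Made-up claims for three beneficiaries.
claims = [
    {"id": 1, "icd9": "715.16", "cpt": None},
    {"id": 1, "icd9": None, "cpt": "27447"},
    {"id": 2, "icd9": "715.96", "cpt": None},
    {"id": 3, "icd9": "401.9", "cpt": "27447"},  # no knee OA diagnosis: excluded
]
treated, untreated = classify_beneficiaries(claims)
print(sorted(treated), sorted(untreated))
```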
Measurement. The primary independent variable was
receipt versus nonreceipt of unilateral TKA. For TKA recipi-
ents, baseline measures were taken from the interview imme-
diately prior to surgery; followup data were taken from the first
interview that occurred at least 80 days postsurgery and
included the physical functioning items. For the no-treatment
group, baseline interviews were randomly selected with the
qualification that there was a followup interview at least 6
months after the baseline interview that included the physical
functioning items. The average interval between baseline and
followup interviews was slightly less than 13 months for the
treatment group and slightly less than 16 months for the
no-treatment group. Time between baseline and followup
interview was a control variable in the multivariate analysis.
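The interview selection rule for TKA recipients can be illustrated as follows. This is a sketch under stated assumptions: the input format (a list of `(date, has_function_items)` tuples) and the function name are hypothetical, and the baseline is taken to be the last interview before surgery regardless of its content, which the text does not specify.

```python
from datetime import date, timedelta

MIN_DAYS_AFTER_SURGERY = 80  # threshold stated in the text


def pick_interviews(interviews, surgery_date):
    """Return (baseline, followup) interview dates for one TKA recipient.

    Baseline: the last interview before the surgery date.
    Follow-up: the first interview at least 80 days after surgery that
    includes the physical functioning items. Either value may be None.
    """
    ordered = sorted(interviews)
    before = [d for d, _ in ordered if d < surgery_date]
    baseline = before[-1] if before else None
    cutoff = surgery_date + timedelta(days=MIN_DAYS_AFTER_SURGERY)
    followup = next(
        (d for d, has_items in ordered if has_items and d >= cutoff), None
    )
    return baseline, followup


# Made-up schedule: three interviews a year, functioning items asked once a year.
interviews = [
    (date(1998, 1, 15), True),
    (date(1998, 5, 20), False),
    (date(1998, 9, 10), False),
    (date(1999, 1, 12), True),
]
baseline, followup = pick_interviews(interviews, surgery_date=date(1998, 6, 1))
print(baseline, followup)
```

In this made-up schedule the September interview falls after the 80-day cutoff but lacks the functioning items, so the follow-up is the interview of the next January. This is the same mechanism the text gives for much of the sample loss.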
Three levels of physical functioning were examined as
outcomes of TKA. We examined 3 items from the Nagi
Disability Scale that were especially relevant to knee function:
stooping/crouching, walking 2–3 blocks, and lifting objects
weighing up to 10 pounds (18). Respondents reported the
amount of difficulty they had performing these tasks (no
difficulty, a little difficulty, some difficulty, a lot of difficulty,
and not able to do it). Persons who reported “a lot of
difficulty” and “not able to do it” were compared with those
who reported “no difficulty,” “a little difficulty,” and “some
difficulty.”
Figure 1. Sample attrition. TKA = total knee arthroplasty; OA = osteoarthritis; dx = diagnosis.
TKA AND PHYSICAL FUNCTIONING 3167
We examined 4 items from the Instrumental Activities
of Daily Living (IADL) Scale: performing light housework,
performing heavy housework, preparing meals, and personal
shopping (19). The omitted IADL tasks (taking medicine,
handling money, and using the telephone) were unlikely to be
affected by TKA. We examined 5 items from the Activities of
Daily Living (ADL) Scale: bathing/showering, getting dressed,
getting in and out of a chair, walking, and using the toilet (20).
The omitted ADL tasks—eating and personal grooming—
were also unlikely to be affected by TKA. Response categories
for IADL and ADL items focused on difficulty performing the
task (no difficulty, have difficulty, and do not do). Respon-
dents who reported having difficulty or “do not do” because of
health or physical problems were compared with those who
reported “no difficulty.”
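The response groupings described for the Nagi items and for the IADL/ADL items amount to two small recoding rules. The sketch below is illustrative only: the function names and the lowercase response strings are assumptions, and returning `None` for a "do not do" response given for non-health reasons is our guess at how such cases would be set aside, since the text does not say.

```python
NAGI_LIMITED = {"a lot of difficulty", "not able to do it"}
NAGI_NOT_LIMITED = {"no difficulty", "a little difficulty", "some difficulty"}


def nagi_limited(response):
    """Return 1 for 'a lot of difficulty' or 'not able to do it', else 0."""
    if response in NAGI_LIMITED:
        return 1
    if response in NAGI_NOT_LIMITED:
        return 0
    raise ValueError(f"unexpected Nagi response: {response!r}")


def iadl_adl_limited(response, health_reason=True):
    """Return 1 for difficulty, or for 'do not do' because of health problems.

    Returns 0 for 'no difficulty'. A 'do not do' response for other reasons
    returns None (assumption: such cases are not classified).
    """
    if response == "no difficulty":
        return 0
    if response == "have difficulty":
        return 1
    if response == "do not do":
        return 1 if health_reason else None
    raise ValueError(f"unexpected IADL/ADL response: {response!r}")


print(nagi_limited("some difficulty"), nagi_limited("a lot of difficulty"))
print(iadl_adl_limited("have difficulty"), iadl_adl_limited("do not do", False))
```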
These 3 scales require different levels of strength,
mobility, and stamina. The Nagi Disability Scale items require
the highest levels of functional capacity, IADL Scale items are
intermediate, and ADL Scale items require the least functional
capacity (21). Metric values of the functional measures were
used as baseline measures of the outcomes. Changes in scores
between interview dates were the dependent variables.
Analytical methods. Percentages and means (for in-
come) for the independent variables were compared for the
treatment and no-treatment groups. We used t-tests to assess
statistically significant differences between treatment and no-
treatment groups in physical functioning outcomes and to
compare presurgical–postsurgical measures of physical func-
tioning within groups. Propensity score methods were used to
predict the average treatment effects of TKA on physical
functioning.
In a first step, we estimated a logit model for which the
dependent variable was 1 if the individual received a TKA and
0 if otherwise. We included as covariates several variables
demonstrated in previous research to predict either receipt or
outcome of TKA. Demographic characteristics included sex
(male = 1, female = 0), a set of binary variables representing
race/ethnicity (American Indian, Asian, African American,
Other, and White, the omitted category in the analysis), and
baseline age, which was coded in 3 groups: 60–74 years
(omitted category), 75–84 years, and ≥85 years. Socioeconomic
characteristics included education (<9 years [the omitted
category], 9–12 years, and ≥13 years), income (coded in
hundreds of dollars), and insurance coverage in addition to
Medicare, which included private insurance and Medicaid. We
also included baseline measures of self-rated health and co-
morbid conditions that limit mobility. Self-rated health was
coded in 5 categories: poor, fair, good, very good, and excellent
(omitted category). Binary variables indicated the presence of
Parkinson’s disease, osteoporosis, Alzheimer’s disease, hard-
ening of arteries, stroke, congestive heart disease, other heart
conditions, and whether a respondent had a body mass index
≥30. For the baseline Nagi Disability Scale, IADL Scale, and
ADL Scale, “no difficulty” was the omitted category.
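The first step, a logit model whose fitted probabilities serve as propensity scores, can be sketched in plain Python. This is not the authors' Stata code: the two binary covariates, the toy data, and the use of batch gradient ascent (rather than the Newton-type routine a statistics package would use) are all assumptions made to keep the example self-contained.

```python
import math


def fit_logit(X, y, lr=0.5, n_iter=5000):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood.

    X: list of covariate rows (no intercept column); y: list of 0/1 outcomes.
    Returns the coefficients as [intercept, b1, b2, ...].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            err = yi - propensity(beta, row)  # observed minus predicted
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, row):
    """Predicted probability of treatment for one covariate row."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))


# Toy data (made up): columns are [age 85 or older, limited in walking].
X = [[0, 1], [0, 1], [0, 1], [0, 0], [0, 0], [1, 1],
     [1, 0], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]  # 1 = received TKA

beta = fit_logit(X, y)
scores = [propensity(beta, row) for row in X]
print([round(b, 2) for b in beta])
```

Each person's score is then carried into the matching step; the covariates themselves are no longer used directly.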
In a second step, a no-treatment group was defined by
matching treated patients with “controls” on the predicted
probability of receiving a TKA, so that the controls resembled
the treated patients as closely as possible on this score. The
propensity score procedure defines block groups. Within each
block group, the average predicted probability of receiving the
procedure does not differ significantly between the treated
patients and the controls. In our analysis, the 6 resulting blocks
of treatment and no-treatment persons were generated using
the kernel matching method to compute the average treatment
effect of TKA on physical functioning measures (22). Standard
errors were calculated using the bootstrapping method. All
analyses were performed using Stata 9.0 software (StataCorp,
College Station, TX).
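Kernel matching and the bootstrap standard error can be illustrated with a short sketch. It is written under several assumptions and is not the Stata routine used in the study: a Gaussian kernel, an illustrative bandwidth of 0.06, propensity scores treated as fixed during resampling, and made-up data. Each treated person's outcome is compared with a weighted average of all control outcomes, with weights that shrink as the propensity scores move apart.

```python
import math
import random
import statistics


def kernel_att(treated, controls, bandwidth=0.06):
    """Average treatment effect on the treated by Gaussian kernel matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    """
    effects = []
    for p_t, y_t in treated:
        weights = [
            math.exp(-0.5 * ((p_t - p_c) / bandwidth) ** 2) for p_c, _ in controls
        ]
        counterfactual = sum(
            w * y_c for w, (_, y_c) in zip(weights, controls)
        ) / sum(weights)
        effects.append(y_t - counterfactual)
    return sum(effects) / len(effects)


def bootstrap_se(treated, controls, n_boot=200, seed=0):
    """Standard deviation of the estimate over resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t_sample = [rng.choice(treated) for _ in treated]
        c_sample = [rng.choice(controls) for _ in controls]
        estimates.append(kernel_att(t_sample, c_sample))
    return statistics.stdev(estimates)


# Made-up (propensity score, change score) pairs.
treated = [(0.30, 1), (0.40, 1), (0.50, 0), (0.60, 1)]
controls = [(0.25, 0), (0.35, 0), (0.45, 1), (0.55, 0), (0.65, -1)]

att = kernel_att(treated, controls)
se = bootstrap_se(treated, controls)
print(round(att, 3), round(se, 3))
```

A fuller version would re-estimate the propensity model inside each bootstrap replicate; holding the scores fixed keeps the sketch short.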
There is consensus among statisticians that conven-
tional significance tests are inappropriate for assessing the fit
of propensity score matching (23,24). The Stata software used
in this analysis splits the sample into the number of intervals
required to ensure that the balancing hypothesis is met (i.e.,
that the means of each characteristic do not differ between
treated and untreated units) (22). If necessary, subjects will be
dropped from the sample to meet the balancing criterion. As
shown in Figure 1, in this analysis, 3 TKA recipients and 1
untreated respondent were dropped from the sample for this
reason.
In lieu of conventional significance testing, Austin et al
(23), among others, propose the use of standardized difference
scores for assessing the adequacy of the balance between
treated and untreated groups in observational studies. They
recommend treating a standardized difference score of >10%
as a meaningful imbalance between groups. Of the 1,816
matches generated by the propensity score analysis, only 3
pairs had standardized difference scores of >10%. Overall, the
balancing criterion was well met.
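For reference, a standardized difference for a single covariate is usually computed as the difference in group means divided by the pooled standard deviation, expressed as a percentage. The sketch below uses that common definition with made-up values; it is unweighted, whereas a balance check after kernel matching would normally apply the matching weights, and it is not taken from the study's code.

```python
import math
import statistics


def standardized_difference(treated, controls):
    """Standardized difference (in percent) between two groups for one covariate."""
    mean_t, mean_c = statistics.mean(treated), statistics.mean(controls)
    var_t, var_c = statistics.variance(treated), statistics.variance(controls)
    pooled_sd = math.sqrt((var_t + var_c) / 2)
    if pooled_sd == 0:
        # No variation in either group: balanced only if the means coincide.
        return 0.0 if mean_t == mean_c else math.inf
    return 100 * (mean_t - mean_c) / pooled_sd


# Made-up covariate values for the two groups.
treated_values = [1, 2, 3, 4, 5]
control_values = [2, 3, 4, 5, 6]
d = standardized_difference(treated_values, control_values)
print(round(d, 1), abs(d) > 10)
```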
RESULTS
Table 1 presents percentages (and means for
income) for the independent variables for the treatment
and no-treatment groups at baseline. Neither sex nor
race/ethnicity differed significantly across groups. Respondents
ages ≥85 years or with <9 years of education
were significantly less likely to receive TKA. Self-rated
health and comorbid conditions did not differ across
groups. Two of the 3 Nagi Disability Scale activities
(stooping and walking) and 3 of the ADL Scale tasks
(bathing, walking, and using the toilet) differed signifi-
cantly across groups. As expected, the treatment group
was more impaired than the no-treatment group. IADL
Scale limitations were not significantly related to receipt
of TKA.
Table 2 presents descriptive statistics for changes
in physical functioning scores between the presurgical
and postsurgical interviews. Metric-based change scores
are presented, and significance tests are reported for
both within-group changes and cross-group differences.
Differences between groups were significant for 9 of the
12 physical functioning tasks (all but heavy housework,
getting in/out of a chair, and dressing); the treatment
group members reported significantly better functioning
than members of the no-treatment group. There was a
clear pattern: physical functioning of treatment group
members improved over time and that of persons who
did not receive TKA declined over time. Examining
changes within the treatment group, 10 of the 12 change
scores represented statistically significant improvements
in functioning. Within the no-treatment group, there
were significant declines in functioning for 5 of the 12
tasks.
3168 GEORGE ET AL
Table 3 presents the average treatment effect of
TKA for each Nagi Disability Scale, IADL Scale, and
ADL Scale activity. The average treatment effects were
statistically significant for 2 of the 3 Nagi Disability Scale
activities (lifting and walking) and for 3 of the 4 IADL
Scale tasks (light housework, heavy housework, and
shopping). In contrast, only 1 of the average treatment
effects for the 5 ADL Scale tasks (bathing) was significant.

Table 1. Descriptive statistics (n = 2,075)*
Columns: Treatment group (n = 259); No-treatment group (n = 1,816); P†
  Income, mean $ (×100)
  Health and comorbidities
    Other heart conditions
    Congestive heart disease
  Baseline physical functioning
    Nagi Disability Scale activities
      No difficulty stooping
      Not able to stoop
      No difficulty lifting
      Not able to lift
      No difficulty walking
      Not able to walk
    IADL Scale activities
    ADL Scale activities
      Getting in/out of chair
      Use of toilet
* Except for income, values are the percentage of patients. NS = not
significant; IADL = Instrumental Activities of Daily Living.
† Treatment group versus no-treatment group, by chi-square omnibus
test, plus t-tests for each variable level.

Table 2. Descriptive statistics: dependent variables (n = 2,075)*
Columns: Treatment group (n = 259); No-treatment group (n = 1,816); P†
  Nagi Disability Scale activities
  IADL Scale limitations
  ADL Scale limitations
    Getting in/out of chair
    Use of toilet
* Values are percentage-based change scores. See Table 1 for definitions.
† Change in treatment group versus change in no-treatment group, by
t-test.
‡ P < 0.001 for baseline interview versus followup interview, by t-test.
§ P < 0.01 for baseline interview versus followup interview, by t-test.
¶ P < 0.05 for baseline interview versus followup interview, by t-test.

Table 3. Average treatment effect for Nagi Disability Scale, IADL
Scale, and ADL Scale activities (n = 2,075)*
Baseline covariate (range)                    Treatment effect, %
Nagi Disability Scale activities (−3 to 3)
  Stooping                                    7.8 (−0.02 to 0.16)
  Lifting                                     14.0 (0.61 to 0.22)†
  Walking                                     27.6 (0.15 to 0.37)†
IADL Scale activities (−1 to 1)
  Light housework                             5.3 (0.00 to 0.10)†
  Heavy housework                             6.1 (0.00 to 0.14)†
  Preparing meals                             4.6 (−0.00 to 0.10)
  Shopping                                    7.8 (0.04 to 0.13)†
ADL Scale activities (−1 to 1)
  Bathing                                     6.9 (0.01 to 0.13)†
  Dressing                                    0.5 (−0.03 to 0.055)
  Getting in/out of chair                     3.6 (−0.00 to 0.10)
  Walking                                     6.9 (−0.01 to 0.15)
  Use of toilet                               4.2 (−0.00 to 0.10)
* Values are the mean (range) treatment effect, where the treatment
effect is the average percentage change. The range is from the baseline
metric mean to the outcome metric mean. See Table 1 for definitions.
† P < 0.05.
DISCUSSION
This study demonstrates that TKA is associated
with improvements in both basic and advanced activities
of daily living—known prerequisites for self care and
living independently. Recipients of TKA improved sig-
nificantly in 1 basic aspect of self care (bathing), 3 more
difficult tasks (light housework, heavy housework, and
shopping), and 2 advanced activities of daily living
(walking 2–3 blocks and lifting weights up to 10 pounds).
In contrast, persons who did not have TKA exhibited
overall patterns of decline in physical functioning. It is
not surprising that improvements in Nagi Disability
Scale activities and IADL Scale tasks were more com-
mon than those for ADL Scale tasks because the latter
represent very basic indicators of physical functioning. If
an individual is disabled in one or more ADL Scale
tasks, the odds of recovery of function are especially low
This study has several strengths. The effects of
TKA on physical functioning were examined using data
from a national sample. Physical functioning was mea-
sured at 3 levels of severity, all of which improved after
TKA. Propensity scores were used to match TKA and
no-TKA groups on demographic factors, self-rated
health, comorbid medical conditions, and baseline levels
of functioning. Propensity scores are generally superior
to analysis of covariance procedures when estimating
treatment effects using observational data (25). The
propensity matching worked very well in these data. This
was probably because even at the bivariate level, only 2
of the sociodemographic variables and none of the
health variables differed significantly between the treat-
ment and no-treatment groups (see Table 1).
We acknowledge the study’s limitations. Al-
though the MCBS sample was large, fewer than 300
participants received TKA and met the criteria for
inclusion in the analysis. The interval between baseline
and followup interviews was ?2 months longer for the
no-treatment group than for the treatment group. We
controlled for interval length in the multivariate logit
model, however, and it was not a significant predictor.
Use of propensity scores was helpful for establishing
equivalence between the TKA and no-TKA groups, but
this procedure does not permit use of sample weights,
precluding population estimates. The measures of phys-
ical functioning were based on participants’ self reports
rather than on performance tests. Previous research
strongly suggests, however, that older adults accurately
report their functional capacities (19,26).
Rates of disability in the older population have
declined steadily since the 1980s (8,9). Manton and Gu
reported that the prevalence of disability cumulatively
declined by 25%, from 26.2% to 19.7% of the older
population, between 1982 and 1999 (9). Moreover, the
rate of decline is increasing over time. This pattern is
typically attributed to the long-term effects of public
health measures introduced in the 20th century, rising
levels of socioeconomic status, and historical trends of
better health habits (26). Improvements in medical care
are only rarely suggested as contributors to declining
disability rates (27), although substantial research has
reported the importance of improvements in health care
for longevity (28,29). Research on disability transitions
indicates that, although transitions to a disabled status
are most common, some older adults transition from
disability to no disability—and transitions out of disabil-
ity appear to be increasing over time (26,30). Joint
replacement is one likely way that medical care is
contributing to declining rates of disability in the older
population. The results of the present study are compat-
ible with this conclusion. Additional research that exam-
ines the effects of medical procedures on overall disabil-
ity status at the population level is a high priority for
AUTHOR CONTRIBUTIONS
Dr. George had full access to all of the data in the study and
takes responsibility for the integrity of the data and the accuracy of the
data analysis.
Study design. George, Ruiz, Sloan.
Acquisition of data. Sloan.
Analysis and interpretation of data. George, Ruiz, Sloan.
Manuscript preparation. George, Ruiz.
Statistical analysis. Ruiz.
REFERENCES
1. Bachmeier CJ, March LM, Cross MJ, Lapsley HM, Tribe KL,
Courtenay BG. A comparison of outcomes in osteoarthritis patients
undergoing total hip and knee replacement surgery. Osteoarthritis
Cartilage 2001;9:137–46.
2. Jones CA, Voaklander DC, Johnston DW, Suarez-Almazor ME.
The effect of age on pain, function, and quality of life after total
hip and knee arthroplasty. Arch Intern Med 2001;161:454–60.
3. Mangione CM, Goldman L, Orav EJ, Marcantonio ER, Pedan A,
Ludwig LE. Health-related quality of life after elective surgery:
measurement of longitudinal changes. J Gen Intern Med 1997;12:
4. Jones CA, Voaklander DC, Johnston DW. Health-related quality
of life outcomes after total hip and knee arthroplasties in a
community-based population. J Rheumatol 2000;27:745–52.
5. National Institutes of Health. NIH consensus statement on total
knee replacement. NIH Consens State Sci Statements 2003;20:
6. Noble PC, Gordon MJ, Weiss JM, Reddix RN, Conditt MA,
Mathis KB. Does total knee replacement restore normal knee
function? Clin Orthop 2005;431:157–65.
7. Boutron I, Poiraudeau S, Ravaud P, Baron G, Revel M, Nizard R,
et al. Social and personal consequences of disability in adults with
hip and knee arthroplasty: a French national community based
survey. J Rheumatol 2004;31:759–66.
8. Freedman VA, Martin LG, Schoeni RF. Recent trends in disabil-
ity and functioning among older adults in the United States.
9. Manton KG, Gu X. Changes in the prevalence of chronic disability
in the United States black and non-black population above age 65
from 1982 to 1999. Proc Natl Acad Sci U S A 2001;98:6354–9.
10. Lavernia CJ, Guzman JF, Gachupin-Garcia A. Cost effectiveness
and quality of life in knee arthroplasty. Clin Orthop Relat Res
11. Rasanen P, Paavolainen P, Sintonen H, Koivisto AM, Blom M,
Ryynanen OP, et al. Effectiveness of hip and knee replacement
surgery in terms of quality-adjusted life years and costs. Acta
12. Jain NB, Higgins LD, Ozumba D, Guller U, Cronin M, Pietrobon
R, et al. Trends in epidemiology of knee arthroplasty in the United
States, 1990–2000. Arthritis Rheum 2005;52:3928–33.
13. Kurtz S, Mowat F, Ong K, Chan N, Lau E, Halpern M. Prevalence
of primary and revision total hip and knee arthroplasty in the
United States from 1990 through 2002. J Bone Joint Surg Am
14. Ibrahim SA, Stone RA, Han X, Cohen P, Fine MJ, Henderson
WG, et al. Racial/ethnic differences in surgical outcomes in
veterans following knee or hip arthroplasty. Arthritis Rheum
15. Hawker GA, Wright JG, Badley EM, Coyte PC, for the Toronto
Arthroplasty Health Services Research Consortium. Perceptions
of, and willingness to consider, total joint arthroplasty in a
population-based cohort of individuals with disabling hip and knee
arthritis. Arthritis Rheum 2004;51:635–41.
16. Adler GS. A profile of the Medicare Current Beneficiary Survey.
Health Care Financ Rev 1994;15:153–63.
17. Mancuso CA, Ranawat CS, Esdaile JM, Johanson NA, Charlson
ME. Indications for total hip and total knee arthroplasties—results
of orthopaedic surveys. J Arthroplasty 1996;11:34–46.
18. World Health Organization. International classification of impair-
ments, disabilities, and handicaps. Geneva: World Health Organi-
19. Fillenbaum GG. Multidimensional functional assessment of older
adults: the Duke older Americans resources and services proce-
dures. Mahwah (NJ): Lawrence Erlbaum; 1988.
20. Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW. Studies
of illness of the aged. The index of ADL: a standardized measure
of biological and psychosocial functions. JAMA 1963;185:914–9.
21. McHorney CA, Cohen AS. Equating health status measures with
item response theory: illustrations with functional status items.
Med Care 2000;38:1143–59.
22. Becker SO, Ichino A. Estimation of average treatment effects
based on propensity scores. Stata J 2002;2:358–77.
23. Austin PC, Grootendorst P, Anderson GM. A comparison of the
ability of different propensity score models to balance measured
variables between treated and untreated subjects: a Monte Carlo
study. Stat Med 2007;26:734–53.
24. Imai K, King G, Stuart EA. Misunderstandings among experimen-
talists and observationalists: balance test fallacies in causal infer-
ence. Technical report; Princeton University. Available from:
25. D’Agostino RB Jr., D’Agostino RB Sr. Estimating treatment
effects using observational data. JAMA 2007;297:314–6.
26. Crimmins EM, Saito Y, Reynolds SL. Further evidence on recent
trends in the prevalence and incidence of disability among older
Americans from two sources: the LSOA and the NHIS. J Gerontol
B Psychol Sci Soc Sci 1997;52:S59–71.
27. Cutler DM. The reduction in disability among the elderly. Proc
Natl Acad Sci U S A 2001;98:6546–7.
28. Hall RE, Jones CE. The value of life and the rise in health care
spending. Q J Econ 2007;122:39–72.
29. Murphy KM, Topel RH. The value of health and longevity. J
Political Economy 2006;114:871–904.
30. Hardy SE, Dubin JA, Holford TR, Gill TM. Transitions between
states of disability and independence among older persons. Am J