
ORIGINAL REPORT

One-to-many propensity score matching in cohort studies

Jeremy A. Rassen1*, Abhi A. Shelat2, Jessica Myers1, Robert J. Glynn1, Kenneth J. Rothman3 and Sebastian Schneeweiss1

1Division of Pharmacoepidemiology and Pharmacoeconomics; Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

2Department of Computer Science, University of Virginia, Charlottesville, VA, USA

3RTI International, Research Triangle Park, NC, USA

ABSTRACT

Background

Among the large number of cohort studies that employ propensity score matching, most match patients 1:1. Increasing the matching ratio is thought to improve precision but may come with a trade-off with respect to bias.

Objective

To evaluate several methods of propensity score matching in cohort studies through simulation and empirical analyses.

Methods

We simulated cohorts of 20,000 patients with exposure prevalence of 10%–50%. We simulated five dichotomous and five continuous confounders. We estimated propensity scores and matched using digit-based greedy (“greedy”), pairwise nearest neighbor within a caliper (“nearest neighbor”), and a nearest neighbor approach that sought to balance the scores of the comparison patient above and below that of the treated patient (“balanced nearest neighbor”). We matched at both fixed and variable matching ratios and also evaluated sequential and parallel schemes for the order of formation of 1:n match groups. We then applied this same approach to two cohorts of patients drawn from administrative claims data.

Results

Increasing the match ratio beyond 1:1 generally resulted in somewhat higher bias. It also resulted in lower variance with variable ratio matching but higher variance with fixed. The parallel approach generally resulted in higher mean squared error but lower bias than the sequential approach. Variable ratio, parallel, balanced nearest neighbor matching generally yielded the lowest bias and mean squared error.

Conclusions

1:n matching can be used to increase precision in cohort studies. We recommend a variable ratio, parallel, balanced 1:n, nearest neighbor approach that increases precision over 1:1 matching at a small cost in bias. Copyright © 2012 John Wiley & Sons, Ltd.

key words—propensity scores; confounding factors (epidemiology); epidemiologic methods; comparative effectiveness research

Received 19 August 2011; Revised 23 February 2012; Accepted 24 February 2012

INTRODUCTION

Epidemiologists have long employed matching in cohort studies, and matched cohort studies may be particularly applicable in automated safety surveillance systems and other scenarios. Whereas matching was traditionally performed on specific factors (age, sex, days on treatment),1 today’s matching is often carried out on a summary score, such as a propensity score2–4 or disease risk score.5 In cohort studies, matching on propensity scores offers investigators the ability to balance treatment groups across all putative risk factors, and allows easy inspection of the achieved balance across measured covariates. It excludes those subjects in the non-overlapping ranges of the score, thereby giving an estimate of the treatment effect among the treated, an important clinical measure.6–8 The matching process serves a function similar to propensity score trimming and improves the validity of the estimate.9 1:1 matching on propensity scores is often performed using a SAS-based greedy matching algorithm,10 which offers a fast way to get approximately nearest neighbor matches.10,11 Nearest neighbor matching, although shown to provide better balance among treatment groups, is not frequently used in epidemiology.12

Cohort study matching at ratios of 1:n, with either a fixed or variable n, can yield higher precision and thus smaller confidence intervals than does simple 1:1 matching. It has also been known to increase bias, because second matches will generally be of lower quality than the first.13 When going beyond 1:1 matching ratios, Ming and Rosenbaum recommend employing variable ratio techniques, which reduce bias as compared with fixed ratio matching but also result in a loss in transparency of a “Table 1” presentation of covariate balance in the patient cohort.14 Variable ratio matching retains more exposed subjects than fixed ratio by not dropping those without the set number of comparison group matches.

*Correspondence to: J. A. Rassen, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont Street Suite 3030, Boston, MA 02120, USA. E-mail: jrassen@post.harvard.edu

Dr. Rassen is a recipient of a career development award from Agency for Healthcare Research and Quality (K01 HS018088). The Division of Pharmacoepidemiology received gifts from IBM Netezza and Tableau Software.

Pharmacoepidemiology and Drug Safety 2012; 21(S2): 69–80

Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/pds.3263

1:n matching without replacement is frequently accomplished by creating a 1:1 matched cohort and then adding second-level, third-level, and higher-level matches from among the remaining patients.15,16 We refer to this as a sequential, “everyone gets firsts before anyone gets seconds” approach. The advantage of this approach is that no treated patient is “starved” of his or her single best available comparison patient match as a result of using that best match in a secondary position for another treated patient. At the same time, the sequential approach may lessen the quality of certain matched sets by potentially downgrading the quality of a treated patient’s secondary matches. If enough matched sets are affected, then the distance between the treated and comparison groups in the overall cohort may be larger than necessary, resulting in a biased point estimate.

Here, we examine alternatives to the Parsons greedy matching methodology, including a true nearest neighbor approach that minimizes within-set distances. We also examine several schemes for matching that yield 1:n cohorts. We examine the performance of these schemes through simulation and empirical studies, as applied in the context of cohort studies of drug effects in healthcare databases.

METHODS

Treatment and comparison groups

Throughout this paper, we refer to the two exposure

categories as the treatment and the comparison groups

and assume that a single treated patient is matched to

one or more comparison patients.

In observational research, the goal of matching is to create treatment and comparison groups that are balanced on all measured confounders. Matching on a balancing score will yield, in expectation, balance between treatment groups for the covariates included in the score.3 Although it is a common practice to match on a propensity score,17 it is also possible to match on a summary disease risk score, dichotomous variables (e.g., sex), or continuous values (e.g., logit of propensity score, age).

Type of matching and terminology

Although greedy matching has a general meaning in the biostatistics literature, the term in epidemiology tends to refer to the SAS-based implementation of greedy matching by Parsons.10,18 Parsons’ approach matches patients on decreasing levels of precision of the propensity score. Treated patients are considered sequentially.10 Each treated patient is matched to a comparison patient whose score equals that of the treated patient to at least the fifth digit. When all matches at the fifth digit are exhausted, the process begins again at the fourth digit and so forth.
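The digit-by-digit procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Parsons SAS macro: the function names `score_key` and `digit_greedy_match` are hypothetical, scores are assumed to be supplied as dictionaries keyed by patient id, and "equal to the d-th digit" is assumed here to mean equal after rounding to d decimal places.

```python
from collections import defaultdict


def score_key(score, digits):
    # Assumption: "equal to the d-th digit" means equal after rounding to d decimals.
    return round(score * 10 ** digits)


def digit_greedy_match(treated, comparison, max_digits=5):
    """1:1 digit-based greedy matching without replacement (illustrative).

    treated, comparison: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, comparison_id, digits_matched) tuples.
    """
    open_treated = dict(treated)
    open_comparison = dict(comparison)
    matches = []
    for digits in range(max_digits, 0, -1):  # fifth digit first, then fourth, ...
        # Index the still-unmatched comparison patients at this precision.
        buckets = defaultdict(list)
        for cid, score in open_comparison.items():
            buckets[score_key(score, digits)].append(cid)
        # Treated patients are considered sequentially, in input order.
        for tid in list(open_treated):
            bucket = buckets.get(score_key(open_treated[tid], digits))
            if bucket:
                cid = bucket.pop()
                matches.append((tid, cid, digits))
                del open_treated[tid]
                del open_comparison[cid]
    return matches


treated = {"t1": 0.31234, "t2": 0.3128}
comparison = {"c1": 0.31234, "c2": 0.3129, "c3": 0.9}
print(digit_greedy_match(treated, comparison))
```

In this toy input, t1 finds a fifth-digit match immediately, whereas t2 and c2 agree only once the precision has been relaxed to the third digit; c3 is never matched.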

This approach to greedy matching is an efficient approximation of a type of nearest neighbor matching, in which each treated patient is matched to the unmatched comparison patient with the closest propensity score, with “closest” commonly defined by the absolute difference between the two patients’ scores. A maximum allowable distance (the “caliper”) is often imposed. The use of this type of nearest neighbor matching has been in part limited by the lack of efficient software to compute the best matches; to our knowledge, existing software either computes all possible pairings of treated and comparison patients and selects the nearest pairings within a predefined caliper19 or finds treated patients’ best comparison patient matches on the basis of a single ordering (high score to low, low score to high, random, order appearing in the data).16 As an alternative, we have implemented a nearest neighbor technique that guarantees computation of the best matches, gives consistent results independent of any ordering of patients, and avoids the exponential scaling of required time and memory with the number of subjects. In typical configurations, it executes in less than 1 second (see Appendix A).
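To make the definition of pairwise nearest neighbor matching within a caliper concrete, the following Python sketch enumerates candidate pairs inside the caliper and accepts them from closest to farthest. It is not the algorithm of Appendix A, which is not reproduced here; it is a simple, order-independent illustration. The function name `nearest_neighbor_match`, the dictionary inputs, and the deterministic tie-breaking by patient id are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def nearest_neighbor_match(treated, comparison, caliper=0.05):
    """1:1 pairwise nearest neighbor matching within a caliper (illustrative).

    treated, comparison: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, comparison_id) pairs, closest pair first.
    """
    comp = sorted((score, cid) for cid, score in comparison.items())
    comp_scores = [score for score, _ in comp]

    # Collect only the candidate pairs that fall inside the caliper window.
    candidates = []
    for tid, t_score in treated.items():
        lo = bisect_left(comp_scores, t_score - caliper)
        hi = bisect_right(comp_scores, t_score + caliper)
        for c_score, cid in comp[lo:hi]:
            dist = abs(t_score - c_score)
            if dist <= caliper:
                candidates.append((dist, tid, cid))

    # Accept pairs from closest to farthest; ties are broken by id here
    # (an assumption; random tie-breaking would also be reasonable).
    candidates.sort()
    used_treated, used_comparison, pairs = set(), set(), []
    for _, tid, cid in candidates:
        if tid not in used_treated and cid not in used_comparison:
            pairs.append((tid, cid))
            used_treated.add(tid)
            used_comparison.add(cid)
    return pairs


treated = {"t1": 0.30, "t2": 0.33}
comparison = {"c1": 0.31, "c2": 0.36, "c3": 0.50}
print(nearest_neighbor_match(treated, comparison))
```

Because candidate pairs are sorted before any match is accepted, the result does not depend on the order in which patients appear in the input.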

Our pairwise approach to nearest neighbor matching yields a cohort in which the distance between each pair of patients is minimized, but the overall distance between the treated and comparison groups may not be optimal. In practice, we believe the difference between pairwise nearest neighbor and optimal nearest neighbor matching is minimal, and pairwise nearest neighbor matching is far faster to compute. Appendix B demonstrates a case in which the results of pairwise nearest neighbor and optimal nearest neighbor matching will differ. Because of what we perceive to be small differences in the amount of confounding adjustment offered by the two techniques, and the substantially greater compute time required by optimal matching, we consider only pairwise nearest neighbor matching in this paper.

Unfortunately, there has been some inconsistency in matching terminology in the epidemiology and biostatistics literature. In this paper, we refer to pairwise nearest neighbor matching within a fixed caliper simply as nearest neighbor matching. Other literature refers to this approach as greedy matching with a caliper and refers to what we describe as optimal nearest neighbor


Table 1. Matching simulation results for base exposure prevalence of 30%

Columns: C1, C5, D1, D5 = standardized differences of measured variables*; Mean dist. = mean standardized distance; Max. dist. = max. standard distance; Sets = mean number of matched sets; % treated = mean % of treated patients matched; Ratio = mean matching ratio (1:n); Effect (SD) = mean treatment effect (SD)**; Bias (%) = mean bias (%); MSE = mean squared error.

Matching scheme                       C1      C5      D1      D5      Mean dist.  Max. dist.  Sets   % treated  Ratio  Effect (SD)   Bias (%)  MSE
Unmatched                             0.199   0.577   0.209   0.560   0.391       0.407       6001   30.00      2.3    10.43 (0.31)  -943.3    90.678

1:1 Matching
  Nearest neighbor matching           0.000   0.000   0.000   0.000   0.000       0.005       4272   50.00      1.0    0.99 (0.40)   0.7       0.163
  Digit-based greedy matching         0.000   0.001   0.000   0.001   0.000       0.004       4285   50.00      1.0    1.01 (0.41)   -0.6      0.166

2:1 Matching
  Nearest neighbor matching
    Sequential variable ratio         0.029   0.092   0.031   0.089   0.061       0.065       4271   39.65      1.5    1.01 (0.38)   -0.9      0.146
    Parallel variable ratio           0.017   0.054   0.019   0.048   0.036       0.041       3683   36.19      1.8    1.01 (0.39)   -1.1      0.154
    Sequential fixed ratio            0.000   0.001   0.000   0.001   0.000       0.007       2230   33.33      2.0    1.01 (0.50)   -0.9      0.250
    Parallel fixed ratio              0.000   0.001   0.000   0.000   0.000       0.005       2819   33.33      2.0    0.99 (0.43)   0.5       0.188
  Balanced nearest neighbor matching
    Sequential variable ratio         0.028   0.090   0.030   0.089   0.060       0.064       4271   40.44      1.5    1.01 (0.38)   -1.1      0.147
    Parallel variable ratio           0.021   0.067   0.023   0.066   0.045       0.050       3973   39.43      1.5    1.01 (0.39)   -1.0      0.151
    Sequential fixed ratio            0.001   0.001   0.000   0.000   0.000       0.008       2020   33.33      2.0    1.01 (0.52)   -1.2      0.274
    Parallel fixed ratio              0.000   0.001   0.000   0.000   0.000       0.007       2133   33.33      2.0    1.00 (0.50)   0.1       0.250
  Digit-based greedy matching
    Sequential variable ratio         0.029   0.093   0.031   0.091   0.062       0.066       4284   39.76      1.5    1.02 (0.38)   -2.2      0.146
    Parallel variable ratio           0.013   0.040   0.014   0.036   0.026       0.031       3882   40.04      1.5    1.01 (0.40)   -1.4      0.159
    Sequential fixed ratio            0.000   0.001   0.000   0.001   0.000       0.007       2205   33.33      2.0    1.02 (0.50)   -2.0      0.252
    Parallel fixed ratio              0.000   0.001   0.000   0.002   0.000       0.008       1935   33.33      2.0    1.00 (0.54)   0.2       0.297

3:1 Matching
  Nearest neighbor matching
    Sequential variable ratio         0.046   0.146   0.049   0.147   0.097       0.103       4271   35.03      1.9    1.01 (0.37)   -0.7      0.137
    Parallel variable ratio           0.032   0.102   0.036   0.096   0.068       0.074       3463   30.42      2.3    1.01 (0.39)   -0.9      0.154
    Sequential fixed ratio            0.000   0.001   -0.001  0.001   0.000       0.010       1423   25.00      3.0    1.00 (0.60)   0.5       0.355
    Parallel fixed ratio              0.000   0.000   0.000   0.000   0.000       0.007       2007   25.00      3.0    1.02 (0.50)   -1.8      0.249
  Balanced nearest neighbor matching
    Sequential variable ratio         0.046   0.145   0.049   0.147   0.097       0.102       4271   35.19      1.8    1.01 (0.37)   -0.6      0.138
    Parallel variable ratio           0.036   0.114   0.039   0.113   0.076       0.082       3809   33.64      2.0    1.01 (0.39)   -0.8      0.151
    Sequential fixed ratio            0.000   0.001   0.000   0.000   0.000       0.010       1497   25.00      3.0    0.99 (0.58)   0.6       0.343
    Parallel fixed ratio              -0.001  0.000   0.000   0.000   0.000       0.009       1743   25.00      3.0    1.02 (0.53)   -1.8      0.287
  Digit-based greedy matching
    Sequential variable ratio         0.046   0.147   0.049   0.148   0.098       0.103       4285   35.19      1.8    1.01 (0.37)   -1.5      0.138
    Parallel variable ratio           0.022   0.069   0.024   0.064   0.045       0.052       3731   36.01      1.8    1.01 (0.40)   -0.8      0.163
    Sequential fixed ratio            0.000   0.001   0.000   0.001   0.000       0.009       1402   25.00      3.0    1.00 (0.60)   0.3       0.362
    Parallel fixed ratio              0.000   0.000   0.000   0.001   0.001       0.012       1114   25.00      3.0    1.00 (0.66)   0.1       0.443

4:1 Matching
  Nearest neighbor matching
    Sequential variable ratio         0.058   0.183   0.061   0.189   0.123       0.129       4268   32.35      2.1    1.01 (0.36)   -0.7      0.127
    Parallel variable ratio           0.044   0.139   0.049   0.136   0.093       0.099       3362   27.37      2.7    1.00 (0.40)   0.2       0.161
    Sequential fixed ratio            0.001   0.001   0.001   0.002   0.001       0.012       1009   20.00      4.0    0.99 (0.67)   0.5       0.450
    Parallel fixed ratio              0.000   0.000   0.001   0.000   0.001       0.009       1516   20.00      4.0    1.01 (0.55)   -1.5      0.301

(Continues)


matching as optimal matching.6 We refer to Parsons’ commonly used digit-based greedy matching approach as greedy matching to invoke the standard term in the epidemiology literature.

1:n matching

We examined a series of strategies for 1:n propensity score matching, which has a smaller body of literature13,14 than does 1:1 matching.12 In each case, we considered both fixed ratio matching, in which sets must have one treated patient and exactly n comparison patients, as well as variable ratio matching, in which one treated patient is matched to up to n comparison patients.1,14 In a cohort study, the analysis can ignore the matching set if fixed ratio matching is applied (though at a possible cost of precision1,20), but variable sizes of match groups require accounting for the match using stratification by number of matches or by matched set.13

For each method, we considered both sequential and parallel matched set building. In sequential matched set building, we created an initial group of 1:1 matches. Then, we added second matches to the 1:1 matches, then third matches to the 2:1 matches, and so forth, with additional comparison patients added from among those who had not been previously matched. This method yielded a cohort in which the first match in each matched set was the best possible match, and each succeeding match was of equal or lesser quality. Any ties were broken randomly. The advantage of this approach is that each treated patient has the opportunity to be matched with his or her best available comparison group patient, without the potential best comparison group patient being used as a secondary match for another treated patient. However, this approach may compromise balance.

In parallel matched set building, we sought to minimize the within-set distance among the overall matched cohort. In this method, the best treated-to-comparison match was made first. Then, if the next best match would involve an already-matched treated patient, we made a second (third, up to nth) match for that treated patient even if there were other treated patients who had not yet been assigned a first match. Although this method should yield well-matched sets, treated patients may be “starved” of their best first match in favor of a second-position match for another treated patient.
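The contrast between sequential and parallel set building, and between fixed and variable ratios, can be illustrated with a small Python sketch. It is a simplified reading of the two schemes rather than our implementation: the function name `build_matched_sets` and its parameters are hypothetical, all candidate pairs within the caliper are enumerated by brute force, ties are broken by patient id instead of randomly, and fixed ratio matching is assumed simply to discard sets with fewer than n comparison patients.

```python
def build_matched_sets(treated, comparison, n, caliper=0.05,
                       parallel=True, fixed=False):
    """Build 1:n matched sets without replacement (illustrative).

    treated, comparison: dicts mapping patient id -> propensity score.
    Returns a dict mapping treated id -> list of matched comparison ids.
    """
    # All candidate pairs within the caliper, closest first.
    pairs = sorted(
        (abs(t_score - c_score), tid, cid)
        for tid, t_score in treated.items()
        for cid, c_score in comparison.items()
        if abs(t_score - c_score) <= caliper
    )
    sets = {tid: [] for tid in treated}
    used = set()
    # Parallel: one pass in which any set may grow up to n at once.
    # Sequential: pass k only lets sets grow to size k, so every treated
    # patient gets a chance at a first match before anyone gets a second.
    limits = [n] if parallel else range(1, n + 1)
    for limit in limits:
        for _, tid, cid in pairs:
            if cid not in used and len(sets[tid]) < limit:
                sets[tid].append(cid)
                used.add(cid)
    if fixed:
        # Assumption: fixed ratio keeps only complete sets of exactly n.
        return {tid: m for tid, m in sets.items() if len(m) == n}
    return {tid: m for tid, m in sets.items() if m}


treated = {"t1": 0.30, "t2": 0.36}
comparison = {"c1": 0.30, "c2": 0.32}
print(build_matched_sets(treated, comparison, n=2, parallel=True))
print(build_matched_sets(treated, comparison, n=2, parallel=False))
```

In this toy example, the parallel scheme gives t1 both comparison patients and leaves t2 "starved", whereas the sequential scheme gives each treated patient one match, with t2 receiving the more distant c2.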

For each combination of fixed and variable ratios,

and sequential and parallel approaches, we applied

the following match techniques. In all cases, we

worked on the natural scale of propensity scores rather

than a logit or other transformation.12

(1) Digit-based greedy matching. We applied a fifth digit to first digit (5→1) greedy matching technique

Table 1. (Continued)

Columns: C1, C5, D1, D5 = standardized differences of measured variables*; Mean dist. = mean standardized distance; Max. dist. = max. standard distance; Sets = mean number of matched sets; % treated = mean % of treated patients matched; Ratio = mean matching ratio (1:n); Effect (SD) = mean treatment effect (SD)**; Bias (%) = mean bias (%); MSE = mean squared error.

Matching scheme                       C1      C5      D1      D5      Mean dist.  Max. dist.  Sets   % treated  Ratio  Effect (SD)   Bias (%)  MSE
  Balanced nearest neighbor matching
    Sequential variable ratio         0.057   0.181   0.060   0.187   0.121       0.127       4268   32.64      2.1    1.00 (0.36)   -0.4      0.129
    Parallel variable ratio           0.045   0.141   0.048   0.142   0.094       0.102       3771   31.52      2.2    0.99 (0.40)   0.6       0.157
    Sequential fixed ratio            0.000   0.001   0.001   0.001   0.001       0.014       916    20.00      4.0    0.99 (0.70)   1.2       0.494
    Parallel fixed ratio              0.000   -0.001  0.000   0.000   0.000       0.014       894    20.00      4.0    1.01 (0.70)   -0.6      0.490
  Digit-based greedy matching
    Sequential variable ratio         0.059   0.187   0.062   0.193   0.125       0.131       4283   32.18      2.1    1.03 (0.36)   -3.4      0.128
    Parallel variable ratio           0.028   0.088   0.031   0.085   0.059       0.067       3676   33.73      2.0    1.00 (0.41)   0.3       0.166
    Sequential fixed ratio            0.002   0.005   0.001   0.007   0.003       0.015       1140   20.00      4.0    1.06 (0.62)   -6.2      0.390
    Parallel fixed ratio              0.000   0.001   0.000   0.002   0.001       0.016       747    20.00      4.0    1.05 (0.80)   -4.5      0.647

*The variable C1 is the first continuous variable, whereas C5 is the fifth. The variable D1 is the first dichotomous variable, whereas D5 is the fifth.

**The expected treatment effect is 1.0. Any change from 1.0 is bias. SD is standard deviation.


as described previously,10,12,15 with several modifications to Parsons’ algorithm. We (i) matched comparison patients to treated patients, with the comparison patients sorted by increasing propensity score and the treated patients sorted randomly; (ii) broke any ties by using the smallest match distance among possible matches; (iii) broke any remaining ties by using a random comparison patient; and (iv) substantially improved the speed of the algorithm by using advanced data structures.21

(2) Pairwise nearest neighbor matching. We implemented a nearest neighbor matching algorithm that minimized distance within matched sets and applied a caliper of 0.05 on the propensity score scale. Whereas others have suggested smaller calipers,12,3 we used 0.05 to allow for ready comparison to 5→1 greedy matching. Using a smaller caliper may improve match quality but may also limit matches and thus lower precision. We believe that results at a caliper of 0.05 will overestimate any bias as compared with smaller calipers.

(3) Balanced pairwise nearest neighbor matching. Balanced nearest neighbor matching extends nearest neighbor matching by requiring that comparison patients alternate between having scores greater than (to the right of) and less than (to the left of) their matched treatment patient. Any odd-numbered match (first, third, ...) can occur on the left or the right of the treated patient; even-numbered matches must then occur on the side opposite of where the prior odd-numbered match occurred. We implemented this extension to avoid the potential problem of comparison patients’ consistently clustering on one side of the matched treated patient.
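The alternation rule in technique (3) can be sketched for a single treated patient as follows. This is an illustration of the rule only, not the cohort-level implementation: the names `side` and `balanced_matches` are hypothetical, candidates are assumed to be given as a dictionary of comparison scores, and a comparison patient whose score exactly equals the treated patient's score is assumed to count as lying on either side.

```python
def side(comparison_score, treated_score):
    """+1 if the comparison score lies to the right, -1 if left, 0 if equal."""
    return (comparison_score > treated_score) - (comparison_score < treated_score)


def balanced_matches(treated_score, candidates, n, caliper=0.05):
    """Pick up to n comparison patients for one treated patient (illustrative).

    candidates: dict mapping comparison id -> propensity score.
    Returns the chosen comparison ids in the order they were matched.
    """
    remaining = {cid: s for cid, s in candidates.items()
                 if abs(s - treated_score) <= caliper}
    chosen = []
    last_side = 0
    while len(chosen) < n:
        even_numbered = len(chosen) % 2 == 1  # the next match is 2nd, 4th, ...
        eligible = [
            (abs(s - treated_score), cid)
            for cid, s in remaining.items()
            # Even-numbered matches must sit opposite the preceding match;
            # exact ties (side 0) are treated as acceptable on either side.
            if not even_numbered or side(s, treated_score) * last_side <= 0
        ]
        if not eligible:
            break
        _, cid = min(eligible)
        last_side = side(remaining.pop(cid), treated_score)
        chosen.append(cid)
    return chosen


candidates = {"a": 0.51, "b": 0.52, "c": 0.53, "d": 0.47, "e": 0.60}
print(balanced_matches(0.50, candidates, n=3))
```

With a treated score of 0.50, plain nearest neighbor selection would take the three right-hand candidates a, b, and c; the balanced rule forces the second match to the left-hand candidate d before returning to the right for the third.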

SIMULATION STUDY

We tested these approaches in a simulation study. In each run of the simulation, we created 20,000 patients. Following the design described by Austin,12 we assigned each patient’s exposure by using a binomial distribution and a base exposure prevalence from

Figure 1. Observed results from simulations of various 1:n matching approaches, averaged over all simulation runs. Points are sized in proportion to the average size of the matched cohorts (smallest = 643; largest = 3869); circular points indicate sequential matching, whereas square points indicate parallel. NN, nearest neighbor.

[Figure not reproduced. Panels: cohort size, covariate distance, % bias, variance, and MSE, shown for greedy, nearest neighbor, and balanced NN matching at ratios 1:1 to 1:4, separately for variable and fixed ratio matching.]
