Rank Reversal in Indirect Comparisons
Edward C. Norton, PhD1,2,3,*, Morgen M. Miller, BS1,2, Jason J. Wang, PhD4, Kasey Coyne, BA4,
Lawrence C. Kleinman, MD, MPH4,5
1Department of Health Management and Policy and 2Department of Economics, University of Michigan, Ann Arbor, MI, USA; 3National Bureau of Economic Research, Cambridge, MA, USA; 4Department of Health Evidence and Policy and 5Department of Pediatrics, Mount Sinai School of Medicine, New York, NY, USA
A B S T R A C T
Objective: To describe rank reversal as a source of inconsistent interpretation intrinsic to indirect comparison (Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997;50:683–91) of treatments and to propose best practice. Methods: We prove our main
points with intuition, examples, graphs, and mathematical proofs. We
also provide software and discuss implications for research and policy.
Results: When comparing treatments by indirect means and sorting
them by effect size, three common measures of comparison (risk ratio,
risk difference, and odds ratio) may lead to vastly different rankings.
Conclusions: The choice of risk measure matters when making indirect
comparisons of treatments. The choice should depend primarily on the
study design and the conceptual framework for that study.
Keywords: indirect comparisons, risk, risk difference, risk ratio, odds ratio.
Copyright © 2012, International Society for Pharmacoeconomics and
Outcomes Research (ISPOR). Published by Elsevier Inc.
When a direct comparison of two or more treatments within the
same study is not possible, researchers must instead make indirect
comparisons of those treatments across different trials. Bucher et al. [1] first showed how to conduct an indirect comparison meta-analysis. Under the assumption that the relative efficacy of a treatment is consistent for patients in the different trials [2,3], Bucher et al. [1] showed that the log odds ratio (OR) of the adjusted indirect comparison of two treatments equals the difference between the log odds ratios of each treatment compared with its control. Their basic method has proved quite influential. While others have expanded upon it [3–5], the basic model has been used in numerous meta-analyses.
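The Bucher adjusted indirect comparison can be sketched in a few lines of Python (our illustration of the standard calculation; the input numbers are hypothetical, not taken from any trial):

```python
import math

def bucher_indirect_or(or_ac, se_ac, or_bc, se_bc):
    """Bucher-style adjusted indirect comparison of A vs. B via a common control C.

    or_ac, or_bc: odds ratios of A vs. C and B vs. C from separate trials.
    se_ac, se_bc: standard errors of the corresponding *log* odds ratios.
    Returns the indirect OR of A vs. B and its standard error on the log scale.
    """
    # The log OR of A vs. B is the difference of the two log ORs...
    log_or_ab = math.log(or_ac) - math.log(or_bc)
    # ...and the variances add because the trials are independent.
    se_ab = math.sqrt(se_ac**2 + se_bc**2)
    return math.exp(log_or_ab), se_ab

# Hypothetical inputs: OR(A vs. C) = 2.0 (SE 0.20), OR(B vs. C) = 1.25 (SE 0.25)
or_ab, se_ab = bucher_indirect_or(2.0, 0.20, 1.25, 0.25)
# or_ab ~ 2.0 / 1.25 = 1.6
```

Note that the indirect estimate always carries more uncertainty than either direct comparison, since the two standard errors combine.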
Although Bucher et al. present their model by using ORs, they
also assert that the “method could be equally applied to estimates
of relative risk” [1, p. 684]. Song et al. [2, p. 489] state that Bucher’s
method of adjusted indirect comparison “may also be used when
the relative efficacy is measured by risk ratio or by risk difference.”
Indeed, there are a few studies that have modified the formulas
appropriately and used either the risk ratio (RR) or the risk differ-
ence (RD) instead of the more common OR (e.g., ).
The central point of this article is that when indirectly compar-
ing two or more treatments, the choice of how to express results
may directly affect the conclusion. In other words, when sorting
treatments by effect size (which we refer to as ranking), the choice
between RR, RD, and OR matters. We describe these three mea-
sures and show when indirect comparisons using any of the mea-
sures will be different from the others. We illustrate our results
with intuition, examples, graphs, and mathematical proofs. We
are unaware of any previous description of this phenomenon, which we call rank reversal.
Suppose that two treatments, A and B, are each compared with a control (called C) in separate studies. The
goal is to use the information in these two separate studies to indi-
rectly compare treatments A and B, and thereby rank their effective-
ness. For example, compared with a placebo, is the desirable out-
come more likely when a patient takes drug A rather than drug B?
As we explain, answering this question is not always straightforward. Drug A may be
preferred to drug A when measured by an OR, even though both OR
and RR are measures of relative effectiveness.
We prove that rankings are not always consistent across these
risk measures, describe under what circumstances the rankings
are the same or different, explain how uncertainty affects the
main results, introduce software that can help identify problems,
discuss implications, and recommend best practice for research
and policy. When the study design is such that the results could be expressed in terms of either RRs or ORs, the burden is on the researchers to justify their choice of measure.

Consider pairs of probabilities (R0, R1), with R0 referring to the baseline (control) risk and R1 to the risk if treated. These risks
* Address correspondence to: Edward C. Norton, M3108 SPH II, 1415 Washington Heights, University of Michigan, Ann Arbor, MI 48109.
VALUE IN HEALTH XX (2012) XXX
are defined as probabilities of an event during a specified period of time, and so are bounded by 0 and 1. For any pair of probabilities (R0, R1) we can define the RR, RD, and OR with R0 as the reference: RR = R1/R0, RD = R1 − R0, and OR = [R1/(1 − R1)]/[R0/(1 − R0)].
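The three risk measures are straightforward to compute; here is a minimal Python sketch (the pair (0.4, 0.6) anticipates the worked example in Figure 1):

```python
def risk_measures(r0, r1):
    """Return (RR, RD, OR) for baseline risk r0 and treated risk r1, with 0 < r < 1."""
    rr = r1 / r0                  # risk ratio
    rd = r1 - r0                  # risk difference
    odds = lambda r: r / (1 - r)
    or_ = odds(r1) / odds(r0)     # odds ratio
    return rr, rd, or_

rr, rd, or_ = risk_measures(0.4, 0.6)
# rr ~ 1.5, rd ~ 0.2, or_ ~ 2.25 -- the values behind Figure 1
```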
We state two proofs and direct the reader to the Appendix in Sup-
plemental Materials found at http://dx.doi.org/10.1016/j.jval.
2012.06.001 for the details of those proofs. First, within a single
study, in which baseline risk is the same, treatment alternatives
will have the same ranking regardless of the risk measure used. In
other words, all three risk measures are strictly monotonic in
changes in risk to the treatment group R1. Second, if both R0 and R1 differ across studies, then two treatments, represented by the pairs of points (R0^A, R1^A) and (R0^B, R1^B), may be ranked differently by the RR, RD, and OR. The total derivative of each measure (with respect to both R0 and R1) can have different signs, indicating rank reversal. For details, see the Appendix in Supplemental Materials.
It is easy to show rank reversal with graphs (see  for similar
graphs). Each pair of probabilities can be plotted on a unit square
graph. If treatment and baseline risks are the same, that is, R1 = R0, then the point falls along the 45-degree line. If the treatment
risk is higher than the baseline risk, then the point lies to the
northwest of the 45-degree line.
For any point (R0, R1) on the graph, isoquants show all other points that have the same value of a quantity, such as the RR, RD, or OR. By definition, all three isoquants in this example must pass through the point (0.4, 0.6). The RR isoquant is a straight line through the origin with a slope of 1.5 = 0.6/0.4. The RD isoquant is parallel to the 45-degree line, with the intercept on the y-axis equal to 0.2 = 0.6 − 0.4. The OR isoquant is an arc connecting the origin to the point (1, 1) such that the OR along the arc is always 2.25 = [0.6/(1 − 0.6)]/[0.4/(1 − 0.4)].
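Each isoquant can be written as an explicit function of the baseline risk. This Python sketch (our own construction, not the authors' graphiso code) returns the treated risk along each isoquant through a given point:

```python
def isoquants_through(r0, r1):
    """Return functions giving the treated risk along each isoquant
    (constant RR, RD, and OR) that passes through the point (r0, r1)."""
    rr = r1 / r0
    rd = r1 - r0
    or_ = (r1 / (1 - r1)) / (r0 / (1 - r0))
    rr_iso = lambda x: rr * x      # straight line through the origin
    rd_iso = lambda x: x + rd      # line parallel to the 45-degree line
    def or_iso(x):                 # arc from (0, 0) toward (1, 1)
        odds = or_ * x / (1 - x)
        return odds / (1 + odds)
    return rr_iso, rd_iso, or_iso

rr_iso, rd_iso, or_iso = isoquants_through(0.4, 0.6)
# all three isoquants pass through the original point: each returns ~0.6 at x = 0.4
```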
The intuition of the first proof is shown graphically by mov-
ing due north of (0.4, 0.6) in Figure 1. All those northerly points
lie on isoquants representing RR values greater than 1.5, RD
values greater than 0.2, and OR values greater than 2.25. Moving
due north is the graphical equivalent of taking a derivative with
respect to R1 (a positive change in R1, holding R0 constant). Similarly, moving due south of the point falls below all three isoquants (a negative change in R1 only). That intuition holds for any pair of risks, including those below the 45-degree line. For any fixed baseline risk R0, the rankings of any two treatments will be the same for RR, RD, and OR.
The situation changes when comparing points that do not
share the same baseline risk R0. To make our point simpler, we
have redrawn Figure 1 without the RD isoquants, to focus only
on RR and OR (see Fig. 2). We can compare any point to (0.4, 0.6)
and ask: Is one RR higher than the other? Do the RRs and ORs
have the same ranking? If the other point lies to the northwest
of (0.4, 0.6) in the area marked A, then it will have higher values
of both RR and OR. If it lies to the southeast (large area marked
D), then it will have lower values of both RR and OR. In these
cases, both RR and OR rank these points the same; there is no rank reversal.
The interesting cases lie to the southwest and to the northeast
in the areas marked B and C. For example, take the point (0.6, 0.8), which has a lower RR than the point (0.4, 0.6) (an RR of 1.33 rather than 1.5); as such, it lies below the RR isoquant. But it has a higher OR
than the point (0.4, 0.6) and lies above the OR isoquant. Therefore,
(0.6, 0.8) has a different ranking compared with (0.4, 0.6) when
using RR than when using OR.
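This example can be verified in a few lines of Python (a sketch; the two risk pairs are the ones from the text):

```python
def rr(r0, r1):
    """Risk ratio."""
    return r1 / r0

def odds_ratio(r0, r1):
    """Odds ratio."""
    return (r1 / (1 - r1)) / (r0 / (1 - r0))

a = (0.4, 0.6)   # the reference pair from Figure 2
b = (0.6, 0.8)   # the point in area C

# a ranks higher by RR (1.5 vs. ~1.33)...
assert rr(*a) > rr(*b)
# ...but lower by OR (2.25 vs. ~2.67): rank reversal
assert odds_ratio(*a) < odds_ratio(*b)
```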
We show two other examples, again focusing on just RR and
OR. Both examples lie along the same OR isoquant, and so only the
RR is changing. One has a tiny area B and a relatively large area C (see Fig. 3), whereas the other has a larger area B relative to area C (see Fig. 4). These examples show that location on the
graph affects the probability that other points in the vicinity will
have rank reversal problems. When both risks are small, the chance that a nearby point ranks higher by OR but lower by RR is relatively high.
Finally, to bring RD back into the picture, we return to Figure
1, which has all three isoquants. In this figure, we see that there
are eight regions, not four. In two of them (A and H), there are no rank reversals.

Fig 1 – Isoquants for RR, RD, and OR through the point (.4, .6). OR, odds ratio; RD, risk difference; RR, risk ratio.

Fig 2 – Isoquants for RR and OR through the point (.4, .6). OR, odds ratio; RR, risk ratio.

The remaining areas (B through G) have one
measure that disagrees with the other two measures about
which point has a higher numerical value. For example, points
in area E have a lower RR than (0.4, 0.6) but higher RD and OR.
When given the same data for an indirect comparison or meta-
analysis, researchers who report RR or OR could reach different
conclusions about which treatment is preferred, given that the
baseline risks are different. The areas B and F do not appear in Figure 1 because they have zero area for this risk pair.
In summary, after drawing a point (R0, R1) on the graph and
the corresponding three isoquants, only the points entirely
above all three isoquants or entirely below all three have the
same ranking on all measures. While the area of the graph with
the same ranking always constitutes the majority, in our experience
reading the literature, studies that are compared with each
other often lie broadly along the southwest to the northeast
corridor, making the choice of measure (RR, RD, or OR) impor-
tant in the final comparison (e.g., ).
Thus far, we have treated risks as if they were known exactly, as opposed to being estimated with error. In real clinical trials,
however, risks are estimated with uncertainty (with confidence
intervals or standard errors). This uncertainty does not change the
basic conclusions of this article, namely, that different risk mea-
sures can lead to different rank ordering. However, with uncer-
tainty, the comparisons of risk become probabilistic instead of
deterministic. That is, if one bootstrapped the rankings, taking
draws of each risk from the estimated distribution of risks, and
then computed the risk measures, the ranking may not always be the same across draws.
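One way to make that concrete is a small Monte Carlo sketch in Python (our illustration, not the authors' software; the normal draws and the 0.05 standard error are arbitrary assumptions for the example):

```python
import random

def prob_rank_reversal(a, b, se=0.05, draws=10_000, seed=0):
    """Estimate how often RR and OR disagree on the ranking of two studies.

    a, b: (baseline risk, treated risk) point estimates from two studies.
    Each risk is redrawn from a normal distribution with standard error
    `se`, clipped to (0, 1); a real exercise would draw from the
    estimated sampling distributions instead.
    """
    rng = random.Random(seed)
    clip = lambda r: min(max(r, 1e-6), 1 - 1e-6)
    disagree = 0
    for _ in range(draws):
        (a0, a1), (b0, b1) = [tuple(clip(rng.gauss(r, se)) for r in pt)
                              for pt in (a, b)]
        a_wins_rr = (a1 / a0) > (b1 / b0)
        a_wins_or = ((a1 / (1 - a1)) / (a0 / (1 - a0))
                     > (b1 / (1 - b1)) / (b0 / (1 - b0)))
        disagree += a_wins_rr != a_wins_or
    return disagree / draws

# fraction of draws in which RR and OR rank the two example studies differently
p = prob_rank_reversal((0.4, 0.6), (0.6, 0.8))
```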
We have written Stata software that illustrates rank reversal for
any baseline risk (R0) and risk if treated (R1). The program, called graphiso, takes as input any risk pair (R0, R1) and produces a
graph with isoquants for each of the three risk measures (RR,
RD, and OR). In addition, the program calculates the areas for
which each of the three risk measures generates a different
ranking. The areas can be interpreted as probabilities of rank
reversal only in the unlikely case of a uniform distribution of
risk. This program file and a help file are available from the
authors on request.
As an example, our Stata program can recreate Figure 1 for (R0, R1) = (0.4, 0.6) and calculate areas with the command graphiso .4 .6. The program displays the following table of areas:
Above RR, Above RD, Above OR  0.2933
Above RR, Above RD, Below OR  0.0000
Above RR, Below RD, Above OR  0.0239
Above RR, Below RD, Below OR  0.0161
Below RR, Above RD, Above OR  0.0267
Below RR, Above RD, Below OR  0.0000
Below RR, Below RD, Above OR  0.0239
Below RR, Below RD, Below OR  0.6161
Total area = 1

All same      0.9095
OR different  0.0239
RR different  0.0428
RD different  0.0239
In this example, the two areas where all three measures agree on
the ranking (areas A and H) together comprise 90% of the area of
the unit square (see Fig. 1). Measuring risk by using an OR gener-
ates a ranking different from the other two measures in region G, which accounts for 2.39% of the unit square. Two regions have zero area for this risk pair: it is impossible for another point to have a lower OR than (0.4, 0.6) while simultaneously having a higher RR and RD.
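The areas in the table above can be approximated without Stata by classifying points on a fine grid of the unit square. This Python sketch (not the authors' graphiso program) reproduces the regions to grid precision:

```python
def region_shares(r0, r1, n=400):
    """Classify each cell of an n-by-n grid on the unit square by whether its
    RR, RD, and OR are above (True) or below (False) those of the reference
    pair (r0, r1). Returns the share of the square falling in each region."""
    rr0 = r1 / r0
    rd0 = r1 - r0
    or0 = (r1 / (1 - r1)) / (r0 / (1 - r0))
    counts = {}
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) / n   # candidate baseline risk
            y = (j + 0.5) / n   # candidate treated risk
            key = (y / x > rr0,                          # above the RR isoquant?
                   y - x > rd0,                          # above the RD isoquant?
                   (y / (1 - y)) / (x / (1 - x)) > or0)  # above the OR isoquant?
            counts[key] = counts.get(key, 0) + 1
    return {k: v / n**2 for k, v in counts.items()}

shares = region_shares(0.4, 0.6)
all_same = shares.get((True, True, True), 0) + shares.get((False, False, False), 0)
# all_same comes out near 0.91, in line with the "All same" row of the table
```

The grid shares sum to 1 by construction, and the two impossible combinations (e.g., lower OR with higher RR and RD) receive no grid points, matching the zero rows in the table.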
Best Practice Recommendations
Our results lead to several recommendations for best practice
when making indirect comparisons. First, given the conceptual
framework and study design, decide which measure is most ap-
propriate for the research. There is an extensive literature on the
differences between RR, RD, and OR (for guidance, see textbooks
by authors of [7,10–12] and articles [13–15]). For some study de-
signs, such as case-control studies, only the OR is possible. Attention to these design constraints should come first.
As a default, we recommend using one of the risk measures in-
stead of the OR, unless there is a conceptual justification favoring
the OR. Second, for studies that have a study design that allows
appropriate reporting of either RRs or ORs, show whether the
different measures would lead to the same or different rankings.
Either the result is robust across measures or the authors should
acknowledge that other measures would lead to different conclusions and discuss this finding in light of the conceptual model.

Fig 3 – Isoquants for RR and OR through the point (.1, .2). OR, odds ratio; RR, risk ratio.

Fig 4 – Isoquants for RR and OR through the point (.8, .9). OR, odds ratio; RR, risk ratio.
The federal government is currently advocating head-to-head
clinical trials, with several drugs being tested at once against the
same placebo. This design both avoids the problem of comparing
treatments with different placebos and also will dramatically increase the value of such clinical research. While the methods of Bucher et al. [1] will inform the
analysis of these trials, the results of this study demonstrate the
importance of the choice of risk measure when conducting indi-
rect comparisons. This study has several important conclusions.
First, we have two main mathematical results. The first is that if
comparisons are made within the same study to the same control,
then the three measures will always have the same ranking. The
more interesting result is that the choice of the risk measure mat-
ters when making indirect comparisons of treatments with differ-
ent baseline risks. This result holds regardless of whether or not
there is uncertainty about the risks.
Our results have strong policy implications. Because research-
ers often choose a risk measure out of convenience or habit, policy
decisions based on treatment ranking may be driven in part by
these arbitrary decisions. Researchers need to confront the prob-
lem by showing whether the results are sensitive to the choice of
risk measure and explain which measure is preferred for their
study and why. This supports the importance of the explicit state-
ment of a conceptual model and reference to that model in sup-
porting the choice of measures.
This brings us to our third main conclusion: nonlinear func-
tions are challenging to understand. The prior debate in the liter-
ature between RRs and ORs has often focused on the magnitude of the estimates and on which measure conveys more interpretable information. RRs and ORs do not differ in their sign (direc-
tion), only in their numerical magnitude and therefore in their
interpretation. Here we show a new problem. When making indi-
rect comparisons, RRs and ORs can lead to opposite conclusions
regarding which option to favor. Nonlinear functions, such as RRs
and ORs, emphasize different aspects of risk and have different
properties; these differences are consequential.
David Hutton provided helpful comments on an earlier draft.
Source of financial support: The authors gratefully acknowl-
edge funding from AHRQ (Grant 1R18HS018032), NIH/NCRR
(3UL1RR029887), and NIH/NCRR (3UL1RR029887-03S1).
Supplemental material accompanying this article can be found in
the online version as a hyperlink at http://dx.doi.org/10.1016/j.
jval.2012.06.001 or, if a hard copy of article, at www.valueinhealth
journal.com/issues (select volume, issue, and article).
R E F E R E N C E S
1. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997;50:683–91.
2. Song F, Glenny A-M, Altman DG. Indirect comparison in evaluating relative efficacy illustrated by antimicrobial prophylaxis in colorectal surgery. Controlled Clin Trials 2000;21:488–97.
3. Glenny AM, Altman DG, Song F, et al. Indirect comparisons of competing interventions. Health Technol Assess 2005;9:1–134.
4. Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003;326:472–5.
5. Eckermann S, Coory M, Willan AR. Indirect comparison: relative risk fallacies and odds solution. J Clin Epidemiol 2009;62:1031–6.
6. Hasselblad V, Kong DF. Statistical methods for comparison to placebo in active-control trials. Drug Information J 2001;35:435–49.
7. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology (3rd ed.). Philadelphia: Lippincott Williams & Wilkins, 2008.
8. Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Stat Med 2002;21:1575–
9. Matchar DB, McCrory DC, Orlando LA, Patel MR. Systematic review: comparative effectiveness of angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers for treating essential hypertension. Ann Intern Med 2008;148:16–29.
10. Oleckno WA. Epidemiology: Concepts and Methods. Long Grove, IL: Waveland Press, Inc., 2008.
11. Woodward M. Epidemiology: Study Design and Data Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC, 2005.
12. Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press, 2002.
13. Cornfield J. A method for estimating comparative rates from clinical data: applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 1951;11:1269–75.
14. Greenland S. Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol 1987;125:761–8.
15. Walter SD. Choice of effect measure for epidemiological data. J Clin Epidemiol
16. Klaidman S. How well the media report health risk. Daedalus
17. Teuber A. Justifying risk. Daedalus 1990;119:235–54.
18. Altman DG, Deeks JJ, Sackett DL. Odds ratios should be avoided when events are common [Letter]. BMJ 1998;317:1318.
19. Bier VM. On the state of the art: risk communication to the public. Reliab Eng Syst Saf 2001;71:139–50.
20. Kleinman LC, Norton EC. What's the risk? A simple approach for estimating adjusted risk ratios from nonlinear models including logistic regression. Health Serv Res 2009;44:288–302.
21. Yelland LN, Salter AB, Ryan P. Relative risk estimation in randomized controlled trials: a comparison of methods for independent observations. Int J Biostat 2011;7:Article 5, 1–31.
22. Sackett DL, Deeks JJ, Altman DG. Down with odds ratios! Evid Based Med
23. Greenland S, Holland P. Estimating standardized risk differences from odds ratios. Biometrics 1991;47:319–22.