RESEARCH ARTICLE Open Access
A simple method for estimating relative risk using logistic regression
Fredi A Diaz-Quijano
Abstract
Background: Odds ratios (OR) significantly overestimate associations between risk factors and common outcomes.
The estimation of relative risks (RR) or prevalence ratios (PR) has represented a statistical challenge in multivariate
analysis and, furthermore, some researchers do not have access to the available methods. Objective: To propose
and evaluate a new method for estimating RR and PR by logistic regression.
Methods: A provisional database was designed in which events were duplicated but identified as non-events.
Then, a logistic regression was performed and the resulting effect measures were taken as RR
estimates. This method was compared with binomial regression, Cox regression with robust variance and
ordinary logistic regression in analyses with three outcomes of different frequencies.
Results: ORs estimated by ordinary logistic regression progressively overestimated RRs as the outcome frequency
increased. RRs estimated by Cox regression and the method proposed in this article were similar to those
estimated by binomial regression for every outcome. However, confidence intervals were wider with the proposed
method.
Conclusion: This simple tool could be useful for calculating the effect of risk factors and the impact of health
interventions in developing countries when other statistical strategies are not available.
Keywords: Logistic regression, Odds ratio, Prevalence ratio, Relative risk.
Background
The odds ratio (OR) is commonly used to assess associations between exposure and outcome and can be estimated by logistic regression, which is widely available in statistical software. The OR has been considered an approximation to the prevalence ratio (PR) in cross-sectional studies or to the risk ratio (RR, which is mathematically equivalent to the PR) in cohort studies or clinical trials. This is acceptable when the outcome is relatively rare (< 10%). However, since many health outcomes are common, interpreting the OR as an RR is questionable because the OR overstates the RR, sometimes dramatically [1-3]. Moreover, the OR has been considered an “unintelligible” effect measure in some contexts [3].
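To see how quickly the OR drifts away from the RR as the outcome becomes common, the short Python sketch below fixes a true RR of 2 and computes the OR it implies for several baseline (unexposed) risks. It relies only on the definitions of risk and odds; the baseline risks and the RR of 2 are illustrative assumptions, not values from this study.

```python
def odds(p):
    """Convert a risk (probability) into odds."""
    return p / (1.0 - p)


def or_for_fixed_rr(baseline_risk, rr):
    """OR implied by a given RR when the unexposed risk is baseline_risk."""
    exposed_risk = rr * baseline_risk
    if not 0.0 < exposed_risk < 1.0:
        raise ValueError("exposed risk must lie strictly between 0 and 1")
    return odds(exposed_risk) / odds(baseline_risk)


if __name__ == "__main__":
    # Illustrative baseline risks, from a rare (1%) to a common (40%) outcome.
    for p0 in (0.01, 0.05, 0.20, 0.40):
        print(f"baseline risk {p0:.2f}: RR = 2.00, OR = {or_for_fixed_rr(p0, 2.0):.2f}")
```

With a 1% baseline risk the OR is about 2.02, close to the RR, but with a 40% baseline risk the same RR of 2 corresponds to an OR of 6.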
Binomial regression has been recommended for the estimation of RRs (and PRs) in multivariate analysis [4]. However, this method sometimes cannot estimate the RR because convergence problems are frequent. Therefore, Cox regression with robust variance has been recommended as a suitable alternative for estimating RRs [5,6].
However, these statistical methods (binomial and Cox regression) are not widely available in free software (such as Epidat or EpiInfo). Therefore, the ability to estimate PRs and RRs in multivariate models could be limited in research groups with scant resources. In this article, a strategy for estimating RRs with ordinary logistic regression is proposed. This new method could be useful for identifying risk factors and estimating the impact of health interventions in developing countries.
Correspondence: frediazq@msn.com
Grupo Latinoamericano de Investigaciones Epidemiológicas, Organización Latinoamericana para el Fomento de la Investigación en Salud (OLFIS), Bucaramanga, Colombia
Diaz-Quijano BMC Medical Research Methodology 2012, 12:14
http://www.biomedcentral.com/1471-2288/12/14
© 2012 Diaz-Quijano; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods
Database
A database of 1000 observations with dichotomous variables was created to simulate a cohort study in which a common event (incidence of 50%) would be strongly related to two independent predictors (A and B). These predictors would also be statistically associated with one another, resulting in a moderate confounding effect. Then, a third independent variable with a prevalence of 40% was included (predictor C). This variable was randomly distributed, but occurred more often in the predictor-A-positive group than in the predictor-A-negative group. Thus, this variable was statistically associated with the outcome in a univariate analysis, but the association would be explained by the presence of predictor A in a multivariate model. Finally, additional dependent variables were generated by randomly selecting a proportion of cases. In this way, outcome variables with frequencies of 20% and 5% were obtained. Table 1 shows the hypothetical distribution of subjects according to the predictors and outcomes.
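The article does not report how the random selection of cases was implemented (the analysis itself was run in Stata). As an illustration only, the Python sketch below derives a rarer outcome from the common one by keeping a random subset of its cases and recoding all other subjects as non-cases. The function name, the fixed seed and the use of `random.sample` are assumptions, not the author's procedure.

```python
import random


def derive_rarer_outcome(outcome, n_cases, seed=2012):
    """Return a new 0/1 outcome in which only n_cases of the original cases
    (chosen at random) remain cases; every other subject is a non-case.

    outcome: list of 0/1 values for the common (e.g. 50%) outcome.
    n_cases: number of cases wanted in the rarer outcome.
    seed:    hypothetical fixed seed so the selection is reproducible.
    """
    case_idx = [i for i, y in enumerate(outcome) if y == 1]
    if n_cases > len(case_idx):
        raise ValueError("cannot keep more cases than exist")
    rng = random.Random(seed)
    kept = set(rng.sample(case_idx, n_cases))
    return [1 if i in kept else 0 for i in range(len(outcome))]


if __name__ == "__main__":
    # 1000 subjects with 500 cases, as for the high-incidence outcome in Table 1.
    common = [1] * 500 + [0] * 500
    intermediate = derive_rarer_outcome(common, 200)  # 20% incidence
    low = derive_rarer_outcome(common, 50)            # 5% incidence
    print(sum(common), sum(intermediate), sum(low))   # 500 200 50
```

Because cases are selected at random, the rarer outcomes keep, on average, the exposure distribution of the original cases; the exact counts in Table 1 depend on the particular random draw.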
Statistical analysis
Statistical analysis was performed using STATA software (STATA®/IC 11.0). RRs and 95% confidence intervals (CI) were estimated by applying log-binomial regression and Cox regression with a constant in the time variable [6]. In order to obtain corrected CIs by Cox regression, the robust variance option was applied [7]. ORs and their corresponding CIs were also estimated using ordinary logistic regression. After univariate estimates were calculated, ORs and RRs were obtained in multivariate models including all independent variables (predictors A, B and C).
Proposed modification to logistic regression analysis
The log-binomial model is similar to logistic regression in assuming a binomial distribution of the outcome. However, in a logistic regression the link function is the logarithm of the odds, which is the ratio between cases and non-cases, while in binomial regression the link function is the logarithm of the proportion, i.e., the ratio between cases and cases plus non-cases [4].
In a binomial regression model with k covariates, the function is written as:

$$\log\left(\frac{a}{a+b}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

where a is the number of cases, b is the number of non-cases, and $X_i$ are the covariates. Thus, a/(a + b) is the probability of success (e.g., the proportion of sick persons in a group), and the RR (or PR) estimated for a given covariate $X_i$ is $e^{\beta_i}$.

On the other hand, in a logistic regression model, the function is written as:

$$\log\left(\frac{a}{b}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

where a/b is the odds of success and the OR estimated for a given covariate $X_i$ is $e^{\beta_i}$.

In order for the case information to be included in the denominator of the estimates in a logistic regression, all observed cases were duplicated in a provisional database and identified as non-cases. Thus, a number of observations equal to the number of cases, and containing the same covariate information, was added. This new logistic function can be written as:

$$\log\left(\frac{a}{y}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

where y includes the non-cases as well as the duplicated cases (so that y = a + b), although all of them are identified as non-cases. Afterwards, a logistic regression procedure was performed on the modified dataset. The “ORs” obtained were considered direct estimates of RRs because $\beta_i$ defines the relationship between $X_i$ and log[a/y], which in this model corresponds to log[a/(a + b)] in the log-binomial model. For each outcome, a separate provisional database was prepared.

Table 1 Hypothetical distribution of subjects according to the predictors and outcome incidence

| Independent variable | High incidence (50%): cases (n = 500) | High incidence (50%): non-cases (n = 500) | Intermediate incidence (20%): cases (n = 200) | Intermediate incidence (20%): non-cases (n = 800) | Low incidence (5%): cases (n = 50) | Low incidence (5%): non-cases (n = 950) | Total (n = 1000) |
|---|---|---|---|---|---|---|---|
| Predictor A positive | 409 | 191 | 161 | 439 | 45 | 555 | 600 |
| Predictor A negative | 91 | 309 | 39 | 361 | 5 | 395 | 400 |
| Predictor B positive | 398 | 102 | 159 | 341 | 36 | 464 | 500 |
| Predictor B negative | 102 | 398 | 41 | 459 | 14 | 486 | 500 |
| Predictor C positive | 227 | 173 | 84 | 316 | 23 | 377 | 400 |
| Predictor C negative | 273 | 327 | 116 | 484 | 27 | 573 | 600 |

This strategy for logistic regression treats the entire cohort as the control group. The device is novel but analogous to the analysis of case-cohort studies. In that design, cases of a particular outcome are compared with a sample (subcohort) of the entire cohort that gave rise to all cases [8]. The objective of selecting this subcohort is to estimate the frequency of exposure in the entire cohort. For this reason, such studies have also been called case-exposure studies [9].
This subcohort may include some cases, which would consequently be overrepresented in the analysis. Then, by comparing the frequency of exposure between the cases and the subcohort, we obtain a direct estimate of the RR (not the OR) [9-11]. Similarly, in the method proposed here, the cases are compared against the entire cohort, and thus all cases are overrepresented. This affects the variance of the estimates, and for this reason the CIs are wider [11]. Therefore, an inflation factor for the standard error (SE) of each predictor and outcome incidence was calculated as the ratio between the SE obtained with the proposed method and the SE resulting from binomial regression (the reference method).
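The procedure above can be sketched in Python for a single binary predictor, using the Predictor A counts for the 50% outcome from Table 1. This is not the author's Stata code: the Newton-Raphson fitter, the helper names and the Wald 95% CI (estimate ± 1.96 SE on the log scale) are assumptions made for illustration. With one binary covariate the log-binomial SE of log RR has the closed form sqrt(1/a1 - 1/n1 + 1/a0 - 1/n0), which the sketch uses as the reference SE for the inflation factor.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Ordinary logistic regression by Newton-Raphson.

    Returns (coefficients, model-based SEs); index 0 is the intercept.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for r, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * r[j]
                for k in range(p):
                    info[j][k] += w * r[j] * r[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Variances are the diagonal of the inverse information matrix.
    se = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se


def expand(counts):
    """Turn {(exposed, outcome): n} cell counts into individual-level rows."""
    X, y = [], []
    for (exposed, outcome), n in counts.items():
        X.extend([[exposed]] * n)
        y.extend([outcome] * n)
    return X, y


def duplicate_cases_as_noncases(X, y):
    """Provisional database: append a copy of every case, recoded as a non-case."""
    extra = [x for x, yi in zip(X, y) if yi == 1]
    return X + extra, y + [0] * len(extra)


def wald_ci(log_estimate, se, z=1.96):
    """Wald 95% interval, back-transformed to the ratio scale."""
    return math.exp(log_estimate - z * se), math.exp(log_estimate + z * se)


# Predictor A and the 50% outcome; cell counts taken from Table 1.
COUNTS = {(1, 1): 409, (1, 0): 191, (0, 1): 91, (0, 0): 309}
X, y = expand(COUNTS)
beta_or, se_or = fit_logistic(X, y)                                # ordinary model
beta_rr, se_rr = fit_logistic(*duplicate_cases_as_noncases(X, y))  # modified model

# Reference SE of log RR (log-binomial, one binary covariate): a = cases, n = subjects.
a1, n1, a0, n0 = 409, 600, 91, 400
se_binomial = math.sqrt(1 / a1 - 1 / n1 + 1 / a0 - 1 / n0)

lo, hi = wald_ci(beta_rr[1], se_rr[1])
print(f"ordinary logistic OR = {math.exp(beta_or[1]):.2f}")
print(f"modified logistic RR = {math.exp(beta_rr[1]):.2f} ({lo:.2f}-{hi:.2f})")
print(f"SE inflation factor = {se_rr[1] / se_binomial:.2f}")
```

On these counts the sketch should print an OR of about 7.27 and an RR of about 3.00 (2.31-3.89), matching the unadjusted Predictor A row of Table 4, with an SE inflation factor of about 1.38. With a single binary covariate the modified “OR” equals the crude RR exactly, because the duplicated cases make each group's denominator equal to its total size.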
Results
For the rarest event (incidence of 5%), RRs estimated by log-binomial regression were similar to those calculated both by Cox regression and by the proposed method (modified logistic regression) (Table 2). There were few differences among the CIs of the RRs, although CIs from the modified method were wider than those estimated by log-binomial regression and by Cox regression with the robust variance option. ORs estimated by ordinary logistic regression were close to the RR values. Predictors A and B were statistically associated with the outcome in univariate analysis, but only A was independently associated in the multivariate model (Table 2).
For the second and third outcomes, with incidences of 20% and 50% respectively, the differences between RRs from log-binomial regression and ORs from ordinary logistic regression were more evident (Tables 3 and 4). This was especially remarkable for the most common event, where the ORs of predictors A and B were at least twice the corresponding RR values (Table 4).
On the other hand, RRs estimated by Cox regression and by modified logistic regression were similar or virtually identical to those estimated by log-binomial regression. However, the CIs produced by the proposed method were wider than those obtained with the other models (Tables 3 and 4). Accordingly, the SE inflation factor for each predictor rose as the outcome frequency increased (Figure 1).
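Because every predictor in Table 1 is binary, the unadjusted estimates in Tables 2 to 4 can be checked directly from the 2x2 counts. The Python sketch below does this with the closed-form crude RR and OR. It cannot reproduce the adjusted estimates, which depend on the joint distribution of the predictors that Table 1 does not report; the dictionary layout is an assumption made for illustration.

```python
# Cell counts from Table 1, per predictor and outcome incidence:
# (cases among positives, cases among negatives,
#  non-cases among positives, non-cases among negatives).
TABLE1 = {
    "A": {"50%": (409, 91, 191, 309), "20%": (161, 39, 439, 361), "5%": (45, 5, 555, 395)},
    "B": {"50%": (398, 102, 102, 398), "20%": (159, 41, 341, 459), "5%": (36, 14, 464, 486)},
    "C": {"50%": (227, 273, 173, 327), "20%": (84, 116, 316, 484), "5%": (23, 27, 377, 573)},
}


def crude_rr_or(a1, a0, b1, b0):
    """Crude risk ratio and odds ratio from a 2x2 table.

    a1, a0: cases among exposed and unexposed; b1, b0: non-cases.
    """
    rr = (a1 / (a1 + b1)) / (a0 / (a0 + b0))
    odds_ratio = (a1 * b0) / (a0 * b1)
    return rr, odds_ratio


for predictor, by_incidence in TABLE1.items():
    for incidence, cells in by_incidence.items():
        rr, odds_ratio = crude_rr_or(*cells)
        print(f"Predictor {predictor}, incidence {incidence}: "
              f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

Each printed value agrees, to two decimals, with the unadjusted log-binomial RR and ordinary logistic OR in Tables 2 to 4; for example, Predictor B at 50% incidence gives an RR of 3.90 but an OR of 15.23.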
Discussion
The use of an adjusted odds ratio to estimate an adjusted relative risk or prevalence ratio is appropriate for studies of rare outcomes but may be misleading when the outcome is common. Such overestimation may inappropriately affect clinical decision-making or policy development [3]. For example, overestimating the importance of a risk factor may lead to unintentional errors in the economic analysis of potential intervention programs or treatments, which could be particularly harmful in developing countries.
The ordinary logistic model estimates the OR (not the RR) and was initially adapted for case-control studies, since data from this study design can only determine the OR [12]. Moreover, a case-control study is an optimal choice for analyzing risk factors for rare events, for which the OR is a close approximation of the RR. Thus, ordinary logistic regression is eminently useful for case-control studies, mainly because the numeric value of the OR mimics the RR [12].
On the other hand, RR and PR can be directly determined from cohort and cross-sectional data, respectively, and these designs are practical only for relatively common outcomes. However, in such circumstances the OR estimated by ordinary logistic regression will diverge more from the RR (or PR). This was shown in the results of this paper, in which ORs progressively overestimated RRs as the outcome frequency increased. Indeed, the OR is always greater than the RR when the RR is greater than 1 (adverse effect) and always less than the RR when the RR is less than 1 (protective effect). Therefore, the uncritical application of logistic regression and the misinterpretation of the OR as an RR can lead to serious errors in determining both the importance of risk factors and the impact of interventions on clinical practice and public health [13].

Table 2 RRs and ORs and corresponding CIs of associations between a rare event (incidence = 5%) and three independent variables, estimated by log-binomial regression, ordinary logistic regression, Cox regression with robust variance and logistic regression with the proposed modification

| Independent variable | Log-binomial regression: RR (CI) | Logistic regression: OR (CI) | Cox regression, robust: RR (CI) | Modified logistic regression: RR (CI) |
|---|---|---|---|---|
| Predictor A, unadjusted | 6 (2.4-14.98) | 6.41 (2.52-16.28) | 6 (2.4-14.99) | 6 (2.36-15.25) |
| Predictor A, adjusted * | 4.96 (1.89-12.98) | 5.26 (1.97-14.06) | 4.97 (1.91-12.92) | 4.99 (1.86-13.34) |
| Predictor B, unadjusted | 2.57 (1.4-4.71) | 2.69 (1.43-5.06) | 2.57 (1.4-4.71) | 2.57 (1.37-4.83) |
| Predictor B, adjusted * | 1.59 (0.85-2.97) | 1.64 (0.85-3.18) | 1.59 (0.84-3.01) | 1.59 (0.82-3.09) |
| Predictor C, unadjusted | 1.28 (0.74-2.2) | 1.29 (0.73-2.29) | 1.28 (0.74-2.2) | 1.28 (0.72-2.26) |
| Predictor C, adjusted * | 0.98 (0.57-1.69) | 0.97 (0.54-1.74) | 0.97 (0.57-1.65) | 0.96 (0.54-1.72) |

* Adjusted for the other independent variables.
For these reasons, several strategies for estimating RRs in multivariate analysis have been proposed [7,14-16]. Binomial regression is considered the most appropriate choice. However, binomial models often predict probabilities greater than one, and sometimes this regression cannot find admissible values and fails to converge. Consequently, alternative methods have been proposed for when binomial regression does not converge. Cox regression with robust variance, using a constant in the time variable, seems to be a good alternative [7]. However, these options and other statistical alternatives are available only in sophisticated software that some research groups cannot afford.
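A small numerical illustration of why the log link can leave the admissible range: under a log-binomial model the predicted risk is the exponential of the linear predictor, so multiplying a baseline risk by several RRs can exceed 1, whereas passing a linear predictor through the inverse logit always gives a value between 0 and 1. The coefficient values below are hypothetical and are not taken from this study.

```python
import math


def risk_log_link(linear_predictor):
    """Predicted risk under a log link (log-binomial model): exp(eta)."""
    return math.exp(linear_predictor)


def risk_logit_link(linear_predictor):
    """Predicted risk under a logit link (logistic model): 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical log-binomial coefficients: baseline risk 0.3 and RR = 2
# for each of two exposures.
b0, b1, b2 = math.log(0.3), math.log(2.0), math.log(2.0)
eta = b0 + b1 + b2            # linear predictor for a subject with both exposures
print(risk_log_link(eta))     # about 1.2: not a valid probability
print(risk_logit_link(eta))   # strictly between 0 and 1
```

During fitting, iterations that pass through such parameter values are one reason log-binomial estimation can fail to converge.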
This paper presents a strategy for logistic regression that treats an entire cohort as controls. As the results show, this method can appropriately estimate RRs or PRs, even in analyses with common outcomes. Moreover, the method proposed in this article could easily be performed using free statistical programs that include only logistic regression for multivariate analysis of dichotomous outcomes.
However, the proposed method is associated with SE inflation, which widens the confidence intervals. A simple and practical correction factor cannot be established for this problem because, in a multivariate regression, the standard error for each predictor depends on its correlation with all variables included in the model.

Table 3 RRs and ORs and corresponding CIs of associations between an intermediate-frequency event (incidence = 20%) and three independent variables, estimated by log-binomial regression, ordinary logistic regression, Cox regression with robust variance and logistic regression with the proposed modification

| Independent variable | Log-binomial regression: RR (CI) | Logistic regression: OR (CI) | Cox regression, robust: RR (CI) | Modified logistic regression: RR (CI) |
|---|---|---|---|---|
| Predictor A, unadjusted | 2.75 (1.99-3.81) | 3.39 (2.33-4.95) | 2.75 (1.99-3.81) | 2.75 (1.9-3.99) |
| Predictor A, adjusted * | 1.79 (1.27-2.52) | 2.06 (1.36-3.12) | 1.77 (1.26-2.48) | 1.75 (1.16-2.64) |
| Predictor B, unadjusted | 3.88 (2.82-5.34) | 5.22 (3.6-7.56) | 3.88 (2.82-5.34) | 3.88 (2.69-5.59) |
| Predictor B, adjusted * | 3.15 (2.24-4.43) | 4.07 (2.75-6.03) | 3.15 (2.26-4.39) | 3.15 (2.13-4.65) |
| Predictor C, unadjusted | 1.09 (0.85-1.4) | 1.11 (0.81-1.52) | 1.09 (0.85-1.4) | 1.09 (0.8-1.48) |
| Predictor C, adjusted * | 0.92 (0.72-1.17) | 0.89 (0.63-1.25) | 0.92 (0.72-1.17) | 0.93 (0.67-1.28) |

* Adjusted for the other independent variables.

Table 4 RRs and ORs and corresponding CIs of associations between a common event (incidence = 50%) and three independent variables, estimated by log-binomial regression, ordinary logistic regression, Cox regression with robust variance and logistic regression with the proposed modification

| Independent variable | Log-binomial regression: RR (CI) | Logistic regression: OR (CI) | Cox regression, robust: RR (CI) | Modified logistic regression: RR (CI) |
|---|---|---|---|---|
| Predictor A, unadjusted | 3 (2.48-3.62) | 7.27 (5.44-9.72) | 3 (2.48-3.62) | 3 (2.31-3.89) |
| Predictor A, adjusted * | 1.9 (1.58-2.28) | 4.07 (2.88-5.74) | 1.89 (1.56-2.28) | 1.88 (1.41-2.51) |
| Predictor B, unadjusted | 3.9 (3.26-4.67) | 15.23 (11.19-20.71) | 3.9 (3.26-4.67) | 3.9 (3.04-5.01) |
| Predictor B, adjusted * | 3.08 (2.56-3.72) | 10.97 (7.95-15.14) | 3.09 (2.56-3.72) | 3.09 (2.36-4.04) |
| Predictor C, unadjusted | 1.25 (1.1-1.41) | 1.57 (1.22-2.03) | 1.25 (1.1-1.41) | 1.25 (1-1.55) |
| Predictor C, adjusted * | 1.02 (0.95-1.1) | 1.12 (0.8-1.57) | 1.05 (0.96-1.15) | 1.06 (0.84-1.34) |

* Adjusted for the other independent variables.
Therefore, since the CIs obtained can be wider than those estimated by other models, investigators must be aware that the risk of Type II error could be higher. For this reason, when an association is not statistically significant with the proposed method, ordinary logistic regression could be used to test the hypothesis that the association measure differs from unity. This is possible because the null hypothesis is mathematically equivalent for the OR and the RR: when the RR is equal to 1, the OR is also equal to 1.
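The difference in statistical power can be illustrated with the crude Predictor C counts for the 50% outcome in Table 1. The Python sketch below computes Wald z statistics for H0: RR = 1 from the modified logistic regression and for H0: OR = 1 from the ordinary logistic regression, using the closed-form standard errors that a logistic model with a single binary covariate yields. It is a sketch under that single-covariate assumption; adjusted analyses would require the fitted multivariate models.

```python
import math


def wald_z_and_p(log_estimate, se):
    """Wald z statistic and two-sided p-value for H0: log(ratio) = 0."""
    z = log_estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Predictor C, 50% outcome (Table 1): cases (a) and non-cases (b) by predictor status.
a1, b1 = 227, 173   # predictor positive
a0, b0 = 273, 327   # predictor negative
n1, n0 = a1 + b1, a0 + b0

# Modified logistic regression: duplicated cases make each "non-case" count equal to n.
log_rr = math.log((a1 / n1) / (a0 / n0))
se_modified = math.sqrt(1 / a1 + 1 / n1 + 1 / a0 + 1 / n0)

# Ordinary logistic regression.
log_or = math.log((a1 * b0) / (a0 * b1))
se_ordinary = math.sqrt(1 / a1 + 1 / b1 + 1 / a0 + 1 / b0)

z_mod, p_mod = wald_z_and_p(log_rr, se_modified)
z_ord, p_ord = wald_z_and_p(log_or, se_ordinary)
print(f"modified: RR = {math.exp(log_rr):.2f}, z = {z_mod:.2f}, p = {p_mod:.4f}")
print(f"ordinary: OR = {math.exp(log_or):.2f}, z = {z_ord:.2f}, p = {p_ord:.4f}")
```

Here the modified regression gives z of about 2.0 (its CI lower limit is close to 1, as in Table 4), whereas the ordinary logistic test of the same null hypothesis gives z of about 3.5 and a much smaller p-value.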
Conclusion
The proposed method may be useful for estimating RRs
or PRs appropriately in analysis of common outcomes.
However, because the resultant CIs are wider than those
derived from other methods, this strategy should be
employed when logistic regression is the only method
available. This new method may help research groups
from developing countries where access to sophisticated
programs is limited.
Abbreviations
CI: Confidence interval; OR: Odds ratio; PR: Prevalence ratio; RR: Relative risk;
SE: Standard Error
Authors’ contributions
FAD conceived the study, created the database, designed and executed the
analysis, and wrote the manuscript.
Competing interests
The author declares that they have no competing interests.
Received: 1 August 2011 Accepted: 15 February 2012
Published: 15 February 2012
Figure 1 Inflation factor of the standard error (SE) for each predictor according to outcome incidence.
References
1. McNutt LA, Wu C, Xue X, Hafner JP: Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 2003, 157:940-943.
2. Zhang J, Yu KF: What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 1998, 280:1690-1691.
3. Pearce N: Effect measure in prevalence studies. Environ Health Perspect 2004, 112:1047-1050.
4. Wacholder S: Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol 1986, 123:174-184.
5. Nijem K, Kristensen P, Al-Khatib A, Bjertness E: Application of different statistical methods to estimate risk for self-reported health complaints among shoe factory workers exposed to organic solvents and plastic compounds. Norsk Epidemiologi 2005, 15:111-116.
6. Lee J, Chia KS: Estimation of prevalence rate ratios for cross sectional data: an example in occupational epidemiology. Br J Ind Med 1993, 50:861-862.
7. Barros AJD, Hirakata VN: Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol 2003, 3:21.
8. Kulathinal S, Karvanen J, Saarela O, Kuulasmaa K: Case-cohort design in practice - experiences from the MORGAM Project. Epidemiol Perspect Innov 2007, 4:15.
9. Flanders WD: Limitations of the case-exposure study. Epidemiology 1990, 1:34-38.
10. Sato T: Estimation of a common risk ratio in stratified case-cohort studies. Stat Med 1992, 11:1599-1605.
11. Sato T: Risk ratio estimation in case-cohort studies. Environ Health Perspect 1994, 102(Suppl 8):53-56.
12. Lee J, Tan CS, Chia KS: A practical guide for multivariate analysis of dichotomous outcomes. Ann Acad Med Singapore 2009, 38:714-719.
13. Schwartz LM, Woloshin S, Welch HG: Misunderstandings about the effects of race and sex on physicians' referrals for cardiac catheterization. N Engl J Med 1999, 341:279-283.
14. Localio AR, Margolis DJ, Berlin JA: Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007, 60:874-882.
15. Thompson ML, Myers JE, Kriebel D: Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Occup Environ Med 1998, 55:272-277.
16. Coutinho LM, Scazufca M, Menezes PR: Methods for estimating prevalence ratios in cross-sectional studies. Rev Saude Publica 2008, 42:992-998.
Pre-publication history
The pre-publication history for this paper can be accessed here:
http://www.biomedcentral.com/1471-2288/12/14/prepub
doi:10.1186/1471-2288-12-14
Cite this article as: Diaz-Quijano: A simple method for estimating relative risk using logistic regression. BMC Medical Research Methodology 2012, 12:14.