Afe Babalola University

Question

Asked 31 January 2015

# Is it weird to get a very big odds ratio in logistic regression?

I estimated a logit model using the enter method, and one of the odds ratios is 3962.988 with sig. 0.000.

Another model, estimated using forward stepwise (likelihood ratio), produced an odds ratio of 274.744 with sig. 0.000.

Total N is 180, with 37 missing. The model fits based on the Omnibus and Hosmer & Lemeshow tests, and -2LL is OK.

Is there anything wrong? Please assist.

## Most recent answer

These answers are very helpful. I just had a similar problem with my results.

## Popular answers (1)

National University of Singapore

Yes, a large odds ratio is an indication that you need to check your data input for:

1. Outliers.

2. The amount of missing values, and how the missing values are handled.

3. The metric used for the analysis, which may need to be changed, for example from 'cents' to 'dollars'.

4. The way in which the data were coded, which may be incorrect and may need to be changed.

5. The standard deviation: if it is too large, you may need to segment the data and perform your analysis on the segments.

The above five suggestions are examples of how to manage large odds ratio results and make them more meaningful.

I hope this helps!

Kind Regards

Carol

29 Recommendations

## All Answers (46)

Sri Venkateshwaraa Medical College Hospital and Research Centre

Hi,

The odds ratio should not come out as a very large value. Kindly check your analysis and confirm whether the odds ratio lies within its 95% confidence interval.

Out of 180 observations, 37 have missing values. Kindly reduce the missing data; that may be why you got this huge odds ratio. Kindly run the analysis again.

All the best.

Universiti Teknologi MARA

Hi Senthilvel Vasudevan and Kelvyn Jones, thank you for the reply.

When I ran the analysis, I did not remove the outliers in the first place, and I also did not use the studentized residuals (I use SPSS) to auto-remove them. I kept the outliers, but at the same time the logit assumptions were not violated.

I was thinking of re-running a second model and omitting the outliers based on the residuals of the first model, say residuals beyond -2.5 and +2.5.

FYI, the independent variables are financial ratios, so it is real data. No multicollinearity problems were identified, and the odds ratio lies within its CI.

Makerere University Lung Institute, Kampala Uganda

Indeed there is something wrong! This often happens when you have very few observations in one of the comparison groups. The 95% CI is then often huge as well, a sign of low precision and hence very weak power to detect a difference. Clean up your data with respect to the 37 missing values, or use exact tests such as Fisher's exact. The problem with logistic regression is that it assumes a parametric distribution, which is unstable for small numbers.

Hope this is helpful..
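As a sketch of the exact-test suggestion above (assuming Python with SciPy, which is not mentioned in the thread; the table counts here are hypothetical), Fisher's exact test can be run directly on the 2x2 table:

```python
from scipy.stats import fisher_exact

# Hypothetical sparse 2x2 table: one comparison group has only 1 event,
# the kind of imbalance that makes logistic-regression odds ratios explode.
#            outcome yes   outcome no
table = [[20,            1],      # exposed
         [10,           15]]      # unexposed

odds_ratio, p_value = fisher_exact(table)
print(odds_ratio)  # sample odds ratio: (20*15)/(1*10) = 30.0
print(p_value)     # exact p-value, no large-sample approximation needed
```

Because the p-value is computed from the hypergeometric distribution rather than a normal approximation, it stays valid even when a cell count is as small as 1.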


Oakland University

How many predictor variables do you have? I saw something like this when I ran a model in which several variables were highly correlated with each other. Check the VIF/tolerance for each predictor variable if you have more than one.

1 Recommendation

Universiti Teknologi MARA

Hi Levi Mugenyi, Carol Hargreaves and Andrew Ekstrom, thank you for your replies.

I re-ran the analysis after omitting the outliers and missing values. However, the odds ratio is still very large for one variable (TLTA), and is in fact getting larger. It seems that only TLTA produces a weird odds ratio, while the rest produce acceptable (I guess) odds ratios between 9 and 18.

I have 10 predictors, all in ratio form (financial data). VIF is low; all are less than 3.0.

From a theoretical perspective and the past literature, TLTA has a very big influence in prediction.

Ironically, not many papers report their odds ratios. How important is it to discuss the odds ratios?

Thank you once again.

Harvard University

It is possible that the outcome is a virtual certainty (in your data, at least) above a certain value of the predictor. Consider categorizing the predictor (at least to explore) or using exact methods, as suggested above.

1 Recommendation

Florida International University

I agree with Andrew and Carol. Outliers and variables that are highly correlated with each other are the primary reasons for an inflated odds ratio.

Florida International University

Are your predictors highly correlated with each other? This is collinearity. I would suggest you read about which financial variables should not be in the same model. Usually you have one financial variable and the covariates are demographic.

Universiti Teknologi MARA

Hi Joan Vaccaro,

Thanks for the reply. The highest correlation between variables is only 0.5, which is less than 0.8 (is 0.8 a good benchmark?), while all of the VIFs are less than 3.0. Obviously this is not a multicollinearity problem, isn't it? However, using factor analysis, three variables fall under the same category (and these are among those that correlate most highly, i.e. at 0.5). Is this the reason?

Please advise. Thanks.

Florida International University

I suggest separate models for those three variables. Would this fit your financial question?

1 Recommendation

Universiti Teknologi MARA

Dear Joan Vaccaro,

Yes, I tried it but need to adjust some research questions, maybe. Good suggestion. Thanks !

University of Bristol

The underlying problem is that you do not have sufficient information to fit the models you are attempting. You are in effect modelling a cross-tabulation with the outcome in two cells (yes/no) and three predictors of at least two cells each, so the full cross-tabulation is 2 by 2 by 2 by 2, that is 16 cells. [I know the data do not actually look like this.] Even if the data were completely balanced, with no collinearity between the X's and 140 non-missing observations, you would have fewer than 10 observations in each cell, and that is not very much. You are then using an automatic procedure that could (will!) capitalize on chance results, and the estimates will be unreliable. So frankly, you should have no faith in what you have found, and that is what the implausible values are telling you.

22 Recommendations
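The cell-count arithmetic above can be sketched as a back-of-the-envelope check (using the thread's N of 180 with 37 missing):

```python
# Back-of-the-envelope information check for the thread's data.
n_usable = 180 - 37                     # non-missing observations
n_binary_predictors = 3                 # predictors, each with at least 2 levels
cells = 2 ** (n_binary_predictors + 1)  # outcome (yes/no) x 3 binary predictors
per_cell = n_usable / cells

print(cells)     # 16 cells in the full cross-tabulation
print(per_cell)  # under 10 observations per cell even if perfectly balanced
```

With more predictors, or predictors with more levels, the cell count grows exponentially while the data do not, which is why the estimates become unstable so quickly.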

Jouf University

It might be that a predictor variable is present in all of your outcome cases. For example, suppose your aim is to determine risk factors for death in a village, and you test poverty as a factor. If poverty is present in all of the death cases, then you will get a very high odds ratio in the logit model.

1 Recommendation

University of Chile

A very big odds ratio should always lead to a few questions whose answers lie in the design and the data:

Is it no more than a mathematical artefact?

Does it make biological sense?

Is the sample size sufficient to exclude alpha error?

Finally, is it reproducible? It may be mere serendipity.

My view is that no matter which biostatistical method you used, a very large OR has to be regarded with extreme caution.

Best regards.

3 Recommendations

University of North Bengal

I got an odds ratio of 18.14 (95% CI 2.04-27.24), Hosmer-Lemeshow = 19.49, p = 0.01; Nagelkerke = 0.41, which is significant at p < 0.001.

N = 992

Missing = 0

I used the enter method with crude logistic regression.

Please comment on it.

Yes, those odds ratios are implausible, and I think the main reason is that one of the outcome categories is very small (probably the number of cases, the yes's), making this a rare-events outcome. Conventional logistic regression may not be a good option for such data, and there are a number of alternatives, including exact estimation, Bayesian logistic regression, and penalized estimation, also called the Firth method. The exact and Firth methods are both implemented in SAS, but the exact method requires a lot of memory to execute. Try these methods; I can guarantee that your odds ratios will be much better.

4 Recommendations
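The rare-events failure mode described above, and the effect of penalization, can be sketched with a toy example (assuming Python with scikit-learn, which is not used in the thread; this uses ridge shrinkage as a stand-in for the Firth correction, which SAS and R's logistf implement):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data with perfect separation: x fully determines y, so the
# unpenalized maximum-likelihood odds ratio diverges to infinity.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = (x[:, 0] > 0).astype(int)

# Effectively unpenalized fit (huge C): the slope runs off toward infinity
# until the solver stops, giving an absurdly large odds ratio.
unpen = LogisticRegression(C=1e6, max_iter=10_000).fit(x, y)

# Penalized (ridge) fit: shrinkage keeps the estimate finite and stable.
pen = LogisticRegression(C=1.0, max_iter=10_000).fit(x, y)

print(unpen.coef_[0, 0])  # very large log-odds slope; exp() of it is absurd
print(pen.coef_[0, 0])    # modest slope; exp() gives a usable odds ratio
```

The same qualitative behaviour appears with near-separation, i.e. when one outcome category is very small, which is exactly the situation the answer describes.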

Maastricht University

Hi all,

Following this question, I would like to ask about plausible values for an odds ratio. I got an estimate of 8 with a 95% CI of (5, 13.4). Does it seem usual?

Tran.

1 Recommendation

University of Bristol

It depends on the context; is it plausible to you? It is not a thing that can be answered technically. Technically it is a large effect with wide confidence intervals, but there does look to be an effect (the lower CI bound is 5). Does it make sense to you and to the readers of the papers you are trying to publish? Can you justify it?

3 Recommendations

Lakehead University Thunder Bay Campus

Roslina, if you are still following this thread, here are a couple of articles that I think you'll find helpful.

2 Recommendations

University of North Bengal

I have solved a similar problem by correcting the reference category I used.

Other checks are:

going through the data sets once again;

running the MLR after adding and removing some variables.

1 Recommendation

Shanghai Municipal Center for Disease Control and Prevention

@Tauqeer Hussain Mallhi, I have met the same situation you mentioned. How can I solve it? I intend to exclude the factor from the logistic regression. Is that acceptable? Any reference paper? Thanks!

Sunway University

Yes, getting a very high OR is an unusual situation and suggests some mistakes in handling the dataset. You need to revisit and reanalyze your data.

4 Recommendations

Florida International University

Feng, presenting selected data is not acceptable. If this factor was critical to your hypothesis as a potential confounder, it should not be removed. If the variable is not well constructed, then it has to be removed and a limitation stated in your publication.

The Leprosy Mission International - Bangladesh

I got an OR of 46.621, 95% CI (22.279, 97.56). My sample is 410 without any missing data. Is it okay?

University of Niš

Rishad, did you solve the problem with the very high OR? I now have the same situation.

2 Recommendations

Federal Medical Centre, Asaba, Delta, Nigeria

I have a similar problem. So how does one code the data to eliminate such a very large OR?

University of North Bengal

My other observation is that such a high odds value could be due to not having enough cases in the category identified.

Re-checking the dataset will be helpful.

Re-categorization is also helpful.

Also try a different statistical approach; for example, I was trying adjusted logistic regression and later changed to crude logistic regression.

Oklahoma State University - Center for Health Sciences

If your independent variable is continuous, consider the scale and range of its values. Imagine that age is the variable and the OR is 1.5: the odds of the outcome are multiplied by 1.5 for every 1-unit (year) increase in age. Now imagine that the independent variable is a serum cytokine concentration with a range of 0.25-1.20 nM and an OR of 15. That OR means that for every 1-unit (nM) increase in the cytokine, the odds of the outcome are multiplied by 15. But here, an increase of 1 nM may be nearly out of the realm of possibility, so the OR is not very interpretable. In this case, you can change the scale of the independent variable by multiplying it by 10. The coefficient is divided by 10, so the OR per unit of the rescaled variable becomes 15^(1/10) ≈ 1.31: for every 0.1 nM increase in the cytokine, the odds are multiplied by about 1.31. Of course, you should still closely examine your data, check your assumptions, and test the effects of outliers.

5 Recommendations

University of Bristol

If the intercept (the parameter associated with the constant) is high, the outcome for that sort of person is very common; if it is very low, the outcome is rare. For interpretation, it depends on 'who' the constant is, that is, the case when all the Xs are zero, which may be outside the range of the data and not a very meaningful value, especially if there are continuous Xs in the model. You need to think carefully about who your base category is, against which the relative odds are in effect being compared. I routinely centre continuous variables around their average, so that the constant represents a typical person.

You may want to look at

And when you exponentiate to get the relative odds, you only do it for the regression betas, not the intercept, which is treated as the base. So from an odds perspective you can ignore the intercept, but from a logit and probability perspective you need to take account of the estimated intercept.

3 Recommendations

Ahmadu Bello University

If the regression constant is high, what of the metrics? That "constant" is the intercept: the value of the response variable (on the logit scale) when all the independent variables are set to zero.

Now, if the odds associated with a predictor X on the likelihood that one outcome will occur are large, it indicates the outcome is very strongly likely. But an odds ratio much greater than 1000, with a p-value of 0.000000, is very unrealistic even if significant. The coding may be misleading or wrong, the data dispersed, or the data-entry process may have introduced systematic errors into the sample and the metrics. Perhaps the two odds components were not on the same unit of measurement; and if it is such a rare occurrence or near-sure event, changing "cents" to "dollars" or "milligrams" to "grams" would reduce the spurious results.

1 Recommendation

Ahmadu Bello University

Roslina Shafi, another angle to this is the entry method, and hence the number of variables needed. A stepwise entry method (instead of entering all variables at once, as the options in SPSS allow) could be used as an alternative, and the performance of the reduced model compared.

"It has been suggested that the data should contain at least ten events for each variable entered into a logistic regression model."

Please note, I stand to be corrected, and my observation may not reflect the exact nature of the experiment (observation), the dataset, or its structure.

2 Recommendations

University of Dhaka

I got an OR with a very high value, and in place of the CI I got only 0 when I ran a logistic regression. So can I say that my OR is significant?

1 Recommendation

University of Bristol

Samia Ashrafi The results appear to be very untrustworthy, although I do not fully understand them; there should be two values for a CI, an upper and a lower. I suspect there has been a computational error due to huge uncertainty in the results. I would not claim anything!

3 Recommendations

CMH Lahore Medical College & Institute of Dentistry

Dear Kelvyn Jones, can you help me understand my logistic regression results? The file is attached. The Hosmer-Lemeshow results are significant.

## Similar questions and discussions

## Related Publications

Urban transport demand: logit models for identifying the dominant factors (by Francesca Condino) - ABSTRACT: This work focuses on the possibility of applying stochastic models, and more specifically logistic regression, to urban transport demand. With regard to the structure of the contri...

Ordinal logistic regression models are classified as either proportional odds models, continuation ratio models or adjacent category models. The common model assumption of these models is that the log odds do not depend on the outcome category. This assumption is also known as the "proportionality" or "parallel logits" assumption. Nonproportional a...

Chapter 5 focuses on the logistic regression model. It pays special attention to interpreting parameters. It discusses methods of inference for the parameters, and discusses extensions with multiple predictors, some of which may be categorical. The chapter closes with details about model fitting.