Questions related to Logistic Regression
I have retrieved a study that reports a logistic regression; the OR for the dichotomous outcome is 1.4 for the continuous variable ln(troponin). This means the odds multiply by 1.4 (a 40% increase) for every e-fold (≈2.72×) increase in troponin; but is there any way of calculating the OR for a 1-unit increase in the troponin variable?
I want to meta-analyze many logistic regressions, for which I need them to be in the same format (i.e., some use the variable ln(troponin) and others troponin). (No individual patient data is available.)
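For what it's worth: an OR for a 1-unit increase in raw troponin cannot be recovered from an OR per unit of ln(troponin), because on the raw scale the effect of adding one unit depends on the starting value. What can be computed are ORs per fold-change, which may also help put the studies on a common footing. A minimal sketch of the algebra (the 1.4 is the reported OR; everything else is arithmetic):

```python
import math

or_per_ln_unit = 1.4  # reported OR for a 1-unit increase in ln(troponin)

# A k-fold change in troponin corresponds to ln(k) units on the ln scale,
# so the OR for a k-fold change is or_per_ln_unit ** ln(k).
or_per_doubling = or_per_ln_unit ** math.log(2)   # OR per doubling of troponin
or_per_tenfold = or_per_ln_unit ** math.log(10)   # OR per 10-fold increase
print(round(or_per_doubling, 3), round(or_per_tenfold, 3))
```

The same identity works in reverse, so a study reporting an OR per doubling can be converted back to an OR per ln-unit by raising it to the power 1/ln(2).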
When conducting a logistic regression analysis in SPSS, a default threshold of 0.5 is used for the classification table. Consequently, individuals with a predicted probability < 0.5 are assigned to Group "0", while those with a predicted probability > 0.5 are assigned to Group "1". However, this threshold may not be the one that maximizes sensitivity and specificity. In other words, adjusting the threshold could potentially increase the overall accuracy of the model.
To explore this, I generated a ROC curve, which provides both the curve itself and the coordinates. I can choose a specific point on this curve.
My question now is, how do I translate from this ROC curve or its coordinates to the probability that I need to specify as the classification cutoff in SPSS (default: 0.50)? The value must naturally fall between 0 and 1.
- Do I simply need to select an X-value from the coordinate table where I have the best sensitivity/specificity and plug it into the formula for P(Y=1)?
- What do I do when I have more than one predictor (X) variable? Choose the best point/coordinate for both predictors separately and plug in the values into the equation for P(Y=1) and calculate the new cutoff value?
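For what it's worth, the classification cutoff lives on the predicted-probability scale, and the predicted probability already combines all predictors, so there is a single cutoff no matter how many X variables the model has; you do not pick a point per predictor. A hedged sketch (toy data, not SPSS output) of choosing the cutoff that maximizes Youden's J = sensitivity + specificity - 1:

```python
# Hypothetical predicted probabilities and true labels
probs = [0.1, 0.3, 0.35, 0.6, 0.7, 0.8, 0.2, 0.9]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

def youden_best_cutoff(probs, labels):
    # Try each observed probability as a candidate cutoff and keep the one
    # maximizing Youden's J (sensitivity + specificity - 1).
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t

print(youden_best_cutoff(probs, labels))
```

The winning threshold is then the value you would enter as the classification cutoff in place of the default 0.50.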
I have the OR from a logistic regression that used the independent variable as continuous. I also have the ORs from 2x2 tables that dichotomized the variable (high if > 0.1, low if < 0.1).
Is there any way I can merge them for a meta-analysis? I.e., can the OR from the regression (the OR for a 1-unit increase) be converted to an OR for high vs. low?
Hello, I am looking for some guidance on how to calculate p-values for mixed-effects multinomial logistic regression using the npmlt() function from the mixcat package in R. I have fitted a model using this function and obtained the estimates and standard errors for each parameter. However, I am not sure how to derive the p-values from these values. I have tried computing the Z values by dividing the estimates by the standard errors, and then comparing them with the critical values from a standard normal distribution. Is this a valid method? I have not yet found any documentation or examples on how to calculate p-values with npmlt() in R. I would appreciate any discussion or suggestions on this topic. Thank you very much.
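Dividing each estimate by its standard error and referring the result to a standard normal is the usual (asymptotic) Wald test, so the approach described is a standard way to get p-values from estimates and SEs. A minimal sketch of the computation (the estimate and SE below are hypothetical, not npmlt() output):

```python
import math

def wald_p(estimate, se):
    # Two-sided p-value for a Wald z statistic (estimate / SE),
    # using the standard normal distribution via erfc.
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

print(wald_p(0.8, 0.3))  # hypothetical estimate and standard error
```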
Please, I need to perform a logistic regression analysis using 2 independent variables, each of which has multiple indicators, using SPSS. For example, the independent variable perceived behavioral control (PBC) is measured using two indicators, self-efficacy and easy-to-start (each is binary). The other independent variable is subjective norm, measured by 2 indicators (respect and motivation), each of which is also binary.
My question is: how to deal with the multiple indicators for one independent variable when performing the analysis?
In case I want the output to appear as in the attached table, which includes only the independent variables (not each indicator individually), I assume that I need to compute each variable by summing its indicators, but I am not sure if this is correct. So I need the assistance of experts.
I hope that I am able to communicate my inquiry properly.
I estimated a mixed-effect logistic regression model (GLMM) and I need to evaluate it. Specifically, I tried a few combinations of the independent variables in the model and I need to compare between them.
I know that for a regular logistic regression model (GLM), the Nagelkerke R-squared fits (a pseudo R-squared measure). But does it fit also for a mixed-effect model? If not, what is the correct way for evaluating a mixed-effect logistic regression model?
I want to know whether I should include the non-significant categorical variables in a multinomial logistic regression.
I am using SPSS to perform binary logistic regression. One of the parameters generated is the prediction probability. Is there a simple mathematical formula that could be used to calculate it manually? e.g. based on the B values generated for each variable in model?
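For reference, yes: the predicted probability is the inverse logit of the linear combination of the B values (including the constant), so it can be computed by hand. A sketch with hypothetical coefficients:

```python
import math

def predicted_probability(intercept, coefs, values):
    # Logistic model: p = 1 / (1 + exp(-(B0 + B1*x1 + ... + Bk*xk)))
    linear = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical B values and covariate values, for illustration only
print(round(predicted_probability(-1.5, [0.8, 0.3], [2, 1]), 3))
```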
I have measured Irisin levels in plasma and now I'm trying to analyze the results. As far as I have read, I need to perform a 4 parameter logistic regression, should I use logarithm for absorbances and for concentrations?
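For what it's worth, the 4PL curve is normally fit with concentration in its original units (concentration is usually only plotted on a log axis), so log-transforming absorbances or concentrations is a modelling choice rather than a requirement of the model. A sketch of the curve itself (parameter values hypothetical):

```python
def four_pl(x, a, b, c, d):
    # 4-parameter logistic: a = response at zero concentration,
    # d = response at infinite concentration, c = inflection point (EC50),
    # b = slope factor.
    return d + (a - d) / (1.0 + (x / c) ** b)

# At x = c the response is exactly halfway between a and d
print(four_pl(1.0, 2.0, 1.0, 1.0, 0.0))
```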
Hi everybody! :)
I wish to investigate a possible prediction between two variables I'm currently studying, and wanted to use a logistic regression model.
Is it possible to run a multinomial logistic regression having just one nominal independent variable with four categories? My dependent variable is also nominal with four categories.
Many thanks for your attention :)
Can someone suggest the best method for a meta-analysis of proportions where there is high heterogeneity? I am using a random-effects model to estimate the pooled proportion. I have done the pooled proportion and subgroup analysis with both logistic regression and the DerSimonian-Laird method. They yielded differing results. Which one should I take into consideration?
I am running a logistic regression, random forest, and support vector machine models to predict whether a loan will default. My data is highly imbalanced, which I have addressed using resampling techniques (Random Over Sampler). I am getting unusually high ROC-AUC (very close to 1) scores that do not match the results of similar studies. I have already addressed variables with high correlations. I am at a slight loss of what to try next, and was looking for some guidance.
Update: I removed some variables and there were significant improvements in the Logistic Regression and LinearSVC models. However, the Random Forest is still returning scores equal to 1 for all metrics (accuracy, precision, recall, AUC, and F1). Any suggestions to address the Random Forest?
I have performed a multinomial logistic regression model in SPSS. The goodness of fit shows that the model fits the data. Based on my literature review, several methods can be used to validate the model, but SPSS 23's Logistic Regression dialog doesn't show these options. Kindly inform me which particular method I can use to validate the model in SPSS.
I run a multinomial logistic regression. In the SPSS output, under the table "Parameter Estimates", there is a message "Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing." How should I deal with this problem? Thank you.
Seeking insights from the research community: Does the imbalance of textual genres within corpora, when used as an explanatory variable rather than the response variable, affect the performance of logistic regression and classification models? I'm interested in understanding how the unequal distribution of word counts across genres might introduce biases and influence the accuracy of these machine learning algorithms. Any explanations or relevant details on mitigating strategies are appreciated. Thank you!
Hi. I'm planning to conduct a multinomial logistic regression analysis for my predictive model (3 outcome categories). How can I estimate the sample size? I believe the EPV rule of thumb is suitable only for binary outcomes. Is there any formula/software that I can use?
I have conducted some ordinal logistic regressions, however, some of my tests have not met the proportional odds assumptions so I need to run multinomial regressions. What would I have to do to the ordinal DV to use it in this model? I'm doing this in SPSS by the way.
I am currently trying to perform a mediation analysis with a long panel data set, including control variables, in Stata.
Trying to do this, I found solutions for running a moderated mediation regression with and without control variables, and I also found ways to run a regression with panel data, but I did not find a way to combine the two.
Is there a way I can include mediator variables in my panel regression?
Does anyone have information, links, advice on how to approach this challenge?
I am running 6 separate binomial logistic regression models with all dependent variables having two categories, either 'no' which is coded as 0 and 'yes' coded as 1.
4 of the 6 models run fine; however, 2 of them produce this error message.
I am not sure what is going wrong, as each dependent variable has the same 2 values on the cases being processed, either 0 or 1.
Any suggestions what to do?
I have imputed my data using STATA's command "mi impute" (m=200) - only one variable had missing. STATA's "mi estimate" command was used when running the logistic regression model.
How would you interpret an OR close to 1 (or rounding to 1) when the p-value is < 0.05 (the 95% CI does not cross 1)? An example from the regression model for a continuous variable is: OR = 0.9965 with 95% CI 0.9938 to 0.9992, which Stata rounds to OR = 1.00 and 95% CI: 1.00; 1.00. Am I on to something in thinking that this is possible because the p-value is not perfectly tied to the magnitude of an effect? Thus, statistically the predictor has an effect on the outcome, but the effect is very small and may be of little clinical or other "practical" significance?
And do you have any suggestions on how to report such findings in a research paper?
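One common way to report such a finding is to rescale the predictor so the OR refers to a clinically meaningful increment rather than a single unit. A sketch (the per-unit OR is the 0.9965 from the question; the factor of 100 is an arbitrary illustration):

```python
or_per_unit = 0.9965                   # OR per 1-unit increase, from the model
or_per_100_units = or_per_unit ** 100  # OR per 100-unit increase
print(round(or_per_100_units, 3))
```

The same exponentiation applies to both CI limits, which usually makes clear that the effect, while statistically significant, is small per unit.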
I analyzed my survey data using binary logistic regression, and I am trying to assess the results by looking at the p-value, B, and Exp(B) values. However, the task is also to specify the significance of the marginal effects. How to interpret the results of binary logistic regression considering the significance of the marginal effects?
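For reference, for a continuous predictor in a logit model, the marginal effect on the probability at a given point is B * p * (1 - p), so its size (unlike Exp(B)) depends on where on the curve you evaluate it; its significance is then judged with the standard errors your software reports for it. A sketch of the point formula with hypothetical values:

```python
import math

def logistic_p(linear):
    # Inverse logit
    return 1.0 / (1.0 + math.exp(-linear))

def marginal_effect(beta, p):
    # For a continuous predictor in a logit model: dP/dx = beta * p * (1 - p)
    return beta * p * (1.0 - p)

# Hypothetical coefficient, evaluated at a predicted probability of 0.5
print(marginal_effect(0.5, logistic_p(0.0)))
```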
This is what I know so please let me know if I miss anything:
1. We do bivariate analyses and collect all the p-values
2. Choose the variables with p-values < 0.25
3. Run a logistic regression with all the variables whose p-value is < 0.25
4. Eliminate the variables one by one, starting with the highest p-value in the new model
5. But when do I stop eliminating them?
And to get the adjusted odds ratios:
1. Enter all the variables into one model
2. Run a logistic regression analysis and look at Exp(B)
3. Compare them with the odds ratios from the previous bivariate analysis
p.s: I use complex sample models
I found this discussion very helpful for the analysis I need to conduct for my thesis:
I am not sure if I understand it right, that the multiple trials that are mentioned in the discussion are done within the same sample? (but I think so) In my case I have a control and a treatment group, and each respondent goes through 4 questions where he/she has to choose between train and plane. I now want to analyse the difference between control and treatment group over all four choices combined. So I think I could use the approach with a weighted logistic regression model and the dependent variable "proportion_train" (which represents how often the train was chosen out of the four choices).
I am also not sure about the interpretation of the coefficients of this model then. I know these are most likely the log-odds-ratios (or equivalently differences in log-odds). But do these log-odds-ratios show the probability combined over all trials (which I want to find out), or the per-trial probability?
Also, in some other forum someone used "family=quasibinomial" for a logistic regression model with proportion data. How do I find out if for my data I have to use "family=binomial" or "family=quasibinomial"? Or can you in general say that for a weighted logistic regression model with proportion data as dependent variable, the family is binomial?
I also read somewhere else that one needs to account for the correlation within individuals by including random intercepts for the individual IDs (I guess because, e.g., in my case the four questions were answered by the same individuals, though different individuals in the control and treatment groups, of course). In the discussion I mentioned above, no one included such a random intercept for the individuals in the weighted logistic regression model (with proportion data as the dependent variable), so I am wondering whether it's necessary or not.
Thanks for your support!
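On the interpretation point raised above, a hedged sketch of turning a treatment coefficient (a log-odds-ratio) into predicted per-trial probabilities of choosing the train; the intercept and coefficient values here are made up:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

intercept = 0.2        # hypothetical log-odds of choosing train, control group
treatment_logor = 0.6  # hypothetical treatment coefficient (log-odds-ratio)

p_control = inv_logit(intercept)
p_treatment = inv_logit(intercept + treatment_logor)
print(round(p_control, 3), round(p_treatment, 3))
```

Modelling the proportion with binomial weights is equivalent to modelling the individual trials (absent random effects), so the coefficients are on the per-trial scale; the expected number of train choices out of four is then simply 4 times the predicted probability.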
I have a dichotomous predictor variable (0=low income and 1=middle income) and 4 correlated ordinal outcome variables (portion of food from different 4 food sources using a 7-point likert scale).
What is the most appropriate model for this type of data? I thought about ordinal logistic regression, but wouldn't that just give me 4 separate estimates, one per outcome variable? I would also like an aggregated score across all food sources. I'm looking to show the odds that each income group utilizes these food sources. Thanks in advance!
Hello, I'm working on a study with 17 different biomarkers, I've assessed the diagnostic power of each of them alone using ROC analysis. However, trying to find different combinations to improve sensitivity and specificity manually while taking into account cost-effectiveness is extremely time consuming.
Is there a way to automate this process?
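One way to automate this is a brute-force search over marker subsets, scoring each candidate panel with whatever criterion you care about (AUC, cost-adjusted accuracy, etc.). A hedged sketch where the model fitting and scoring are left to a user-supplied score_fn (the toy score_fn below just counts panel size, for illustration):

```python
from itertools import combinations

def best_panels(markers, score_fn, max_size=3):
    # Exhaustively score every combination of up to max_size markers.
    # score_fn(combo) is assumed to fit a model (e.g. logistic regression)
    # on that subset and return a score where higher is better.
    results = []
    for k in range(1, max_size + 1):
        for combo in combinations(markers, k):
            results.append((score_fn(combo), combo))
    return sorted(results, reverse=True)

print(best_panels(["A", "B", "C"], lambda combo: len(combo))[0])
```

With 17 markers there are 2^17 - 1 non-empty subsets, so either cap max_size (panels of 2-4 markers are usually what cost-effectiveness allows anyway) or switch to stepwise or penalized selection.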
I would like to ask whether, for a longitudinal study with repeated measures (an outcome compared among three groups of patients with logistic regression, measured at predetermined timepoints: baseline and weeks 8, 16 and 24; NOT baseline vs. week 8 or baseline vs. week 16, ...), I am committing a type I error (or any other).
Hello! I am applying logistic regression with an independent variable, gender, and a binary dependent variable. The results of both ordinal logistic regression and binary logistic regression show a significant relationship between the variables, but the model does not fit, and the Cox & Snell and Nagelkerke values are too low. The sample size is 450. Why is this happening?
I would like to share a quick survey.
For inference and predictive models with a binary variable, do you prefer to use Binary Logistic Regressions Models or Gradient Boosting Decision Tree Models, and why?
Thanks for your attention
When other significant variables were added to the final model, one of the variables turned out to be non-significant in the logistic regression. Should I do any additional analysis for this, or should I run separate logistic regressions by splitting the data (according to site)?
Is it possible to determine the p-trend value of ORs in IBM SPSS 25? Please!!!
My dependent variable is Cancer (yes/no) and the independent variable is inflammatory diet (in quartiles) and several other variables that entered as adjustments, in two different models. I evaluated the influence of these other variables on the ORs of each quartile of the inflammatory diet.
I need to determine the p-trend by evaluating the dose-response across quartiles for each model.
Does anyone know the commands?! I've tried every way and it didn't work.
I have seen this question asked previously on Research Gate. And the answer is no. However, I may be conflating terms such as "transformation", "normalization" with "linearity in the logit" for any continuous independent variable--one of the assumptions in logistic regression. As in here:
Please advise. Thanks.
Which is better in baseline bivariate data analysis, the Chi-Square test or Logistic regression? AND why? As you know both of them give the same value of OR in 2x2 table of binary data.
Hello. I need to perform a logistic regression with a small sample size (n=20) to determine predictive factors for an event. The only problem is that a few data points skew the whole sample, leading to a massive odds ratio of >200. This is not a realistic result, and I am wondering how best to address it. How does one properly perform a logistic regression with a small sample size and high variance in certain variables? Does anyone have any tips or tricks?
Thanks in advance
I would like to know if I am wrong by doing this. I made quartiles out of my independent variable and from that I made dummy variables. When I do linear regression I have to record the betas with 95%CI per quartile per model (I adjust my model 1 for age and sex). Can I enter all the dummies into the model at the same time or do I have to enter them separately (while also adjusting for age and sex for example)?
So far I entered all the dummies and adjusted for age and sex at the same time but now I wonder whether SPSS doesn't adjust for the second dummy variable and the third.. So I think I need to redo my calculations and just run my models with one dummy in each.
Firth logistic regression is a special version of usual logistic regression which handles separation or quasi-separation issues. To understand the Firth logistic regression, we have to go one step back.
What is logistic regression?
Logistic regression is a statistical technique used to model the relationship between a categorical outcome/predicted variable, y (usually binary: yes/no, 1/0), and one or more independent/predictor (x) variables.
What is maximum likelihood estimation?
Maximum likelihood estimation is a statistical technique to find the best representative model that represents the relationship between the outcome and the independent/predictor variables of the underlying data (your dataset). The estimation process calculates the probability of different models to represent the dataset and then selects the model that maximizes this probability.
What is separation?
Separation means an empty bucket on one side! Suppose you are trying to predict meeting physical activity recommendations (outcome: 1/yes and 0/no) and you have three independent or predictor variables: gender (male/female), socio-economic condition (rich/poor), and incentive for physical activity (yes/no). Suppose the combination gender = male, socio-economic condition = rich, incentive for physical activity = no always predicts not meeting the physical activity recommendation (outcome: 0/no). This is an example of complete separation.
What is quasi-separation?
Reconsider the above example. We have 50 adolescents with the combination gender = male, socio-economic condition = rich, incentive for physical activity = no. For 48 or 49 of them (nearly all, but not exactly 50), the outcome is "not meeting physical activity recommendation" (outcome: 0/no). This is an instance of quasi-separation.
How can separation or quasi-separation impact your night's sleep?
When separation or quasi-separation is present in your data, traditional logistic regression will keep increasing the coefficients of the predictors/independent variables without limit (to be honest, not to infinity, just without bound) to reproduce the bucket picture: one of the outcomes is completely or nearly empty. When this anomaly happens, it is actually telling you that the traditional logistic regression model breaks down here.
There is a bookish name for the issue: a convergence problem. But how do you know convergence issues have occurred with the model?
- Very large coefficient estimates. The estimates could be near infinite, too!
- Along with large coefficient estimates, you may see large standard errors, too!
- It may also happen that the logistic regression tried several times (these tries are known as iterations) but failed to find the best model or, in bookish language, failed to converge.
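A tiny illustration of that "without limit" behaviour (a generic sketch, not part of the logistf workflow): on a perfectly separated toy dataset, the log-likelihood keeps improving as the slope grows, so maximum likelihood never settles on a finite coefficient.

```python
import math

# Perfectly separated toy data: x < 0 always gives y = 0, x > 0 always y = 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def log_likelihood(beta):
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll

# Bigger slopes always fit better, so the "best" beta runs off without limit
print(log_likelihood(1.0) < log_likelihood(10.0) < log_likelihood(100.0))
```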
What to do if such convergence issues have occurred?
Forget all the hard work you have done so far! You have to start a new journey with an alternative logistic regression, known as Firth logistic regression. But what does Firth logistic regression actually do? Without using too many technical terms, Firth logistic regression penalizes the likelihood, which leads to more reliable, finite coefficients and ultimately helps to choose the best representative model for your data.
How to conduct Firth logistic regression?
First install the package "logistf" and load it in your R-environment.
Now, assume you have a dataset "physical_activity" with a binary outcome variable "meeting physical activity recommendation" and three predictor/independent variables: gender (male/female), socio-economic condition (rich/poor), and incentive for physical activity (yes/no).
pa_model <- logistf(meet_PA ~ gender + sec + incentive, data = physical_activity)
Now, display the result:
summary(pa_model)
You got log-odds. Now, we have to convert them into odds ratios.
odds_ratios_pa <- exp(coef(pa_model))
Game over! Now, how to explain the result?
Don't worry! There is nothing special. The interpretation of Firth logistic regression's results is the same as for a traditional logistic regression model. However, if you are struggling with the interpretation, let me know in the comments. I will try my best to reduce your stress!
Note: If you find any serious methodological issue here, my inbox is open!
I am running a multinomial logistic regression model for one of the fictitious data before i implement the same on my real data.
Here I am trying to predict, based on some scores and economic group, whether a person will go for a diploma, general, or honors program.
The code below:
m11$prog2 <- relevel(m11$prog, ref = "honors")
I have already loaded the nnet library. However, I got the error below:
Error in relevel.factor(m11$prog, ref = "honors") :
'ref' must be an existing level
I have tried searching on SO and Nabble but did not find an answer that could help.
Please suggest what is incorrect. I have also checked the class of the variable, and it is a factor.
I have data collected from medical residents (same group; binary outcome with categorical predictors) with a pre- and post-test on the same group and a sample size of 21. For this type of study design I normally do repeated measures for binary outcomes using Proc Genmod or Proc Glimmix. I am thinking of treating these as independent groups, considering the pre-test group as the comparison group, and just doing a logistic regression and reporting odds ratios, since my outcome is binary. Any thoughts and suggestions on this?
I have conducted a study in which 18 patients were included. I ran a logistic regression; the model is significant but none of my predictors are. My R-squared is 1, which is also a bit strange. What do I need to do here? How do I report this?
Recently I am modeling a logistic regression. The outcome Y has 40 subjects but only 10 events (Y = 1). I hope to figure out whether predictor A is associated with Y after adjusting 4 variables. So, I have included 5 variables as X . I know according to the EPV rule, this logistic model could only have 2 variables. But adjusting the other 4 covariates is essential and after adjusting them, A turns out to be significant in the model which is good.
However, the odds ratio CI is very large, [1, 220]. When I use a penalized GLM ('logistf' or 'brglmFit' in R), the CI becomes [1, 35]. That's better, but still too wide. I'm afraid that such a wide CI undermines the reliability of my results.
Could you please give me any suggestions about this situation? Million Thanks!!
(all the covariates have been standardized)
I am new to quantitative research and wanted to use a logistic regression to estimate a probability. However, I found that my data are not normally distributed and therefore thought I should use a non-parametric test. I used SPSS but am willing to learn any new software that could handle my non-parametric data and give me the probability result. Is there any software that could solve my problem, or should I just ignore the fact that my data are non-parametric?
Example Scenario: 1 categorical variable (Yes/No), 3 continuous dependent variables
- 3 independent-samples t-tests are conducted (testing for group differences on the three variables); let's assume 2 of the 3 variables are significant with medium-large effect sizes
- a binomial logistic regression is conducted for the significant predictors (for classification purposes and predictor strength via conversion of unstandardized beta weights to standardized weights)
Since 4 tests are planned, the alpha would be set at .0125 (.05/4) via the Bonferroni correction. Should the adjusted alpha also be applied to the p-values for the Wald tests in the "Variables in the Equation" output?
Thank you in advance!
Hello, I have a question regarding using a binary-coded dependent variable on the Mann-Whitney U test.
I have a test with 15 questions from 3 different categories in my study. The answers are forced answers and have one correct answer. I coded the answers as binary values with 1 being correct and 0 being incorrect.
Therefore, for the 3 categories, each participant has a mean score between 0 and 1 representing their success (I took the mean because many participants did not answer 2 or 3 of the questions).
Does it make sense to put a mean of binary coded value as a dependent variable on a nonparametric test or it sounds weird and I should apply something else like chi-square or logistic regression?
I have prepared my data in SPSS complex sampling. I have applied univariate logistic regression and considered those variables for multivariate logistic regression which have p values less than 0.05 in univariate. But I would like to know, is there any option in SPSS complex sampling where I apply forward selection, backward elimination, or stepwise logistic regression?
I am doing a multinomial logistic regression using the data from the National Survey on Drug Use and Health 2021. I'm a novice with R and I'll probably need to figure out pretty much everything while I'm doing it, so I hope it's okay I'll just post further questions in this topic.
Now I ran into a problem trying to mutate a numeric variable (K6 scale points, values between 0-24) into 3 different groups. Basically, I want groups that have points between 13-24, between 5-13, and between 0-5.
This is the error message I got:
"Error: '=>' is disabled; set '_R_USE_PIPEBIND_' envvar to a true value to enable it"
I have no idea what this means.
I tried to create the groups like this:
NSDUH_adults <- NSDUH_adults %>%
  mutate(moderate_k6 = case_when(K6SCMON >= 5 ~ TRUE, K6SCMON < 5 ~ FALSE))
This works fine with 1 group only but apparently not with 3.
Is there a better way to do it?
I am creating a risk score from some variables using the following steps:
1- Dividing the data into training and validation cohorts.
2- Logistic regression (unadjusted, then fully adjusted).
3- Selecting the variables with p < 0.05 in the fully adjusted model.
4- Transforming the Bs into scores.
5- ROC curve.
6- Calibration using the validation cohort.
I have problems with the last 3 steps. I am using SAS, so I would be grateful if you could point me to sources for the code.
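For the "transforming Bs into scores" step, one common approach is to divide each retained coefficient by the smallest absolute coefficient and round to the nearest integer point. A sketch with made-up coefficients (the SAS side would just be a data step doing the same arithmetic):

```python
# Hypothetical coefficients from the fully adjusted model
betas = {"age_65plus": 0.9, "diabetes": 0.45, "smoking": 0.6}

base = min(abs(b) for b in betas.values())            # smallest effect = 1 point
scores = {name: round(b / base) for name, b in betas.items()}
print(scores)
```

A patient's total risk score is then the sum of the points for the risk factors they carry.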
I tested a multiple linear regression analysis with my Likert-scale data and it violated the normality assumption of OLS. After that, I found ordinal logistic regression and tested it, but the p-values of the test of parallel lines and the goodness-of-fit test (Pearson) are less than 5%. What should I do?
Suppose you are conducting a logistic regression including two continuous predictors. The first predictor does not contain intrinsically meaningful units (i.e., no meaningful 0). The second predictor does have intrinsically meaningful units and a meaningful 0 (i.e., dollars). You want to examine the simple effects of each predictor and also the interaction effect.
Should you center the first predictor? Both predictors? Leave both in raw units?
I have calculated Cox Regression in SPSS (HR) but is there any way of calculating RR in SPSS?
I am conducting an analysis of timely utilization of ANC and number of ANC visits by the social demographic and husband characteristics of women.
The dependent variables are two:
Number of ANC - categorical (no ANC, don't know, less than 3, 4-7, greater than 8)
Time of ANC - months of initiation (early, late, don't know, no ANC).
The independent variables consist mostly of women's social demographic information and husband characteristics.
I have determined that the method I am going to use is multinomial logistic regression. But I have a question:
Should I run the two dependent variables, time of ANC and number of ANC, at the same time with the independent variables using multinomial logistic regression?
Or should I run number of ANC with the social demographic and husband characteristics separately, and time of ANC with the social demographic and husband characteristics separately, in the multinomial logistic regression?
If anyone can help: I don't understand what is written in the upper limit of the confidence interval for IV.2, "+inf". Why does it appear here, and how do I interpret it? Also, what does "median unbiased estimate" (MUE) mean?
Is there anything wrong with the exact logistic regression I performed?
Thank you in advance.
If there is a binary/nominal dependent variable and one wants to do a logistic regression analysis, then, for a given categorical predictor, how many observations are required per category, at minimum, to conduct the analysis? Should I include independent variables that have low frequencies in some categories of the response?
For example, if the dependent variable is treatment received on time (1=yes, 0=no) and a categorical predictor, education status, has 3 categories (illiterate=0, primary=1, >secondary=2), and there were only 12 illiterate women who were treated on time while the rest of the categories across the DV had a decent sample (>=30), can we still use education as a predictor for this analysis? Please share any references to rules/guidelines.
I am currently working on HPV and cervical cancer self-reporting data.
I want to evaluate the risk values of the population using the risk factors of the diseases mentioned above.
I want to categorize the risk as low-risk or high-risk depending on the OR.
The condition is;
For high risk, using their odds ratio values: OR > 1.
If OR < 1, then low risk.
If OR = 1, the risk of those who are exposed or unexposed to a risk factor is similar.
Hence, it is best to say that if OR < 1 or OR = 1, then OR ≤ 1, which means low risk.
The problem now is that I have some negative coefficients for the ORs in the logistic regression, and some results from the risk estimate are > 1 (in some cases 3, 4, 7, 32).
Please, let me know if it is possible to have a one-on-one chat with an expert to discuss the problem.
Thank you all for your help in advance.
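On the negative-coefficient point: logistic regression reports coefficients on the log-odds scale, so a negative coefficient is not a negative OR; exponentiating it gives an OR between 0 and 1. A sketch with a made-up coefficient:

```python
import math

coef = -0.69                 # hypothetical negative logistic coefficient
odds_ratio = math.exp(coef)  # exponentiate the log-odds ratio
print(round(odds_ratio, 2))
```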
For the variable selection of my binary logistic regression model, I am performing a Box-Tidwell test. I am doing so because linearity of the independent variables in the log-odds is an assumption of logistic regression. The variables whose Box-Tidwell terms are significant will not be taken into account in my logistic regression.
However, some of my independent variables contain zeroes or negative values, rendering the log-odds transformation infeasible. How should I deal with these (non-logit transformed) variables? Ultimately, I am performing this test as part of the variable selection process to reduce the number of variables from 300+ to a number that is easier to handle and to better interpret the individual effects that my variables have on my dependent variable. I am also testing for multicollinearity as logistic regressions assume little correlation between the independent variables.
If my phrasing is somewhat unclear, I'd be happy to clarify.
P.S.: I have 120K observations of my 330 variables.
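On the zeros and negatives: the Box-Tidwell term x*ln(x) is undefined for x <= 0. One common workaround (an assumption here, not a universal fix) is to shift each such variable so its minimum becomes positive before forming the term:

```python
import math

xs = [-2.0, 0.0, 3.0, 5.0]       # hypothetical variable with zero/negative values
shift = 1.0 - min(xs)            # makes the smallest value exactly 1
bt_terms = [(x + shift) * math.log(x + shift) for x in xs]
print(bt_terms[0])
```

The shift changes the variable's scale, so the test then concerns linearity of the shifted variable; also, with 120K observations even trivial departures from linearity will come out significant, which is worth keeping in mind when using the test for variable selection.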
In my analysis, I end up with some very large OR values; I also ran logistic regressions on each variable separately and obtained the same values. Is there any way to remedy such OR values, given that I don't know what they represent? Is it due to too many variables, or to missing values? And is there any point in reporting such OR values?
Different researchers use different p value cut off points e.g. p<0.25, 0.2, and others include some variables without such restriction if authors believe the variables are significant.
What type of variables are included in multivariate logistic regression analysis? Does this always work?
I want a statistical model to analyze my data on a rare disease (asymptomatic or submicroscopic malaria). I would like a consultation with experts in the field.
I am convinced that logistic regression is not suitable for my study; however, there are dozens of published articles that have used it. I want to look at it in a different way.
Hence, need your prompt responses.
I have 12 variables (9 categorical and 3 continuous) entered into a multinomial logistic regression with a backwards stepwise deletion approach. 3 variables were excluded from the model for non-significance, as indicated in my step summary. The final multinomial logistic regression model fit was significant, X2(20) = 210.541, p < .001, which suggests the independent variables included significantly predict my outcome. However, my output shows conflicting information after that, and I am finding the interpretation challenging.
1. Goodness of fit shows a significant Pearson test (p = .002) but a non-significant deviance test (p = 1.000).
2. The remaining 8 variables are all significant at p < .05 in the likelihood ratio tests, BUT in the parameter estimates some of the independent variables are not significant in either model.
What does this conflicting output indicate?
While running binary logistic regression in SPSS, we get Exp(B) in the output, which we consider to be the odds ratio. My question is whether this Exp(B) should be considered a crude odds ratio (COR) or an adjusted odds ratio (AOR).
How to get both COR and AOR? What are the steps in SPSS to get both?
Is there any role for the stepwise method in obtaining the AOR? Meaning, if we do the regression with the Enter method we get ORs for all variables, while with the backward or forward method we get the strong predictors. Can we say the ORs obtained by the Enter method are CORs, while the ORs obtained by the stepwise method are AORs?
Thank you all in advance....
I've found different ways of dealing with the problem of selecting variables for multivariate binomial logistic regression. Some authors include only variables with a significant bivariate association with the outcome, but I don't have an explanation for this practice. Could someone help me with this issue?
I am using logistic regression on a dataset where the dependent variable is categorical. I have multiple independent variables, some of which are also categorical. I want to know which of them are significant and which are just noise variables. How can I find the correlation between 2 categorical variables?
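One common measure of association between two categorical variables (not the only option) is Cramér's V, computed from the chi-square statistic of their contingency table. A self-contained sketch:

```python
import math

def cramers_v(table):
    # Cramér's V for an r x c contingency table of counts.
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Perfect association between the two variables gives V = 1
print(cramers_v([[10, 0], [0, 10]]))
```

V runs from 0 (no association) to 1 (perfect association).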