# Logistic Regression - Science topic

Questions related to Logistic Regression
• asked a question related to Logistic Regression
Question
I have retrieved a study that reports a logistic regression in which the OR for the dichotomous outcome is 1.4 for the continuous variable ln(troponin). This means that the odds increase by 40% for every 2.7-fold (e-fold) increase in troponin. Is there any way of calculating the OR for a 1-unit increase in the troponin variable itself?
I want to meta-analyze many logistic regressions, for which I need them to be in the same format (some use the variable ln(troponin) and others use troponin). No individual patient data are available.
Just for the sake of completeness: it might be possible if there is a meaningful reference concentration of troponin you could refer to, but I doubt that there is such a value.
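To make the point about a reference concentration concrete, here is a minimal sketch. Because the model is linear in ln(troponin), the OR for any k-fold increase is OR^(ln k), so ORs per doubling or per 10-fold increase can be derived from the published 1.4. An OR "per 1 unit of troponin" is not constant on this scale; it depends on the starting concentration. The function names and the reference value of 1.0 are hypothetical and only for illustration.

```python
import math

def or_for_fold_change(or_per_ln_unit: float, fold: float) -> float:
    """OR for multiplying the predictor by `fold`, given the OR
    per 1-unit increase in ln(predictor)."""
    beta = math.log(or_per_ln_unit)          # log-odds per unit of ln(x)
    return math.exp(beta * math.log(fold))   # equals or_per_ln_unit ** ln(fold)

def or_for_unit_increase_from(or_per_ln_unit: float, reference: float) -> float:
    """OR for going from `reference` to `reference + 1` on the raw scale.
    The result changes with the chosen reference concentration."""
    return or_for_fold_change(or_per_ln_unit, (reference + 1.0) / reference)

or_ln = 1.4                                   # OR reported for ln(troponin)
per_e_fold = or_for_fold_change(or_ln, math.e)
per_doubling = or_for_fold_change(or_ln, 2.0)
per_tenfold = or_for_fold_change(or_ln, 10.0)
from_one = or_for_unit_increase_from(or_ln, 1.0)   # hypothetical reference
from_ten = or_for_unit_increase_from(or_ln, 10.0)  # hypothetical reference
print(per_doubling, per_tenfold, from_one, from_ten)
```

The two "per unit" results differ, which is why a single per-unit OR cannot be recovered from a model fitted on the log scale.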
• asked a question related to Logistic Regression
Question
When conducting a logistic regression analysis in SPSS, a default threshold of 0.5 is used for the classification table. Consequently, individuals with a predicted probability < 0.5 are assigned to Group "0", while those with a predicted probability > 0.5 are assigned to Group "1". However, this threshold may not be the one that maximizes sensitivity and specificity. In other words, adjusting the threshold could potentially increase the overall accuracy of the model.
To explore this, I generated a ROC curve, which provides both the curve itself and the coordinates. I can choose a specific point on this curve.
My question now is, how do I translate from this ROC curve or its coordinates to the probability that I need to specify as the classification cutoff in SPSS (default: 0.50)? The value must naturally fall between 0 and 1.
1. Do I simply need to select an X-value from the coordinate table where I have the best sensitivity/specificity and plug it into the formula for P(Y=1)?
2. What do I do when I have more than one predictor (X) variable? Choose the best point/coordinate for both predictors separately and plug in the values into the equation for P(Y=1) and calculate the new cutoff value?
Good! I'm glad to hear we got there in the end. ;-)
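The answers that led to this resolution are not preserved here, so the following is only one common reading of the problem, not a record of what was agreed. If the ROC curve is built on the model's saved predicted probabilities rather than on each predictor separately, the cutoffs in the coordinates table are already on the 0 to 1 probability scale, and the same approach works with any number of predictors. The sketch below picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1). The data are hypothetical.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.
    A case is classified as 1 when its predicted probability >= cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1.0
        # Strict ">" keeps the lowest cutoff when several tie on J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical predicted probabilities and observed outcomes.
probs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(probs, labels)
print(cutoff, sens, spec)
```

Maximizing Youden's J is only one criterion; a different trade-off between sensitivity and specificity would lead to a different cutoff.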
• asked a question related to Logistic Regression
Question
I have the OR from a logistic regression that used the independent variable as continuous. I also have the ORs from 2x2 tables that dichotomized the variable (high if > 0.1, low if < 0.1).
Is there any way I can merge them for a meta-analysis? That is, can the OR of the regression (OR for a 1-unit increase) be converted to an OR for high vs. low?
Hello Santiago Ferriere Steinert. These two ORs are from different studies, right? How many ORs do you have in total? If I had only the two ORs you describe, I think I would just report them separately. If they were two ORs of a much larger number of ORs, and all but that one were from models that treated the X-variable as continuous, I might compare the OR from the 2x2 table to the pooled estimate of the OR from the other studies. But I think more information is needed. HTH.
• asked a question related to Logistic Regression
Question
Hello, I am looking for some guidance on how to calculate p-values for mixed-effects multinomial logistic regression using the npmlt() function from the mixcat package in R. I have fitted a model using this function and obtained the estimates and standard errors for each parameter. However, I am not sure how to derive the p-values from these values. I have tried to compute the Z values by dividing the estimates by the standard errors, and then comparing them with the critical values from a standard normal distribution. Is this a valid method? I have not yet found any documentation or examples on how to calculate p-values using npmlt() in R. I would appreciate any discussion or suggestion on this topic. Thank you very much.
Welcome Xi Chen
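For reference, the calculation described in the question is the usual Wald test: z = estimate / standard error, with a two-sided p-value from the standard normal distribution. Whether this large-sample approximation is adequate for a particular npmlt() fit is not settled in this thread. A minimal sketch with hypothetical numbers:

```python
import math

def wald_p_value(estimate: float, std_error: float):
    """Return (z, two-sided p-value) for a Wald test of estimate == 0."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical coefficient and standard error.
z, p = wald_p_value(0.85, 0.32)
print(round(z, 3), round(p, 4))
```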
• asked a question related to Logistic Regression
Question
Hello everyone! I'm seeking a comprehensive understanding of how to handle confounding variables when comparing two groups based on the presence of a specific variable. Should I use propensity score matching or multivariable logistic regression for this purpose?
• asked a question related to Logistic Regression
Question
Hello,
Please I need to perform a logistic regression analysis using 2 independent variables, each has multiple indicators using SPSS. For example, the independent variable perceived behavioral control (PBC) is measured using two indicators, which are self-efficacy and easy-to-start (each is binary). The other independent variable is the subjective norm, measured by 2 indicators (respect and motivation), each of which is also binary.
My question is: how to deal with the multiple indicators for one independent variable when performing the analysis?
In case that I want the outcome to appear like in the attached table, in which it includes only the independent variables (not each indicator individually). I assume that I need to compute each variable by summing its indicators but I am not sure if this is correct. So, I need the assistance of experts.
I hope that I am able to communicate my inquiry properly.
Thank you.
How do you know that your multiple indicators really measure the same construct?
Your binary variables make me fear that you dichotomised a measurement scale score. That is not a great idea. Imagine taking a black and white picture and replacing each pixel with either white or black, depending on whether it was above or below the median brightness. If your intuition says that this throws away detail, it is correct: you can lose over a third of the information content of your variable as a result.
• asked a question related to Logistic Regression
Question
Hello,
I estimated a mixed-effect logistic regression model (GLMM) and I need to evaluate it. Specifically, I tried a few combinations of the independent variables in the model and I need to compare between them.
I know that for a regular logistic regression model (GLM), the Nagelkerke R-squared fits (a pseudo R-squared measure). But does it fit also for a mixed-effect model? If not, what is the correct way for evaluating a mixed-effect logistic regression model?
Thanks!
If your model outputs the likelihood, there are different pseudo r-square measures you can calculate, including Nagelkerke. See: https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
But there is a question as to what model you would consider the null model when you have random effects.
However, for logistic regression, you might consider the Efron's pseudo r-square described at that link. Or maybe, a count pseudo r-squared, that evaluates the proportion of observations the model predicts correctly.
Also, as Girma Beressa mentions, measures like AIC, BIC, or AICc may be more appropriate to decide among models.
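As a rough illustration of the information criteria mentioned above, they can be computed from a model's maximized log-likelihood, its number of estimated parameters k, and the sample size n. The log-likelihoods below are hypothetical, and how to count parameters and n in a mixed model is itself debated, so treat this as a sketch only.

```python
import math

def aic(log_lik: float, k: int) -> float:
    return 2.0 * k - 2.0 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2.0 * log_lik

def aicc(log_lik: float, k: int, n: int) -> float:
    # Small-sample correction added to AIC.
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical candidate models fitted to the same n = 200 observations.
n = 200
candidates = {"model_A": (-120.4, 4), "model_B": (-118.9, 6)}
for name, (ll, k) in candidates.items():
    print(name, round(aic(ll, k), 2), round(bic(ll, k, n), 2), round(aicc(ll, k, n), 2))
```

Lower values indicate the preferred model, and the comparison is only meaningful for models fitted to the same data.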
• asked a question related to Logistic Regression
Question
I want to know whether I should include the non-significant categorical variables in a multinomial logistic regression.
Regards.
Well, keeping the significance cut-off at p < 0.05 is usually too stringent a criterion, so what we usually do is set the significance level at 0.25 for allowing entry into the MLR. Moreover, biological plausibility is another criterion, which overrides even the first one. It usually applies to categorical variables in epidemiological analyses.
In the end what experience has taught me is that the thinking and approach of the researcher is the most valuable asset that guide him/her in understanding the data set.
• asked a question related to Logistic Regression
Question
I am using SPSS to perform binary logistic regression. One of the parameters generated is the prediction probability. Is there a simple mathematical formula that could be used to calculate it manually? e.g. based on the B values generated for each variable in model?
People have certainly done that, Nasir Al-Allawi. A Google search on <logistic regression scoring system> turns up lots of resources. Good luck with your work.
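For the manual calculation itself, the predicted probability in binary logistic regression is P(Y=1) = 1 / (1 + exp(-(B0 + B1*x1 + ... + Bk*xk))), where B0 is the constant and B1..Bk are the B values from the output. A minimal sketch with hypothetical B values and predictor values:

```python
import math

def predicted_probability(intercept, coefficients, values):
    """P(Y=1) from the constant, the B coefficients and one case's predictor values."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical model: constant = -2.0, B for age = 0.03, B for smoker (0/1) = 0.9.
p = predicted_probability(-2.0, [0.03, 0.9], [50.0, 1.0])
print(round(p, 4))
```

For categorical predictors, the values must use the same coding (reference category and indicator variables) that the software used when fitting the model.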
• asked a question related to Logistic Regression
Question
I have measured Irisin levels in plasma and now I'm trying to analyze the results. As far as I have read, I need to perform a 4 parameter logistic regression, should I use logarithm for absorbances and for concentrations?
I use the program Prism GraphPad for that. The data used is only the concentration level of the standard curve and the optical density levels. Then I interpolate the values for the estimation of the concentration of the protein in my samples. After selecting the interpolate option, a parameters window is open, where I select the 4 parameter logistic regression.
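To show what the interpolation step does, here is a sketch of a four-parameter logistic (4PL) curve and its inverse. The parameterization y = d + (a - d) / (1 + (x / c)^b) is one common convention and is an assumption here; a given program may use a different but equivalent form (for example, one written in terms of log concentration). The parameter values are hypothetical, not fitted to any data.

```python
def four_pl(x, a, b, c, d):
    """Response at concentration x.
    a: response at zero concentration, d: response at infinite concentration,
    c: concentration at the curve midpoint, b: slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    """Concentration that gives response y (y must lie strictly between a and d)."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical standard-curve parameters.
a, b, c, d = 0.05, 1.2, 50.0, 2.5
od_sample = four_pl(25.0, a, b, c, d)            # simulated optical density
conc_sample = inverse_four_pl(od_sample, a, b, c, d)
print(round(od_sample, 4), round(conc_sample, 4))
```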
• asked a question related to Logistic Regression
Question
Hi everybody! :)
I wish to investigate a possible prediction between two variables I'm currently studying, and wanted to use a logistic regression model.
Is it possible to run a multinomial logistic regression having just one nominal independent variable with four categories? My dependent variable is also nominal with four categories.
Many thanks for your attention :)
What is x and y and what is your hypothesis? We can't know what is the best thing to do before we know that.
• asked a question related to Logistic Regression
Question
Can someone suggest me the best method for meta-analysis of proportions where there is a high heterogeneity. I am using random effects model to estimate the pooled proportion. I have done the pooled proportion and subgroup analysis with both logistic regression and dersimonian Laird method. Both yielded a varying result. Which one should I take into consideration?
Neither is ideal. See this paper:
• asked a question related to Logistic Regression
Question
Hello,
I am running a logistic regression, random forest, and support vector machine models to predict whether a loan will default. My data is highly imbalanced, which I have addressed using resampling techniques (Random Over Sampler). I am getting unusually high ROC-AUC (very close to 1) scores that do not match the results of similar studies. I have already addressed variables with high correlations. I am at a slight loss of what to try next, and was looking for some guidance.
Update: I removed some variables and there were significant improvements in the Logistic Regression and LinearSVC models. However, the Random Forest is still returning scores equal to 1 for all metrics (accuracy, precision, recall, AUC, and F1). Any suggestions to address the Random Forest?
Thanks!
@maya thanks for clarifying
In general, you might want to repeat the analysis over different validation sets (for instance, inner cross-validation loops on 80% of the data and outer cross-validation on the remaining 20%), so that you can obtain the sampling distribution of ROC-AUC and PR-AUC, accounting for sampling error, model building, and Monte Carlo variation.
Making sure that the splits are stratified according to the outcome
You don't specify how imbalanced your class is but generally speaking Precision-Recall curves are more appropriate with class imbalance
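A minimal sketch of a stratified split follows, using only the standard library. The function name and the toy labels are hypothetical. One possible cause of near-perfect scores, which this thread does not confirm for this particular case, is oversampling before splitting: duplicated minority cases then appear in both the training and the test data. Under that assumption, the split should come first and the oversampler should be applied to the training indices only.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical imbalanced outcome: 10 defaults among 100 loans.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
# Resampling (if any) would be applied to train_idx only, never to test_idx.
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```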
• asked a question related to Logistic Regression
Question
I have performed a multinomial logistic regression in SPSS. The goodness of fit shows that the model fits the data. Based on my literature study, there are several methods that can be used to validate the model, but the Logistic Regression dialog in SPSS 23 doesn't show these options. Kindly let me know which particular method I can use to validate the model in SPSS.
Hello Tanmoy Basu In a multinomial logistic regression model that you do using SPSS software, there are many tools and outputs to check the appropriateness of the model, I have sent some of them in the attached images. As far as I know, SPSS is not lacking in this field and it is a good software.
• asked a question related to Logistic Regression
Question
I run a multinomial logistic regression. In the SPSS output, under the table "Parameter Estimates", there is a message "Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing." How should I deal with this problem? Thank you.
Hello Atikhom,
Which particular statistic in the output was omitted? Which version of spss are you using?
Are you willing to post your data and proposed model, so that others could attempt to recreate the condition?
Some obvious points to consider about your data set whenever unexpected results such as the one you report occur:
1. Are your data for any continuous variables suitably scaled? (so that the leading significant digit is not many orders of magnitude away from the decimal point)
2. For your categorical variables, do you have cases in each possible cell/level? (check frequencies for each such variable)
3. Do you have any instances of perfect relationships among categorical variables (perhaps due to empty cells)? (check cross-tabulations for variable pairs and sets)
4. Is one of the IVs redundant with another variable in the data set?
5. Do you have missing data (and are attempting some sort of imputation process)?
That may not cover the waterfront, but at least it may give you some ideas when checking your data.
• asked a question related to Logistic Regression
Question
Seeking insights from the research community: Does the imbalance of textual genres within corpora, when used as an explanatory variable rather than the response variable, affect the performance of logistic regression and classification models? I'm interested in understanding how the unequal distribution of word counts across genres might introduce biases and influence the accuracy of these machine learning algorithms. Any explanations or relevant details on mitigating strategies are appreciated. Thank you!
Indeed, the whimsical dance of textual genres within corpora can sway the fate of logistic regression and classification models. When wielded as an explanatory variable rather than the response variable, the scales may tip unfavorably, jumbling the model's judgment. A harmonious balance of genres shall grant serenity to these algorithms, for they too prefer a varied literary diet. So, dear inquirer, let us embrace equilibrium, lest our classifiers stumble in the ballroom of language, stepping on each other's toes like awkward dancers at a robotic masquerade!
• asked a question related to Logistic Regression
Question
Hi. I'm planning to conduct a multinomial logistic regression analysis for my predictive model (3 outcome categories). How can I estimate the sample size? I believe the EPV rule of thumb is suitable only for binary outcomes. Is there any formula/software that I can use?
I'd use simulations. You can use any programming language for this. I recommend R, which you could also use later to analyze the multinomial models.
• asked a question related to Logistic Regression
Question
Logistic Regression type
Have a look at these free resources
Module 9: Single-level and Multilevel Models for Ordinal Responses: Concepts
• asked a question related to Logistic Regression
Question
I have conducted some ordinal logistic regressions, however, some of my tests have not met the proportional odds assumptions so I need to run multinomial regressions. What would I have to do to the ordinal DV to use it in this model? I'm doing this in SPSS by the way.
Hello Hannah Belcher. How did you determine that the proportional odds (aka., parallel lines) assumption was too severely violated? And what is your sample size?
I ask those questions, because the test of proportional odds is known to be too liberal (i.e., it rejects the null too easily), particularly as n increases. You can find some relevant discussion of this (and many other issues) in this nice tutorial:
HTH.
• asked a question related to Logistic Regression
Question
I am currently trying to perform a mediation analysis with a long panel data set, including control variables, in Stata.
While trying to do this, I found solutions for running a moderated mediation regression with and without control variables, and I also found ways to run a regression with panel data, but I did not find a way to combine the two.
Is there a way how i can consider mediator variables in my panel regression?
Does anyone have information, links, advice on how to approach this challenge?
Yes You can
• asked a question related to Logistic Regression
Question
if there would be any literature on BLR Coefficient, that would be very helpful to understand.
This software does this by iteration. You can try it for 25 days. Click "help" to see examples. Let me know if you need assistance.
• asked a question related to Logistic Regression
Question
I am running 6 separate binomial logistic regression models with all dependent variables having two categories, either 'no' which is coded as 0 and 'yes' coded as 1.
4/6 models are running fine, however, 2 of them have this error message.
I am not sure what is going wrong as each dependent variable have the same 2 values on the cases being processed, either 0 or 1.
Any suggestions what to do?
Sure Bruce Weaver I was running logistic regression as a part of the propensity score matching technique. While watching the tutorial video on Youtube (https://youtu.be/2ubNZ9V8WKw) I realized I am including variables in the equation that I shouldn't have. So, I excluded them and magically, the error did not appear anymore. It was that simple.(: Good luck to everyone who's facing this error!
• asked a question related to Logistic Regression
Question
Dear everyone,
I have imputed my data using STATA's command "mi impute" (m=200) - only one variable had missing. STATA's "mi estimate" command was used when running the logistic regression model.
How would you interpret an OR that is close to 1 (or displayed as 1) when the p-value is < 0.05 (the 95% CI does not cross 1)? An example from the regression model for a continuous variable is: OR = 0.9965 with 95% CI 0.9938 to 0.9992, which STATA rounds to OR = 1.00 and 95% CI: 1.00;1.00. Am I on to something when thinking that this is possible because the p-value is not perfectly tied to the magnitude of an effect? Thus, statistically the predictor has an effect on the outcome, but that this effect is very small and may be of little clinically or otherwise "practically" significant?
And do you have any suggestions on how to report such findings in a research paper?
Best regards
Regine
"Thus, statistically the predictor has an effect on the outcome, but that this effect is very small and may be of little clinically or otherwise "practically" significant?"
Exactly.
The statistical significance is an indication that your data provides enough information about the effect to be able to distinguish the estimate (here: 0.9965) from the hypothesized value (here: 1.0). Since the estimate is lower than the hypothesized value, you can conclude that the (still unknown!) effect is also lower.
The confidence interval is the range of all hypothesized values for which the data would not be statistically significant. Hence, this is a range of effect values that are "not too incompatible" with your data. Values outside the interval are "too incompatible" with your data. You see that the (0.95-)confidence interval in your case does not include 1.0, which corresponds to your p value being < 0.05. So, whatever hypothetical value is not too incompatible with your data is lower than 1.0, but very close to 1.0. This may not include any value of clinical or practical relevance.
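On the reporting question, one option is to express the OR for a larger increment of the predictor, so that it no longer rounds to 1.00. For an increment of m units the OR is OR^m, and the same applies to both confidence limits. The increment of 100 units below is hypothetical; a clinically meaningful increment depends on how the variable is measured.

```python
import math

def rescale_or(or_per_unit: float, units: float) -> float:
    """OR for an increase of `units` in the predictor, given the per-unit OR."""
    return math.exp(math.log(or_per_unit) * units)

# Values quoted in the question; the 100-unit increment is an assumption.
units = 100
or_100 = rescale_or(0.9965, units)
ci_low_100 = rescale_or(0.9938, units)
ci_high_100 = rescale_or(0.9992, units)
print(round(or_100, 2), round(ci_low_100, 2), round(ci_high_100, 2))
```

The p-value is unchanged by this rescaling; only the unit in which the effect is expressed differs.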
• asked a question related to Logistic Regression
Question
Dear colleagues,
I analyzed my survey data using binary logistic regression, and I am trying to assess the results by looking at the p-value, B, and Exp(B) values. However, the task is also to specify the significance of the marginal effects. How to interpret the results of binary logistic regression considering the significance of the marginal effects?
Best,
To specify the significance of the marginal effects in binary logistic regression analysis, you can interpret the results by examining the p-values, B (coefficient estimates), and Exp(B) (exponentiated coefficient estimates) values. The p-value indicates the statistical significance of each predictor variable's effect on the outcome variable. A low p-value (typically less than 0.05) suggests a significant effect. The B values represent the estimated change in the log-odds of the outcome for a one-unit change in the predictor, with positive values indicating a positive association and negative values indicating a negative association. Exp(B) provides the odds ratio, which quantifies the change in odds for a one-unit increase in the predictor. An Exp(B) greater than 1 indicates an increased odds of the outcome, while a value less than 1 implies a decreased odds. By considering the significance of the marginal effects, you can determine the direction, magnitude, and statistical significance of the predictor variables' impacts on the binary outcome variable in your logistic regression analysis.
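Note that B and Exp(B) are on the log-odds and odds scales, whereas a marginal effect is a change in the predicted probability. For a continuous predictor x_j in a logit model, the marginal effect for one observation is B_j * p * (1 - p), and the average marginal effect (AME) is the mean of that quantity over the sample. The sketch below computes an AME for a hypothetical fitted model and data set. It does not compute a standard error; software that reports marginal effects typically obtains one with the delta method.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effect(intercept, coefs, rows, index):
    """Mean over observations of dP(Y=1)/dx_index = B_index * p * (1 - p)."""
    effects = []
    for row in rows:
        p = logistic(intercept + sum(b * x for b, x in zip(coefs, row)))
        effects.append(coefs[index] * p * (1.0 - p))
    return sum(effects) / len(effects)

# Hypothetical fitted model with two continuous predictors, and four cases.
intercept = -1.0
coefs = [0.8, -0.5]
rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.5], [0.5, 0.5]]
ame_x1 = average_marginal_effect(intercept, coefs, rows, 0)
print(round(ame_x1, 4))
```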
• asked a question related to Logistic Regression
Question
Can someone suggest a R package for Blinder Oaxaca decomposition for logistic regression models?
Hi,
1)This R package
and the related paper
Hlavac M (2022). oaxaca: Blinder-Oaxaca Decomposition in R. Social Policy Institute, Bratislava, Slovakia. R package version 0.1.5
2)this post:
3)the following paper
Good luck,
Hamid
• asked a question related to Logistic Regression
Question
This is what I know so please let me know if I miss anything:
1. We do bivariate and collect all the p-values
2. Choose p-values <0.25
3. Run logistic analysis for all the variables whose p-value is <0.25
4. Eliminate the variables one-by-one, starting by the highest number of p-value in the new model
5. But when to stop eliminating them all?
And to look for adjusted Odds Ratio:
1. Collect all variables into one model
2. Run a log reg analysis and look up for the Exp B
3. Compare them with the previous odds ratio from the previous bivariate analysis
p.s: I use complex sample models
Version 23
• asked a question related to Logistic Regression
Question
I found this discussion very helpful for the analysis I need to conduct for my thesis:
I am not sure if I understand it right, that the multiple trials that are mentioned in the discussion are done within the same sample? (but I think so) In my case I have a control and a treatment group, and each respondent goes through 4 questions where he/she has to choose between train and plane. I now want to analyse the difference between control and treatment group over all four choices combined. So I think I could use the approach with a weighted logistic regression model and the dependent variable "proportion_train" (which represents how often the train was chosen out of the four choices).
I am also not sure about the interpretation of the coefficients of this model then. I know these are most likely the log-odds-ratios (or equivalently differences in log-odds). But do these log-odds-ratios show the probability combined over all trials (which I want to find out), or the per-trial probability?
Also, in some other forum someone used "family=quasibinomial" for a logistic regression model with proportion data. How do I find out if for my data I have to use "family=binomial" or "family=quasibinomial"? Or can you in general say that for a weighted logistic regression model with proportion data as dependent variable, the family is binomial?
I also read somewhere else that one needs to account for the correlation within the individuals by including random intercepts for the individual IDs (I guess as e. g. in my case the four questions were answered by the same individuals (but different individuals in control and treatment group of course). In the discussion I mentioned above, no one included such a random intercept for the individuals in the weighted logistic regression model (with proportion data as dependent variable), so I am wondering if it's necessary or not?
If I understand correctly each participant makes four choices between plane and train coded as 0/1. If so you can't use a regular logistic regression because you have repeated measures. The solution is probably to use something like a multilevel generalised linear model (or maybe generalised estimating equations/GEEs which might be easier to run in some software). These can have convergence issues so I usually fit them using Bayesian approaches in R (e.g., using brms).
In general the models will produce coefficients on the log odds scale so e^b will give you them on the odds scale and you get probabilities out with some basic maths (or better still use prediction functions in your software to plot predicted probabilities).
What you get is a regression equation that predicts probabilities (with some calculation) overall or for different effects. Using marginal effects or predictions is a common way to plot these.
The quasi binomial allows for under- or over- dispersion. The easy way to check this is to see if changing the family alters the model. I'm guessing someone recommended the quasi binomial because your observations are clustered. I think its better to model the structure directly with a random intercept for the participant. Something like GEE treats this as a nuisance variable and handles it that way, but I find the multilevel approach more useful in handling a wide range of models.
• asked a question related to Logistic Regression
Question
I have a dichotomous predictor variable (0=low income and 1=middle income) and 4 correlated ordinal outcome variables (portion of food from different 4 food sources using a 7-point likert scale).
What is the most appropriate model for this type of data? I thought about ordinal logistic regression, but wouldn't that just give me 4 separate estimates for each outcome variable? I would also like a aggregated score for all food sources. I'm looking to show the odds that each income group utilizes these food sources. Thanks in advance!
• asked a question related to Logistic Regression
Question
Hello, I'm working on a study with 17 different biomarkers, I've assessed the diagnostic power of each of them alone using ROC analysis. However, trying to find different combinations to improve sensitivity and specificity manually while taking into account cost-effectiveness is extremely time consuming.
Is there a way to automate this process?
Thank you both very much for the answers
• asked a question related to Logistic Regression
Question
Hi
I would like to ask whether, in a longitudinal study with repeated measures (an outcome compared among three groups of patients with logistic regression and measured at predetermined timepoints: basal, weeks 8, 16 and 24, NOT basal vs week 8 or basal vs week 16, ...), I am committing a type I error (or any other).
Many thanks
David Olivares Why no measurement at starting time?
Could you share some data? I could try with my software:
• asked a question related to Logistic Regression
Question
Hello! I am applying logistic regression with gender as the independent variable and a binary dependent variable. The results of ordinal logistic regression as well as binary logistic regression show a significant relationship between the variables, but the model does not appear to fit well and the Cox & Snell and Nagelkerke values are too low. The sample size is 450. Why is this happening?
• asked a question related to Logistic Regression
Question
Good afternoon,
I would like to share a quick survey.
For inference and predictive models with a binary variable, do you prefer to use Binary Logistic Regressions Models or Gradient Boosting Decision Tree Models, and why?
Dear Inès,
I choose between Binary Logistic Regression and Gradient Boosting Decision Tree models depending on the specific problem, data characteristics, and the goals of the analysis, as each method has its strengths and weaknesses:
Binary Logistic Regression:
Strengths:
a. Simple and interpretable model: Logistic regression provides a linear relationship between the log-odds of the binary outcome and the input features, making it easy to understand the effect of individual features.
b. Works well with linearly separable data.
c. Provides probabilities for the binary outcome, which can be useful for understanding confidence in predictions or for ranking predictions.
d. Fast training and prediction times.
Weaknesses:
a. Limited to linear relationships between features and the log-odds of the binary outcome. It may not perform well on complex, non-linear relationships between features and the target variable.
b. Sensitive to strong correlation (multicollinearity) among features, which is common in real-world data and can make coefficient estimates unstable.
Gradient Boosting Decision Tree (GBDT):
Strengths:
a. Can model complex, non-linear relationships between features and the target variable.
b. Can handle interactions between features automatically.
c. Typically has higher predictive accuracy than logistic regression for a wide range of problems.
d. Can be tuned with various hyperparameters to achieve the best possible performance.
Weaknesses:
a. Can be more computationally intensive and take longer to train than logistic regression, especially with large datasets and many features.
b. Interpretability can be more challenging, as GBDT models are an ensemble of decision trees, making it harder to understand the contribution of individual features.
c. Prone to overfitting if not properly tuned or regularized.
In general, if the primary goal is interpretability and the relationship between the features and the binary outcome is expected to be relatively simple and linear, binary logistic regression may be the better choice. However, if the primary goal is predictive accuracy, and the relationships between the features and the binary outcome are expected to be more complex and non-linear, GBDT models are usually a better choice.
In practice, it is often beneficial to try both models, and possibly others, to compare their performance on your specific problem and data. Additionally, you may consider using techniques such as LASSO or Ridge regularization with logistic regression or feature importance analysis with GBDT models to improve interpretability and model performance.
• asked a question related to Logistic Regression
Question
What is the difference between univariate and multivariate logistic regression?
Just noticed that the authors of the presentation I linked above used multivariate/multivariable wrongly. Here is a link to a better source, also citing the paper linked by Bruce and being in line with Béatice's comment:
• asked a question related to Logistic Regression
Question
When other significant variables were added in final model, one of the variables turned out non significant in logistic regression. Should I do any additional analysis for this or should I do separate logistic regression by splitting the data (according to the site)?
Hello Sri,
The most likely answer is that the information carried by the variable is redundant with information carried by other variables in the full logistic model (e.g., collinearity).
In such cases, a candidate IV can show relationship with the DV when evaluated by itself, but not when included with other IVs in a more complete model.
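One quick way to check for this kind of redundancy is a variance inflation factor (VIF). In the special case of two predictors, VIF = 1 / (1 - r^2), where r is their Pearson correlation. With more predictors, r^2 is replaced by the R^2 from regressing one predictor on all the others. The sketch below covers only the two-predictor case, and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model contains exactly these two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical, nearly redundant predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
vif = vif_two_predictors(x1, x2)
print(round(vif, 1))
```

A large VIF suggests that the two variables carry overlapping information, which would be consistent with a predictor losing significance once the other is in the model.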
• asked a question related to Logistic Regression
Question
Is it possible to determine the p-trend value of ORs in IBM SPSS 25? Please!!!
My dependent variable is Cancer (yes/no) and the independent variable is inflammatory diet (in quartiles) and several other variables that entered as adjustments, in two different models. I evaluated the influence of these other variables on the ORs of each quartile of the inflammatory diet.
I need to determine the p-trend by evaluating the dose-response across quartiles for each model.
Does anyone know the commands?! I've tried every way and it didn't work.
• asked a question related to Logistic Regression
Question
Hello, Does anyone know how to determine the p-trend of ORs in SPSS? please.
Samuel Oluwaseun Adeyemo my version is 25.. I didn't find those commands 8 and 9 in it.
• asked a question related to Logistic Regression
Question
I have seen this question asked previously on Research Gate. And the answer is no. However, I may be conflating terms such as "transformation", "normalization" with "linearity in the logit" for any continuous independent variable--one of the assumptions in logistic regression. As in here:
Hello Sara,
For a continuous IV, logistic regression requires no assumption about the shape of the distribution for scores on the variable.
However, the presumed relationship between scores on the IV and the log-odds of the target outcome (DV occurrence) is linear in form. Otherwise, the resultant LR coefficient (or its exponentiated value as an estimate of change in OR per unit change in IV) wouldn't make much sense.
• asked a question related to Logistic Regression
Question
Which is better in baseline bivariate data analysis, the Chi-Square test or Logistic regression? AND why? As you know both of them give the same value of OR in 2x2 table of binary data.
The choice between the Chi-Square test and Logistic regression in baseline bivariate data analysis depends on the research question and the nature of the data being analyzed. Both tests are used to analyze categorical data and can provide valuable information in different situations.
The Chi-Square test is a non-parametric test used to compare the frequencies of two or more categorical variables. It can be used to test for associations between two categorical variables and can be used to assess the independence of two categorical variables. The Chi-Square test can be a useful tool in descriptive data analysis and can provide information on the strength and direction of the relationship between two categorical variables.
Logistic regression, on the other hand, is a parametric regression model used to model the relationship between a binary response variable and one or more predictor variables. It can be used to assess the association between a binary outcome variable and a set of predictor variables, including categorical and continuous variables. Logistic regression can be used to predict the probability of the outcome variable given the predictor variables, and it can also be used to identify important predictor variables in the model.
In general, if the research question is focused on the association between two categorical variables, and there is no need to model the relationship between the variables or control for confounding variables, the Chi-Square test may be a more appropriate choice. On the other hand, if the research question is focused on predicting a binary outcome variable and identifying important predictor variables, then logistic regression may be a more appropriate choice.
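The point raised in the question, that both approaches give the same OR for a 2x2 table, can be sketched as follows. This is a minimal illustration with hypothetical cell counts, using only the Python standard library: the sample odds ratio is a*d/(b*c), and its logarithm is the slope a logistic regression with that single binary predictor would estimate. The chi-square statistic shown is the Pearson version without continuity correction.

```python
import math

def two_by_two_summary(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Slope that a logistic regression with this single binary predictor estimates.
    log_odds_slope = math.log(odds_ratio)
    return odds_ratio, chi2, p_value, log_odds_slope

# Hypothetical counts, for illustration only.
odds_ratio, chi2, p_value, slope = two_by_two_summary(30, 20, 15, 35)
print(odds_ratio, chi2, p_value, slope)
```

The two methods differ in what else they can do (adjustment, continuous predictors), not in the OR they report for a single 2x2 table.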
• asked a question related to Logistic Regression
Question
Hello. I need to perform a logistic regression with a small sample (n=20) to determine predictive factors for an event. The only problem is that a few data points skew the whole sample, giving me a massive odds ratio of >200. This is not a realistic result, and I am wondering how best to deal with it. How does one properly perform a logistic regression with a small sample and high variance in certain variables? Does anyone have any tips or tricks?
I am sorry to say that you need more observations.
• asked a question related to Logistic Regression
Question
I would like to know if I am wrong by doing this. I made quartiles out of my independent variable and from that I made dummy variables. When I do linear regression I have to record the betas with 95%CI per quartile per model (I adjust my model 1 for age and sex). Can I enter all the dummies into the model at the same time or do I have to enter them separately (while also adjusting for age and sex for example)?
So far I entered all the dummies and adjusted for age and sex at the same time but now I wonder whether SPSS doesn't adjust for the second dummy variable and the third.. So I think I need to redo my calculations and just run my models with one dummy in each.
Thank you.
What you are looking for is called linear regression. The good news is that linear regression is quickly done and easy to interpret. It will also give you more statistical power, since you lose information by categorizing.
And don't worry, I've seen this categorization nonsense done by seasoned professors (and sometimes forced upon their students). It comes from a past era, when you literally had to crunch the numbers with pencil and paper, which is easier with categories.
• asked a question related to Logistic Regression
Question
Firth logistic regression is a variant of the usual logistic regression that handles separation and quasi-separation issues. To understand Firth logistic regression, we have to go one step back.
What is logistic regression?
Logistic regression is a statistical technique used to model the relationship between a categorical outcome (predicted) variable y (usually binary: yes/no, 1/0) and one or more independent (predictor) x variables.
What is maximum likelihood estimation?
Maximum likelihood estimation is a statistical technique for finding the model that best represents the relationship between the outcome and the independent/predictor variables in the underlying data (your dataset). The estimation process evaluates how probable the observed data are under different candidate models (sets of coefficients) and then selects the model that maximizes this probability.
What is separation?
Separation means an empty bucket on one side! Suppose you are trying to predict meeting physical activity recommendations (outcome: 1/yes and 0/no) and you have three independent or predictor variables: gender (male/female), socio-economic condition (rich/poor), and incentive for physical activity (yes/no). Suppose one combination (gender = male, socio-economic condition = rich, incentive for physical activity = no) always predicts not meeting the physical activity recommendation (outcome: 0/no). This is an example of complete separation.
What is quasi-separation?
Reconsider the above example. We have 50 adolescents with the combination gender = male, socio-economic condition = rich, incentive for physical activity = no. For 48 or 49 of them (not exactly 50, but close), the outcome is "not meeting physical activity recommendation" (outcome: 0/no). This is an instance of quasi-separation.
How may separation or quasi-separation impact your night's sleep?
When separation or quasi-separation is present in your data, traditional logistic regression keeps increasing the coefficients of the predictors/independent variables without limit (loosely speaking, towards infinity) to establish the bucket theory: one of the outcomes is completely or nearly empty. When this anomaly happens, it suggests that the traditional logistic regression model is not suitable here.
There is a bookish name for the issue: a convergence issue. But how do you know that convergence issues have occurred with the model?
- Very large coefficient estimates. The estimates could be near infinite too!
- Along with large coefficient estimates, you may see large standard errors too!
- It may also happen that the logistic regression tried several times (known as iterations) but failed to find the best model or, in bookish language, failed to converge.
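The symptoms listed above can be reproduced with a toy example. The sketch below is only an illustration under stated assumptions: it uses plain gradient ascent on the ordinary (unpenalized) log-likelihood with a single slope and no intercept, whereas real software typically uses Newton-type iterations, and it does not implement the Firth correction itself. The data and the function name fit_slope are made up for the demonstration.

```python
import math

def fit_slope(xs, ys, steps, lr=0.5):
    """Plain gradient ascent on the logistic log-likelihood (slope only, no intercept)."""
    b = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-b * x))  # predicted P(y = 1)
            grad += (y - p) * x                 # score contribution of this case
        b += lr * grad
    return b

xs = [-2.0, -1.0, 1.0, 2.0]
separated = [0, 0, 1, 1]    # x < 0 always gives 0, x > 0 always gives 1
overlapping = [0, 1, 0, 1]  # outcomes are mixed, so no separation

for steps in (100, 10000):
    print(steps, fit_slope(xs, separated, steps), fit_slope(xs, overlapping, steps))
```

With the overlapping outcomes the slope settles at a finite value; with the separated outcomes it is still growing after 10000 iterations, which is the "coefficient without limit" behaviour described above.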
What to do if such convergence issues have occurred?
Forget all the hard work you have done so far! You have to start a new journey with an alternative logistic regression, known as Firth logistic regression. But what does Firth logistic regression actually do? Without using many technical terms, it adds a penalty to the likelihood so that the coefficients stay finite, which leads to more reliable coefficients and ultimately helps you choose the best representative model for your data.
How to conduct Firth logistic regression?
install.packages("logistf")
library(logistf)
Now, assume you have a dataset "physical_activity" with a binary outcome variable "meeting physical activity recommendation" and three predictor/independent variables: gender (male/female), socio-economic condition (rich/poor), and incentive for physical activity (yes/no).
pa_model <- logistf(meet_PA ~ gender + sec + incentive, data = physical_activity)
Now, display the result.
summary(pa_model)
You got log odds. Now, we have to convert them into odds ratios.
odds_ratios_pa <- exp(coef(pa_model))
print(odds_ratios_pa)
Game over! Now, how to explain the result?
Don't worry! There is nothing special. The interpretation of a Firth logistic regression result is the same as for the traditional logistic regression model. However, if you are struggling with the interpretation, let me know in the comments. I will try my best to reduce your stress!
Note: If you find any serious methodological issue here, my inbox is open!
Hi David Eugene Booth , you are right! I have fixed it. Thanks a lot
• asked a question related to Logistic Regression
Question
Dear Team,
I am running a multinomial logistic regression model on a fictitious dataset before I apply the same model to my real data.
Here I am trying to predict, based on some scores and economic group, whether a person will go for diploma, general, or honors.
The code is below:
m11$prog2 <- relevel(m11$prog, ref = "honors")
Error in relevel.factor(m11$prog, ref = "honors") :
'ref' must be an existing level
I have tried searching on SO and nabble but did not find an answer that could help.
Please suggest what is incorrect. Also checked the class of the var and is a factor variable.
I am experiencing the same issue. Could not figure out why.
Error in nnet::multinom(OverallCondition ~ ., data = data_house, family = "binomial") :
need two or more classes to fit a multinom model
In nnet::multinom(OverallCondition ~ ., data = data_house, family = "binomial") :
groups ‘Poor’ ‘Average’ ‘Good’ are empty
I am sitting with the same error around 2 hrs.
• asked a question related to Logistic Regression
Question
1. Logistic regression with backward elimination
2. Penalised Lasso regression
Forgot another paper you should have a look at. Best wishes David Booth
• asked a question related to Logistic Regression
Question
I have data collected from medical residents (a single group, binary outcome with categorical predictors) with a pre-test and post-test on the same group and a sample size of 21. For this type of study design I normally fit a repeated-measures model for binary outcomes using Proc Genmod or Proc Glimmix. Instead, I am thinking of treating the two measurements as independent groups, using the pre-test group as the comparison group, and simply running a logistic regression and reporting odds ratios, since my outcome is binary. Any thoughts or suggestions on this?
Hello Kris,
Have you published the results of this study? are the data available? I'm looking for this kind of data for a certain methodology that I am proposing.
Thanks.
• asked a question related to Logistic Regression
Question
Dear,
I have conducted a study in which 18 patients were included. I ran a logistic regression; the model is significant but none of my predictors are. My R square is 1, which is also a bit strange. What do I need to do here? How do I report this?
You have quasi-complete or complete separation. Ideally you need more data - or to use an approach that adds more information (e.g., Bayesian regression - though other options exist).
• asked a question related to Logistic Regression
Question
Hello everyone.
Recently I have been fitting a logistic regression model. The outcome Y has 40 subjects but only 10 events (Y = 1). I hope to figure out whether predictor A is associated with Y after adjusting for 4 variables, so I have included 5 variables as X. I know that, according to the EPV rule, this logistic model could only have 2 variables. But adjusting for the other 4 covariates is essential, and after adjusting for them A turns out to be significant in the model, which is good.
However, the odds ratio CI is very wide [1, 220]. When I use a penalized GLM ('logistf' or 'brglmFit' in R), the CI becomes [1, 35]. That is better, but still too wide. I'm afraid such a wide CI does not support the reliability of my results.
(all the covariates have been standardized)
Take a look at the unadjusted odds ratio and see if the precision is better. Also see if the unadjusted point estimate is any different from the adjusted estimate. Adjustment is intended to reduce bias, but in exchange you are getting high variance. If your unadjusted ratio has lower variance and the ratios don't differ much, then you're making a bad trade between bias and variance. Just live with a more precise, but somewhat biased, estimate.
• asked a question related to Logistic Regression
Question
Hi,
I am new to quantitative research and wanted to use logistic regression to calculate a probability. However, I found that my data are not normally distributed and therefore I should use a non-parametric test. I used SPSS but am willing to learn any new software if it lets me analyze my non-parametric data to obtain the probability. Is there any software that could help solve my problem, or should I just ignore the fact that my data are non-parametric?
Logistic regression does not assume normality. You can conduct a logistic regression with these data in SPSS.
• asked a question related to Logistic Regression
Question
Example Scenario: 1 categorical variable (Yes/No), 3 continuous dependent variables
- 3 independent-samples t tests are conducted (testing for group differences on the three variables); let's assume 2 of the 3 variables are significant with medium-to-large effect sizes
- a binomial logistic regression is conducted with the significant predictors (for classification purposes and to assess predictor strength by converting unstandardized beta weights to standardized weights)
Since 4 tests are planned, alpha would be set at .0125 (.05/4) via the Bonferroni correction. Should the adjusted alpha also be applied to the p-values for the Wald tests in the "Variables in the Equation" output?
It's essentially analogous to ANOVA or ANCOVA in this context, so logically one should consider this. However:
- with three categories you get complete protection against the partial null hypotheses if the overall test of the effect (probably the LRT for the factor, analogous to the F test) is significant. This is the logic of Fisher's LSD, so no correction would be needed if you relied on the overall test of the effect for Type I error protection.
- with four or more categories this property no longer holds, but it's not a good idea to use the Bonferroni correction as it's rather conservative. I'd suggest a modified Bonferroni procedure such as the Hommel or Hochberg correction. Hochberg can be done by hand easily, but it and the Hommel correction are also implemented in R with the p.adjust() function. As the input is just a set of uncorrected p values, you can use output from R or another package very easily:
> p.vals <- c(.0012, .043, .123, .232)
This uses Wright's adjusted p value approach rather than altering alpha, but the decisions are equivalent:
Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics, 48, 1005–1013. doi:10.2307/2532694.
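The call that followed the p.vals assignment appears to be missing from the post; in R it would presumably be p.adjust(p.vals, method = "hochberg"). For readers who want to see what the "by hand" Hochberg calculation involves, here is a small sketch in Python using the same four p values. The function name hochberg_adjust is just a label chosen for this illustration.

```python
def hochberg_adjust(p_values):
    """Hochberg step-up adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices sorted from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # The k-th largest p-value (k = rank + 1) is multiplied by k;
        # the running minimum keeps the adjusted values monotone.
        running_min = min(running_min, p_values[idx] * (rank + 1))
        adjusted[idx] = running_min
    return adjusted

p_vals = [0.0012, 0.043, 0.123, 0.232]
adjusted = hochberg_adjust(p_vals)
print(adjusted)
```

Each adjusted value is then compared with the unadjusted alpha (e.g., .05), which gives the same decisions as altering alpha.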
• asked a question related to Logistic Regression
Question
Please provide a comparison of how cumulative probability and the outcomes (PCP, ACP, EST, PRE) can be estimated by Excel vs. SPSS for the attached Ordinal logistic regression dataset. Kindly, give a detailed example?
Given that you mentioned SPSS, this tutorial might also be helpful:
• asked a question related to Logistic Regression
Question
Hello, I have a question regarding using a binary-coded dependent variable on the Mann-Whitney U test.
I have a test with 15 questions from 3 different categories in my study. The answers are forced answers and have one correct answer. I coded the answers as binary values with 1 being correct and 0 being incorrect.
Therefore, for 3 different categories, the participants have a mean score between 0 and 1 representing their success (I took the mean because I have many participants who did not answer 2 or 3 questions).
Does it make sense to put a mean of binary coded value as a dependent variable on a nonparametric test or it sounds weird and I should apply something else like chi-square or logistic regression?
It depends, but first, why did some people not answer some questions? Were these not presented to them by some random mechanism, or were they the last few (and perhaps easier or harder than others), or did people see them but decide not to answer because they thought they would miss them or it would take too long to answer them, etc.? Second, I am not sure what the three categories mean. Is it that they got, say, 5 items on math, 5 on reading, and 5 on science? Or that there were three groups? The choice of statistic will depend on this. Third, what research questions do you have? And finally, Mann-Whitney is usually used for two categories of people (i.e., two groups).
• asked a question related to Logistic Regression
Question
I have prepared my data in SPSS complex sampling. I have applied univariate logistic regression and considered those variables for multivariate logistic regression which have p values less than 0.05 in univariate. But I would like to know, is there any option in SPSS complex sampling where I apply forward selection, backward elimination, or stepwise logistic regression?
Yes, there's almost certainly a way to do all of these things, but NEVER ever use them because they don't work. Read the attached papers to see why and how they don't work. In addition, alternative approaches that do work are mentioned. David Booth
• asked a question related to Logistic Regression
Question
Dear Researchers
I am doing a multinomial logistic regression using the data from the National Survey on Drug Use and Health 2021. I'm a novice with R and I'll probably need to figure out pretty much everything while I'm doing it, so I hope it's okay I'll just post further questions in this topic.
Now I ran into a problem trying to mutate a numeric variable (K6 scale points, values between 0 and 24) into 3 different groups. Basically, I want groups that have points between 13-24, between 5-13 and between 0-5.
This is the error message I got:
"Error: '=>' is disabled; set '_R_USE_PIPEBIND_' envvar to a true value to enable it"
I have no idea what this means.
I tried to create the groups like this:
mutate(high_k6=case_when(K6SCMON>=13~TRUE, K6SCMON<13~FALSE)
(moderate_k6=case_when(K&SCMON >=~TRUE, K6SCMON >~FALSE)
(low_k6=case_when(K6SCMON =>5~FALSE))
This works fine with 1 group only but apparently not with 3.
Is there a better way to do it?
Thanks
pls try this and let me know
mutate(k6scale = cut(K6SCMON, breaks=c(-1,4,12,25), labels=c("Low k6", "Moderate k6", "High k6")))
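A note on the error itself: `=>` is not a comparison operator in R (the comparison is written `>=`), which is why the parser complains about the disabled pipe-bind syntax. The binning logic of the cut() answer can also be sketched outside R. The following Python illustration assumes the same cut-offs as the breaks above (0-4 low, 5-12 moderate, 13-24 high); the function name k6_group and the constants are made up for the example.

```python
from bisect import bisect_right

# Lower bounds of the "Moderate" and "High" groups; assumed cut-offs matching
# the breaks c(-1, 4, 12, 25) in the R answer (0-4, 5-12, 13-24).
LOWER_BOUNDS = [5, 13]
LABELS = ["Low k6", "Moderate k6", "High k6"]

def k6_group(score):
    """Map an integer K6 score (0-24) to one of three non-overlapping groups."""
    if not 0 <= score <= 24:
        raise ValueError("K6 score must be between 0 and 24")
    return LABELS[bisect_right(LOWER_BOUNDS, score)]

for score in (0, 4, 5, 12, 13, 24):
    print(score, k6_group(score))
```

A single three-level variable like this avoids the overlapping boundaries (a score of 5 or 13 falling into two groups) that separate TRUE/FALSE columns can produce.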
• asked a question related to Logistic Regression
Question
I am creating a risk score from some variables using the following steps:
1- Dividing data into training and validation cohorts.
2- Selecting the variables (p<0.05) in the fully adjusted model.
3- Transforming Bs into scores.
4- ROC curve.
5- Calibration using the validation cohort.
I have problems with the last 3 steps. I am using SAS. So I will be grateful if you can give me sources for the codes.
• asked a question related to Logistic Regression
Question
I tested multiple linear regression analysis with my Likert scale data and it violated the normality assumption of OLS, after that I found ordinal logistic regression and tested but the p-value of parallel lines and goodness of fit(Pearson) is less than 5%. What to do?
Could you share your raw data so we could have a look at them?
• asked a question related to Logistic Regression
Question
Suppose you are conducting a logistic regression including two continuous predictors. The first predictor does not contain intrinsically meaningful units (i.e., no meaningful 0). The second predictor does have intrinsically meaningful units and a meaningful 0 (i.e., dollars). You want to examine the simple effects of each predictor and also the interaction effect.
Should you center the first predictor? Both predictors? Leave both in raw units?
Thanks,
BT
Hi Bruce Weaver , yes I agree :)
• asked a question related to Logistic Regression
Question
I have calculated Cox Regression in SPSS (HR) but is there any way of calculating RR in SPSS?
Bhogaraju Anand Thank you very much. was a great help. :)
• asked a question related to Logistic Regression
Question
I am interested in extracting the actual probability values (between 0 and 1) from a logistic regression curve (sigmoid curve) in Python, as shown in pink in the attached image.
You can use model.predict_proba(x_test) to get the probability values if using the sklearn library.
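To make clear what those probability values are, the sketch below computes them by hand with the standard library only: the probability of class 1 is the sigmoid of the linear predictor, and the probability of class 0 is its complement. The intercept and coefficients are hypothetical values chosen for illustration, and logistic_proba is a made-up helper, not part of sklearn. For a binary sklearn LogisticRegression, predict_proba returns one column per class in the order of model.classes_, so the second column is usually the curve the question asks about.

```python
import math

def logistic_proba(rows, intercept, coefs):
    """Return [P(y=0), P(y=1)] per row for a binary logistic model."""
    out = []
    for row in rows:
        z = intercept + sum(c * x for c, x in zip(coefs, row))  # linear predictor
        p1 = 1.0 / (1.0 + math.exp(-z))                         # sigmoid
        out.append([1.0 - p1, p1])
    return out

# Hypothetical fitted values and test rows, for illustration only.
intercept, coefs = -1.0, [0.8, -0.5]
x_test = [[0.0, 0.0], [2.0, 1.0], [5.0, 0.5]]
probs = logistic_proba(x_test, intercept, coefs)
for row, p in zip(x_test, probs):
    print(row, p)
```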
• asked a question related to Logistic Regression
Question
I am conducting analysis on timely utilization of ANC and number of ANC by social demographic and husband characteristics of women.
The dependent variables are two:
Number of ANC - categorical (no ANC, don't know, less than 3, 4-7, greater than 8)
Time of ANC - months of initiation (early, late, don't know, no ANC).
The independent variables are numerous, consisting of the women's socio-demographic information and husband characteristics.
I have found that the main method I am going to use is multinomial logistic regression, but I have a question.
Should I run the two dependent variables, time of ANC and number of ANC, at the same time with the independent variables using multinomial logistic regression?
Or should I run number of ANC with the socio-demographic and husband characteristics, and time of ANC with the socio-demographic and husband characteristics, separately using multinomial logistic regression?
Multinomial logistic regression is a model of analysis in which the dependent variable is categorical with more than two categories and has no ordinal property. Therefore, you should analyze these two variables separately using multinomial logistic regression.
• asked a question related to Logistic Regression
Question
Hello,
If anyone can help: I don't understand the "+inf" shown as the upper limit of the confidence interval for IV.2, why it appears here, and how to interpret it. Also, what does median unbiased estimate (MUE) mean?
Is there anything wrong with the exact logistic regression I performed?
Thank you Dr. David.
• asked a question related to Logistic Regression
Question
Hi,
If there is a binary/nominal dependent variable and one wants to do a logistic regression analysis. In that case, for a given categorical predictor, how many minimum observations are required per category to conduct analysis? Should I include independent variables that have low frequencies in some categories of response?
For eg, if dependent variable is treatment received on time (1=yes, 0=no) and a categorical predictor of education status has 3 categories ( illiterate=0, primary=1, >secondary=2). In this case, if there were only 12 illiterate women who were treated on time but the rest categories across the DV had a decent sample (>=30). Can we still use education as a predictor for this analysis? Please share any references of rules/guidelines.
Hello Marian,
Here's the glib rule: More is better as regards numbers of cases observed for each level of a categorical variable. Your resultant parameter estimates will be better as a result. There are various guidelines offered, but these are generally guidelines and not commandments. Here are some general observations:
1. You want at least some cases in each level of a categorical variable. Otherwise, you have insufficient information to make inferences about one or more of the categories/levels. If some levels are genuinely rare in the population, and your sample reflects this, then you may find alternative models (e.g., Poisson) to be preferable.
2. Whether it makes sense to use an unordered, k-level categorical variable (when k > 2) in your analysis without re-expressing as (k - 1) dummy variates hinges on the type of analysis you plan to conduct. In a chi-square test, it could work; in a logistic or ordinal regression model, it will not.
3. It can make sense to over-sample rare categories in order to help with precision of parameter estimates.
4. In the context of chi-square contingency tables, the usual advice is to have sufficient frequencies such that expected cell frequencies are at least 5.
5. One option would be to run bootstrap estimates of standard errors for each estimated coefficient, and if that SE exceeded some threshold, then judge that variable's contribution as indeterminate (e.g., more data required).
• asked a question related to Logistic Regression
Question
Dear Scholars,
I am currently working on HPV and cervical cancer self-reporting data.
I want to evaluate the risk values of the population using the risk factors of the diseases mentioned above.
I want to categorize the risk as low-risk or high-risk depending on the OR.
The condition is;
For high-risk, I use the odds ratio values, on the condition that OR > 1.
If OR < 1, then low-risk.
If OR = 1, the risk for those who are exposed and those who are unexposed to a risk factor is similar.
Hence, it is best to say that if OR < 1 or OR = 1, then OR ≤ 1, which means low-risk.
The problem now is that I have some negative coefficients from the logistic regression, and some results from the risk estimate are > 1 (in some cases 3, 4, 7, 32).
Please, let me know if it is possible to have a one-on-one chat with an expert to discuss the problem.
Regards,
Melvin.
I am working on a same kinda study and ready to collaborate.
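One point that may help with the negative coefficients mentioned in the question: logistic regression coefficients are on the log-odds scale, and the odds ratio is obtained by exponentiating them, so an OR is always positive and a negative coefficient simply corresponds to an OR below 1. A minimal sketch, using hypothetical coefficient values and the classification rule stated in the question:

```python
import math

def risk_label(coefficient):
    """Apply the rule from the question: OR > 1 is high-risk, OR <= 1 is low-risk."""
    odds_ratio = math.exp(coefficient)  # coefficient is a log-odds ratio
    label = "high-risk" if odds_ratio > 1 else "low-risk"
    return odds_ratio, label

# Hypothetical coefficients, for illustration only.
for b in (-0.7, 0.0, 1.2):
    print(b, risk_label(b))
```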
• asked a question related to Logistic Regression
Question
For the variable selection of my binary logistic regression model, I am performing a Box-Tidwell test. I am doing so because linearity between the independent variables and the log-odds is an assumption of logistic regression. The variables that are significant will not be taken into account in my logistic regression.
However, some of my independent variables contain zeroes or negative values, rendering the log-odds transformation infeasible. How should I deal with these (non-logit transformed) variables? Ultimately, I am performing this test as part of the variable selection process to reduce the number of variables from 300+ to a number that is easier to handle and to better interpret the individual effects that my variables have on my dependent variable. I am also testing for multicollinearity as logistic regressions assume little correlation between the independent variables.
If my phrasing is somewhat unclear, I'd be happy to clarify.
Ps I have 120K observations of my 330 variables
It seems that you are a bit confused. Let the regression equation look like this: logit(y) = a + b*x1 + c*x2 + .... The linearity requirement on the right-hand side is that it be a linear function of the regression coefficients, not of the Xs. On the left-hand side, linearity of the logit function means that linear data stay linear under the transformation to logit(y). This is important only for psychological measurement data and survey responses. You don't seem to need to test either of these conditions, and you don't transform any of the Xs because you don't have a reason to do so.
Testing for multicollinearity is a good idea, and I recommend using VIFs. Variable selection is a complicated problem; I have attached below a paper that indicates how to do it and an R program that you may use. Best wishes, David Booth
• asked a question related to Logistic Regression
Question
In my analysis I get some very large OR values, and when I run logistic regressions on each variable separately I get the same values. Is there any way to remedy such OR values, which I don't know how to interpret? Is it due to too many variables, or to missing values, and is there any point in reporting such OR values?
A ratio can become huge when the denominator is close to zero.
Don't look only at the point estimate. Always consider the entire confidence interval. Even when the point estimate is huge, the lower limit of the confidence interval may still be quite close to 1. This case is not uncommon and would indicate that your data do not provide much information about the odds ratio (as the data are compatible with odds ratios of any size, from almost 1 to almost infinity). Sometimes the confidence interval even includes values lower than one. In this case your data do not even provide enough information to say whether the treatment is beneficial or hazardous. In combination with a wide confidence interval, this is then a quite useless result (a result from which no useful conclusions can be drawn, because there is too little useful information in the sample data).
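The situation described above can be illustrated with a small 2x2 table. The sketch below uses hypothetical counts and the usual Wald interval on the log scale, where the standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d); the function name odds_ratio_ci is chosen only for this example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table with cells a, b, c, d (all > 0)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 5/5 events/non-events among the exposed, 1/9 among the
# unexposed. The single unexposed event makes the denominator b*c small.
odds_ratio, lower, upper = odds_ratio_ci(5, 5, 1, 9)
print(odds_ratio, lower, upper)
```

Here the point estimate is large, yet the interval runs from below 1 to a very large value, so the data say little about the size or even the direction of the effect.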
• asked a question related to Logistic Regression
Question
Different researchers use different p value cut off points e.g. p<0.25, 0.2, and others include some variables without such restriction if authors believe the variables are significant.
What type of variables are included in multivariate logistic regression analysis? Does this always work?
Thank you Booth, materials you shared are quite interesting
• asked a question related to Logistic Regression
Question
Dear scholars,
I want a statistical model to analyze my data on a rare disease (asymptomatic or submicroscopic malaria). I would like a consultation with experts in the field.
I am convinced that logistic regression is not suitable for my study, although dozens of published articles have used it. I want to look at it in a different way.
Abdissa B.
Abdissa Biruksew Hordofa What is the hypothesis you want to test?
• asked a question related to Logistic Regression
Question
I have 12 variables (9 categorical and 3 continuous) entered into a multinomial logistic regression with a backwards stepwise deletion approach. 3 variables were excluded from the model for non-significance, as indicated in my step summary. The final multinomial logistic regression model fit was significant, χ²(20) = 210.541, p < .001, which suggests that the independent variables included significantly predict my outcome. However, my output shows conflicting information after that, and I am finding the interpretation challenging.
1. Goodness of fit shows significant Pearson (p = .002) and deviance = 1.000.
2. Remaining 8 variables all significant at p < .05 in the likelihood ratio test BUT in the parameter estimates some of the independent variables are not showing up as significant in either model.
What does this conflicting output indicate?
In primis: is your sample big enough?
In secundis: to verify the adaptability of a continuous curve to the observed data, it is good to calculate R square.
Finally, it is good to carry out the decomposition of the deviance (variance) to verify the variability within and between the groups you selected in the sample.
• asked a question related to Logistic Regression
Question
When running binary logistic regression in SPSS, we get Exp(B) in the output, which we interpret as the odds ratio. My question is whether this Exp(B) should be considered the crude odds ratio (COR) or the adjusted odds ratio (AOR).
How to get both COR and AOR? What are the steps in SPSS to get both?
Is there any role for the stepwise method in obtaining the AOR? That is, if we run the regression with the Enter method we get ORs for all variables, but with the backward or forward method we get the strong predictors. Can we say that the OR obtained by the Enter method is the COR, while the OR obtained by the stepwise method is the AOR?
Hello Sujal Parkar. When people say "stepwise", they are usually referring to an algorithmic variable selection method that tends to produce overfitted models. See this Stata FAQ for a long list of reasons why you should avoid algorithmic variable selection methods like that:
But I think you might actually be thinking about entering the focal explanatory variable in Block 1, and adding the adjustment variables in Block 2. If so, that would be called "hierarchical" regression in the statistical vernacular I speak. And that will indeed show you the crude OR for your focal variable in Block 1 and an adjusted OR in Block 2. Here is an example using one of the "sample" datasets that comes with SPSS.
NEW FILE.
DATASET CLOSE ALL.
* Modify the path in the GET FILE command to point
  to the folder where the sample datasets are stored.
GET FILE = "C:/SPSSdata/survey_sample.sav".
FREQUENCIES sex wrkstat.
* Let new variable workft be an indicator
* for working full-time (1=Y, 0=No).
COMPUTE workft = wrkstat EQ 1.
FORMATS workft (F1).
CROSSTABS wrkstat BY workft.
* Let new variable male be an indicator for male sex (1=Y, 0=N).
COMPUTE male = sex EQ 1.
FORMATS male (F1).
CROSSTABS sex by male.
* Estimate a model with X=male and Y=workft in Block 1,
* then add age and years of education to the model in Block 2.
LOGISTIC REGRESSION VARIABLES workft
/METHOD=ENTER male
/METHOD=ENTER age educ
/PRINT=CI(95)
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
* The OR for male in Block 1 is the crude OR;
* the OR for male in Block 2 is adjusted for age and years of education.
HTH.
• asked a question related to Logistic Regression
Question
Can you still have a good model despite a p-value < .05 for the H-L goodness of fit test? Any alternative testing in SAS or R?