
# Logistic Regression - Science topic

Explore the latest questions and answers in Logistic Regression, and find Logistic Regression experts.
Questions related to Logistic Regression
• asked a question related to Logistic Regression
Question
I see in some discussions people saying logistic regression is a general linear model, while others say that it is not a linear model. Looking at the math behind it, I think that it is not a linear model.
A logistic model can be linear or non-linear. If you use a linear predictor function, then it's a "linear model". But you can also use a nonlinear predictor function, in which case it is a "non-linear model". The predictor function is linked to the expected value (µ) by a link function. In a logistic model this link function is (usually) the logit.
Example:
logit(µ) = β0 + β1*X1 + ... + βk*Xk
is linear (in the parameters β - which are all simple coefficients here)
logit(µ) = sin( β0 + β1*X1 )^β3 + β4*X2
is not linear (not all parameters β can be expressed as simple coefficients).
The same relation between linear and nonlinear applies to other link functions (here it was the logit function), like the log ("log-linear" or "Poisson" model), the inverse, or the identity function (that is, the linear predictor represents µ directly; no transformation).
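To make the linear case concrete: the linear predictor is mapped back to a probability through the inverse logit. A minimal Python sketch with made-up coefficients (b0, b1, and x1 are purely illustrative):

```python
import math

def inv_logit(eta):
    """Map a linear predictor (log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for logit(mu) = b0 + b1*X1 (linear in the betas).
b0, b1 = -1.0, 0.5
x1 = 2.0
eta = b0 + b1 * x1    # linear predictor: -1.0 + 0.5*2.0 = 0.0
mu = inv_logit(eta)   # expected probability: 0.5
```

The same mechanic applies with other link functions; only the mapping from linear predictor to µ changes.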
• asked a question related to Logistic Regression
Question
If multivariate logistic regression is calculated in a study, is there any need to still calculate the adjusted multivariate logistic regression? If yes, is it appropriate to show both of them in research papers?
Hello Puneet,
Like Jamie Wallis , I don't attach any special meaning to "adjusted multivariate logistic regression" vs. "multivariate logistic regression;" they are the same.
1. A number of studies will report both the crude (one IV, one DV) odds ratios and the adjusted odds ratios (full set of IVs, one DV) for logistic regression models. Please note that the latter can differ from the former because of overlap among IVs, as well as the potential for suppressor variables.
2. It may seem trivial, but the correct term would be "multiple" or "multivariable" logistic regression here, not "multivariate." Among statisticians, multivariate refers to the case of having more than one DV in a model; univariate refers to the case of a single DV in a model.
• asked a question related to Logistic Regression
Question
I am working on a logistic regression model, with a binary outcome and my model look like this:
Intercept: coefficients is -2.034
Variable: coefficients are 0.031, and the P-value is 0.130. Not Significant.
However when I calculate the Odds Ratio and CI 95% I get
Odds ratio 1.031, Lower Limit = 0.991, Upper Limit = 1.073
However, since the variable is not significant, I shouldn't get a P-value greater than 1 correct?
Can anyone explain why this is happening?
I'm not exactly sure what you are asking, but a logistic regression slope coefficient of 0 (corresponding to an OR of 1) indicates no relationship. Your regression slope is not significantly different from zero at the .05 level (p = .13), and your 95% CI for the OR therefore includes 1.0. Both these values thus indicate that the relationship is not statistically significantly different from zero.
If your regression slope coefficient were significant at the .05 level (i.e., p < .05), you would expect to see a 95% CI for the OR that does not include 1.0.
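For illustration, the reported OR, CI, and p-value are mutually consistent and can be reconstructed from the coefficient. The standard error below (~0.0202) is an assumption, back-calculated from the reported CI, not a figure from the original output:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b = 0.031     # reported slope coefficient
se = 0.0202   # ASSUMED standard error, back-calculated from the reported CI

or_point = math.exp(b)               # ~1.031
ci_low = math.exp(b - 1.96 * se)     # ~0.991, below 1
ci_high = math.exp(b + 1.96 * se)    # ~1.073, above 1
p_value = 2.0 * (1.0 - norm_cdf(b / se))   # two-sided Wald p, ~0.13
```

The CI straddling 1 and the p-value above .05 are two views of the same Wald test, which is why they always agree.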
• asked a question related to Logistic Regression
Question
Hi All,
I hope everything is fine.
I'm running various separate logistic regressions on my ordinal response variable (score). I did not include all my (continuous) independent variables (10) in one single model as the model wouldn't run. So, I have a separate model for variable 1 and the score, variable 2 and the score, variable n and the score.
So, my question is whether it is possible to use Bonferroni correction to adjust the significance level (p = 0.05). I would do so by dividing my significance level by the number of models I run.
Thanks a lot!
Laura
Two points before I can offer an answer:
1. Are you interested in how each of the 10 variables is associated with the response variable, or are you interested in how well some subset of the 10 can be combined to predict the response variable? Or do you have more specific research questions, such as: does variable A predict Y after conditioning on variable B?
2. Say more about why including all ten does not work. Is the sample small? Are the predictor variables collinear? etc.?
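As an aside, the Bonferroni adjustment the question describes is simply a division of the significance level by the number of tests. A minimal Python sketch (the p-values are made up):

```python
# Bonferroni: divide the significance level by the number of models (here 10).
alpha = 0.05
n_models = 10
alpha_bonf = alpha / n_models   # 0.005

# Equivalent view: multiply each p-value by the number of tests (capped at 1)
# and compare to the original alpha.
p_values = [0.001, 0.020, 0.049]           # hypothetical per-model p-values
adjusted = [min(1.0, p * n_models) for p in p_values]
significant = [p < alpha_bonf for p in p_values]
```

Note that Bonferroni controls the family-wise error rate at the cost of power, which is one reason the clarifying questions above matter before deciding whether to correct at all.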
• asked a question related to Logistic Regression
Question
I'm trying to construct a binary logistic regression model. The first model includes 4 predictor variables, and the intercept is not statistically significant. Meanwhile, in the second model, I exclude one variable from the first model, and the intercept is significant.
The consideration that I take here is that:
The pseudo R² of the first model is higher, suggesting it explains the data better than the second model.
Any suggestion which model should I use?
You should use the model that makes more sense, practically and/or theoretically. A high R² is not an indication of the "goodness" of the model. A higher R² can also mean that the model makes more wrong predictions with a higher precision.
Do not build your model based on observed data. Build your model based on understanding (theory) and the targeted purpose (simple prediction, extrapolation (e.g., forecasting), testing meaningful hypotheses, etc.).
Removing a variable from the model changes the meaning of the intercept. The intercepts in the two models have different meanings. They are (very usually) not comparable. The hypothesis tests of the intercepts of the two models test very different hypotheses.
PS: a "non-significant" intercept term just means that the data are not sufficient to statistically distinguish the estimated value (the log odds given all X=0) from 0, which means that you cannot distinguish the probability of the event (given all X=0) from 0.5 (the data are compatible with probabilities both larger and smaller than 0.5). This is rarely a sensible hypothesis to test.
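To make the PS concrete: testing the intercept against 0 is exactly testing whether the baseline probability (at all X = 0) differs from 0.5. A small Python sketch with a hypothetical intercept:

```python
import math

def inv_logit(eta):
    """Map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept: the log odds of the event when all X = 0.
intercept = -2.0
p_baseline = inv_logit(intercept)   # ~0.12, clearly below 0.5

# The test "intercept = 0" is exactly the test "baseline probability = 0.5".
p_at_zero = inv_logit(0.0)          # 0.5
```

Whether p = 0.5 at "all predictors zero" is a meaningful hypothesis depends entirely on whether that point is even inside the range of the data.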
• asked a question related to Logistic Regression
Question
Hello, is it possible to use logistic regression on pooled panel data? The dependent variable is whether or not the respondent has diabetes. The independent variables are income, gender, and education. Should the individual income observations be adjusted to reflect increasing (average) income over time? Are there any specific considerations that should be addressed?
Thank you.
Jakub
Why is this panel data and what do you mean by pooled? I am attaching what may be a similar study. Best, David Booth
• asked a question related to Logistic Regression
Question
I am performing a meta-analysis of odds ratio per unit of a continuous variable with a dichotomous outcome (dichotomized continuous variable). One of the studies reports a mixed linear regression model with coefficient and standard error for the continuous variable regressed on the continuous outcome variable. Is there any acceptable method to estimate the odds ratio I need?
Apologies for the sending; I don't have any idea why the site did that. David Booth
• asked a question related to Logistic Regression
Question
300 participants in my study viewed 66 different moral photos and had to make a binary choice (yes/no) in response to each. There were 3 moral photo categories (22 positive images, 22 neutral images, and 22 negative images). I am running a multilevel logistic regression (we manipulated two other aspects of the images) and have found unnaturally high odds ratios (see below). We have no missing values. Could anyone please help me understand what the below might mean? I understand I need to approach with extreme caution, so any advice would be highly appreciated.
Yes choice: morally negative compared to morally positive (OR = 441.11; 95% CI [271.07, 717.81]; p < .001)
Yes choice: morally neutral compared to morally positive (OR=0.94; 95% CI [0.47,1.87]; p=0.86)
It should be noted that when I plot the data, very very few participants chose yes in response to the neutral and positive images. Almost all yes responses were given in response to the negative images.
I think you have answered your question: "It should be noted that when I plot the data, very very few participants chose yes in response to the neutral and positive images. Almost all yes responses were given in response to the negative images."
This is what you'd expect even in a simple 2x2 design. If the probability of a yes response is very high in one condition and very low in the other, then the OR can be huge, as it's the ratio of the odds of a very large probability to the odds of a very small one.
This isn't unnatural unless the raw probabilities don't reflect this pattern. (There might still be issues, but not from what you described.)
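To illustrate numerically with assumed probabilities (illustrative, not the study's actual values), an OR in the reported neighborhood falls out of quite ordinary-looking proportions:

```python
# Assumed response probabilities (illustrative, not the study's actual values):
p_negative = 0.90   # P(yes | morally negative image)
p_positive = 0.02   # P(yes | morally positive image)

odds_negative = p_negative / (1.0 - p_negative)   # 9.0
odds_positive = p_positive / (1.0 - p_positive)   # ~0.0204
odds_ratio = odds_negative / odds_positive        # ~441
```

The OR blows up because the denominator odds are tiny, not because anything is wrong with the model.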
• asked a question related to Logistic Regression
Question
I got highly significant negative logit values from the logistic regression. Is a Nagelkerke R² of 0.22 large enough to explain the changes in the variables?
I agree with David Eugene Booth here. I wrote this brief explainer as a supplement to a book a few years back. I think Nagelkerke's is probably the least useful in practice as it hasn't really got an intuitive interpretation.
• asked a question related to Logistic Regression
Question
Dear All,
I understand the basic concept of forward and backward stepwise logistic regression. However, I am unsure of the indication for using a forward model in preference to a backward model, and vice versa.
Should you not end up with a final selection of variables that is similar regardless of whether you started with all the variables and subtracted (backward), or started with no variables and added (forward)? This is the reason I'm having difficulty justifying/selecting forward vs. backward.
Thank you very much
Ryan Ting
Understand Forward and Backward Stepwise Regression – Quantifying Health
• asked a question related to Logistic Regression
Question
In my current study, I am identifying the association between some independent variables and a dependent variable, for which I am using bivariate analysis (cross-tabs with p-values) and multivariate analysis (multiple regression with adjusted odds ratios). Some previous studies on my topic used different p-value cut-off points, e.g. p < 0.25 or 0.05, and others included some variables without such a restriction.
What should I do? Should I include the same variables in both of the bivariate and multivariate analyses?
You should not select variables by their p-value. You should build a model based on theory. If you consider a variable interesting or important enough to include it in the model, then include it. Otherwise don't.
• asked a question related to Logistic Regression
Question
Hello Experts,
I am working on a pooled country dataset (of 19 countries) to look at the association of linear growth (both as a continuous and a binary variable) with IFA intake during pregnancy, using linear and logistic regression.
I followed the model-building process suggested by Hosmer & Lemeshow, but the final predictor model had a p-value of <.05 for the goodness-of-fit test and a pseudo R-square of .06 (almost the same R-square value as for linear regression). The VIF was <10. I also tried the modification of the Hosmer-Lemeshow test for large samples proposed by Nattino, Pennell, and Lemeshow, as well as a calibration test (using Stata 15.1). Even these showed model fit issues.
I then went on to add interactions; it was pure data mining. I added all combinations of interactions one at a time, then added all significant ones to the final predictor model, followed by the elimination process. Some of these were nothing but noise, as evidenced by probability graphs, but the model fit turned out to be good. The pseudo R-squared became 0.07. If I remove the interactions which look like noise (in the graphs) or cause high multicollinearity (individual VIF values > 10), the good model fit goes away.
I understand that I should be using a multilevel model but there are multiple studies out there that have used pooled data with even larger sample sizes. Unfortunately, no one describes anything on model fit. Is it not necessary? Also the ICC value for the most basic model was .06.
Though R-square and pseudo R-square are not crucial statistics, such low values are making me question the models. Omitted variables come to mind, but I have used all important predictors found in the literature review.
I also understand that data mining is the wrong way to approach but if I try to use only the plausible interactions, none of them are significant. It doesn't make sense to use them and it brings me back to the final predictor model which was not a good fit.
I am also aware that some experts propose not using statistical significance as the basis to decide on predictors. Does that mean that we don't need to look at model fit either?
I am not sure what is the right way to decide on the final model. I have attached a sheet that shows all the fit statistics that I have used for my models. I will really appreciate guidance on this.
Many thanks
Deepali
Could it simply be a result of the analysis that the overall explanatory power of these variables is low (or lower than expected)? Perhaps it just is what it is...
• asked a question related to Logistic Regression
Question
I am running 6 separate binomial logistic regression models with all dependent variables having two categories, either 'no' which is coded as 0 and 'yes' coded as 1.
4/6 models are running fine, however, 2 of them have this error message.
I am not sure what is going wrong, as each dependent variable has the same 2 values on the cases being processed, either 0 or 1.
Any suggestions what to do?
That error message says that for 2 of your DVs, everyone has the same value. Try this:
TEMPORARY.
SELECT IF NMISS(x1, x2, x3) EQ 0.
FREQUENCIES y1 y2.
Replace x1, x2, x3 with the list of explanatory variables in your model(s). And replace y1 y2 with the two DVs that are causing the error message to occur.
PS- Be sure to highlight and run all 3 lines together. Do not execute the SELECT IF line separately from TEMPORARY.
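For readers working outside SPSS, the same diagnostic (tabulate each DV on the cases that are complete on the predictors) can be sketched in Python. The variable names and toy data below are made up:

```python
# Toy data: three predictors x1..x3 and two binary DVs y1, y2 (None = missing).
rows = [
    {"x1": 1, "x2": 0, "x3": 2,    "y1": 0, "y2": 1},
    {"x1": 3, "x2": 1, "x3": None, "y1": 1, "y2": 0},
    {"x1": 2, "x2": 1, "x3": 0,    "y1": 1, "y2": 1},
]

# Keep only cases with no missing predictors (the SELECT IF NMISS(...) EQ 0 step).
complete = [r for r in rows if all(r[k] is not None for k in ("x1", "x2", "x3"))]

# Tabulate each DV on the retained cases (the FREQUENCIES step): a DV with a
# single remaining value cannot be modeled by logistic regression.
constant_dvs = [dv for dv in ("y1", "y2") if len({r[dv] for r in complete}) < 2]
```

Here y2 looks fine in the full file but becomes constant once incomplete cases are dropped, which is exactly the situation the error message describes.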
• asked a question related to Logistic Regression
Question
Suppose we split our data into training and test groups, and we use the training data alone to create a logistic regression model in SPSS. Does this model produce different results than a logistic regression model created in Python?
And how could we convert the probabilities from the logistic regression model (the SPSS model) for the test group into interpretable categorical outputs (0 or 1)?
I don't understand what you want to do. In any case, the attached may be helpful to you. Notice the validation step here works better than a single split, etc. Best wishes, David Booth
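On the second part of the question, converting predicted probabilities into 0/1 outputs is just applying a cutoff; 0.5 is the common default, though it need not be optimal for a given problem. A minimal Python sketch with hypothetical probabilities:

```python
# Predicted probabilities for the test group (hypothetical values):
probs = [0.12, 0.55, 0.49, 0.91]

# Apply a cutoff to obtain 0/1 class labels; 0.5 is the usual default.
cutoff = 0.5
labels = [1 if p >= cutoff else 0 for p in probs]   # [0, 1, 0, 1]
```

The same thresholding works on the saved predicted probabilities exported from SPSS.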
• asked a question related to Logistic Regression
Question
I made the following logistic regression model for my master's thesis.
FAIL = LATE + SIZE + AGE + EQUITY + PROF + SOLV + LIQ + IND
where I look at whether late filing of financial statements (independent variable) is an indicator of failure of small companies (dependent variable). FAIL is a dummy variable that is equal to 1 when a company failed during one of the researched years. I use data covering 3 years (2017, 2018, and 2019). Should I include a dummy variable YEAR to account for year effects, or not? I have searched online, but I don't understand what it exactly means, and that is why I don't know if it is necessary to include it in this regression model. I hope you guys can help me. Thank you in advance!
Any new variable will only improve R², but one must not introduce variables that the theoretical model does not encompass. If you have no guesses (even better if others have already discussed it) on how a new independent variable might influence the dependent variable, it's best not to include it. The more variables, the more multicollinearity. Any additional variation that you explain by adding the year variable will be misleading.
Adding time or space variables, for that matter, is almost always a bad idea. If you run LOO tests by year, they will show that adding year only worsens the actual predictive power of your model.
However, you might want to consider including some time-dependent macroeconomic indicators that impact bankruptcy, although they will cause more multicollinearity. I still strongly recommend running LOO cross-validation, and perhaps target-oriented cross-validation if you can handle it, to show that such indicators do not spoil your model.
• asked a question related to Logistic Regression
Question
For my bachelor thesis I'm conducting a study on the relationship between eye movements and memory. One of the hypotheses is that the number of fixations made during the viewing of a movie clip will be positively related to memory of that movie clip.
Each participant viewed 100 movie clips, and the number of fixations were counted for each movie clip for each participant. Later participants' memory of the clips were tested and each movie was categorized as "remembered" or "forgotten" for each participant.
So, for each participants there are 100 trials with the corresponding number of fixations and categorization as "remembered" or "forgotten".
My first idea was to do a paired-samples t-test (to compare the number of fixations between "remembered" and "forgotten"), but I didn't find a way to do that in SPSS with this file format, as there are 100 rows for each participant. I thought of calculating the average number of fixations for the remembered vs. forgotten movies per participant and doing a t-test on these means (one mean per participant for each category), but this way the means get distorted, because some subjects remember far more clips than others (so the "mean of the means" is not the same as the overall mean).
Now I'm thinking that a t-test might not be appropriate at all, and that logistic regression would be a better choice (to see how well the number of fixations predicts whether a clip will be remembered vs. forgotten), but I couldn't find out how to do this in SPSS for a within-subjects design with multiple trials per participant. Any help/suggestions would be highly appreciated.
I believe Blaine Tomkins meant to describe the data as having a LONG format, not a wide format. Apart from that, I concur with his advice. SPSS can estimate that model. Look up the GENLINMIXED command:
A good resource is the book by Heck, Thomas & Tabata (if you can get your hands on it):
HTH.
• asked a question related to Logistic Regression
Question
I am baffled by the ZTC term. In this case it would be ZTC = 1 and course mode = 0 (traditional). So why is the odds ratio higher than the odds ratio for ZTC*hybrid? The graph shows that the yield is greater for hybrid courses (red line). I see that there is a 78% increase in the odds of success in ZTC for traditional courses, but that is higher than the effect for ZTC*hybrid, which I thought would itself have been higher!
(any blanks you see were not a part of this model)
Hello again Lisa,
It's not clear why Test 1 result list omits ORs/p-values for so many variables, and ditto for Test 2...that's one of the reasons for asking for clarification.
Looking at "Test 2" output,
1. Having ztc significantly improves odds of success;
2. Online course significantly improves odds of success (vs. traditional);
3. Hybrid course significantly reduces odds of success (vs. traditional);
4. ztc combined with online significantly reduces odds of success (more than you would expect from the related "main effects" directions, _and_ relative to traditional);
5. ztc combined with hybrid significantly increases odds of success (more than you would expect from the related main effects directions, _and_ relative to what happens with traditional classes).
If you look at the slopes of the Test 2 output plot, you'll see that ztc appears to have the sharpest increase in likelihood of success for the hybrid condition (but that the hybrid condition is still lower in odds of success than either of the other two methods). That is completely consistent with the significance test outcomes.
As well, ztc has less influence on success likelihood for online class than for traditional class. Again, consistent with the test results.
Good luck with your summary and presentation.
• asked a question related to Logistic Regression
Question
I want to draw a graph of predicted probabilities vs. observed probabilities. For the predicted probabilities I use the R code below. Is this code OK or not?
Could anyone tell me how I can get the observed probabilities and draw a graph of predicted vs. observed probability?
analysis10 <- glm(Response ~ Strain + Temp + Time + Conc.Log10 +
                  Strain:Conc.Log10 + Temp:Time,
                  family = binomial, data = household)  # family/data arguments assumed; 'Conc.Log1' corrected to 'Conc.Log10'
predicted_probs <- data.frame(probs = predict(analysis10, type = "response"))
I have attached that data file
Plotting observed vs predicted is not sensible here.
You don't have observed probabilities; you have observed events. You might use "Temp", "Time", and "Conc.Log10" as factors (with 4 levels) and define 128 different "groups" (all combinations of all levels of all factors) and use the proportion of observed events within each of these 128 groups. But you have only 171 observations in total, so there is no chance to get any reasonable proportions (you would need tens or hundreds of observations per group for this to work reasonably well).
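The grouping-and-proportions idea can be sketched in Python (the data below are invented, and in practice each group would need many more trials than shown here):

```python
from collections import defaultdict

# Invented binary outcomes, keyed by a combination of factor levels:
observations = [
    (("A", "low"), 1), (("A", "low"), 0), (("A", "low"), 1), (("A", "low"), 1),
    (("B", "high"), 0), (("B", "high"), 0), (("B", "high"), 1),
]

counts = defaultdict(lambda: [0, 0])   # group -> [events, trials]
for group, y in observations:
    counts[group][0] += y
    counts[group][1] += 1

# Observed proportion of events per group; only trustworthy with many trials.
observed_props = {g: events / trials for g, (events, trials) in counts.items()}
```

With only a handful of trials per group, these proportions are far too noisy to plot against model predictions, which is the point made above.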
• asked a question related to Logistic Regression
Question
I want to estimate the half-life value for the virus as a function of strain and concentration, and as a continuous function of temperature.
Could anybody tell me, how to calculate the half-life value in R programming?
I have attached a CSV file of the data
Ok, that's corrected.
What is the "Response"? Is it an indicator of whether the virus is detected? If so, then it's hard to estimate the half-life, as both the concentration and the sensitivity of the assay are unknown. I also wonder how Response could be 0 at Time 0.
It would be possible to fit a Cox model (as David suggested); you could also fit a binomial model and find the roots of the prediction function (on the logit scale), which is the log odds of "Response = 1" vs. "Response = 0". But it's not clear whether log odds = 0 corresponds to 50% survival.
• asked a question related to Logistic Regression
Question
Hi,
I am doing a study using logistic regression, where we want to control for a few possible confounders. However, many of the confounders, as well as the independent variable, are categorical and have been recoded as dummies. How do we use the 10% rule for dummies? Do they all have to differ by 10%, or is something considered a confounder if one of the dummies causes a difference of 10%?
In binary logistic regression, the outcome is usually coded 1/0; then I am assuming you have an exposure of interest, and all other variables are potential confounders coded as indicator variables. Assuming a large sample size relative to the number of potential confounders, you can run a model with the exposure of interest and all confounders. The adjusted OR for the exposure-disease association (controlling for all confounders) is the most valid estimate of that association (assuming accurate measurements, etc.). You could stop there.
Depending on the research question, you could consider simplifying the model by removing some of the variables which appear to be weak confounders. Remove one of these variables, then compare the adjusted OR for the exposure-disease association from this reduced model with the adjusted OR controlling for all variables (which is considered the "gold standard"). If the reduced-model OR is similar to the gold-standard OR (within 10%), leave that variable out. Repeat this simplification, always comparing to the gold-standard OR; if dropping a variable moves the adjusted OR more than 10% away from the gold standard, place that variable back into the model and assess another variable. While not necessarily perfect, this change-in-estimate approach is reasonable.
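The change-in-estimate bookkeeping described above can be sketched as follows; the ORs are hypothetical:

```python
def pct_change_in_or(or_gold, or_reduced):
    """Percent change of a reduced-model OR relative to the full ('gold standard') model."""
    return abs(or_reduced - or_gold) / or_gold * 100.0

# Hypothetical adjusted ORs for the exposure of interest:
or_full = 2.40           # model with all candidate confounders
or_without_c1 = 2.47     # confounder C1 removed
or_without_c2 = 3.10     # confounder C2 removed

keep_c1 = pct_change_in_or(or_full, or_without_c1) > 10.0   # C1 is a weak confounder
keep_c2 = pct_change_in_or(or_full, or_without_c2) > 10.0   # C2 must stay in the model
```

Here dropping C1 moves the exposure OR by under 3%, so it can be left out, while dropping C2 moves it by nearly 30%, so it goes back in.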
• asked a question related to Logistic Regression
Question
I have to investigate
1) how the response depends on the strain, temperature, time, and concentration.
I applied logistic regression (glm) and got the reduced model. When I tried to make the logistic regression line and confidence interval, it looks like that in the picture. (pic attached below)
Could anybody tell me, how to resolve this issue (I want only one logistic regression and two confidence interval lines, not many)?
I have attached the data
For confidence interval I use this
prediction <- as.data.frame(predict(analysis13, se.fit = TRUE))
prediction.data <- data.frame(pred = prediction$fit,
  upper = prediction$fit + (1.96 * prediction$se.fit),
  lower = prediction$fit - (1.96 * prediction$se.fit))
plot(household$Conc.Log10, prediction.data$pred, type = "l",
  xlab = "width", ylab = "linear predictor", las = 1, lwd = 2, ylim = c(-10, 6))
lines(household$Conc.Log10, prediction.data$upper, lwd = 2, lty = 2, col = "dark red")
lines(household$Conc.Log10, prediction.data$lower, lwd = 2, lty = 2, col = "dark red")
Dear Zuffain
It appears you are looking to draw counterfactual plots (one predictor changing while the other predictors are held at a constant value, usually the average). Read from page 89 onwards of Gelman, A., & Hill, J. (2007), Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, for more details.
However, I am using Monte Carlo approximation instead of the glm function in R, but you can figure it out.
Best wishes
• asked a question related to Logistic Regression
Question
I'm attempting to build a binary response (single species presence/absence) logistic regression model with random intercept (for site variable level). I surveyed 30 sites 1-3 times; approx half of the sites were only visited once. Ideally, I would model site as the random effects variable and include one or two habitat variable(s) as the fixed effects variable(s). I recognize that my sample size is very low and suspect that I also have insufficient replication of observations per level of the random effects variable.
Is it possible to use random effects in this situation? If not what other approach would you recommend?
The only alternative that I can think of is to build a regular binary response logistic regression including only one observation per site, and repeat this for every possible combination of 30 sites. I figure this would allow me to use all of my data to infer which covariates are most influential, although it makes getting coefficient and coefficient confidence estimates, AICc values, etc. difficult as far as I can tell.
• asked a question related to Logistic Regression
Question
Hello there,
I ran a logistic regression of several variables on a dichotomous-dependent variable. One of the independent variables showed the following results: exponent B = 2.16, 95% CI for exponent B is 0.99 - 4.71, and p-value = 0.053.
I wonder if these results are treated as significant or not?
Many thanks
I think the p-value of 0.053 is the two-tailed result, and I agree with the professors that it is significant at the 10% level (p < 0.1). However, you may check the reliability and validity of the variables; if any item could be deleted, and Cronbach's alpha and the factor loadings are over 0.7, the p-value might be accepted at p < 0.05.
• asked a question related to Logistic Regression
Question
In a time-stratified case-crossover study, a case day is matched to several control days, which makes conditional logistic regression the suitable analysis. But how should the data be formatted in Excel, and how is the model fitted in R? Are there any detailed papers that I can refer to?
Take a look at the attached screenshot information. PS: don't start collecting data before you know what you are going to do with it. David Booth
• asked a question related to Logistic Regression
Question
Hello, fellow researchers! I'm hoping to find someone well familiar with Firth's logistic regression. I am trying to analyse whether certain emotions predict behaviour. My outcomes are 'approached', 'withdrew', & 'accepted' - all coded 1/0 & tested individually. However, in some conditions the outcome behaviour is a rare event, leading to extremely low cell frequencies for my 1's, so I decided to use Firth's method instead of standard logistic regression.
However, I can't get the data to converge & get warning messages (see below). I've tried to reduce predictors (from 5 to 2) and increase iterations to 300, but no change. My understanding of logistic regression is superficial so I have felt too uncertain to adjust the step size. I'm also not sure how much I can increase iterations. The warning on NAs introduced by coercion I have ignored (as per advice on the web) as all data looks fine in data view.
My skill-set is only a very 'rusty' python coding, so I can't use other systems. Any SPSS friendly help would be greatly appreciated!
***
Warning messages:
1: In dofirth(dep = "Approach_Binom", indep = list("Resent", "Anger"), :
NAs introduced by coercion
2: In options(stringsAsFactors = TRUE) :
'options(stringsAsFactors = TRUE)' is deprecated and will be disabled
3: In (function (formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, :
logistf.fit: Maximum number of iterations for full model exceeded. Try to increase the number of iterations or alter step size by passing 'logistf.control(maxit=..., maxstep=...)' to parameter control
4: In (function (formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, :
logistf.fit: Maximum number of iterations for null model exceeded. Try to increase the number of iterations or alter step size by passing 'logistf.control(maxit=..., maxstep=...)' to parameter control
5: In (function (formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, :
Nonconverged PL confidence limits: maximum number of iterations for variables: (Intercept), Resent, Anger exceeded. Try to increase the number of iterations by passing 'logistpl.control(maxit=...)' to parameter plcontrol
You seem to be using R's logistf function (I guess from within SPSS). Therefore you need to fine-tune the related parameters. There are functions like logistf.control, logistf.mod.control, and logistpl.control which handle this tuning, as described in the help document (https://cran.r-project.org/web/packages/logistf/logistf.pdf).
I don't know if these functions are accessible from within SPSS, but I presume so.
You can try fine-tuning parameters with these functions to push the procedure to converge.
Alternatively, you can try other software or another penalization procedure, but the latter requires broader computation skills.
• asked a question related to Logistic Regression
Question
I am conducting research on determinants of entry mode, where equity vs. non-equity is my DV and I have a few IVs: family ownership (dummy), international experience, market competition, and dynamism. Furthermore, I have a moderating variable, host-country network (dummy variable). Is the interpretation of the interaction effect of Family Ownership x Host-Country Network on Entry Mode different than for the other variables? I haven't found any literature on the interaction effect between 3 categorical variables.
Two questions. Have you done an interaction plot? the second one is are you using logistic regression? Best wishes David Booth
• asked a question related to Logistic Regression
Question
My question concerns the problem of calculating odds ratios in logistic regression analysis when the input variables are on different scales (e.g., 0.01-0.1, 0-1, 0-1000). Although the coefficients of the logistic regression look fine, the odds ratio values are, in some cases, enormous (see example below).
In the example there were no outlier values in any of the input variables.
What is the general rule: should we normalize all input variables before the analysis to obtain reliable OR values?
Sincerely
Mateusz Soliński
You need to interpret the OR using the exponential of the estimates.
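As a sketch of this advice (the coefficient names and values below are hypothetical, not from the question), exponentiating each coefficient gives its odds ratio. Note that the scale of the predictor matters: the OR applies per one-unit change, which is why variables on very different scales can produce very different-looking ORs.

```python
import math

# Hypothetical logistic-regression coefficients on the log-odds scale;
# names and values are illustrative, not taken from the question.
coefs = {"age": 0.05, "treated": -0.7, "income_k": 0.002}

# The odds ratio for each predictor is exp(coefficient),
# interpreted per ONE-UNIT increase in that predictor.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, or_ in odds_ratios.items():
    print(f"{name}: OR = {or_:.3f}")

# Scale matters: for a predictor spanning 0-1000, the OR across its whole
# range is exp(1000 * b), which can look enormous even when b is small.
or_full_range = math.exp(1000 * coefs["income_k"])
print(f"income_k OR across full 0-1000 range = {or_full_range:.2f}")
```

Standardizing predictors (e.g., per standard deviation) before fitting makes the ORs comparable across variables without changing the model's fit.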
• asked a question related to Logistic Regression
Question
Hello, I have a question about the multinomial logit model and the conditional logit model. I have read a book (Logistic Regression Using SAS: Theory and Application) which states that the multinomial logit model is a special case of the conditional logit model, while also noting that the two models differ in two ways: the conditional logit model can include characteristics of the choice options, and the set of available options can vary across individuals in the analysis.
Suppose I have a research in which different participants may be facing different options, therefore, which model (multinomial logit model and conditional logit model ) should I use? Can I keep using multinomial logit model since it is just a special case of conditional logit model?
Thank you.
You might find the search in the attachment to be helpful to you. Best wishes, David Booth
• asked a question related to Logistic Regression
Question
Please, I divided my data (patients) into two groups, a (Yes) and b (No), and I need to examine the preoperative factors (categorical, with 2 or 3 levels, and nominal) for group b only.
I tried to use logistic regression, but in the independent-variables box I cannot restrict the analysis to group b, which forces me to put both groups in the dependent area. However, I need to examine which factors affect group b and make it negative (No); that is, I need to determine whether there is any relationship between group b and the factors.
Thanks
I guess your task is to define factors that predict response b (No).
If so, you need all the data and a logistic regression predicting the outcome a (Yes) or b (No).
If not, you should define a DV for group b (No).
• asked a question related to Logistic Regression
Question
How does Stata calculate the pseudo R squared that it displays after logistic regression and how is it best interpreted?
This is McFadden's R^2 (Stata labels it "Pseudo R2") and it is a measure of goodness of fit. Values of 0.2-0.4 usually indicate very good fit.
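McFadden's pseudo R^2 is 1 minus the ratio of the fitted model's log-likelihood to the null (intercept-only) model's log-likelihood. A minimal sketch with made-up outcomes and fitted probabilities:

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Toy data: observed outcomes and fitted probabilities from some model
# (values are illustrative, not from the question).
y     = [1, 1, 1, 0, 0, 0, 1, 0]
p_hat = [0.9, 0.8, 0.7, 0.2, 0.1, 0.4, 0.6, 0.3]

ll_model = log_likelihood(y, p_hat)

# Null model: predict the overall event rate for everyone.
p_null  = sum(y) / len(y)
ll_null = log_likelihood(y, [p_null] * len(y))

mcfadden_r2 = 1 - ll_model / ll_null
print(f"McFadden's pseudo R^2 = {mcfadden_r2:.3f}")
```

Unlike OLS R^2, this is not a proportion of variance explained, which is why its typical "good" range is much lower.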
• asked a question related to Logistic Regression
Question
I have all the data regarding landslide susceptibility mapping, and I want to analyze the data by using the logistic regression model, but still, I have no idea how to process it.
In my experience, if your maps are in shapefile format (boundary maps), you can transform them into a raster file (grid format). Then, with this data structure, you can use R for logistic regression.
My paper "Logistic Regression Model of Built-Up Land Based on Grid-Digitized Data Structure: A Case Study of Krabi, Thailand" explains this approach.
• asked a question related to Logistic Regression
Question
Hello,
I have conducted a logistic regression (method: forced entry) in R to analyze a group of independent variables (n=4). My criterion is a disease (yes/no). My sample size is n=105. I selected these variables based on theoretical consideration (some of them are known risk factors). My aim was to examine how well these variables predict the outcome and which of them are significant. My results indicate that only one of the 4 variables is significant (one trending).
Now I am wondering what might be a reasonable next step:
1) Is it appropriate to report the model "as it is" with R^2 (Nagelkerke etc.), a goodness of fit statistic (chi-square test), even if only one variable is significant? My idea behind this is to demonstrate the status quo and show that some variables indeed have no predictive power (at least in our data set)
2) In the next step (after looking at the first model), should I run a stepwise backward regression to find a model that best fits the data and contains only significant predictors? If I understand correctly, stepwise methods are more useful for exploratory analysis with a large sample size.
3) Can I just exclude all non-significant predictors in model 1, rerun my logistic regression analysis and report model 2 with the one significant predictor and perhaps the one trending (I'm not sure if this is actually a stepwise backward regression).
I'm looking forward to hearing from you and thank you all for your help!
Best regards,
Nicole
Nicole Walentek, yes, you may check univariate logistic regression first. Also, you can try removing the constant from the model :)
• asked a question related to Logistic Regression
Question
I ran a logistic regression model with PTSD, MDD, Nativity, and (PTSD*Born outside the US) interaction term predicting Nicotine Dependence (yes/no). The main effect of Born Outside the US (ref: born in the US) has OR=9.13, PTSD main effect OR=2.12. However, the interaction term of PTSD and Born outside the US has OR=0.31. I find it very strange that the OR changed direction. Can anyone advise on the potential explanations for such results?
Hi Stanislava Klymova, by including this interaction term in your model, you're assuming that the effect of being born outside the US is different for those with/without PTSD. Another way of interpreting this coefficient is the 'extra effect' of being born outside the US while having PTSD (assuming you included these predictors as binary variables with yes = 1). It may be a good idea to first run a likelihood ratio test comparing two models - one with/one without the (PTSD*Born outside the US) interaction term to test whether including this effect modification improves the fit of your model.
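The suggested likelihood-ratio test compares the log-likelihoods of the two nested models; twice their difference follows a chi-square distribution with df equal to the number of extra parameters (here 1). A sketch with made-up log-likelihood values standing in for the two fitted models:

```python
import math

# Hypothetical log-likelihoods of the two fitted logistic models
# (with and without the PTSD x nativity interaction term).
ll_without_interaction = -210.4   # made-up value
ll_with_interaction    = -207.1   # made-up value; never below the reduced model's

lr_stat = 2 * (ll_with_interaction - ll_without_interaction)

# p-value from a chi-square distribution with df = 1 (one extra parameter);
# for df = 1, P(X > x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2))
print(f"LR = {lr_stat:.2f}, p = {p_value:.4f}")
```

A small p-value would indicate that the interaction term improves model fit, supporting its retention.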
• asked a question related to Logistic Regression
Question
Hi,
I want to present the results of multinomial logistic regression at a conference in a visual way,
is it enough to present the table of the results in the power point,
thank you
Sam H.Bahreini I would also consider showing ROC curves and maybe the resultes of your Hosmer-Lemeshow goodness of fit tests. I have examples of each; DM me if interested.
• asked a question related to Logistic Regression
Question
I applied Ordinal Logistic Regression as my main statistical model, because my response variable is 7 Point-Likert Scale data.
After testing for Goodness of Fit using AIC, i got my best fit model, including 4 independent variables (3 explanatory and 1 factor variable).
However, one explanatory variable has a negative coefficient (odds ratio 0.44); all explanatory variables are also on a 7-point Likert scale.
My theoretical assumption is simple: the more frequently the activities captured by the explanatory variables occur, the higher the score on the response variable (mutual understanding).
That's why I am confused that one independent variable has a negative coefficient.
In this case, how should I interpret this IV?
Thank you very much,
Just like in any other regression. If you want examples, simply Google your question. Best wishes, David Booth
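To make that concrete (using only the 0.44 odds ratio quoted in the question): in an ordinal (proportional-odds) logistic model, a negative coefficient means higher values of the predictor lower the odds of being in a higher response category.

```python
import math

# The 0.44 odds ratio from the question corresponds to a negative coefficient:
odds_ratio = 0.44
coef = math.log(odds_ratio)        # about -0.821 on the log-odds scale
print(f"coefficient = {coef:.3f}")

# Interpretation: each one-point increase on the 7-point explanatory scale
# multiplies the odds of a HIGHER mutual-understanding score by 0.44,
# i.e. reduces those odds by about 56%.
print(f"percent change in odds = {(odds_ratio - 1) * 100:.0f}%")
```

A negative coefficient contradicting theory is worth investigating (e.g., suppression by correlated predictors), but its mechanical interpretation is as above.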
• asked a question related to Logistic Regression
Question
I have a simple model with only one independent variable, and it is binary/categorical (as is the dependent variable). The log-odds estimate is 4.6821 with a standard error of 0.4978. The point estimate for the odds ratio is 108, with a confidence interval of 40.708 , 286.527.
I ran the same model, simply changing the reference and the log-odds estimate is -4.6821, same standard error, but the point estimate for the odds ratio is 0.009 with a confidence interval of 0.003 , 0.025. This seems reasonable.
Is an odds ratio of 108 valid? Since both my dependent and independent variables are binary/categorical, it isn't an issue of outliers. I only have 2 missing values. Sample sizes are sufficient (275 negative, 102 positive, 73 Y, and 304 N). For Y and negative there are only 5. Could that account for such an inflated odds ratio?
OR = exp(logOR). Plugging in your two logOR values yields the ORs you reported. Using Stata, I got this:
logOR OR
1. 4.6821 107.9966
2. -4.6821 .0092595
Those two ORs show effects of exactly the same magnitude, just as the two logOR values do. HTH.
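The reciprocity described in the answer can be checked directly: switching the reference category negates the log-odds coefficient, so the two odds ratios are exact reciprocals.

```python
import math

# Log-odds estimate reported in the question.
log_or = 4.6821

or_a = math.exp(log_or)    # ~108.0, original reference category
or_b = math.exp(-log_or)   # ~0.00926, reference category switched

print(f"OR = {or_a:.4f}, reciprocal OR = {or_b:.7f}")

# Reciprocity check: the product of the two odds ratios must be 1.
print(f"product = {or_a * or_b:.10f}")
```

So an OR of 108 is not inherently invalid; it is the same effect as an OR of 0.009, just viewed from the other reference category. With only 5 cases in one cell, however, the estimate will be very imprecise, as the wide confidence interval shows.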
• asked a question related to Logistic Regression
Question
Hello,
I would like to examine the association between a disease and psychological load. The disease can be determined by different diagnostic methods. In our study, we performed four accepted procedures to determine the disease. Each participant (total sample n=45) underwent all four procedures at different times. Additionally, we examined psychometric data, and these are the main variables I am focusing on.
My idea is to examine the association between the disease and psychological load, as a function of the diagnostic method chosen. In other words: Is the association between the disease - diagnosed by method A - and psychological load significantly different/stronger than the association between the disease - diagnosed by method B - and psychological load.
As for the statistical methods, I initially thought of logistic regression with the disease as criterion and the psychometric variables as predictors. This would lead to 4 regression models: SB diagnosed with A; SB diagnosed with B etc. as criterion. My idea is to compare the AICs of the four different models: Do the psychometric variables predict the disease better and explain more variance when diagnosed with method X or method Y.
I hope my question and concept is comprehensible.
Is this an appropriate approach or does anyone have another idea?
Thank you very much for your replies!
Kind regards,
Nicole
Could you simply compare the (pointbiserial) correlations between disease (binary variables) and psychological load (continuous/metrical variable) using statistical tests for comparing dependent correlations, that is, correlation coefficients obtained from the same sample for different variables?
• asked a question related to Logistic Regression
Question
I am currently replicating a study in which the dependent variable describes whether a household belongs to a certain category. Therefore, for each household the variable either takes the value 0 or the value 1 for each category. In the study that I am replicating the maximisation of the log-likelihood function yields one vector of regression coefficients, where each independent variable has got one regression coefficient. So there is one vector of regression coefficients for ALL households, independent of which category the households belong to. Now I am wondering how this is achieved, since (as I understand) a multinomial logistic regression for n categories yields n-1 regression coefficients per variable as there is always one reference category.
• asked a question related to Logistic Regression
Question
Which of the two models is better to analyze factors that influence the appearance of a certain event when the data is not censored? Cox Regression or Logistic Regression? Let´s add that time to the event is not really relevant.
Cox Regression.
• asked a question related to Logistic Regression
Question
I want to do univariate and multivariate binary logistic regression in SPSS. I am wondering about the timing of the Box-Tidwell test, is this both applicable to univariate and multivariate binary logistic regression? I am using a forward LR model, do I perform Box-Tidwell tests on all predictors that I placed in block 1 or just on the predictors that SPSS included in the forward LR model?
Moreover, what should I do with a predictor if the linearity test is significant? Can I still include that predictor in the model in some way, or should I leave it out?
Hello Michelle,
When you say, multivariate (binary) logistic regression, do you simply mean using a model with more than one independent variable/predictor? If so, it should more correctly be multivariable LR, as the usual definition of "multivariate" is inclusion of two or more dependent variables in a model.
The Box-Tidwell test may be applied whether your model has one continuous IV or multiple continuous IVs. I would assess it only for IVs that you elect to include in your final model; whether IVs you don't intend to add to the model do or don't show linearity is irrelevant. Do the check for all included continuous IVs.
Finally, I'm not a fan of automated variable inclusion (e.g., step methods, such as forward inclusion). There's a host of technical reasons underlying this, but one of the chief ones is, you are not guaranteed to identify the best ensemble of IVs for a given situation.
• asked a question related to Logistic Regression
Question
I have a set of independent variables (from 1 to 8 depending) which are all continuous variables. My dependent variable of interest is an ordinal value that is a Likert-scale representation of an employee's intent to remain at their current job from 1 to 5.
I attempted to run a binary logistic regression but I appear to fail the proportionality conditions there, so I want to give mlogit a try.
I believe a downside to this is the loss of "rank" information; in any event, I am not entirely clear on how to do this in SPSS (or R). In particular, I am struggling to interpret my results.
In the attached, Factor1-8 are my independent.
My dependent variable is the aforementioned ordinal. I chose 5 to be my reference. My questions are as follows
1. Am I barking up the right tree here with this approach?
2. How do I interpret the results?
I think the following references may support you:
1. Harrell, F. E., Jr., Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis
2. O'Connell, A. A., Logistic Regression Models for Ordinal Response Variables, University of Connecticut
• asked a question related to Logistic Regression
Question
Dear researchers, I have read a lot about Likert scales and Likert-like questions. However, the answer is always "it depends" and needs to be evaluated from situation to situation.
My aim is to examine factors that correlate with attitudes among public health workers, measured on 5-point Likert scales. The dependent variables are the Q7 items. Should the independent variables be all the variables listed below?
For instance.
The dependent variables are Likert-like responses to these questions (only the first two are shown; in total there are nine):
1. I feel trained enough to ask the client about the use of psychoactive substances
2. I feel qualified enough to ask the client about the amount and frequency of use of psychoactive substances daily activities
The independent variables are: gender, age (number of years), experience (number of years), profession (4 groups), training (yes/no), knowledge about different aspects of drug use (on a 5-point Likert-like scale from no knowledge to excellent knowledge).
sincerely
Aleksandar
If your dependent (outcome) variable is an ordinal categorical type, then ordinal logistic regression is one regression technique you may consider using. However, interpretation of ordinal regression can be confusing because the distance (difference) between one level and the next is not necessarily consistent. The UCLA website below provides a tutorial on ordinal regression; even if you don't use R, just read the output and its interpretation. https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/
• asked a question related to Logistic Regression
Question
I am creating a risk score from some variables using the following steps:
1- Dividing data into training and validation cohorts.
2- Selecting the variables (p<0.05) in the fully adjusted model.
3- Transforming Bs into scores.
4- ROC curve.
5- Calibration using the validation cohort.
I have problems with the last 3 steps. I am using SAS. So I will be grateful if you can give me sources for the codes.
Development of scoring system for risk stratification in clinical medicine: a step-by-step tutorial
• asked a question related to Logistic Regression
Question
Hello,
I would like to conduct a logistic regression, however, I have an independent variable that is not normally distributed and some values seem to be a bit extreme.
Shall I be worried about the effect of potential outliers in an independent variable that is not normally distributed or it will not impact the results of the logistic regression?
If yes, any advice on how to deal with these outliers would be very much appreciated.
Many thanks.
Regards,
Outliers can wreck an analysis.
There are various residual diagnostics for logit models that you can use to identify the effects of outliers in the predictor variables; e.g., DFBETA will give you the change in a particular regression coefficient if that observation were dropped. This will tell you how sensitive your model is to particular observations. You can then investigate these observations further to understand what is going on.
see section on influential observations in this
The original development was by Pregibon, D. (1981) Logistic Regression Diagnostics, Annals of Statistics, Vol. 9, 705-724.
They are quite widely implemented in software.
• asked a question related to Logistic Regression
Question
Would it be better not to enter into the logistic regression model those variables with an extremely unbalanced distribution across the 2 groups, even if statistically significant (p<0.05) in the bivariate analysis, for example a frequency of 0% or 100% in one of the two groups?
I am in agreement with both David and Jochen.
To be clear, for LR this is not only a matter of having a "large" sample; the issue of "rare events" and how that manifests in MLE regards the absolute size of the smallest group, not its relative size. That is, you will have the same problem with a 100/10 split and a 10000/10 split. This is why various rules-of-thumb regarding N for LR are in respect to the size of the smallest group for the number of predictors. Obviously, I have no knowledge of the data, but the applicability of this depends on what is meant by "unbalanced."
Firth's clever work, although often simply used to get LR estimates when they would otherwise be intractable with MLE (complete or quasi-complete separation), is a general attempt to penalize MLE weights with respect to the "smallness" of the sample, with those penalized weights being asymptotic to the MLE weights as sample size increases. As it has a Bayesian basis, Firth's penalization of MLE can be seen as conceptually parallel to the penalization of OLS in ridge or LASSO regression estimation.
John
• asked a question related to Logistic Regression
Question
For instance, demographics could be potential confounders. Would it be better if I include them as the predictors in the logistic regression model along with other predictors or I should first factor them out and then use the residuals for further analysis?
Hello Guo-Hua,
The two approaches you propose are identical in impact (e.g., as tactics for "statistical control"). Once the data are collected, these are about your only options.
The usual array of methods by which to deal with nuisance / noise / extraneous / confounding variables in studies includes:
1. Randomization (doesn't eliminate variables, but tends to equalize their distribution across groups, over many trials);
2. Matching (on the target variable/s; this can be challenging with multiple variables);
3. Selecting cases with a fixed value (on the target variable/s; though this restricts generalizability);
4. Building the variables into the design (e.g., as a blocking factor);
5. Statistical control (e.g., as a covariate, in the ways you have proposed).
• asked a question related to Logistic Regression
Question
I used logistic regression model for analysis which has over 17,000 observations. Although, the model results in several statistically significant predictors, McFadden's Adj R squared/ Cragg & Uhler's R2 are very low! For my model McFadden's Adj R squared is 0.026 and Cragg & Uhler's R squared is 0.044. Can I proceed with these R squared? I would really appreciate your suggestion on accepted level of R squared, which has to be backed up by relevant literature. Thank you!!
Hello Tuhinur,
The implied question in your query is, can a study be overpowered (by huge N) so as to flag as significant models which don't account for a meaningful amount of the observed differences (e.g., trivial effects)? The answer, of course, is "Yes."
It's always a judgment call, however, as to whether an effect is trivial.
Now, let me try to address your query. First, I would not rely on pseudo-R-square values as the measure of model adequacy for a logistic regression model. The reason is three-fold: (a) the value does not truly represent variance accounted for (as in OLS regression), so trying to adapt guidelines you may have seen for OLS models may not make sense; (b) context matters: the variables involved, the target population, the perspective of the decision-maker, and the intended use(s) of the model; and (c) many times people choose to focus on the exponentiated regression coefficients for IVs (the "OR" or odds ratio estimates) and/or the classification accuracy of the model and/or AIC/BIC indicators. For me, I'd stick with OR, classification accuracy, or information criteria, and try to interpret those in the context of my research aims.
Here's a link you may find handy that compares a number of the common pseudo R-square values: https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
• asked a question related to Logistic Regression
Question
I have a dependent variable with 3 categories. When I performed a chi-square test, it showed an association between the DV and 9 independent variables (p-value less than 0.05). But when I ran ordinal logistic regression on the same data, the p-values were totally different and only 3 variables were significant.
How can I interpret these results?
The 2 tests are different. David Booth
• asked a question related to Logistic Regression
Question
Hello,
I am trying to adjust for the confounding effect of a third variable on the association between ethnicity (has multiple categories) and death (binary). I am using fixed effect conditional logistic regression to build multivariable model. I know that for a factor to be considered an important confounder it has to change the crude odds ratio by more than 10% (besides the other criteria of being associated with the exposure and outcome).
However, in case I have many categories for the exposure, how can I know if a third factor is an important confounder? Should it change the odds ratio of "ALL CATEGORIES" by 10% or more, or even a change in "one out of all categories" makes it an important confounder? or is there another more appropriate way to deal with the situation?
In principle, confounding variables are often known as risk factors in the literature, whereas exposure variables are what we want to test in our work.
• asked a question related to Logistic Regression
Question
Hi,
I am interested in how to interpret the odds ratio in logistic regression when the OR is < 1.
Let's say the odds ratio for the variable higher education = 0.343721.
Now I calculated the probabilities of staying and exit by applying the formula P = odds/(1 + odds): P(staying) = 0.343721/(1 + 0.343721) = 0.2558.
Then the probability of exit will be 1 - 0.2558 = 0.7442. Can I interpret it in the following way:
Farmers with higher education (bachelor and above) are 0.34 times as likely to stay in the agricultural sector as farmers with lower education, i.e. there is almost a 74% chance of not staying in the agricultural sector?
Best regards,
Davit
If your model is multivariable, you must adjust for the reference categories of the other independent variables included in the model.
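On the arithmetic in the question: the formula p = odds/(1 + odds) converts an odds value into a probability, but an odds ratio is not itself an odds. Applying the formula directly to OR = 0.343721 implicitly assumes the comparison group has odds of 1 (a 50% baseline probability). A sketch, where the baseline staying rate is a made-up assumption:

```python
# Converting odds to a probability, and why an odds RATIO alone is not enough.
def odds_to_prob(odds):
    return odds / (1 + odds)

# The question applies p = OR / (1 + OR) directly to OR = 0.343721.
# That only yields a valid probability if the baseline odds equal 1
# (i.e. a 50% baseline probability) -- an assumption, not a given.
or_higher_edu = 0.343721
print(f"naive p(stay) = {odds_to_prob(or_higher_edu):.4f}")

# Safer reading: multiply the baseline group's odds by the OR first.
baseline_prob = 0.60                     # hypothetical staying rate, lower education
baseline_odds = baseline_prob / (1 - baseline_prob)
higher_edu_odds = baseline_odds * or_higher_edu
print(f"p(stay | higher education) = {odds_to_prob(higher_edu_odds):.4f}")
```

The safe verbal interpretation of OR = 0.34 is about odds, not probabilities: higher-educated farmers have 66% lower odds of staying than lower-educated farmers.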
• asked a question related to Logistic Regression
Question
Dear community,
I ran a logistic regression with continuous IVs in SPSS. In the table "Variables in the Equation" one variable is missing (despite using the Enter method), without any message from SPSS. Browsing the web, I understood that this might happen due to collinearity; however, the collinearity diagnostics did not return a clear sign of it. The highest VIF value is 6.1 and the highest condition index is 21.1.
So my questions are:
1. Is my regression model still valid despite SPSS dropping one variable?
2. Are there reasons other than collinearity why the IV is missing from the model?
Thanks, Ilka
Try the stats package in R.
• asked a question related to Logistic Regression
Question
I am running a multinomial regression analysis between a categorical (3) dependent variable and a continuous independent variable. My independent variable is arranged into quartile. I want to know how to get relative risk ratio/odds ratio/ coefficients for each quartile while keeping quartile 1 as base/reference. I am using stata.
It would be illuminating to know why you categorize a continuous predictor variable? Why?
• asked a question related to Logistic Regression
Question
Based on my contingent valuation survey (double-bounded dichotomous choice), I have decided to run the models (e.g. Model 1: WTPa; Model 2: WTPb; Model 3: WTPc, and WTPmax) for only positive response. The question is how can I run these models separately?
The dependent variable is dichotomous WTP response (e.g. WTPa-first answer for offered price; WTPb- second answer for next question...). While the independent variables are sociodemographic info (e.g. age, income, occupation, etc.).
Should I first split the data by model? Then the problem is what the dependent variable would be, because there is only one answer (e.g., Yes).
Well Done !!! interesting...
• asked a question related to Logistic Regression
Question
Hello, I'm an undergraduate student completing my dissertation (using SPSS) so please bear with my very limited understanding of binary logistic regression.
My outcome variable = referral outcome for ADHD assessment (dichotomous: accepted or rejected)
Significant predictor variable = gender
However, Exp(B), which I understand to be the odds ratio, is 20.520 with confidence interval LL = 4.139, UL = 101.731.
I've been told by my supervisor that 20.520 is implausibly high, e.g. it wouldn't be right for me to report that males have 20.520 times higher odds of being accepted for ADHD assessment (which is what I've interpreted these results to mean).
I've done lots of research to try to figure out what went wrong
- there is no multicollinearity (VIF are all between 1 and 2)
- there are 106 cases so I don't think the sample size is too small?
- the other predictor variables are all on the same scale (only one other variable is sig)
- there are 3 outliers, but the data have been input correctly and I believe it's not ultimately helpful to just remove them without good reason. Also, when I tried removing them, the Exp(B) just got larger...
- gender is set as a nominal variable in SPSS
Have attached table for reference.
Any help would be hugely appreciated about how to interpret this number, whether it could in fact be plausible? Or if not, what I could do as a next step?
Thank you.
Thanks greatly for your responses - they are very helpful and much appreciated!
• asked a question related to Logistic Regression
Question
I carried out binary logistic regression and it provided a crude OR; how can I calculate the adjusted OR?
Suppose we are interested in understanding whether a mother’s age affects the probability of having a baby with a low birthweight.
To explore this, we can perform logistic regression using age as a predictor variable and low birthweight (yes or no) as a response variable.
Suppose we collect data for 300 mothers and fit a logistic regression model. Here are the results:
[regression output table not shown]
To obtain the odds ratio for age, we simply need to exponentiate the coefficient estimate from the table: e0.173 = 1.189.
This tells us that each additional year of a mother's age multiplies the odds of a baby having low birthweight by 1.189. In other words, the odds of having a baby with low birthweight increase by 18.9% for each additional year of age.
This odds ratio is known as a “crude” odds ratio or an “unadjusted” odds ratio because it has not been adjusted to account for other predictor variables in the model since it is the only predictor variable in the model.
But suppose we were interested in understanding whether a mother’s age and her smoking habits affect the probability of having a baby with a low birthweight.
To explore this, we can perform logistic regression using age and smoking (yes or no) as predictor variables and low birthweight as a response variable.
Suppose we collect data for 300 mothers and fit a logistic regression model. Here are the results:
[regression output table not shown]
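The exponentiation step described above can be sketched as follows (the age coefficient 0.173 comes from the answer; the adjusted-model coefficients are made-up placeholders, since the actual table is not shown):

```python
import math

# Exponentiating a logistic-regression coefficient gives the odds ratio.
crude_age_coef = 0.173                      # age coefficient quoted in the answer
crude_age_or = math.exp(crude_age_coef)
print(f"crude OR for age = {crude_age_or:.3f}")
print(f"percent change in odds per year = {(crude_age_or - 1) * 100:.1f}%")

# In a model that also includes smoking, exponentiating the (generally
# different) age coefficient gives the ADJUSTED odds ratio for age --
# its effect holding smoking constant. These coefficients are hypothetical:
adjusted_age_coef, smoking_coef = 0.146, 0.915
print(f"adjusted OR for age = {math.exp(adjusted_age_coef):.3f}")
print(f"OR for smoking      = {math.exp(smoking_coef):.3f}")
```

So the "adjusted" OR is not a separate calculation on the crude OR; it comes from refitting the model with the additional covariates and exponentiating the new coefficients.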
• asked a question related to Logistic Regression
Question
Hi, we are comparing the mortality of two therapies in COVID patients. We have identified 74 patients in our hospital records (details in the attached image). We also have vital-sign and lab data for these patients taken at different intervals.
Our idea is using logistic regression with mortality/discharge as endpoint, adjusted by patient status on admission.
Is this sample size enough for this kind of analysis? If not, how do you suggest we analyze this data?
Use one of the following.
A- odds ratio,
B- relative risk,
C- risk differences.
Put group on the rows and outcomes on the columns.
• asked a question related to Logistic Regression
Question
I am running a multinomial logistic regression. How do I change the reference category of independent variables that are categorical? I am familiar with changing the reference category of the dependent variable. But how does it work with independent variables? By default, the last category is taken as the reference group.
In SPSS, you can easily set the reference category of a categorical independent variable to the first or last category.
• asked a question related to Logistic Regression
Question
I am running multivariable confounding bootstrap (1000 iterations) logistic regressions to see how 5 ethnoracial groups (IV's) differ in terms of PrEP barriers (outcome).
For one of my ethnoracial identities, though, there are no cases. Yet the bootstrap estimate was significant, with a beta of -18 and a CI that does not cross zero. The other IVs were not significant for that barrier. I've never dealt with this scenario before; should I remove that IV and try to rerun the model?
Although I understand what bootstrap distributions are, I'm not sure how I could go from no cases, and thus non-significance (with p > .999 before bootstrapping), to a p-value of .03 in the bootstrapped analysis. Any help is appreciated.
I don't understand what you did. How did you resample cases? Can you show the code that you used? In general, if no one in your sample is in a category, and you are sampling from the observed data, then there would also be no one in that category in any of your bootstrap samples. You could sample in such a way that there is a probability of being in different categories; however, the norm in any analysis would be to remove that category. If you had been interested in that category, you likely would have done your sampling differently so that a larger proportion were in that group.
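The resampling point can be demonstrated directly: drawing bootstrap samples with replacement from the observed rows can never produce a category that has zero observed cases. A sketch with made-up group labels:

```python
import random

# Made-up group labels for illustration; category "D" has zero cases,
# mirroring the empty ethnoracial category in the question.
random.seed(0)
observed_groups = ["A"] * 40 + ["B"] * 35 + ["C"] * 25

n_boot = 1000
d_counts = []
for _ in range(n_boot):
    # Standard nonparametric bootstrap: resample rows with replacement.
    resample = random.choices(observed_groups, k=len(observed_groups))
    d_counts.append(resample.count("D"))

# Every bootstrap sample contains zero "D" cases.
print(f"max 'D' count across {n_boot} resamples: {max(d_counts)}")
```

So a "significant" bootstrap estimate for an empty category points to a software artifact (e.g., an unconstrained coefficient drifting to an extreme value like -18) rather than a real effect; removing that category is the usual remedy.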
• asked a question related to Logistic Regression
Question
Can anyone give me a hint on how to determine the cut-off p-value in bivariable logistic regression for selecting variables to carry forward into the multivariable regression?
That's a very poor method of selection because it just doesn't work. Here are two papers on the subject: the first shows why the method doesn't work, and the second shows a much better way to deal with this.
Best wishes, David Booth
• asked a question related to Logistic Regression
Question
Greetings,
We have been conducting a retrospective cohort study. The variables we are examining are assumed to be very age-dependent and the exposed population is very small (~40 patients), therefore we have considered matching for age and sex at a 1:2 or 1:3 ratio to increase statistical power and limit confounding.
Which statistical test would be most appropriate for calculating risk ratios for dichotomous categorical variables?
This article ( https://academic.oup.com/epirev/article/25/1/43/718675 ) suggests conditional Poisson regression, which I have attempted in Stata, but it appears to work only for 1:1 matched pairs.
It also suggests an adjustment of Cox regression so as to yield the same results as conditional Poisson regression (" if the time to death or censoring is set to some arbitrary constant value and if the Breslow or Efron methods are used to account for tied survival times, the results will be the same as those from conditional Poisson regression, as the likelihoods for these methods are identical when the data come only from matched pairs ").
I have recently attempted a similar adjustment (as described here: https://www.ibm.com/support/pages/conditional-logistic-regression-using-coxreg ) to yield the same results as conditional logistic regression (odds ratio) for a 1:N matched case-control study using Cox regression.
If such an adjustment is possible, how exactly could it be implemented in SPSS? If not, what other alternatives are available to us at this juncture?
Maybe not useful for the original question anymore, but still good to know: a very detailed exposition on how to perform a matched cohort analysis is present in Kleinbaum's Logistic Regression 3rd edition. It gives the specifics on how to do it in SPSS using the Cox regression module in the Appendix. I used it some years ago, and it works (obviously). Hope this helps.
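For readers working outside SPSS/Stata: conditional logistic regression for 1:N matched data is also available in Python via statsmodels' ConditionalLogit, which conditions on the matched set and so handles 1:2 or 1:3 matching directly (note it yields odds ratios, not the risk ratios discussed above). A sketch with simulated 1:2 matched data (all numbers hypothetical):

```python
import numpy as np
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(0)
# Simulated 1:2 matched data: 40 matched sets, each with one exposed
# patient and two unexposed controls
n_sets = 40
groups = np.repeat(np.arange(n_sets), 3)   # matched-set identifier
exposure = np.tile([1.0, 0.0, 0.0], n_sets)

# Outcome generated with a true log-odds ratio of 1.0 for exposure
logit = -1.0 + 1.0 * exposure
y = (rng.random(logit.size) < 1 / (1 + np.exp(-logit))).astype(int)

# Conditioning on the matched set removes set-level confounders (age, sex)
res = ConditionalLogit(y, exposure[:, None], groups=groups).fit()
print(np.exp(res.params))  # conditional odds ratio for exposure
```

Sets in which all members have the same outcome carry no information and are dropped automatically, which is the expected behaviour of the conditional likelihood.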
• asked a question related to Logistic Regression
Question
I have slope data classified based on whether an area is vulnerable to erosion. Below is an example: vulnerability is a categorical variable (yes/no) and slope is a continuous variable ranging from 0 to 90.
I want to know if the slope of vulnerable areas is significantly different from that of non-vulnerable areas. At first, I performed an unpaired two-samples t-test by classifying the slope data into two groups based on vulnerability. Then, while looking into statistics for another dataset, I realized this dataset might be framed differently: one continuous variable (i.e., slope) and one categorical variable (i.e., vulnerable or non-vulnerable). If that framing is correct, could ANOVA or logistic regression be used? Also, I'm wondering which analysis (two continuous variables vs. one categorical and one continuous variable) is more appropriate in my case. Thanks.
You have one continuous, independent variable and one dichotomous dependent variable, so an unpaired t-test is appropriate.
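Both framings test the same association, and with made-up slope data you can see that they agree. A sketch (all numbers are illustrative only):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
# Illustrative slope data (degrees): vulnerable areas drawn steeper on average
slope_vuln = rng.normal(35, 8, 60).clip(0, 90)
slope_safe = rng.normal(25, 8, 80).clip(0, 90)

# Framing 1: slope as outcome, vulnerability as grouping -> unpaired t-test
t_stat, p_t = stats.ttest_ind(slope_vuln, slope_safe, equal_var=False)

# Framing 2: vulnerability as outcome, slope as predictor -> logistic regression
slope = np.concatenate([slope_vuln, slope_safe])
vulnerable = np.r_[np.ones(slope_vuln.size), np.zeros(slope_safe.size)]
fit = sm.Logit(vulnerable, sm.add_constant(slope)).fit(disp=0)

print(p_t, fit.pvalues[1])  # both detect the same association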
• asked a question related to Logistic Regression
Question
I have developed a logistic regression based prognostic model in Stata. Is there any way to develop an app using this logistic regression equation (from Stata)?
Most of the resources I found require me to develop the model from scratch in Python/R and then develop the app using streamlit/Shiny etc.
However, I am looking for a resource where I could use the coefficient and intercept values from the Stata-based model rather than building the model from scratch in Python.
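One lightweight option is to hard-code the Stata estimates in the app and apply the logistic equation directly, so no model fitting happens in Python at all. A sketch with made-up coefficient values (replace INTERCEPT and COEFS with the actual estimates from your Stata output):

```python
import math

# Hypothetical coefficients transcribed from Stata `logit` output;
# replace these with the actual estimates from your fitted model
INTERCEPT = -3.2
COEFS = {"age": 0.045, "male": 0.60, "biomarker": 1.10}

def predicted_probability(**features):
    """Apply the fitted logistic equation: p = 1 / (1 + exp(-(b0 + X*b)))."""
    xb = INTERCEPT + sum(COEFS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-xb))

p = predicted_probability(age=55, male=1, biomarker=0.8)
print(round(p, 3))
```

A Streamlit (or Shiny) front end then only needs to collect the feature values and display predicted_probability(...); the model itself is never re-estimated.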
• asked a question related to Logistic Regression
Question
Gender is a negative predictor contributing to the model in an ordinal logistic regression predicting pornography.
How can we interpret this?
| Predictor | Est. | Std. E | Wald | df | Sig. | LL | UL |
|---|---|---|---|---|---|---|---|
| gender | -.555 | -.555 | 4.694 | 1 | .030 | -1.057 | -.053 |
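Exponentiating the estimate turns it into a (cumulative) odds ratio, which is usually the easiest way to read a negative ordinal coefficient: here exp(-.555) ≈ 0.574, i.e. the group coded higher on gender has roughly 43% lower odds of falling into a higher outcome category, holding other predictors constant. A one-line check:

```python
import math

# Cumulative-logit coefficient for gender from the output above
estimate = -0.555
odds_ratio = math.exp(estimate)
print(round(odds_ratio, 3))  # ≈ 0.574
```

The same transformation applied to the confidence limits (-1.057, -.053) gives the CI for the odds ratio.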
This is a free online course that covers ordinal logistic models and much else.
These are the modules:
1. Using quantitative data in research (watch video introduction)
2. Introduction to quantitative data analysis (watch video introduction)
3. Multiple regression
4. Multilevel structures and classifications (watch video introduction)
5. Introduction to multilevel modelling
6. Regression models for binary responses
7. Multilevel models for binary responses
8. Multilevel modelling in practice: Research questions, data preparation and analysis
9. Single-level and multilevel models for ordinal responses
10. Single-level and multilevel models for nominal responses
11. Three-level multilevel models
12. Cross-classified multilevel models
13. Multiple membership multilevel models
14. Missing Data
15. Multilevel Modelling of Repeated Measures Data
• asked a question related to Logistic Regression
Question
Hello All,
For logistic regression, the model may be specified as:
Pr(yi = 1 | Xi) = G(Xiβ), where G(z) = e^z / (1 + e^z)
What would the corresponding model be for Firth Logistic Regression?
Pr(yi = 1 | Xi) = G(Xiβ), where G(z) = ...
How (if at all) would the penalty feature in the model?
Any help would be most appreciated!
Thank you,
Navya
Firth's logistic regression has become a standard approach for the analysis of binary outcomes with small samples. Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards one-half is introduced in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the predicted probabilities.

We propose two simple modifications of Firth's logistic regression resulting in unbiased predicted probabilities. The first corrects the predicted probabilities by a post hoc adjustment of the intercept. The other is based on an alternative formulation of Firth's penalization as an iterative data augmentation procedure. Our suggested modification consists in introducing an indicator variable that distinguishes between original and pseudo-observations in the augmented data.

In a comprehensive simulation study, these approaches are compared with other attempts to improve predictions based on Firth's penalization and to other published penalization strategies intended for routine use. For instance, we consider a recently suggested compromise between maximum likelihood and Firth's logistic regression. Simulation results are scrutinized with regard to prediction and effect estimation. We find that both our suggested methods do not only give unbiased predicted probabilities but also improve the accuracy conditional on explanatory variables compared with Firth's penalization. While one method results in effect estimates identical to those of Firth's penalization, the other introduces some bias, but this is compensated by a decrease in the mean squared error. Finally, all methods considered are illustrated and compared for a study on arterial closure devices in minimally invasive cardiac surgery.
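To Navya's original question: the model form G(·) is unchanged under Firth's method; what changes is the objective, which becomes the penalized log-likelihood l*(β) = l(β) + ½ log|I(β)|, with I(β) = X'WX the Fisher information and W = diag(p_i(1 − p_i)). A bare-bones numerical sketch of that objective (toy data, not a production implementation):

```python
import numpy as np
from scipy.optimize import minimize

def neg_penalized_loglik(beta, X, y):
    """Negative Firth-penalized log-likelihood:
    -(l(beta) + 0.5*log|I(beta)|), with I(beta) = X'WX, W = diag(p(1-p))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    fisher = X.T @ (X * (p * (1 - p))[:, None])
    _, logdet = np.linalg.slogdet(fisher)
    return -(loglik + 0.5 * logdet)

# Toy data with complete separation: plain ML pushes the slope to infinity,
# whereas the Jeffreys-prior penalty keeps the maximizer finite
X = np.column_stack([np.ones(8), np.arange(1.0, 9.0)])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
res = minimize(neg_penalized_loglik, x0=np.zeros(2), args=(X, y), method="BFGS")
print(res.x)  # finite intercept and slope despite separation
```

As |β| grows, the likelihood term flattens out but log|I(β)| → −∞, which is exactly why the penalized estimate stays finite even under separation.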
• asked a question related to Logistic Regression
Question
Hello!
I'm writing my thesis and first performed a multinomial logistic regression.
I found out this was wrong since my dependent variable is ordinal. Now I'm trying to perform an ordinal logistic regression but end up with dots in my table. Can someone please explain to me why these appear?
If a parameter estimate is fixed at 0, there is nothing further to estimate for it. That's why wherever the estimate is zero, those entire rows show dots. :)
• asked a question related to Logistic Regression
Question
How do you perform a purposeful selection model?
Which variables should be excluded or included?
You need to understand the causal relationships that are implied by your hypothesis. I suggest that you visit http://www.dagitty.net which allows you to build a graphic representation of the relationships between your variables and, based on this, will allow you to build statistical models that test informative hypotheses.
• asked a question related to Logistic Regression
Question
Hello,
In order to test the assumptions of a logistic regression I tried to conduct the Box-Tidwell test. So far so good... I encountered the problem that I quite often have the value 0 in my independent variables, which leads to no values for the x*ln(x) term. This means a very considerable reduction in includable cases (19 instead of 198!). Any ideas how I could deal with it?
Many thanks and kind regards, Ilka
Thanks Chris and David! Sorry, I might not have been clear in my question! I am aware of the fact that I cannot take the log of 0 and that this is the reason why I lose cases. My question is: how can I test the assumption of linearity of the logit for logistic regression then? Is there another approach besides the Box-Tidwell test?
• asked a question related to Logistic Regression
Question
Let us suppose we have a new cheap and simple diagnostic test we want to evaluate against the expensive and complex gold standard for a highly lethal disease.
The gold standard test is dichotomous (positive or negative), but the new test returns two continuous results: let's call them "Result A" and "Result B".
Assuming the disease can be accurately diagnosed with the gold standard test, we want to
1) estimate the posterior probability of disease given the prior and the new test results A and B, i.e. P(D+|A,B)
2) define the best threshold values for both A and B
Given the high lethality, we're more interested in avoiding false negatives.
Let's suppose we have data like those in figure 1 (randomly generated data). Big red dots and small grey dots are patients whose gold standard test result was positive and negative, respectively.
Which is the best model to evaluate such a test?
Logistic regression and ROC curve?
Clustering in machine learning?
Other?
Thank you.
Hello Max Pierini. In trying to find the best balance between Se and Sp, you are using Youden's Index (or some variation on it). Let me remind you of what Jochen Wilhelm said in his reply (with emphasis added):
The performance of the new method can be evaluated by a ROC analysis. You may decide on a useful combination of sensitivity and specificity you can get (standard indices like the Youden index or similar are ignoring all practical consequences of false positives and false negatives and should not be used).
The only change I would make to what Jochen said is that Youden's index (or similar) should only be used when false positives and false negatives are deemed equally costly. But you want to limit false negatives (because of lethality), so trying to strike a balance between Se and Sp does not make sense. Rather, you need a cut-point that guarantees whatever level of Se you deem necessary (IMO). HTH.
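The suggested approach can be implemented directly: fit the logistic model to estimate P(D+|A,B), then, instead of Youden's index, take the largest probability cutoff whose sensitivity still meets the target. A sketch on simulated data (the 95% target and all distributions are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(42)
# Simulated data: results A and B both shift upward in diseased patients
n = 500
disease = rng.random(n) < 0.2
A = rng.normal(np.where(disease, 2.0, 0.0), 1.0)
B = rng.normal(np.where(disease, 1.5, 0.0), 1.0)
X = np.column_stack([A, B])

model = LogisticRegression().fit(X, disease)
prob = model.predict_proba(X)[:, 1]          # estimated P(D+ | A, B)

# Rather than Youden's index, take the largest probability cutoff whose
# sensitivity still meets the target (95% here), limiting false negatives
fpr, tpr, thresholds = roc_curve(disease, prob)
cutoff = thresholds[tpr >= 0.95][0]
sens = ((prob >= cutoff) & disease).sum() / disease.sum()
print(cutoff, sens)
```

In practice the cutoff should be chosen on a validation set (or via cross-validation) rather than on the same data used to fit the model, otherwise the achieved sensitivity will be optimistic.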
• asked a question related to Logistic Regression
Question
Hi,
I'm a fish biologist and I'm interested in assessing the uncertainty around the L50, which is the (sex-specific) length (L) at which you expect 1 fish out of 2 (50%) to exhibit developed gonads and thus, participate in the next reproductive event.
Using a GLM with a binomial distribution family and a logit link, you can get the prediction from your model with the predict() function in R on the logit (link) scale, asking it to also generate the estimated SE (se.fit=TRUE), and then back-transform the result (i.e., fit) to the response scale.
For the uncertainty (95% CI), one can estimate the commonly-used Wald CIs by multiplying the SE by ±1.96 on the logit scale and then back-transforming these values to the response scale (see the Figure below). From the same logistic regression model, one can also estimate the CI on the response scale with the Delta method, using the "emdbook" package and its deltavar() function or the "MASS" package and its dose.p() function, still presuming that the variance of the linear predictors on the link scale is approximately normal, which does not always hold true.
For the profile likelihood function, which seems to better reflect the sometimes non-normal distribution of the variance on the logit scale compared to the two previous methods (Brown et al. 2003), it unfortunately seems that no R package exists to estimate CIs of logistic regression model predictions according to this approach. You can, however, get the profile likelihood CI estimates for your beta parameters with the confint() function or using the "ProfileLikelihood" package, but for a logistic regression prediction it seems that one would need to write one's own R scripts, which we will likely end up doing.
Any suggestion would be welcome. Either regarding specifically the profile likelihood function (Venzon & Moolgavkar 1988) or any advice/idea on this topic.
Briefly, we are currently trying to find out which of these methods (and others: parametric and non-parametric bootstrapping, Bayesian credible intervals, Fieller analytical method) is/are the most optimal at assessing the uncertainty around the L50 for statistical/biological inferences, pushing a bit further the simulation study of Roa et al (1999).