Questions related to Logistic Regression
I am working on a logistic regression model with a binary outcome, and my model looks like this:
Intercept: coefficient is -2.034
Variable: coefficient is 0.031, and the p-value is 0.130. Not significant.
However when I calculate the Odds Ratio and CI 95% I get
Odds ratio 1.031, Lower Limit = 0.991, Upper Limit = 1.073
However, since the variable is not significant, I shouldn't get an odds ratio greater than 1, correct?
Can anyone explain why this is happening?
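For what it's worth, the reported numbers are internally consistent. A back-of-the-envelope check in Python, reconstructing the standard error from the reported CI (so this is approximate):

```python
import math

# Reconstruct the standard error of the coefficient from the reported 95% CI
# for the odds ratio (0.991 to 1.073), then recompute the Wald p-value.
b = 0.031
se = (math.log(1.073) - math.log(0.991)) / (2 * 1.96)

z = b / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided normal p-value

print(round(p, 2))        # ~0.13, matching the reported p = 0.130
print(0.991 < 1 < 1.073)  # True: the CI contains 1, consistent with p > 0.05
```

A non-significant variable can still have a point estimate above 1; significance is about whether the CI excludes 1, which here it does not.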
I hope everything is fine.
I'm running several separate logistic regressions on my ordinal response variable (score). I did not include all 10 of my (continuous) independent variables in one single model, as that model wouldn't run. So I have a separate model for variable 1 and the score, variable 2 and the score, ..., variable n and the score.
So, my question is whether it is possible to use a Bonferroni correction to adjust the significance level (α = 0.05). I would do so by dividing the significance level by the number of models I run.
Thanks a lot!
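The mechanics of the correction described above can be sketched as follows (the p-values here are hypothetical):

```python
# Bonferroni: divide the significance level by the number of separate models.
alpha = 0.05
n_models = 10  # one logistic regression per continuous predictor

adjusted_alpha = alpha / n_models
print(adjusted_alpha)  # 0.005

# Equivalently, multiply each raw p-value by n_models and compare to alpha:
p_values = [0.001, 0.02, 0.004]  # hypothetical raw p-values
significant = [p * n_models < alpha for p in p_values]
print(significant)  # [True, False, True]
```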
I'm trying to construct a binary logistic model. The first model includes 4 predictor variables, and the intercept is not statistically significant. Meanwhile, in the second model, I exclude one variable from the first model and the intercept is significant.
The consideration I take here is that:
the pseudo R² of the first model is higher, i.e., it explains the data better than the second model.
Any suggestion which model should I use?
Hello, is it possible to use logistic regression on pooled panel data? The dependent variable is whether or not the respondent has diabetes. The independent variables are income, gender, and education. Should the individual income observations be adjusted to reflect increasing (average) income over time? Are there any specific considerations that should be addressed?
I am performing a meta-analysis of odds ratio per unit of a continuous variable with a dichotomous outcome (dichotomized continuous variable). One of the studies reports a mixed linear regression model with coefficient and standard error for the continuous variable regressed on the continuous outcome variable. Is there any acceptable method to estimate the odds ratio I need?
300 participants in my study viewed 66 different moral photos and had to make a binary choice (yes/no) in response to each. There were 3 moral photo categories (22 positive images, 22 neutral images and 22 negative images). I am running a multilevel logistic regression (we manipulated two other aspects of the images) and have found unnaturally high odds ratios (see below). We have no missing values. Could anyone please help me understand what the results below might mean? I understand I need to approach them with extreme caution, so any advice would be highly appreciated.
Yes choice: morally negative compared to morally positive (OR=441.11; 95% CI [271.07,717.81]; p<.001)
Yes choice: morally neutral compared to morally positive (OR=0.94; 95% CI [0.47,1.87]; p=0.86)
It should be noted that when I plot the data, very very few participants chose yes in response to the neutral and positive images. Almost all yes responses were given in response to the negative images.
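For intuition about the OR of 441: when almost no "yes" responses occur in the reference category, the reference odds are close to zero and the ratio explodes (approaching quasi-separation). A toy illustration with hypothetical counts, not the study's data:

```python
# Toy counts (hypothetical): "yes" is common for negative images but
# vanishingly rare for positive ones, so the positive-image odds are tiny.
yes_neg, no_neg = 2000, 4600   # negative images
yes_pos, no_pos = 15, 6585     # positive images: the sparse cell

odds_neg = yes_neg / no_neg
odds_pos = yes_pos / no_pos
odds_ratio = odds_neg / odds_pos
print(round(odds_ratio, 1))    # 190.9: huge, driven entirely by the rare cell
```

Such estimates can be genuine (the effect really is dramatic) but their CIs are fragile; penalized methods such as Firth's logistic regression are often suggested for sparse cells.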
I got highly significant negative logit values from the logistic regression. Is a Nagelkerke R² of 0.22 large enough to explain the changes in the variables?
I understand the basic concept of forward and backward stepwise logistic regression. However, I am unsure when a forward model is indicated in preference to a backward model, and vice versa.
Should you not end up with a similar final selection of variables regardless of whether you started with all the variables and subtracted (backward) or started with none and added (forward)? This is why I'm having difficulty justifying/selecting forward vs. backward.
Thank you very much
In my current study, I am identifying the association between some independent variables and a dependent variable, using bivariate analysis (crosstabs with p-values) and multivariate analysis (multiple regression with adjusted odds ratios). Some previous studies on my topic used different p-value cut-off points for selecting variables (e.g., p < 0.25 or p < 0.05), and others included some variables without any such restriction.
What should I do? Should I include the same variables in both of the bivariate and multivariate analyses?
Thank you in advance.
I am working on a pooled country dataset (19 countries) to look at the association of linear growth (both as a continuous and as a binary variable) with IFA intake during pregnancy, using linear and logistic regression.
I followed the model-building process suggested by Hosmer & Lemeshow, but the final predictor model had a p-value of <.05 for the goodness-of-fit test and a pseudo R-square of .06 (almost the same R-square value as for linear regression). The VIF was <10. I also tried the modification of the Hosmer-Lemeshow test for large samples proposed by Nattino, Pennell, and Lemeshow, as well as a calibration test (using Stata 15.1). Even these showed model fit issues.
I then went on to add interactions, which was pure data mining: adding every combination of interactions one at a time, then adding all significant ones to the final predictor model, followed by the elimination process. Some of these were nothing but noise, as evidenced by probability graphs, but the model fit turned out to be good. The pseudo R-squared became 0.07. If I remove the interactions that look like noise (in graphs) or cause high multicollinearity (individual VIF values > 10), the good model fit goes away.
I understand that I should be using a multilevel model but there are multiple studies out there that have used pooled data with even larger sample sizes. Unfortunately, no one describes anything on model fit. Is it not necessary? Also the ICC value for the most basic model was .06.
Though R-square and pseudo R-square are not the most important statistics, such low values make me question the models. Omitted variables come to mind, but I have used all the important predictors found in my literature review.
I also understand that data mining is the wrong approach, but if I use only the plausible interactions, none of them are significant. It doesn't make sense to use them, and that brings me back to the final predictor model, which was not a good fit.
I am also aware that some experts propose not using statistical significance as the basis to decide on predictors. Does that mean that we don't need to look at model fit either?
I am not sure what is the right way to decide on the final model. I have attached a sheet that shows all the fit statistics that I have used for my models. I will really appreciate guidance on this.
I am running 6 separate binomial logistic regression models with all dependent variables having two categories, either 'no' which is coded as 0 and 'yes' coded as 1.
4 of the 6 models run fine; however, 2 of them produce this error message.
I am not sure what is going wrong, as each dependent variable has the same 2 values on the cases being processed, either 0 or 1.
Any suggestions what to do?
Suppose we split our data into training and test groups, and we use the training data alone to create a logistic regression model in SPSS. Would this model produce different results from a logistic regression model created in Python?
And how could we convert the probabilities the (SPSS) logistic regression model produces for the test group into interpretable categorical outputs (0 or 1)?
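For the second question, a common convention (and what SPSS's classification table does by default) is to cut the predicted probabilities at 0.5. A minimal sketch with hypothetical probabilities:

```python
# Hypothetical predicted probabilities for four test-group cases.
probs = [0.12, 0.48, 0.51, 0.97]
cutoff = 0.5  # the usual default; SPSS's classification table uses 0.5

labels = [1 if p >= cutoff else 0 for p in probs]
print(labels)  # [0, 0, 1, 1]
```

A different cutoff can be chosen to trade sensitivity against specificity for the problem at hand.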
I made the following logistic regression model for my master's thesis.
FAIL= LATE + SIZE + AGE + EQUITY + PROF + SOLV + LIQ + IND.
Here I look at whether late filing of financial statements (independent variable) is an indicator of failure of small companies (dependent variable). FAIL is a dummy variable equal to 1 when a company failed during one of the researched years. I use data covering 3 years (2017, 2018 and 2019). Should I include a dummy variable YEAR to account for year effects, or not? I have searched online, but I don't understand exactly what it means, which is why I don't know whether it is necessary to include it in this regression model. I hope you guys can help me. Thank you in advance!
For my bachelor thesis I'm conducting a study on the relationship between eye movements and memory. One of the hypotheses is that the number of fixations made during the viewing of a movie clip will be positively related to the memory of that movie clip.
Each participant viewed 100 movie clips, and the number of fixations were counted for each movie clip for each participant. Later participants' memory of the clips were tested and each movie was categorized as "remembered" or "forgotten" for each participant.
So, for each participants there are 100 trials with the corresponding number of fixations and categorization as "remembered" or "forgotten".
My first idea was to do a paired-samples t-test (to compare the number of fixations between "remembered" and "forgotten"), but I didn't find a way to do that in SPSS with this file format, as there are 100 rows for each participant. I thought of calculating the average number of fixations for the remembered vs. forgotten movies per participant and running a t-test on these means (one mean per participant for each category), but this way the means get distorted because some subjects remember far more clips than others (so the "mean of the means" is not the same as the overall mean).
Now I'm thinking that a t-test might not be appropriate at all, and that logistic regression would be a better choice (to see how well the number of fixations predicts whether a clip will be remembered vs. forgotten), but I didn't manage to find out how to do this in SPSS for a within-subjects design with multiple trials per participant. Any help/suggestions would be highly appreciated.
Road blocked also......
I am baffled by the ZTC term. In this case it would be ZTC = 1 and course mode = 0 (traditional). So why is the odds ratio higher than the odds ratio for ZTC*hybrid? The graph shows that the yield is greater for hybrid courses (red line). I see that there is a 78% increase in the odds of success in ZTC for traditional courses, but that is higher than the effect for ZTC*hybrid, which I thought would itself have been higher!
(any blanks you see were not a part of this model)
I am stumped. Please help me understand this. Data attached.
I want to draw a graph of predicted probabilities vs. observed probabilities. For the predicted probabilities I use the R code below. Is this code OK or not?
Could anyone tell me how I can get the observed probabilities and draw a graph of predicted vs. observed probability?
analysis10 <- glm(Response ~ Strain + Temp + Time + Conc.Log10 +
                    Strain:Conc.Log10 + Temp:Time,
                  family = binomial,  # binomial family for a binary response
                  data = mydata)      # mydata: whatever your attached data frame is called
predicted_probs <- data.frame(probs = predict(analysis10, type = "response"))
I have attached that data file
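One common way to get "observed probabilities" is to bin the observations by predicted probability and take the observed event rate in each bin (a calibration plot). A minimal sketch of that idea, shown here in Python with hypothetical predictions and outcomes:

```python
# Bin observations by predicted probability; the observed "probability" per
# bin is the fraction of 1s in that bin. Hypothetical data:
preds = [0.1, 0.15, 0.4, 0.45, 0.8, 0.85]  # predicted probabilities
obs   = [0,   0,    1,   0,    1,   1]     # observed 0/1 responses

n_bins = 3
pairs = sorted(zip(preds, obs))
size = len(pairs) // n_bins

calibration = []
for i in range(n_bins):
    chunk = pairs[i * size:(i + 1) * size]
    mean_pred = sum(p for p, _ in chunk) / len(chunk)
    obs_rate = sum(y for _, y in chunk) / len(chunk)
    calibration.append((round(mean_pred, 3), round(obs_rate, 3)))

print(calibration)  # [(0.125, 0.0), (0.425, 0.5), (0.825, 1.0)]
```

Plotting mean predicted against observed rate per bin, with the 45-degree line for reference, gives the predicted-vs-observed graph; the same binning can be applied in R to `predicted_probs$probs`.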
I want to estimate the half-life value for the virus as a function of strain and concentration, and as a continuous function of temperature.
Could anybody tell me how to calculate the half-life value in R?
I have attached a CSV file of the data
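A sketch of the usual first-order calculation, assuming a decay slope k (in log10 concentration units per unit time) has been fitted for a given strain/temperature combination, e.g. by regressing log10 titer on time; the slope value below is hypothetical:

```python
import math

# Hypothetical fitted decay slope for one strain/temperature combination,
# in log10 concentration units per hour:
k = -0.05

# Concentration halves when log10(C) falls by log10(2):
t_half = math.log10(2) / abs(k)
print(round(t_half, 2))  # 6.02 hours
```

In R the same arithmetic is `log10(2) / abs(k)` applied to the relevant coefficient from the fitted model.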
I am doing a study using logistic regression, where we want to control for a few possible confounders. However, many of the confounders, as well as the independent variable, are categorical and have been recoded as dummies. How do we apply the 10% rule to dummies? Do they all have to change by 10%, or is something considered a confounder if one of the dummies causes a change of 10%?
I have to investigate
1) how the response depends on the strain, temperature, time, and concentration.
I applied logistic regression (glm) and got the reduced model. When I try to draw the logistic regression line and confidence interval, it looks like the attached picture.
Could anybody tell me how to resolve this issue? (I want only one logistic regression line and two confidence interval lines, not many.)
I have attached the data
For the confidence interval I use this:
I'm attempting to build a binary response (single species presence/absence) logistic regression model with random intercept (for site variable level). I surveyed 30 sites 1-3 times; approx half of the sites were only visited once. Ideally, I would model site as the random effects variable and include one or two habitat variable(s) as the fixed effects variable(s). I recognize that my sample size is very low and suspect that I also have insufficient replication of observations per level of the random effects variable.
Is it possible to use random effects in this situation? If not what other approach would you recommend?
The only alternative that I can think of is to build a regular binary response logistic regression including only one observation per site, and repeat this for every possible combination of 30 sites. I figure this would allow me to use all of my data to infer which covariates are most influential, although it makes getting coefficient and coefficient confidence estimates, AICc values, etc. difficult as far as I can tell.
I ran a logistic regression of several variables on a dichotomous dependent variable. One of the independent variables showed the following results: Exp(B) = 2.16, 95% CI for Exp(B) 0.99 to 4.71, p-value = 0.053.
I wonder if these results are treated as significant or not?
In a time-stratified crossover study, a case day is matched to several control days, which makes conditional logistic regression the suitable analysis. But how should the data be formatted in Excel, and how do I fit the model in R? Are there any detailed papers I can refer to?
Hello, fellow researchers! I'm hoping to find someone well familiar with Firth's logistic regression. I am trying to analyse whether certain emotions predict behaviour. My outcomes are 'approached', 'withdrew', & 'accepted' - all coded 1/0 & tested individually. However, in some conditions the outcome behaviour is a rare event, leading to extremely low cell frequencies for my 1's, so I decided to use Firth's method instead of standard logistic regression.
However, I can't get the model to converge and get warning messages (see below). I've tried reducing the predictors (from 5 to 2) and increasing the iterations to 300, but no change. My understanding of logistic regression is superficial, so I have felt too uncertain to adjust the step size; I'm also not sure how much I can increase the iterations. The warning about NAs introduced by coercion I have ignored (as per advice on the web), as all the data look fine in the data view.
My skill-set is only a very 'rusty' python coding, so I can't use other systems. Any SPSS friendly help would be greatly appreciated!
1: In dofirth(dep = "Approach_Binom", indep = list("Resent", "Anger"), :
NAs introduced by coercion
2: In options(stringsAsFactors = TRUE) :
'options(stringsAsFactors = TRUE)' is deprecated and will be disabled
3: In (function (formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, :
logistf.fit: Maximum number of iterations for full model exceeded. Try to increase the number of iterations or alter step size by passing 'logistf.control(maxit=..., maxstep=...)' to parameter control
4: In (function (formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, :
logistf.fit: Maximum number of iterations for null model exceeded. Try to increase the number of iterations or alter step size by passing 'logistf.control(maxit=..., maxstep=...)' to parameter control
5: In (function (formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, :
Nonconverged PL confidence limits: maximum number of iterations for variables: (Intercept), Resent, Anger exceeded. Try to increase the number of iterations by passing 'logistpl.control(maxit=...)' to parameter plcontrol
I am conducting research on determinants of entry mode, where equity vs. non-equity is my DV and I have a few IVs: family ownership (dummy), international experience, market competition, and dynamism. Furthermore, I have a moderating variable, host-country network (dummy variable). Is the interpretation of the interaction effect of Family Ownership x Host-Country Network on Entry Mode different than for the other variables? I haven't found any literature on the interaction effect between three categorical variables.
My question concerns the problem of calculating odds ratios in logistic regression when the input variables are on different scales (e.g., 0.01-0.1, 0-1, 0-1000). Although the coefficients of the logistic regression look fine, the odds ratio values are, in some cases, enormous (see example below).
In the example there were no outlier values in any input variable.
What is the general rule: should we normalize all input variables before analysis to obtain reliable OR values?
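A worked illustration of why the scales matter (the coefficient and SD values are hypothetical): a one-unit increase is enormous for a variable that only spans 0 to 0.1, so the per-unit OR looks absurd even when the fit is fine. Reporting per-SD odds ratios, exp(beta * SD), is one common normalization:

```python
import math

# Hypothetical coefficient for a predictor that only ranges 0 to 0.1:
beta = 9.2
print(math.exp(beta) > 1000)  # True: the per-unit OR is enormous

# Rescaling x by c divides beta by c, so a per-SD odds ratio is more readable:
sd = 0.12  # hypothetical standard deviation of the predictor
print(round(math.exp(beta * sd), 2))  # 3.02: OR per one-SD increase
```

The model itself is unchanged by rescaling; only the unit the OR refers to changes.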
Hello, I have a question about the multinomial logit model and the conditional logit model. I have read a book (Logistic Regression Using SAS: Theory and Application) which states that the multinomial logit model is a special case of the conditional logit model, while also stating that the two models differ in two ways: the conditional logit model can include characteristics of the choice options, and the set of available options can vary across individuals in the analysis.
Suppose I have a study in which different participants may face different options; which model (multinomial or conditional logit) should I use? Can I keep using the multinomial logit model, since it is just a special case of the conditional logit model?
Please, I divided my data (patients) into two groups, a (yes) and b (no), and I need to examine the preoperative factors (categorical with 2 or 3 levels, and nominal) for group b only.
I tried to use logistic regression, but in the independent-variables box I cannot restrict the analysis to group b, which forces me to put both groups in the dependent-variable box. However, I need to examine which factors affect group b and make it negative (no); that is, I need to determine whether there is any relationship between group b and the factors.
I have all the data for landslide susceptibility mapping, and I want to analyze it using a logistic regression model, but I still have no idea how to process it.
I have conducted a logistic regression (method: forced entry) in R to analyze a group of independent variables (n=4). My criterion is a disease (yes/no). My sample size is n=105. I selected these variables based on theoretical consideration (some of them are known risk factors). My aim was to examine how well these variables predict the outcome and which of them are significant. My results indicate that only one of the 4 variables is significant (one trending).
Now I am wondering what might be a reasonable next step:
1) Is it appropriate to report the model "as it is" with R^2 (Nagelkerke etc.), a goodness of fit statistic (chi-square test), even if only one variable is significant? My idea behind this is to demonstrate the status quo and show that some variables indeed have no predictive power (at least in our data set)
2) In the next step (after looking at the first model), should I run a stepwise backward regression to find a model that best fits the data and contains only significant predictors? If I understand correctly, stepwise methods are more useful for exploratory analysis with a large sample size.
3) Can I just exclude all non-significant predictors in model 1, rerun my logistic regression analysis and report model 2 with the one significant predictor and perhaps the one trending (I'm not sure if this is actually a stepwise backward regression).
I'm looking forward to hearing from you, and thank you all for your help!
I ran a logistic regression model with PTSD, MDD, Nativity, and (PTSD*Born outside the US) interaction term predicting Nicotine Dependence (yes/no). The main effect of Born Outside the US (ref: born in the US) has OR=9.13, PTSD main effect OR=2.12. However, the interaction term of PTSD and Born outside the US has OR=0.31. I find it very strange that the OR changed direction. Can anyone advise on the potential explanations for such results?
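One thing worth checking with the numbers reported above: the interaction OR multiplies the main-effect OR rather than replacing it, so an interaction OR below 1 means the PTSD effect is attenuated (here, reversed) among those born outside the US, not that anything is necessarily miscoded. A quick illustration using the reported values:

```python
# Reported odds ratios:
or_ptsd = 2.12         # PTSD main effect (among the US-born reference group)
or_interaction = 0.31  # PTSD x born-outside-US interaction

# The PTSD odds ratio among those born outside the US multiplies the two:
or_ptsd_foreign_born = or_ptsd * or_interaction
print(round(or_ptsd_foreign_born, 2))  # 0.66, not the 0.31 on its own
```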
I want to present the results of a multinomial logistic regression at a conference in a visual way. Is it enough to present the table of results in the PowerPoint?
I applied ordinal logistic regression as my main statistical model, because my response variable is 7-point Likert-scale data.
After testing goodness of fit using AIC, I got my best-fitting model, including 4 independent variables (3 explanatory and 1 factor variable).
However, I encountered a negative coefficient (OR = 0.44) for one explanatory variable (all explanatory variables are also on 7-point Likert scales).
My theoretical assumption is simple: the more frequently the activities captured by the explanatory variables happen, the higher the impact score on the response variable (mutual understanding).
That's why I am confused when 1 independent variable has negative coefficient.
In this case, how should I interpret this IV?
Thank you very much,
I have a simple model with only one independent variable, and it is binary/categorical (as is the dependent variable). The log-odds estimate is 4.6821 with a standard error of 0.4978. The point estimate for the odds ratio is 108, with a confidence interval of 40.708 , 286.527.
I ran the same model, simply changing the reference and the log-odds estimate is -4.6821, same standard error, but the point estimate for the odds ratio is 0.009 with a confidence interval of 0.003 , 0.025. This seems reasonable.
Is an odds ratio of 108 valid? Since both my dependent and independent variables are binary/categorical, it isn't an issue of outliers. I only have 2 missing values. Sample sizes are sufficient (275 negative, 102 positive, 73 Y, and 304 N). For Y and negative there are only 5 cases. Could that account for such an inflated odds ratio?
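If the 2x2 table is reconstructed from the counts given (an assumption about how the margins combine), the odds ratio of 108 falls out exactly, driven by the cell with only 5 observations:

```python
# Counts as reported: 275 negative, 102 positive, 73 Y, 304 N, and only 5
# cases that are both Y and negative. Filling in the rest of the 2x2 table:
y_neg = 5
y_pos = 73 - y_neg    # 68
n_neg = 275 - y_neg   # 270
n_pos = 102 - y_pos   # 34

# Odds of "positive" for Y vs N:
odds_ratio = (y_pos / y_neg) / (n_pos / n_neg)
print(round(odds_ratio, 1))  # 108.0, i.e. exp(4.6821): the 5-cell drives it
```

So the estimate is arithmetically valid; the question is whether a cell of 5 makes it precise enough to report, which the wide CI (40.7 to 286.5) already reflects.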
I would like to examine the association between a disease and psychological load. The disease can be determined by different diagnostic methods. In our study, we performed four accepted procedures to determine the disease. Each participant (total sample n=45) underwent all four procedures at different times. Additionally, we examined psychometric data, and these are the main variables I am focusing on.
My idea is to examine the association between the disease and psychological load, as a function of the diagnostic method chosen. In other words: Is the association between the disease - diagnosed by method A - and psychological load significantly different/stronger than the association between the disease - diagnosed by method B - and psychological load.
As for the statistical methods, I initially thought of logistic regression with the disease as criterion and the psychometric variables as predictors. This would lead to 4 regression models: SB diagnosed with A; SB diagnosed with B etc. as criterion. My idea is to compare the AICs of the four different models: Do the psychometric variables predict the disease better and explain more variance when diagnosed with method X or method Y.
I hope my question and concept is comprehensible.
Is this an appropriate approach or does anyone have another idea?
Thank you very much for your replies!
I am currently replicating a study in which the dependent variable describes whether a household belongs to a certain category. Therefore, for each household the variable either takes the value 0 or the value 1 for each category. In the study that I am replicating the maximisation of the log-likelihood function yields one vector of regression coefficients, where each independent variable has got one regression coefficient. So there is one vector of regression coefficients for ALL households, independent of which category the households belong to. Now I am wondering how this is achieved, since (as I understand) a multinomial logistic regression for n categories yields n-1 regression coefficients per variable as there is always one reference category.
I want to do univariate and multivariate binary logistic regression in SPSS. I am wondering about the timing of the Box-Tidwell test: does it apply to both univariate and multivariate binary logistic regression? I am using a forward LR model; do I perform Box-Tidwell tests on all predictors that I placed in block 1, or just on the predictors that SPSS included in the forward LR model?
Moreover, what should I do with a predictor for which the linearity test is significant? Can I still include this predictor in the model in some way, or should I leave it out of the model?
I have a set of independent variables (from 1 to 8, depending on the model), all continuous. My dependent variable of interest is ordinal: a Likert-scale representation of an employee's intent to remain at their current job, from 1 to 5.
I attempted to run an ordinal logistic regression, but I appear to fail the proportional-odds assumption, so I want to give mlogit a try.
I believe a downside to this is the loss of rank information. In any event, I am not entirely clear on how to do this in SPSS (or R); in particular, I am struggling to interpret my results.
In the attached, Factor1-8 are my independent.
My dependent variable is the aforementioned ordinal. I chose 5 to be my reference. My questions are as follows
- Am I barking up the right tree here with this approach?
- How do I interpret the results?
Thank you for any help you can provide
Dear researchers, I have read a lot about Likert scales and Likert-like questions. However, the answer is always "it depends", and it needs to be evaluated from situation to situation.
My aim is to examine factors that correlate with attitudes among public health workers, represented on 5-point Likert scales. The dependent variables are the Q7 items. Should the independent variables be all the variables above?
The dependent variables are Likert-like responses to these questions (only the first two shown; in total there are nine):
1. I feel trained enough to ask the client about the use of psychoactive substances
2. I feel qualified enough to ask the client about the amount and frequency of use of psychoactive substances daily activities
The independent variables are: gender, age (number of years), experience (number of years), profession (4 groups), training (yes/no), and knowledge about different aspects of drug use (on a 5-point Likert-like scale from no knowledge to excellent knowledge).
The file is in the link or in the attachment (no viruses, free to download, translated by Google Translate).
The password is: RG%April2022
Thanks for your help.
I am creating a risk score from some variables using the following steps:
1- Dividing the data into training and validation cohorts.
2- Logistic regression (unadjusted, then fully adjusted).
3- Selecting the variables with p < 0.05 in the fully adjusted model.
4- Transforming the Bs into scores.
5- ROC curve.
6- Calibration using the validation cohort.
I have problems with the last 3 steps. I am using SAS, so I would be grateful if you could point me to sources for the code.
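For the ROC step, SAS's PROC LOGISTIC already reports the area under the ROC curve as the c-statistic. As a language-agnostic illustration of what that number is, here is a sketch in Python with hypothetical risk scores: the AUC equals the probability that a randomly chosen patient with the event gets a higher score than a randomly chosen patient without, with ties counting half.

```python
# Hypothetical risk scores for patients with and without the outcome:
scores_event    = [4, 6, 7, 9]
scores_nonevent = [1, 3, 5, 6]

# AUC = P(random event-patient outscores random non-event patient), ties half.
wins = ties = 0
for a in scores_event:
    for b in scores_nonevent:
        if a > b:
            wins += 1
        elif a == b:
            ties += 1

auc = (wins + 0.5 * ties) / (len(scores_event) * len(scores_nonevent))
print(auc)  # 0.84375
```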
I would like to conduct a logistic regression; however, I have an independent variable that is not normally distributed, and some of its values seem a bit extreme.
Should I be worried about the effect of potential outliers in an independent variable that is not normally distributed, or will they not impact the results of the logistic regression?
If yes, any advice on how to deal with these outliers would be very much appreciated.
Would it be better not to enter into the logistic regression model those variables with an extremely unbalanced distribution across the 2 groups, even if they are statistically significant (p < 0.05) in the bivariate analysis, for example a frequency of 0% or 100% in one of the two groups?
For instance, demographics could be potential confounders. Would it be better if I include them as the predictors in the logistic regression model along with other predictors or I should first factor them out and then use the residuals for further analysis?
I used a logistic regression model with over 17,000 observations. Although the model yields several statistically significant predictors, McFadden's adjusted R² and Cragg & Uhler's R² are very low: 0.026 and 0.044, respectively. Can I proceed with these values? I would really appreciate your suggestions on the accepted level of pseudo R², backed up by relevant literature. Thank you!
I have a dependent variable with 3 categories. When I performed chi-square tests, they showed associations between the DV and 9 independent variables (p-value less than 0.05), but when I run an ordinal logistic regression on the same data, the p-values are totally different and only 3 variables are significant.
How can I interpret these results?
I am trying to adjust for the confounding effect of a third variable on the association between ethnicity (has multiple categories) and death (binary). I am using fixed effect conditional logistic regression to build multivariable model. I know that for a factor to be considered an important confounder it has to change the crude odds ratio by more than 10% (besides the other criteria of being associated with the exposure and outcome).
However, when the exposure has many categories, how can I know whether a third factor is an important confounder? Should it change the odds ratios of ALL categories by 10% or more, or does a change in just one of the categories make it an important confounder? Or is there a more appropriate way to deal with this situation?
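The change-in-estimate check can be sketched per exposure category with hypothetical coefficients (this applies the 10% rule on the log-odds scale; many analysts apply it to the OR instead, and conventions on "any category vs. all categories" vary):

```python
# Hypothetical crude vs adjusted log-odds coefficients for the ethnicity
# dummies (reference category omitted):
crude    = {"group_b": 0.80, "group_c": 0.25, "group_d": -0.40}
adjusted = {"group_b": 0.65, "group_c": 0.24, "group_d": -0.41}

pct_change = {k: abs(adjusted[k] - crude[k]) / abs(crude[k]) * 100 for k in crude}
flagged = any(v >= 10 for v in pct_change.values())
print(flagged)  # True: group_b alone shifts by ~19%
```

Under the "any category" reading, a single category shifting by 10% or more is enough to keep the third variable in the model.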
I am interested in how to interpret the odds ratio in logistic regression when the OR is <1.
Let's say the odds ratio for the variable higher education = 0.343721.
Now I calculated the probabilities of staying and exit by applying the formula P = OR/(1 + OR): P(staying) = 0.343721/1.343721 = 0.2558.
Then the probability of exit will be 1 - 0.2558 = 0.7442. Can I interpret it in the following way:
Farmers with higher education (bachelor and above) are 0.34 times as likely to stay in the agricultural sector as farmers with lower education, i.e., there is almost a 74% lower chance of staying in the agricultural sector?
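The odds-to-probability step above can be checked mechanically, noting that p = odds/(1 + odds) converts an *odds*, not an odds *ratio*, so treating 0.343721 as the odds of staying is an extra assumption:

```python
# p = odds / (1 + odds) converts an odds value to a probability.
odds = 0.343721

p_stay = odds / (1 + odds)
p_exit = 1 - p_stay
print(round(p_stay, 4), round(p_exit, 4))  # 0.2558 0.7442
```

An OR of 0.34 by itself says the odds of staying are 0.34 times those of the comparison group; it only yields a probability if the comparison group's odds are also known.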
I ran a logistic regression with continuous IVs in SPSS. In the "Variables in the Equation" table one variable is missing (despite using the Enter method), without any message from SPSS. Browsing the web, I understood that this might happen due to collinearity. However, collinearity diagnostics did not return a clear sign of it: the highest VIF value is 6.1 and the highest condition index is 21.1.
So my questions are:
1. Is my regression model still valid despite SPSS dropping one variable?
2. Are there reasons other than collinearity why the IV might be missing from the model?
I am running a multinomial regression analysis between a categorical dependent variable (3 levels) and a continuous independent variable. My independent variable is arranged into quartiles. I want to know how to get relative risk ratios/odds ratios/coefficients for each quartile while keeping quartile 1 as the base/reference. I am using Stata.
Based on my contingent valuation survey (double-bounded dichotomous choice), I have decided to run the models (e.g., Model 1: WTPa; Model 2: WTPb; Model 3: WTPc and WTPmax) for positive responses only. The question is how I can run these models separately.
The dependent variable is the dichotomous WTP response (e.g., WTPa is the first answer to the offered price; WTPb is the second answer, to the follow-up question). The independent variables are sociodemographic (e.g., age, income, occupation, etc.).
Should I first split the data for the different models? Then the problem is: what is the dependent variable? Because there is only one answer left (e.g., yes).
Hello, I'm an undergraduate student completing my dissertation (using SPSS) so please bear with my very limited understanding of binary logistic regression.
My outcome variable = referral outcome for ADHD assessment (dichotomous: accepted or rejected)
Significant predictor variable = gender
However, Exp(B), which I understand to be the odds ratio (the exponentiated coefficient), is 20.520 with confidence interval LL = 4.139, UL = 101.731.
I've been told by my supervisor that 20.520 is implausibly high - e.g., it wouldn't be right for me to report that males have 20.520 times higher odds of being accepted for ADHD assessment (which is what I've interpreted these results to mean).
I've done lots of research to try to figure out what went wrong
- there is no multicollinearity (VIF are all between 1 and 2)
- there are 106 cases so I don't think the sample size is too small?
- the other predictor variables are all on the same scale (only one other variable is sig)
- there are 3 outliers, but the data has been input correctly and I believe it's not ultimately helpful to just remove them without good reason? Also, when I tried removing them, the Exp(B) just got larger...
- gender is set as a nominal variable in SPSS
Have attached table for reference.
Any help would be hugely appreciated about how to interpret this number, whether it could in fact be plausible? Or if not, what I could do as a next step?
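For context on why a large Exp(B) can come with such a wide interval: with a binary predictor, the Wald CI for the odds ratio is driven by the cell counts of the underlying 2x2 table, and sparse cells inflate both the estimate and its interval. A minimal sketch with a purely hypothetical table (not the poster's data):

```python
import math

# Hypothetical 2x2 table: rows = gender, columns = accepted / rejected.
# The small cell (b = 5) dominates the standard error of log(OR).
a, b = 38, 5    # males: accepted, rejected
c, d = 20, 43   # females: accepted, rejected

or_hat = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)  # Wald SE on the log scale
lo = math.exp(math.log(or_hat) - 1.96 * se_log_or)
hi = math.exp(math.log(or_hat) + 1.96 * se_log_or)
print(round(or_hat, 1), round(lo, 1), round(hi, 1))
```

So an OR around 20 with a CI stretching to ~100 is not necessarily an error; it often just reflects few events in one cell of the table.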
Hi, we are comparing the mortality of two therapies in COVID patients. We have identified 74 patients in our hospital records (details in the attached image). We also have vital signs and lab data for these patients taken at different intervals.
Our idea is to use logistic regression with mortality/discharge as the endpoint, adjusted for patient status on admission.
Is this sample size enough for this kind of analysis? If not, how do you suggest we analyze this data?
I am running a multinomial logistic regression. How do I change the reference category of independent variables that are categorical? I am familiar with changing the reference category of the dependent variable. But how does it work with independent variables? By default, the last category is taken as the reference group.
I am running multivariable (confounder-adjusted) bootstrap logistic regressions (1,000 iterations) to see how 5 ethnoracial groups (IVs) differ in terms of PrEP barriers (outcome).
For one of my ethnoracial identities, though, there are no cases. Yet the bootstrap estimate was significant, with a beta of -18 and a CI that does not cross zero. The other IVs were not significant for that barrier. I've never dealt with this scenario before; should I remove that IV and rerun the model?
Although I understand what bootstrap distributions are, I'm not sure how I could go from no cases, and thus non-significance (p > .999 before bootstrapping), to a p-value of .03 in the bootstrapped analysis. Any help is appreciated.
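One note that may help interpret the -18: on the log-odds scale, a coefficient of that magnitude corresponds to an odds ratio of essentially zero, which is the classic footprint of complete separation (a category with no cases):

```python
import math

# A beta of -18 on the log-odds scale is an odds ratio of ~1.5e-08,
# i.e. effectively zero: the model is signalling complete separation
# rather than a genuine, estimable effect.
beta = -18
odds_ratio = math.exp(beta)
print(odds_ratio)
```

The bootstrap then resamples this degenerate estimate tightly, which is why the CI can look "significant" despite there being no information in that cell.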
Can anyone give me a hint on how to determine the cut-off point for the p-value of a bivariable logistic regression, used to select variables for the multivariable regression?
We have been conducting a retrospective cohort study. The variables we are examining are assumed to be very age-dependent and the exposed population is very small (~40 patients), therefore we have considered matching for age and sex at a 1:2 or 1:3 ratio to increase statistical power and limit confounding.
Which statistical test would be most appropriate for calculating risk ratios for dichotomous categorical variables?
This article ( https://academic.oup.com/epirev/article/25/1/43/718675 ) suggests conditional Poisson regression, which I have attempted in Stata, but it appears to work only for 1:1 matched pairs.
It also suggests an adjustment of Cox regression so as to yield the same results as conditional Poisson regression (" if the time to death or censoring is set to some arbitrary constant value and if the Breslow or Efron methods are used to account for tied survival times, the results will be the same as those from conditional Poisson regression, as the likelihoods for these methods are identical when the data come only from matched pairs ").
I have recently attempted a similar adjustment (as described here: https://www.ibm.com/support/pages/conditional-logistic-regression-using-coxreg ) to yield the same results as conditional logistic regression (odds ratio) for a 1:N matched case-control study using Cox regression.
If such an adjustment is possible, how exactly could it be implemented in SPSS? If not, what other alternatives are available to us at this juncture?
Thank you in advance.
I have slope data classified based on whether an area is vulnerable to erosion. Below is an example: vulnerability is a categorical variable (yes/no) and slope is a continuous variable ranging from 0 to 90.
I want to know if the slope of vulnerable areas is significantly different from that of non-vulnerable areas. At first, I performed an unpaired two-samples t-test by classifying the slope data into two groups based on vulnerability. Then, while looking into statistics for another dataset, I realized this dataset might be interpreted in a different way: one continuous variable (i.e., slope) and one categorical variable (i.e., vulnerable or non-vulnerable). If that is correct, can ANOVA or logistic regression be used? Also, I'm wondering which framing (two groups of one continuous variable vs. one categorical and one continuous variable) is more appropriate in my case. Thanks.
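For what it's worth, both framings test the same association: a two-sample t-test treats slope as the outcome and vulnerability as the grouping, while logistic regression treats vulnerability as the outcome and slope as the predictor. A minimal sketch of the t-test framing with made-up slope values (not the poster's data):

```python
import math
import statistics

# Hypothetical slope values (degrees) for each vulnerability group.
vulnerable = [35.0, 42.0, 28.0, 51.0, 39.0, 44.0]
non_vulnerable = [12.0, 18.0, 9.0, 22.0, 15.0, 11.0]

# Welch's two-sample t statistic: does mean slope differ between groups?
m1, m2 = statistics.mean(vulnerable), statistics.mean(non_vulnerable)
v1 = statistics.variance(vulnerable) / len(vulnerable)
v2 = statistics.variance(non_vulnerable) / len(non_vulnerable)
t = (m1 - m2) / math.sqrt(v1 + v2)
print(round(t, 2))
```

The logistic framing (vulnerability ~ slope) answers a slightly different question, "how does the probability of vulnerability change with slope", and is the better choice if prediction rather than group comparison is the goal.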
I have developed a logistic regression based prognostic model in Stata. Is there any way to develop an app using this logistic regression equation (from Stata)?
Most of the resources I found require me to develop the model from scratch in Python/R and then develop the app using streamlit/Shiny etc.
However, I am looking for a resource where I could use the coefficient and intercept values from the Stata-based model rather than building the model from scratch in Python.
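Scoring new observations needs only the fitted coefficients, so refitting in Python is indeed unnecessary. A minimal sketch, with hypothetical coefficient values standing in for the Stata output (the variable names and numbers below are illustrative, not from any real model):

```python
import math

# Hypothetical coefficients copied from a Stata `logit` output table.
intercept = -3.2
coefs = {"age": 0.045, "bmi": 0.08, "smoker": 0.9}

def predicted_probability(patient):
    """Linear predictor plus inverse logit, i.e. what Stata's `predict` computes."""
    xb = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1 / (1 + math.exp(-xb))

p = predicted_probability({"age": 60, "bmi": 27, "smoker": 1})
print(round(p, 3))
```

A Streamlit or Shiny app then only needs input widgets feeding this one function; no model object or training data has to leave Stata.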
Gender is a negative predictor contributing to the model in an ordinal logistic regression predicting pornography use.
How can we interpret this?
B = -0.555 (Std. Error)
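One way to read a negative ordinal-logit coefficient is through its exponent, which gives the cumulative odds ratio. With the reported B = -0.555 (and assuming a 0/1 gender coding, which the question does not state):

```python
import math

# Reported coefficient for gender in the ordinal logistic regression.
b = -0.555
odds_ratio = math.exp(b)                  # cumulative odds ratio
pct_lower_odds = (1 - odds_ratio) * 100
print(round(odds_ratio, 3), round(pct_lower_odds, 1))
```

So the group coded 1 has roughly 43% lower odds of being in a higher outcome category, holding the other predictors constant; which group that is depends on how gender was coded.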
For logistic regression, the model may be specified as:
Pr(yi = 1 | Xi) = G(Xi'β), where G(x) = e^x / (1 + e^x)
What would the corresponding model be for Firth logistic regression?
Pr(yi = 1 | Xi) = G(Xi'β), where G(x) = ...
How (if at all) would the penalty feature in it?
Any help would be most appreciated!
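For what it's worth, in Firth's (1993) approach the model for Pr(yi = 1 | Xi) is unchanged; the penalty enters the estimation criterion, not the link function. The ordinary log-likelihood is replaced by a penalized version (the Jeffreys-prior penalty, with I(β) the Fisher information matrix):

```latex
% Firth's penalized log-likelihood: the ordinary log-likelihood \ell(\beta)
% plus half the log-determinant of the Fisher information I(\beta).
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\bigl|\,I(\beta)\,\bigr|
```

Maximizing ℓ*(β) shrinks the coefficients toward zero and keeps them finite under separation, while fitted probabilities still come from the same G(x) = e^x / (1 + e^x).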
I'm writing my thesis and first performed a multinomial logistic regression.
I found out this was wrong since my dependent variable is ordinal. Now I'm trying to perform an ordinal logistic regression but end up with dots in my table; can someone please explain why these appear?
In order to test the assumptions of a logistic regression, I tried to conduct the Box-Tidwell test. So far so good... I encountered the problem that I quite often have the value 0 in my independent variables, which leads to missing values for the x*ln(x) term. This means a very considerable reduction in includable cases (19 instead of 198!). Any ideas how I could deal with it?
Many thanks and kind regards, Ilka
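One common workaround (a convention, not the only option) is to shift the variable by a constant before forming the interaction term, so the zeros stay in the analysis; the same shift must then be applied consistently to the variable in the model. A minimal sketch:

```python
import math

# Box-Tidwell adds x * ln(x) terms, which are undefined at x = 0, so most
# software silently drops those cases. Shifting x by a constant (here +1,
# an illustrative choice) keeps zero-valued observations in the analysis.
xs = [0.0, 1.0, 2.0, 5.0]

def bt_term(x, shift=1.0):
    return (x + shift) * math.log(x + shift)

terms = [bt_term(x) for x in xs]
print([round(t, 3) for t in terms])
```

The shift changes the scale being tested, so conclusions about linearity in the logit strictly apply to the shifted variable; a sensitivity check with a different shift is cheap insurance.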
Let us suppose we have a new cheap and simple diagnostic test we want to evaluate against the expensive and complex gold standard for a highly lethal disease.
The gold standard test is dichotomous (positive or negative), but the new test returns two continuous results: let's call them "Result A" and "Result B".
Assuming the disease can be accurately diagnosed with the gold standard test, we want to
1) estimate the posterior probability of disease given the prior and the new test results A and B, i.e. P(D+|A,B)
2) define the best threshold values for both A and B
Given the high lethality, we're more interested in avoiding false negatives.
Let's suppose we have data like those in figure 1 (randomly generated data). Big red dots and small grey dots are patients whose gold standard test came out positive and negative, respectively.
Which is the best model to evaluate such a test?
Logistic regression and ROC curve?
Clustering in machine learning?
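A logistic regression on A and B directly gives the posterior P(D+|A,B), and the decision threshold on that probability can then be lowered to favour sensitivity. A minimal self-contained sketch on synthetic data (pure Python gradient ascent instead of a library fit; all values illustrative):

```python
import math
import random

# Synthetic patients: two continuous test results (a, b) and a gold-standard
# label generated from an assumed true model logit = A + B.
random.seed(0)
data = []
for _ in range(200):
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    p_true = 1 / (1 + math.exp(-(a + b)))
    data.append((a, b, 1 if random.random() < p_true else 0))

# Fit w0 + w1*A + w2*B by batch gradient ascent on the log-likelihood.
w0 = w1 = w2 = 0.0
lr, n = 0.1, len(data)
for _ in range(2000):
    g0 = g1 = g2 = 0.0
    for a, b, y in data:
        p = 1 / (1 + math.exp(-(w0 + w1 * a + w2 * b)))
        g0 += y - p
        g1 += (y - p) * a
        g2 += (y - p) * b
    w0 += lr * g0 / n
    w1 += lr * g1 / n
    w2 += lr * g2 / n

def prob(a, b):
    """Posterior probability of disease given test results A and B."""
    return 1 / (1 + math.exp(-(w0 + w1 * a + w2 * b)))

# Because false negatives are costly, classify "positive" at a LOW threshold;
# sensitivity rises at the cost of specificity.
threshold = 0.2
positives = [(a, b) for a, b, y in data if y == 1]
sensitivity = sum(prob(a, b) >= threshold for a, b in positives) / len(positives)
print(round(w1, 2), round(w2, 2), round(sensitivity, 2))
```

An ROC curve is just this threshold swept from 0 to 1; choosing an operating point below the default 0.5 encodes the asymmetric cost of missing a lethal disease. A single threshold on P(D+|A,B) also replaces separate cut-offs for A and B, which is usually more efficient than thresholding each result independently.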
I'm a fish biologist and I'm interested in assessing the uncertainty around the L50, which is the (sex-specific) length (L) at which you expect 1 fish out of 2 (50%) to exhibit developed gonads and thus, participate in the next reproductive event.
Using a GLM with a binomial family and a logit link, you can get the prediction from your model with the predict() function in R on the logit (link) scale, asking it to also generate the estimated SE (se.fit=TRUE), and then back-transform the result (i.e., fit) to the response scale.
For the uncertainty (95%CI), one can estimate the commonly-used Wald CIs by multiplying the SE by ± 1.96 on the logit scale and then back-transform these values on the response scale (see the Figure below). From the same logistic regression model, one can also estimate the CI on the response scale with the Delta method, using the "emdbook" package and its deltavar() function or the "MASS" package and its dose.p() function, still presuming that the variance for the linear predictors on the link scale is approximately normal, which does not always hold true.
For the profile likelihood function, which seems to better reflect the sometimes non-normal distribution of the variance on the logit scale compared to the two previous methods (Brown et al. 2003), it unfortunately seems that no R package exists to estimate CIs of logistic regression model predictions with this approach. You can, however, get the profile likelihood CI estimates for your beta parameters with the confint() function or with the "ProfileLikelihood" package, but for a logistic regression prediction it seems that one would need to write one's own R scripts, which we will likely end up doing.
Any suggestion would be welcome. Either regarding specifically the profile likelihood function (Venzon & Moolgavkar 1988) or any advice/idea on this topic.
Briefly, we are currently trying to find out which of these methods (and others: parametric and non-parametric bootstrapping, Bayesian credible intervals, Fieller analytical method) is/are the most optimal at assessing the uncertainty around the L50 for statistical/biological inferences, pushing a bit further the simulation study of Roa et al (1999).
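For readers following along, a minimal numeric sketch of the Wald approach described above, with hypothetical logit-scale coefficients standing in for real GLM estimates:

```python
import math

# HYPOTHETICAL logit-scale GLM estimates (intercept b0, slope b1 per unit
# length); not real fish data.
b0, b1 = -8.0, 0.25
L50 = -b0 / b1        # length at which the linear predictor is 0, i.e. p = 0.5

def expit(x):
    """Back-transform from the logit (link) scale to the response scale."""
    return 1 / (1 + math.exp(-x))

# Wald 95% CI for the predicted proportion mature at length L: build the
# interval on the link scale as fit +/- 1.96*SE, then back-transform.
L = 30.0
se_fit = 0.30         # hypothetical se.fit at length L from predict()
eta = b0 + b1 * L
fit = expit(eta)
lo, hi = expit(eta - 1.96 * se_fit), expit(eta + 1.96 * se_fit)
print(L50, round(fit, 3), round(lo, 3), round(hi, 3))
```

Back-transforming the interval endpoints (rather than the SE itself) keeps the CI inside [0, 1] and makes it asymmetric on the response scale, which is exactly the behaviour the Wald-on-link approach is meant to give.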
Thanks, Merci, Obrigado
Brown, L. D., T. T. Cai, and A. DasGupta. 2003. Interval estimation in exponential families. Statistica Sinica 13:19-49.
Roa, R., B. Ernst, and F. Tapia. 1999. Estimation of size at sexual maturity: an evaluation of analytical and resampling procedures. Fishery Bulletin 97:570-580.
Venzon, D. J., and S. H. Moolgavkar. 1988. A method for computing profile-likelihood based confidence intervals. Applied Statistics 37:87-94.