Cox Regression - Science topic

Questions related to Cox Regression
  • asked a question related to Cox Regression
Question
3 answers
The analysis for a pilot study should be based on descriptive statistics and ideally should not involve inferential statistics.
What about an exploratory study? Can I use inferential statistics such as Cox regression or logistic regression in an exploratory study?
Thanks
Relevant answer
For an exploratory study, it is entirely appropriate to use inferential statistics. An exploratory study aims to identify potential relationships and hypotheses that can be tested more rigorously in future studies. You can use inferential methods such as Cox regression or logistic regression to examine associations and effects between variables.
However, it is important to note that the results of an exploratory study should be interpreted with caution and regarded as preliminary until they are confirmed by later studies.
  • asked a question related to Cox Regression
Question
1 answer
From my understanding, the baseline hazard or baseline survival function is unknown because Cox regression is a semi-parametric model. So why and how can we use it as a prediction model, for example to predict the 10-year survival probability?
Relevant answer
Answer
We can. If you use Stata, you can build a predictive model after the Cox regression analysis by fitting a generalized linear model with a Poisson distribution.
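More generally, absolute predictions are possible because the baseline cumulative hazard can be estimated non-parametrically once the coefficients are known (the Breslow estimator), after which S(t | x) = exp(-H0(t) * exp(x * beta)). The following is only a minimal pure-Python sketch of that idea for a single covariate; the data and the coefficient value are made up for illustration, and in practice a survival package would do this step.

```python
import math

def breslow_baseline(times, events, x, beta):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    Returns a list of (event_time, H0) pairs in increasing time order.
    """
    risk = [math.exp(beta * xi) for xi in x]
    h0, cum = [], 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        at_risk = sum(r for ti, r in zip(times, risk) if ti >= t)
        cum += deaths / at_risk
        h0.append((t, cum))
    return h0

def predict_survival(h0, beta, x_new, t):
    """S(t | x_new) = exp(-H0(t) * exp(beta * x_new))."""
    cum = 0.0
    for time, value in h0:
        if time <= t:
            cum = value
    return math.exp(-cum * math.exp(beta * x_new))

# Hypothetical data: follow-up in years, event indicator, one binary covariate.
times = [2, 4, 5, 7, 9, 10, 12, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
x = [1, 1, 0, 1, 0, 0, 1, 0]
beta = 0.8  # assumed to come from an already fitted Cox model

h0 = breslow_baseline(times, events, x, beta)
print(predict_survival(h0, beta, x_new=0, t=10))  # 10-year survival, x = 0
print(predict_survival(h0, beta, x_new=1, t=10))  # 10-year survival, x = 1
```

The prediction is only defined up to the last observed follow-up time, so a 10-year probability needs data that actually cover 10 years.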
  • asked a question related to Cox Regression
Question
2 answers
I am working on a meta-analysis in which I extracted data directly from KM curves using WebPlotDigitizer to calculate HRs for the studies that reported only KM curves. One of the studies has three curves, so WebPlotDigitizer gives me three groups. Is it appropriate to combine the data for two of those groups and calculate an overall HR for the meta-analysis, bearing in mind that these are time-to-event data with censoring? I tried the method described by Cochrane, but it gave me a very wide confidence interval.
Does anyone have a lead on how to deal with this?
Relevant answer
Answer
In my experience, estimated data (e.g.: data retrieved from KM) will usually give you a "really wide" confidence interval as you say. Personally, I would have computed all three groups as three individual studies for the meta-analysis (in addition to other studies).
Cheers!
  • asked a question related to Cox Regression
Question
1 answer
I'm planning to use a cox regression in a future study exploring time-to-event or survival analysis comparing a control with an experimental group. I've seen sample size calculated through several packages, but I prefer G*Power and wanted to know if anyone's done this. Any resources would be appreciated.
Relevant answer
Answer
Hello Eden,
No, G*Power doesn't have a direct method to estimate N for Cox regression models. However, here are some links that will get you on your way:
1. A previous RGate reply to a similar question, with links to formulae as well as a couple of online sample size calculators:
2. Guidance for degree to which a target covariate is independent of any other covariates in a model, and general sample size estimation:
3. Guidance for minimum events per variable, based on a simulation study performed on a large, real data set:
Good luck with your work.
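For a two-group comparison, one widely used approximation is Schoenfeld's formula for the required number of events: d = (z_{1-alpha/2} + z_{power})^2 / (p1 * p2 * (ln HR)^2). The sketch below is a minimal illustration using only the Python standard library; the function names, the target hazard ratio and the assumed event probability are hypothetical, and it is not taken from the linked resources.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Events needed by Schoenfeld's approximation (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = allocation, 1 - allocation
    d = (z_alpha + z_beta) ** 2 / (p1 * p2 * math.log(hazard_ratio) ** 2)
    return math.ceil(d)

def required_sample_size(hazard_ratio, event_probability, **kwargs):
    """Convert required events into subjects, given the overall
    probability that a subject has the event during follow-up."""
    return math.ceil(required_events(hazard_ratio, **kwargs) / event_probability)

# Hypothetical design: detect HR = 0.5 with 80% power, 40% of subjects expected to have the event.
print(required_events(0.5))
print(required_sample_size(0.5, event_probability=0.4))
```

Note that power in a Cox model is driven by the number of events, not the number of subjects, which is why the event probability has to be assumed separately.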
  • asked a question related to Cox Regression
Question
3 answers
Hi,
Suppose there is a binary/nominal dependent variable and one wants to do a logistic regression analysis. For a given categorical predictor, what is the minimum number of observations required per category to conduct the analysis? Should I include independent variables that have low frequencies in some categories of the response?
For example, suppose the dependent variable is treatment received on time (1=yes, 0=no) and a categorical predictor, education status, has 3 categories (illiterate=0, primary=1, >secondary=2). If there were only 12 illiterate women who were treated on time, but the remaining categories across the DV had a decent sample (>=30), can we still use education as a predictor for this analysis? Please share any references for rules/guidelines.
Relevant answer
Answer
Hello Marian,
Here's the glib rule: More is better as regards numbers of cases observed for each level of a categorical variable. Your resultant parameter estimates will be better as a result. There are various guidelines offered, but these are generally guidelines and not commandments. Here are some general observations:
1. You want at least some cases in each level of a categorical variable. Otherwise, you have insufficient information to make inferences about one or more of the categories/levels. If some levels are genuinely rare in the population, and your sample reflects this, then you may find alternative models (e.g., Poisson) to be preferable.
2. Whether it makes sense to use an unordered, k-level categorical variable (when k > 2) in your analysis without re-expressing as (k - 1) dummy variates hinges on the type of analysis you plan to conduct. In a chi-square test, it could work; in a logistic or ordinal regression model, it will not.
3. It can make sense to over-sample rare categories in order to help with precision of parameter estimates.
4. In the context of chi-square contingency tables, the usual advice is to have sufficient frequencies such that expected cell frequencies are at least 5.
5. One option would be to run bootstrap estimates of standard errors for each estimated coefficient, and if that SE exceeded some threshold, then judge that variable's contribution as indeterminate (e.g., more data required).
Good luck with your work.
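The expected-frequency check in point 4 can be sketched as follows. This is a minimal illustration; the counts are hypothetical and only loosely modelled on the education example in the question.

```python
def expected_counts(table):
    """Expected cell counts under independence for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical counts: rows = education level, columns = treated on time (no, yes).
observed = [
    [8, 12],    # illiterate
    [40, 60],   # primary
    [35, 95],   # secondary or higher
]
expected = expected_counts(observed)
smallest = min(min(row) for row in expected)
print(expected)
print("all expected counts >= 5:", smallest >= 5)
```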
  • asked a question related to Cox Regression
Question
4 answers
Dear Fellow Researchers,
We have analyzed the effect of various parameters on bladder cancer recurrence. Among included parameters, we had various continuous parameters.
In the literature, continuous parameters are often split into two groups based on the cut-off values. What is the advantage of doing this instead of keeping these parameters continuous?
For now, we determined cut-off values and repeated cox regression analysis for three parameters. The results in univariate and multivariate analyses are similar to when these parameters were continuous. However, we still have around 20 parameters only analyzed as continuous. Should we stay with continuous parameters or change to categorical?
I am looking forward to your suggestions,
Best,
Malgorzata
Relevant answer
Answer
Binning them into 2 categories is helpful if you want to compare values above and below a set threshold; otherwise you can simply include continuous variables in a Cox regression model. The hazard ratio you get will then correspond to a one-unit change in your continuous variable.
The only caveat is that you are assuming the log hazard changes linearly with the variable, which may not be the case. For example, if age is your continuous variable and an increase from age 20 to 21 has a very different effect from an increase from 47 to 48, then the relationship is nonlinear and would not be modelled well by a single continuous term.
Alternatively, you can categorise it into multiple bins and use the binned variable, which gives a hazard ratio for each bin compared with a reference bin, e.g. the HR for age 40-50 compared with age 20-30 if 20-30 is your reference bin.
  • asked a question related to Cox Regression
Question
2 answers
Greetings Fellow Researchers,
I am a newbie in survival analysis (Cox regression). In my dataset, 10-40% of cases have missing values (depending on the variables I include in my analysis). Based on this, I have two questions:
1- Are there any recommendations on the acceptable percentage of cases dropped from the analysis because of missing values?
2- Should I impute the missing values for all the cases that were dropped (let's say a maximum of 40%)?
Thank you so much for your time and kind consideration.
Best,
Sarang
Relevant answer
Answer
I have no first-hand experience to offer you either. But this systematic review article may give you some ideas.
  • asked a question related to Cox Regression
Question
6 answers
Dear RG members.
I conducted a Cox regression analysis and unfortunately the HRs of some of the variables turned out extremely large, like 1.17e+09 and 1.31e+10, and extremely low for some others (1.87e-21).
FYI: 24 variables were included in the regression. Before running the regression, I checked for interactions, and around three variables were excluded because of this. Why am I encountering this problem, and is there any solution?
Thank you in advance.
Relevant answer
Answer
If you have one predictor that is near perfect and it makes scientific sense, why do you need anymore?
  • asked a question related to Cox Regression
Question
1 answer
Hi, I am currently conducting a survival study to investigate the role of several potential biomarkers as prognostic factors in certain cancer. First, I perform Kaplan-Meier analysis for all the biomarkers and other relevant clinicopathologic data. However, only one biomarker fulfilled the proportional hazard criteria from the Kaplan-Meier curve. Other biomarkers and clinicopathologic variables do not fulfill the criteria.
I am wondering, do I still need to proceed to Cox Regression analysis? Can I include the other biomarkers and relevant clinicopathologic data in Cox Regression, even though they do not fulfill proportional hazard criteria during Kaplan-Meier analysis? Thank you.
Relevant answer
Answer
Your question does not make sense. Run the model that you wish to run, then look at the Schoenfeld residuals for lack of pattern. See the attached screenshot reference for full details. Best wishes, David Booth
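To make the Schoenfeld-residual check concrete: for each event, the residual is the covariate value of the subject who failed minus the risk-weighted average of that covariate among everyone still at risk. Under proportional hazards these residuals should show no trend over time. The sketch below is a minimal pure-Python illustration for one covariate with no tied event times; the data and the coefficient are hypothetical, and statistical packages offer ready-made versions (for example cox.zph in R's survival package).

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals for one covariate: at each event, the observed
    covariate value minus its risk-weighted mean over subjects still at risk."""
    residuals = []
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # residuals are defined only at event times
        weights = [math.exp(beta * xj) if tj >= t_i else 0.0
                   for tj, xj in zip(times, x)]
        mean_x = sum(w * xj for w, xj in zip(weights, x)) / sum(weights)
        residuals.append((t_i, x[i] - mean_x))
    return sorted(residuals)

# Hypothetical data; beta is assumed to come from a fitted model.
times = [1, 3, 4, 6, 8, 9]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]
for t, r in schoenfeld_residuals(times, events, x, beta=0.5):
    print(t, round(r, 3))
```

Plotting the residuals against time with a smoother, and looking for a systematic upward or downward drift, is the usual way to read them.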
  • asked a question related to Cox Regression
Question
4 answers
I need to perform a Cox regression hazard ratio analysis, but I currently only have GraphPad v8.0.
Can anyone kindly help me sort out this issue?
#coxregressionanalysis #statistics
Relevant answer
Answer
Hi,
You can download the trial version of IBM SPSS and do some tests.
IBM SPSS Statistics Trial. But you need to learn to use it.
It would be better to consult the Research and Stats department in your institute.
  • asked a question related to Cox Regression
Question
2 answers
Dear Colleagues
I am struggling to understand stratified Cox regression.
According to Google: the “stratified Cox model” is a modification of the Cox proportional hazards (PH) model that allows for control by “stratification” of a predictor that does not satisfy the PH assumption.
However, I don't understand what stratification does.
My understanding is that the variable we stratify on is not included in the Cox regression model. Is this correct?
So what happens then? Is one Cox regression run when the stratification variable is negative and another when it is positive, with the hazard ratios then averaged across the models?
If this is wrong, what happens instead?
What if the variable we stratify on is a really important factor for predicting survival?
How should it be interpreted in that case?
Relevant answer
Answer
Take a look at some of the materials in the attachment; they should be helpful. Best wishes, David Booth
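As a rough illustration of what stratification does: a stratified Cox model is a single model, not several models averaged. Risk sets are formed separately within each stratum, so each stratum has its own unspecified baseline hazard, while one shared set of coefficients maximises the sum of the per-stratum partial log-likelihoods. The stratifying variable therefore gets no hazard ratio of its own. A minimal pure-Python sketch for one covariate, with hypothetical data and no tied event times:

```python
import math
from collections import defaultdict

def partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate (assumes no tied event times)."""
    ll = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i:
            risk_sum = sum(math.exp(beta * xj)
                           for tj, xj in zip(times, x) if tj >= t_i)
            ll += beta * x_i - math.log(risk_sum)
    return ll

def stratified_partial_loglik(times, events, x, strata, beta):
    """Sum of per-stratum partial log-likelihoods with ONE shared beta."""
    groups = defaultdict(lambda: ([], [], []))
    for t, e, xi, s in zip(times, events, x, strata):
        groups[s][0].append(t)
        groups[s][1].append(e)
        groups[s][2].append(xi)
    return sum(partial_loglik(t, e, xi, beta) for t, e, xi in groups.values())

# Hypothetical data: two strata, "A" and "B".
times = [2, 3, 5, 7, 1, 4, 6, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1, 1, 0, 0]
strata = ["A"] * 4 + ["B"] * 4
print(stratified_partial_loglik(times, events, x, strata, beta=0.3))
```

Fitting the model means finding the beta that maximises this function; the price of stratifying is that the effect of the stratification variable itself is not estimated.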
  • asked a question related to Cox Regression
Question
4 answers
Hi,
I developed a probability-of-default model using the Cox PH approach with the survival package in R. I have panel data with time-varying covariates. I made predictions with the following code: predict(fit, newdata, type='survival'). However, the predicted survival probability is not decreasing over time for each individual (see picture).
I wonder whether the prediction is a marginal survival probability or a cumulative survival probability.
If it is a cumulative survival probability, why is it not decreasing over time?
Thanks
Relevant answer
Answer
David Eugene Booth, I did not know the answer when I asked the question. I then did some research and found the answer. I wrote the correct answer in the comment section so that it will help anyone else who is interested in this issue.
  • asked a question related to Cox Regression
Question
3 answers
Which of the two models is better for analyzing factors that influence the occurrence of a certain event when the data are not censored: Cox regression or logistic regression? Let's add that time to the event is not really relevant.
Relevant answer
Answer
Cox Regression.
  • asked a question related to Cox Regression
Question
3 answers
Hi, I am currently performing survival analysis using SPSS. I wanted to determine which of the following factors (expression of several proteins from immunohistochemistry and several clinicopathologic parameters) influence the survival of cancer patients. However, during univariate analysis, most of the factors did not fulfill the proportional hazard assumption (the Kaplan-Meier curves have multiple overlaps). Questions:
1) Can I still include the factors (especially the protein expressions) in Cox Regression analysis? It is a fairly small study with about 30 patients.
2) If not allowed, are there any potential solutions? Some of the curves overlap at multiple time points. (I have attached an example of the curve)
3) During Cox Regression, which method is preferred? Can I use 'Enter' or 'Backward LR'? If I use Backward LR, do I still need to perform univariate analysis before and filter the variable to be included into the regression? How about the 'Enter' method?
4) When I tried including all factors into Cox Regression, there are a couple of variables where the hazard ratio is < 1 during univariate analysis and non-significant, but becomes > 1 and significant during Cox regression. What are the possible explanations?
Thank you for your help.
Relevant answer
Answer
You might try this approach; it worked for me. Best wishes, David Booth
  • asked a question related to Cox Regression
Question
15 answers
Automated statistical inference in medical research meets the criterion of artificial intelligence. John McCarthy, widely recognized as the father of Artificial Intelligence, defined the term AI as “the science and engineering of making intelligent machines”. These would be computer tools, modeled on the functioning of the human mind, as well as technologies and research in the field of fuzzy logic, evolutionary computing, neural networks or artificial life, robotics.
In clinical medicine the concept of AI is widely used, e.g. computer-assisted diagnosis, computer consultation systems (Stanford University, a leading center of AI in medicine), pattern recognition, machine learning, electronic health records (EHR), clinical guidelines in computerized form and diagnosis of individual diseases, and the list could go on.
We have carried out research focused on the statistical analysis of clinical data and on software solutions that automate statistical work.
In research projects defined as epidemiological and statistical, two people (or groups) are involved: an epidemiologist, who specifies the subject and scope of the analysis and the research plan, and a statistician, who translates the research assumptions into the language of statistics, including selecting the statistical analyses, finding libraries, executing the analyses correctly and interpreting the results. In the AI approach, an "intelligent computer program" works similarly to a human statistician. The epidemiologist plans the analyses and saves the assumptions in metadata files for the program. A specialized computer program then manages the statistical package at every step. First, it creates the command scripts from the metadata file, as a programmer would using a screen interface and their own knowledge, skills and experience. Second, it uses a statistical package to execute the prepared scripts. Third, it transforms the output of the statistical package into plain text for the epidemiologist. The statistical package is hidden from the user.
The project currently covers GLM, GLMM, one- and two-level linear and logistic regression models, Cox regression and KM, as well as preliminary data analysis.
If you would like to read about the computer implementation, I describe it in the discussion "AI Statistical Analysis of Clinical Data for Non-Statisticians - part 2: computer implementation".
Relevant answer
Answer
Respected Aref Wazwaz Sir,
These are all most helpful and fruitful for Hanna Mielniczuk Ma'am and for others too.
Sir, I salute you!
  • asked a question related to Cox Regression
Question
3 answers
I seem to be getting really exponentially high HR on multivariate analysis by cox-regression. I only have 70 or so patients and 5 poor outcomes. Any ideas why my results may have turned out this way?
Relevant answer
Answer
Interested
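One quick diagnostic for a situation like this is the number of events per estimated coefficient. With very few events, a covariate can separate events from non-events almost perfectly, and the estimated hazard ratios can then run off towards zero or infinity. The sketch below is only an illustration; the question does not say how many covariates were used, so that number is hypothetical, and the threshold of 10 is a commonly quoted rule of thumb rather than a strict limit.

```python
def events_per_variable(n_events, n_parameters):
    """Number of outcome events available per estimated coefficient."""
    return n_events / n_parameters

# Figures from the question: about 70 patients and 5 poor outcomes.
# The number of covariates is not stated, so 4 is a hypothetical value.
epv = events_per_variable(n_events=5, n_parameters=4)
print(f"events per variable: {epv:.2f}")
if epv < 10:  # rule of thumb, not a strict threshold
    print("Too few events for this many covariates; expect unstable hazard ratios.")
```

With so few events, options that are often considered include fitting far fewer covariates or using a penalized approach such as Firth's penalized likelihood.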
  • asked a question related to Cox Regression
Question
4 answers
Dear biostats community,
I am trying to build a Cox (Proportional Hazards) Regression and have a dataset with several variables. I am trying to decide which variables are the most useful to use as covariates in my model. Which method do you use and recommend for doing this? I thought at first to do a univariate analysis and see which variables don't have a significant survival difference to exclude them but as I understand such procedures have the issue that they don't take into account the interaction between the different variables.
Thank you very much!!
Gabriel
Relevant answer
Answer
Hi,
One approach is to select covariates based on associations already reported in the literature. Another way is stepwise selection.
This is a reference link for different methods that can be used:
and
Garcia, R. I., Ibrahim, J. G., & Zhu, H. (2010). Variable selection in the cox regression model with covariates missing at random. Biometrics, 66(1), 97–104. https://doi.org/10.1111/j.1541-0420.2009.01274.x
For a theoretical discussion:
Handbook of Survival Analysis by John P. Klein, CRC Press Taylor & Francis Group
  • asked a question related to Cox Regression
Question
2 answers
Dear Colleagues,
In the Stata manual for lasso, they describe a new way to fit inferential models using lasso for linear and logistic regression.
It helps in calculating standard errors, and thereby odds ratios and confidence intervals.
The equation is as follows:
where d is the covariate of interest.
The double-selection solution: double selection is the easiest of the three to explain. Its algorithm is the following:
1. Run a lasso of d on x.
2. Run a lasso of y on x.
3. Let xe be the union of the selected covariates from steps 1 and 2.
4. Regress y on d and xe.
I wonder whether the same approach can be applied to Cox regression, as Cox regression is a special type of linear regression?
I would like to hear your opinion. Does anyone have a reference for this?
Or is it just a theoretical idea?
Relevant answer
Answer
There's another approach you can use. See the attached paper; it works pretty well. Best wishes, David Booth
  • asked a question related to Cox Regression
Question
2 answers
Hi,
I have been performing survival analysis using Cox regression models, and I encountered a situation where, after adding a time-varying effect for a variable X (X*time; the variable violated the PH assumption), the added interaction with time was significant in the model, but the main effect of variable X was not, as illustrated below:
Model without interaction with time:
                 coef      exp(coef)   se(coef)   z        Pr(>|z|)
factor(X)1       0.4633    1.5894      0.1625     2.852    0.004 **
Model with interaction between X and time:
                 coef      exp(coef)   se(coef)   z        Pr(>|z|)
factor(X)1       -0.3978   0.6718      0.4444     -0.895   0.371
tt(factor(X))    0.6230    1.8645      0.2816     2.212    0.027 *
In the study we are interested in the effect of the X variable on the survival outcome, and after inclusion of the time-varying effect X*time, I am no longer sure about the value of the variable X in describing the risk of the outcome, as the main effect is now not significant.
Is the significance of time-varying effect of variable X enough to assume that the variable X is significant for the outcome risk, even though the main effect is no longer significant in such scenario?
Or, do both of them, the main effect of X and the time-varying effect of X have to be significant in the model to be able to say that X is significant for the outcome?
Any help in interpreting these is very welcome.
Relevant answer
Answer
Thanks David,
I ran a Kaplan-Meier analysis: the two lines cross early in the study, and only after that do we see quite good separation. If I understand correctly, perhaps it isn't surprising that the variable has a time-varying effect (increasing with time).
I also inspected the proportional hazard (PH) assumption for this variable using the cox.zph function in survival R package - the red horizontal line represents the averaged coefficient (beta) as in the Cox model (model that operates under assumption of PH), while the thin, black line is the "real" beta for the variable. The test for proportional hazard for variable X did not indicate significant departure from PH (p = 0.079, PH is violated when p < 0.05), but the added time-varying effect is significant, which is reflected by the Kaplan-Meier and the plotted coefficient over time.
Regarding the interpretation of the significance of the terms in the model, I received some advice from a biostatistician: in a model where a time-varying effect is included as an interaction term (as opposed to splitting the time and calculating the HR for time intervals), the main effect represents the HR where the interaction term is equal to 0 (if it's a simple X*time interaction, the interaction is equal to 0 when time=0; it might differ in situations where there is a time transformation, for example log(time), etc.). Bottom line: even if the main effect isn't significant in the model, the (time-varying) effect of the variable is still interesting.
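The point about the main effect can be made concrete by writing the hazard ratio as a function of time: HR(t) = exp(b_main + b_time * g(t)). The sketch below plugs in the two coefficients reported in the question. The thread does not show which time transform was passed to tt(), so g(t) = t is purely an assumption here, and the time unit is whatever the original data used.

```python
import math

# Coefficients from the model output in the question.
BETA_MAIN = -0.3978   # factor(X)1
BETA_TIME = 0.6230    # tt(factor(X))

def hazard_ratio(t, g=lambda t: t):
    """HR for X at time t when the model contains X and X * g(t)."""
    return math.exp(BETA_MAIN + BETA_TIME * g(t))

# Assumption: g(t) = t; the transform actually used is not shown in the thread.
for t in (0, 0.5, 1, 2, 3):
    print(t, round(hazard_ratio(t), 3))

# Time at which the HR crosses 1 under the same assumption.
crossing_time = -BETA_MAIN / BETA_TIME
print("HR = 1 at t =", round(crossing_time, 3))
```

Under this assumption the HR is below 1 early on and above 1 later, which is at least consistent with Kaplan-Meier curves that cross early and then separate. The two coefficients describe one effect jointly, so judging X by the main-effect p-value alone is not very informative.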
  • asked a question related to Cox Regression
Question
2 answers
I have fitted Cox regression models for patients' overall survival, both univariate and multivariate. The variable I am most interested in is significant only in the univariate model, not in the multivariate one. 1) How can I explain this result in my article? This variable is also significant in the log-rank test.
2) Are there any other methods I could try to make it more valuable? I am using R.
Thanks a lot.
Relevant answer
Answer
Just as for any other regression model: that some of the co-variables provide similar information about the response as your variable of interest. After considering the information from the covariables, the sample size is not large enough to conclude the direction of its (co-variable adjusted) effect.
  • asked a question related to Cox Regression
Question
4 answers
Dear Colleagues
I hope all is well.
I am interested in doing penalized Cox regression using Python.
I came across the following way to do penalized cox regression:
Penalized Cox Models — scikit-survival 0.17.1
Code blocks 10 to 13 do the following:
They choose the penalty strength alpha by finding the best value among all candidate values of alpha, reducing the value by 1% in each iteration.
Then they perform 5-fold cross-validation to estimate the performance, in terms of concordance index, for each α.
I am confused and don't fully understand code block 13. I found that after code block 13, it fits the model with the best alpha and calculates the coefficients. Is this right?
In code block 13, how was the model executed? Is it by doing a grid search?
What is the name or type of this model? Is it elastic net regression?
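A note on the candidate alphas, offered as a reading of the scikit-survival documentation rather than a definitive account: the parameter alpha_min_ratio=0.01 appears to set the smallest penalty to 1% of the largest one, rather than shrinking alpha by 1% per step, and the candidates in between are spaced on a log scale, as in glmnet-style implementations. Each candidate is then scored by cross-validated concordance index through a grid search, and the model is refit with the best alpha. When the l1_ratio is strictly between 0 and 1, the penalty is an elastic net. The sketch below only illustrates the log-spaced grid in plain Python; alpha_path is a hypothetical helper and the starting value is arbitrary.

```python
def alpha_path(alpha_max, alpha_min_ratio=0.01, n_alphas=100):
    """Log-spaced penalty values from alpha_max down to alpha_min_ratio * alpha_max."""
    step = alpha_min_ratio ** (1.0 / (n_alphas - 1))
    return [alpha_max * step ** i for i in range(n_alphas)]

alphas = alpha_path(alpha_max=0.5)   # 0.5 is an arbitrary illustrative value
print(alphas[0], alphas[-1])         # largest and smallest penalty
print(alphas[1] / alphas[0])         # constant ratio between neighbours
```

In the real library the largest alpha is derived from the data, so the actual grid will differ from this toy one.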
  • asked a question related to Cox Regression
Question
11 answers
I found in statistical books that to verify the linear assumption of a Cox model I need to plot Martingale residuals.
However, I cannot find any explanation about interpretation of the plot!
So, if I plot predicted values versus Martingale residuals what have I to expect if linearity is satisfied?
Thank you in advance... please help me!
Relevant answer
Answer
This has been useful to me too. Thanks Emilio
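For readers landing here without the linked material: a martingale residual is the observed event indicator minus the expected number of events under the model, so it is at most 1 and can be very negative. A common way to check functional form is to compute the residuals from a model that leaves out the covariate of interest and plot them against that covariate with a smoother; a roughly straight trend suggests the linear form is adequate, while clear curvature suggests a transformation. For residuals from the full model plotted against predicted values, a smooth that stays flat around zero is the reassuring pattern. The sketch below computes residuals from a model with no covariates, using hypothetical data and plain Python.

```python
def nelson_aalen(times, events):
    """Return a function H(t): the Nelson-Aalen cumulative hazard estimate."""
    steps = []
    cum = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        n = sum(1 for ti in times if ti >= t)
        cum += d / n
        steps.append((t, cum))

    def H(t):
        value = 0.0
        for time, c in steps:
            if time <= t:
                value = c
        return value

    return H

def null_martingale_residuals(times, events):
    """Martingale residuals from a model with no covariates: event - H(t)."""
    H = nelson_aalen(times, events)
    return [e - H(t) for t, e in zip(times, events)]

# Hypothetical data.
times = [2, 3, 3, 5, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0]
residuals = null_martingale_residuals(times, events)
print([round(r, 3) for r in residuals])
```

These residuals sum to zero by construction, which is a handy sanity check before plotting them against a candidate covariate.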
  • asked a question related to Cox Regression
Question
6 answers
Greetings,
We have been conducting a retrospective cohort study. The variables we are examining are assumed to be very age-dependent and the exposed population is very small (~40 patients), therefore we have considered matching for age and sex at a 1:2 or 1:3 ratio to increase statistical power and limit confounding.
Which statistical test would be most appropriate for calculating risk ratios for dichotomous categorical variables?
This article ( https://academic.oup.com/epirev/article/25/1/43/718675 ) suggests conditional Poisson regression, which I have attempted in Stata, but it appears to work only for 1:1 matched pairs.
It also suggests an adjustment of Cox regression so as to yield the same results as conditional Poisson regression (" if the time to death or censoring is set to some arbitrary constant value and if the Breslow or Efron methods are used to account for tied survival times, the results will be the same as those from conditional Poisson regression, as the likelihoods for these methods are identical when the data come only from matched pairs ").
I have recently attempted a similar adjustment (as described here: https://www.ibm.com/support/pages/conditional-logistic-regression-using-coxreg ) to yield the same results as conditional logistic regression (odds ratio) for a 1:N matched case-control study using Cox regression.
If such an adjustment is possible, how exactly could it be implemented in SPSS? If not, what other alternatives are available to us in this juncture?
Thank you in advance.
Relevant answer
Answer
Maybe not useful for the original question anymore, but still good to know: a very detailed exposition on how to perform a matched cohort analysis is present in Kleinbaum's Logistic Regression 3rd edition. It gives the specifics on how to do it in SPSS using the Cox regression module in the Appendix. I used it some years ago, and it works (obviously). Hope this helps.
  • asked a question related to Cox Regression
Question
147 answers
Example 1: I want to test whether diabetes is a predictor of myocardial infarction. The result is this:
Covariate   b         SE       Wald      P         Exp(b)   95% CI of Exp(b)
Diabetes    1,1624    0,3164   13,4996   0,0002    3,1976   1,7254 to 5,9257
How can I interpret this result? The p-value is less than 0,05, but I don't understand whether it is in favor of patients with diabetes or without diabetes.
Example 2: And with continuous variables, for example:
Covariate   b         SE       Wald      P         Exp(b)   95% CI of Exp(b)
RVD         -1,0549   0,1800   34,3351   <0,0001   0,3482   0,2451 to 0,4947
How can I interpret these results?
Could someone help me, please?
Relevant answer
Answer
Thanks for gracious participation respected Dr deepay Bause
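As a short worked reading of the two outputs in the question: Exp(b) is the hazard ratio. Assuming diabetes is coded 1 = present, Exp(b) of about 3.2 means the hazard of myocardial infarction is roughly 3.2 times as high in patients with diabetes as in those without, and the confidence interval excludes 1. For the continuous variable RVD, Exp(b) of about 0.35 means each one-unit increase is associated with a hazard about 65% lower. The sketch below recomputes these quantities from b and SE with a normal approximation; the recomputed interval can differ slightly from the software output in the later decimals.

```python
import math

def summarize(b, se, z=1.96):
    """Hazard ratio, approximate 95% CI and Wald statistic from a Cox coefficient."""
    return {
        "hr": math.exp(b),
        "ci_low": math.exp(b - z * se),
        "ci_high": math.exp(b + z * se),
        "wald": (b / se) ** 2,
    }

diabetes = summarize(1.1624, 0.3164)   # values from Example 1
rvd = summarize(-1.0549, 0.1800)       # values from Example 2
print(diabetes)
print(rvd)

# For a continuous covariate, the HR for a k-unit change is exp(k * b).
print(math.exp(2 * -1.0549))           # HR for a 2-unit increase in RVD
```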
  • asked a question related to Cox Regression
Question
3 answers
Dear All,
I hope all is well.
I am trying to do a penalized cox regression analysis in Python.
However, in order for the function to work, the "event" and "time"
need to be stored as a structured array
similar to:
array([( True, 72.), ( True, 411.), ( True, 228.),.........,dtype=[('event', '?'), ('time', '<f8')])
I am struggling to do a structured array of the event and time.
I wonder if you guys can help me with that?
Thank you very much
Relevant answer
Answer
Python handles arbitrarily complex data structures (matrices, vectors, arrays of any dimension) efficiently.
In Python you can have lists of dictionaries, dictionaries of lists, dictionaries of dictionaries, lists of lists, and so on.
I am not sure what you're trying to do, but to track time (duration, actually), you can use the time module (import time).
Here is an example of a list of lists:
L = [[1, 2, 3], ['a', 'b', 'c'], [5, 6, -10], {"Monday": 2}]
As you can see, a list is simply an ordered collection whose elements can be anything. The last element of this list is a dictionary with a single entry. Dictionaries are useful because they are the data structure used by JSON (JavaScript Object Notation) files.
Here's a longer dictionary:
d = {}
d['Monday'] = 1
d['Tuesday'] = 2
# d is now {'Monday': 1, 'Tuesday': 2}
A dictionary is nothing but a function (better yet, a relation in mathematics, like a Venn diagram).
Hope that helps on your first steps.
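For the structured array that the question actually asks about, a minimal sketch with NumPy is shown below. The field names and dtypes mirror the array quoted in the question; the event and time values are hypothetical.

```python
import numpy as np

# Hypothetical raw data: event indicators (1 = event, 0 = censored) and follow-up times.
event = [1, 1, 1, 0, 1]
time = [72.0, 411.0, 228.0, 126.0, 118.0]

# Allocate a structured array with a boolean "event" field and a float "time" field,
# then fill each field from the plain lists.
y = np.empty(len(event), dtype=[("event", "?"), ("time", "<f8")])
y["event"] = np.asarray(event, dtype=bool)
y["time"] = time

print(y[:3])
print(y["event"].sum(), y["time"].mean())
```

If the data already sit in a pandas DataFrame, scikit-survival's sksurv.util.Surv helper (from_arrays and from_dataframe) should build the same kind of array directly.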
  • asked a question related to Cox Regression
Question
4 answers
Hello!
Has anyone tried plotting survival probabilities for a cox regression using Firth's penalized likelihood? I am using the coxphf() function in R. Any advice would be appreciated!
Relevant answer
Answer
Hi David,
Thanks for this! Unfortunately, in the CRAN manual, they only provide an example of how to plot the penalized likelihood as compared to the wald.
I am interested in creating a graphic that illustrates survival probabilities over time (like you would for a regular cox ph regression). I would also love to figure out how to write code that predicts survival probabilities for new observations (i.e. at specified time points) based on the model. So, for example, I would like to pull the predicted survival for a specific covariate at 1 year, 2 years, etc. Does this make sense? Can you help? Any advice would be appreciated!
  • asked a question related to Cox Regression
Question
1 answer
Hello
I have data on 10,000 Covid patients, of whom 2,000 have high blood pressure and 8,000 do not.
My main goal is to study the risk factors for death in Covid-19 patients with high blood pressure. I also want to find the risk factors in patients who do not have high blood pressure and compare them with those in the group that has high blood pressure, as in the table below.
I want to fit two separate Cox models, one for each group. Do you think this comparison is correct? Do I need to make adjustments when comparing the final risk factors so that the comparison is valid?
(I can not match here.)
Relevant answer
Answer
A couple of questions :
First – I presume that you know the status of each person, that is whether or not they died. With relatively small death rates you might think of Poisson regression (with robust standard errors) or negative binomial regression (ditto).
Second, do you want to see if hypertension modifies the effect of other risk factors? If so, you don't want separate regressions. You want to use an interaction term.
Third : hypertension is a tricky variable because it takes a continuous variable like BP and divides it into two categories, while the underlying effect on risk is, of course, continuous. Have you got BP measurements? Why not start with looking at them before you prematurely lose all that variation by collapsing into categories.
Fourth : People with previously detected hypertension are a problematic group made up of treated and untreated hypertensives and of those treated there will be those with well and poorly controlled BP. Given the interest in the effects of statins and metformin on the course of Covid, the nature of the hypertensive medication is also of interest. As you well know, there are several types of drug here.
Finally, I hope you are not planning on a data-dredging exercise like the one you showed above. It looks horribly as if someone did a stepwise regression.
Please tell us a little more. It's an interesting area but one that I think needs a certain amount of thought to define what question(s) the analysis should answer. Once those are clear, the methodology should be easier to work out.
  • asked a question related to Cox Regression
Question
2 answers
The main independent variable is A, but I want to adjust the model with a covariate B. When I check the PH assumption, A holds but B does not. Do I need to run a Proportional odds or an AFT model for that?
Relevant answer
Answer
Here are some notes you may find useful:
The output was generated using Stata, in case you're wondering.
  • asked a question related to Cox Regression
Question
5 answers
I have run into a problem computing a multivariable risk factor analysis using Cox regression in SPSS. I have 5 variables that are significant in univariate Cox regression with a time-dependent variable. I do not know how to include all 5 time-dependent variables simultaneously in a multivariable Cox regression.
Huge thanks in advance for any solutions and advice.
Relevant answer
Answer
I would recommend the same as Prof. David. Check the latest SPSS manual on multivariate Cox regression: "Multivariate analysis, using the technique of Cox regression, is applied when there are multiple, potentially interacting covariates. While the log-rank test and Kaplan-Meier plots require categorical variables, Cox regression works with continuous variables." If this looks like what you wanted, then use the SPSS help menu; see https://www.ibm.com/support/pages/cox-regression-statistical-tutorial-spss
Perhaps, YouTube.
  • asked a question related to Cox Regression
Question
12 answers
Hi everyone, I would like to run a Cox regression progressively including potential confounding factors in the models (Model 0: no confounding factors; Model 1: 1 confounding factor; Model 2: 2 confounding factors; ...)
Since I never did it on my own, I am wondering if you could suggest a practical statistics software for this purpose.
PS I usually use Stata or GraphPad Prism.
Thank you for your collaboration and time.
Relevant answer
Answer
I have the answer: the approximation methods are different!
"When there are failure time ties (note that censor ties are not a problem), the exact likelihood is very cumbersome. NCSS allows you to select either the approximation proposed by Breslow (1974) or the approximation given by Efron (1977). Breslow’s approximation was used by the first Cox regression programs, but Efron’s approximation provides results that are usually closer to the results given by the exact algorithm and it is now the preferred approximation (see for example Hosmer and Lemeshow (1999))."
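To make the difference between the two approximations concrete, here is a minimal sketch in plain Python of the Cox partial log-likelihood for a single covariate under Breslow and Efron tie handling. The function name `partial_loglik` and the toy data are my own illustrations and are not taken from NCSS or any other package; real software also maximises this quantity over beta, which is not shown here.

```python
import math

def partial_loglik(times, events, x, beta, ties="efron"):
    """Cox partial log-likelihood for one covariate (events: 1 = failure, 0 = censored)."""
    loglik = 0.0
    # Loop over the distinct failure times only; censoring times add no terms.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        dead = [xi for ti, e, xi in zip(times, events, x) if ti == t and e == 1]
        risk = [xi for ti, xi in zip(times, x) if ti >= t]  # still at risk at t
        d = len(dead)
        risk_sum = sum(math.exp(beta * xi) for xi in risk)
        dead_sum = sum(math.exp(beta * xi) for xi in dead)
        loglik += beta * sum(dead)
        for k in range(d):
            if ties == "breslow":
                # Breslow: every tied failure uses the full risk set.
                loglik -= math.log(risk_sum)
            else:
                # Efron: tied failures are progressively removed from the risk set.
                loglik -= math.log(risk_sum - (k / d) * dead_sum)
    return loglik

# Toy data (hypothetical): two subjects fail at the same time t = 1.
times, events, x = [1, 1, 2, 3], [1, 1, 1, 0], [1.0, 0.0, 1.0, 0.0]
for method in ("breslow", "efron"):
    print(method, partial_loglik(times, events, x, beta=0.5, ties=method))
```

Without tied failure times the two methods give identical values, which is why programs with different defaults only disagree on data that contain ties.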
  • asked a question related to Cox Regression
Question
1 answer
I am getting the error "flat region resulting in a missing likelihood r(430)" while running Cox regression in Stata. I have tried different settings for handling ties, but I still get the same error. How can I fix this error in Stata? Any help will be useful. Thanks in advance!
Relevant answer
Answer
Try the various methods for handling ties (e.g., Efron, exactp), use other survival models (e.g., Weibull, exponential), or fit a Poisson regression.
  • asked a question related to Cox Regression
Question
3 answers
Hello advisors. I am glad to be here in this Community. If you allow me, I would ask if anyone of the members here know about Databases of cutting tools. Nowadays, I am working on a project and the main goal is to research about lifetime of cutting tools using the Cox proportional hazards model. At the moment, I just have one Database that contains cutting speed, feed rate, tool failure time and depth of cut. My research question is: Which of these variables are the most representative that give us the time of failure? Any kind of help, comments and questions are welcome.
Relevant answer
Answer
The attached paper solves a problem similar to yours. I would suggest that you get a copy of Jared Lander, R for Everyone, available from the z-library. If you need programs, or have questions, please contact me. Best wishes, David Booth
  • asked a question related to Cox Regression
Question
3 answers
Dear Colleagues,
I hope all is well. I am sorry for the very naive question, as I am very new to R.
I have imported a dataset from Stata into R and installed the survival packages as follows:
library(haven)
data <- read_dta("C:/Users/user/Desktop/data.dta")
View(data)
install.packages(c("survival", "survminer"))
library (caret)
library (glmnet)
library (mlbench)
library (psych)
library("survival")
library("survminer")
I changed some of the variables in my dataset so that R recognises them as factor variables (some have 2 levels, some have more than 2 levels). The factor variables are: ethnicity, sex, diabetes, HLA mismatches, steroids, donor type..
Examples:
data$DIAB=as.factor(data$DIAB)
data$sex=as.factor(data$sex)
data$ETHCAT=as.factor(data$ETHCAT)
I managed to define my time and event variables as follows:
time=data$finaltime
event=data$GSTATUS_DTHCNS_KI
I combined the time and event in one matrix, and also combined all the independent variables in another matrix, as follows:
Y=cbind ( time, event)
X=cbind (data$sex, data$BMI_TCR, data$COLD_ISCH_KI, data$SERUM_CREAT, data$finalpra, data$AGE, data$STEROIDS_MAINT, data$induction, data$DIAB, data$dgf, data$timeondialysis, data$ETHCAT, data$KDPI, data$finalcmv)
I ran a cox regression model using the following syntax:
coxph<-coxph(Surv(time,event) ~X, method="breslow")
However, I found that R treated all the variables as continuous numeric variables (even the ones I had identified as factor variables!). I checked these variables again and found that R does recognise them as factor variables!
I wonder if any of you can help me fix this problem.
The second question:
I tried to fit a glmnet model; however, I got an error:
> Fit=glmnet (X,Y, family=”cox”)
Error: unexpected input in "Fit=glmnet (X,Y, family=”"
I am not sure why it is giving me an error. I wonder if anyone can help me with this.
Thank you very much
Kind regards
Relevant answer
Answer
Dear Prof Booth and everyone,
I hope all is well
I have improved my syntax a lot; however, I am still getting an error when trying to fit a glmnet model.
I have downloaded R 4.1.
The following is what happened:
setwd("C:/Users/user/Desktop/R")
library(haven)
install.packages("caret")
install.packages("ggplot2")
library (caret)
install.packages("glmnet", repos = "https://cran.us.r-project.org")
library (glmnet)
install.packages("mlbench")
library (mlbench)
install.packages(c("psych"))
library (psych)
library (survival)
install.packages("survminer")
library (survminer)
install.packages("BiocManager")
library (Biobase)
data <- read_dta("data for Farzan.dta")
str(data)
data$DIAB=as.factor(data$DIAB)
data$sex=as.factor(data$sex)
data$ETHCAT=as.factor(data$ETHCAT)
data$AMIS=as.factor(data$AMIS)
data$BMIS=as.factor(data$BMIS)
data$DRMIS=as.factor(data$DRMIS)
data$STEROIDS_MAINT=as.factor(data$STEROIDS_MAINT)
data$donortype=as.factor(data$donortype)
data$dgf=as.factor(data$dgf)
data$induction=as.factor(data$induction)
data$finalcmv=as.factor(data$finalcmv)
data$finalpra=as.factor(data$finalpra)
summary(data)
time= (data$finaltime)
event=data$GSTATUS_DTHCNS_KI
Y=cbind ( time, event)
X=cbind (sex = data$sex, BMI_TCR=data$BMI_TCR, COLD_ISCH_KI=data$COLD_ISCH_KI, SERUM_CREAT=data$SERUM_CREAT,
finalpra = data$finalpra, AGE = data$AGE, STEROIDS_MAINT = data$STEROIDS_MAINT, induction = data$induction,
DIAB = data$DIAB, dgf = data$dgf, timeondialysis=data$timeondialysis, ETHCAT=data$ETHCAT,
KDPI =data$KDPI, finalcmv =data$finalcmv)
fit <- glmnet(y = Surv(time,event), x = t(exprs(X)), family = "cox")
The error I got is:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': unable to find an inherited method for function ‘exprs’ for signature ‘"matrix"’
I wonder if you can help me with this and how to overcome this error
Thank you very much
Looking forward to hearing back from you
  • asked a question related to Cox Regression
Question
4 answers
I would like to identify the predictors of seizure recurrence after a first seizure. Should I use a Cox regression model yielding hazard ratios or should I use a log-binomial regression yielding risk ratios?
From what I understand, I shouldn't use logistic regression because it yields odds ratios, which often overestimate risk ratios.
Relevant answer
Answer
This comes down to whether the time-to-event component is clinically relevant or not. If it is, then you should do Cox regression, and otherwise logistic or as you suggest log-binomial.
For rare events, odds and risk ratios will be almost identical. For more common events, odds ratios will be higher, but it isn't the case that they are 'overestimating' the likelihood; they are just a different effect measure. There is nothing wrong with using logistic regression in this instance either.
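A quick arithmetic check of the rare versus common event point, as a minimal sketch in Python. The risks used here (1% vs 2%, and 30% vs 60%) are made-up numbers chosen only for illustration.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of the two event probabilities."""
    return p_exposed / p_unexposed

def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the two odds, where odds = p / (1 - p)."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed

# Hypothetical risks: the risk ratio is 2 in both scenarios.
for label, p1, p0 in [("rare", 0.02, 0.01), ("common", 0.60, 0.30)]:
    print(label, round(risk_ratio(p1, p0), 2), round(odds_ratio(p1, p0), 2))
```

With the rare event the odds ratio is about 2.02, practically the same as the risk ratio of 2; with the common event it is 3.5. Both describe the same data, on different scales.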
  • asked a question related to Cox Regression
Question
1 answer
Hi there. I'm a medical student and I'm now working on survival prediction models, but I have run into some difficult problems with feature selection. The original sequencing data was huge, so I relied on univariate Cox regression to obtain a subset (subset A), and I'd like to perform Lasso regression to further select features for the final survival prediction model (built using multivariate Cox regression). However, subset A was still larger than I expected. Can I obtain further subsets by limiting the range of the hazard ratio (HR) before Lasso regression? Or could I use a Random Survival Forest to obtain subsets from subset A for the final survival prediction model? Is there anything I need to pay special attention to during these processes?
Relevant answer
Answer
Apologies if I’ve misunderstood the question. I’m not sure what you mean by using a univariate Cox model to create a subset. Did you simply look for features that were significant on their own? If so, I’m not sure that’s appropriate…
In terms of Lasso regression, why do you need to subset the features beforehand? Why not just fit a lasso-regularised Cox model with all features?
  • asked a question related to Cox Regression
Question
3 answers
I'm using cox regression model to study the association of variable A with coronary heart disease. One of my covariates (variable B) has a clear non-linear relationship with predictor (Variable A), assessed by scatter plots; linear regression. My question is that does this cause any violation if the Schoenfeld residuals of the cox-model are fine? I'm using R.
Relevant answer
Answer
Therneau and Grambsch use a simple approach based on martingale residuals for assessing the functional form. Plot the martingale residuals from an intercept-only model (in R: resid(coxph(Surv(t,d)~1))) against the predictor values. Adding a LOESS curve is helpful in determining the functional form. Heavy censoring and collinearity can affect the usefulness of this approach. There is an example of how to do this in Therneau & Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer.
  • asked a question related to Cox Regression
Question
1 answer
Hello,
I am trying to show whether there is any relationship between different variables (weight, age, risk score) and freedom from re-intervention after a stent placement. Freedom from re-intervention is a time-dependent value, so the more I think about it, the more I think I should use Cox's regression model. But on the other hand, weight/age/risk score and freedom from re-intervention are both continuous values, so if I want to show a correlation between the two, should I use Pearson/Spearman? What do you think? Also, if I am reporting the Cox model, should I just report the hazard values and the p-values, or maybe more?
Thank you in Advance
Relevant answer
Answer
You might find the attached google search to be of some interest should you wish to use Cox regression. Best wishes, David Booth
  • asked a question related to Cox Regression
Question
8 answers
I'm using a cox proportional hazards regression to analyze latency data, i.e. time until behavior x occurs. The model I'm running is fitted with the "coxme" package in R (which is a wrapper to the "survival" package if I'm not mistaken) because I wanted to fit a mixed model for the survival analysis. The model converges without problems, however, when I'm trying to test the assumptions of the model I get a fatal error, and my R session crashes. Specifically, when I try to test the "proportional hazard" assumption of the model using the "cox.zph" call, the R session crashes and I don't know why, because the function is supposed to work with both a mixed model (coxme) and a standard, non-mixed, model (which is a "coxph" object in the package terminology). I've tried the non-mixed version of my model and it provides the desired output, but it won't work for my intended model. I've also tried updating RStudio, so I have the latest version, but it didn't help. Finally, I've tried to manually increase the working memory dedicated to RStudio, in case the function was memory demanding, but it didn't help. Looking around at different forums has provided no answers either, both with general search parameters like "causes for R session crash" and more specific, like "cox.zph cause R session crash", but I could not find any help.
Has anyone experienced this error? Were you able to solve it, and if so, how?
I appreciate any advice I can get on this issue.
Relevant answer
Answer
I am not familiar with that package, but you may wish to look at a similar process in our attached paper. Best wishes, David Booth
  • asked a question related to Cox Regression
Question
6 answers
Dear Colleagues,
I hope all is well.
Thank you very much for your help.
I am used to Stata and very new to R.
Stata has lasso inference for linear and logistic regression. However, it doesn't have LASSO features for Cox regression.
I wonder if I can use R to do LASSO inference for a Cox regression model?
I am very new to R and would appreciate it if you could help me with the R syntax for my model.
I am sorry that I am so naive in R.
If I am using STATA, I would do the following to produce the cox model:
1)stset PTIME, failure(PSTATUS)
2)stcox i.sex BMI_TCR COLD_ISCH_KI SERUM_CREAT END_CPRA i.ETHCAT AGE AMIS BMIS i.STEROIDS_MAINT AGE_DON i.DIAB i.dgf
3)estat phtest (to test proportional hazard assumptions)
I wonder what is the syntax to do the same in R ?
Also, what is the syntax to use this model to perform LASSO inference for Cox hazard regression?
Finally, how do I run the post-estimation tests after fitting the LASSO inference for Cox hazard regression?
Thank you very much for your help.
Looking forward to hearing back from you
  • asked a question related to Cox Regression
Question
8 answers
Dear research gate members, would you please let me know the specific assumptions and conditions to use these phrases?
Relevant answer
Answer
Kaplan–Meier provides a method for estimating the survival curve, the log rank test provides a statistical comparison of two groups, and Cox's proportional hazards model allows additional covariates to be included. Both of the latter two methods assume that the hazard ratio comparing two groups is constant over time. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065034/
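For readers who want to see what the Kaplan–Meier estimator actually computes, here is a minimal sketch in plain Python. The function name `kaplan_meier` and the five observations are hypothetical; in practice you would use your statistics package, which also provides confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    curve, surv = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1 - deaths / at_risk  # multiply by the conditional survival at t
        curve.append((t, surv))
    return curve

# Hypothetical data: censored subjects (event = 0) leave the risk set without a drop.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Note that the estimator itself makes no proportional hazards assumption; that assumption only enters with the log-rank test and the Cox model, as described above.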
  • asked a question related to Cox Regression
Question
3 answers
Hello,
I built a Cox proportional hazards model with time-dependent covariates. When I predict the survival probability, it is not monotonically decreasing for each ID. What is the problem and how can I handle it?
Thanks
Relevant answer
Answer
As professor David Eugene Booth said, use the plots and check the presumptions of the Cox model about your data.
  • asked a question related to Cox Regression
Question
3 answers
How can I add all categories of ONE grouping variable (as categorical variables) to a Cox regression analysis without losing a single group? SPSS treats the first group as the reference/control category, so out of 5 categories I only get hazard ratios for 4.
Any suggestions, please?
Relevant answer
Answer
If you want each group separately, I would create dummy variables for each.
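A minimal sketch of what dummy coding looks like, in plain Python. The helper `dummy_code` and the group labels are hypothetical. One point worth keeping in mind: with k groups, a Cox model can only estimate k - 1 hazard ratios, because each one is a comparison against the reference group, whose hazard ratio is 1 by definition.

```python
def dummy_code(values, reference):
    """One 0/1 indicator per non-reference level; the reference level is all zeros."""
    levels = sorted(set(values) - {reference})
    return [{f"group_{lvl}": int(v == lvl) for lvl in levels} for v in values]

# Hypothetical grouping variable with three levels, "A" chosen as the reference.
groups = ["A", "B", "C", "A"]
for original, coded in zip(groups, dummy_code(groups, reference="A")):
    print(original, coded)
```

Changing the `reference` argument changes which comparisons are reported, not the fit of the model.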
  • asked a question related to Cox Regression
Question
4 answers
Hi everyone!
I am wondering whether the prentice-weighted cox regression I need to do for a specific study design (case-cohort) can be done within the framework of multilevel modelling?
Are there any R packages for a mixed-effects Cox model with Prentice weighting?
Relevant answer
Answer
Hello,
I am not very clear about what you mean by multilevel modelling. If you are concerned about variables that may behave differently depending on factors such as location, sex or other conditions, you could use strata() with coxphw, as suggested by Chellai Fatih, since it provides "estimation of average hazard ratios (Schemper et al., 2009) using Prentice weights with censoring correction and robust variance estimation" with the default template "AHR".
But if you are referring to a study where the selection of sub-cohorts is stratified according to some criteria, I recommend you have a look at the cchs package:
The package also comes with a dataset, cchsData, where the subcohort selection is stratified based on another variable, in this case the result of a histological examination.
If neither of these approaches satisfies you, you could subset the data into the different levels, run a regression for each one separately and then do a meta-analysis to get an overall HR, but I would not recommend it.
  • asked a question related to Cox Regression
Question
1 answer
Dear all,
I am working on gene expression and Kaplan-Meier curves, dividing the patients into "high" and "low" groups using SPSS. I then want to do a Cox proportional hazards analysis that also includes, for example, the mutational status of one gene. I am new to SPSS. How can I set up the analysis to find the hazard ratio for specific combinations, i.e. gene X "high" and gene Y "mut" or "wt"?
Relevant answer
Answer
Good Morning,
I've never used SPSS and the last time I used SAS for survival analysis was 20 years ago when I was still in University. It is relatively simple to set up a Kaplan Meier analysis in R using the Survival package. Granted, you have to learn a bit of code, but there are lots of resources available online to walk you through it. On the positive side, once you get your code working, you can re-use it, and, while it is more work at the beginning, you are forced to think about and understand your data and each step in the analysis, which leads to a better understanding of the outcomes. A good introduction tutorial on survival analysis using R can be found here: https://www.emilyzabor.com/tutorials/survival_analysis_in_r_tutorial.html
Good Luck!
  • asked a question related to Cox Regression
Question
7 answers
Is it possible to merge effect sizes expressed as hazard ratio and odds ratio in the same dataset? Of course, the construct underlying the outcome of interest is the same, but in some articles was explored with logistic regression and in other articles was explored with Cox regression.
Relevant answer
Answer
I think formalists would be hesitant to merge ORs and HRs. Yes, they are both relative measures of effect, but ORs are estimations at a specific time point while HRs are averages of time-dependent effects over the follow-up. Logistic and cox regression modelling are based on different assumptions and merging the outcomes will introduce uncertainty about how the results should be interpreted.
One way to handle it is to present results of pooled ORs and HRs separately and then reason about their similarity/differences in the discussion section in your manuscript.
  • asked a question related to Cox Regression
Question
3 answers
Hello everybody.
I am having a problem selecting the reference category of a categorical independent variable in Cox regression analysis using SPSS. There is an option to select the first or the last category as the reference. However, when I select the first category as the reference, the last one ultimately ends up as the reference category. I am not sure whether SPSS sets the last category as the reference by default. I want to select the first category as the reference. Could anyone help me solve this problem? Regards
Relevant answer
  • asked a question related to Cox Regression
Question
3 answers
We are trying to find if there is an association between postoperative pulmonary complications (PPCs) and overall survival in a cohort of 417 patients. In Kaplan-Meier there is a significant difference in overall survival between patients with and without PPCs. After testing the proportional hazards assumption in cox regression (both through visual analysis and through log minus log plot) we found that our data failed to meet the assumptions. The way I interpret this, it means that the hazard of dying due to a postoperative pulmonary complication varies over time? I'm trying to figure out how to perform a survival analysis now that I can't use the standard cox regression.
From what I understand I could use time dependent coefficients (the same as variables in this example?) but I don't really understand what is meant by that or how I would do it in SPSS. Does it mean I turn my PPCs variable into a time dependent variable and then run the cox regression analysis the way I would if my data would have met the assumptions or how do I do it?
I would be really thankful for guidance or corrections if I have misunderstood something! I'm a PhD student and I don't have much experience in statistics so explain it to me like I'm 5 (a smart 5-year-old!)
Relevant answer
Answer
Dear Olivia Sand. The attached article provides step-by-step instructions on how to run Cox regression analysis in SPSS, and I believe it would be a perfect fit for your needs. Please check it out!
  • asked a question related to Cox Regression
Question
4 answers
We are interested in calculating the readmission rate after 12 months, and are uncertain how much data to include in our analysis. Since it is a 12-month readmission rate that we are interested in, should we include data for the 12 months after each hospitalisation, so that every patient has a chance to be readmitted within 12 months?
Relevant answer
Answer
I agree with @Babak Sara I in that what you include in your research question determines what is in your model. D. Booth
  • asked a question related to Cox Regression
Question
3 answers
Hello everyone
I am using the patients' survival times in an open cohort study of cancer patients as a measure of the life expectancy
I would like to know the most important predictors for life expectancy considering clinical covariates and genomic covariates. I did a genomic analysis of my patients and I have all genes up-regulated and down-regulated before and after treatment.
Does anyone know what the best-fitting model is in this case, considering both overall survival and the event? Both are targets, not just one.
Relevant answer
Answer
Thank you, David Eugene Booth. Is it possible to have the R scripts you used in those papers? I couldn't find them online. Thank you in advance.
  • asked a question related to Cox Regression
Question
3 answers
DEAR members of RG, I need your kindly statistical suggestion!
I was aiming to run Cox-regression, but I doubt the possibility given that the event to sample size ratio is <1:10. In my case, the events are 11, while sample size is 130. My question is, is it possible to run cox-regression under such circumstance?
Waiting for your statistical suggestion!
Thank you in advance.
#KevlynJones#AdrianEsterman#RogerJelliffe#SafaAounti
Relevant answer
Answer
For Cox regression I don't know. But you can consider the choices below instead.
Zero inflated Poisson
Negative binomial GLM
Firth Logistic
  • asked a question related to Cox Regression
Question
3 answers
I have data from an experiment where we looked at the time-to-death of pseudoscorpions under three treatment conditions: control, heat, and submersion underwater. Both the control and heat groups were checked daily and those that survived were right-censored. However, the pseudoscorpions in the underwater group were unresponsive while submerged and would only become active again when they were removed from the water and allowed to dry. Therefore, we removed 10 individuals per day and checked how many of the 10 were dead. All individuals removed from the water on a given day were also removed from the study on that day i.e. we did not re-submerge individuals that were observed to be alive after they were removed from the water the first time. The underwater group is therefore left-censored. We have run Kaplan-Meier curves for all three groups and the underwater group has a much steeper curve and non-overlapping 95% confidence intervals compared to the control and heat groups.
Is there a way to further analyze all three groups in one model given that one level of the treatment is left-censored and the other two levels are both right-censored? Can a Cox regression be run on the left-censored group by itself to produce a hazard rate with 95% CI for the rate? I am a biologist so try to make your answer intelligible to a statistical novice.
Relevant answer
Answer
In Stata you can handle right and left censoring simultaneously.
For Poisson regression:
cpoisson outcome independent1 independent2 independent3, ll(?) ul(?)
For linear regression:
tobit outcome independent1 independent2 independent3, ll(?) ul(?)
ll --> the lower limit, for left censoring (a number)
ul --> the upper limit, for right censoring (a number)
I have never used this for Cox regression; please check:
  • asked a question related to Cox Regression
Question
7 answers
SPSS, Data analysis
Relevant answer
Answer
  • asked a question related to Cox Regression
Question
4 answers
I fitted a Cox proportional hazard model and checked the proportionality assumption using R's cox.zph function, with the following output:
            chisq  df      p
Var1       0.0324   1  0.857
Var2       0.1972   1  0.657
log(var3)  4.1552   1  0.042
Var4       4.6903   1  0.030
Var5       0.6472   1  0.421
Var6       1.2257   1  0.268
Var7       4.9311   1  0.026
Var8       0.3684   1  0.544
Var9       2.0905   1  0.148
Var10      0.0319   1  0.858
Var11      4.0771   1  0.043
GLOBAL    14.2625  11  0.219
In the study, Variables 1 and 2 are the ones I'm actually interested in, whereas the others are only controls. As you can see, the PH assumption seems to hold for these two covariates, but not for some of the controls, most prominently Var3. Can I still interpret my findings on Var1 and Var2 and, when talking about Var3, add that its coefficient represents the average effect over the study time? Or will my coefficients for Var1 and Var2 be biased/incorrect?
Thanks in advance!
Relevant answer
Answer
If what you are interested in is the effect (hazard ratio) for var 1 and var 2 while adjusting for others (covariates) then you are fine. You don't need every covariate to also fulfill the assumption.
  • asked a question related to Cox Regression
Question
6 answers
I fitted a Cox PH model, and upon examination of the Schoenfeld residuals it became apparent that the proportionality assumption was violated by several variables. Following Box-Steffensmeier & Jones (2004), I included interactions with time for these covariates. However, I'm not sure how to interpret this. It's obvious that this indicates time-dependency of the covariate, in the sense that the effect inflates/deflates over some function of time, but I work with sociological data and my theory indicates no time-dependency of the effects in either direction (it also would not make sense in any way). So, if I understand correctly, I should consider the time-dependency to come from some kind of unobserved heterogeneity? Due to the nature of the data, I also cannot implement frailty or fixed effects to account for this. So how do I interpret a coefficient that increases/decreases as time progresses, given that the theory does not indicate this?
Relevant answer
Answer
Variables with time-varying effects and the Cox model: Some ... In such case, the interpretation of the models is conditional on the length of the survival time, and results should thus be ... https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-10-20
  • asked a question related to Cox Regression
Question
5 answers
I'm currently working with event history data studying how long after implementation countries abolish certain policies. Regarding the policies I also have an index on how far the countries went with their policies ranging from 0 to 100.
I wanted to control for this, in order to account for their point of departure. However, the coefficient violates the proportionality assumption.
Can I stratify on the continuous index variable? As I understand it, this would allow every country to have a different baseline hazard with respect to its point of departure. Playing around with the data, this didn't produce an error.
Could anyone tell me if I can trust these results or if I have to categorize the variable first?
Relevant answer
Answer
The PH assumption relates to the entire model, including all predictors and covariables deemed interesting or relevant. "Restoring" PH by ignoring or removing a covariable is not OK, as the violation likely demonstrates that the covariable is relevant: it explains so much of the variance in the data that the deviation from the PH assumption becomes clear.
If you have non-PH, you might start by investigating whether the effects of the predictors/covariables in the model are non-linear. If this is not successful, an appropriate partition of the time axis might be the key (early effects are different from late effects, but the hazards are proportional within the early as well as within the late phase). If this also does not help, you might really think of stratification (which, by definition, is possible only for categorical variables). I don't consider it a good idea to categorize a continuous variable just to be able to use it for stratification. But if nothing else works, this might be a last resort. I would then check whether the violation of the PH assumption really does more harm than the categorization of the continuous variable.
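To illustrate what partitioning the time axis means at the data level, here is a minimal sketch in plain Python. The helper `split_at`, the cut point of 5 and the "early"/"late" labels are all hypothetical; in R, the survSplit function in the survival package does this kind of episode splitting for a whole data set.

```python
def split_at(time, event, cut):
    """Split one subject's follow-up (0, time] at `cut` into (start, stop, event, period) rows."""
    if time <= cut:
        # Follow-up ends within the early period: a single row.
        return [(0.0, time, event, "early")]
    # Otherwise the subject survives the early period event-free and continues.
    return [(0.0, cut, 0, "early"), (cut, time, event, "late")]

# Hypothetical subjects as (follow-up time, event indicator), split at t = 5.
for time, event in [(3, 1), (8, 1), (10, 0)]:
    print(split_at(time, event, cut=5))
```

After splitting, the covariate can be given a separate coefficient in each period (an interaction with the period indicator), so that proportionality only has to hold within each period.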
  • asked a question related to Cox Regression
Question
3 answers
I want to perform a GEE analysis with repeated-measures factors and a survival analysis (like Cox regression, but with repeatedly measured variables) in SPSS. How do I proceed with this dataset? I have transformed it into long format, but I can't get it to work properly.
Relevant answer
Answer
I’m not sure you can. I’ve used multilevel modeling for censored regression using brms in R which is the closest I’ve encountered.
  • asked a question related to Cox Regression
Question
4 answers
For instance, I have A treatment modality and I want to analyze overall and progression-free survival rates in 2 groups of patients : young and old ones. These two groups have two corresponding subgroups of patients according to disease-related factor. Is Cox-regression analysis appropriate in this case?
Relevant answer
Answer
It may be that you will need to use a competing risk analysis if any patients have died during the time of observation.
  • asked a question related to Cox Regression
Question
2 answers
I have two groups comprising 100 patients. In the first group there is one endpoint; in the second group there are two endpoints. However, SPSS did not give me the CIs and HRs. Is there a way to get the HR and CI results?
Relevant answer
Answer
Hi,
hazard ratios in SPSS are referred to as "Exp(B)", you will find them in the Cox Regression output table called "Variables in the equation". However, in order to obtain the confidence interval, you will have to select the "CI for Exp(B)" checkbox in the "Options..." menu.
  • asked a question related to Cox Regression
Question
4 answers
I have noticed that when using the proc phreg in SAS and the coxph in R in the same data, the model should be different in order to get the same results. In proc phreg I model the time with the censor variable (where censor=1 means the patient withdrew from the study) in order to obtain a HR. In R I use the same model but this time censor=1 means the patient experienced the event, in order to obtain the same HR. Is there a difference in the definition of the censor variable or the models are different?
Relevant answer
Answer
Between SAS and R, the difference lies in the definition of the censor, not in the model.
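In other words, the same column has to be read the opposite way round. A minimal sketch of the recoding in plain Python, assuming (as in the question) a column where 1 marks a censored patient; the helper name `to_event_indicator` is hypothetical.

```python
def to_event_indicator(censor_flags, censored_value=1):
    """Turn 'value marks censoring' flags into 'value 1 marks the event' flags."""
    return [0 if c == censored_value else 1 for c in censor_flags]

# Hypothetical column: 1 = patient withdrew (censored), 0 = event observed.
censor = [1, 0, 0, 1, 0]
print(to_event_indicator(censor))
```

In PROC PHREG the MODEL statement lists the value(s) that denote censoring, whereas Surv() in R expects the value that denotes the event, so passing the unrecoded column to both would fit models with events and censorings swapped.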
  • asked a question related to Cox Regression
Question
9 answers
Hello dear professors and colleagues,
I have a data set to which I want to apply three methods: logistic regression, linear regression and survival models. Of course, each method focuses on a different part of the information in the data set. My questions are:
  1. Is it coherent to incorporate all of them in the same study?
  2. Are they complementary methods?
  3. After estimation, can we select the best method and, if so, which criterion should we use?
Relevant answer
Answer
The Logistic Regression is appropriate for binary data, and Survival Analysis is appropriate for time to event data. There are similarities between the two and Logistic Regression can be used to analyze time-to-event data although it's not ideal.
  • asked a question related to Cox Regression
Question
3 answers
I'm doing survival analysis using a Cox regression model. To that end I have 2 different datasets, one for training and one for testing. Once I apply Cox regression on the training dataset, I'll have a set of coefficients, each related to the corresponding feature, and a C-index to report. When I want to test my model on the test dataset, should I use the same coefficients and extract the C-index, or should I apply the Cox regression model again on the test dataset and extract the C-index?
Relevant answer
Answer
No, that equation is the definition of everything. The estimates are calculated by partial likelihood, so you need a computer. See the link I gave you. R is open-source freeware. Here is a link to a manual that should explain just about everything in R: http://sgpwe.izt.uam.mx/files/users/uami/gma/R_for_dummies.pdf
I'm attaching a recent paper of ours that used a similar approach. Note the Kaplan-Meier plot. You must do one of these to interpret results. You should have a biostatistician at your university. That person can explain all of this if the readings I sent aren't helpful enough. If you have further questions, PLEASE ask one of us.
Best of luck, David Booth
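On the train/test part of the question, the usual practice for external validation is to keep the coefficients estimated on the training data fixed, compute each test subject's linear predictor from them, and evaluate the C-index on the test set. The following is a minimal sketch of Harrell's C-index under that assumption; the coefficients and test data are hypothetical, and tied event times are simply skipped.

```python
def linear_predictor(row, coefs):
    """Risk score for one subject: sum of coefficient * covariate value."""
    return sum(b * x for b, x in zip(coefs, row))

def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the subject
    who failed earlier also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: subject i had the event strictly before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# HYPOTHETICAL coefficients, as if estimated on the training set
train_coefs = [0.8, -0.3]

# HYPOTHETICAL test set: covariates, follow-up times, event indicators
test_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
test_times = [2, 9, 7, 4]
test_events = [1, 0, 1, 1]

scores = [linear_predictor(row, train_coefs) for row in test_x]
c_test = harrell_c_index(test_times, test_events, scores)
print(round(c_test, 3))
```

Refitting the model on the test set would produce new coefficients and would no longer measure how well the trained model generalizes.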
  • asked a question related to Cox Regression
Question
6 answers
Hi, I'm very new to survival analysis. I am trying to correlate gene expression levels with survival and prognosis for patients with lung cancer, and I want to run a Cox regression analysis on it. However, most of the examples I've encountered so far are based on discrete covariates such as sex. I know we can analyze continuous covariates using the coxph function, but I can't see what the actual plot would look like for a continuous variable. For instance, for discrete variables the number of curves corresponds to the number of levels of the variable, e.g. for gender you'd have two lines on the graph. But what about a continuous covariate? Should we first turn the continuous covariate into a discrete one by assigning quantiles? Otherwise I don't know how to visualize the graph. What are the pros and cons of doing so?
Thanks!
Relevant answer
Answer
I think it could be of interest to you to perform survival analysis with determination of an optimal cutpoint on the continuous covariate. It does the survival analysis, with calculation of the hazard ratio etc., by dividing patients into two groups according to the most significant cutpoint chosen from a continuous covariate such as gene expression. Check out this paper and the corresponding software; maybe it will fit your needs: https://www.sciencedirect.com/science/article/pii/S0169260718312252
  • asked a question related to Cox Regression
Question
6 answers
Dear All,
Your advice is urgently needed!
I want to determine the predictors of death among women with breast cancer. My sample size is 437, and total with the event (failure)=127.
In the univariate Cox regression I found about 30 variables eligible (p-value<0.25) for the multivariable Cox regression.
Can I include all of these variables in the multivariable Cox regression?
Regards,
Alem,
Relevant answer
Answer
I believe you can but your variable selection approach is not the best. See the attached for my recommendation. Best, David Booth
  • asked a question related to Cox Regression
Question
6 answers
The Cox model has the proportional hazards and log-linearity assumptions that the data must satisfy. What if the data fail to satisfy these assumptions?
Relevant answer
Answer
Hi Chukwudi,
When the assumptions are not met then the model would be invalid and there'll be loss of power. Are your independent variables continuous? If so you may try to transform them into categorical variables using plausible cut-off values and then add to the model. For example instead of adding age as a continuous variable you might categorize it as age <65 and age>=65 then continue with the analysis. This might help with the PH assumption. As for the non-linearity, the same applies or you can do some transformations (like log, sqrt, etc.) on your independent variables and then run the analysis.
  • asked a question related to Cox Regression
Question
4 answers
Hi everyone, I have a question regarding the interpretation of the hazard ratio in a Cox regression analysis:
In an oncological study, I'm analyzing the influence of 2 factors on survival. Each of the two factors shows significantly worse survival when it is expressed (1) than for patients in whom the factor is not detectable (0).
Now something happens in the Cox regression that I do not understand. If I add either one of the two factors alongside the usual variables (sex, age, tumor stage, lymph nodes), it is shown as an independent factor with an Exp(B) > 1 (p < 0.05). But if I insert factors A and B together into the Cox model, Exp(B) is > 1 for factor A and < 1 for factor B. How can this happen? Is there a good explanation?
Thanks for the help.
#spss #cox regression #statistics #survival analysis
Relevant answer
Answer
Hi Florian,
After adding both factors (Factor A and B) together to the model are both factors still significant? If not so, maybe the two factors you added are correlated (the problem of multicollinearity). For example if you add N stage and TNM staging together to the model one of them might not be significant although independently they're both significant prognostic factors on survival.
  • asked a question related to Cox Regression
Question
4 answers
Hello! I’m conducting a meta-analysis of non-randomized comparative studies looking at length of hospital stay.
Some studies report hazard ratios (obtained from cox regression) while others report mean differences (obtained from linear regression).
Is there anyway these data can be pooled? e.g. by transforming these data into a common estimate?
Thanks a lot!
Relevant answer
Answer
Meta-analysis gets its power from the analysis of randomized comparative clinical trials, so using non-randomized studies will increase the bias affecting your final results.
  • asked a question related to Cox Regression
Question
1 answer
I have a cohort of 1948 cancer patients with overall survival data and germline genotyping that I have sub-grouped into different oncogenic molecular pathways. I wish to calculate the power to detect a SNP association at a significance threshold of 5e-8 for a GWAS using Cox proportional hazards regressions. How can I calculate this? What information do I need? And are there any simple-to-use packages available?
Relevant answer
Answer
Simulation is the best way. Write a Python script and run the simulation analysis using Cox proportional hazards regressions, checking that your p-value is less than 5e-8 for an alpha of, say, 0.05.
  • asked a question related to Cox Regression
Question
3 answers
Be it logistic regression or survival analysis/Cox regression, there is utility in determining cutoff points to categorise a continuous risk factor into various risk strata.
However, there is a plethora of methods for defining the 'optimal' value, depending on the nature of the model outcome. Generally they fall into methods that look at the model's sensitivity and specificity, or methods that look to maximise the significance of the difference (i.e. minimise the p-value) between the survival curves of the resulting strata.
Currently, I'm modeling the effect of body mass index on the diagnosis of certain medical conditions through survival analysis to identify at-risk individuals, since the current BMI risk groupings prove inadequate to stratify risk groups. For my purposes, I'm thinking of going with a method that maximises sensitivity while maintaining a certain threshold of specificity, instead of going with the standard procedure of maximising the Youden index, since the cost of misclassification in the model isn't a great deal.
The issue I'm facing is how to justify the specificity threshold chosen. Or is there a better method of determining the best cutoff to use for categorising BMI risk groups? Also, how do I determine the number of categories?
The implementation of this in R is also difficult, and the packages I found are unclear about the inner workings of their functions.
Is anyone able to shed some light on this grey situation?
Relevant answer
Answer
I need to ask what the minimal number of cases needed to perform a multivariate Cox analysis is.
  • asked a question related to Cox Regression
Question
6 answers
When reporting hazard ratios for Cox regression analysis, is it common to report the hazard ratio for the interaction term itself?
For example, I have a model with 3 terms:
a
b
a*b
Using hazard ratio statements in SAS 9.4, I get a hazard ratio for 1) a at the mean of b, and 2) b at the mean of a. My understanding is that these hazard ratios are hazard ratios for the main effect variable (variable a or b) while holding the interacting variable constant. Is my understanding correct?
Is it possible to get a hazard ratio for the interaction term?
Thank you
Relevant answer
Answer
Is it possible to calculate the CI for gender in that model by hand?
  • asked a question related to Cox Regression
Question
2 answers
I can find some articles describing R codes for drawing Nomogram after logistic regression or cox regression. But if in my logistic regression model, penalized maximum likelihood has to be used to resolve the failure of maximum likelihood estimate to converge, is it still possible to draw a Nomogram?
If the answer is yes, how to write R codes for this condition?
Thanks for any reply.
Joan
  • asked a question related to Cox Regression
Question
2 answers
I have generated various models with an identical core of fixed covariates, with each model differing only in the study covariate (a categorical variable that I change in each model by altering the cutoff values).
I want to indirectly compare the study covariates by comparing the various models.
How do I do that? Is the C-statistic an option?
Relevant answer
Answer
Hello,
Usually we use Akaike's information criterion (AIC) to evaluate goodness of fit, so we choose the model that gives the minimum AIC as the best model. Furthermore, for efficiency, we can use an AUC analysis.
Best of luck
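As a rough illustration of the AIC comparison: for a Cox model the criterion can be computed from the maximized partial log-likelihood as AIC = 2k - 2·logL, where k is the number of estimated coefficients. The log-likelihood values and model names below are hypothetical placeholders, not results from any real fit.

```python
def cox_aic(partial_log_likelihood, n_coefficients):
    """AIC based on the maximized partial log-likelihood of a Cox model."""
    return 2 * n_coefficients - 2 * partial_log_likelihood

# HYPOTHETICAL (partial log-likelihood, number of coefficients) per model,
# one model per candidate cutoff of the study covariate
models = {
    "cutoff_25": (-512.4, 6),
    "cutoff_30": (-509.8, 6),
    "cutoff_35": (-511.1, 6),
}

aic = {name: cox_aic(ll, k) for name, (ll, k) in models.items()}
best = min(aic, key=aic.get)
print(aic, best)
```

Such a comparison is only meaningful when all models are fitted to exactly the same subjects and events.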
  • asked a question related to Cox Regression
Question
4 answers
Hello dears,
In the regression modelling process, we sometimes have to categorize a continuous variable (DVs or IDVs). What are the potential problems inherent in such a transformation, in terms of:
  1. Estimation results
  2. Precision and accuracy
  3. Hypothesis tests...
Thank you so much for any response and clarification
Relevant answer
Answer
Hi,
Please see these attached links:
Best
  • asked a question related to Cox Regression
Question
3 answers
Hello, everyone
I am doing Cox proportional hazards regression with a cohort of more than 10000 participants. For the purpose of my study, I need to select a subgroup of 300 people in the cohort as the reference group, and select another three subgroups (500-1000 participants each) to form four hierarchical groups together with the reference group. After that, I will do a Cox regression with this new variable (four groups) as the independent variable.
My question is:
Is this right? If not, what can I do to achieve the research goal?
Thank you so much,
Xiaofan
Relevant answer
Answer
Well, first, you didn't include your research question. Second, statistics depends on having some type of probability sample or a population. You can't just grab a few when you feel like it. Third, I don't see a research question. What you do always depends on that. A better description of what you want to do is required for a decent answer. Best, D. Booth
  • asked a question related to Cox Regression
Question
1 answer
I was reading about using the multivariate cox proportional hazards model at this website: http://www.sthda.com/english/wiki/cox-proportional-hazards-model, which uses the Survival package for cox regression. The summary of a cox regression object outputs a bunch of information about the model, including a concordance index.
Is all of the data used to train the cox regression model? If so, is the concordance index found on that same training data? Does this cause overfitting?
Relevant answer
Answer
Yes the entire dataset is used in model fitting as there is nothing to be tuned unless you are doing penalized Cox regression and need to tune the penalty.
  • asked a question related to Cox Regression
Question
5 answers
Hi! I am currently working on a statistical analysis of a prospective cohort population. We are looking at a specific protein concentration at baseline and relating it to a certain outcome. The outcome group has been matched with twice as many control individuals from the same cohort. The statistical analysis we are using is a Cox regression. We've obtained HRs and p-values that show a significant correlation between this protein and the outcome, but here comes my question: we've gotten lower HRs and lower p-values after adjustment for risk factors connected to this exact outcome, which to us is surprising. With the unadjusted model the HR is 0.693 (0.555-0.865, p=0.001) and with the adjusted model 0.55 (0.405-0.747, p=0.0001). Does anyone have a clue how this came to be? We've looked at all risk factors in both case and control groups and they are as you would expect, for example a higher smoking prevalence in the case group (we're looking at CVD as the outcome). From what I understand, the HRs should rise a bit and the p-values become weaker when you adjust for risk factors (i.e. only the protein concentration is taken into consideration). I'm using SPSS and here's my syntax (with the protein, covariates and status/time variables changed to just y, x and z respectively):
COXREG z
/STATUS=z
/METHOD=ENTER y
/PRINT=CI(95)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
COXREG z
/STATUS=z
/CONTRAST (x)=Indicator
/CONTRAST (x)=Indicator
/CONTRAST (x)=Indicator
/CONTRAST (x)=Indicator
/CONTRAST (x)=Indicator
/METHOD=ENTER y x x x x x x x x x x x x
/PRINT=CI(95)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Grateful if somebody could help me with this one. Am I doing something wrong in SPSS? Am I doing things right but need to interpret it some other way? Has anyone heard of this phenomenon, i.e. HRs/p-values going "the wrong way" after adjustment for risk factors? Best regards, Filip
Relevant answer
Answer
I have a couple of questions: 1. what was the reason that you adjusted for risk factors and not the usual age, sex,etc.? 2. Why didn't you report a Kaplan-Meier plot? K-M plots are usually very helpful in understanding what is going on. I attached a paper that we have appearing this month that describes our ideas on these things. Best wishes, David Booth
  • asked a question related to Cox Regression
Question
3 answers
HI!
My author asks me to adjust the p-value!
But I don't think that is needed. Multivariable Cox regression already includes several factors as adjustment. They are not multiple hypothesis tests.
It's my first time using this website. From reading some of the questions and answers, people are so sweet here!
THANK YOU!
Relevant answer
Answer
If the model has multiple coefficients and the tests of these coefficients are of interest, then there are multiple tests. Under the omnibus null hypothesis, the chance of getting at least one significant coefficient increases with the number of coefficients in the model.
Note that coefficients for covariables that are not interesting but required to adjust the interesting coefficient(s) do not need to be considered for multiple testing correction.
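One way to apply such a correction to the coefficients of interest only is the Holm step-down procedure. This is a swapped-in technique for illustration, not one named in the answer, and the p-values below are hypothetical. A minimal sketch:

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# HYPOTHETICAL p-values for the coefficients of interest only;
# adjustment covariates are deliberately left out of the correction
p_interest = [0.010, 0.040, 0.030]
p_adjusted = holm_adjust(p_interest)
print(p_adjusted)
```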
  • asked a question related to Cox Regression
Question
1 answer
Hi, I am new to SPSS and trying to figure out if I am using the syntax correctly to run a multivariable cox regression survival analysis.
On univariate survival analysis, we tested 12 variables, e.g. A-L, and 4 of them (A, B, C and D) were statistically significant for worsened survival. We wish to include them in a new model, so that the survival estimates for the 12 variables are adjusted for A-D.
This is the syntax I am using to see survival for E, adjusted for A-D.
Please let me know if this makes sense! Thank you.
COXREG time variable
/STATUS=death variable(1)
/Contrast (E)
/METHOD=ENTER A B C D E
/PRINT=CI(95)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Relevant answer
Answer
Hi Taylor,
There is not much information about the time variable, covariates or the research setting (what is your sample size?) to start with, but I assume that time is a continuous measure (e.g. number of days/years until death), so that using the Cox model makes sense. Are the covariates continuous or categorical variables? At least E seems to be treated as a categorical variable, as it is used in a CONTRAST statement (you do not specify the type of the contrast, so SPSS will use the deviation contrast as a default). If the other covariates (A-D) are also categorical, note that their relationship to survival is estimated in a linear fashion, so it might be a good idea to check that this assumption can be supported in your data. Alternatively you could use dummy variables instead. Also, did you check that covariates E-L, although not significant in the proportional hazards model, did not have non-proportional (a.k.a. time-dependent) effects? We found that this made a big difference when studying fractures in an elderly population. Did you check the form of the relationship between the predictors and survival over time? Most of these issues are more efficiently studied in the R programming environment (free), SAS or Stata (the latter two commercial), due to the fact that SPSS (still, after all these years) provides skimpy diagnostics for Cox models. David Collett's "Modelling Survival Data in Medical Research" is an accessible source if you need to compute the necessary residuals by yourself. Also, see Therneau & Grambsch's "Modeling Survival Data: Extending the Cox Model" for these and other issues that may be relevant for your data (for this book you might consider consulting your statistician as well).
Your syntax looks to be technically correct for conducting proportional hazards regression using covariates A, B, C, D and E, given that "death category = 1" is the event you need to model and the assumptions mentioned above are met, but this does not mean that this syntax is or isn't a productive way to proceed in light of the goals of your research. What makes sense for your analysis situation depends on the context of your research: what do you aim to show with your data (what are your research questions and what kind of conclusions do you wish to make) and what your actual data are.
To me the main issue in your post seemed to concern selection of covariates into the models. What is your overall strategy for deciding what variables go into the models? You do covariate selection based on the univariate models, and you plan to conduct analyses for the following additional eight models:
A-D,E
A-D,F
:
A-D,L
Is this an intermediate step in your overall analysis plan or is it the final step? Do you plan to show the results from all model steps or just a selection of some (final) models? How do you plan to control for multiple testing?
This seems to be a form of stepwise modelling using the p-value as a variable selection criterion. Literature shows that generally stepwise model building tends to lead to highly data-driven results, so it might be useful (at least educational) to conduct a simulation to see how your conclusions are affected by this strategy. Another useful book to consider at this point might be Frank Harrell's Regression Modelling Strategies with applications to linear models, logistic and ordinal regression, and survival analysis. This book summarizes some of the problems related to stepwise strategies (you can easily google some more).
Regards,
Timo Törmäkangas, PhD
Senior researcher
Gerontology Research Center
University of Jyväskylä, Finland
  • asked a question related to Cox Regression
Question
2 answers
I would like to calculate under-5 mortality from survey data, and it is difficult to find a coherent resource that gives a step-by-step guide to calculating the under-5 mortality rate using a Cox regression model in SPSS. Does anyone have any resource recommendations, or is anyone perhaps ready to work through this project with me? Cheers!
Relevant answer
Answer
Dear Emmanuel Nene Odjidja,
If you are familiar with R, you can try the R survival package for the Cox regression model. Please follow this link: https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/. I hope it will be helpful for you.
  • asked a question related to Cox Regression
Question
2 answers
Hi ,
I performed a multivariate Cox regression analysis and one of the covariates had a p-value of 0.08. Since it is a replication study and I have a hypothesis, I can have a directional hypothesis. Is it legitimate to divide the p-value by 2 to obtain a significant p-value?
Also, how do I report 95% CI in such a case?
Thanks
Relevant answer
Answer
First, I think you are confused. Let me recommend the references in the links.:
and
Now I notice that a) you didn't list your research question and associated hypotheses, and b) 0.08/2 = 0.04 < 0.05 but 0.08 > 0.05. I am thus suspicious that this is an attempt to get a p-value < 0.05 by any means possible. If so, you are very close to dishonesty. In statistics we ask a question, then collect data which either support or do not support a chosen answer to our question. You do not seem to be doing that. The above books are about how to get answers to questions that you can TRUST. Is that REALLY your goal? D. Booth
  • asked a question related to Cox Regression
Question
4 answers
It will be great if I can get a book or video where application of cox regression along with its assumptions is lucidly explained in context for social science research.
Relevant answer
Answer
Thank you so much sir. I am sure these sources will be of great help for me.
  • asked a question related to Cox Regression
Question
3 answers
Dear everyone,
I am performing a Cox proportional hazards regression on survival, in a sample in which almost everyone dies during the follow-up period. I am far from an expert, and want to be sure to thoroughly check the PH assumption. I can get some help from a statistician, but it takes 2 weeks until I can get an appointment (I will!)
I found 2 methods for checking the PH assumption that I can easily perform in SPSS: visually, I can inspect stratified log-minus-log plots (and scatterplots of residuals for continuous variables). My question is about the statistical method: checking whether the product of time and my variable becomes significant in the Cox regression (if yes, the PH assumption is not fulfilled). I have noticed that it is quite common to first fit a univariable Cox regression for each covariate. I have been reading on the subject but see that different methods are used when it comes to checking individual time-dependent covariates.
Some people check the product of time*variable (T_COV) on its own in a univariable Cox regression; others put both the T_COV and the original variable in the Cox regression (example for age: T*Age and age would both be entered into the Cox regression). With the second method, one does not obtain a univariable Cox regression for the T_COV.
Why is it important to me? There is one T_COV variable that is not significant in univariable cox- regression, but becomes significant in the regression with only the T_COV and the original variable that it is a product of.
I hope you want to give me your thoughts on the topic. I am extra happy with reading recommendations/reliable sources!
If you have other remarks, questions, or if my methods are all wrong: please comment!
Relevant answer
Answer
Hi Josje,
Your question relates to time-dependent effects of covariates (such as age). Testing the PH assumption via T_COV tests only linear relationships over time, but the change over time may also follow other forms. A test is available that is based on scaled Schoenfeld residuals and tests for violation of the PH assumption. More importantly, these residuals can also be used to produce a plot that shows where on the time axis the violation occurs and how problematic it is. As a practical example see Figure 1 in https://academic.oup.com/psychsocgerontology/article/67/6/765/612992. Violation of the PH assumption may also be an indication of a non-linear effect of the covariate, so you should check this also. There is useful guidance on these issues and several other assumptions of the Cox model in: Therneau & Grambsch (2000). Modeling Survival Data: Extending the Cox Model. New York: Springer.
Unfortunately SPSS is lagging in the diagnostics for the Cox model, so many of the useful features are not available in SPSS. Computing the residuals is possible via syntax, but it can be error-prone, and constructing the test can likewise be challenging. More advanced options are available in R (https://cran.r-project.org/). The book has good examples with code for R on how to perform diagnostics for your model.
As for your problem, you may need to discuss with your statistician about the approach to take for the analyses, because the useful solution is generally related to the study design, setting and data. Mostly in standard cox regression modeling scenarios I would use both the covariate and its time-interaction in the model.
Best regards,
Timo Törmäkangas
Gerontology Research Center
University of Jyväskylä
Finland
  • asked a question related to Cox Regression
Question
9 answers
I am using the complex sampling analysis method within SPSS. I would like to use the cox regression for my variable under complex sample, as my variable has a prevalence rate of greater than 10%, thus logistic regression should not be used. When using cox regression under the complex sampling analysis - is robust variance already controlled for?
Relevant answer
Answer
Thank you for your comment
In fact, since 2015 things have changed regarding the handling of heteroskedasticity, and it has now become almost mandatory (which I think is why the peer reviewer asked you about it). Happily, the latest versions of SPSS integrate it in Cox regression through sandwich estimators and, more importantly, HC in general linear models.
Hope it helps,
Kind regards,
  • asked a question related to Cox Regression
Question
3 answers
I have been performing a regression using an ordinal variable as the dependent variable. I later realized I had to use an ordinal regression instead. Nevertheless, I wondered what the consequences of that mistake could be for my analysis. Can anyone explain that to me?
Relevant answer
Answer
Besides the mentioned loss in efficiency, I think you are losing an easy interpretation of your model. The ordinal model parameters allow you to calculate the predicted level of your outcome value given your covariates. In the multinomial model the coefficients allow you to derive the probability of being in an (unordered) class of your dependent variable. This is much more difficult to interpret if you want to recover a notion of the ordinal structure of your dependent variable.
  • asked a question related to Cox Regression
Question
2 answers
I want to perform cox survival analysis but I'm not sure how to handle situations where there is a threshold value.
To keep it vague but simple,
Let's say I have FactorA with a threshold of 10. I believe the risk of having an outcome is increased with a value under 10 (i.e. low scores are bad).
In another case I have FactorB, and values above 10 are believed to carry higher risk.
How do I do a Cox survival analysis for this in SPSS?
Relevant answer
Answer
You have succeeded. This is so vague I have no idea what you are talking about.
D. Booth
  • asked a question related to Cox Regression
Question
3 answers
Hello everyone!
I am using Cox regression in my research work. For a univariate Cox model I got log(p) = -inf. Can anyone explain why log(p) = -inf? Is it because the value of p is zero?
The variable that I am using is a categorical variable with values 0 and 1. I am using the Python lifelines library. The following are the results produced by the model.
Concordance = 0.53
Likelihood ratio test = 2373.89 on 1 df, log(p)=-inf
Relevant answer
Answer
If all the other requirements are met yes. Best, D. Booth
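A likely mechanical reason for the -inf is floating-point underflow: for a likelihood ratio statistic of 2373.89 on 1 df, the true p-value is far smaller than the smallest positive double, so it is stored as exactly 0 and its logarithm is reported as -inf. The sketch below reproduces this with the standard library only; it is not the lifelines implementation, and the asymptotic formula for erfc is used here as an assumption to approximate log(p) directly.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def log_chi2_sf_1df_large(x):
    """Approximate log(p) for large x with 1 df, using
    erfc(z) ~ exp(-z^2) / (z * sqrt(pi)) to avoid underflow."""
    z = math.sqrt(x / 2.0)
    return -z * z - math.log(z * math.sqrt(math.pi))

lr_stat = 2373.89  # likelihood ratio statistic from the output above

p = chi2_sf_1df(lr_stat)  # underflows to 0.0 in double precision
log_p = math.log(p) if p > 0.0 else float("-inf")
approx_log_p = log_chi2_sf_1df_large(lr_stat)

print(p, log_p, approx_log_p)
```

In other words, the p-value is not literally zero, only too small to represent.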
  • asked a question related to Cox Regression
Question
2 answers
Hello!
I want to detect whether various diagnosis groups (1,2,3,4) benefit differently from a treatment (1=absent, 2=present).
I have used SPSS and chose, among other things, "diagnosis group", "treatment" and the interaction "diagnosis group" * "treatment" as predictive markers.
I selected "last" as the reference category and chose "forward:Wald" as the method.
As result I've got among other things these lines:
- Therapy*Diagnosis group Wald:15,368 df:3 significance:0,002
- Therapy*Diagnosis group(1) B: -,151 SE: ,399 Wald:,144 df:1 significance:,704 Exp(B):,860
- Therapy*Diagnosis group(2) B: ,979 SE:,345 Wald:8,064 df:1 significance:,005 Exp(B):2,662
- Therapy*Diagnosis group(3) B:1,084 SE:,371 Wald:8,522 df:1 significance:,004 Exp(B):2,955
I'm not sure how to interpret these results/interactions.
Does it mean that people in diagnosis group 2 who didn't receive therapy have a 166% higher risk of dying in the next year than people in diagnosis group 4 who did receive therapy?
Does it mean that people in diagnosis group 3 who didn't receive therapy have a 195% higher risk of dying in the next year than people in group 4 who did receive therapy?
I would be very grateful for help!
Greetings
Relevant answer
Answer
SPSS has terrible explanations of these concepts. Look at the Cox regression material in the attached file. Best wishes, David Booth
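As a rough guide to reading such output: with indicator coding, the last category as reference, and the main effects also in the model (an assumption, since forward selection may have dropped them), Exp(B) of an interaction term is a ratio of hazard ratios rather than a direct comparison of two patient groups. The sketch below uses the interaction coefficients from the question with decimal commas converted to points; the therapy main-effect coefficient is HYPOTHETICAL because it is not shown in the post.

```python
import math

# Interaction coefficients B for Therapy*Diagnosis group(1..3), from the post
interaction_b = {1: -0.151, 2: 0.979, 3: 1.084}

# HYPOTHETICAL main-effect coefficient for therapy absent vs. present;
# it applies as-is to the reference diagnosis group 4
b_therapy = -0.40

def ratio_of_hazard_ratios(b_interaction):
    """Exp(B) of an interaction term: how much the therapy HR in this
    diagnosis group differs from the therapy HR in the reference group."""
    return math.exp(b_interaction)

def therapy_hr_within_group(group):
    """HR for therapy absent vs. present within one diagnosis group."""
    return math.exp(b_therapy + interaction_b.get(group, 0.0))

for g in (1, 2, 3, 4):
    print(g, round(therapy_hr_within_group(g), 3))
```

Under these assumptions, the 2,662 for group 2 means that the hazard ratio for no therapy versus therapy is about 2.66 times as large in group 2 as in group 4, not that untreated group 2 patients have a 166% higher risk than treated group 4 patients.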
  • asked a question related to Cox Regression
Question
3 answers
Instead of using elapsed time in survival analysis, I wanted to use cross-sectional data over a fixed period to examine increasing % poverty-to-income levels in relation to a hazard that is associated with lower incomes
Would it be applicable to use an increasing poverty level-dependent model for cox regression in lieu of time?
Relevant answer
Answer
A generalized linear model for longitudinal data using GAMLSS is what you are looking for.
  • asked a question related to Cox Regression
Question
11 answers
Hello
I want to examine which factors affect the drug survival of infliximab and etanercept (i.e. the time until discontinuation of a drug). Is it correct to use linear regression, or is Cox regression my only choice?
Thank you in advance
Relevant answer
Answer
A Cox proportional hazards model seems appropriate here because your dependent variable of interest is the survival time of the drug. You would be able to obtain hazard ratio estimates of how various factors/covariates impact survival.
  • asked a question related to Cox Regression
Question
3 answers
I have a longitudinal retrospective data set of human medical records. They feature CONDITION and DRUG. There is no way of saying why a drug was prescribed other than observing the conditions/diseases present at the time.
I would like to know whether taking drug X has an effect on the outcome of a particular disease. The outcome will be the duration between repeated visits to the doctor. I have used a recurrent-event Cox regression to assess whether a particular drug (as a covariate) is associated with a change in risk for the disease outcome.
The predictor/independent variable could be the time to a particular recurring disease record (remember, this is recurrent, a little bit like migraine, so the patient sees the doctor often) and the dependent/outcome variable would be some measure of the disease outcome. If I take e.g. 1250,000 patient records and align them so that the index date is defined by the particular drug of interest, I should be able to get a before-after effect.
I would appreciate any links, papers, tutorials, on an approach similar to what I am trying to do.
Thanks
Relevant answer
Answer
The study
John Edmeads, Helen Findlay, Peter Tugwell et al.:
Impact of Migraine and Tension-Type Headache on Life-Style, Consulting Behaviour, and Medication Use: A Canadian Population Survey.
Can J Neurol Sci, Volume 20, Issue 2, May 1993, pp. 131-137.
addresses your question: "Alleviate migraine remedy doctor consultations?"
Why do you need so many records to study the clinical impact of a drug?
  • asked a question related to Cox Regression
Question
3 answers
I am after some suggestions on what statistical analysis I can perform to show a before-and-after effect in a longitudinal electronic healthcare record (EHR). I have N number of EHRs, of varying sizes/time-spans. Each record has a history of recurrent disease records (for the one disease). To see whether a particular drug has had an effect on the disease outcome (duration before the next relapse), I have used time-gap recurrent cox regression.
However, I would now like to see whether the disease outcome (a series of remissions into relapses, good = long durations in between, bad = short durations in between) is immediately clear from the first prescription of a particular drug. In my head I imagine taking all of the records (of varying time spans, which is very important to remember) and adjusting them so every record overlaps when the drug of interest is first prescribed. The y axis is disease prevalence or risk, and the x axis is time. Before the initial drug prescription event, disease prevalence/risk should be high; then, after crossing the initial prescription time, disease prevalence/risk should drop. This would help demonstrate the efficacy of the drug.
Some points to remember: 1) Each medical record may be unique in timespan. 2) The first prescription event of a particular drug will happen at different times across the record set. 3) Some records may have no medical events before the drug was prescribed (as all the disease records of interest fell after the drug prescription of interest). 4) The number of medical events either before or after the first prescription of the drug may be sparsely populated (making binning by time very difficult) or richly populated.
Is there a name for this kind of analysis? I am using R. Any suggestions are very welcome.
Relevant answer
Answer
It looks like the data description fits an interrupted time-series analysis.
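Before fitting a full interrupted time-series model, the alignment step described in the question can be sketched as follows. This is only a crude pooled before/after event rate under stated assumptions: all field names (`record_start`, `record_end`, `index_day`, `event_days`) and the two example records are hypothetical, times are in days, and every event is assumed to lie inside its own record window.

```python
def before_after_rates(records):
    """Pooled events per person-year before and after the first prescription."""
    totals = {
        "before": {"events": 0, "days": 0},
        "after": {"events": 0, "days": 0},
    }
    for rec in records:
        index = rec["index_day"]  # day of first prescription of the drug
        # Each record contributes only the observation time it actually has
        totals["before"]["days"] += index - rec["record_start"]
        totals["after"]["days"] += rec["record_end"] - index
        for day in rec["event_days"]:
            period = "before" if day < index else "after"
            totals[period]["events"] += 1
    rates = {}
    for period, t in totals.items():
        # No person-time in a period means no rate can be computed for it
        rates[period] = t["events"] / t["days"] * 365.25 if t["days"] > 0 else None
    return rates


# Two invented records; the second has no history before the prescription
records = [
    {"record_start": 0, "record_end": 1000, "index_day": 400,
     "event_days": [50, 120, 300, 700]},
    {"record_start": 100, "record_end": 600, "index_day": 100,
     "event_days": [450]},
]
rates = before_after_rates(records)
print(rates)
```

Dividing by person-time rather than by number of patients is what lets records of different lengths, and records with no pre-prescription history, be pooled without binning by calendar time.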
  • asked a question related to Cox Regression
Question
4 answers
I would appreciate a sanity check of whether I am using Cox PH regression in R correctly to analyse recurrent events. First of all, I am sorry for the formatting of this question; ResearchGate is a terrible platform for posing difficult and complex questions. I have also posted this question on Stack Overflow:
My work has used the instructions proposed in "Modelling recurrent events: a tutorial for analysis in epidemiology." Leila DAF Amorim and Jianwen Cai, International Journal of Epidemiology, 2015, 324-333.
I have implemented the PWP-GT (Prentice, Williams and Peterson gap-time) version of Cox PH regression to determine the risk of headache in a longitudinal cohort of headache sufferers. My time-to-event is the next headache diagnosis within 365 days; otherwise there is no recurrent event, and the patient is finally lost to follow-up.
tstart - start of gap
tstop - end of gap
status - a recurrent event (0), a non-recurrent event (1) which can also mean lost-to-followup
event - the event count per subject
codetype - a covariate to indicate a general medical diagnosis from a doctor (m) from a referral diagnosis by a specialist (r).
id tstart tstop status event gender patientIMD codetype age
20001 0 4101 1 1 2 1 m 68
25001 0 91 0 1 2 1 m 44
25001 91 98 0 2 2 1 m 44
25001 98 159 0 3 2 1 r 44
25001 159 392 0 4 2 1 r 53
25001 392 1509 1 5 2 1 r 55
44001 0 7 0 1 2 1 m 44
44001 7 6041 1 2 2 1 r 61
45001 0 2622 1 1 2 1 m 66
106001 0 3288 1 1 2 1 m 51
119001 0 5737 1 1 2 1 m 56
129001 0 5911 1 1 2 1 m 75
146001 0 2348 1 1 1 1 m 51
159001 0 5897 1 1 2 1 m 45
173001 0 3938 1 1 2 1 m 58
202001 0 3015 1 1 2 2 m 53
207001 0 1383 1 1 2 2 m 24
228001 0 1693 1 1 1 1 m 29
292001 0 144 0 1 2 1 m 35
292001 144 194 0 2 2 1 m 37
292001 194 3173 1 3 2 1 r 52
.... ....
The code I am running is:
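library(survival)
# Counting-process intervals, stratified by event number, clustered on patient id
coxph(Surv(as.numeric(tstart), as.numeric(tstop), as.numeric(status)) ~
        codetype + gender + age + patientIMD +
        cluster(id) + strata(event),
      method = "breslow", data = coxModel)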
My question is quite a simple one, and probably reflects more on my relatively new understanding of recurrent survival analysis.
Notice how some of the patient records have an r entry in the codetype after switching from an m. This is when a patient has gone from a general doctor (m) to a referral/specialist (r). In theory, once a patient starts seeing a specialist, their risk of the disease recurring with the previous frequency should drop. Is this something that would be captured in the Cox regression? Will recurrent Cox regression know how to handle a covariate switching value over time in the same patient?
I also ask as I would like to see how patient disease frequency (risk) changes when a patient switches from drug A to drug B. I've seen this using recurrent survival curves, but only with a single covariate at a time.
Relevant answer
Answer
Following
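On the covariate-switching question: in the counting-process layout each row carries its own covariate values, so `codetype` can differ between rows of the same patient, and the model uses whichever value is attached to the interval at risk. The sketch below takes the rows for patient 25001 from the question and derives the gap length of each interval. The helper name `to_gap_time` is hypothetical. My understanding of the Amorim and Cai tutorial is that the gap-time (PWP-GT) model is fitted on `tstop - tstart`, whereas `Surv(tstart, tstop, status)` corresponds to the total-time variant, so this is worth checking against the paper.

```python
# (id, tstart, tstop, status, event, codetype), copied from the question
rows = [
    (25001, 0, 91, 0, 1, "m"),
    (25001, 91, 98, 0, 2, "m"),
    (25001, 98, 159, 0, 3, "r"),
    (25001, 159, 392, 0, 4, "r"),
    (25001, 392, 1509, 1, 5, "r"),
]


def to_gap_time(rows):
    """Reset the clock at every event: one row per interval, with its own covariates."""
    return [
        {
            "id": pid,
            "gap": tstop - tstart,   # time since the previous event
            "status": status,
            "stratum": event,        # event number, used for stratification
            "codetype": codetype,    # value in force during this interval
        }
        for pid, tstart, tstop, status, event, codetype in rows
    ]


gap_rows = to_gap_time(rows)
for row in gap_rows:
    print(row)
```

One more thing to double-check: `Surv()` in R treats a status of 1 as the event, so with the coding described in the question (0 = recurrent event, 1 = no recurrence) the indicator may need to be flipped.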
  • asked a question related to Cox Regression
Question
4 answers
Hi,
In my research I have exposure (drug A=1, drug B=0), gender (female=1, male=0) and outcome (1,0). I created the product term gender * exposure and fitted a Cox model (gender, exposure, product term). However, I am not sure how to interpret the attached results. The product term has a significant p-value. Does it mean that female sex is not an independent predictor of outcomes and is modified by the use of drug A? How should I describe these results in a paper?
Relevant answer
Answer
Yes, the above answer is good. The important point here is that, because the interaction term between sex and drug is significant, you should keep the main effects in the model. If your interest lies in how the effect of the drug differs by gender, then you should investigate further in which direction the effects go. For large samples, split the data into male and female and re-run the model. This will give you a clearer understanding of what is going on. It may be that the drug is good for one gender and bad for the other. I hope this helps. All the best.
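To see the direction of the effects without splitting the data, the hazard ratios for each subgroup can be combined from the three coefficients. The sketch below assumes the 0/1 coding given in the question (drug A=1, female=1); the coefficient values are invented for illustration and are not the attached results.

```python
import math


def subgroup_hazard_ratios(b_exposure, b_gender, b_interaction):
    """Hazard ratios implied by a Cox model with a gender x exposure product term."""
    return {
        "drug A vs drug B, males": math.exp(b_exposure),
        "drug A vs drug B, females": math.exp(b_exposure + b_interaction),
        "female vs male, on drug B": math.exp(b_gender),
        "female vs male, on drug A": math.exp(b_gender + b_interaction),
    }


# Hypothetical log hazard ratios (the B column of a Cox output)
hrs = subgroup_hazard_ratios(b_exposure=-0.2, b_gender=0.1, b_interaction=0.6)
for label, hr in hrs.items():
    print(f"{label}: HR = {hr:.2f}")
```

With a product term in the model, the coefficient for exposure alone is the drug effect in the reference gender only (males here), which is why the main effects cannot be read in isolation. Confidence intervals for the combined ratios need the covariance between the coefficients, which this sketch does not cover.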
  • asked a question related to Cox Regression
Question
3 answers
I understand how survival methods can be used to determine the probability of survival given a dataset of 'time-to-events' with almost all examples considering cases of Alive/Dead e.g., cancer. However, how can I factor in cases of multiple remission and relapse events per person in a disease that will not take a life?
For example, remission is defined by the absence of medication or disease in a patient record for more than 90 days. A relapse is a return to a similar medical/drug state any time after the 90-day disease/remission cut-off has passed. To be considered as having ongoing treatment, there will be a continuous record of either drug prescription or disease code at intervals of less than 90 days (more than 90 days and we assume that the patient is in remission), e.g., visiting a doctor (or a repeat prescription under the NHS model) at least once every three months.
Using these definitions, I can take an individual's medical history and table the number of days until time-of-event of diseased and remission. A good drug will mean remission was longer than diseased, or at least diseased is kept as short as possible even if there is then only a very short remission time. For example, Bob gets disease X at t=0 (and a drug prescription) and I start counting the number of days until there has been a 90-day absence of either, at which point I start counting the number of days as remission until the same drug or same disease appears and then I start counting again but for a diseased state.
patid days event
1 200 D (diseased)
1 450 R (remit)
1 340 D
1 500 R
2 ... D
2 ... R
I am using R and providing this data into the Cox regression function as though patid 1 (the first patient) is actually 4 people! Similar to how 4 people would be alive/dead in a cancer model.
I have coded all the logic to break down a group of individuals' records into stages of diseased or remission. However, is it correct in a Cox model (in R) to provide this information as is?
Relevant answer
Answer
Unless I am missing something (which is totally possible), observations should be independent, which would be difficult when treating a single patient as though they were four different patients. So you will most likely need to change the way you have entered the information into the Cox model.
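One way to keep the episodes tied to their patient, rather than treating each as a separate person, is to lay them out as consecutive intervals that all carry the patient id. The sketch below uses the four rows for patient 1 from the question; `to_intervals` and the output field names are hypothetical. A model can then use the id to account for within-patient correlation, for example through a cluster term or robust variance.

```python
# (patid, days, state) as in the question's table; D = diseased, R = remission
episodes = [
    (1, 200, "D"),
    (1, 450, "R"),
    (1, 340, "D"),
    (1, 500, "R"),
]


def to_intervals(episodes):
    """Turn per-episode durations into consecutive (tstart, tstop] rows per patient."""
    rows = []
    clock = {}    # running time since t=0 for each patient
    counter = {}  # episode number for each patient
    for patid, days, state in episodes:
        start = clock.get(patid, 0)
        stop = start + days
        counter[patid] = counter.get(patid, 0) + 1
        rows.append({
            "patid": patid,
            "episode": counter[patid],
            "tstart": start,
            "tstop": stop,
            "state": state,
        })
        clock[patid] = stop
    return rows


intervals = to_intervals(episodes)
for row in intervals:
    print(row)
```

The episode number is kept as well because stratified recurrent-event models use it to separate first, second and later episodes.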
  • asked a question related to Cox Regression
Question
12 answers
Cox regression modelling describes the first time to an event, but later events are discarded. I need a model to describe the impact of recurrent events on disease progression. Preferably applicable in SPSS.
Relevant answer
Answer
The answer is yes. You are correct that the Cox model was designed to model the occurrence of a single event. To solve this you can use
1. The Andersen-Gill model, which basically extends the Cox model
2. The Prentice, Williams and Peterson (conditional risk set) model
3. The Wei, Lin, and Weissfeld (marginal risk set) model, and
4. Multi-state models.
I have done a review on this; see one of the papers among my preprints, titled:
Modelling and forecasting recurrent recovery events on consumer loans
  • asked a question related to Cox Regression
Question
3 answers
I want to use some variables (T-stage, M-stage, N-stage, Staging) in a Cox proportional hazards model (Cox regression) to investigate their impact on the survival of colorectal cancer patients.
These variables are categorical and I know they are closely related to one another. In univariate analysis they are significant, but in multivariate analysis they are not significant and the standard errors are large.
Do you know a method that handles this problem? I want to investigate the impact of these variables on survival simultaneously.
Relevant answer
Answer
Mehdi, I just added that to my original answer. Best, David
  • asked a question related to Cox Regression
Question
3 answers
I'm trying to investigate factors which can influence duration of drug use (days). Thanks in advance
Relevant answer
Answer
A good reference is by Frank Harrell. See the link below for the citation:
Prof Popa has given a good but terse summary. The book by Harrell gives more details and examples. I would look there first. Notes from the book are at:
These are very good.
Best, D. Booth
  • asked a question related to Cox Regression
Question
4 answers
Hi,
I am looking at a longitudinal study of two ethnic populations. They were first seen/recruited at time T1. The total number of participants was 30k. Of these, 15k were then followed up at T2. However, this T2 is not the same for all participants: some participants were seen after 5 years, some after 6, some after 7 and so on. My aim is to look at the progression of chronic disease in these two populations and compare the results. I want to conduct survival analysis, but because of the variable follow-up time I don't know how to proceed.
Can you please help me understand how to apply the tools for survival analysis and Cox regression in this case?
Kind regards
Saima
Relevant answer
Answer
You may not need survival analysis if you can score disease severity.
A very simple initial model would be to assume that the chronic disease severity score (with death as the highest score) increases (or decreases) with time.
Then a simple regression (initially assumed linear) of disease score on time and a binary (0/1) variable for ethnicity would estimate the difference in disease score between ethnic groups at a given follow-up time; comparing rates of progression would additionally need an ethnicity x time interaction term.
e.g. Disease score = beta0 + beta1 x ethnicity (1/0) + beta2 x time of follow-up.
  • asked a question related to Cox Regression
Question
3 answers
My study is a time-to-event study. I have determined cutoff points with a logistic regression model in association with renal outcome. Is it correct to run a Cox model using the cutoffs determined by logistic regression? If not, how can I determine cutoffs using a Cox model in Stata or SPSS?
Relevant answer
Answer
It depends on the application. For example, in the medical field we cannot set the cutoff value to 0.05, but in social science we can.