
Survival Analysis - Science topic

A class of statistical procedures for estimating the survival function (a function of time that starts with a population 100% well at a given time and gives the percentage of the population still well at later times). Survival analysis is then used to make inferences about the effects of treatments, prognostic factors, exposures, and other covariates on this function.
Questions related to Survival Analysis
• asked a question related to Survival Analysis
Question
Please take a look at the attached file.
I irradiated cells using a fractionation regime of 3 x 1 Gy after exposure to a substance in different concentrations.
I made an XY table with the determined SFs and plotted a graph using the LQ-model.
The equation I used was Y=exp(-3*(A*3*X + B*3*X^2)). It is an adaptation of the provided equation Y=exp(-1*(A*X + B*X^2)) to account for the fractionation regime.
To determine the AUC, I used the standard analysis tool provided by GraphPad.
Could someone tell me whether this is right or whether I made a mistake somewhere?
Thank you very much in advance!
There are two very different ways to apply an LQ model to fractionated doses. The first assumes that the fractionated curve continues along the single-dose curve. The second assumes full recovery after each fraction, so the initial part of the curve is repeated, starting from the SF reached after the previous dose. The area between these curves is called the "envelope of additivity". See G.G. Steel, or Peckham and Steel, for more on this addition of survival curves for multifractionated doses. Interestingly, ionizing radiation (with shouldered survival curves) tends to repeat the initial portion of the curve (so-called repair of sublethal damage, but actually split-dose recovery), while some cytotoxic agents, such as bleomycin, have their curve continue along the single-dose curve (no split-dose recovery).
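A minimal Python sketch of the two limiting cases described above, assuming the standard LQ form SF = exp(-(αD + βD²)) and n equal fractions of size d. The α and β values are hypothetical and are not fitted to the question's data. Under full recovery the n-fraction SF is the single-fraction SF raised to the power n; with no recovery it equals the SF of one acute dose of n·d.

```python
import math

def sf_single_dose(alpha, beta, dose):
    """LQ surviving fraction after one acute dose D: exp(-(alpha*D + beta*D**2))."""
    return math.exp(-(alpha * dose + beta * dose ** 2))

def sf_full_recovery(alpha, beta, d, n):
    """Full split-dose recovery: every fraction of size d repeats the initial
    (shouldered) part of the curve, so SF = exp(-n * (alpha*d + beta*d**2))."""
    return sf_single_dose(alpha, beta, d) ** n

def sf_no_recovery(alpha, beta, d, n):
    """No recovery: the fractionated curve continues along the single-dose
    curve, i.e. the SF equals that of one acute dose of n*d."""
    return sf_single_dose(alpha, beta, n * d)

# Hypothetical LQ parameters, not taken from the question's data
alpha, beta = 0.3, 0.03  # Gy^-1, Gy^-2
for n in range(1, 4):
    upper = sf_full_recovery(alpha, beta, 1.0, n)
    lower = sf_no_recovery(alpha, beta, 1.0, n)
    print(f"{n} x 1 Gy: full recovery SF = {upper:.3f}, no recovery SF = {lower:.3f}")
```

The exponents differ by βd²(n² - n), so with β > 0 the full-recovery curve lies above the no-recovery curve for n ≥ 2, and the band between them is the envelope of additivity; with β = 0 the two coincide.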
• asked a question related to Survival Analysis
Question
Hi, I am currently conducting a survival study to investigate the role of several potential biomarkers as prognostic factors in a certain cancer. First, I performed Kaplan-Meier analyses for all the biomarkers and other relevant clinicopathologic data. However, only one biomarker fulfilled the proportional hazards criterion based on the Kaplan-Meier curve. The other biomarkers and clinicopathologic variables do not fulfill it.
I am wondering: do I still need to proceed to Cox regression analysis? Can I include the other biomarkers and relevant clinicopathologic data in the Cox regression, even though they do not fulfill the proportional hazards criterion in the Kaplan-Meier analysis? Thank you.
Your question does not make sense. Run the model that you wish to run, then look at the Schoenfeld residuals for lack of pattern. See the attached screenshot reference for full details. Best wishes, David Booth
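To make "look at the Schoenfeld residuals" concrete, here is a stdlib-only Python sketch for a single covariate, assuming the Cox coefficient has already been estimated (function names and toy data are hypothetical). R's cox.zph additionally scales the residuals, transforms time (Kaplan-Meier scale by default) and runs a formal test; this only shows the underlying idea: at each event time, the residual is the covariate of the subject who failed minus the risk-weighted mean covariate of the risk set, and a trend against time suggests non-proportional hazards.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Unscaled Schoenfeld residuals for one covariate x, given a Cox coefficient
    beta estimated beforehand. For each event at time t_i:
        r_i = x_i - sum_j w_j x_j / sum_j w_j,  w_j = exp(beta * x_j),
    where j runs over the risk set {j : t_j >= t_i}."""
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations have no Schoenfeld residual
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        x_bar = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        residuals.append((t_i, x[i] - x_bar))
    return residuals

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def time_trend(residuals):
    """Correlation of residuals with event time; a clear trend hints that the
    effect of x changes over time (non-proportional hazards)."""
    ts, rs = zip(*residuals)
    return pearson(ts, rs)

# Hypothetical toy data: times, event indicators, binary covariate
res = schoenfeld_residuals([1, 2, 3, 4], [1, 1, 1, 0], [0, 1, 0, 1], beta=0.0)
print(res, time_trend(res))
```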
• asked a question related to Survival Analysis
Question
Hi there,
I am doing survival analysis, and I know that some of the variables with a significant impact on survival in univariate analysis are closely related to each other, as they have been calculated from one another. Can I include them all in a Cox PH model to see which of the variables is/are an independent risk factor?
It sounds like you think some of your predictors may be confounded. Including multiple predictors is a way of adjusting for confounding between a predictor and the outcome. However, when the predictors themselves are strongly related to each other, including multiple related predictors may not necessarily highlight THE most important ones.
As a simple example: what happens if you include the same predictor twice (perhaps with a little noise added)? Does each copy get the same hazard ratio as when you include it only once?
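One way to see the problem before fitting anything is the variance inflation factor (VIF), a collinearity diagnostic that depends only on the predictors. This Python sketch, with simulated data and hypothetical names, builds a noisy copy of a predictor and shows that the two are almost perfectly correlated, so their separate coefficients would be estimated with greatly inflated variance. VIF is defined for linear regression; treating it as a guide for a Cox model is an approximation assumed here.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def vif_pair(x1, x2):
    """VIF when only two predictors are in the model: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
x_noisy_copy = [v + random.gauss(0.0, 0.1) for v in x]  # same predictor plus a little noise
print(round(pearson(x, x_noisy_copy), 3), round(vif_pair(x, x_noisy_copy), 1))
```

Values above roughly 5 to 10 are often treated as a warning sign; uncorrelated predictors give a VIF of 1.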
• asked a question related to Survival Analysis
Question
Hi,
I developed a probability-of-default model using a Cox PH approach. I use the survival package in R. I have panel data with time-varying covariates. I made predictions with the following code: predict(fit, newdata, type='survival'). However, the predicted survival probability is not decreasing over time for each individual (see picture).
I wonder whether the prediction is a marginal survival probability or a cumulative survival probability.
If it is a cumulative survival probability, why is it not decreasing over time?
Thanks
David Eugene Booth, I did not know the answer when I asked the question. I then did some research and found it. I wrote the answer in the comment section so that it can help anyone else who is interested in this issue.
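One common explanation, stated here as an assumption: with counting-process (start, stop] data, the survival predicted for each row is conditional on being event-free at the start of that row's interval, so it need not decrease from row to row. Under that assumption, each individual's cumulative survival is the running product of the per-interval values, as in this Python sketch with hypothetical data.

```python
from itertools import groupby

def cumulative_survival(rows):
    """rows: (subject_id, start, stop, interval_survival), where interval_survival is
    assumed to be P(no event in (start, stop] | event-free at start).
    Returns (subject_id, stop, cumulative survival to stop), the running product."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    out = []
    for subject, group in groupby(ordered, key=lambda r: r[0]):
        s = 1.0
        for _, _, stop, s_interval in group:
            s *= s_interval  # multiply conditional probabilities over intervals
            out.append((subject, stop, s))
    return out

# Hypothetical per-interval predictions for two subjects
preds = [("a", 0, 4, 0.90), ("a", 4, 8, 0.95), ("b", 0, 4, 0.80), ("b", 4, 8, 0.99)]
for row in cumulative_survival(preds):
    print(row)
```

In R, with rows sorted by time within subject, the same running product can be computed with something like ave(p, id, FUN = cumprod), where p and id are hypothetical column names.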
• asked a question related to Survival Analysis
Question
In my study, I am using propensity score matching to balance covariates when estimating the effect of prednisolone on death among COVID-19 patients. 92 covariates have been considered, including demographic factors, signs and symptoms, other drugs, laboratory tests, and vital signs. A problem arises when I remove one of the variables (ALT): the final results change significantly.
How can I decide whether I should remove that variable?
I'm not sure I understood your question, but this article may be of assistance to you: "Adelson JL, McCoach DB, Rogers HJ, Adelson JA, Sauer TM. Developing and applying the propensity score to make causal inferences: variable selection and stratification. Frontiers in psychology. 2017 Aug 17;8:1413."
• asked a question related to Survival Analysis
Question
Hi,
I have been performing survival analysis using Cox regression models, and I encountered a situation where, after adding a time-varying effect for a variable X (X*time; the variable violated the PH assumption), the added interaction with time was significant in the model but the main effect of X was not, as illustrated below:
Model without interaction with time:
coef exp(coef) se(coef) z Pr(>|z|)
factor(X)1 0.4633 1.5894 0.1625 2.852 0.004 **
Model with interaction between X and time:
coef exp(coef) se(coef) z Pr(>|z|)
factor(X)1 -0.3978 0.6718 0.4444 -0.895 0.371
tt(factor(X)) 0.6230 1.8645 0.2816 2.212 0.027 *
In the study we are interested in the effect of the X variable on the survival outcome, and after inclusion of the time-varying effect X*time, I am no longer sure about the value of the variable X in describing the risk of the outcome, as the main effect is now not significant.
Is the significance of time-varying effect of variable X enough to assume that the variable X is significant for the outcome risk, even though the main effect is no longer significant in such scenario?
Or, do both of them, the main effect of X and the time-varying effect of X have to be significant in the model to be able to say that X is significant for the outcome?
Any help in interpreting these is very welcome.
Thanks David,
I ran a Kaplan-Meier analysis: the two curves cross early in the study, and only after that do we see quite good separation. If I understand correctly, it is perhaps not surprising that the variable has a time-varying effect (increasing with time).
I also inspected the proportional hazards (PH) assumption for this variable using the cox.zph function in the R survival package: the red horizontal line represents the averaged coefficient (beta) as in the Cox model (which assumes PH), while the thin black line is the "real" beta for the variable over time. The PH test for variable X did not indicate a significant departure from PH (p = 0.079; PH is considered violated when p < 0.05), but the added time-varying effect is significant, which is reflected in the Kaplan-Meier curves and the plotted coefficient over time.
Regarding the interpretation of the significance of the terms in the model, I received some advice from a biostatistician: in a model where a time-varying effect is included as an interaction term (as opposed to splitting time and calculating HRs for time intervals), the main effect represents the (log) HR when the interaction term equals 0. For a simple X*time interaction, the interaction equals 0 when time = 0; this differs when time is transformed, e.g. with log(time), where the interaction equals 0 at time = 1. Bottom line: even if the main effect isn't significant in the model, the (time-varying) effect of the variable is still of interest.
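Following the biostatistician's reading, the HR for X = 1 versus X = 0 is not a single number but a function of time: log HR(t) = b_main + b_tt·g(t), where g(t) is whatever transform the tt() function applied (for example t or log(t); the question does not say which). This Python sketch plugs in the coefficients reported above and finds where the HR crosses 1; the function name is hypothetical.

```python
import math

# Coefficients copied from the model output in the question
b_main = -0.3978  # factor(X)1
b_tt = 0.6230     # tt(factor(X))

def hazard_ratio(g_t):
    """HR of X=1 vs X=0 at a time where the tt() transform equals g_t:
    log HR(t) = b_main + b_tt * g(t)."""
    return math.exp(b_main + b_tt * g_t)

# Value of g(t) at which the HR crosses 1 (log HR = 0)
g_cross = -b_main / b_tt

for g in (0.0, g_cross, 1.0, 2.0):
    print(f"g(t) = {g:.3f} -> HR = {hazard_ratio(g):.3f}")
```

At g(t) = 0 the HR is exp(-0.3978), about 0.672, matching the exp(coef) reported for factor(X)1. If, hypothetically, tt() used log(t), the crossing would be at t = exp(0.6385), about 1.89 time units. A confidence interval for HR(t) also needs the covariance of the two coefficients (for example from vcov(fit)): Var = Var(b_main) + g(t)²·Var(b_tt) + 2·g(t)·Cov(b_main, b_tt).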
• asked a question related to Survival Analysis
Question
Hi there, I am using the survival package in RStudio with the following code to run a Kaplan-Meier analysis:
surv_object <- Surv(time = cabos$sg, event = cabos$ï..status)
survfit(Surv(time = cabos$sg, event = cabos$ï..status) ~ 1, data = cabos)
fit1 <- survfit(surv_object ~loc , data = cabos) #p=0.97
ggsurvplot(fit1, data = cabos, pval = TRUE, risk.table = TRUE, risk.table.col = "strata",risk.table.y.text.col = T,
risk.table.y.text = F, xlab= "Months", ylab= "Overall survival",legend.labs =
c("C500", "C501", "C502", "C503","C504", "C505",
"C508", "C509"), title = "Location")
This resulted in a Kaplan-Meier Curve with a p value of 0.97.
Yesterday, I added some labels so I could translate the plot into English, but the p-value changed to 0.87. No changes were made to the code or the dataset. This concerned me, so I ran the analysis in SPSS and obtained the same result as my previous KM curve (p=0.97).
I have tried running the analysis without the translation, without converting my variables to factors, with my variables converted to factors, and with new code written from scratch, but I can't get the previous value.
What could be the problem with RStudio?
PS: Don't you still have everything that you started with?
• asked a question related to Survival Analysis
Question
I recently tried to plot a Kaplan-Meier curve in SPSS. Although I followed all the standard steps, the software seems to have a problem plotting the curve and doesn't plot it at all (as shown in the picture). Does anybody know how I can solve this issue?
I tried it on a different PC and it worked fine. It seems to be a problem with my system.
• asked a question related to Survival Analysis
Question
I'm currently working on a survival analysis of gendered effects in executive exit. The aim is to investigate whether female leaders have a higher chance of turnover than their male counterparts. I've run all of the basic analyses, but am now stuck on an issue of interactions in Cox models.
The issue: I'm trying to find out whether any (or all) of the control variables in my Cox model have different effects by gender. For example, in my original models executive age is a control variable, but maybe the hazard of leaving is more strongly related to age for women than for men. To test this, I wanted to run ALL of the control variables with a gender interaction term. My questions:
1. Should I do this within the same model (e.g. fit1 <- coxph(Surv(Time, Censoring) ~ age:gender + nationality:gender + ....)) or in separate models for each interaction? What makes more sense here?
2. In both cases, the results look something like the attached picture (the variable American measures whether a person is American (1) or not (0)).
Table shows coef, exp(coef)=HR, se(coef), z, Pr(>|z|)
How should I interpret this?
The model should include the simple (main) effects; otherwise, the interaction coefficient does not represent the difference in effects between the genders. So the model should look like this:
full = coxph(Surv(Time, Censoring) ~ (age+nationality+...)*gender)
Then you would test each interaction term via an analysis of deviance (a likelihood-ratio test) comparing the Cox models with and without that interaction term:
restricted = update(full, .~. - age:gender)
anova(restricted, full)
To check if there is a global interaction:
restricted = coxph(Surv(Time, Censoring) ~ age+nationality+...)
anova(restricted, full)
Note that it is important that the relationships between all the predictors and the log hazard rate are linear. If this is not the case, the interaction terms may capture non-linearities instead, which leads to misinterpretation.
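The analysis of deviance that anova() performs for two nested Cox models is a likelihood-ratio test: twice the difference in log partial likelihoods, referred to a chi-square distribution with as many degrees of freedom as terms dropped. A stdlib-only Python sketch with hypothetical log-likelihood values (in R, a coxph fit stores them in fit$loglik, whose second element is the final value); the function names are illustrative.

```python
import math

def chi2_sf(x, df):
    """P(Chi-square with df degrees of freedom > x), for a positive integer df.
    Uses the closed forms available for integer df (no SciPy needed)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total

def lr_test(loglik_restricted, loglik_full, df):
    """Likelihood-ratio (analysis of deviance) test for nested models."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, df)

# Hypothetical final log partial likelihoods of the restricted and full models
stat, p = lr_test(-512.3, -509.8, df=1)
print(f"LR chi-square = {stat:.2f}, p = {p:.4f}")
```

For a single interaction term df = 1; for the global test, df equals the number of interaction coefficients removed.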
• asked a question related to Survival Analysis
Question
Hello Researchers,
I apologize if my question is a bit too simple.
I'm currently working on a study of patients with myelodysplastic syndromes (MDS). Most patients were followed up for a good period of time (the median was about 3 years), but since MDS isn't that deadly a disease, most patients (thankfully) lived, so few events occurred, which led to a lot of censoring.
In this case, is it suitable to do a survival analysis?
Some patients were also followed up for many years (10+), but they aren't the majority. In this case, could I cut the data and interpret it at 3 years, for example (e.g., "the survival at 3 years was 70%")?
Or how else could I deal with the censoring? (About 30% of patients died in the study.)
Hi,
Every event that occurs lowers the estimated survival curve. The event is defined by the investigator: a patient improving and leaving treatment can be an event, as can worsening or death. We can also compare different survival curves.
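Heavy censoring by itself does not rule out a survival analysis; the Kaplan-Meier (product-limit) estimator is built for it, because censored patients stay in the risk set until their censoring time and then leave without lowering the curve. A 3-year figure such as "70% survival at 3 years" is then just S(36 months) read off the curve. A stdlib-only Python sketch with hypothetical times in months (function names are illustrative):

```python
def kaplan_meier(times, events):
    """Product-limit estimate. Returns (event_time, S(t)) steps.
    Censored subjects (event = 0) count in the risk set up to and including their
    censoring time, then leave it without lowering the curve."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s, steps, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1.0 - deaths / n_at_risk
            steps.append((t, s))
        n_at_risk -= removed
    return steps

def survival_at(steps, t):
    """Read S(t) off the step function (1.0 before the first event)."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s

# Hypothetical follow-up in months (1 = death, 0 = censored)
times = [5, 10, 12, 20, 30, 36, 40, 50]
events = [1, 0, 1, 0, 1, 0, 1, 0]
steps = kaplan_meier(times, events)
print(steps, survival_at(steps, 36))
```

In R, summary(survfit(...), times = 36) reports the same quantity with a confidence interval. Estimates become unstable where few patients remain at risk, which is one reason to report survival at a time point where many patients are still under follow-up.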
• asked a question related to Survival Analysis
Question
The main independent variable is A, but I want to adjust the model for a covariate B. When I check the PH assumption, A satisfies it but B does not. Do I need to run a proportional odds or an AFT model for that?
Here are some notes you may find useful:
The output was generated using Stata, in case you're wondering.
• asked a question related to Survival Analysis
Question
Hi, I am a beginner in the field of cancer genomics. I am reading gene expression profiling papers in which researchers classify cancer samples into two groups based on the expression of a group of genes (for example, a "High group" and a "Low group") and perform survival analysis; they then associate these groups with other molecular and clinical parameters, for example serum B2M levels, serum creatinine levels, 17p deletion, or trisomy 3. Some researchers classify the cancer samples into 10 groups. If I propose a cancer classification scheme and present a survival model based on 2 or 10 groups, how should I assess the predictive power of my proposed classification model, and how do I compare its predictive power with that of other survival models? Thank you in advance.
The survAUC R package provides a number of ways to compare models; see https://stats.stackexchange.com/questions/181634/how-to-compare-predictive-power-of-survival-models
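One widely used measure of discrimination is Harrell's concordance index (C-index). This Python sketch is a plain version for right-censored data (function name and toy data hypothetical). For a 2-group or 10-group classification, the group's rank or fitted risk can serve as the risk score; pairs in the same group are tied and count 1/2, so coarser schemes are pulled toward 0.5. In R, survival::concordance() computes this for fitted models.

```python
def harrell_c(times, events, risk):
    """Harrell's C for right-censored data. A pair (i, j) is usable when subject i
    had the event and t_i < t_j; it is concordant when risk[i] > risk[j].
    Ties in predicted risk count 1/2; pairs with tied times are skipped here."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

# Hypothetical data: risk could be a group index (e.g. 1 = Low, 2 = High) or a fitted risk score
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
risk = [2, 3, 1, 2]
print(harrell_c(times, events, risk))
```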
• asked a question related to Survival Analysis
Question
Good morning,
I am doing a survival analysis to predict the risk of a specific complication after surgery. We did a KM analysis, and the risk was clearly higher immediately after surgery and then flattened after 120-160 days. I tried a parametric survival analysis with a lognormal distribution, treating it as an accelerated failure time (AFT) model. So my questions:
1. Is it appropriate to use the parametric model in this situation?
2. How do I get the hazard ratio from the parametric survival analysis? I was able to get estimates, but no clear HR.
3. How do I interpret the hazard vs. time plot? Its shape clearly shows that the hazard decreases over time, but the hazard values on the Y-axis range from 0.00015 to 0, and I don't know how to interpret them.
4. Can I identify a cut-off time point where the hazard changes significantly?
Thank you very much,
Amro
It sounds like your data show a piecewise hazard pattern; I would suggest running an analysis that accommodates this feature.
A very good reference is the book Survival Analysis: A Self-Learning Text.
You can check the example in Section 6 (pages 593 to 597) on fitting a piecewise HR Cox model; you can also check the Stanford Heart Transplant data on pages 265-269. The transplant may play a role similar to that of surgery in your data.
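As a first look at whether the hazard changes around 120-160 days, crude piecewise-constant hazards (events divided by person-time in each interval) can be computed directly; the piecewise Cox model in the book adds covariates on top of this idea. A Python sketch with a hypothetical cut at 120 days and made-up data. It also shows why plotted hazard values look so small: a hazard is a rate per unit of time, here per person-day.

```python
def piecewise_hazard(times, events, cuts):
    """Crude piecewise-constant hazard: events / person-time in each interval
    (cuts[k], cuts[k+1]], with an open-ended last interval.
    cuts must start at 0 and be increasing; times are in the study's unit (e.g. days)."""
    bounds = list(cuts) + [float("inf")]
    table = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        person_time, n_events = 0.0, 0
        for t, d in zip(times, events):
            if t <= lo:
                continue  # follow-up ended before this interval
            person_time += min(t, hi) - lo
            if d and t <= hi:
                n_events += 1
        rate = n_events / person_time if person_time > 0 else float("nan")
        table.append((lo, hi, n_events, person_time, rate))
    return table

# Hypothetical follow-up times (days) and complication indicators
times = [30, 60, 100, 150, 200, 300]
events = [1, 1, 0, 1, 0, 0]
for lo, hi, d, pt, rate in piecewise_hazard(times, events, [0, 120]):
    print(f"({lo}, {hi}]: {d} events / {pt:.0f} person-days = {rate:.5f} per day")
```

On question 2: a lognormal AFT model is not a proportional hazards model, so its exp(coef) values are time ratios (acceleration factors) rather than HRs.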
• asked a question related to Survival Analysis
Question
Hello everyone,
Hope you all are doing well.
I am trying my hand at the Fine-Gray model to calculate subdistribution hazard ratios for cardiovascular death using the NHANES dataset in R. It was not that difficult to calculate the cumulative incidence function (CIF), but since NHANES is a nationally representative sample, I am having issues accounting for the survey weights and strata.
I tried the survival and 'cmprsk' packages, but neither has a provision to incorporate weights and strata in the regression model.
Any suggestions would be appreciated.
Thank you
Fine-Gray model accounting for clusters: apologies, this is the one I meant to send earlier. Best wishes, David Booth
• asked a question related to Survival Analysis
Question
Hi everyone,
This is my first attempt at running a survival analysis. I have data on patients with end-stage kidney disease (ESKD). I would like to estimate survival probability by treatment method (e.g., conservative therapy vs dialysis), duration of diagnosis, etc.
Thank you
@Dorcas, before you start your analysis, always make sure the KM assumptions are met in your data (in particular, that censoring is non-informative, i.e. unrelated to the risk of the event).
Best
MB
• asked a question related to Survival Analysis
Question
Let us discuss the uses of survival analysis, particularly in medicine and the health sciences.
To what extent is it helpful?
How can it be applied using SPSS?
Let's see this video first:
• asked a question related to Survival Analysis
Question
I'm using a Cox proportional hazards regression to analyze latency data, i.e. the time until behavior x occurs. The model is fitted with the "coxme" package in R (which is a wrapper around the "survival" package, if I'm not mistaken) because I wanted to fit a mixed model for the survival analysis. The model converges without problems; however, when I try to test the assumptions of the model, I get a fatal error and my R session crashes. Specifically, when I try to test the proportional hazards assumption using "cox.zph", the R session crashes, and I don't know why, because the function is supposed to work with both a mixed model (coxme) and a standard, non-mixed model (a "coxph" object in the package terminology). I've tried the non-mixed version of my model and it provides the desired output, but it won't work for my intended model. I've also tried updating RStudio to the latest version, but it didn't help. Finally, I've tried manually increasing the memory available to RStudio, in case the function was memory-demanding, but that didn't help either. Searching different forums, both with general queries like "causes for R session crash" and more specific ones like "cox.zph cause R session crash", has provided no answers.
Has anyone experienced this error? Were you able to solve it, and if so, how?
I appreciate any advice I can get on this issue.
I am not familiar with that package, but you may wish to look at a similar process in our attached paper. Best wishes, David Booth
• asked a question related to Survival Analysis
Question
I have encountered a problem that I would like to ask for help with: my goal is to fit a competing risks (CR) model to survival data that contain multiple records per patient. I tried a number of R packages and functions that can fit a mixed-effects CR model with a cluster(ID) term (e.g., patient ID) added.
So far, the R functions I tried and the results are: (1) FGR() worked fine for a fixed-effects-only CR model but cannot fit a mixed-effects CR model; (2) CSC() can fit a mixed-effects model but cannot predict risk probabilities using predict(); (3) comp.risk() worked fine for fitting and predicting a mixed-effects CR model, but it doesn't allow C-index calculation using cindex() or concordance.index().
Now the question is: how can I calculate the C-index for my validation set from the results of comp.risk() and predict()? I have a fitted model and predicted risks at certain times (for instance, 3, 5 and 10 years). Do I need predicted risks at all possible times to calculate the C-index? Is there a better way or a simpler solution?
Thank you very much for all your help.
• asked a question related to Survival Analysis
Question
Dear ResearchGate members, would you please let me know the specific assumptions and conditions for using these methods?
Kaplan–Meier provides a method for estimating the survival curve, the log rank test provides a statistical comparison of two groups, and Cox's proportional hazards model allows additional covariates to be included. Both of the latter two methods assume that the hazard ratio comparing two groups is constant over time. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065034/
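A stdlib-only Python sketch of the two-group log-rank test described above (function name and toy data hypothetical): at each distinct event time, the observed events in group 1 are compared with the number expected if both groups shared the same hazard, and the squared sum of these differences over its variance is referred to a chi-square distribution with 1 df. For two groups this should agree with survival::survdiff() in R.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-sample log-rank test (groups coded 0/1), returning (chi2, p).
    At each distinct event time, observed events in group 1 are compared with the
    number expected if both groups shared one hazard (hypergeometric variance)."""
    o_minus_e, var = 0.0, 0.0
    event_times = sorted({t for t, d in zip(times, events) if d})
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, dd in zip(times, events) if tt == t and dd)
        d1 = sum(1 for tt, dd, g in zip(times, events, groups) if tt == t and dd and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return chi2, p

# Hypothetical toy data
print(logrank_two_groups([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0]))
```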
• asked a question related to Survival Analysis
Question
Lifetime data with a bathtub-shaped hazard rate function are often encountered in survival analysis, and the most commonly used classical lifetime distributions cannot adequately describe such data. Is there any parametric survival model suitable for modelling this type of data?
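One commonly cited option, offered here as a suggestion rather than as an answer from the thread, is the additive Weibull model: the hazard is the sum of a decreasing Weibull hazard (shape < 1, early failures) and an increasing one (shape > 1, wear-out), which gives a bathtub shape. A Python sketch with hypothetical parameters and illustrative function names:

```python
import math

def additive_weibull_hazard(t, shape1, scale1, shape2, scale2):
    """h(t) = (k1/l1)(t/l1)^(k1-1) + (k2/l2)(t/l2)^(k2-1).
    With shape1 < 1 (early failures) and shape2 > 1 (wear-out) the sum is bathtub-shaped."""
    h1 = (shape1 / scale1) * (t / scale1) ** (shape1 - 1)
    h2 = (shape2 / scale2) * (t / scale2) ** (shape2 - 1)
    return h1 + h2

def additive_weibull_survival(t, shape1, scale1, shape2, scale2):
    """S(t) = exp(-(t/l1)^k1 - (t/l2)^k2), the product of the two Weibull survivals."""
    return math.exp(-(t / scale1) ** shape1 - (t / scale2) ** shape2)

params = (0.5, 10.0, 3.0, 5.0)  # hypothetical: k1 = 0.5, l1 = 10, k2 = 3, l2 = 5
for t in (0.1, 1.0, 2.0, 4.0, 6.0):
    print(t, round(additive_weibull_hazard(t, *params), 4))
```

Fitting would maximise the censored log-likelihood Σ[δᵢ log h(tᵢ) + log S(tᵢ)] over the four parameters; other bathtub-capable families, such as the exponentiated Weibull, exist as well.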
• asked a question related to Survival Analysis
Question
Hi!
I'm trying to develop a prediction model for a type of cancer. My endpoint is cancer-specific survival. I have around 180 events, and an events-per-parameter calculation suggested that I can consider about 9-10 parameters for the model.
Now, I have dozens of candidate variables, many of which are collinear, while others are more truly independent of each other. Some variables are well-known risk factors for cancer death; others are more novel but still merit more research.
In my field there is a well-established risk classification system that uses histological grade, stage, lymph node status and tumor size. It classifies cases into 4 risk categories with increasing risk of death. These four variables have not previously been included in a survival model, and there is no published survival regression formula with beta coefficients and an intercept for a model including these variables. Instead, the four risk categories are based mostly on clinical reasoning, expert opinion, experience, and the survival rates of the four groups.
My question is whether, when developing my model, I should include these four variables as separate independent variables and add another 4-5 candidate variables that I want to investigate, or whether I can (or should) include them as a single composite four-tier categorical variable, thus saving degrees of freedom to include more candidate variables. What are the pros and cons of each approach?
Hi, take a look at this method; it works really well for your application. The program is available from my co-author Ozgur. Best wishes, David Booth
• asked a question related to Survival Analysis
Question
Hi
At the moment, I am wondering which survival analysis to use for my data. My data consist of one control group and three exposure groups. One of the exposure groups is known to differ in toxicity over time (in my case, taken into account by creating a gradient over distance). It is therefore necessary to divide the groups into three stages, giving 12 survival curves (with eight individuals per stage) per experiment. The data are only right-censored, but the amount of censoring varies a lot between groups: the control and one of the exposure groups have almost no events (three of the six stages in these two groups have only censored data, and the remaining stages have only one or two deaths). The survival curves of two of the exposure groups cross.
Also, many of the survival curves within the different exposure groups (two of the exposure groups and the control) cross, as they do not differ in toxicity over time.
What would be a suitable survival analysis for such an experiment? Moreover, how problematic is the crossing of the curves if it is expected?
Sorry for the typo "h".
• asked a question related to Survival Analysis
Question
I'm trying to work on a meta-analysis of survival data (PFS, OS).
I'm aware that the most common summary measure for the analysis of time-to-event data is the hazard ratio,
but some articles provide only median survival with a chi-square value (for example: χ² = 4.701, P = 0.030, as the result of a log-rank test).
Would there be some way to integrate this into meta-analysis of HR?
When doing a meta-analysis using survival data, you want to establish the hazard ratio, but you also want to give context for how that effect in the model translates into the survival of the examined patients via the risk scores that most survival models assign to each patient.
I would advise reporting the HR for the models you are analysing, with a 95% confidence interval for each HR, and then including the p-values of the log-rank tests of the Kaplan-Meier curves. This gives the reader an idea of the effect of the features on the probability of having an event and of how that effect translates to the whole cohort.
I found a paper in PLOS One in which the authors conduct a meta-analysis of the reporting styles of different Phase 3 medical trials with different endpoints and methods. Maybe it will prove useful to you:
Best regards, and I hope this helps!
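On the original question of using a reported log-rank χ² in an HR meta-analysis: an approach described by Parmar et al. (1998) and Tierney et al. (2007) uses χ² = (O - E)²/V and ln(HR) ≈ (O - E)/V, with V approximated from the total number of events D and the allocation ratio (V ≈ D/4 for 1:1). The sign of ln(HR) has to be taken from the reported medians or curves. A Python sketch; the 100 events below are hypothetical, and only χ² = 4.701 comes from the question.

```python
import math

def log_hr_from_logrank_chi2(chi2, total_events, ratio=(1, 1), direction=1):
    """Approximate ln(HR) and its SE from a reported log-rank chi-square.
    chi2 = (O - E)^2 / V and ln(HR) ~ (O - E) / V give |ln(HR)| = sqrt(chi2 / V);
    V is approximated as D * r1 * r2 / (r1 + r2)^2 for D events and allocation r1:r2.
    direction (+1 or -1) must come from the reported medians or curves."""
    r1, r2 = ratio
    v = total_events * r1 * r2 / (r1 + r2) ** 2
    log_hr = direction * math.sqrt(chi2 / v)
    se = 1.0 / math.sqrt(v)
    return log_hr, se

# chi2 = 4.701 is from the question; 100 total events is a hypothetical value
log_hr, se = log_hr_from_logrank_chi2(4.701, 100)
hr = math.exp(log_hr)
low, high = math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se)
print(f"HR = {hr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With D = 100 and 1:1 allocation, V = 25, ln(HR) is about ±0.434 and its SE is 0.2, so the HR is about 1.54 (or its reciprocal). The resulting ln(HR) and SE can then be pooled alongside directly reported HRs.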
• asked a question related to Survival Analysis
Question
Hello,
I built a Cox proportional hazards model with time-dependent covariates. When I predict the survival probability, it is not monotonically decreasing for each ID. What is the problem, and how can I handle it?
thanks
As Professor David Eugene Booth said, use the plots and check the assumptions of the Cox model against your data.
• asked a question related to Survival Analysis
Question
Hello,
I have a methodological question regarding time-to-event analysis (TTE).
I want to compare several medication therapies (4 groups with different drugs) in a TTE analysis in a cohort study. My colleague wants me to include only those subjects who experienced the event/endpoint. The group assignment should be determined at the time of the event; that is, if a subject experiences the event, I have to look up their current medication and use the start date of that medication to define the person-time until the event. The aim is to compare the time to event among these medication groups; it is not a drug efficacy study.
To be honest, I'm not sure whether the approach described above is methodologically correct. In my opinion there will be no censoring, as only patients with events would be included and then simply compared by their time to event under a given medication.
I cannot find a lot of information in the literature. All resources I reviewed only addressed the issue of the censoring type but not how to deal with the approach above.
I would be very grateful, if somebody could give advice to this issue.
Thanks a lot,
Florian Beese
You're right! I was confused too, and I was able to convince my colleague that it would indeed be an efficacy comparison. In the meantime, we have decided to handle it another way.
Thank you very much for your response, anyway!
• asked a question related to Survival Analysis
Question
Hi,
During the analysis, I get the following error when drawing the survival curve in R. Has anyone had a similar problem, or can you suggest a solution?
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 232, 0, 464
Package: survminer. Function: ggsurvplot
I faced this error when I forgot to convert the "survival status" variable to numeric. It happens if the "survival status" variable is supplied as categorical instead of numeric. Thanks for the discussion above.
• asked a question related to Survival Analysis
Question
Event History Analysis
It is possible. This paper covers an extension of your question:
• asked a question related to Survival Analysis
Question
I'm conducting a meta-analysis for my dissertation and am having issues analysing my data. It uses median overall survival (months) and does not have usable controls. I'm counteracting that by using a second treatment method for comparison. I've noticed that some studies use historical controls and derive hazard ratios from them. Is it possible to treat the secondary treatment as a historical control and derive hazard ratios across studies?
Otherwise, single-arm survival studies are awful to analyse (oncology is a pain).
Mathematically, nothing should stop you from doing so. Methodologically and clinically, however, I think that such a move can bring about more questions than answers in your analyses.
First and foremost, you have to justify convincingly that your uncontrolled sample and the historical comparator you are planning to benchmark it against are sufficiently similar in clinical context. However, that may not even be enough in light of unknown or unconsidered confounding.
If it is an option, perhaps a systematic review without meta-analysis, rather than with meta-analysis, may be a safer bet to synthesize the evidence.
• asked a question related to Survival Analysis
Question
At first please see the attachment.
This experiment was run for 7 days (n=100). After 7 days, we found a different survival rate (%) for each treatment. Now I want to visualize this result with a Cox proportional hazards analysis or a Kaplan-Meier survival analysis in SPSS, reporting the cumulative hazard (OR with a 95% CI) and the time period assessed for survival as part of the analysis.
How can I analyze this result?
Thank you.
See attached for an example of a comparison. Details in the second. Best wishes, David Booth
• asked a question related to Survival Analysis
Question
I am currently faced with the challenge of measuring the integration speed of an acquisition or merger using secondary data. Companies usually do not communicate once a target has been integrated successfully. They only announce, for instance, that an M&A deal has been signed (this would be the start date and is easy to find online). However, how can I find out when the company was fully integrated? I am focusing on Dutch M&A deals, and Dutch companies seem to communicate little beyond the signing of a deal. Sometimes it is possible to find out when a target ceased to exist, but this too is quite difficult. Most papers I saw used surveys, but given my time constraints it is too risky to send surveys to Dutch corporate M&A teams because of the likely low response rate. I also saw one paper that used survival analysis; however, I could not work out from that paper how they calculated integration speed for each deal. If anyone has an idea or knows a paper that has done this with secondary data, I would appreciate any help.
There is one route to tackle this problem: take all public data on mergers with known internal data (stock market, profit, ...) and apply complexity measures.
Do the same for companies without known internal data.
You then try to decode the internal development from external data alone, using complexity measures and statistics, by comparing the two classes of companies described above. Alternatively, you can use AI/ML as an extension of statistics when statistics fails.
• asked a question related to Survival Analysis
Question
I am conducting a meta-analysis for which I only have survival plots. I am stuck on performing the meta-analysis because the studies have no control group. How should I perform a pooled survival analysis? Are there any studies that I could use as a reference?
Whenever you have a common comparator, there are methods to derive a valid group estimate by indirect comparison. You may find useful guidance here:
• asked a question related to Survival Analysis
Question
I have a sequentially measured, time-series Electronic Health Record (EHR) patient dataset. The study covers 7 years; each patient's blood pressure, HbA1c, LDL, Hb, baseline age, ... were measured (as independent variables for modeling) every 4 months, and at the same time each patient was classified as healthy/not healthy (output/dependent variable). Before modeling, we assigned each patient a single output of 0/1 (0 = the patient was assigned 0 at least once, 1 = otherwise). So we have time-series (HbA1c, FPG, ...) and non-time-series measures (race, baseline_age) as independent variables, but only one output (0/1) per patient. I want to model these data. Some methods used for this kind of dataset are using baseline values or using the mean value of each independent variable. In addition to these two methods, I want to use time-series analysis, but I am not sure which method to use. It looks like survival analysis, but it is not, since follow-up did not end when a 0 value was observed. You can see the visualization of the data structure below. Thanks in advance for all your responses.
Elvis Munyaradzi Ganyaupfu, yes, we will use some ML algorithms in our analysis, and we are also looking for alternative models. We use both Python and R. Our research question concerns the long-term effects of hypoglycemia.
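The two summary approaches mentioned in the question (baseline values, or the per-patient mean of each repeated measure), together with the 0/1 outcome rule, can be sketched in plain Python. The column names (`patient_id`, `visit`, `hba1c`, `ldl`, `healthy`) and the toy records are hypothetical, chosen only to show the reshaping from long format to one row per patient.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format EHR records: one row per patient per 4-monthly visit.
records = [
    {"patient_id": 1, "visit": 0, "hba1c": 7.1, "ldl": 3.2, "healthy": 1},
    {"patient_id": 1, "visit": 1, "hba1c": 7.4, "ldl": 3.0, "healthy": 0},
    {"patient_id": 1, "visit": 2, "hba1c": 6.9, "ldl": 2.9, "healthy": 1},
    {"patient_id": 2, "visit": 0, "hba1c": 6.2, "ldl": 2.5, "healthy": 1},
    {"patient_id": 2, "visit": 1, "hba1c": 6.0, "ldl": 2.6, "healthy": 1},
]

def summarize(records, measures=("hba1c", "ldl")):
    """Collapse repeated measures to one row of features per patient."""
    by_patient = defaultdict(list)
    for row in records:
        by_patient[row["patient_id"]].append(row)
    out = {}
    for pid, rows in by_patient.items():
        rows.sort(key=lambda r: r["visit"])
        features = {}
        for m in measures:
            features[f"{m}_baseline"] = rows[0][m]            # value at the first visit
            features[f"{m}_mean"] = mean(r[m] for r in rows)  # average over all visits
        # Outcome rule from the question: 0 if the patient was ever labelled 0, else 1.
        features["output"] = 0 if any(r["healthy"] == 0 for r in rows) else 1
        out[pid] = features
    return out

summary = summarize(records)
```

Either feature set can then be passed to a standard classifier; using the full trajectories instead would need a longitudinal or sequence model, which this sketch does not attempt.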
• asked a question related to Survival Analysis
Question
Is anyone familiar with a way to plot survival curves for a model created with the coxme function?
(i.e. ggsurvplot but for coxme objects instead of coxph objects)
I am relatively new to survival analysis, so correct me if I am misunderstanding something. I know it is more complex because the "frailty" (i.e. the random effect) modifies the baseline hazard function, but is there a way to, for example, produce a "predicted" survival curve?
Sorry for the delay, and I'm afraid I have to disappoint you: I wasn't able to solve the problem. I ended up basically just plotting the raw data (of course declaring it as such) using the survfit function from the survival package. Then I extracted the surviving fraction for each treatment combination and plotted it with ggplot. I hope this also works in your case.
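The workaround described above (running `survfit` on the raw data and extracting the surviving fraction per treatment combination) relies on the Kaplan-Meier product-limit estimator. A minimal, dependency-free Python sketch of that estimator, with made-up data, shows what is being extracted. Note that it describes the raw data only; it is not a prediction from the `coxme` frailty model.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(time, surviving fraction)] at each event time.

    events[i] is 1 for an observed event and 0 for right censoring."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Process every subject sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical data: (times, events) per treatment combination.
groups = {
    "A": ([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]),
    "B": ([4, 6, 7, 9, 10], [1, 0, 1, 1, 0]),
}
curves = {g: kaplan_meier(t, e) for g, (t, e) in groups.items()}
```

Each list of (time, fraction) pairs can then be drawn as a step function, which is essentially what was plotted with ggplot.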
• asked a question related to Survival Analysis
Question
In survival analysis, some data are censored. How do we incorporate censoring into an ANN?
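One widely used way to handle censoring in a neural network (the approach taken by DeepSurv-style models, named here as an example rather than the only option) is to let the network output a log-risk score and to minimise the negative Cox partial log-likelihood. Censored subjects never contribute an event term of their own, but they still appear in the risk sets of earlier events. A NumPy sketch of that loss, assuming no tied event times (tie handling would need extra care), with made-up inputs:

```python
import numpy as np

def neg_cox_partial_loglik(log_risk, times, events):
    """Negative Cox partial log-likelihood, averaged over events (no ties assumed).

    log_risk: network outputs, one score per subject.
    events: 1 = event observed, 0 = right-censored."""
    log_risk = np.asarray(log_risk, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    # Sort by descending time so the risk set of row i is rows 0..i.
    order = np.argsort(-times)
    lr, ev = log_risk[order], events[order]
    # log(sum(exp(log_risk))) over each risk set, computed stably.
    log_risk_set = np.logaddexp.accumulate(lr)
    # Only observed events add terms; censored rows only enlarge risk sets.
    return -np.sum(ev * (lr - log_risk_set)) / max(ev.sum(), 1.0)

loss = neg_cox_partial_loglik([0.2, -0.1, 0.5], [5.0, 8.0, 3.0], [1, 0, 1])
```

In a deep-learning framework the same expression would be written with that framework's tensor operations so that gradients flow back into the network.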
• asked a question related to Survival Analysis
Question
We are trying to find out whether there is an association between postoperative pulmonary complications (PPCs) and overall survival in a cohort of 417 patients. In the Kaplan-Meier analysis there is a significant difference in overall survival between patients with and without PPCs. After testing the proportional hazards assumption in Cox regression (both through visual analysis and through a log-minus-log plot), we found that our data failed to meet the assumption. As I interpret it, this means that the effect of a postoperative pulmonary complication on the hazard of dying varies over time? I'm trying to figure out how to perform a survival analysis now that I can't use standard Cox regression.
From what I understand, I could use time-dependent coefficients (the same as time-dependent variables in this example?), but I don't really understand what is meant by that or how I would do it in SPSS. Does it mean I turn my PPC variable into a time-dependent variable and then run the Cox regression the way I would if my data had met the assumptions, or how do I do it?
I would be really thankful for guidance or corrections if I have misunderstood something! I'm a PhD student and I don't have much experience in statistics so explain it to me like I'm 5 (a smart 5-year-old!)
Dear Olivia Sand, the attached article provides step-by-step instructions on how to run Cox regression analysis in SPSS, and I believe it would be a perfect fit for your needs. Please check it out!
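One common way to let the PPC effect change over time (a sketch, not necessarily the method in the attached article) is to split each patient's follow-up at a chosen cut time and give the PPC indicator a separate coefficient in each period, i.e. a step-function time-varying coefficient. SPSS's Cox procedure for time-dependent covariates can express the same idea; the sketch below instead builds the counting-process (start, stop, event) rows explicitly. The 12-month cut point and the patients are hypothetical.

```python
def split_at(patients, cut):
    """Split follow-up at `cut` into (start, stop] episodes.

    Each patient is (id, time, event, ppc). The PPC indicator is copied into
    ppc_early or ppc_late depending on which period the episode covers."""
    rows = []
    for pid, time, event, ppc in patients:
        if time <= cut:
            rows.append(dict(id=pid, start=0, stop=time, event=event,
                             ppc_early=ppc, ppc_late=0))
        else:
            # First episode ends at the cut without an event (patient still alive).
            rows.append(dict(id=pid, start=0, stop=cut, event=0,
                             ppc_early=ppc, ppc_late=0))
            rows.append(dict(id=pid, start=cut, stop=time, event=event,
                             ppc_early=0, ppc_late=ppc))
    return rows

# Hypothetical patients: (id, months of follow-up, died, had PPC)
patients = [(1, 5, 1, 1), (2, 30, 0, 1), (3, 20, 1, 0)]
episodes = split_at(patients, cut=12)
```

Fitting a Cox model with start/stop times to these rows, with `ppc_early` and `ppc_late` as covariates, gives one hazard ratio for the first 12 months and another afterwards; hazards are then assumed proportional only within each period.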
• asked a question related to Survival Analysis
Question
I have data on 40 eyes from 20 subjects and plan to draw a survival curve (KM curve). My question is how to cluster the eyes within subjects. I tried using the same ID for both eyes of a subject. However, for a few subjects the start time is 0 (i.e. time0 = 0) and the end time is, for example, 2 months (i.e. the event happened at month 2). When running the command below
stset time, fail(outcome) id(ID)
Stata excludes the observations for the subjects (both eyes) that have the same start and end times. What option includes both eyes with the same study time while clustering the eyes within subjects?
Take a look at these: they may help: file:///C:/Users/user/AppData/Local/Temp/3-31106271.pdf
Best wishes, David
• asked a question related to Survival Analysis
Question
I would like to know whether it is possible to model competing events while including the type of observation period as a covariate. For example, suppose I want to model competing events in two different tasks, one lasting 3 min and the other 5 min. These are two different observation periods, but can events from both be included in the same analysis?
Second-by-second observation should be more than adequate to characterize hazard rate functions. You will still need to have a sufficient number of events of each type to characterize hazard rate functions with any precision, particularly as you add parameters to convey time dependence. Hopefully this is helpful to you.
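One way to pool both tasks (a sketch under stated assumptions, not a prescription) is to treat the end of each task as administrative censoring: a 3-min trial with no event is censored at 180 s and a 5-min trial at 300 s, and the task type can additionally enter a regression model as a covariate. The Aalen-Johansen estimator of the cumulative incidence of each event type then accommodates both observation lengths. A dependency-free sketch with made-up data (cause 0 = censored):

```python
def cumulative_incidence(times, causes):
    """Aalen-Johansen cumulative incidence for competing risks.

    causes[i] = 0 for censoring, k >= 1 for an event of type k.
    Returns {k: [(time, CIF)]} evaluated at each event time of type k."""
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0                 # all-cause KM survival just before t
    cif, curves = {}, {}
    i = 0
    while i < len(data):
        t = data[i][0]
        counts, removed = {}, 0
        while i < len(data) and data[i][0] == t:
            c = data[i][1]
            if c:
                counts[c] = counts.get(c, 0) + 1
            removed += 1
            i += 1
        for k, d in counts.items():
            cif[k] = cif.get(k, 0.0) + surv * d / at_risk
            curves.setdefault(k, []).append((t, cif[k]))
        surv *= 1 - sum(counts.values()) / at_risk
        at_risk -= removed
    return curves

# Hypothetical trials from both tasks, in seconds; tasks end at 180 s or 300 s.
times = [40, 75, 120, 180, 150, 210, 260, 300]
causes = [1, 2, 1, 0, 2, 1, 2, 0]
cifs = cumulative_incidence(times, causes)
```

After 180 s only the 5-min trials remain at risk, so estimates in that later window rest on fewer trials; this is the precision issue raised in the answer above.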
• asked a question related to Survival Analysis
Question
For a gene that is prognostically relevant for survival (HR < 1 or HR > 1 with p < 0.05), is it necessary that overall survival time and gene expression show a strong positive/negative correlation?
We are using TCGA RNA-seq data and clinical information, but what we observe is a weak (Pearson) correlation and/or a non-significant p-value for genes with a significant HR. We have also tried normalising the data and using Spearman correlation.
The survival analysis is based on longitudinal time data, and the expression of the genes should be correlated with survival. The only thing I wonder about is a possible confusion between Kaplan-Meier analysis and univariate Cox regression analysis. Are they the same? In fact, the Cox proportional hazards model is originally a multivariable statistical model.
• asked a question related to Survival Analysis
Question
I am using survival analysis for repeated events and I want to see if the effect of the time-dependent covariate differs by group.
For fixed categorical covariates, such as a group membership indicator, Kaplan-Meier estimates (1958) can be used to display the curves. For time-dependent covariates this method may not be adequate. But, Simon and Makuch (1984) proposed a technique that evaluates the covariate status of the individuals remaining at risk at each event time.
• asked a question related to Survival Analysis
Question
I have data from an experiment where we looked at the time-to-death of pseudoscorpions under three treatment conditions: control, heat, and submersion underwater. Both the control and heat groups were checked daily and those that survived were right-censored. However, the pseudoscorpions in the underwater group were unresponsive while submerged and would only become active again when they were removed from the water and allowed to dry. Therefore, we removed 10 individuals per day and checked how many of the 10 were dead. All individuals removed from the water on a given day were also removed from the study on that day i.e. we did not re-submerge individuals that were observed to be alive after they were removed from the water the first time. The underwater group is therefore left-censored. We have run Kaplan-Meier curves for all three groups and the underwater group has a much steeper curve and non-overlapping 95% confidence intervals compared to the control and heat groups.
Is there a way to further analyze all three groups in one model given that one level of the treatment is left-censored and the other two levels are both right-censored? Can a Cox regression be run on the left-censored group by itself to produce a hazard rate with 95% CI for the rate? I am a biologist so try to make your answer intelligible to a statistical novice.
In Stata you can handle simultaneous right and left censoring.
For Poisson regression:
cpoisson outcome independent1 independent2 independent3, ll(?) ul(?)
For Linear regression:
tobit outcome independent1 independent2 independent3, ll(?) ul(?)
ll() is the lower limit, i.e. the left-censoring point (a number)
ul() is the upper limit, i.e. the right-censoring point (a number)
I have never used this with Cox regression; please check:
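For the underwater group, each animal is examined only once: a dead animal is left-censored (it died at some unknown time before removal) and a live one is right-censored at its removal day, which is sometimes called current-status data. Assuming, purely for illustration, a constant hazard λ (an exponential model), the log-likelihood is the sum over dead animals of log(1 - exp(-λt)) plus the sum over live animals of -λt, and it can be maximised numerically. The counts below are made up. A full analysis would put all three groups into one parametric model with a treatment covariate, for example via parametric survival models that accept interval censoring (Stata's `stintreg`, or R's `survreg` with `Surv(..., type = "interval2")`); this sketch does not do that.

```python
import math

def loglik(lam, obs):
    """Exponential log-likelihood for (day, n_dead, n_alive) tuples."""
    ll = 0.0
    for t, dead, alive in obs:
        ll += dead * math.log(1 - math.exp(-lam * t))  # died before day t (left-censored)
        ll += alive * (-lam * t)                       # alive at day t (right-censored)
    return ll

def fit_rate(obs, lo=1e-6, hi=10.0, iters=200):
    """Golden-section search for the MLE of lambda, on the log scale."""
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if loglik(math.exp(c), obs) > loglik(math.exp(d), obs):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
    return math.exp((a + b) / 2)

# Hypothetical: 10 animals removed per day; (day, dead, alive)
underwater = [(1, 1, 9), (2, 3, 7), (3, 5, 5), (4, 6, 4), (5, 8, 2)]
lam_hat = fit_rate(underwater)
median_days = math.log(2) / lam_hat   # median survival under the exponential model
```

The log-likelihood is concave in λ for this model, so a one-dimensional search like this finds the single maximum.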
• asked a question related to Survival Analysis
Question
I fitted a Cox proportional hazard model and checked the proportionality assumption using R's cox.zph function, with the following output:
chisq df p
Var1 0.0324 1 0.857
Var2 0.1972 1 0.657
log(var3) 4.1552 1 0.042
Var4 4.6903 1 0.030
Var5 0.6472 1 0.421
Var6 1.2257 1 0.268
Var7 4.9311 1 0.026
Var8 0.3684 1 0.544
Var9 2.0905 1 0.148
Var10 0.0319 1 0.858
Var11 4.0771 1 0.043
GLOBAL 14.2625 11 0.219
In the study, Var1 and Var2 are the ones I'm actually interested in, whereas the others are only controls. As you can see, the PH assumption seems to hold for these two covariates, but not for log(var3), Var4, Var7 and Var11 (p < 0.05), although the global test is not significant. Can I still interpret my findings on Var1 and Var2 and, when discussing Var3, add that its coefficient represents the average effect over the study time? Or will my coefficients for Var1 and Var2 be biased/incorrect?
Check it.
• asked a question related to Survival Analysis
Question
I fitted a Cox PH model, and examination of the Schoenfeld residuals showed that the proportionality assumption was violated by several variables. Following Box-Steffensmeier & Jones (2004), I included interactions with time for these covariates. However, I'm not sure how to interpret them. Clearly this indicates time dependency of the covariate, in the sense that the effect inflates/deflates over some function of time, but I work with sociological data and my theory indicates no time dependency of the effects in either direction (it would also make no sense). So, if I understand correctly, should I attribute the time dependency to some kind of unobserved heterogeneity? Due to the nature of the data, I also cannot implement frailty or fixed effects to account for this. So how do I interpret a coefficient that increases/decreases as time progresses when the theory does not predict this?
Variables with time-varying effects and the Cox model: Some ... In such case, the interpretation of the models is conditional on the length of the survival time, and results should thus be ... https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-10-20
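With a time interaction, the log hazard ratio is no longer a single number but a function of time, log HR(t) = β + γ·g(t). A common choice, assumed here, is g(t) = log(t); the coefficients below are made up. Reporting HR(t) at a few substantively meaningful times is often easier to communicate than β and γ separately.

```python
import math

def hazard_ratio_at(t, beta, gamma, g=math.log):
    """HR(t) = exp(beta + gamma * g(t)) for a covariate interacted with g(time)."""
    return math.exp(beta + gamma * g(t))

beta, gamma = 0.40, -0.15   # hypothetical main effect and time interaction
hr_by_year = {t: round(hazard_ratio_at(t, beta, gamma), 3) for t in (1, 2, 5, 10)}
# The HR equals 1 where beta + gamma * log(t) = 0, i.e. t = exp(-beta / gamma).
crossing_time = math.exp(-beta / gamma)
```

With these values the HR declines from about 1.49 at t = 1 towards 1, crossing it at t ≈ 14.4. A decline of this kind can also be produced by unobserved heterogeneity, because the most frail units fail first and the risk set changes composition over time, which is the possibility raised in the question.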
• asked a question related to Survival Analysis
Question
Can anybody guide me regarding the minimum number of events (recurrences) required for a reliable recurrence-free survival analysis in oral SCC? I'm expecting 30-35 events in my cohort of 108 patients at 3 years. Will that be enough?
The study power depends on the number of events, the total follow-up time in person years, and the ratio between the sizes of the groups being compared.
Simulation studies have tended to conclude that you need ten or more events per predictor variable in your model. More recently, bigger and more comprehensive simulation studies have cast doubt on this hard-and-fast rule. Vittinghoff and McCulloch (2007), in a very widely cited paper, concluded that “problems are fairly frequent with 2–4 events per predictor variable, uncommon with 5–9 events per predictor variable, and still observed with 10–16 events per predictor variable. Cox models appear to be slightly more susceptible than logistic. The worst instances of each problem were not severe with 5–9 events per predictor variable and usually comparable to those with 10–16 events per predictor variable.”
Since then, further simulation studies where prediction models are validated against new datasets tend to confirm that 10 events per variable is a minimum requirement (see Wynants 2015) for logistic regression. These studies are important because they are concerned with the generalisability of findings.
The second factor that will influence sample size is the nature of the study. Where the predictor variables have low prevalence and you intend to run a multivariable model with several predictors, the number of events per variable required for Cox regression is of the order of 20. As you might imagine, increasing the number of predictor variables and decreasing their prevalence both increase the number of events per variable required.
Based on current research, the sample should have at least 5 events per predictor variable, ideally 10. Sample sizes will need to be larger than this if you are performing a multivariable analysis with predictor variables of low prevalence. In this case, you may require up to 20 events per variable and should probably read the paper by Ogundimu et al.
• Courvoisier DS, et al. (2011). Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure. Journal of Clinical Epidemiology, 64(9), 993–1000.
• Kocak M, Onar-Thomas A. (2012). A simulation-based evaluation of the asymptotic power formulas for Cox models in small sample cases. The American Statistician, 66(3), 173–179.
• Ogundimu EO, Altman DG, Collins GS. (2016). Adequate sample size for developing prediction models is not simply related to events per variable. Journal of Clinical Epidemiology, 76, 175–182.
• Peduzzi P, et al. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), 1373–1379.
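The rules of thumb above translate directly into arithmetic for the question's cohort. The sketch computes how many predictors 30 to 35 events support at 5, 10 or 20 events per variable. As a separate check, tied to the first point about events and group sizes, it also applies Schoenfeld's approximation for the number of events needed to detect a given hazard ratio, d = (z(1 - α/2) + z(1 - β))² / (p(1 - p)(log HR)²), where p is the proportion of patients in one of two groups. The target HR of 2 and the 50/50 split are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def max_predictors(events, epv):
    """Largest number of predictors allowed at a given events-per-variable ratio."""
    return events // epv

def schoenfeld_events(hr, alpha=0.05, power=0.80, p=0.5):
    """Events needed to detect `hr` with a two-sided test comparing two groups."""
    z = NormalDist().inv_cdf
    d = (z(1 - alpha / 2) + z(power)) ** 2 / (p * (1 - p) * math.log(hr) ** 2)
    return math.ceil(d)

allowed = {events: {epv: max_predictors(events, epv) for epv in (5, 10, 20)}
           for events in (30, 35)}
needed_for_hr2 = schoenfeld_events(2.0)
```

With 30 to 35 events this leaves room for about 3 predictors at 10 EPV (6 to 7 at 5 EPV), and detecting an HR of 2 between two equal-sized groups with 80% power would need about 66 events.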
• asked a question related to Survival Analysis
Question
I'm currently working with event history data, studying how long after implementation countries abolish certain policies. For these policies I also have an index, ranging from 0 to 100, of how far the countries went with them.
I wanted to control for this in order to account for their point of departure. However, the coefficient violates the proportionality assumption.
Can I stratify on the continuous index variable? As I understand it, this would allow every country to have a different baseline hazard according to its point of departure. When I tried this on the data, it didn't produce an error.
Could anyone tell me whether I can trust these results or whether I have to categorize the variable first?
The PH assumption relates to the entire model, including all predictors and covariables deemed interesting or relevant. "Restoring" PH by ignoring or removing a covariable is not OK, as this likely demonstrates that including the covariable is relevant: it explains so much of the variance in the data that the deviation from the PH assumption becomes visible.
If you have non-PH, you might start by investigating whether the effects of the predictors/covariables in the model are non-linear. If this is not successful, an appropriate partition of the time axis might be the key (early effects differ from late effects, but the hazards are proportional within the early as well as within the late phase). If this also does not help, you might consider stratification (which, by definition, is possible only for categorical variables). I don't consider it a good idea to categorize a continuous variable just to be able to use it for stratification. But if nothing else works, this might be a last resort. I would then check whether the violation of the PH assumption really does more harm than the categorization of the continuous variable.
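A small calculation shows why stratifying on the raw continuous index can run without error and still not do what was intended. In a stratified Cox model each stratum has its own risk sets. If nearly every country has a unique index value, each stratum holds a single country, whose event term is exp(xβ)/exp(xβ) = 1, contributing zero to the log partial likelihood, so those observations carry no information about β. The data below are hypothetical.

```python
import math
from collections import defaultdict

def stratified_partial_loglik(beta, rows):
    """Stratified log partial likelihood for one covariate.

    rows = (stratum, time, event, x); assumes no tied event times within a stratum."""
    strata = defaultdict(list)
    for s, t, e, x in rows:
        strata[s].append((t, e, x))
    ll = 0.0
    for members in strata.values():
        for t_i, e_i, x_i in members:
            if e_i:
                # Risk set: members of the same stratum still under observation at t_i.
                risk = [x for t, _, x in members if t >= t_i]
                ll += beta * x_i - math.log(sum(math.exp(beta * x) for x in risk))
    return ll

# Hypothetical countries: (index value used as stratum, time, event, covariate)
unique_strata = [(12.5, 4, 1, 1), (37.0, 6, 1, 0), (58.2, 3, 1, 1), (81.9, 9, 0, 0)]
pooled = [(0, t, e, x) for _, t, e, x in unique_strata]   # one common stratum

print([stratified_partial_loglik(b, unique_strata) for b in (-1.0, 0.0, 1.0)])
print([round(stratified_partial_loglik(b, pooled), 4) for b in (-1.0, 0.0, 1.0)])
```

With one country per stratum the likelihood is flat (zero for every β), whereas with shared strata it varies with β. Grouping the index into a few categories, the last resort described above, restores informative risk sets.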
• asked a question related to Survival Analysis
Question
I want to perform a GEE analysis with repeated-measures factors and a survival analysis (like Cox regression, but with repeatedly measured variables) in SPSS. How do I proceed with this dataset? I transformed it into long format, but I can't get it to work properly.
I’m not sure you can. I’ve used multilevel modeling for censored regression using brms in R, which is the closest I’ve encountered.
• asked a question related to Survival Analysis
Question
.
The following article shows how to calculate the hazard ratio and its confidence interval from a K-M curve; it includes a spreadsheet to guide the process
( Tierney JF, Stewart LA, Ghersi D, Burdett S, Sydes MR. Practical methods for incorporating summary time-to-event data into meta-analysis. Trials. 2007;8:16. Published 2007 Jun 7. doi:10.1186/1745-6215-8-16 )
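One of the simpler routes described in the cited Tierney et al. paper applies when a study reports a hazard ratio with its 95% CI: the log HR is taken directly, and its standard error is recovered as (ln upper - ln lower) / (2 × 1.96). Studies can then be pooled with fixed-effect inverse-variance weights. The three studies below are made up, and a random-effects model would usually be considered as well.

```python
import math

def log_hr_and_se(hr, lower, upper, z=1.96):
    """ln(HR) and its standard error recovered from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)

def pool_fixed(studies):
    """Fixed-effect inverse-variance pooled HR with its 95% CI."""
    weights, weighted = 0.0, 0.0
    for hr, lo, hi in studies:
        b, se = log_hr_and_se(hr, lo, hi)
        w = 1 / se ** 2            # inverse-variance weight
        weights += w
        weighted += w * b
    b_pool, se_pool = weighted / weights, math.sqrt(1 / weights)
    return tuple(math.exp(v) for v in
                 (b_pool, b_pool - 1.96 * se_pool, b_pool + 1.96 * se_pool))

# Hypothetical studies: (HR, lower 95% CI, upper 95% CI)
studies = [(0.75, 0.60, 0.94), (0.82, 0.65, 1.03), (0.68, 0.50, 0.92)]
pooled_hr, pooled_lo, pooled_hi = pool_fixed(studies)
```

The spreadsheet in the paper also covers cases where only the log-rank p-value and event counts, or only the curve itself, are available; those routes are not reproduced here.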
• asked a question related to Survival Analysis
Question
Is multivariable logistic regression instead of Cox proportional hazards model acceptable for survival analysis with 2 year follow-up? Some papers are based on such methodology. I wonder whether this makes sense.
The Cox proportional hazards model is a method of time-to-event analysis, while the logistic regression model does not include a time variable. For example, imagine an intervention in a randomized trial that only delays the onset of an endpoint, so that the number of events in the two groups is the same. In such a situation, logistic regression will not reveal the benefit of the intervention, while the Cox model will. The Cox model therefore has advantages over logistic regression in this respect, but it cannot be concluded that logistic regression is a bad method of analysis. The logistic regression result can be presented in addition to the Cox model, e.g. to better visualize the differences in the number of events between groups. However, when a survival analysis is performed, the Kaplan-Meier curve is usually also presented, so it is difficult to omit the time variable. Perhaps the studies you mention compare status at the start of the study and at the end (after 2 years), where the exact time of the endpoint is not known. In such a situation, logistic regression will be a better choice than the Cox model.
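The "delayed onset" example in the answer can be made concrete. In the made-up data below, every patient in both arms has the event within 24 months, so the 2-year event proportion that a logistic model would analyse is identical in both arms, yet the treated arm's events occur much later. A time-based summary such as the restricted mean event-free time up to 24 months (here simply the mean of min(T, 24), which is valid only because nobody is censored) captures the difference that the logistic model misses.

```python
def two_year_proportion(times, horizon=24):
    """Share of patients with the event by `horizon` (all events observed here)."""
    return sum(t <= horizon for t in times) / len(times)

def restricted_mean(times, horizon=24):
    """Mean event-free time up to `horizon`; valid only without censoring."""
    return sum(min(t, horizon) for t in times) / len(times)

control = [1, 2, 3, 4, 5, 6]        # months to event (hypothetical)
treated = [13, 14, 15, 16, 17, 18]  # same number of events, just later

print(two_year_proportion(control), two_year_proportion(treated))  # 1.0 1.0
print(restricted_mean(control), restricted_mean(treated))          # 3.5 15.5
```

A log-rank test or Cox model on the same data would also show a clear difference, because both use the event times rather than only whether an event occurred.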
• asked a question related to Survival Analysis
Question
We're working on a meta-analysis of studies that report median OS. In other meta-analyses, using objective response rates (ORR), we used the inverse-variance method, but we're not sure which method to use for OS. I'm comfortable using R packages such as "meta" or "metafor", and I can use a new package if necessary.
The second problem is that some studies did not report a confidence interval. A systematic review I read (attached below) dealt with this by "repeating the analysis 1000 times on a bootstrap sample", and I would appreciate an explanation of how to do this.
Thanks a lot!
For your specific problems, I think you could consider the papers below:
2) If your single-arm studies provide survival curves, you can consider another approach:
PS: A few caveats:
- Pooled results from single-arm studies may not be considered meaningful by some journals because of the absence of a control group.
- I am not sure whether the methods described by McGrath et al. are suitable for time-to-event data, because time-to-event data differ from standard continuous data.
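The bootstrap idea quoted in the question can be sketched for a single study whose individual-level data are available or have been reconstructed from a published curve: resample patients with replacement 1000 times, recompute the Kaplan-Meier median each time, and take the 2.5th and 97.5th percentiles of those medians as a 95% CI. This is a generic percentile bootstrap, not necessarily the exact procedure used in the attached review; the data and random seed are hypothetical.

```python
import random

def km_median(times, events):
    """Smallest time at which the Kaplan-Meier estimate drops to 0.5 or below (None if never)."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        surv *= 1 - deaths / at_risk
        if surv <= 0.5:
            return t
        at_risk -= removed
    return None

def bootstrap_median_ci(times, events, reps=1000, seed=1):
    """Percentile bootstrap 95% CI for the KM median survival time."""
    rng = random.Random(seed)
    n, medians = len(times), []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]   # resample patients with replacement
        m = km_median([times[j] for j in idx], [events[j] for j in idx])
        if m is not None:   # simplification: resamples never reaching 0.5 are skipped
            medians.append(m)
    medians.sort()
    return medians[int(0.025 * len(medians))], medians[int(0.975 * len(medians)) - 1]

# Hypothetical survival times in months (1 = death, 0 = censored)
times = [3, 5, 6, 8, 9, 11, 12, 14, 15, 18, 20, 24, 26, 30, 36]
events = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
print(km_median(times, events), bootstrap_median_ci(times, events))
```

Skipping resamples whose curve never reaches 0.5 biases the interval slightly; with heavy censoring this needs more careful handling.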
• asked a question related to Survival Analysis
Question
For instance, I have treatment modality A, and I want to analyze overall and progression-free survival rates in 2 groups of patients: young and old. Each of these groups has two subgroups defined by a disease-related factor. Is Cox regression analysis appropriate in this case?
An important question, I am waiting for the answer, too.
• asked a question related to Survival Analysis
Question
Hi all,
TCGA provides data on head and neck cancer. Does anyone know whether it is possible to analyze subtypes such as tongue or oropharynx separately?
My second question is: is there any simple tutorial for beginners on how to run survival analysis using this dataset? I would appreciate any tips and thoughts.
Thanks!!
I'm interested too.
• asked a question related to Survival Analysis
Question
For example, I have HE4 measured as a numerical variable, and I want to use it in a survival analysis for ovarian cancer prognosis. How can I find the HE4 cut-off value that best predicts the survival outcome?
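One common approach (the idea behind tools such as the R packages `maxstat` and `survminer`'s `surv_cutpoint`) is to try every candidate cut-off, compare "high" versus "low" HE4 groups with the log-rank test, and keep the cut-off with the largest statistic. Because many cut-offs are tried, the naive p-value at the chosen cut-off is too optimistic and needs a correction or validation in independent data. A dependency-free sketch, with a minimum group size as an assumed tuning parameter and made-up values:

```python
def logrank_chi2(times, events, group):
    """Two-group log-rank chi-square statistic (group is 0/1 per subject)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, group) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, group) if x == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def best_cutpoint(marker, times, events, min_group=3):
    """Cut-off maximising the log-rank statistic; 'high' means marker > cut."""
    best = None
    for cut in sorted(set(marker))[:-1]:
        group = [1 if m > cut else 0 for m in marker]
        if min(sum(group), len(group) - sum(group)) < min_group:
            continue   # skip splits that leave a very small group
        chi2 = logrank_chi2(times, events, group)
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best

# Hypothetical HE4 values with survival times (months) and death indicators
he4 = [40, 55, 60, 70, 85, 120, 150, 200, 260, 300]
months = [60, 58, 50, 48, 45, 20, 15, 12, 10, 8]
died = [0, 1, 0, 1, 1, 1, 1, 1, 1, 1]
cut, chi2 = best_cutpoint(he4, months, died)
```

Where possible, keeping HE4 continuous in a Cox model avoids the information loss and multiplicity that come with dichotomizing.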
• asked a question related to Survival Analysis
Question
I'm running a survival analysis in SPSS, and the CI for the median survival time sometimes doesn't appear.
I think this post might be useful:
I'll copy and paste its content anyway (in case it gets lost):
No standard error or confidence interval for median survival time estimate in the SPSS Kaplan-Meier procedure
Troubleshooting
Problem
I'm running the Kaplan-Meier procedure in SPSS. I'm getting an estimate of the median survival time, but the standard error and confidence interval bounds are missing. Why would this be happening?
Resolving The Problem
The method used to compute standard errors and confidence intervals for percentiles of the survival time distribution in the SPSS KM procedure requires that for the pth percentile, one have an estimate of the survival distribution at the value p-5 as well. The median is handled simply as the 50th percentile. This means that you also need to be able to estimate the 45th percentile in order to obtain a standard error and confidence interval bounds for the median or 50th percentile. Thus if the survival function does not reach .45, you will not be able to obtain a standard error or CI bounds for the median.
• asked a question related to Survival Analysis
Question
Hello dear professors and colleagues,
I have a dataset to which I want to apply three methods: logistic regression, linear regression and survival models. Of course, each method focuses on a different part of the information in the dataset. My questions are:
1. Is it coherent to incorporate all of them in the same study?
2. Are they complementary methods?
3. After estimation, can we select the best method, and if so, which criterion should we use?
I agree with everyone else about needing to know your research question and the outcome measure(s) you want to study. I'll add that, given your mention of OLS regression and logistic regression, I wonder whether ordinal logistic regression would also be worth considering, because it sits midway between the two. You could divide your outcome into tertiles or quantiles, say, and look at which factors predict upward or downward movement through them. I'll finish by saying that both transformations (turning a continuous outcome into an ordinal one for ordinal logistic regression, or into a binary one for logistic regression) lose information, so you should have a good reason for doing them that outweighs that drawback.
• asked a question related to Survival Analysis
Question
I am performing a univariate survival analysis (Cox PH regression) to find the association between the expression of some genes and patient survival in a specific cancer. For some gene 'X' I am getting a large HR (~5), but the p-value is not significant (p = 0.19). What does this imply?
I have many similar cases
Tadesse Fikre Teferra: "Insignificant p-value means that the observed effect is a kind of "just by chance" and not a natural one. "
It means that you don't have enough data to distinguish signal from noise (for whatever reason!). You can fail to reach significance even when there is a very relevant or large "true" effect (depending on the sample size and the noise in your data).
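A quick back-calculation illustrates this point. If the test behind HR ≈ 5 gives p = 0.19 (two-sided), and assuming it is a Wald test on the log HR, the implied standard error of the log HR is ln 5 / z, where z is the normal quantile matching that p-value. The implied 95% CI is then very wide, so the data are compatible with anything from a modest protective effect to an enormous harmful one. The HR and p-value are taken from the question; everything else follows from the Wald assumption.

```python
import math
from statistics import NormalDist

def implied_ci(hr, p_two_sided, level=0.95):
    """SE of log(HR) and the CI implied by a Wald test with the given two-sided p-value."""
    z_obs = NormalDist().inv_cdf(1 - p_two_sided / 2)
    se = abs(math.log(hr)) / z_obs
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return se, math.exp(math.log(hr) - z_crit * se), math.exp(math.log(hr) + z_crit * se)

se, lo, hi = implied_ci(5.0, 0.19)
print(round(se, 2), round(lo, 2), round(hi, 1))
```

With these numbers the implied interval runs from roughly 0.45 to 55, which is why such an estimate cannot be called significant even though the point estimate is large.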
• asked a question related to Survival Analysis
Question
Hello everybody!
I've run a Cox proportional hazards model for survival analysis in a cohort of pancreatic cancer patients with SPSS v. 25, and I want to
1. compare the accuracy (with Harrell's C) of my model with classical staging
2. measure the C-index of my model after bootstrapping
I've tried the macro available on the IBM website, but it does not work (many errors).
...can anyone help?
THANKS
I found this useful, maybe you can also try! Tarek Alsaied and Nipun Verma
Good luck :)
• asked a question related to Survival Analysis
Question
I am doing a survival analysis of insects exposed to various sub-zero temperatures, in which I want to correct mortality in my treatment groups for mortality in my control groups (not exposed to low temperatures). I have corrected mortality in all of my treatment groups relative to the control using the Henderson-Tilton formula (rather than Abbott's, because some of my groups have unequal numbers of individuals), where:
Corrected % Mortality = (1-((n in control before treatment*n in treatment group after treatment)/(n in control after treatment*n in treatment group before treatment)))*100
I next want to analyze the data using probit regression, so I need to convert these corrected percentages back into counts of binary 1s and 0s. The problem is that most of my "normalized count data" are not whole numbers (e.g. in one treatment group I have 0.27 dead and 9.729 alive). The only way I can think of to solve this is to round to the nearest whole number, so in my example I would have 0 dead and 10 alive. Is this consistent with how this analysis is normally done? Am I missing something here? Any advice would be greatly appreciated!!
following
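The Henderson-Tilton correction quoted in the question is straightforward to compute, and the sketch below reproduces it to show where the fractional "corrected counts" come from; the group sizes are hypothetical. As an alternative to rounding, binomial GLMs with a probit link in some software (R's `glm`, for instance) accept a proportion as the response with the group size supplied as a weight. That keeps the fractional part, although corrected proportions no longer follow a binomial distribution exactly, so this is an approximation either way.

```python
def henderson_tilton(control_before, control_after, treated_before, treated_after):
    """Corrected % mortality; *_after are the numbers still alive after treatment."""
    return (1 - (control_before * treated_after) / (control_after * treated_before)) * 100

def corrected_counts(n_treated, corrected_pct):
    """Fractional dead/alive counts implied by the corrected percentage."""
    dead = n_treated * corrected_pct / 100
    return dead, n_treated - dead

# Hypothetical: control 20 -> 18 alive, treatment 10 -> 8 alive
pct = henderson_tilton(20, 18, 10, 8)
dead, alive = corrected_counts(10, pct)
```

Here the corrected mortality is about 11.1%, i.e. 1.11 "dead" and 8.89 "alive" out of 10, which is the kind of non-integer count described in the question.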
• asked a question related to Survival Analysis
Question
I'm doing survival analysis using a Cox regression model. To that end I have 2 different datasets, one for training and one for testing. Once I apply Cox regression to the training dataset, I have a set of coefficients, each related to the corresponding feature, and a C-index to report. When I test my model on the test dataset, should I use the same coefficients and extract the C-index, or should I fit the Cox regression model again on the test dataset and extract the C-index?
No, that equation is the definition of everything. The estimates are calculated by partial likelihood, so you need a computer. See the link I gave you. R is open-source freeware. Here is a link to a manual that should explain just about everything about R: http://sgpwe.izt.uam.mx/files/users/uami/gma/R_for_dummies.pdf
I'm attaching a recent paper of ours that used a similar approach. Note the Kaplan-Meier plot; you must produce one of these to interpret the results. There should be a biostatistician at your university who can explain all of this if the readings I sent aren't helpful enough. If you have further questions, PLEASE ask one of us.
Best of luck, David Booth
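On the training/test question: the usual practice in split-sample or external validation is to keep the coefficients estimated on the training data, compute each test patient's linear predictor (risk score) with them, and then compute the C-index on the test data. Refitting on the test set would measure apparent fit again rather than validation. A minimal sketch with hypothetical coefficients and patients, using Harrell's definition (a pair is comparable when the shorter time is an observed event; pairs with tied times are ignored in this simplified version, and tied risk scores count one half):

```python
def linear_predictor(coefs, rows):
    """Risk score x'beta using coefficients frozen from the training fit."""
    return [sum(coefs[k] * row[k] for k in coefs) for row in rows]

def harrell_c(times, events, risk):
    """Harrell's concordance index: higher risk should mean shorter survival."""
    concordant = permissible = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i has an observed event before j's time.
            if events[i] and times[i] < times[j]:
                permissible += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / permissible

train_coefs = {"age": 0.03, "expr": 0.8}   # hypothetical training estimates
test_rows = [{"age": 70, "expr": 1.2}, {"age": 55, "expr": 0.4},
             {"age": 62, "expr": 2.0}, {"age": 48, "expr": 0.1}]
test_time = [14, 40, 9, 60]
test_event = [1, 0, 1, 0]
c_test = harrell_c(test_time, test_event, linear_predictor(train_coefs, test_rows))
```

The same function applied to bootstrap resamples is one way to approach the optimism-corrected C-index asked about in the SPSS question above.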
• asked a question related to Survival Analysis
Question
Hi, I'm very new to survival analysis. I am trying to correlate gene expression level with survival and prognosis in patients with lung cancer, and I want to run a Cox regression analysis. However, most of the examples I've encountered so far are based on discrete covariates such as sex. I know continuous covariates can be analyzed with the coxph function, but I can't see what the plot would look like for a continuous variable. For a discrete variable, the number of curves corresponds to the number of categories, e.g. for sex you'd have two lines on the graph. But what about a continuous covariate? Should we first turn it into a discrete variable by assigning quantiles? Otherwise I don't know how to visualize it. What are the pros and cons of doing so?
Thanks!
You might be interested in performing the survival analysis with an optimal cutpoint determined for the continuous covariate. This approach performs the survival analysis, with calculation of hazard ratios etc., by dividing patients into two groups according to the most significant cutpoint of the continuous covariate (such as gene expression). Check out this paper and the corresponding software; maybe it will fit your needs: https://www.sciencedirect.com/science/article/pii/S0169260718312252
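To plot a continuous covariate without cutting it, a common alternative is to draw model-based survival curves at a few representative values, for example the 25th, 50th and 75th percentiles of expression. Under the Cox model, S(t | x) = S0(t)^exp(β(x - x̄)), where S0 is the baseline survival at the mean covariate value. The baseline survival values, β and the expression quartiles below are hypothetical; in R, `survfit` applied to a `coxph` fit with a `newdata` argument produces the equivalent curves.

```python
import math

def predicted_survival(baseline, beta, x, x_mean):
    """Cox model: S(t|x) = S0(t) ** exp(beta * (x - x_mean))."""
    hr = math.exp(beta * (x - x_mean))
    return [(t, s0 ** hr) for t, s0 in baseline]

baseline = [(12, 0.90), (24, 0.78), (36, 0.65), (60, 0.50)]  # hypothetical S0(t)
beta, x_mean = 0.6, 5.0
quartiles = {"Q1": 4.0, "median": 5.0, "Q3": 6.5}
curves = {name: predicted_survival(baseline, beta, x, x_mean)
          for name, x in quartiles.items()}
```

This keeps the full information in the continuous covariate, whereas quantile grouping is easier to read but discards information within each group.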
• asked a question related to Survival Analysis
Question
I am facing a problem when calculating the HR from two different survival curves. In the first plot, the experimental group's curve is closer to the placebo group's curve than in the second plot, even though the first plot's HR is smaller than the second plot's. I wonder what the possible reasons are. Can you help me solve this problem? Thanks.
It's hard to help without seeing the survival curves (Kaplan-Meier plot). Do you have one you can show us? Here's a video if you are not sure what I am talking about: https://www.youtube.com/watch?v=XDdytnv6HYE&list=PL64SCLAD3d1GJJrZ63sJGWALO22ZY9erG&index=27&t=1s
• asked a question related to Survival Analysis
Question
In a cross-sectional study, if I collect different outcomes in terms of complications of a therapy (e.g. FDP, RDP or CD) and record follow-up complaints and findings, can I use Kaplan-Meier plots to also predict the survival rate?
No. KM curves compare rates of change; by looking at the units you can see which group is moving the fastest. See:
for details. Best, David Booth
• asked a question related to Survival Analysis
Question
I am using the survminer and survival packages in R for survival analysis. For some variables I get a very large HR value (with p ≈ 1). What does such a situation imply in terms of risk groups?
Dear Chakit,
I assume you are writing about the cox proportional hazards model (coxph).
First of all, a large HR with a p-value of about 1 is not a statistically significant hazard ratio (HR). Rather, it means that the standard error of the estimated HR is too large to draw any reasonable conclusion about the true effect of your predictor. Please check the number of observed events in your data and read more about 'events per variable', because I fear that your model might be overfitted, which causes the ridiculously large HR. Please do not forget to check the model assumptions, e.g. by computing Schoenfeld residuals: have a look at the R function cox.zph.
Cheers,
Marcus
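A related situation that produces exactly this pattern (a huge HR with p near 1) is monotone likelihood: if one level of a binary variable has no events at all, the Cox partial likelihood keeps increasing as β grows, so the estimate drifts towards infinity and its standard error explodes. The sketch below uses made-up data in which nobody with x = 0 has an event, and shows the log partial likelihood still rising at large β. Penalised (Firth-type) Cox regression, available for example in the R package `coxphf`, is one remedy discussed in the literature and is mentioned here only as a pointer.

```python
import math

def partial_loglik(beta, times, events, x):
    """Cox log partial likelihood for one covariate (no tied event times)."""
    ll = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i:
            # Risk set: everyone still under observation at t_i.
            risk = [xj for tj, xj in zip(times, x) if tj >= t_i]
            ll += beta * x_i - math.log(sum(math.exp(beta * xj) for xj in risk))
    return ll

# Hypothetical data: every event occurs in the x = 1 group.
times = [2, 4, 5, 7, 9, 10]
events = [1, 1, 1, 0, 0, 0]
x = [1, 1, 1, 0, 0, 0]
print([round(partial_loglik(b, times, events, x), 4) for b in (1, 5, 10, 20)])
```

Because the likelihood never reaches a maximum, the software stops at some large β with an enormous standard error, which shows up as a huge HR with p close to 1.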
• asked a question related to Survival Analysis
Question
Dear Colleagues
Happy holidays
I have done a survival analysis with Kaplan-Meier curves and a reported at-risk table.
I have attached the at-risk table to this message.
However, when I count the number of events over the 5-year period,
I find that the number of events in the no-induction group is 38 out of 740 patients
and in the induction group it is 110 out of 2860 patients.
My questions are:
- Why is there a discrepancy between the number of events in each group and the at-risk table?
- How can I calculate the 5-year incidence in each group? Is there a way to calculate this in Stata?
Can I just divide the number of events in 5 years by the total number of patients in the group to get the incidence?
N.B. There is no loss to follow-up in any of the groups.
Thank you very much for your support
Looking forward to hearing back from you
The risk set shrinks whenever an event occurs or an individual is censored. According to your risk table and reported data, the risk set is decreasing for reasons other than events. Loss to follow-up is not the only reason for censoring. It may be that your study ended before every participant experienced the event, so some individuals, with varying lengths of follow-up, were censored without experiencing it. For example, if your study ended in Jan 2020, people recruited in Jan 2018, Jan 2019 and Aug 2019 who have not experienced any event have 2 years, 1 year and 0.5 years of follow-up without an event, i.e. they are censored. This is the reason for the discrepancy between the risk table and the events.
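On the 5-year incidence question: when some patients are censored before 5 years (for example because they were recruited late, as described above), dividing the number of events by the total number of patients underestimates the incidence, because the censored patients might still have had the event. The Kaplan-Meier-based estimate 1 - S(5) accounts for this; in Stata, a command such as `sts list, at(5)` reports S(t) at the chosen time, and if death competes with the event of interest, a cumulative incidence function is more appropriate. A sketch with made-up data:

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon`."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        surv *= 1 - deaths / at_risk
        at_risk -= removed
    return surv

# Hypothetical: years of follow-up, 1 = event; several patients censored early
times = [0.5, 1.0, 1.5, 2.0, 2.0, 3.0, 4.0, 5.5, 6.0, 7.0]
events = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
naive = sum(e for t, e in zip(times, events) if t <= 5) / len(times)
km_incidence = 1 - km_survival_at(times, events, 5)
```

Here the naive proportion is 30%, while the Kaplan-Meier estimate of 5-year incidence is 3/7, about 43%, because the early-censored patients are removed from the denominator only after they leave the risk set.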
• asked a question related to Survival Analysis
Question
Can anyone help with the procedure and steps to follow when performing a survival analysis of two different drug regimens?
You should have data with at least three variables (time, event, drug). Plot a Kaplan-Meier survival curve for each drug. Use the log-rank test to assess the difference between the drugs. If you have other covariates, use a Cox PH model after testing for proportionality.
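The log-rank test mentioned above compares, at each event time, the observed number of events in one arm with the number expected if both regimens had the same hazard. A dependency-free sketch with hypothetical data; the p-value uses the chi-square distribution with 1 degree of freedom, for which P(X > x) = erfc(√(x/2)).

```python
import math

def logrank_test(times, events, arm):
    """Two-arm log-rank test; arm[i] is 0 or 1. Returns (chi2, p)."""
    o_minus_e = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [a for x, a in zip(times, arm) if x >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, a in zip(times, events, arm) if x == t and e and a == 1)
        o_minus_e += d1 - d * n1 / n          # observed minus expected in arm 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical months to relapse (1 = relapse, 0 = censored) for two regimens
times = [3, 5, 7, 8, 10, 12, 6, 14, 18, 20, 24, 24]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
arm = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
chi2, p = logrank_test(times, events, arm)
```

Swapping the arm labels gives the same statistic, since the test is symmetric in the two groups.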
• asked a question related to Survival Analysis
Question
I am currently working on my final-year undergraduate research project, "A comparison between Cox Proportional Hazard and Accelerated Failure Time Models: an application to Beta-Thalassemia", and would like to ask whether there are any datasets suitable for the analysis.
Hi,
Finding a free dataset specifically for beta-thalassemia is really hard. Anyway, I am attaching a file containing a dataset that can be used to practise survival analysis. I hope it will be useful for you.
• asked a question related to Survival Analysis
Question
Hello Everyone,
I want to use Cox PH regression with 24 covariates, but found that not all 24 covariates meet the PH assumption (10 actually violate it). I think it is common for PH to be violated when many covariates are included. But in related papers, some authors just mention that the variable of interest didn't violate the PH assumption and then use the standard Cox PH model.
So my questions are:
1. Can we still use the standard Cox PH model if the exposure of interest doesn't violate the PH assumption while many other covariates do?
2. Regarding question 1, I don't think so, so I chose weighted Cox regression (too many covariates violated PH for stratified Cox to be feasible) and accelerated failure time (AFT) models, and I plan to compare the two. Is this the right strategy?
3. For competing risk analysis (cause-specific hazards and the Fine-Gray model), I think the PH assumption still applies. Does anyone know what kind of competing risk analysis I could use that is analogous to weighted Cox or AFT?
I tried to search for answers online, but the results were very limited.
Looking forward to any feedbacks and suggestions from you guys!
Thanks very much.
Jing
David Eugene Booth Thank you David. Your references are really useful!
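A quick graphical check for one binary covariate (for example the exposure of interest) is to compare log(-log S(t)) between groups. Under proportional hazards S1(t) = S0(t)^HR, so the gap between the two transformed Kaplan-Meier curves should stay roughly constant at log HR. The Python sketch below computes that gap on a time grid. The data and function names are hypothetical. With many covariates, residual-based checks such as `cox.zph` in R's `survival` package are the usual tool.

```python
import math

def km_step(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t))."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - d / n_risk
        curve.append((t, surv))
    return curve

def survival_at(curve, t):
    """Right-continuous step function: S at the last event time <= t (1.0 before any event)."""
    s = 1.0
    for te, se in curve:
        if te <= t:
            s = se
        else:
            break
    return s

def log_minus_log_gap(times, events, groups, grid):
    """log(-log S_1(t)) - log(-log S_0(t)) on a time grid for a binary covariate.

    Under proportional hazards the gap equals the log hazard ratio at every t,
    so a gap that drifts with time suggests the PH assumption is violated.
    """
    g0, g1 = sorted(set(groups))  # exactly two groups expected
    curves = {}
    for g in (g0, g1):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        curves[g] = km_step([times[i] for i in idx], [events[i] for i in idx])
    gaps = []
    for t in grid:
        s0, s1 = survival_at(curves[g0], t), survival_at(curves[g1], t)
        if 0.0 < s0 < 1.0 and 0.0 < s1 < 1.0:  # log(-log S) is undefined at 0 and 1
            gaps.append((t, math.log(-math.log(s1)) - math.log(-math.log(s0))))
    return gaps

if __name__ == "__main__":
    # Hypothetical follow-up data for an exposed (1) and unexposed (0) group
    times = [2, 4, 5, 7, 9, 11, 1, 2, 3, 5, 6, 8]
    events = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
    exposed = [0] * 6 + [1] * 6
    for t, gap in log_minus_log_gap(times, events, exposed, [1.5, 2.5, 4.5, 6.5]):
        print(f"t={t}: log(-log S) gap = {gap:.3f}")
```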
• asked a question related to Survival Analysis
Question
Hi all,
I've been reading up on parametric survival analysis, especially on accelerated failure time models, and I am having trouble wrapping my head around the family of distributions.
I know about the exponential, Weibull, log-normal and log-logistic distributions. What I can't find a clear and consistent answer on is which quantity is actually assumed to follow one of those distributions: the survival time T, the log survival time ln(T), the hazard function h(t), the survival function S(t), or the residuals ε?
Dear Shaun,
Yes, if you have enough data, then empirical density estimates will be very helpful. If you know enough about the process(es) by which T was generated, this may be enough on its own. In other cases, one might decide by comparing various models.
Take care,
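To make the question concrete: in the usual AFT formulation it is the error term ε in log T = μ + βx + σε that is given a distribution, and that choice determines the distribution of T (an extreme-value ε gives a Weibull T, a normal ε a log-normal T, a logistic ε a log-logistic T). The hazard h(t) and survival S(t) then follow from it rather than being assumed separately. The Python sketch below, with hypothetical parameter values and illustrative function names, simulates the Weibull case and compares the simulated survival with the implied Weibull formula S(t) = exp(-(t/λ)^(1/σ)), where λ = exp(μ + βx).

```python
import math
import random
import statistics

def simulate_weibull_aft(mu, sigma, beta, x, n, seed=0):
    """Draw survival times from the AFT model log T = mu + beta*x + sigma*eps."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        # If E ~ Exponential(1), log(E) follows the standard minimum
        # extreme-value distribution, F(e) = 1 - exp(-exp(e)).
        eps = math.log(rng.expovariate(1.0))
        times.append(math.exp(mu + beta * x + sigma * eps))
    return times

def weibull_aft_survival(t, mu, sigma, beta, x):
    """Implied S(t | x): Weibull with shape 1/sigma and scale exp(mu + beta*x)."""
    scale = math.exp(mu + beta * x)
    return math.exp(-((t / scale) ** (1.0 / sigma)))

if __name__ == "__main__":
    mu, sigma, beta = 1.0, 0.5, 0.7  # hypothetical parameter values
    times = simulate_weibull_aft(mu, sigma, beta, x=1, n=20000)
    t = 5.0
    empirical = sum(ti > t for ti in times) / len(times)
    print(f"empirical S({t}) = {empirical:.3f}, "
          f"Weibull formula = {weibull_aft_survival(t, mu, sigma, beta, 1):.3f}")
    print(f"simulated median = {statistics.median(times):.3f}")
```

Swapping the line that draws `eps` for a normal or logistic draw would give log-normal or log-logistic survival times in the same way.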
• asked a question related to Survival Analysis
Question
We are doing an observational study using real-world data, and we plan to compare the results with those of a clinical trial. The primary analysis is a survival analysis, and we have generated the KM curve. How can I add the KM curve from the clinical trial to my KM plot? I attached an example I found in the literature. Thank you!
Yes, of course, Junjie Ma
You need to follow these steps:
1) Extract the raw data from the published KM curve using software like WebPlotDigitizer (I used it because it is free and easy to use).
2) Put your data and the extracted data on the same scale (same time units and survival scale) and plot them together in a new KM figure.
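A minimal Python sketch of step 2, assuming the digitizer exports (time, survival) pairs. The function names and the example points are hypothetical. It converts the published curve to your time units and a 0-1 survival scale, removes the small upward wiggles that digitizing introduces (a KM curve can never increase), and evaluates the curve on a common time grid so it can be drawn next to your own estimate.

```python
def digitized_to_step(points, time_scale=1.0, survival_in_percent=False):
    """Clean digitized (time, survival) points read off a published KM curve.

    time_scale          -- factor converting the published time unit to yours
                           (e.g. 1/12 for months -> years)
    survival_in_percent -- True if the published y-axis runs from 0 to 100
    """
    cleaned, current = [], 1.0
    for t, s in sorted(points):
        if survival_in_percent:
            s = s / 100.0
        s = min(max(s, 0.0), 1.0)   # clip digitizing noise outside [0, 1]
        current = min(current, s)   # enforce a non-increasing curve
        cleaned.append((t * time_scale, current))
    return cleaned

def evaluate_step(curve, grid):
    """Evaluate a right-continuous step curve (S = 1 before the first point) on a time grid."""
    values = []
    for t in grid:
        s = 1.0
        for tc, sc in curve:
            if tc <= t:
                s = sc
            else:
                break
        values.append(s)
    return values

if __name__ == "__main__":
    # Hypothetical digitized trial curve: months on x, percent surviving on y
    trial_points = [(0, 100), (6, 92), (12, 81), (11.9, 83), (18, 82), (24, 70)]
    trial_curve = digitized_to_step(trial_points, time_scale=1 / 12, survival_in_percent=True)
    grid = [0.0, 0.5, 1.0, 1.5, 2.0]  # common grid in years
    print(list(zip(grid, evaluate_step(trial_curve, grid))))
```

Both curves can then be drawn as step functions on the same axes, for example with matplotlib's `plt.step(grid, values, where="post")`. This only overlays the published curve; it does not recover individual patient data from the trial.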
• asked a question related to Survival Analysis
Question
Hello,
I am new to survival analysis and need advice on how to correctly analyze some data (with R).
Design
Along an elevation gradient, we have 5 different sites located at 5 different altitudes (600, 1000, 1400, 1600, 2000 m asl). At each site we have 30 different species (belonging to the Brassicaceae family) coming from 3 different altitudes (high-, mid- and low-elevation species). Each species is represented by 20 individuals (10 maternal lines (5 for each population) x 2 replicates).
The plants were germinated under controlled conditions (about 3 weeks before being moved to the field) and moved to the different sites when the predicted environmental temperature had similar values (so at 2000 m the plants were placed in August, at 600 m in October).
For each individual, survival was checked each week (binary, 0/1, where 1 = dead). In addition, for another study we also recorded flowering (0/1) and fruiting (0/1). Measurements stopped when the sites became inaccessible due to snow (sites covered and/or road blocked) and resumed when they became accessible again (i.e., the measurement period differs between sites depending on snow cover). At the end of the experiment, many individuals were still alive.
Finally, at each site we have hourly temperature measurements.
What we want to test
i) whether the interaction site_elevation * species_origin influences mortality (e.g., lowland species have higher mortality at higher sites) and ii) whether mortality is affected by a specific temperature (and whether different temperatures affect different elevation classes).
The data file structure
Site (factor, 5 levels): "600", "1000", "1400", "1600", "2000"
Species (factor, 30 levels): the 30 species; a phylogeny is available (.nexus)
Elev_class (factor, 3 levels): "high", "mid", "low", the altitudinal class of each species
Elev_m (numeric): median altitude of each species
ID (factor, 3000 levels): unique ID for each individual (e.g., 600_1_1_1)
Pop (factor): unique ID for each population
Fam (factor): maternal line within each population (1, 2, 3, ...); not unique across populations (e.g., both pop1 and pop2 can have fam 1)
Measurement_Date (date): date on which the binary response was measured, in standard format (e.g., 2019-08-31)
Week_field (integer): time index that assigns 1 to the first week after the plants were moved to their site (1, 2, 3, ...)
Temp (numeric): temperature summary (e.g., median) for the week preceding the measurement (e.g., survival at week 3 is paired with the temperature between weeks 2 and 3)
Survival: binary 0/1 variable describing whether the plant is alive or dead on the respective date/week
Problems
i) I need to correct for phylogeny. This prevents me from using models like Kaplan-Meier. I am currently trying MCMCglmm. Is there another possibility?
ii) From what I understand, I have to model the response (survival) with a longitudinal mixed-effects model (since each individual is measured weekly, i.e., pseudo-replication). For example, to check whether temperature affects survival, the model is: survival ~ Elev_class * Temp * Week_field, random = phylogeny + Site + us(Week_field):ID, family = "threshold". However, not all individuals experience the event (death) during the experiment, and from what I understand their data are right-censored. Does this also apply to a logistic model, or only to a time-to-event model? If it does, how should I account for it?
iii) In the lowland class I have some species that are annual, so mortality after fruiting is not necessarily linked to environmental factors. I am currently trying to fit two models: one including only the data up to fruiting (but keeping all the species), and one with all the measurements but excluding the annuals. Is this a good procedure? Is there a better way, or a way to have a single model that accounts for this difference?
iv) Temperature varies with time (in autumn it decreases as winter approaches, while in spring it increases over time). Is this correlation a problem in a model that includes both time (longitudinal mixed model) and temperature?
Any other suggestions, ideas and / or references are welcome. Thanks in advance!
Hi Alessio,
I have never gone far enough into survival analysis to solve your problems clearly. For now, I can only recommend these papers:
*How to analyze seed germination data using statistical time-to-event analysis: non-parametric and semi-parametric methods
James N. McNair, Anusha Sunkara and Daniel Frobish
DOI: 10.1017/S0960258511000547, published online 07 February 2012
Cracking the case: Seed traits and phylogeny predict time to germination in prairie restoration species
Rebecca S. Barak, Taran M. Lichtenberger, Alyssa Wellman-Houde, Andrea T. Kramer and Daniel J. Larkin
DOI: 10.1002/ece3.4083
*The first one has really good supplemental material.
I hope it works out for you.
Best regards,
Alex.
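On question ii): in a discrete-time survival analysis, right censoring is handled by how the data are laid out rather than by a separate correction. Each plant contributes one row per check while it is still at risk, ending at the check where it is first found dead. Plants still alive at the last check simply stop contributing rows. A logistic (binomial) model or GLMM fitted to these rows then models the weekly hazard of death, and Temp enters as a time-varying covariate. The Python sketch below builds such rows. The input layout (ID mapped to (Week_field, Temp, Survival) tuples) and the function name are assumptions, not the poster's actual file format, and the sketch ignores that intervals spanning the snow gap are longer than one week.

```python
def to_person_period(weekly_records):
    """Convert weekly alive/dead checks into discrete-time survival rows.

    weekly_records -- dict: individual ID -> list of (week, temp, died) in week order,
                      where died is 1 from the first check at which the plant is dead.
    Each individual contributes one row per week at risk, up to and including the
    week it died; rows after death are dropped because the plant is no longer at risk.
    Individuals alive at their last check just stop contributing rows (right censoring).
    """
    rows = []
    for ind_id, checks in weekly_records.items():
        for week, temp, died in checks:
            rows.append({"ID": ind_id, "Week_field": week, "Temp": temp, "event": died})
            if died == 1:
                break  # no longer at risk after death
    return rows

if __name__ == "__main__":
    # Hypothetical records for two plants at the 600 m site
    records = {
        "600_1_1_1": [(1, 12.0, 0), (2, 10.5, 0), (3, 8.0, 1), (4, 7.5, 1)],  # dies in week 3
        "600_1_1_2": [(1, 12.0, 0), (2, 10.5, 0)],                            # censored after week 2
    }
    for row in to_person_period(records):
        print(row)
```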
• asked a question related to Survival Analysis
Question
Hello there,
For my research, I am looking at the effect of radiotherapy in the treatment of head and neck cancer. For this I conducted a retrospective study. In quite a lot of the cases, residual disease (local or regional) is detected immediately after completion of therapy. With my current coding, this produces a very steep drop at the start of my survival curves.
I currently use the date of the last radiation as t0 for my survival curves and code residual disease as a survival of 1 day.
I cannot really find good references on how to handle this. Should my t0 be earlier (e.g., the date of the first visit), should I leave residual disease out of the locoregional control analysis altogether, or should I use the date of the pathologically positive sample?
I am using Stata 15 for the analysis.
Kind regards
Thank you, Olga, that makes it clearer. Response would be defined as complete response; anything else I now code as a failure, but I would then have to exclude these cases from the further locoregional control analysis.
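A minimal Python sketch of the coding described above, with t0 kept at the last day of radiotherapy: patients without a complete response are reported as treatment failures and left out of the locoregional control analysis, so residual disease no longer appears as an artificial 1-day survival. All field names are hypothetical. In Stata the resulting time and event variables would then be declared with `stset`.

```python
from datetime import date

def lrc_analysis_set(patients, t0_field="last_rt_date"):
    """Build (time in days, event) pairs for locoregional control among complete responders.

    Hypothetical fields per patient: 'response' ('CR' or anything else),
    'last_rt_date', 'recurrence_date' (None if no locoregional failure)
    and 'last_followup_date'.
    """
    records = []
    for p in patients:
        if p["response"] != "CR":
            continue  # residual disease: counted as failure of response, not as LRC time
        start = p[t0_field]
        if p["recurrence_date"] is not None:
            records.append(((p["recurrence_date"] - start).days, 1))  # locoregional failure
        else:
            records.append(((p["last_followup_date"] - start).days, 0))  # censored
    return records

if __name__ == "__main__":
    patients = [  # hypothetical patients
        {"response": "CR", "last_rt_date": date(2019, 1, 1),
         "recurrence_date": date(2019, 4, 11), "last_followup_date": date(2019, 4, 11)},
        {"response": "residual", "last_rt_date": date(2019, 2, 1),
         "recurrence_date": None, "last_followup_date": date(2019, 3, 1)},
        {"response": "CR", "last_rt_date": date(2019, 1, 1),
         "recurrence_date": None, "last_followup_date": date(2020, 1, 1)},
    ]
    print(lrc_analysis_set(patients))
```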
• asked a question related to Survival Analysis
Question