Survival Analysis - Science topic
A class of statistical procedures for estimating the survival function (function of time, starting with a population 100% well at a given time and providing the percentage of the population still well at later times). The survival analysis is then used for making inferences about the effects of treatments, prognostic factors, exposures, and other covariates on the function.
Questions related to Survival Analysis
Please take a look at the attached file.
I irradiated cells using a fractionation regime of 3 x 1 Gy after exposure to a substance in different concentrations.
I made an XY table with the determined SFs and plotted a graph using the LQ-model.
The equation I used was Y=exp(-3*(A*3*X + B*3*X^2)). It is an adaptation of the provided equation Y=exp(-1*(A*X + B*X^2)) to account for the fractionation regime.
To determine the AUC I used the standard analysis tool that GraphPad provides.
Could someone tell me whether this is right or whether I made a mistake somewhere?
Thank you very much in advance!
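For reference on the fractionation question above: the linear-quadratic model for n equal fractions of dose d per fraction is usually written as SF = exp(-n*(alpha*d + beta*d^2)). The following is a minimal Python sketch of that textbook form only. The alpha and beta values are hypothetical and not taken from the post, and how the formula maps onto the X variable of the GraphPad fit (which the post does not fully specify) would need to be checked separately.

```python
import math

def lq_survival(dose_per_fraction, n_fractions, alpha, beta):
    """Surviving fraction under the linear-quadratic model for n equal fractions."""
    d = dose_per_fraction
    return math.exp(-n_fractions * (alpha * d + beta * d ** 2))

# Hypothetical parameters (Gy^-1 and Gy^-2), for illustration only
alpha, beta = 0.3, 0.03

sf_fractionated = lq_survival(1.0, 3, alpha, beta)  # 3 x 1 Gy
sf_single = lq_survival(3.0, 1, alpha, beta)        # 1 x 3 Gy, same total dose

print("3 x 1 Gy:", sf_fractionated)
print("1 x 3 Gy:", sf_single)
```

In this form the number of fractions multiplies the whole exponent once, and the quadratic term uses the dose per fraction, so splitting the same total dose into fractions gives a higher surviving fraction than a single exposure.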
Hi, I am currently conducting a survival study to investigate the role of several potential biomarkers as prognostic factors in a certain cancer. First, I performed Kaplan-Meier analysis for all the biomarkers and other relevant clinicopathologic data. However, only one biomarker fulfilled the proportional hazards criterion based on the Kaplan-Meier curves. The other biomarkers and clinicopathologic variables did not fulfill it.
I am wondering, do I still need to proceed to Cox regression analysis? Can I include the other biomarkers and relevant clinicopathologic data in the Cox regression, even though they did not fulfill the proportional hazards criterion in the Kaplan-Meier analysis? Thank you.
I am doing survival analysis, and I know that some of the variables with a significant impact on survival in univariate analysis are closely related to each other, as they have been calculated from one another. Can I include them all in a Cox PH model to see which of the variables is/are an independent risk factor?
I developed a probability-of-default model using the Cox PH approach with the survival package in R. I have panel data with time-varying covariates. I made predictions with the following code: predict(fit, newdata, type='survival'). However, the predicted survival probability is not decreasing over time for each individual (see picture).
I wonder whether the prediction is a marginal survival probability or a cumulative survival probability.
If it is a cumulative survival probability, why is it not decreasing over time?
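One possible explanation, offered as an assumption rather than a diagnosis: with data in (start, stop] format, a row-wise survival prediction may describe survival over that row's interval only, so it has no reason to decrease from row to row. A cumulative curve for one subject then comes from accumulating the hazard over that subject's rows. The Python sketch below illustrates the arithmetic with hypothetical numbers (the coefficient, baseline hazard increments and covariate values are all invented for illustration).

```python
import math

def interval_survival(baseline_hazard_increment, beta, x):
    """Survival over one (start, stop] interval: exp(-dH0 * exp(beta * x))."""
    return math.exp(-baseline_hazard_increment * math.exp(beta * x))

# Hypothetical values: one subject followed over three consecutive intervals
beta = 0.8                        # assumed Cox coefficient
dH0 = [0.05, 0.05, 0.05]          # assumed baseline cumulative-hazard increments
x_by_interval = [1.0, 2.0, 0.5]   # time-varying covariate, one value per interval

per_interval = [interval_survival(h, beta, x) for h, x in zip(dH0, x_by_interval)]

# Cumulative survival is the running product of the interval-specific values
cumulative = []
running = 1.0
for s in per_interval:
    running *= s
    cumulative.append(running)

print("per interval:", per_interval)  # may go up or down between rows
print("cumulative:  ", cumulative)    # never increases
```

If this is what is happening, multiplying the row-wise predictions within each ID (in time order) should give a monotone curve.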
In my study, I am using propensity score matching to balance the effects of covariates on the impact of prednisolone on death outcomes among COVID-19 patients. 92 covariates have been considered, including demographic factors, signs and symptoms, other drugs, laboratory tests, and vital indicators. A problem arises when I remove one of the variables (the ALT). This changes the final results significantly.
How can I decide whether I should remove that variable?
I have been performing survival analysis using Cox regression models, and I encountered a situation where, after adding a time-varying effect for a variable X (X*time; the variable violated the PH assumption), the added interaction with time was significant in the model, but the main effect of variable X was not, as illustrated below:
Model without interaction with time:
               coef  exp(coef)  se(coef)       z  Pr(>|z|)
factor(X)1   0.4633     1.5894    0.1625   2.852     0.004 **
Model with interaction between X and time:
                  coef  exp(coef)  se(coef)       z  Pr(>|z|)
factor(X)1     -0.3978     0.6718    0.4444  -0.895     0.371
tt(factor(X))   0.6230     1.8645    0.2816   2.212     0.027 *
In the study we are interested in the effect of the X variable on the survival outcome, and after inclusion of the time-varying effect X*time, I am no longer sure about the value of the variable X in describing the risk of the outcome, as the main effect is now not significant.
Is the significance of time-varying effect of variable X enough to assume that the variable X is significant for the outcome risk, even though the main effect is no longer significant in such scenario?
Or, do both of them, the main effect of X and the time-varying effect of X have to be significant in the model to be able to say that X is significant for the outcome?
Any help in interpreting these is very welcome.
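A small numerical illustration may help with reading such output. In a model with a time interaction, the log hazard ratio of X at time t is b_main + b_tt * g(t), so the "main effect" is only the log HR at the time where g(t) = 0. The Python sketch below uses the two coefficients quoted above; the transform g(t) = log(t) is an assumption, since the post does not show which function was passed to tt().

```python
import math

b_main = -0.3978  # coefficient of factor(X)1 in the model with the time interaction
b_tt = 0.6230     # coefficient of tt(factor(X))

def hazard_ratio(t, g=math.log):
    """HR of X at time t when the log HR is b_main + b_tt * g(t)."""
    return math.exp(b_main + b_tt * g(t))

# Time at which the HR equals 1 under the assumed g(t) = log(t)
t_cross = math.exp(-b_main / b_tt)

for t in (0.5, 1, 2, 5, 10):
    print(t, round(hazard_ratio(t), 3))
print("HR = 1 at t =", round(t_cross, 2))
```

Under that assumption the HR is below 1 early on and above 1 later, which is why the significance of the main-effect row alone says little about whether X matters overall.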
Hi there, I am using the survival package in RStudio with the following code to run a Kaplan-Meier analysis:
surv_object <- Surv(time = cabos$sg, event = cabos$ï..status)
survfit(Surv(time = cabos$sg, event = cabos$ï..status) ~ 1, data = cabos)
fit1 <- survfit(surv_object ~loc , data = cabos) #p=0.97
ggsurvplot(fit1, data = cabos, pval = TRUE,
           risk.table = TRUE, risk.table.col = "strata",
           risk.table.y.text.col = TRUE, risk.table.y.text = FALSE,
           xlab = "Months", ylab = "Overall survival",
           legend.labs = c("C500", "C501", "C502", "C503",
                           "C504", "C505", "C508", "C509"),
           title = "Location")
This resulted in a Kaplan-Meier Curve with a p value of 0.97.
Yesterday, I added some labels so I could translate it to English, but the p value changed to 0.87. No changes were made to the code or the dataset. This concerned me, so I ran the statistical analysis in SPSS and obtained the same result as my previous KM curve (p=0.97).
I have tried running the analysis without the translation, with and without converting my variables to factors, and running new code from scratch, but I can't reproduce the value obtained previously.
What could be the problem with RStudio?
Thanks in advance for your time.
I recently tried to plot a Kaplan-Meier curve with SPSS. Although I followed all the standard steps, the software seems to have a problem with plotting the curve and doesn't plot it at all (as shown in the picture). Does anybody know how I can solve this issue?
I'm currently working on survival analysis for gendered effects of executive exit. The aim is to investigate whether female leaders have higher chances of turnover compared to their male counterparts. I've run all of the basic analyses, but am now stuck on an issue of interaction in Cox models.
The issue: I'm trying to find out if any (or all) of the control variables in my Cox model have different effects by gender. For example: In my original models executive age is a control variable, but maybe the hazard of leaving is more related to age for women than for men. To do this, I wanted to run ALL of the control variables with an interaction term of gender. My questions:
1. Should I do this within the same model (e.g. fit1 <- coxph(Surv(Time, Censoring) ~ age:gender + nationality:gender + ....)) or in separate models for each interaction? What makes more sense here?
2. In both cases, results look something like the attached picture (the variable American measures whether a person is American (1) or not (0)).
Table shows coef, exp(coef)=HR, se(coef), z, Pr(>|z|)
How should I interpret this?
I apologize if my question is a bit too simple.
I'm currently working on a study of patients with myelodysplastic syndromes (MDS).
Most patients were followed up for a good period of time (the median was about 3 years),
but since MDS isn't that deadly a disease, most patients (thankfully) lived, so not many events occurred, which led to a lot of censoring.
In this case, is it suitable to do a survival analysis?
Some patients were also followed up for many years (10+), but they aren't the majority.
In this case, could I cut the data and interpret it at 3 years, for example, stating that the survival at 3 years was 70%?
Or how else could I deal with the censoring?
(About 30% of patients died in the study.)
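For readers wondering how censoring enters a statement like "survival at 3 years": the Kaplan-Meier estimator keeps censored patients in the risk set until their last follow-up, so heavy censoring is handled rather than discarded. A minimal Python sketch with made-up follow-up data (none of the numbers come from the study described above):

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # group all subjects that share this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s

# Hypothetical follow-up in years: 3 deaths among 10 patients, the rest censored
times = [0.5, 1.2, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 8.0, 11.0]
events = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

curve = kaplan_meier(times, events)
print(curve)
print("S(3 years) =", survival_at(curve, 3.0))
```

Reading the curve at a fixed time such as 3 years does not require truncating the data; the long follow-up of a few patients simply contributes to the later part of the curve.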
The main independent variable is A, but I want to adjust the model with a covariate B. When I check the PH assumption, A holds but B does not. Do I need to run a Proportional odds or an AFT model for that?
Hi, I am a beginner in the field of cancer genomics. I am reading gene expression profiling papers in which researchers classify cancer samples into two groups based on the expression of a group of genes, for example a "High group" and a "Low group", and do survival analysis. They then associate these groups with other molecular and clinical parameters, for example serum B2M levels, serum creatinine levels, 17p del, or trisomy 3. Some researchers classify the cancer samples into 10 groups. Now, if I am proposing a cancer classification scheme and presenting a survival model based on 2 groups or 10 groups, how should I assess the predictive power of my proposed classification model, and how do I compare its predictive power with that of other survival models? Thank you in advance.
I am doing survival analysis to predict the risk of a specific complication after surgery. We did a KM analysis, and the risk was clearly higher immediately after surgery and then flattened after 120-160 days. I then tried a parametric survival analysis, using the lognormal distribution as an accelerated failure time model. So my questions:
- Is it appropriate to use the parametric model in this condition?
- How do I get the hazard ratio from the parametric survival analysis? I was able to get estimates, but no clear HR.
- How do I interpret the hazard vs. time plot? Its shape shows nicely that the hazard is decreasing over time, but the hazard value on the Y-axis ranged from 0.00015 to 0, and I don't know how to interpret it.
- Can I get a cut-off point in time at which the hazard changes significantly?
Thank you very much,
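As background for the questions above: a lognormal AFT model does not have proportional hazards, so it yields a time ratio exp(coef) rather than a single hazard ratio, and the hazard itself is a rate per unit of time (per day here), which is why its values can look very small. The Python sketch below evaluates the lognormal hazard h(t) = f(t) / S(t) for two groups. The values of mu, sigma and the coefficient are hypothetical and are not estimates from the study.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()

def lognormal_hazard(t, mu, sigma):
    """Hazard h(t) = f(t) / S(t) when log(T) ~ Normal(mu, sigma^2)."""
    z = (math.log(t) - mu) / sigma
    density = _std_normal.pdf(z) / (sigma * t)
    survival = 1.0 - _std_normal.cdf(z)
    return density / survival

# Hypothetical AFT fit (time in days); none of these values come from the post
mu, sigma = 6.0, 2.0
coef_group = 0.5                   # assumed AFT coefficient for a binary covariate
time_ratio = math.exp(coef_group)  # event times are stretched by this factor

days = (10, 30, 120, 365)
ratios = []
for day in days:
    h_ref = lognormal_hazard(day, mu, sigma)
    h_grp = lognormal_hazard(day, mu + coef_group, sigma)
    ratios.append(h_grp / h_ref)
    print(day, h_ref, h_grp, h_grp / h_ref)

print("time ratio:", time_ratio)
```

The ratio of the two hazards changes with time in this sketch, which is the reason no single HR comes out of such a model.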
Hope you all are doing well.
I am trying my hand at the Fine-Gray model to calculate subdistribution hazard ratios for cardiovascular death using the NHANES dataset in R. It was not that difficult to calculate the cumulative incidence function (CIF), but since it is a nationally representative sample, I am having issues accounting for the survey weights and strata.
I tried different approaches using the survival and 'cmprsk' packages, but neither has a provision to incorporate weights and strata in the regression model.
Any suggestions would be appreciated.
This is my first attempt at running a survival analysis. I have data on patients with end-stage kidney disease (ESKD). I would like to estimate survival probability by treatment method (e.g. conservative therapy vs dialysis), duration of diagnosis, etc.
I'm using a cox proportional hazards regression to analyze latency data, i.e. time until behavior x occurs. The model I'm running is fitted with the "coxme" package in R (which is a wrapper to the "survival" package if I'm not mistaken) because I wanted to fit a mixed model for the survival analysis. The model converges without problems, however, when I'm trying to test the assumptions of the model I get a fatal error, and my R session crashes. Specifically, when I try to test the "proportional hazard" assumption of the model using the "cox.zph" call, the R session crashes and I don't know why, because the function is supposed to work with both a mixed model (coxme) and a standard, non-mixed, model (which is a "coxph" object in the package terminology). I've tried the non-mixed version of my model and it provides the desired output, but it won't work for my intended model. I've also tried updating RStudio, so I have the latest version, but it didn't help. Finally, I've tried to manually increase the working memory dedicated to RStudio, in case the function was memory demanding, but it didn't help. Looking around at different forums has provided no answers either, both with general search parameters like "causes for R session crash" and more specific, like "cox.zph cause R session crash", but I could not find any help.
Has anyone experienced this error? Were you able to solve it, and if so, how?
I appreciate any advice I can get on this issue.
I have encountered a problem that I would like to ask for help with. My purpose is to use a competing risk (cr) model to train on survival data that contain multiple records per patient. I tried a number of R packages and functions that are able to train a mixed-effect cr model with a cluster(ID) term (e.g. patient ID) added.
So far, the R functions I tried and the results are: (1) FGR() worked fine for a fixed-effect-only cr model but cannot train a mixed-effect cr model; (2) CSC() is able to train a mixed-effect model but is unable to predict the risk probability using predict(); (3) comp.risk() worked fine for training and predicting with a mixed-effect cr model, but it doesn't allow c-index calculation using cindex() or concordance.index().
Now the question is: how can I calculate the c-index for my validation set from the results of comp.risk() and predict()? I have a fitted model and predicted risks at certain times (for instance, 3, 5 and 10 years). Do I need predicted risks at all possible times in order to calculate the c-index? Is there a better way or a simple solution for this?
Thank you all very much for your help.
Lifetime data with bathtub-shaped hazard rate function are often encountered in survival analysis and the classical lifetime distributions mostly used cannot adequately describe the data. Is there any parametric survival model suitable for modelling such type of data?
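To make the question concrete: one family sometimes used for bathtub shapes is the additive Weibull model, whose hazard is the sum of a decreasing Weibull hazard (shape below 1) and an increasing one (shape above 1). The Python sketch below is only an illustration of that idea; the parameter values are hypothetical.

```python
def additive_weibull_hazard(t, a, b, c, d):
    """Hazard h(t) = a*b*t^(b-1) + c*d*t^(d-1); bathtub-shaped when b < 1 < d."""
    return a * b * t ** (b - 1) + c * d * t ** (d - 1)

def additive_weibull_survival(t, a, b, c, d):
    """Survival S(t) = exp(-(a*t^b + c*t^d)), from the cumulative hazard."""
    import math
    return math.exp(-(a * t ** b + c * t ** d))

# Hypothetical parameters: early-failure component (b < 1) plus wear-out component (d > 1)
params = dict(a=0.5, b=0.5, c=0.01, d=3.0)

grid = [0.1, 0.5, 1.0, 1.3, 2.0, 3.0, 5.0]
hazards = [additive_weibull_hazard(t, **params) for t in grid]
for t, h in zip(grid, hazards):
    print(t, round(h, 4))
```

The printed hazard falls at first, bottoms out, and then rises again, which is the bathtub pattern described above.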
I'm trying to develop a prediction model for a type of cancer. My end-point is cancer specific survival. I have around 180 events, and an event-per-parameter analysis gave that I can consider about 9-10 parameters to include in the model.
Now, I have dozens of candidate variables, many of which are collinear and some are more truly independent from each other. Some variables are well known risk factors for cancer death, some are more novel but still merit more research.
In my field there is a well-established risk classification system that uses cancer histological grade, stage, lymph node status and tumor size. This classifies cases into 4 risk categories, with increasing risk of death. These four variables have not previously been included in a survival model, and there is no published survival regression formula/function with beta-coefficients and intercept for a model with these variables included. Instead, the four risk categories are based mostly on clinical reasoning, expert opinion, experience, and the survival rates of the four groups.
My question is whether, when developing my model, I should include these four variables as separate independent variables and add another 4-5 candidate variables that I want to investigate, or whether I can/should include them as a single composite four-tier categorical variable and thus save degrees of freedom to include more candidate variables. What are the pros and cons of each approach?
I am currently wondering which survival analysis to use for my data. My data consist of one control group and three exposure groups. One of the exposure groups is known to have a difference in toxicity over time (in my case, taken into account by making a gradient over distance). Therefore it is necessary to divide the groups into three stages, making 12 survival curves (with eight individuals per stage) per experiment. The data are only right-censored, but the amount of censoring varies a lot between the groups. That is, the control and one of the exposure groups have almost no events (three of the six stages in the two groups have only censored data; the rest of the stages have only one or two deaths). The survival curves of two of the exposure groups cross.
Also, many of the survival curves within the individual groups (two of the exposure groups and the control) cross, as these groups do not have a difference in toxicity over time.
What would be a suitable survival analysis for such an experiment? Moreover, how problematic is the crossing of the curves if it is expected?
I'm trying to work on a meta-analysis of survival data (PFS, OS).
I'm aware that the most common effect measure for the analysis of time-to-event data is the hazard ratio,
but some articles provide median survival with a chi-square value (for example: χ2 = 4.701, P = 0.030 as the result of a log-rank test).
Would there be some way to integrate this into meta-analysis of HR?
Thanks in advance
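One indirect approach discussed in the meta-analysis literature (for example by Parmar et al., 1998, and Tierney et al., 2007) approximates the log hazard ratio from the log-rank statistic and the number of events. The Python sketch below shows that arithmetic with the chi-square value quoted above. The total number of events (80) and the 1:1 allocation are hypothetical assumptions, and the direction of the effect has to be taken from the reported medians, since the chi-square value carries no sign.

```python
import math
from statistics import NormalDist

def log_hr_from_logrank(chi2, total_events, favours_treatment=True):
    """Approximate log HR and its SE from a log-rank chi-square (1 df).

    Assumes roughly 1:1 allocation, so the variance of the log-rank
    statistic is taken as V = total_events / 4.
    """
    z = math.sqrt(chi2)
    variance = total_events / 4.0
    log_hr = z / math.sqrt(variance)
    if favours_treatment:
        log_hr = -log_hr
    se = 1.0 / math.sqrt(variance)
    return log_hr, se

chi2 = 4.701  # value reported in the example above
p_two_sided = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

# total_events = 80 is a made-up number; the real count must come from the article
log_hr, se = log_hr_from_logrank(chi2, total_events=80)

print("p from chi-square:", round(p_two_sided, 3))
print("HR:", round(math.exp(log_hr), 3), "SE(log HR):", round(se, 4))
```

The resulting log HR and standard error could then enter a generic inverse-variance meta-analysis alongside directly reported hazard ratios, provided the number of events is available for each such study.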
I built a Cox proportional hazards model with time-dependent covariates. When I predicted the survival probability, it wasn't monotonically decreasing for each ID. What is the problem, and how can I handle it?
I have a methodological question regarding time-to-event analysis (TTE).
I want to compare several medication therapies (4 groups with different drugs) in a TTE analysis in a cohort study. My colleague wants me to include only those subjects who experienced the event/endpoint. The group assignment should be determined at the time of the event: if a subject experiences the event, I have to look up their current medication and use the start of that medication to define the person-time until the event. The aim is to compare the time to event among those medication groups. It is not a drug efficacy study.
To be honest, I'm not sure whether the approach described above is methodologically correct. In my opinion there will be no censoring, as only patients with events would be included and then simply compared with respect to their time to event under a certain medication.
I cannot find much information in the literature. All the resources I reviewed only addressed the type of censoring, not how to deal with the approach above.
I would be very grateful if somebody could give advice on this issue.
Thanks a lot,
I get the following error when drawing the survival curve in R. Has anyone had a similar problem, or can anyone suggest a solution?
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 232, 0, 464
Package: survival. Function: ggsurvplot (from survminer)
I’m conducting a meta-analysis for my dissertation and have issues running my data. It uses median overall survival (months) and does not have usable controls. I’m compensating for that by using a second treatment method for comparison. I’ve noticed some studies use historic controls and form hazard ratios from them. Is it possible to treat the secondary treatment as a historic control and form hazard ratios across studies?
Otherwise single arm survival studies are awful to try and run analysis on. (Oncology is a pain).
First, please see the attachment.
This experiment was performed over a period of 7 days (n=100). After 7 days, we found a different survival rate (%) for each treatment. Now I want to visualize this result with Cox proportional hazards analysis or Kaplan-Meier survival curves in SPSS, where we want to report the cumulative hazard (OR with a 95% CI) and the time period assessed for survival as part of the analysis.
How can I analyze this result?
I am currently faced with the challenge of measuring the integration speed of an acquisition or a merger with secondary data. Usually companies do not communicate when a target has been integrated successfully. They only announce, for instance, that an M&A deal has been signed (this would be the starting date and is easy to find online). However, how can I find out when the company was fully integrated? I am focusing on Dutch M&A deals, and it seems that Dutch companies do not communicate much about this apart from when a deal has been signed. Sometimes it is possible to find out when a target ceased to exist, but this too is quite difficult. I saw that most papers used surveys, but given my time constraints it is too risky to send out surveys to Dutch corporate M&A teams because of the high likelihood of a low response rate. I also saw one paper that used survival analysis. However, I could not work out from that paper how they specifically calculated integration speed for each deal. If anyone has an idea or knows a paper that has done this with secondary data, I would appreciate any help.
I am conducting a meta-analysis where the only data I have are survival plots. I am stuck on performing the meta-analysis because there is no control group in the studies. How should I perform a pooled survival analysis on them? Are there any studies that I could use for reference?
I have a sequentially measured, time-series Electronic Health Record patient dataset. The study spans 7 years; each patient's blood pressure, HbA1c, LDL, Hb, baseline age, etc. were measured (as independent variables for modeling) every 4 months, and at the same time each patient was classified as healthy/not healthy (the output/dependent variable). Before modeling, we assigned the output of each patient as only 0/1 (0 = if the patient was assigned 0 at least once, 1 = otherwise). So we have time-series measures (HBA, fpg, ...) and non-time-series measures (race, baseline_age) as independent variables, but only one output (0/1) for each patient. I want to model these data. Some methods are used for this kind of dataset, such as using baseline values or using the mean value of each independent variable. In addition to these two methods, I want to use time-series analysis, but I am not sure which method to use. It looks like a survival analysis problem, but it is not, since the study didn't end when we saw a 0 value. You can see a visualization of the data structure below. Thanks in advance for all your responses.
Is someone familiar with a way to plot survival curves for a model created with the coxme function?
(i.e. ggsurvplot but for coxme objects instead of coxph objects)
I am relatively new to survival analysis so correct me if I am misunderstanding something. I know it is more complex because the "frailty" (i.e. the random effect) modifies the baseline hazard function, but is there a way to for example produce a "predicted" survival curve?
We are trying to find if there is an association between postoperative pulmonary complications (PPCs) and overall survival in a cohort of 417 patients. In Kaplan-Meier there is a significant difference in overall survival between patients with and without PPCs. After testing the proportional hazards assumption in cox regression (both through visual analysis and through log minus log plot) we found that our data failed to meet the assumptions. The way I interpret this, it means that the hazard of dying due to a postoperative pulmonary complication varies over time? I'm trying to figure out how to perform a survival analysis now that I can't use the standard cox regression.
From what I understand I could use time dependent coefficients (the same as variables in this example?) but I don't really understand what is meant by that or how I would do it in SPSS. Does it mean I turn my PPCs variable into a time dependent variable and then run the cox regression analysis the way I would if my data would have met the assumptions or how do I do it?
I would be really thankful for guidance or corrections if I have misunderstood something! I'm a PhD student and I don't have much experience in statistics so explain it to me like I'm 5 (a smart 5-year-old!)
I have data on 40 eyes from 20 subjects and plan to draw a survival curve (KM curve). The question is how to cluster the eyes. I tried using the same ID for both eyes of a subject. However, for a few subjects the begin time is 0 (i.e. time0=0) and the end time is, for example, 2 months (i.e. the event happened at month 2). When running the command below,
stset time, fail(outcome) id(ID)
it excludes the observations for subjects whose two eyes have the same start time and end time. Is there an option to include both eyes with the same study time while still clustering by subject?
I would like to know whether it is possible to model competing events while including the type of observation period as a covariate? For example, if I wanted to model competing events in two different tasks, one that lasted 3 mins and the other 5 mins. These are two different observation periods, but can events from both be included in the same analysis?
For a prognostically relevant gene (HR<1 or HR>1 with p<0.05) in terms of survival, is it necessary that the overall survival time and gene expression have a good positive/negative correlation?
We are using TCGA RNA-seq data and clinical information, but what we observe is a poor (Pearson) correlation and/or a non-significant p-value for genes with a significant HR. We have also tried normalising the data and using Spearman correlation.
I am using survival analysis for repeated events and I want to see if the effect of the time-dependent covariate differs by group.
I have data from an experiment where we looked at the time-to-death of pseudoscorpions under three treatment conditions: control, heat, and submersion underwater. Both the control and heat groups were checked daily and those that survived were right-censored. However, the pseudoscorpions in the underwater group were unresponsive while submerged and would only become active again when they were removed from the water and allowed to dry. Therefore, we removed 10 individuals per day and checked how many of the 10 were dead. All individuals removed from the water on a given day were also removed from the study on that day i.e. we did not re-submerge individuals that were observed to be alive after they were removed from the water the first time. The underwater group is therefore left-censored. We have run Kaplan-Meier curves for all three groups and the underwater group has a much steeper curve and non-overlapping 95% confidence intervals compared to the control and heat groups.
Is there a way to further analyze all three groups in one model given that one level of the treatment is left-censored and the other two levels are both right-censored? Can a Cox regression be run on the left-censored group by itself to produce a hazard rate with 95% CI for the rate? I am a biologist so try to make your answer intelligible to a statistical novice.
I fitted a Cox proportional hazard model and checked the proportionality assumption using R's cox.zph function, with the following output:
             chisq  df      p
Var1        0.0324   1  0.857
Var2        0.1972   1  0.657
log(var3)   4.1552   1  0.042
Var4        4.6903   1  0.030
Var5        0.6472   1  0.421
Var6        1.2257   1  0.268
Var7        4.9311   1  0.026
Var8        0.3684   1  0.544
Var9        2.0905   1  0.148
Var10       0.0319   1  0.858
Var11       4.0771   1  0.043
GLOBAL     14.2625  11  0.219
In the study, Variables 1 and 2 are the ones I'm actually interested in, whereas the others are only controls. As you can see, the PH assumption seems to hold for these two covariates, but most prominently not for Var3. Can I still interpret my findings on Var1 and Var2 and, when talking about Var3, add that its coefficient displays the average effect over the study time? Or will my coefficients for Var1 and Var2 be biased/incorrect?
Thanks in advance!
I fitted a Cox PH model, and upon examination of the Schoenfeld residuals it became apparent that the proportionality assumption was violated by several variables. Following Box-Steffensmeier & Jones (2004), I included interactions with time for these covariates. However, I'm not sure how to interpret this. It's obvious that this indicates time-dependency of the covariate, in the sense that the effect inflates/deflates over some function of time, but I work with sociological data and my theory indicates no time-dependency of the effects in either direction (it would also not make sense in any way). So, if I understand this correctly, should I consider the time-dependency to come from some kind of unobserved heterogeneity? Due to the nature of the data, I also cannot implement frailty or fixed effects to account for this. So how do I interpret a coefficient that increases/decreases as time progresses, given that the theory does not indicate this?
I'm currently working with event history data studying how long after implementation countries abolish certain policies. Regarding the policies I also have an index on how far the countries went with their policies ranging from 0 to 100.
I wanted to include this index in order to control for the countries' point of departure. However, the coefficient violates the proportionality assumption.
Can I stratify on this continuous index variable? As I understand it, this would allow every country to have a different baseline hazard with respect to its point of departure. When I tried this with the data, it didn't produce an error.
Could anyone tell me whether I can trust these results or whether I have to categorize the variable first?
I want to perform a GEE analysis with repeated-measures factors and a survival analysis (like Cox regression but with repeatedly measured variables) in SPSS. How do I proceed with this dataset? I have transformed it into long format, but I can't get the analysis to work properly.
Is multivariable logistic regression instead of Cox proportional hazards model acceptable for survival analysis with 2 year follow-up? Some papers are based on such methodology. I wonder whether this makes sense.
We're working on meta-analysis of studies that provide median OS data. We conducted other meta-analyses using objective response rates (ORR) and used the inverse-variance method. But we're not sure what method to use for OS. I'm comfortable using R on packages like "meta" or "metafor" and I can use a new package if necessary.
The second problem is that some studies did not provide a confidence interval. I read a systematic review (attached below) that dealt with this problem by "repeating the analysis 1000 times on a bootstrap sample", and I need some explanation of how we can do this.
Thanks a lot!
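Without access to the review, it is hard to say exactly what was bootstrapped, so the following is only one possible reading: resample the included studies with replacement, recompute the pooled estimate each time, and take percentiles of the 1000 results as an interval. In this Python sketch the pooling rule (a sample-size-weighted mean of the study medians) and all study values are hypothetical assumptions.

```python
import random

def pooled_median(studies):
    """Sample-size-weighted mean of study medians (a crude pooling rule, assumed here)."""
    total_n = sum(n for _, n in studies)
    return sum(median * n for median, n in studies) / total_n

def bootstrap_ci(studies, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole studies with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(studies) for _ in studies]
        estimates.append(pooled_median(resample))
    estimates.sort()
    lower_idx = int((1 - level) / 2 * n_boot)
    upper_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lower_idx], estimates[upper_idx], estimates

# Hypothetical studies: (median OS in months, sample size)
studies = [(10.2, 45), (12.5, 60), (8.9, 30), (14.1, 80), (11.0, 52)]

estimate = pooled_median(studies)
lower, upper, boot = bootstrap_ci(studies)
print("pooled median OS:", round(estimate, 2))
print("bootstrap 95% interval:", round(lower, 2), "to", round(upper, 2))
```

With only a handful of studies such an interval is unstable, so it is worth checking the attached review for the exact resampling unit and pooling rule before copying this approach.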
For instance, I have a treatment modality and I want to analyze overall and progression-free survival rates in 2 groups of patients: young and old. Each of these two groups has two corresponding subgroups of patients according to a disease-related factor. Is Cox regression analysis appropriate in this case?
TCGA brings data regarding head and neck cancer...does anyone know if it's possible to analyze subtypes separately such as tongue or oropharynx?
My second question is: is there any simple tutorial for beginners on how to run survival analysis using this dataset? I would appreciate any tips and thoughts.
For example, I have a dataset with HE4 as a numerical variable, and I want to use it in a survival analysis for ovarian cancer prognosis. How can I find the cut-off value of HE4 that is best for predicting the survival outcome?
Hello dear professors and colleagues,
I have a data set to which I want to apply three methods: logistic regression, linear regression and survival models. Of course, each method focuses on a different part of the information in the data set. My questions are:
- Is it coherent to incorporate all of them in the same study?
- Are they complementary methods?
- After estimation, can we select the best method, and if so, which criterion should we use?
I am performing a univariate survival analysis (Cox PH regression) in which I am trying to find the association between the expression of some genes and patient survival in a specific cancer. For some gene 'X' I am getting a high HR (~5), but the p-value is not significant (p=0.19). What does this imply?
I have many similar cases
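A quick way to see what such a result implies is to back out the approximate standard error and confidence interval from the HR and the p-value, assuming the p-value comes from a Wald test on the log HR (an assumption, since the post does not say which test was used). A Python sketch using the two numbers quoted above:

```python
import math
from statistics import NormalDist

def ci_from_hr_and_p(hr, p_two_sided, level=0.95):
    """Approximate SE of log(HR) and a Wald CI, assuming a two-sided Wald p-value."""
    normal = NormalDist()
    z_observed = normal.inv_cdf(1 - p_two_sided / 2)
    se = abs(math.log(hr)) / z_observed
    z_critical = normal.inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(math.log(hr) - z_critical * se)
    upper = math.exp(math.log(hr) + z_critical * se)
    return se, lower, upper

se, lower, upper = ci_from_hr_and_p(hr=5.0, p_two_sided=0.19)
print("SE(log HR):", round(se, 3))
print("approx. 95% CI for HR:", round(lower, 2), "to", round(upper, 2))
```

Under that assumption the interval is very wide and includes 1, which points to an imprecise estimate (for example few events or a skewed expression distribution) rather than to a reliably large effect.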
I've run a Cox proportional hazards model for survival analysis in a cohort of pancreatic cancer patients with SPSS v. 25 and I want to
1. compare the accuracy (with Harrell's C) of my model with that of classical staging
2. measure the C of my model after bootstrapping
I've tried the macro available on the IBM website, but it does not work (many errors).
Can anyone help?
I am doing a survival analysis of insects to various sub-zero temperatures where I want to correct mortality in my treatment groups by mortality in my control groups (not exposed to low temperatures). I have corrected mortality in all of my treatment groups to the control using the Henderson-Tilton formula (as opposed to Abbott's because some of my groups have unequal numbers of individuals) where:
Corrected % Mortality = (1-((n in control before treatment*n in treatment group after treatment)/(n in control after treatment*n in treatment group before treatment)))*100
I next want to analyze the data using probit regression, so I need to convert these corrected %s back into counts of binary 1s or 0s. The problem is that most of my "normalized count data" are not whole numbers (eg. in one treatment group I have 0.27 dead and 9.729 alive). The only way I can think of to correct this issue is by rounding to the nearest whole number, so in my example I would have 0 dead and 10 alive. Is this consistent with the standard of how this analysis is done? Am I missing something here? Any advice would be greatly appreciated!!
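The Henderson-Tilton correction quoted above is easy to compute directly, which also makes it easy to check the corrected percentages before deciding how to handle the non-integer counts. A minimal Python sketch; here n is the number of live individuals, and the example counts are hypothetical.

```python
def henderson_tilton(control_before, control_after, treated_before, treated_after):
    """Corrected % mortality; all arguments are counts of live individuals."""
    ratio = (control_before * treated_after) / (control_after * treated_before)
    return (1 - ratio) * 100

# Hypothetical counts: 20 -> 18 alive in the control, 10 -> 4 alive in the treatment
corrected = henderson_tilton(20, 18, 10, 4)
print("corrected mortality (%):", round(corrected, 2))

# With no control mortality the correction reduces to the raw treatment mortality
uncorrected = henderson_tilton(20, 20, 10, 4)
print("no control mortality (%):", round(uncorrected, 2))
```

The function only reproduces the formula as written in the post; it does not address whether rounding the corrected counts is appropriate for the probit step.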
I'm doing survival analysis using a Cox regression model. To that end I have two different datasets, one for training and one for testing. Once I fit the Cox regression on the training dataset, I'll have a set of coefficients, each related to the corresponding feature, and a C-index to report. When I want to test my model on the test dataset, should I use the same coefficients and compute the C-index, or should I fit the Cox regression model again on the test dataset and compute the C-index?
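For external validation the usual practice is to keep the training coefficients fixed and only score the test set. The sketch below illustrates that option with a plain-Python Harrell's C. The coefficients and the test data are hypothetical, and tied event times are simply skipped, which is a simplification.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk subject fails first."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

def linear_predictor(rows, coefs):
    """Risk score x'beta for each row, using coefficients that are NOT refit."""
    return [sum(b * x for b, x in zip(coefs, row)) for row in rows]

# Hypothetical coefficients taken from the training fit (frozen)
train_coefs = [0.8, -0.3]
# Hypothetical test set: covariates, follow-up times, event indicators (1 = event)
test_X = [[1.0, 0.2], [0.1, 1.5], [0.7, 0.3], [1.0, 0.0]]
test_time = [2.0, 9.0, 4.0, 6.0]
test_event = [1, 0, 1, 1]

test_c = harrell_c(test_time, test_event, linear_predictor(test_X, train_coefs))
print(f"test-set C-index with training coefficients: {test_c:.3f}")
```

Refitting on the test set would instead measure how well a new model fits those data, not how well the trained model generalises.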
Hi, I'm very new to survival analysis. I am trying to correlate gene expression level with survival and prognosis for patients with lung cancer, and I want to run a Cox regression analysis on it. However, most of the examples I've encountered so far are based on discrete covariates such as sex. I know we can analyze a continuous covariate using the coxph function, but I can't see what the actual plot would look like for a continuous variable. For discrete variables, the number of curves corresponds to the number of levels of the variable, e.g. for gender you'd have two lines on the graph. But what about a continuous covariate? Should we first turn the continuous covariate into a discrete one by splitting it at quantiles? Otherwise I don't know how to visualize the graph. What are the pros and cons of doing so?
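One option that avoids dichotomising is to keep the covariate continuous in the model and plot model-predicted curves at a few representative values (for instance the quartiles of expression). Under the Cox model the predicted curve is S(t | x) = S0(t)^exp(beta * (x - x_ref)). The sketch below assumes a baseline survival curve and a coefficient are already available from a fitted model; the numbers used here are hypothetical.

```python
from math import exp

def predicted_survival(baseline_survival, beta, x, x_ref=0.0):
    """S(t | x) = S0(t) ** exp(beta * (x - x_ref)) under the Cox model."""
    hazard_ratio = exp(beta * (x - x_ref))
    return [s0 ** hazard_ratio for s0 in baseline_survival]

# Hypothetical baseline survival at four time points and a hypothetical coefficient
baseline = [1.0, 0.9, 0.75, 0.6]
beta = 0.5

# One predicted curve per chosen covariate value (e.g. low, reference, high expression)
curves = {x: predicted_survival(baseline, beta, x) for x in (-1.0, 0.0, 1.0)}
for x, curve in curves.items():
    print(x, [round(s, 3) for s in curve])
```

Each list can then be drawn as one line on the plot, so a continuous covariate still gives a small, readable set of curves.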
I am facing a problem when I try to calculate the HR from two different survival plots. In the first plot, the experimental group's curve is closer to the placebo group's curve than in the second plot, even though the first plot's HR is smaller than the second plot's. I wonder what the possible reasons are. Can you help me solve this problem? Thanks.
In a cross-sectional study, if I am collecting different outcomes in terms of complications of a therapy (e.g. FDP, RDP or CD) and recording their follow-up complaints and findings, can I also use Kaplan-Meier plots to predict the survival rate?
I have done a survival analysis and reported Kaplan-Meier curves with an at-risk table.
I have attached the at-risk table to this message.
However, when I calculate the number of events over the 5-year period,
I find that the number of events in the no-induction group is 38 out of 740 patients,
and in the induction group it is 110 out of 2860 patients.
My questions are:
- Why is there a discrepancy between the number of events in each group and the at-risk table?
- How can I calculate the 5-year incidence in each group? Is there a way to calculate this in Stata?
Can I just divide the number of events in 5 years by the total number of patients in the group to get the incidence?
N.B. There is no loss to follow-up in either group.
Thank you very much for your support
Looking forward to hearing back from you
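The crude calculation described in the question can be sketched as follows, using the event counts quoted above. It assumes that every patient was followed for the full 5 years. If some patients entered the study late and have shorter follow-up, they count as censored even without loss to follow-up, and the crude proportion will then differ from 1 minus the Kaplan-Meier estimate.

```python
def crude_cumulative_incidence(events, n_total):
    """Events divided by group size; equals 1 - KM only with complete follow-up to the horizon."""
    return events / n_total

no_induction = crude_cumulative_incidence(38, 740)
induction = crude_cumulative_incidence(110, 2860)
print(f"no induction: {no_induction:.1%}, induction: {induction:.1%}")
# prints "no induction: 5.1%, induction: 3.8%"
```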
Can anyone help with the procedure and steps to follow in a survival analysis of two different drug regimens?
I am currently working on my final-year undergraduate research project, "A comparison between Cox Proportional Hazard and Accelerated Failure Time Models: an application to Beta-Thalassemia", and would like to ask whether there are any datasets suitable for the analysis.
I want to use Cox PH regression with 24 covariates, but found that not all 24 covariates meet the PH assumption (10 actually violate it). I think it is common for PH to be violated when many covariates are included. But in the related papers I read, some just mention that the variable of interest did not violate the PH assumption and then use the Cox PH model.
So my questions are:
1. Can we still use the standard Cox PH model if the exposure variable of interest doesn't violate the PH assumption while many other covariates do?
2. For question 1, I don't think so, so I chose weighted Cox (too many covariates violate PH, so stratified Cox is not feasible) and accelerated failure time (AFT) models, and I will try to compare the two. Is this the right strategy?
3. For competing risk analysis (cause-specific hazard and Fine-Gray model), I think it is still based on the PH assumption. Does anyone know what kind of competing risk analysis I could use that corresponds to weighted Cox or AFT?
I tried to search for the answers online, but results were very limited.
Looking forward to any feedback and suggestions!
Thanks very much.
I've been reading up on parametric survival analysis, especially on accelerated failure time models, and I am having trouble wrapping my head around the family of distributions.
I am aware that there are the exponential, Weibull, log-normal and log-logistic distributions. But what I can't seem to find a clear and consistent answer on is which quantity is actually assumed to follow one of those distributions. Is it the survival time T, the log survival time ln(T), the hazard function h(t), the survival function S(t), or the residuals ε?
Thanks in advance.
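In the usual AFT formulation, ln(T) = μ + σ·ε, the named distribution refers to the survival time T, and the error ε then has the matching distribution on the log scale (Weibull T pairs with an extreme value ε, log-normal T with a normal ε, log-logistic T with a logistic ε). The sketch below checks the Weibull case numerically. The shape and scale values are hypothetical, and the mapping μ = ln(scale), σ = 1/shape is the standard one for this parameterisation.

```python
from math import log
import random

def weibull_time(u, shape, scale):
    """Inverse-transform sample of a Weibull survival time from a uniform draw u in (0, 1)."""
    return scale * (-log(u)) ** (1.0 / shape)

random.seed(0)
shape, scale = 1.5, 10.0               # hypothetical Weibull parameters for T
mu, sigma = log(scale), 1.0 / shape    # AFT location and scale on the log-time axis

for _ in range(3):
    u = random.random()
    t = weibull_time(u, shape, scale)
    eps = log(-log(u))                 # standard minimum extreme value (Gumbel) error
    # Both columns agree: ln(T) for Weibull T equals mu + sigma * eps
    print(round(log(t), 6), round(mu + sigma * eps, 6))
```

The hazard h(t) and survival function S(t) are not themselves "distributed" as anything; they are functions derived from the distribution of T.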
We are doing observational research using real-world data, and we plan to compare the results with trial results. The primary analysis is a survival analysis, and the KM curve has been generated. How can I add the KM curve from the clinical trial to the KM curve plot I generated? I attached an example I found in the literature. Thank you!
I am new to survival analysis and need advice on how to correctly analyze some data (with R).
Along an elevation gradient, we have 5 different sites located at 5 different altitudes (600, 1000, 1400, 1600, 2000 m asl). At each site we have 30 different species (belonging to the Brassicaceae family) coming from 3 different altitudes (high-, mid- and low- elevation species). Each species is represented by 20 individuals (10 maternal lines (5 for each population) x 2 replicates).
The plants were germinated in controlled conditions (about 3 weeks before being moved to the field) and moved to the different sites when the predicted environmental temperature had similar values (so at 2000 m the plants were placed in August, at 600 m in October).
For each individual, survival was checked each week (binary 0/1, where 1 = dead). In addition, for another study we also recorded flowering (0/1) and fruiting (0/1). Measurements stopped when the sites were inaccessible due to snow (sites covered and/or road blocked), and started again when the sites were accessible again (i.e. the measurement period differs for each site depending on the snow cover). At the end of the experiment, numerous individuals were still alive.
Finally, at each site we have hourly temperature measurements.
What we want to test
i) if the interaction site_elevation * species_origin influences mortality (e.g., lowland species have a higher mortality at higher sites) and ii) if mortality is affected by a specific temperature (and if different temperatures affect different elevation classes).
The data file structure:
- Site (factor, 5 levels): "600", "1000", "1400", "1600", "2000"
- Species (factor, 30 levels): 30 species; phylogeny is available (.nexus)
- Elev_class (factor, 3 levels): "high", "mid", "low"; the altitudinal class of the 30 species
- Elev_m (numeric): median altitude of the 30 species
- ID (factor, 3000 levels): unique id for each individual (e.g., 600_1_1_1)
- Pop (factor): unique id for each population
- Fam (factor): a factor (1, 2, 3, ...) indicating the maternal line within each population; it is not unique (e.g., both pop1 and pop2 can have a fam 1)
- Measurement_Date (date): date on which the binary response was measured, in standard format (e.g., 2019-08-31)
- Week_field (integer): a temporal measure that assigns 1 to the first week in which the plants were moved to the respective sites (1, 2, 3, ...)
- Temp (numeric): a variable that represents the temperature (e.g., median) of the week preceding the measurement (e.g., survival at week 3 is associated with the temperature between week 2 and week 3)
- Survival: binary variable (0/1) describing whether the plant is alive or dead on the respective date/week
i) I need to correct for phylogeny. This prevents me from using models like Kaplan-Meier. I am currently trying with MCMCglmm. Is there another possibility?
ii) From what I understand, I have to model the response (survival) using a longitudinal mixed-effects model (since each individual is measured weekly, i.e. pseudo-replication). For example, to check whether temperature affects survival, the model is: survival ~ Elev_class * Temp * Week_field, random = phylogeny + Site + us(Week_field):ID, family = "threshold". However, not all individuals experience the event (death) during the experiment, and from what I understand they are right-censored data. Does this also apply to a logistic model, or only to a time-to-event model? If yes, how should it be corrected?
iii) In the lowland class I have some species that are annual, so mortality after fruiting is not necessarily linked to environmental factors. I am currently trying to fit two models, one including only the data up to fruiting (but keeping all the species), and one with all of the measurements but excluding the annuals. Is this a good procedure? Is there a better way, or a way to have a single model that accounts for this difference?
iv) The temperature varies with time (in autumn the temperature decreases as winter approaches, while in spring it increases with time). Is this correlation a problem in a model that includes both time (longitudinal mixed model) and temperature?
Any other suggestions, ideas and / or references are welcome. Thanks in advance!
For my research I am looking at the effect of radiotherapy for treatment of head and neck cancer. For this I conducted a retrospective study. In quite a lot of the cases there is residual disease, either local or regional, which is detected immediately after completion of the therapy. With the way I have coded it now, this produces a very steep drop at the start of my survival curves.
I used the date of the last radiation as t0 for my survival curves and coded residual disease as a survival of 1 day.
I cannot really find good references on how to handle this. Should my t0 be earlier (e.g. date of first visit), should I leave residual disease out of the locoregional control analysis altogether, or should I use the date of pathologically positive sampling?
I am using Stata 15 for analysis.
I'm trying to do a partitioned survival analysis (PartSA) to model a cohort of patients directly from survival data in the trial and evaluate the cost-effectiveness of two different treatments. There are Kaplan-Meier OS and PFS curves published for one of these trials, so I have fitted each curve separately with a Weibull distribution and extrapolated them to the time horizon of interest (15 years). However, the extrapolated PFS curve turned out to be higher than the extrapolated OS curve, which obviously does not make sense for the analysis.
I'm sure this happens quite often, because one of the major limitations of this model is that the correlation between time-to-event outcomes is not considered. I was wondering if there are any practical solutions for troubleshooting this. Is it possible to transform the data in any way to overcome this problem?
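One pragmatic workaround that is often applied in partitioned survival models is to cap the extrapolated PFS curve at the OS curve, so that PFS(t) = min(PFS(t), OS(t)) at every time point. The sketch below illustrates this with two Weibull survival functions. The shape and scale values are hypothetical and were chosen only so that the raw curves cross; they are not fitted to any trial.

```python
from math import exp

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return exp(-((t / scale) ** shape))

# Hypothetical fitted parameters, chosen so that the raw extrapolations cross
os_params = {"shape": 1.4, "scale": 6.0}
pfs_params = {"shape": 0.7, "scale": 4.0}

years = range(0, 16)  # 15-year time horizon, yearly steps
os_curve = [weibull_survival(t, **os_params) for t in years]
pfs_raw = [weibull_survival(t, **pfs_params) for t in years]

# Cap PFS at OS: the progression-free share can never exceed the share alive
pfs_capped = [min(p, o) for p, o in zip(pfs_raw, os_curve)]

for t, o, p, c in zip(years, os_curve, pfs_raw, pfs_capped):
    print(t, round(o, 3), round(p, 3), round(c, 3))
```

The cap keeps the state occupancies non-negative but does not model the dependence between the two endpoints, so it is worth checking how sensitive the results are to the choice of parametric family for each curve.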