Science topic
Survival Analysis - Science topic
A class of statistical procedures for estimating the survival function (a function of time, starting with a population 100% well at a given time and giving the percentage of the population still well at later times). Survival analysis is then used to make inferences about the effects of treatments, prognostic factors, exposures, and other covariates on the function.
Questions related to Survival Analysis
What is the alternative name for the hazard ratio when the event of interest is a success rather than a failure?
Can we use the inverse hazard ratio?
I encountered an unusual observation while constructing a nomogram using the rms package with the Cox proportional hazards model. Specifically, when Karnofsky Performance Status (KPS) is used as the sole predictor, the nomogram points for KPS decrease from high to low. However, when KPS is combined with other variables in a multivariable model, the points for KPS increase from low to high. Additionally, I've noticed that the total points run from low to high across all variables, while the 1-year survival probability shifts from high to low.
Could anyone help clarify why this directional shift in points occurs? Are there known factors, such as interactions, scaling differences, or confounding effects, that might explain this pattern?
Logistic regression can be adapted for survival analysis by modeling grouped event times to estimate parameters similar to those in proportional hazards models. This approach helps when event times are recorded in intervals (Abbott, 1985).
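To make the grouped-time idea concrete, here is a minimal sketch, assuming a hypothetical data frame df with one row per subject, follow-up measured in whole intervals (time = 1, 2, ...), an event indicator event, and a covariate x. Expanding to person-period format lets an ordinary GLM estimate interval-specific hazards; the cloglog link makes the coefficients comparable to those of a proportional hazards model.

library(dplyr)
library(tidyr)

# One row per subject-interval at risk; .id numbers the intervals 1..time
pp <- df %>%
  mutate(id = row_number()) %>%
  uncount(time, .remove = FALSE, .id = "period") %>%
  mutate(y = as.integer(event == 1 & period == time))  # event only in the last interval

# Interval-specific intercepts plus covariate effect on the discrete-time hazard
fit <- glm(y ~ factor(period) + x, family = binomial(link = "cloglog"), data = pp)
summary(fit)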
For instance, if I want to include the variable "Number of IPOs" in the year of the event, I can include it for the transactions where the event occurred, but what should the value be for the transactions where the event did not occur and which are therefore censored observations?
I did linear regression of X (independent variable) to M (Mediator)
then I used survival regression to fit X to Y (dependent variable)
With these questions:
a. How do I correctly do a mediation analysis from X to Y through M with survival regression?
b. If the mediate() function can be used this way, why are the results so weird? I.e., the ACME and ADE are very large and have negative values.
c. If the negative values are fine, how should they be explained? As far as I know, they might be explained as suppression effects.
I'm new to mediation analysis and I'm using mediate() from the mediation package in R. My results are very strange and I'm not sure if they are correct. I haven't found a very detailed mediation analysis on survival regression; any discussion is very welcome, and if anyone can give me some hints I would appreciate it!
Here is the code:
library(survival)
library(mediation)

# Mediator model: M ~ X + covariates
# Built with paste0 (no spaces) so the term name matches mediator_name below
mediator_formula <- paste0("scale(", mediator_var, ") ~ ", iv_name, " + ",
                           paste(covariates, collapse = " + "))
model_mediator <- lm(as.formula(mediator_formula), data = data_with_residuals)
lm_sum <- summary(model_mediator)

# Outcome model: survival regression (AFT) of Y on X, M, and covariates
model_dv_formula <- paste0("Surv(time, status) ~ ", iv_name, " + scale(",
                           mediator_var, ") + ",
                           paste(covariates, collapse = " + "))
model_dv <- survreg(as.formula(model_dv_formula), data = data_with_residuals)
surv_sum <- summary(model_dv)

# Mediation: the mediator name must match the term used in both formulas
mediator_name <- paste0("scale(", mediator_var, ")")
mediation_results <- mediate(model_mediator, model_dv, treat = iv_name,
                             mediator = mediator_name, sims = 500)
Validating a psychological therapy involves a process similar to validating assessment tools, but with some differences given the dynamic nature of therapy. Here's a general outline of the steps involved:
- Theory and Rationale: Clearly define the theoretical framework underlying the therapy and articulate the rationale for how it is expected to work. This step involves synthesizing existing research and theory to establish the conceptual basis for the therapy.
- Manual Development: Develop a treatment manual that outlines the procedures, techniques, and protocols of the therapy. The manual should provide detailed instructions for therapists on how to deliver the intervention consistently.
- Pilot Testing: Conduct pilot testing of the therapy with a small sample of participants to assess its feasibility, acceptability, and initial efficacy. This step helps identify any logistical or practical issues with delivering the therapy and informs adjustments to the manual or procedures.
- Randomized Controlled Trials (RCTs): Conduct well-designed RCTs to evaluate the efficacy of the therapy compared to control conditions (e.g., waitlist, placebo, alternative therapy). Randomization helps ensure that any observed effects are due to the therapy itself rather than other factors.
- Outcome Measures: Select appropriate outcome measures to assess the effects of the therapy on relevant variables (e.g., symptoms, functioning, quality of life). These measures should have established reliability and validity and be sensitive to changes expected from the therapy.
- Assessment Points: Determine the timing of assessments to capture changes in outcomes over the course of therapy and follow-up periods. Multiple assessment points allow for the examination of both short-term and long-term effects.
- Statistical Analysis: Analyze the data using appropriate statistical methods to compare outcomes between the therapy and control groups. This may involve techniques such as analysis of covariance (ANCOVA), mixed-effects modeling, or survival analysis, depending on the study design and outcome variables.
- Clinical Significance: Assess the clinical significance of treatment effects by considering not only statistical significance but also the magnitude of change and its practical relevance for patients' lives.
- Mediation and Moderation Analysis: Explore potential mechanisms of change (mediators) and factors that influence treatment outcomes (moderators) through mediation and moderation analyses. Understanding these processes can inform refinements to the therapy and help personalize treatment approaches.
- Replication and Extension: Replicate findings in independent samples and settings to establish the generalizability of the therapy's effects. Additionally, conduct studies to examine the effectiveness of the therapy when delivered in real-world clinical settings and by community providers.
- Meta-Analysis: Synthesize findings from multiple studies using meta-analysis to provide a comprehensive overview of the therapy's efficacy across diverse populations and contexts.
- Dissemination and Implementation: Disseminate the findings through publication in peer-reviewed journals, presentations at conferences, and outreach to clinicians and policymakers. Provide training and support for clinicians interested in implementing the therapy in their practice.
By following these steps, researchers can rigorously evaluate the efficacy of psychological therapies and contribute to the evidence base supporting their use in clinical practice.
Reference:
Singha, R. (2024). How to validate a psychological therapy? Retrieved from https://www.researchgate.net/post/How_to_validate_a_psychological_therapy
🔬 Exciting Announcement! 🧠 Our upcoming Special Issue: "Artificial Intelligence and Machine Learning approaches for Survival Analysis in Neurological and Neurodegenerative diseases" is now accepting contributions!
🌐 Dive into the intersection of cutting-edge technology and healthcare as we explore the transformative potential of AI and ML in predicting disease progression.
📚 Explore the full scope of this Research Topic here https://www.frontiersin.org/research-topics/62589/artificial-intelligence-and-machine-learning-approaches-for-survival-analysis-in-neurological-and-neurodegenerative-diseases
🚀 Key Focus Areas:
- Comparison & evaluation of ML approaches
- Integration of diverse data types
- Advanced methods for handling censoring
- Strategies for imbalanced datasets
- Early biomarkers in neurological diseases
🎓 Call for Papers: We invite researchers to contribute novel works in AI and ML methods tailored for Survival Analysis. Share your insights on predicting clinical events, assessing survival probabilities, or predicting risk scores in Neurological and Neurodegenerative diseases.
Hi,
I have to re-create graphics found in some articles, and I did not understand Figure 3. I could not find the raw data. How were the graphics created? Can you explain?
Thank you...
From my understanding, the baseline hazard (or baseline survival function) is unknown because Cox regression is a semi-parametric model. So why and how can we use it as a prediction model, for example to predict the 10-year survival probability?
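A possible clarification with a sketch: the partial likelihood never specifies the baseline hazard, but once the coefficients are estimated the baseline can be recovered nonparametrically (the Breslow estimator), which is what survfit() does behind the scenes. A minimal example using the lung data shipped with the survival package; the 365-day horizon is illustrative (a 10-year probability needs follow-up extending that far).

library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Predicted survival curve for a hypothetical new patient
newpat <- data.frame(age = 60, sex = 1)
sf <- survfit(fit, newdata = newpat)

summary(sf, times = 365)$surv  # predicted survival probability at day 365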
We have 4-8 treatment groups and want to compare the survival curves of the experiments. The log-rank test in Prism seemingly doesn't work for more than two groups. Would it be right to compare them in pairs and plot that significance? Or is there another robust way to do so?
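In R this is straightforward, for what it's worth: the k-sample log-rank test handles any number of groups, and survminer offers pairwise tests with multiplicity correction. A sketch, assuming a data frame d with columns time, status, and a group factor:

library(survival)
library(survminer)

# Global k-sample log-rank test across all groups at once
survdiff(Surv(time, status) ~ group, data = d)

# Pairwise log-rank tests, p-values adjusted for multiple comparisons
pairwise_survdiff(Surv(time, status) ~ group, data = d, p.adjust.method = "BH")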
How likely is it that the Yacuruna in Amazonian myths represent Europeans?
1) Europeans are generally hairier than Native Americans. In the myths, the Yacuruna sometimes disguise themselves as hairy people.
2) The Yacuruna come from the sea, as European sailors did.
3) Similar to the Yacuruna, Europeans may have been worshipped as gods by Amazonian peoples.
4) When a Yacuruna and an indigenous Amazonian reproduce, the child would sometimes become more Yacuruna. Similarly, Castizos (75% European and 25% Native American) were sometimes as privileged as pure Europeans.
Hello everyone,
I have performed a survival analysis in R. I have 13 patients with 5 events.
If I calculate my survival rate manually, I get 8/13 = 0.615.
In my R output (screenshot) this value is different (0.598), and I can't get my head around why. Do you have any suggestions?
Thank you.
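A toy sketch of the usual cause: the Kaplan-Meier estimate is a product over event times, S(t) = prod(1 - d_i/n_i), and whenever someone is censored before a later event the risk set n_i shrinks, so the product no longer equals the naive proportion 8/13. Hypothetical times below (not your data), with 5 events and 8 censored:

library(survival)

time   <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
status <- c(1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0)   # 1 = event, 0 = censored

km <- survfit(Surv(time, status) ~ 1)
summary(km)  # final estimate falls below 8/13 because censoring precedes some events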
I am working on a meta-analysis where I have extracted the data directly from KM curves via WebPlotDigitizer to calculate HRs for the studies that reported only KM curves. One of the studies has three curves, so WebPlotDigitizer gives me a total of three groups. I was wondering whether it is appropriate to combine the data for two of those groups and calculate an overall HR for the meta-analysis, keeping in view that it is time-to-event data and there is censoring too. I tried the method elaborated by Cochrane, but it gave me a really wide confidence interval.
Does anyone have any leads on how to deal with this?
I'm planning to use a Cox regression in a future study exploring time-to-event (survival) analysis comparing a control with an experimental group. I've seen sample size calculated through several packages, but I prefer G*Power and wanted to know if anyone has done this. Any resources would be appreciated.
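As far as I know, G*Power has no dedicated Cox/survival module, so a common fallback is Schoenfeld's (1983) formula for the required number of events, which is easy to compute directly. A sketch with assumed inputs (the hazard ratio and allocation are illustrative):

alpha <- 0.05; power <- 0.80
hr <- 0.60            # hypothesized hazard ratio (assumption)
p1 <- 0.5; p2 <- 0.5  # 1:1 allocation between groups

# Required number of events under the log-rank / Cox framework
events <- (qnorm(1 - alpha/2) + qnorm(power))^2 / (p1 * p2 * log(hr)^2)
ceiling(events)  # divide by the expected event probability to get total N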
Dear all!
I'm working on historical event data and would like to carry out a survival analysis. The dataset typically contains some right-censored observations, where the individuals are still living. But there are also individuals with missing birth dates, which means that they are left-censored.
While survival analysis with right or left censoring can be carried out in most professional statistics software, I have found no solution that includes both right and left censoring in one survival analysis.
Does anyone have an idea?
Thanks
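One possibility, sketched under assumptions: treat both censoring types as special cases of interval censoring and fit a parametric model with survreg(). In Surv(type = "interval2"), a left-censored lifetime has NA as its lower bound and a right-censored one has NA as its upper bound. Hypothetical data frame d with columns lower, upper, and a covariate x:

library(survival)

# Coding examples:
#   right-censored at 80:   lower = 80, upper = NA
#   left-censored at 60:    lower = NA, upper = 60
#   exactly observed at 70: lower = 70, upper = 70
fit <- survreg(Surv(lower, upper, type = "interval2") ~ x,
               data = d, dist = "weibull")
summary(fit)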
May board committees reduce the probability of financial distress? A survival analysis on Italian listed companies
If I have to draw a CONSORT chart for research on cancer patients, some of whom died, and I performed a survival analysis: in the last step of the CONSORT chart I have to write the number of patients analyzed.
Do I include the patients who died in the analysis step of the CONSORT chart, since they were included in the survival analysis, or do I exclude them?
Hi
I would like to ask if there is a Stata or R package to calculate a "risk horizon" for a binary outcome after a survival analysis.
Basically, like in this article:
Many thanks
I am working on small-mammal detection/non-detection data using dynamic occupancy models. The parameters I have are probabilities of occupancy, detection, colonization, and extinction. The same dataset has been used to estimate survival, recruitment, and other demographic parameters of the species using capture-mark-recapture models, which show a positive influence of rainfall on survival probability. My analysis also shows a positive influence of rainfall on colonization in the small-mammal community, which makes a lot of sense. Now I want to connect my results to the previous results, but I am struggling to find a study that links survival and colonization that I can refer to. It could be confusing because survival analysis is done at the individual level and colonization analysis at the species level. Survival would probably benefit colonization, but I'd need an empirical demonstration, or a study that has done both and shows that the two are directly related. Thank you!
Hi there,
I want to apply different machine learning approaches to survival analysis in cancer.
I have a database of 330 patients. Some features were evaluated uniquely in this cohort, so I can't use online databases to find other patient cohorts to use for training or testing.
How can I manage this? Should I skip the training cohort? Or can I use my test cohort for training and then generate a new random cohort? How? Is this possible? Is there any protocol or recommendation to follow?
Or should I split my database into, for example, 150 and 180 patients?
Thanks in advance,
Carlo
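With no external cohort, internal validation is the usual fallback: repeated train/test splits, k-fold cross-validation, or bootstrapping, reporting the distribution of a performance measure such as Harrell's C-index on the held-out data. A minimal sketch under assumptions (data frame d with time, status, and placeholder predictors x1, x2):

library(survival)

set.seed(1)
cvals <- replicate(100, {
  idx   <- sample(nrow(d), size = floor(0.7 * nrow(d)))  # e.g. 70/30 split
  train <- d[idx, ]
  test  <- d[-idx, ]
  fit <- coxph(Surv(time, status) ~ x1 + x2, data = train)
  concordance(fit, newdata = test)$concordance            # held-out C-index
})
c(mean = mean(cvals), sd = sd(cvals))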
Hi!
Can someone tell me how to fit a Weibull distribution (survival analysis) in Stata? I want to determine the association between antibody levels and the risk of disease, similar to Fabbrini et al. (attached). However, when I try to do it, my Cases curve is a straight line (capture attached). My data are case-control, where the cases have the disease and the controls don't.
Any step-by-step suggestions, please?
Thanks
Is someone familiar with a way to plot survival curves for a model created with the coxme function?
(i.e. ggsurvplot but for coxme objects instead of coxph objects)
I am relatively new to survival analysis, so correct me if I am misunderstanding something. I know it is more complex because the "frailty" (i.e. the random effect) modifies the baseline hazard function, but is there a way to, for example, produce a "predicted" survival curve?
Hi
I am working on a biomarker problem similar to the PSA for prostate cancer.
PSA is a blood test (continuous numerical variable) that can be used to follow patients with prostate cancer and predict the course of disease. Say someone was treated for prostate cancer and we see that the PSA levels are rising over time, we are worried about a disease recurrence.
I am working on a similar project, but am using multiple such continuous variables (instead of just PSA, I have A1, A2, A3, ... A15). These tests are obtained at multiple timepoints.
I have follow-up data for these patients (time to event - recurrence or censorship), as well as the date of diagnosis and date of each test.
I have used a Cox proportional hazards model with time-varying covariates in which each test acts as a predictor for the time period that follows - until the next test is obtained. However, given the large number of independent variables (A1->15) and the relatively small sample size (around 100 patients), the model is unstable. (if I remove some samples or some variables, I get wildly different results).
That being said, there is evidence in the literature that for instance the A2/A7 ratio correlates with disease recurrence - and I can replicate this in my dataset if I look specifically at the A2/A7 ratio. (It's not a great predictor, but there's a signal).
However, I would like to use all 15 variables (or at least find out whether we can improve on the A2/A7 ratio). E.g., maybe the (A2+A4)/A7 ratio has superior predictive power?
Obviously, it's not reasonable to try all possible combinations of variables and their interaction terms as the number quickly gets out of hand and will likely result in overfitting.
My questions (I use R):
1. Any thoughts on a more organized/automated approach to feature selection? (R packages that you've tried, etc.? I read about SurvRank but haven't been able to get it to work. A penalized-regression sketch follows after this post.)
2. Any thoughts on dimensionality reduction and whether it could be applied to this situation?
3. Any machine learning techniques, ideally in R?
PS. - I am working on getting a larger sample size (work in progress) - hopefully another ~400 or so patients, so I'm hoping that will help.
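One organized approach that speaks to all three questions at once is penalized Cox regression (lasso / elastic net), which performs selection and shrinkage in a single step and tends to cope with correlated markers better than exhaustive subset search. A sketch under assumptions: X is a numeric matrix holding the 15 markers plus any engineered ratios you want to offer the model, and time/status the outcome (shown for ordinary survival data; recent glmnet versions also accept counting-process (start, stop] outcomes for time-varying covariates):

library(glmnet)

y <- cbind(time = time, status = status)             # two-column outcome for family = "cox"
cvfit <- cv.glmnet(X, y, family = "cox", alpha = 1)  # alpha = 1: lasso

coef(cvfit, s = "lambda.min")  # nonzero rows = selected features
plot(cvfit)                    # cross-validated partial likelihood vs. penalty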
1- In the attached figure, does B mean we had two failed patients at time 9, and A mean we had one failed patient at time 5? I.e., are there more failed patients in B than in A?
2- What does "10" mean? 10 is the maximum value of the horizontal axis.
Does "10" mean that the sum of the follow-up times of all patients was 10?
I ran a KM survival analysis on a database in SPSS. In the means and medians for survival times, I only get results for the mean. Could this be caused by the fact that the follow-up in the database is only 3 days, reported as event/censored on day 1, 2, or 3?
Please take a look at the attached file.
I irradiated cells using a fractionation regime of 3 x 1 Gy after exposure to a substance in different concentrations.
I made an XY table with the determined SFs and plotted a graph using the LQ-model.
The equation I used was Y=exp(-3*(A*3*X + B*3*X^2)). It is an adaptation of the provided equation Y=exp(-1*(A*X + B*X^2)) to account for the fractionation regime.
To determine the AUC I used the standard analysis tool that GraphPad provides.
Could someone tell me if this is right or if I am mistaken somewhere?
Thank you very much in advance!
Hi, I am currently conducting a survival study to investigate the role of several potential biomarkers as prognostic factors in a certain cancer. First, I performed Kaplan-Meier analysis for all the biomarkers and other relevant clinicopathologic data. However, only one biomarker fulfilled the proportional hazards criterion judging from the Kaplan-Meier curves; the other biomarkers and clinicopathologic variables do not fulfill it.
I am wondering: do I still need to proceed to Cox regression analysis? Can I include the other biomarkers and relevant clinicopathologic data in the Cox regression, even though they do not fulfill the proportional hazards criterion in the Kaplan-Meier analysis? Thank you.
Hi there,
I am doing survival analysis, and I know that some of the variables with a significant impact on survival in univariate analysis are closely related to each other, as they have been calculated from one another. Can I include them all in a Cox PH model to see which of the variables is/are an independent risk factor?
Hi,
I developed a probability-of-default model using the Cox PH approach, with the survival package in R. I have panel data with time-varying covariates. I made predictions with the following code: predict(fit, newdata, type='survival'). However, the predicted survival probability is not decreasing over time for each individual (see picture).
I wonder whether the prediction is a marginal survival probability or a cumulative survival probability?
If it is a cumulative survival probability, why is it not decreasing across time?
Thanks
In my study, I am using propensity score matching to balance the effects of covariates when estimating the impact of prednisolone on death among COVID-19 patients. 92 covariates have been considered, including demographic factors, signs and symptoms, other drugs, laboratory tests, and vital indicators. A problem arises when I remove one of the variables (ALT): this changes the final results significantly.
How can I decide whether I should remove that variable?
Hi,
I have been performing survival analysis using Cox regression models, and I encountered a situation where, after adding a time-varying effect for a variable X (X*time; the variable violated the PH assumption), the added interaction with time was significant in the model but the main effect of X was not, as illustrated below:
Model without interaction with time:
coef exp(coef) se(coef) z Pr(>|z|)
factor(X)1 0.4633 1.5894 0.1625 2.852 0.004 **
Model with interaction between X and time:
coef exp(coef) se(coef) z Pr(>|z|)
factor(X)1 -0.3978 0.6718 0.4444 -0.895 0.371
tt(factor(X)) 0.6230 1.8645 0.2816 2.212 0.027 *
In the study we are interested in the effect of the X variable on the survival outcome, and after inclusion of the time-varying effect X*time, I am no longer sure about the value of the variable X in describing the risk of the outcome, as the main effect is now not significant.
Is the significance of the time-varying effect of variable X enough to assume that X is significant for the outcome risk, even though the main effect is no longer significant in such a scenario?
Or do both the main effect of X and the time-varying effect of X have to be significant in the model to be able to say that X is significant for the outcome?
Any help in interpreting these is very welcome.
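One way to think about it, sketched from the output above: with a time-varying effect the two coefficients only have meaning jointly, because the hazard ratio is HR(t) = exp(beta_X + beta_tt * f(t)), where f(t) is whatever time transform was passed to coxph(). The main effect is simply HR(t) at f(t) = 0, so its lone p-value says little; the relevance of X is better judged by testing both terms together (e.g. a likelihood-ratio test against the model without X). Assuming f(t) = t for illustration:

b_main <- -0.3978  # coefficient of factor(X)1 from the output above
b_tt   <-  0.6230  # coefficient of tt(factor(X))

hr <- function(t) exp(b_main + b_tt * t)
hr(c(0.5, 1, 2, 5))  # hazard ratio at several time points
-b_main / b_tt       # time at which HR(t) crosses 1 (about 0.64 on f(t)'s scale)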
I found in statistics books that to verify the linearity assumption of a Cox model I need to plot martingale residuals.
However, I cannot find any explanation of how to interpret the plot!
So, if I plot predicted values versus martingale residuals, what should I expect if linearity is satisfied?
Thank you in advance... please help me!
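For what it's worth, the usual recipe (Therneau & Grambsch) plots martingale residuals from a model that omits the covariate in question against that covariate: if linearity holds, a smoother through the points should be roughly a straight line, while systematic curvature suggests a transformation is needed. A sketch, assuming a data frame d with time, status, and a continuous covariate x:

library(survival)

fit0 <- coxph(Surv(time, status) ~ 1, data = d)  # null model (x omitted)
res  <- residuals(fit0, type = "martingale")

plot(d$x, res, xlab = "x", ylab = "Martingale residuals")
lines(lowess(d$x, res), col = "red", lwd = 2)    # look for nonlinearity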
Hi there, I am using the survival package in RStudio with the following code to run a Kaplan-Meier analysis:
library(survival)
library(survminer)
# Note: the column name "ï..status" suggests the CSV was read with a UTF-8
# byte-order mark; read.csv(..., fileEncoding = "UTF-8-BOM") would yield "status"
surv_object <- Surv(time = cabos$sg, event = cabos$ï..status)
survfit(surv_object ~ 1, data = cabos)
fit1 <- survfit(surv_object ~ loc, data = cabos)  # p = 0.97
ggsurvplot(fit1, data = cabos, pval = TRUE, risk.table = TRUE,
           risk.table.col = "strata", risk.table.y.text.col = TRUE,
           risk.table.y.text = FALSE, xlab = "Months", ylab = "Overall survival",
           legend.labs = c("C500", "C501", "C502", "C503", "C504", "C505",
                           "C508", "C509"),
           title = "Location")
This resulted in a Kaplan-Meier curve with a p-value of 0.97.
Yesterday, I added some labels so I could translate it to English, but the p-value changed to 0.87. There were no changes made to the code or the dataset. This concerned me, so I ran the statistical analysis in SPSS and obtained the same result as my previous KM curve (p = 0.97).
I have tried running the analysis without the translation, without converting my variables to factors, with converting my variables to factors, and with new code written from scratch, but I can't reproduce the value obtained previously.
What could be the problem with RStudio?
Thanks in advance for your time.
I recently tried to plot a Kaplan-Meier curve with SPSS. Although I have done all the steps according to the standards, the software seems to have a problem plotting the curve and doesn't plot it at all (shown in the picture). Does anybody know how I can solve this issue?
I'm currently working on survival analysis of the gendered effects of executive exit. The aim is to investigate whether female leaders have higher chances of turnover than their male counterparts. I've run all of the basic analyses, but am now stuck on an issue of interaction in Cox models.
The issue: I'm trying to find out if any (or all) of the control variables in my Cox model have different effects by gender. For example: In my original models executive age is a control variable, but maybe the hazard of leaving is more related to age for women than for men. To do this, I wanted to run ALL of the control variables with an interaction term of gender. My questions:
1. Should I do this within the same model (e.g. fit1 <- coxph(Surv(Time, Censoring) ~ age:gender + nationality:gender + ....)) or in separate models for each interaction? What makes more sense here?
2. In both cases, results look something like the attached picture (the variable American measures whether a person is American (1) or not (0)).
Table shows coef, exp(coef)=HR, se(coef), z, Pr(>|z|)
How should I interpret this?
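One coding detail worth checking first, since it changes the interpretation: in an R formula, age:gender alone omits the main effects, which is rarely what you want; age*gender expands to age + gender + age:gender. A hedged sketch with your variable names:

library(survival)

# Interactions tested one at a time (simpler to interpret, more models to run)
fit_age <- coxph(Surv(Time, Censoring) ~ gender * age + nationality, data = d)

# Or all control-by-gender interactions in one model (each term is then
# adjusted for the others; watch for collinearity with many terms)
fit_all <- coxph(Surv(Time, Censoring) ~ gender * (age + nationality + American),
                 data = d)
summary(fit_all)

For a 0/1 variable like American, exp(coef) on the American:gender term is the factor by which the hazard ratio for being American differs between genders; the American main effect is then its effect in the reference gender group.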
Hello researchers,
I apologize if my question is a bit too simple. I'm currently working on a study of patients with myelodysplastic syndromes (MDS). Most patients were followed up for a good period of time (the median was about 3 years), but since MDS isn't that deadly a disease, most patients (thankfully) lived, so not many events occurred, which led to a lot of censoring (about 30% of the patients died during the study).
In this case, is it suitable to do a survival analysis?
Some patients were also followed up for many years (10+), but they aren't the majority. Could I truncate the data and interpret it at 3 years, for example, i.e. "survival at 3 years was 70%"? Or how else could I deal with the censoring?
The main independent variable is A, but I want to adjust the model for a covariate B. When I check the PH assumption, A holds but B does not. Do I need to run a proportional odds or an AFT model instead?
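Two common remedies short of abandoning the Cox model, sketched with hypothetical names: stratify on B (B then gets its own baseline hazard and no coefficient, while the HR for A is still estimated under PH), or move to a parametric AFT model. A continuous B would need to be categorized before stratifying.

library(survival)

# Option 1: stratified Cox; no PH assumption is imposed on B
fit_strat <- coxph(Surv(time, status) ~ A + strata(B), data = d)

# Option 2: AFT model (lognormal here, which is not a PH model)
fit_aft <- survreg(Surv(time, status) ~ A + B, data = d, dist = "lognormal")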
Hi, I am a beginner in the field of cancer genomics. I am reading gene expression profiling papers in which researchers classify cancer samples into two groups based on the expression of a set of genes, for example a "high group" and a "low group", do survival analysis, and then associate these groups with other molecular and clinical parameters, for example serum B2M levels, serum creatinine levels, 17p deletion, trisomy 3. Some researchers classify the cancer samples into 10 groups. Now, if I am proposing a cancer classification scheme and presenting a survival model based on 2 groups or 10 groups, how should I assess the predictive power of my proposed classification model, and how do I compare its predictive power with that of other survival models? Thank you in advance.
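A common way to compare classification schemes, sketched with hypothetical variables: fit each grouping as a covariate in a Cox model on the same patients and compare discrimination (Harrell's C-index) and model fit (AIC); time-dependent ROC packages such as timeROC offer an alternative discrimination measure at fixed horizons.

library(survival)

fit2  <- coxph(Surv(time, status) ~ group2,  data = d)  # 2-class scheme
fit10 <- coxph(Surv(time, status) ~ group10, data = d)  # 10-class scheme

concordance(fit2)$concordance   # discrimination of the 2-group scheme
concordance(fit10)$concordance  # discrimination of the 10-group scheme
AIC(fit2); AIC(fit10)           # penalizes the extra parameters of 10 groups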
Good morning,
I am doing survival analysis to predict the risk of a specific complication after surgery. We did a KM analysis, and the risk was clearly higher immediately after surgery and then flattened after 120-160 days. Using the lognormal model, I tried parametric survival analysis, treating the model as accelerated failure time. My questions:
- Is it appropriate to use the parametric model in this situation?
- How do I get the hazard ratio from the parametric survival analysis? I was able to get estimates, but no clear HR.
- How do I interpret the hazard vs. time plot? The shape looks very nice and shows that the hazard decreases over time, but the hazard values on the Y-axis range between 0 and 0.00015, and I don't know how to interpret them.
- Can I get a cut-off point in time where the hazard changes significantly?
Thank you very much,
Amro
Hello everyone,
Hope you all are doing well.
I am trying my hand at the Fine-Gray model to calculate subdistribution hazard ratios for cardiovascular death using the NHANES dataset in R. It was not difficult to calculate the cumulative incidence function (CIF), but since it is a nationally representative sample, I am having issues accounting for the survey weights and strata.
I tried the 'survival' and 'cmprsk' packages, but neither has a provision to incorporate weights and strata in the regression model.
Any suggestion will be appreciated.
Thank you
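One possible workaround, offered with caveats: survival::finegray() expands the data for a Fine-Gray fit, and the resulting coxph() accepts case weights, so the survey weight can be folded into the Fine-Gray weight. This handles the weights; proper design-based variances for the strata/PSU structure would still need additional work (e.g. replicate weights), so treat this as a sketch rather than a recipe. Variable names are hypothetical:

library(survival)

# etime = follow-up time; event = factor: censor / cvd_death / other_death
fg <- finegray(Surv(etime, event) ~ ., data = d, etype = "cvd_death")

fit <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex,
             data = fg, weights = fgwt * svyweight, robust = TRUE)
summary(fit)  # exp(coef) = subdistribution hazard ratios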
Hi everyone,
This is my first attempt at running a survival analysis. I have data on patients with end-stage kidney disease (ESKD). I want to estimate survival probability by treatment method (e.g. conservative therapy vs dialysis), duration of diagnosis, etc.
Thank you
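A minimal starting point, assuming a data frame eskd with a follow-up time, an event indicator, and a treatment column (names are placeholders):

library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ treatment, data = eskd)
summary(fit, times = c(12, 24, 36))  # survival probabilities at 1, 2, 3 years (months scale)
ggsurvplot(fit, pval = TRUE, risk.table = TRUE)        # KM curves by treatment
survdiff(Surv(time, status) ~ treatment, data = eskd)  # log-rank test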
Let us start a discussion of the uses of survival analysis, particularly in medicine and the health sciences.
To what extent is it helpful?
And its application using SPSS.
I'm using a cox proportional hazards regression to analyze latency data, i.e. time until behavior x occurs. The model I'm running is fitted with the "coxme" package in R (which is a wrapper to the "survival" package if I'm not mistaken) because I wanted to fit a mixed model for the survival analysis. The model converges without problems, however, when I'm trying to test the assumptions of the model I get a fatal error, and my R session crashes. Specifically, when I try to test the "proportional hazard" assumption of the model using the "cox.zph" call, the R session crashes and I don't know why, because the function is supposed to work with both a mixed model (coxme) and a standard, non-mixed, model (which is a "coxph" object in the package terminology). I've tried the non-mixed version of my model and it provides the desired output, but it won't work for my intended model. I've also tried updating RStudio, so I have the latest version, but it didn't help. Finally, I've tried to manually increase the working memory dedicated to RStudio, in case the function was memory demanding, but it didn't help. Looking around at different forums has provided no answers either, both with general search parameters like "causes for R session crash" and more specific, like "cox.zph cause R session crash", but I could not find any help.
Has anyone experienced this error? Were you able to solve it, and if so, how?
I appreciate any advice I can get on this issue.
I have encountered a problem that I would like to ask for help with: my goal is to use a competing-risks (CR) model to fit survival data that contain multiple records per patient. I tried a bunch of R packages and functions that can train a mixed-effect CR model with a cluster(ID) (e.g. patient ID) term added.
So far, the R functions I tried and the results are: (1) FGR() worked fine for a fixed-effects-only CR model but cannot train a mixed-effect CR model; (2) CSC() is able to train a mixed-effect model but is unable to predict the risk probability using predict(); (3) comp.risk() worked fine for training and predicting a mixed-effect CR model, but it doesn't allow c-index calculation using cindex() or concordance.index().
Now the question is, how can I calculate the c-index for my validation set from the results of comp.risk() and predict()? I have obtained a fitted model and predicted risks at certain times (for instance, 3, 5, and 10 years). Do I need predicted risks at all possible times in order to calculate the c-index? Is there a better way or a simpler solution?
Thank you very much for all your help.
Dear ResearchGate members, would you please let me know the specific assumptions and conditions for using these phrases?
Lifetime data with a bathtub-shaped hazard rate function are often encountered in survival analysis, and the classical lifetime distributions mostly used cannot adequately describe such data. Is there any parametric survival model suitable for modelling this type of data?
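One option, sketched: flexible parametric (Royston-Parmar) models in the flexsurv package put a spline on the log cumulative hazard and can capture non-monotone shapes, including bathtub hazards, that the Weibull, exponential, and similar families cannot. (Distributions such as the generalized gamma or exponentiated Weibull are alternative parametric routes.) Assuming a data frame d with time and status:

library(flexsurv)

# More knots (k) = a more flexible hazard shape
fit <- flexsurvspline(Surv(time, status) ~ 1, data = d, k = 2, scale = "hazard")
plot(fit, type = "hazard")  # inspect whether the fitted hazard is bathtub-shaped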
Hi!
I'm trying to develop a prediction model for a type of cancer. My end-point is cancer-specific survival. I have around 180 events, and an events-per-parameter analysis indicated that I can consider about 9-10 parameters for inclusion in the model.
Now, I have dozens of candidate variables, many of which are collinear and some are more truly independent from each other. Some variables are well known risk factors for cancer death, some are more novel but still merit more research.
In my field there is a well-established risk classification system that uses cancer histological grade, stage, lymph node status, and tumor size. This classifies cases into 4 risk categories with increasing risk of death. These four variables have not previously been included in a survival model, and there is no published survival regression formula/function with beta coefficients and intercept for a model including them. Instead, the four risk categories are based mostly on clinical reasoning, expert opinion, experience, and the survival rates of the four groups.
My question is whether, when developing my model, I should include these four variables as separate independent variables and add the other 4-5 candidate variables I want to investigate, or whether I can/should include the four variables as a single composite four-tier categorical variable and thus save degrees of freedom to include more candidate variables. What are the pros and cons of each approach?
Hi
At this point, I am wondering which survival analysis to use for my data. My data consist of one control group and three exposure groups. One of the exposure groups is known to have a difference in toxicity over time (in my case, taken into account by a gradient over distance). Therefore it is necessary to divide the groups into three stages, making 12 survival curves (with eight individuals per stage) per experiment.
The data are only right-censored, but the amount of censoring varies greatly between groups: the control and one of the exposure groups have almost no events (three of the six stages in these two groups have only censored data; the remaining stages have only one or two deaths).
The survival curves of two of the exposure groups cross.
Also, many of the survival curves within the different exposure groups (two of the exposure groups and the control) cross, as they do not have a difference in toxicity over time.
What would be a suitable survival analysis for such an experiment? Moreover, how problematic is the crossing of the curves if it is expected?
I'm trying to work on a meta-analysis of survival data (PFS, OS).
I'm aware that the most common practice for analysis of time-to-event data is the hazard ratio,
but some articles provide only median survival with a chi-square value (for example: χ2 = 4.701, P = 0.030 as the result of a log-rank test).
Would there be some way to integrate this into a meta-analysis of HRs?
Thanks in advance
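For what it's worth, the indirect-estimation methods of Tierney et al. (2007, Trials 8:16) recover an approximate log HR and its standard error from a log-rank chi-square, the direction of effect, and the number of events and randomization split, which is often enough to include such studies in an HR meta-analysis. A sketch with the chi-square above and hypothetical counts:

chisq <- 4.701
e1 <- 40; e2 <- 55    # events per arm (you need these from the paper)
n1 <- 80; n2 <- 80    # numbers randomized per arm

V    <- (e1 + e2) * (n1 / (n1 + n2)) * (n2 / (n1 + n2))  # approx. variance of O - E
lnHR <- sqrt(chisq / V)   # magnitude; the sign comes from which arm did better
seHR <- 1 / sqrt(V)       # standard error of log HR for inverse-variance pooling
c(HR = exp(lnHR), lnHR = lnHR, se = seHR)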
Hello,
I built a Cox proportional hazards model with time-dependent covariates. When I predict survival probabilities, they are not monotonically decreasing for each ID. What is the problem, and how can I handle it?
thanks
Hello,
I have a methodological question regarding time-to-event analysis (TTE).
I want to compare several medication therapies (4 groups with different drugs) in a TTE analysis in a cohort study. My colleague wants me to include only those subjects who suffered the event/endpoint. The group assignment should be chosen at the time of the event; that is, if a subject suffers the event, I should look up their current medication and use the start of that medication to define the person-time until the event. The aim is to compare the time to event among the medication groups. It is not a drug efficacy study.
To be honest, I'm not sure whether the approach described above is methodologically correct. In my opinion, there would be no censoring, as only patients with events would be included and then simply compared by their time to event under a given medication.
I cannot find a lot of information in the literature. All resources I reviewed only addressed the issue of the censoring type but not how to deal with the approach above.
I would be very grateful, if somebody could give advice to this issue.
Thanks a lot,
Florian Beese
Hi,
During the analysis, I get the following error when drawing the survival curve in R. Have you had a similar problem, or can you suggest a solution?
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 232, 0, 464
Package: survminer (with survival). Function: ggsurvplot
I'm conducting a meta-analysis for my dissertation and have issues running my data. It utilises median overall survival (months) and does not have usable controls. I'm counteracting that by using a second treatment method for comparison. I've noticed some studies use historic controls and form hazard ratios from them. Is it possible to treat the secondary treatment as a historic control and form hazard ratios across studies?
Otherwise, single-arm survival studies are awful to try to run analyses on. (Oncology is a pain.)
I am currently faced with the challenge of measuring the integration speed of an acquisition or merger using secondary data. Usually companies do not communicate once a target has been integrated successfully; they only announce, for instance, that an M&A deal has been signed (this would be the starting date and is easy to find online). However, how can I find out when the company was fully integrated? I am focusing on Dutch M&A deals, and it seems that Dutch companies do not communicate much about it apart from when a deal has been signed. Sometimes it is possible to find out when a target ceased to exist, but this too is quite difficult. I saw that most papers used surveys, but given my time constraints it is too risky to send out surveys to Dutch corporate M&A teams, due to the high possibility of a low response rate. I also saw one paper that used survival analysis; however, I could not work out from that paper how they specifically calculated integration speed for each deal. If anyone has an idea or knows a paper that has done this with secondary data, I would appreciate any help.
I am conducting a meta-analysis where I have data from survival plots only. I am stuck on performing the meta-analysis because there is no control group in the studies. How should I perform a pooled survival analysis on these data? Are there any studies that I could use as a reference?
I have a sequentially measured, time-series Electronic Health Record patient dataset. This covers 7 years of research: each patient's blood pressure, HbA1c, LDL, Hb, baseline age, etc. were measured (as the independent variables for modeling) every 4 months, and at the same time each patient was classified as healthy/not healthy (the output/dependent variable). Before modeling, we assigned the output of each patient as just 0/1 (0 = the patient was classified 0 at least once; 1 = otherwise). So we have time-series measures (HbA1c, FPG, ...) and non-time-series measures (race, baseline age) as independent variables, but only one output (0/1) per patient. I want to model these data. Some methods used for this kind of dataset include using the baseline values or the mean value of each independent variable. In addition to these two methods, I want to use time-series analysis, but I am not sure what to use. It looks like survival analysis, but it is not, since the research didn't end when we saw a 0 value. You can see a visualization of the data structure below. Thanks for all your responses in advance.
In survival analysis, some data are censored. How do we incorporate this into an ANN?
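One standard trick, sketched under assumptions: convert the data to person-period format (as in the discrete-time logistic example earlier on this page) and train the network on the interval-level event indicator; censored subjects simply contribute rows with y = 0 up to their censoring interval, so censoring is handled without discarding anyone. Using the nnet package on a hypothetical person-period frame pp with columns y, period, x:

library(nnet)

fit <- nnet(y ~ factor(period) + x, data = pp, size = 5, decay = 0.01,
            entropy = TRUE, maxit = 500, trace = FALSE)

# Predicted interval hazards; per-subject survival = cumprod(1 - hazard)
haz <- predict(fit, newdata = pp, type = "raw")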
We are trying to find whether there is an association between postoperative pulmonary complications (PPCs) and overall survival in a cohort of 417 patients. In Kaplan-Meier analysis there is a significant difference in overall survival between patients with and without PPCs. After testing the proportional hazards assumption for the Cox regression (both through visual analysis and through the log-minus-log plot), we found that our data failed to meet the assumption. The way I interpret this, it means that the hazard of dying associated with a postoperative pulmonary complication varies over time? I'm trying to figure out how to perform a survival analysis now that I can't use the standard Cox regression.
From what I understand, I could use time-dependent coefficients (the same as time-dependent variables in this example?), but I don't really understand what is meant by that or how I would do it in SPSS. Does it mean I turn my PPC variable into a time-dependent variable and then run the Cox regression analysis the way I would have if my data had met the assumption, or how do I do it?
I would be really thankful for guidance or corrections if I have misunderstood something! I'm a PhD student and I don't have much experience in statistics so explain it to me like I'm 5 (a smart 5-year-old!)
Data with 40 eyes from 20 subjects. I plan to do a survival curve (KM curve). The question is how to cluster the eyes. I tried using the same ID for the same subject. But for a few subjects the begin time is 0 (i.e. time0 = 0) and the end time is, for example, 2 months (i.e. the event happened at month 2). When running the command below,
stset time, fail(outcome) id(ID)
it excludes the observations for those subjects (both eyes) with the same start and end times. What is the option to include both eyes with the same study time while clustering between eyes?
I would like to know whether it is possible to model competing events while including the type of observation period as a covariate? For example, if I wanted to model competing events in two different tasks, one that lasted 3 mins and the other 5 mins. These are two different observation periods, but can events from both be included in the same analysis?
For a prognostically relevant gene (HR<1 or HR>1 with p<0.05) in terms of survival, is it necessary that the overall survival time and gene expression have a good positive/negative correlation?
We are using TCGA RNA-seq data and clinical information, but what we observe is a poor (Pearson) correlation and/or an insignificant p-value for genes having a significant HR. We have also tried normalising the data and employing Spearman correlation.
I am using survival analysis for repeated events and I want to see if the effect of the time-dependent covariate differs by group.
I have data from an experiment where we looked at the time-to-death of pseudoscorpions under three treatment conditions: control, heat, and submersion underwater. Both the control and heat groups were checked daily and those that survived were right-censored. However, the pseudoscorpions in the underwater group were unresponsive while submerged and would only become active again when they were removed from the water and allowed to dry. Therefore, we removed 10 individuals per day and checked how many of the 10 were dead. All individuals removed from the water on a given day were also removed from the study on that day i.e. we did not re-submerge individuals that were observed to be alive after they were removed from the water the first time. The underwater group is therefore left-censored. We have run Kaplan-Meier curves for all three groups and the underwater group has a much steeper curve and non-overlapping 95% confidence intervals compared to the control and heat groups.
Is there a way to further analyze all three groups in one model given that one level of the treatment is left-censored and the other two levels are both right-censored? Can a Cox regression be run on the left-censored group by itself to produce a hazard rate with 95% CI for the rate? I am a biologist so try to make your answer intelligible to a statistical novice.
I fitted a Cox proportional hazards model and checked the proportionality assumption using R's cox.zph function, with the following output:
chisq df p
Var1 0.0324 1 0.857
Var2 0.1972 1 0.657
log(var3) 4.1552 1 0.042
Var4 4.6903 1 0.030
Var5 0.6472 1 0.421
Var6 1.2257 1 0.268
Var7 4.9311 1 0.026
Var8 0.3684 1 0.544
Var9 2.0905 1 0.148
Var10 0.0319 1 0.858
Var11 4.0771 1 0.043
GLOBAL 14.2625 11 0.219
In the study, Variables 1 and 2 are the ones I'm actually interested in, whereas the others are only controls. As you can see, the PH assumption seems to hold for these two covariates, but not for some of the others, most prominently Var3. Can I still interpret my findings on Var1 and Var2, and, when talking about Var3, add that it displays the average effect over the study time? Or will my coefficients for Var1 and Var2 be biased/incorrect?
Thanks in advance!
I fitted a Cox PH model, and upon examination of the Schoenfeld residuals it became apparent that the proportionality assumption was violated by several variables. Following Box-Steffensmeier & Jones (2004), I included interactions with time for these covariates. However, I'm not sure how to interpret this. Obviously this indicates time-dependency of the covariate, in the sense that the effect inflates/deflates over some function of time; but I work with sociological data, and my theory indicates no time-dependency of the effects in either direction (it would also not make sense in any way). If I understand correctly, I should therefore consider the time-dependency to come from some kind of unobserved heterogeneity? Due to the nature of the data I also cannot implement frailty or fixed effects to account for this. So how do I interpret a coefficient that increases/decreases as time progresses, given that theory does not indicate this?
Can anybody guide me regarding the minimum number of events (recurrences) required for a successful recurrence-free survival analysis in oral SCC? I'm expecting 30-35 events in my cohort of 108 patients at 3 years. Will that be good enough?
I'm currently working with event-history data, studying how long after implementation countries abolish certain policies. For the policies I also have an index of how far the countries went with them, ranging from 0 to 100.
I wanted to control for this, in order to be able to account for their point of departure. However, the coefficient violates the proportionality assumption.
Can I stratify on the continuous variable of that index? I understand it such that this would allow every country to have a different baseline hazard with respect to its point of departure. Playing around with the data, this did not produce an error.
Could anyone tell me whether I can trust these results, or whether I have to categorize the variable first?