Science topic

Survival Analysis - Science topic

A class of statistical procedures for estimating the survival function (a function of time, starting with a population 100% well at a given time and giving the percentage of the population still well at later times). Survival analysis is then used for making inferences about the effects of treatments, prognostic factors, exposures, and other covariates on this function.
Questions related to Survival Analysis
  • asked a question related to Survival Analysis
Question
3 answers
What is the alternative name for hazard ratio when we have success event instead of failure?
Can we use the inverse hazard ratio?
Relevant answer
Answer
  • asked a question related to Survival Analysis
Question
2 answers
I encountered an unusual observation while constructing a nomogram using the rms package with the Cox proportional hazards model. Specifically, when Karnofsky Performance Status (KPS) is used as a standalone predictor, the nomogram points for KPS decrease from high to low. However, when KPS is combined with other variables in a multivariable model, the points for KPS increase from low to high. Additionally, I've noticed that the total points vary from low to high for all variables, while the 1-year survival probability shifts from high to low.
Could anyone help clarify why this directional shift in points occurs? Are there known factors, such as interactions, scaling differences, or confounding effects, that might explain this pattern?
Relevant answer
Answer
Thank you
  • asked a question related to Survival Analysis
Question
1 answer
Logistic regression can be adapted for survival analysis by modeling grouped event times to estimate parameters similar to those in proportional hazards models. This approach helps when analyzing intervals for event occurrences (Abbott, 1985).
Relevant answer
Answer
In survival analysis, logistic regression is mostly used to model binary outcomes, such as whether an event has occurred by a given time point. It is also adapted for competing-risks models, in which several event types are possible, and for discrete-time survival analysis, where it models the probability of an event within each time interval. Variants such as the complementary log-log model link logistic regression to continuous-time data, approximating models like Cox proportional hazards.
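The discrete-time approach described above starts by expanding each subject's record into one row per time interval ("person-period" format), after which an ordinary logistic (or complementary log-log) regression can be fitted. A rough sketch, where the record layout and the integer time grid are illustrative assumptions rather than anything from the question:

```python
# Sketch: expand (id, time, event) survival records into person-period rows,
# the data layout needed for discrete-time (logistic) survival analysis.
# Times are assumed to be positive integers (interval indices).

def person_period(records):
    """records: list of (id, time, event) with integer event/censoring times.
    Returns rows (id, interval, y), where y = 1 only in the interval
    in which the event occurred."""
    rows = []
    for pid, time, event in records:
        for t in range(1, time + 1):
            y = 1 if (event == 1 and t == time) else 0
            rows.append((pid, t, y))
    return rows

data = [("a", 3, 1),   # event at t = 3
        ("b", 2, 0)]   # censored at t = 2
print(person_period(data))
```

Each resulting row can then be passed to any logistic-regression routine, with dummy variables for the interval index playing the role of the baseline hazard.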
  • asked a question related to Survival Analysis
Question
3 answers
For instance if I want to include the variable "Number of IPOs" in the year of event, I can include it for the transactions where the event occurred but what should be the value for the transactions where the event did not occur and therefore are censored observations?
Relevant answer
Answer
When you include a variable like "Number of IPOs" in the year of the event, you can use the actual number of IPOs for transactions where the event occurred. For censored observations (where the event did not occur), you can assign a value of zero, or code the variable as missing, indicating that no IPO took place for those transactions during the observation period. Here are some research papers and books that might be helpful. "Credit Scoring with Macroeconomic Variables Using Survival Analysis" by Tony Bellotti & Jonathan Crook explores the application of survival analysis to model default on a large dataset of credit card accounts, incorporating macroeconomic variables as time-varying covariates.
Another is "Time-Varying Effects in Survival Analysis: A Novel Data-Driven Method for Drift Identification and Variable Selection" by Zakaria Babutsidze, Marco Guerzoni, and Luigi Riso, which discusses variable selection and stability over time in survival models with high-dimensional panel data. I have read Survival Analysis: A Self-Learning Text, 3rd Edition by David G. Kleinbaum and Mitchel Klein, and An Introduction to Survival Analysis Using Stata, Revised Third Edition, both of which should also help with your query.
  • asked a question related to Survival Analysis
Question
6 answers
I did linear regression of X (independent variable) to M (Mediator)
then I used survival regression to fit X to Y (dependent variable)
With these questions:
a. How do I correctly do a mediation analysis from X to Y through M with survival regression?
b. If the mediate() function supports this, why are the results so strange? I.e., ACME and ADE are very large and negative.
c. If the negative values are fine, how should I explain them? As far as I know, they might be explained as suppression effects.
I'm new to mediation analysis and I'm using mediate() from the mediation package in R. My results are very strange and I'm not sure if they are correct. I haven't found a very detailed mediation analysis on survival regression, so any discussion is very welcome, and if anyone can give me some hints I would appreciate it!
Here is the code:
library(survival)   # Surv(), survreg()
library(mediation)  # mediate()
# Mediator model: linear regression of the (scaled) mediator on the treatment
mediator_formula <- paste0("scale(", mediator_var, ") ~ ", iv_name, " + ",
                           paste(covariates, collapse = " + "))
model_mediator <- lm(as.formula(mediator_formula), data = data_with_residuals)
lm_sum <- summary(model_mediator)
# Outcome model: parametric survival regression on treatment + mediator
model_dv_formula <- paste0("Surv(time, status) ~ ", iv_name, " + scale(",
                           mediator_var, ") + ",
                           paste(covariates, collapse = " + "))
model_dv <- survreg(as.formula(model_dv_formula), data = data_with_residuals)
surv_sum <- summary(model_dv)
# Mediation: the mediator name passed to mediate() must match the term label
# in the formulas exactly (no stray spaces), hence paste0 throughout --
# paste() with its default separator would produce "scale( x )" instead
mediator_name <- paste0("scale(", mediator_var, ")")
mediation_results <- mediate(model_mediator, model_dv, treat = iv_name,
                             mediator = mediator_name, sims = 500)
Relevant answer
Answer
Be careful when using a ratio scale, especially for parametric tests in any inferential data analysis. The rule is that the dependent variable must be on a ratio scale and must be normally distributed. If the dependent variable is not normally distributed, adjust the data using a log transformation or another method.
  • asked a question related to Survival Analysis
Question
1 answer
Validating a psychological therapy involves a process similar to validating assessment tools, but with some differences given the dynamic nature of therapy. Here's a general outline of the steps involved:
  1. Theory and Rationale: Clearly define the theoretical framework underlying the therapy and articulate the rationale for how it is expected to work. This step involves synthesizing existing research and theory to establish the conceptual basis for the therapy.
  2. Manual Development: Develop a treatment manual that outlines the procedures, techniques, and protocols of the therapy. The manual should provide detailed instructions for therapists on how to deliver the intervention consistently.
  3. Pilot Testing: Conduct pilot testing of the therapy with a small sample of participants to assess its feasibility, acceptability, and initial efficacy. This step helps identify any logistical or practical issues with delivering the therapy and informs adjustments to the manual or procedures.
  4. Randomized Controlled Trials (RCTs): Conduct well-designed RCTs to evaluate the efficacy of the therapy compared to control conditions (e.g., waitlist, placebo, alternative therapy). Randomization helps ensure that any observed effects are due to the therapy itself rather than other factors.
  5. Outcome Measures: Select appropriate outcome measures to assess the effects of the therapy on relevant variables (e.g., symptoms, functioning, quality of life). These measures should have established reliability and validity and be sensitive to changes expected from the therapy.
  6. Assessment Points: Determine the timing of assessments to capture changes in outcomes over the course of therapy and follow-up periods. Multiple assessment points allow for the examination of both short-term and long-term effects.
  7. Statistical Analysis: Analyze the data using appropriate statistical methods to compare outcomes between the therapy and control groups. This may involve techniques such as analysis of covariance (ANCOVA), mixed-effects modeling, or survival analysis, depending on the study design and outcome variables.
  8. Clinical Significance: Assess the clinical significance of treatment effects by considering not only statistical significance but also the magnitude of change and its practical relevance for patients' lives.
  9. Mediation and Moderation Analysis: Explore potential mechanisms of change (mediators) and factors that influence treatment outcomes (moderators) through mediation and moderation analyses. Understanding these processes can inform refinements to the therapy and help personalize treatment approaches.
  10. Replication and Extension: Replicate findings in independent samples and settings to establish the generalizability of the therapy's effects. Additionally, conduct studies to examine the effectiveness of the therapy when delivered in real-world clinical settings and by community providers.
  11. Meta-Analysis: Synthesize findings from multiple studies using meta-analysis to provide a comprehensive overview of the therapy's efficacy across diverse populations and contexts.
  12. Dissemination and Implementation: Disseminate the findings through publication in peer-reviewed journals, presentations at conferences, and outreach to clinicians and policymakers. Provide training and support for clinicians interested in implementing the therapy in their practice.
By following these steps, researchers can rigorously evaluate the efficacy of psychological therapies and contribute to the evidence base supporting their use in clinical practice.
To give reference
Singha, R. (2024). How to validate a psychological therapy? Retrieved from https://www.researchgate.net/post/How_to_validate_a_psychological_therapy
Relevant answer
Answer
Thanks, this is a clear and very well laid out set of steps.
I do feel that for non-pharmaceutical studies that removing or minimising the placebo effect may be a disservice to the methodology that is studied.
I believe that the placebo and nocebo effects are integral to understanding treatment outcomes. Some treatments derive their efficacy from enhancing the placebo effect, which is a well-documented and beneficial phenomenon. Conversely, the nocebo effect can exacerbate perceptions of danger and elicit exaggerated responses to perceived threats. The challenge lies in isolating and accurately recording these cognitive influences.
There is a crucial intersection between science and innovation where research should focus on understanding why certain treatments yield positive results, rather than solely aiming to disprove hypotheses through traditional falsifiability methods. Both approaches—proving a treatment's efficacy and attempting to disprove it—can introduce bias.
In my own research, I have found that excluding placebo and nocebo effects might be counterproductive. Instead, we should explore ways to harness and enhance these natural healing phenomena to alleviate chronic pain. Investigating the mechanisms behind treatment efficacy can accelerate the development of effective cures more efficiently than traditional hypothesis testing.
  • asked a question related to Survival Analysis
Question
1 answer
🔬 Exciting Announcement! 🧠 Our upcoming Special Issue: "Artificial Intelligence and Machine Learning approaches for Survival Analysis in Neurological and Neurodegenerative diseases" is now accepting contributions!
🌐 Dive into the intersection of cutting-edge technology and healthcare as we explore the transformative potential of AI and ML in predicting disease progression.
🚀 Key Focus Areas:
  • Comparison & evaluation of ML approaches
  • Integration of diverse data types
  • Advanced methods for handling censoring
  • Strategies for imbalanced datasets
  • Early biomarkers in neurological diseases
🎓 Call for Papers: We invite researchers to contribute novel works in AI and ML methods tailored for Survival Analysis. Share your insights on predicting clinical events, assessing survival probabilities, or predicting risk scores in Neurological and Neurodegenerative diseases.
Relevant answer
Answer
As a teacher-researcher in mathematics, I want to retrain in computer science, specifically in artificial intelligence.
  • asked a question related to Survival Analysis
Question
3 answers
Hi,
I have to re-create graphics found in the articles. I did not understand Figure 3, and I could not find the raw data. How were the graphics created? Can you explain?
Thank you...
Relevant answer
Answer
We can recreate the data if you have the analysis output; all the data are recorded in the survival table.
  • asked a question related to Survival Analysis
Question
1 answer
From my understanding, the baseline hazard or baseline survival function is unknown because Cox regression is a semi-parametric model. So why, and how, can we use it as a prediction model, for example to predict the 10-year survival probability?
Relevant answer
Answer
We can. If you use Stata, you can build a predictive model after a Cox regression analysis by fitting a generalized linear model with a Poisson distribution.
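To make this concrete: after a Cox fit, the cumulative baseline hazard can be recovered nonparametrically with the Breslow estimator, which then yields survival predictions such as a 10-year probability. A minimal sketch with made-up data, a single covariate, and an assumed (already fitted) coefficient:

```python
import math

# Sketch: Breslow estimate of the cumulative baseline hazard after a Cox fit,
# then a survival prediction S(t | x) = exp(-H0(t) * exp(beta * x)).
# The data and the fitted coefficient below are illustrative assumptions.

def breslow_H0(times, events, x, beta):
    """Cumulative baseline hazard H0(t) at each distinct event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    H0, out = 0.0, []
    for idx in order:
        if events[idx] == 1:
            # risk set: everyone still under observation at this event time
            denom = sum(math.exp(beta * x[j])
                        for j in range(len(times)) if times[j] >= times[idx])
            H0 += 1.0 / denom
            out.append((times[idx], H0))
    return out

times  = [2, 3, 5, 7, 8]
events = [1, 0, 1, 1, 0]          # 1 = event, 0 = censored
x      = [0.5, 1.0, 0.0, 1.5, 0.2]
beta   = 0.3                      # assumed coefficient from a Cox fit

H = breslow_H0(times, events, x, beta)
t_last, H0_last = H[-1]
surv = math.exp(-H0_last * math.exp(beta * 1.0))  # prediction for x = 1
print(t_last, round(surv, 3))
```

This is essentially what software does behind the scenes when it produces predicted survival curves from a Cox model: the partial likelihood gives the coefficients, and the Breslow step fills in the "unknown" baseline.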
  • asked a question related to Survival Analysis
Question
4 answers
We have 4-8 treatment groups and would want to compare the survival curves of the experiments. The log rank test in Prism seemingly doesn't work for more than two groups. Would it be right to compare them in pairs & plot that significance? Or is there another robust way to do so?
Relevant answer
Answer
This can be achieved with a survival regression model (if you have a reasonable assumption about the hazard function) or Cox proportional hazards regression model (if you don't like to specify a hazard function but can assume that the hazards between the groups are proportional). There are Wald z-tests available to compare hazards between groups as for ANOVA models (which are also regression models under the hood).
Take care about error inflation by multiple testing. If you are testing a family of tests (that is: if you screen for any difference between a couple of groups), then you'll need to correct for that (-> Bonferroni, Holm).
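The Holm correction mentioned above is simple enough to apply by hand to the family of pairwise log-rank p-values. A small sketch (the p-values are illustrative placeholders):

```python
# Sketch: Holm step-down adjustment for a family of pairwise p-values
# (e.g. pairwise log-rank tests between treatment groups).

def holm(pvals):
    """Return Holm-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # step-down: multiply by the number of hypotheses not yet handled,
        # enforcing monotonicity via the running maximum
        running = max(running, (m - rank) * pvals[i])
        adj[i] = min(1.0, running)
    return adj

pairwise_p = [0.01, 0.04, 0.30]   # three pairwise comparisons (placeholders)
print([round(p, 4) for p in holm(pairwise_p)])  # -> [0.03, 0.08, 0.3]
```

An adjusted p-value below 0.05 can then be reported as significant at a familywise error rate of 5%.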
  • asked a question related to Survival Analysis
Question
10 answers
How likely do the Yacuruna in Amazonian Myths Represent Europeans?
1) Europeans are generally hairier than Native Americans. In myths, the Yacuruna sometimes disguise themselves as hairy people.
2)The Yacuruna come from the sea like European sailors did.
3) Similar to the Yacuruna, MAYBE Europeans have been worshipped as gods by the Amazonian people.
4)When a Yacuruna and an indigenous Amazonian reproduce the child would sometimes become more Yacuruna. Similarly, Castizos(75% European and 25% Native American) are sometimes as privileged as just pure Europeans.
Relevant answer
Answer
Interview is not as straightforward as many folks think. When I have asked "why do you do this" or "tell me about this" questions, I often got responses that simply said "because that is what we do!" or their description of something I have asked about is from an insider perspective and does not address my "why" question. Essentially, I look like an adult with a beard or now white hair, but I was asking a child's question. The answer was what would be said to a child: "because!". It often took 1.5 years to get a volunteered response that was the answer to what I had been asking for 18-24 months.
  • asked a question related to Survival Analysis
Question
3 answers
Hello everyone,
I have performed a Survival Analysis in R. I have 13 patients with 5 events.
If I calculate my survival rate manually, I get 8/13 = 0.615.
In my R output (screenshot), this value is different (0.598), and I can't get my head around why. Do you have any suggestions?
Thank you.
Relevant answer
Answer
As the risk set drops from 11 to 9 (i.e., one observation has left the risk set without an event), the numbers are correct:
0.598 = (1 − 1/13) × (1 − 1/12) × (1 − 1/11) × (1 − 1/9) × (1 − 1/8)
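The product-limit calculation in this answer can be reproduced directly; a small sketch:

```python
# Sketch: the Kaplan-Meier product-limit calculation behind the 0.598
# figure -- 13 patients, 5 events, with censoring thinning the risk set.

def km_estimate(steps):
    """steps: list of (n_at_risk, n_events) at each event time."""
    s = 1.0
    for n, d in steps:
        s *= 1.0 - d / n
    return s

# risk sets at the five event times (one patient was censored between
# the 3rd and 4th events, so the risk set drops from 11 to 9)
steps = [(13, 1), (12, 1), (11, 1), (9, 1), (8, 1)]
print(round(km_estimate(steps), 3))  # -> 0.598
```

This is why the naive 8/13 = 0.615 differs from the output: the censored patient contributes to the risk sets only while under observation.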
  • asked a question related to Survival Analysis
Question
2 answers
I am working on a meta-analysis where I have extracted the data directly from KM curves via WebPlotDigitizer to calculate HRs for the studies that reported only KM curves. One of the studies has three curves, so WebPlotDigitizer gives me a total of three groups. I was wondering if it is appropriate to combine the data for two of those groups and calculate an overall HR for the meta-analysis, keeping in view that it is time-to-event data and there is censoring too. I tried the method elaborated by Cochrane, but it gave me a really wide confidence interval.
Anyone having any lead of how to deal with this?
Relevant answer
Answer
In my experience, estimated data (e.g.: data retrieved from KM) will usually give you a "really wide" confidence interval as you say. Personally, I would have computed all three groups as three individual studies for the meta-analysis (in addition to other studies).
Cheers!
  • asked a question related to Survival Analysis
Question
1 answer
I'm planning to use a cox regression in a future study exploring time-to-event or survival analysis comparing a control with an experimental group. I've seen sample size calculated through several packages, but I prefer G*Power and wanted to know if anyone's done this. Any resources would be appreciated.
Relevant answer
Answer
Hello Eden,
No, G*Power doesn't have a direct method to estimate N for Cox regression models. However, here are some links that will get you on your way:
1. A previous RGate reply to a similar question, with links to formulae as well as a couple of online sample size calculators:
2. Guidance for degree to which a target covariate is independent of any other covariates in a model, and general sample size estimation:
3. Guidance for minimum events per variable, based on a simulation study performed on a large, real data set:
Good luck with your work.
  • asked a question related to Survival Analysis
Question
3 answers
Dear all !
I'm working on historical data related to events and would like to carry out a survival analysis. The dataset typically includes some right-censored data, where the individuals are still living. But there are also individuals with missing birth dates, which means they are left-censored.
While survival analysis with either right or left censoring can be carried out in most professional statistics software, I have found no way to include both right and left censoring in one survival analysis.
Does anyone have an idea?
Thanks
Relevant answer
Answer
Yes, there are methods to handle both right and left censoring in survival analysis. Censoring occurs when the event of interest (e.g., death, failure) is not observed for some individuals within the study period. Right censoring refers to cases where the event has not yet occurred by the end of the study, while left censoring occurs when the event occurred before observation started and is only known to have happened within a certain time frame.
In the presence of right and left censoring, you can use a statistical technique called interval-censored survival analysis. This approach takes into account the time intervals within which the event of interest occurred, rather than the precise event times. Interval-censored survival analysis allows for the estimation of survival probabilities and the comparison of survival curves when the event times are only known within certain intervals.
There are several methods available for interval-censored survival analysis, including:
  1. Turnbull's estimator: This nonparametric maximum likelihood method generalizes the Kaplan-Meier estimator to interval-censored data, estimating the survival probabilities without assuming a particular distribution for the event times.
  2. Parametric models: These models assume a specific distribution for the event times and estimate the parameters using maximum likelihood estimation. Common parametric models for interval-censored data include the Weibull, log-normal, and exponential distributions.
  3. Nonparametric maximum likelihood estimation (NPMLE): This is the general framework to which Turnbull's estimator belongs; it estimates the survival probabilities directly, without distributional assumptions, and reduces to the Kaplan-Meier estimator when only right censoring is present.
  4. Bayesian methods: Bayesian approaches provide a flexible framework for interval-censored survival analysis, allowing for the incorporation of prior information and the estimation of survival probabilities based on posterior distributions.
The choice of method depends on the specific characteristics of your data and the assumptions you are willing to make. It is important to consider the underlying distribution of the event times, the nature of censoring, and the sample size.
Implementing interval-censored survival analysis typically requires specialized software or programming packages that offer appropriate functions or procedures for this type of analysis. Consult the documentation of statistical software packages such as R, SAS, or Stata, as they often provide specific functions for interval-censored survival analysis.
Remember to carefully interpret and report the results, acknowledging the presence of censoring and the methods used to handle it in your analysis.
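As a small illustration of the data format these interval-censored methods expect, each observation can be encoded as an interval known to contain the event time; exact, right-censored, and left-censored cases are then just special intervals (the field names below are illustrative assumptions):

```python
# Sketch: encoding mixed censoring as intervals (L, R), the input format
# that interval-censored methods such as Turnbull's estimator work with.

INF = float("inf")

def to_interval(time, kind):
    if kind == "exact":   # event observed exactly at `time`
        return (time, time)
    if kind == "right":   # still event-free at `time` (event after it)
        return (time, INF)
    if kind == "left":    # event happened some time before `time`
        return (0.0, time)
    raise ValueError(kind)

records = [(54.0, "exact"), (71.0, "right"), (30.0, "left")]
print([to_interval(t, k) for t, k in records])
# -> [(54.0, 54.0), (71.0, inf), (0.0, 30.0)]
```

Packages for interval-censored analysis (e.g. in R, SAS, or Stata) generally take exactly such a pair of bound columns in place of the usual single time column.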
  • asked a question related to Survival Analysis
Question
3 answers
May board committees reduce the probability of financial distress? A survival analysis on Italian listed companies
Relevant answer
Answer
Many thanks, one of the authors of the paper has replied
  • asked a question related to Survival Analysis
Question
2 answers
If I have to draw a CONSORT chart for research on cancer patients, and some patients died, and I performed a survival analysis, then in the last step of the CONSORT chart I have to report the number of patients analyzed.
Do I count the deceased patients in the analysis step of the CONSORT chart, since they were included in the survival analysis, or do I exclude them?
Relevant answer
Answer
The CONSORT (Consolidated Standards of Reporting Trials) statement provides a checklist of items to be included in reporting randomized controlled trials (RCTs) to ensure transparency and completeness. While CONSORT does not specifically address the reporting of survival analysis, it does recommend reporting the number of participants who experienced each outcome, including deaths.
To include information on deaths in a CONSORT chart, you could add a row or column to the table to report the number of participants who died in each group. For example, if you are comparing two treatment groups in survival analysis, you could add a row to the table to report the number of participants who died in each group during the study period. This information could be presented alongside the number of participants who completed the study or withdrew from the study.
It is important to report deaths and other outcomes consistently across all study arms to enable comparisons between groups. If relevant to your study, you may also want to consider including additional information, such as the cause of death or time of death.
When reporting survival analysis results, you should also report relevant summary measures, such as survival curves, hazard ratios, and confidence intervals, and provide details of the statistical methods used for the analysis.
In summary, to include information on deaths in a CONSORT chart, you can add a row or column to report the number of participants who died in each group. It is important to report deaths and other outcomes consistently across all study arms and to provide relevant summary measures and details of the statistical methods used.
  • asked a question related to Survival Analysis
Question
2 answers
Hi
I would like to ask whether there is a Stata or R package to calculate a "risk horizon" for a binary outcome after a survival analysis.
Basically, like in this article:
Many thanks
Relevant answer
Answer
Hi David. Yes, I am awaiting a reply. Thanks.
  • asked a question related to Survival Analysis
Question
4 answers
I am working on small mammal detection/non-detection data using dynamic occupancy models. The parameters I have are probabilities of occupancy, detection, colonization, and extinction. The same dataset has been used to estimate survival, recruitment, and other demographic parameters of species using capture-mark-recapture models, which show a positive influence of rainfall on survival probability. My analysis also shows a positive influence of rainfall on colonization of the small mammal community, which makes a lot of sense. Now I want to connect my results to the previous results, but I am struggling to find a study that links survival and colonization that I can refer to. It could be confusing because survival analysis is done at the individual level and colonization analysis is done at the species level. Presumably survival would benefit colonization, but I need an empirical reference or a study that has done both and shows that these two are directly related. Thank you!
Relevant answer
Answer
There are classic sources that should help direct you in your quest for a unitary colonization model. Clearly, colonization cannot occur without survival (and I would say the Allee Effect is more a model of population decline and extirpation/extinction than of population colonization).
The environmental context in which a given possible colonization event is occurring is key. Island Biogeographic Theory would be informative here -- see: MacArthur & Wilson (1967) - The Theory of Island Biogeography; MacArthur (1972) - Geographical Ecology; Brown (1995) - Macroecology. Lots of graphical figures in these sources that model possible survival trajectories and outcomes of population colonizations. But, the context of any given population colonization event/process is pivotal -- this is likely a key reason you are finding it difficult to conceptualize a unitary model for population colonizations. The classic MacArthur & Wilson colonization scenario where a founding population reaches an off-shore island is just one conceptualization (e.g., there would also be need to consider whether the island was located on a continental shelf adjacent to the mainland source region for a colonizing population, or whether it was an oceanic island). Of course, the "island" conceptualization also applies to habitat blocks (e.g., protected areas) in continental regions. The taxon/taxa involved in the colonization(s) you are looking to model would be another significant variable to consider... you mention a small mammal community -- how many mammalian orders are represented? What is the body size range? Such model parameters would undoubtedly be different for mammalian taxa (as opposed to amphibian, reptile, avian, or other taxonomic groups), and could also vary considerably across different mammalian taxa (with body size being a prime limiting factor in colonization events and outcomes).
  • asked a question related to Survival Analysis
Question
3 answers
Hi there,
I want to perform different Machine Learning approaches for survival analysis in cancer.
I have a database of 330 patients. I have some unique evaluated features so I can't use online database to look for other cohorts of patients to use as train or test.
How can I manage this? Should I skip the training cohort? Or can I use my test cohort as training and then generate a new random cohort? How? Is this possible? Is there any protocol or recommendation to follow?
Or should I split my database in 150 and 180 patients for example?
Thanks in advance,
Carlo
Relevant answer
Answer
Train/test is a method to measure the accuracy of your model. It is called train/test because you split the data set into two sets: a training set and a testing set, e.g. 80% for training and 20% for testing. You train the model using the training set.
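A minimal sketch of such a split for the 330 patients mentioned in the question (seeded for reproducibility; with a sample this size, repeated random splits or k-fold cross-validation would arguably give a more reliable error estimate than a single hold-out):

```python
import random

# Sketch: a reproducible 80/20 split of patient IDs, assuming no external
# cohort is available for validation.

def train_test_split(ids, test_frac=0.2, seed=42):
    ids = list(ids)
    rng = random.Random(seed)   # fixed seed -> reproducible split
    rng.shuffle(ids)
    n_test = int(len(ids) * test_frac)
    return ids[n_test:], ids[:n_test]

patients = list(range(330))          # placeholder patient IDs
train, test = train_test_split(patients)
print(len(train), len(test))         # -> 264 66
```

The key discipline is that the test set is touched only once, at the very end; any feature selection or tuning must happen inside the training portion to avoid optimistic bias.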
  • asked a question related to Survival Analysis
Question
3 answers
Hi!
Can someone tell me how to fit a Weibull distribution (survival analysis) in Stata? I want to determine the association between antibody levels and the risk of disease, similar to what Fabbrini et al. did (attached). However, when I try to do it, my Cases curve is a straight line (capture attached). My data are case-control, where the cases have the disease and the controls don't.
Any step-by-step suggestions, please?
Thanks
Relevant answer
Answer
This may help!
* declare the survival-time structure first (variable names are examples)
stset time, failure(event)
streg x1 x2, distribution(weibull)
  • asked a question related to Survival Analysis
Question
7 answers
Is someone familiar with a way to plot survival curves for a model created with the coxme function?
(i.e. ggsurvplot but for coxme objects instead of coxph objects)
I am relatively new to survival analysis so correct me if I am misunderstanding something. I know it is more complex because the "frailty" (i.e. the random effect) modifies the baseline hazard function, but is there a way to for example produce a "predicted" survival curve?
Relevant answer
Answer
I have found that this paper provides a script to deal with the problem. I am still looking for a simpler solution; please share if you find one while I try to figure it out.
  • asked a question related to Survival Analysis
Question
4 answers
Hi
I am working on a biomarker problem similar to the PSA for prostate cancer.
PSA is a blood test (continuous numerical variable) that can be used to follow patients with prostate cancer and predict the course of disease. Say someone was treated for prostate cancer and we see that the PSA levels are rising over time, we are worried about a disease recurrence.
I am working on a similar project, but am using multiple such continuous variables (instead of just PSA, I have A1, A2, A3, ... A15). These tests are obtained at multiple timepoints.
I have follow-up data for these patients (time to event - recurrence or censorship), as well as the date of diagnosis and date of each test.
I have used a Cox proportional hazards model with time-varying covariates, in which each test acts as a predictor for the time period that follows, until the next test is obtained. However, given the large number of independent variables (A1-A15) and the relatively small sample size (around 100 patients), the model is unstable: if I remove some samples or some variables, I get wildly different results.
That being said, there is evidence in the literature that for instance the A2/A7 ratio correlates with disease recurrence - and I can replicate this in my dataset if I look specifically at the A2/A7 ratio. (It's not a great predictor, but there's a signal).
However, I would like to use all 15 variables (or at least find whether we can add to the A2/A7 ratio). ex. Maybe the (A2+A4)/A7 ratio has superior predictive power?
Obviously, it's not reasonable to try all possible combinations of variables and their interaction terms as the number quickly gets out of hand and will likely result in overfitting.
My questions (I use R):
1. Any thoughts on a more organized/automated approach to feature selection? (R packages that you've tried etc.? - I read about SurvRank but haven't been able to get it to work..)
2. Any thoughts on dimensionality reduction and whether it could be applied to this situation?
3. ((??Any machine learning techniques - ideally in R??))
PS. - I am working on getting a larger sample size (work in progress) - hopefully another ~400 or so patients, so I'm hoping that will help.
Relevant answer
Answer
Welcome to the jungle. Here is a web site that can help about feature selection packages
  • asked a question related to Survival Analysis
Question
4 answers
1- In the attached figure, does B mean that two patients failed at time 9, and A that one patient failed at time 5? I.e., did more patients fail at B than at A?
2- What does "10" mean? 10 is the maximum value of the horizontal axis.
Does "10" mean that the sum of the follow-up times of all patients was 10?
Relevant answer
Answer
At the beginning of the study (time = 0), none of the subjects (or study participants) had experienced the event of interest you are considering in the research (that is why the survival probability is 1.0). As time increased, the subjects started experiencing the event of interest and the survival probability began to drop. Points A and B represent drops in the survival of the subjects. Point A is the drop in survival at the 5th (year/month/day/hour etc.) while point B is the drop in survival at the 9th (year/month/day/hour etc.). 10 represents the follow-up time (10 years/months/days/hours etc.) for the study subjects, OR 10 (years/months/days/hours etc.) is the study period. Also, from the Kaplan-Meier curve, the median survival time is approximately 6 years; that is the time at which 50% of the study subjects had experienced the event of interest.
  • asked a question related to Survival Analysis
Question
3 answers
I ran a KM survival analysis on a database in SPSS. In the means and medians for survival times, I only get results for the mean. Could this be caused by the fact that in the database the follow-up is only 3 days, reported as event/censored on either day 1, 2 or 3?
Relevant answer
Answer
If the survival curve never dropped to 50%, then the median cannot be displayed. Look at the axis.
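This behaviour is easy to reproduce by hand. A minimal pure-Python sketch of the Kaplan-Meier estimator (function names are illustrative, not SPSS internals): with only 3 days of follow-up and few events, the curve may never fall to 0.5, so the median is simply undefined.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimator.

    times: follow-up times; events: 1 = event occurred, 0 = censored.
    Returns a list of (time, survival) steps at the event times.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [(tt, e) for tt, e in data if tt == t]
        d = sum(e for tt, e in at_t)  # events at time t
        if d > 0:
            s *= (n_at_risk - d) / n_at_risk
            curve.append((t, s))
        n_at_risk -= len(at_t)  # events and censorings both leave the risk set
        i += len(at_t)
    return curve

def km_median(curve):
    """Smallest time at which S(t) <= 0.5, or None if the curve never gets there."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# One event on day 1, censoring on days 2 and 3: S stays at 0.75,
# so km_median(...) returns None, matching SPSS's empty median cell.
```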
  • asked a question related to Survival Analysis
Question
1 answer
Please take a look at the attached file.
I irradiated cells using a fractionation regime of 3 x 1 Gy after exposure to a substance in different concentrations.
I made an XY table with the determined SFs and plotted a graph using the LQ-model.
The equation I used was Y=exp(-3*(A*3*X + B*3*X^2)). It is an adaptation of the provided equation Y=exp(-1*(A*X + B*X^2)) to account for the fractionation regime.
To determine the AUC I used the standard analyzing tool that Graphpad provided.
Could someone tell me if this is right, or if I am mistaken somewhere?
Thank you very much in advance!
Relevant answer
Answer
There are two, very different, ways to model an LQ model. The first assumes that the fractionated curve continues along the single-dose curve. The second assumes that there is full recovery from each fraction and therefore the initial curve is repeated from the previous dose SF. The area between these curves is called the "envelope of additivity". See G.G. Steel or Peckham and Steel for more on this addition of survival curves for multifractionated doses. Interestingly, ionizing radiations (with shouldered survival curves) tend to repeat the initial portion of the curve (so-called repair of sublethal damage, but actually split-dose recovery), while some alkylating agents, such as Bleomycin, have their curve continue along the single-dose curve (no split-dose recovery).
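The two models described in the answer can be written out explicitly. A minimal sketch (plain Python; function names and the parameter values in the test are illustrative) under the standard LQ model, where for n fractions of dose X each, full split-dose recovery gives SF = exp(-n*(A*X + B*X^2)), while no recovery means the curve continues along the single-dose curve at the total dose n*X:

```python
import math

def sf_single(dose, a, b):
    """Single-dose LQ survival: SF = exp(-(a*d + b*d^2))."""
    return math.exp(-(a * dose + b * dose ** 2))

def sf_full_recovery(dose_per_fraction, n, a, b):
    """n fractions with complete split-dose recovery between them:
    the shoulder is repeated, so SF = [single-fraction SF]^n."""
    return sf_single(dose_per_fraction, a, b) ** n

def sf_no_recovery(dose_per_fraction, n, a, b):
    """No split-dose recovery: the curve continues along the
    single-dose curve, so SF = SF(total dose)."""
    return sf_single(n * dose_per_fraction, a, b)
```

For the asker's regime of 3 x 1 Gy, the full-recovery form reduces to exp(-3*(A*X + B*X^2)) with X the dose per fraction; the no-recovery form gives a lower SF whenever B > 0, and the gap between the two is the "envelope of additivity" the answer refers to.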
  • asked a question related to Survival Analysis
Question
1 answer
Hi, I am currently conducting a survival study to investigate the role of several potential biomarkers as prognostic factors in certain cancer. First, I perform Kaplan-Meier analysis for all the biomarkers and other relevant clinicopathologic data. However, only one biomarker fulfilled the proportional hazard criteria from the Kaplan-Meier curve. Other biomarkers and clinicopathologic variables do not fulfill the criteria.
I am wondering: do I still need to proceed to Cox regression analysis? Can I include the other biomarkers and relevant clinicopathologic data in the Cox regression, even though they do not fulfill the proportional hazards criteria in the Kaplan-Meier analysis? Thank you.
Relevant answer
Answer
Your question does not make sense. Run the model that you wish to run, then look at the Schoenfeld residuals for lack of pattern. See the attached screenshot reference for full details. Best wishes, David Booth
  • asked a question related to Survival Analysis
Question
1 answer
Hi there,
I am doing survival analysis, and I know that some of the variables with a significant impact on survival on univariate analysis are closely related to each other, as they have been calculated from one other. Can I include them all in a Cox PH model to see which of the variables is/are an independent risk factor?
Relevant answer
Answer
It sounds like you think some of your predictors may be confounded. Including multiple predictors is a way of adjusting for confounding between a predictor and the outcome. However, when the predictors themselves are confounded, including multiple related predictors may not necessarily highlight THE most important ones.
As a simple example: what happens if you include the same predictor twice (maybe with a little noise added in)? Do they get the same hazard ratio as if you only include it once?
  • asked a question related to Survival Analysis
Question
4 answers
Hi,
I developed a probability-of-default model using the Cox PH approach. I use the survival package in R. I have panel data with time-varying covariates. I made predictions with the following code: predict(fit, newdata, type='survival'). However, the predicted survival probability is not decreasing over time for each individual (see picture).
I wonder whether the prediction is a marginal survival probability or a cumulative survival probability?
If it is a cumulative survival probability, why is it not decreasing across time?
Thanks
Relevant answer
Answer
David Eugene Booth I did not know the answer when I asked the question. I then conducted some research and found the answer. I wrote the correct answer in the comment section, so that if someone else is interested in such an issue, this answer will help them.
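For later readers, the usual explanation of this behaviour: with time-varying covariates the data are in counting-process (start, stop] form, and the predicted "survival" for each row is conditional on being at risk at the row's start, so it need not decrease across rows. A monotone per-subject curve is obtained by multiplying the interval-conditional probabilities. A minimal sketch of that combination step (this is my reading of the situation, not the survival package's internals):

```python
def cumulative_survival(interval_probs):
    """Combine per-interval conditional survival probabilities into a
    cumulative (monotone non-increasing) survival curve.

    interval_probs: P(survive interval k | alive at its start), in time order.
    """
    curve = []
    s = 1.0
    for p in interval_probs:
        s *= p  # chain rule: S(t_k) = product of conditional probabilities
        curve.append(s)
    return curve

# Conditional probabilities 0.9, 0.95, 0.8 give the cumulative
# curve 0.9, 0.855, 0.684, which is monotone even though the
# per-interval values themselves go up and down.
```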
  • asked a question related to Survival Analysis
Question
6 answers
In my study, I am using propensity score matching to balance the effects of covariates on the impact of prednisolone on death outcomes among COVID-19 patients. 92 covariates have been considered, including demographic factors, signs and symptoms, other drugs, laboratory tests, and vital indicators. A problem arises when I remove one of the variables (the ALT). This changes the final results significantly.
How can I ensure whether I should remove that variable?
Relevant answer
Answer
I'm not sure I comprehended your query. This is the article that may be of assistance to you. "Adelson JL, McCoach DB, Rogers HJ, Adelson JA, Sauer TM. Developing and applying the propensity score to make causal inferences: variable selection and stratification. Frontiers in psychology. 2017 Aug 17;8:1413."
  • asked a question related to Survival Analysis
Question
2 answers
Hi,
I have been performing survival analysis using Cox regression models, and I encountered a situation where, after adding a time-varying effect for a variable X (X*time; the variable violated the PH assumption), the added interaction with time was significant in the model, but the main effect of variable X was not, as illustrated below:
Model without interaction with time:
coef exp(coef) se(coef) z Pr(>|z|)
factor(X)1 0.4633 1.5894 0.1625 2.852 0.004 **
Model with interaction between X and time:
coef exp(coef) se(coef) z Pr(>|z|)
factor(X)1 -0.3978 0.6718 0.4444 -0.895 0.371
tt(factor(X)) 0.6230 1.8645 0.2816 2.212 0.027 *
In the study we are interested in the effect of the X variable on the survival outcome, and after inclusion of the time-varying effect X*time, I am no longer sure about the value of the variable X in describing the risk of the outcome, as the main effect is now not significant.
Is the significance of time-varying effect of variable X enough to assume that the variable X is significant for the outcome risk, even though the main effect is no longer significant in such scenario?
Or, do both of them, the main effect of X and the time-varying effect of X have to be significant in the model to be able to say that X is significant for the outcome?
Any help in interpreting these is very welcome.
Relevant answer
Answer
Thanks David,
I ran a Kaplan-Meier: the two lines cross early in the study, and only then do we see quite good separation. If I understand correctly, it perhaps isn't surprising that the variable has a time-varying effect (increasing with time).
I also inspected the proportional hazard (PH) assumption for this variable using the cox.zph function in survival R package - the red horizontal line represents the averaged coefficient (beta) as in the Cox model (model that operates under assumption of PH), while the thin, black line is the "real" beta for the variable. The test for proportional hazard for variable X did not indicate significant departure from PH (p = 0.079, PH is violated when p < 0.05), but the added time-varying effect is significant, which is reflected by the Kaplan-Meier and the plotted coefficient over time.
Regarding the interpretation of the significance of the terms in the model, I received some advice from a biostatistician: in a model where a time-varying effect is included as an interaction term (as opposed to splitting the time and calculating the HR for time intervals), the main effect represents the HR where the interaction term is equal to 0 (for a simple X*time interaction, the interaction equals 0 when time = 0; this may differ when there is a time transformation, for example log(time), etc.). Bottom line: even if the main effect isn't significant in the model, the (time-varying) effect of the variable is still interesting.
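The biostatistician's point can be made concrete with a few lines of arithmetic. Under an X*g(time) interaction the hazard ratio itself varies with time, HR(t) = exp(beta_main + beta_tt * g(t)). A sketch (plain Python for illustration) using the coefficients posted above, and assuming the tt() transform is the identity, which depends on how the model was actually specified:

```python
import math

def hr_at_time(beta_main, beta_tt, t, g=lambda t: t):
    """HR(t) = exp(beta_main + beta_tt * g(t)) for a model with an
    X * g(time) interaction; g must match the tt() transform used."""
    return math.exp(beta_main + beta_tt * g(t))

# With the coefficients above (assuming g(t) = t):
# HR(0) = exp(-0.3978) ~ 0.67 (this is what the "main effect" tests),
# and HR(t) crosses 1 at t = 0.3978 / 0.6230, after which X is harmful.
```

So the non-significant main effect only says the HR at time 0 is not distinguishable from 1; the significant interaction says the HR changes with time, which is the substantive finding.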
  • asked a question related to Survival Analysis
Question
11 answers
I found in statistical books that to verify the linearity assumption of a Cox model I need to plot Martingale residuals.
However, I cannot find any explanation about the interpretation of the plot!
So, if I plot predicted values versus Martingale residuals, what should I expect if linearity is satisfied?
Thank you in advance... please help me!
Relevant answer
Answer
This has been useful to me too. Thanks Emilio
  • asked a question related to Survival Analysis
Question
2 answers
Hi there, i am using the survival package in R studio with the following code to make a Kaplan-Meier-analysis:
surv_object <- Surv(time = cabos$sg, event = cabos$ï..status)
survfit(Surv(time = cabos$sg, event = cabos$ï..status) ~ 1, data = cabos)
fit1 <- survfit(surv_object ~loc , data = cabos) #p=0.97
ggsurvplot(fit1, data = cabos, pval = TRUE, risk.table = TRUE, risk.table.col = "strata",risk.table.y.text.col = T,
risk.table.y.text = F, xlab= "Months", ylab= "Overall survival",legend.labs =
c("C500", "C501", "C502", "C503","C504", "C505",
"C508", "C509"), title = "Location")
This resulted in a Kaplan-Meier Curve with a p value of 0.97.
Yesterday, I added some labels so I could translate it to English, but the p value changed to 0.87. There were no changes made to the code or the dataset. This concerned me, so I ran the statistical analysis in SPSS and obtained the same result as my previous KM curve (p=0.97).
I have tried running the analysis without the translation, without converting my variables to factors, converting my variables to factors, and running new code from scratch, but I can't reproduce the previous value.
What could be the problem with R studio?
Thanks in advance for your time.
Relevant answer
Answer
PS: don't you still have everything that you started with?
  • asked a question related to Survival Analysis
Question
4 answers
I recently aimed to plot a Kaplan-Meier curve with SPSS; however, although I have done all the steps according to standards, the software seems to have a problem plotting the curve and doesn't plot it at all (shown in the picture). Does anybody know how I can solve this issue?
Relevant answer
Answer
I tried it with a different PC and it worked fine. Seems like it's a problem with my system.
  • asked a question related to Survival Analysis
Question
11 answers
I'm currently working on survival analysis for gendered effects of executive exit. The aim is to investigate whether female leaders have higher chances of turnover compared to their male counterparts. I've run all of the basic analyses, but am now stuck on an issue of interaction in Cox models.
The issue: I'm trying to find out if any (or all) of the control variables in my Cox model have different effects by gender. For example: In my original models executive age is a control variable, but maybe the hazard of leaving is more related to age for women than for men. To do this, I wanted to run ALL of the control variables with an interaction term of gender. My questions:
1. Should I do this within the same model (e.g. fit1 <- coxph(Surv(Time, Censoring) ~ age:gender + nationality:gender + ....)) or in separate models for each interaction? What makes more sense here?
2. In both cases, results look something like the attached picture (the variable American measures whether a person is American (1) or not (0)).
Table shows coef, exp(coef)=HR, se(coef), z, Pr(>|z|)
How should I interpret this?
Relevant answer
Answer
The model should include the simple effects, as otherwise the interaction is not the difference in the effects. So the model should be like
full = coxph(Surv(Time, Censoring) ~ (age+nationality+...)*gender)
and there you would test the interaction terms via the analysis of deviance for the Cox models with and without the respective interaction term:
restricted = update(full, .~. - age:gender)
anova(restricted, full)
To check if there is a global interaction:
restricted = coxph(Surv(Time, Censoring) ~ age+nationality+...)
anova(restricted, full)
Note that it is important that the relationships between all the predictors and the log hazard rate is linear. If this is not the case, then the interaction will only catch non-linearities and this will lead to a misinterpretation.
  • asked a question related to Survival Analysis
Question
3 answers
Hello Researchers,
I apologize if my question is a bit too simple
I'm currently working on a study concerning patients with myelodysplastic syndromes (MDS)
most patients were followed up for a good period of time (the median was about 3 years),
but since MDS isn't that deadly a disease, most patients (thankfully) lived, so not many events occurred, which led to a lot of censoring.
In this case,
is it suitable to do a survival analysis?
Also, some patients were followed up for many years (10+), but they aren't the majority.
In that case, could I cut the data and interpret it at 3 years, for example (like "the survival at 3 years was 70%")?
Or how else could I deal with the censoring?
(About 30% of patients died in the study.)
Relevant answer
Answer
Hi,
Any event occurring decreases the probability on the survival curve. The event is defined by the investigator: a patient who improves and leaves treatment can be an event, as can worsening or mortality. We can also compare across different survival curves.
  • asked a question related to Survival Analysis
Question
2 answers
The main independent variable is A, but I want to adjust the model with a covariate B. When I check the PH assumption, A holds but B does not. Do I need to run a Proportional odds or an AFT model for that?
Relevant answer
Answer
Here are some notes you may find useful:
The output was generated using Stata, in case you're wondering.
  • asked a question related to Survival Analysis
Question
6 answers
Hi, I am a beginner in the field of cancer genomics. I am reading gene expression profiling papers in which researchers classify cancer samples into two groups based on the expression of a group of genes, for example a "high group" and a "low group", and do survival analysis; then they associate these groups with other molecular and clinical parameters, for example serum B2M levels and serum creatinine levels, or with 17p del and trisomy 3. Some researchers classify the cancer samples into 10 groups. Now, if I am proposing a cancer classification scheme and presenting a survival model based on 2 groups or 10 groups, how should I assess the predictive power of my proposed classification model, and how do I compare its predictive power with that of other survival models? Thank you in advance.
Relevant answer
Answer
The survAUC R package provides a number of ways to compare models link: https://stats.stackexchange.com/questions/181634/how-to-compare-predictive-power-of-survival-models
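One common single-number comparison between survival models is Harrell's concordance index (C-index): the proportion of usable patient pairs in which the model assigns the higher risk score to the patient who fails earlier. A minimal pure-Python sketch of the idea (illustrative only; full implementations such as those in survAUC also handle tied times and other subtleties):

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among usable pairs, the fraction where the
    subject with the higher risk score has the earlier observed event.

    times: observed times; events: 1 = event, 0 = censored;
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the subject with the shorter observed time
        if times[i] == times[j]:
            continue  # skip tied times in this minimal sketch
        if events[i] == 0:
            continue  # shorter time is censored: ordering unknown, pair unusable
        usable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied scores get half credit
    return concordant / usable
```

A C-index of 0.5 is chance-level ranking and 1.0 is perfect; comparing the C-index of your 2-group or 10-group classifier against competing models (ideally on held-out data, with confidence intervals) is a standard way to report relative predictive power.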
  • asked a question related to Survival Analysis
Question
3 answers
Good morning,
I am doing survival analysis to predict the risk of a specific complication after surgery. We did a KM analysis, and the risk was clearly higher immediately after surgery and then flattened after 120-160 days. Using the lognormal model, I tried a parametric survival analysis, treating the model as accelerated failure time. So my questions:
  1. Is it appropriate to use the parametric model in this situation?
  2. How do I get the hazard ratio from the parametric survival analysis? I was able to get estimates, but no clear HR.
  3. How do I interpret the hazard vs. time plot? The shape nicely shows that the hazard decreases over time, but the hazard values on the Y-axis range between 0.00015 and 0, and I don't know how to interpret them.
  4. Can I get a cut-off point in time where the hazard changes significantly?
Thank you very much,
Amro
Relevant answer
Answer
It sounds like your data demonstrate some features of a piecewise hazard; I would suggest running an analysis that accommodates this.
A very good reference is the book Survival Analysis: A Self-Learning Text.
You can check the example in Section 6 (pages 593 to 597) to fit the piecewise-HR Cox model; also, you can check the Stanford Heart Transplant Data on pages 265-269. The transplant may play a similar role to the surgery in your data.
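Regarding question 3 of the original post: a lognormal AFT model implies a hazard that rises from zero, peaks, and then falls back toward zero, which matches a post-surgical risk that is high early and flattens later, and the small absolute values on the Y-axis are simply instantaneous event rates per time unit. A minimal sketch of that hazard shape (plain Python; mu and sigma are illustrative location/scale parameters of log-time, not fitted values):

```python
import math

def lognormal_hazard(t, mu, sigma):
    """Hazard h(t) = f(t) / S(t) for a lognormal time-to-event distribution."""
    z = (math.log(t) - mu) / sigma
    # lognormal density f(t)
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    # survival S(t) = P(T > t) via the normal CDF of z
    surv = 0.5 * (1 - math.erf(z / math.sqrt(2)))
    return pdf / surv

# With mu = 0, sigma = 1 the hazard is low for very small t, climbs to a
# maximum, and then decreases again for large t (non-monotone shape).
```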
  • asked a question related to Survival Analysis
Question
3 answers
Hello everyone,
Hope you all are doing well.
I am trying my hand at the Fine-Gray model to calculate subdistribution hazard ratios for cardiovascular death using the NHANES dataset in R. It was not that difficult to calculate the cumulative incidence function (CIF), but since it is a nationally representative sample, I am having issues accounting for the survey weights and strata.
I tried the 'survival' and 'cmprsk' packages, but neither has a provision to incorporate weights and strata in the regression model.
Any suggestion will be appreciable.
Thank you
Relevant answer
Answer
Fine-Gray model accounting for clusters apologies this is the one that I meant to send earlier. Best wishes, David Booth
  • asked a question related to Survival Analysis
Question
5 answers
Hi everyone,
This is my first time attempting to run a survival analysis. I have data on patients with End-Stage Kidney Disease (ESKD). I want to estimate survival probability by treatment method (e.g., conservative therapy vs dialysis), duration since diagnosis, etc.
Thank you
Relevant answer
Answer
@dorcas before you start your analysis, always make sure KM assumptions are met in your data.
This article is helpful
Best
MB
  • asked a question related to Survival Analysis
Question
4 answers
Let us start to discuss the uses of survival analysis, particularly in medicine and the health sciences.
To what extent is it helpful?
And its application using SPSS.
Relevant answer
Answer
Let's see this video first:
  • asked a question related to Survival Analysis
Question
8 answers
I'm using a cox proportional hazards regression to analyze latency data, i.e. time until behavior x occurs. The model I'm running is fitted with the "coxme" package in R (which is a wrapper to the "survival" package if I'm not mistaken) because I wanted to fit a mixed model for the survival analysis. The model converges without problems, however, when I'm trying to test the assumptions of the model I get a fatal error, and my R session crashes. Specifically, when I try to test the "proportional hazard" assumption of the model using the "cox.zph" call, the R session crashes and I don't know why, because the function is supposed to work with both a mixed model (coxme) and a standard, non-mixed, model (which is a "coxph" object in the package terminology). I've tried the non-mixed version of my model and it provides the desired output, but it won't work for my intended model. I've also tried updating RStudio, so I have the latest version, but it didn't help. Finally, I've tried to manually increase the working memory dedicated to RStudio, in case the function was memory demanding, but it didn't help. Looking around at different forums has provided no answers either, both with general search parameters like "causes for R session crash" and more specific, like "cox.zph cause R session crash", but I could not find any help.
Has anyone experienced this error? Were you able to solve it, and if so, how?
I appreciate any advice I can get on this issue.
Relevant answer
Answer
I am not familiar with that package, but you may wish to look at a similar process in our attached paper. Best wishes, David Booth
  • asked a question related to Survival Analysis
Question
6 answers
I have encountered a problem that I would like to ask for help with: my purpose is to use a competing risk (CR) model to train on survival data that contains multiple records per patient. I tried a bunch of R packages and functions that are able to train a mixed-effect CR model with a cluster(ID) (e.g. patient ID) term added.
So far, the R functions I tried and the results are: (1) FGR() worked fine for a fixed-effect-only CR model but cannot train a mixed-effect CR model; (2) CSC() is able to train a mixed-effect model but is unable to predict the risk probability using predict(); (3) comp.risk() worked fine for training and predicting a mixed-effect CR model, but it doesn't allow c-index calculation using cindex() or concordance.index().
Now the question is: how can I calculate the c-index for my validation set from the results of comp.risk() and predict()? I have a fitted model and predicted risks at certain times (for instance, 3, 5, and 10 years). Do I need predicted risks at all possible times in order to calculate the c-index? Is there a better way or a simple solution for this?
Thank you very much for all your help.
Relevant answer
  • asked a question related to Survival Analysis
Question
8 answers
Dear research gate members, would you please let me know the specific assumptions and conditions to use these phrases?
Relevant answer
Answer
Kaplan–Meier provides a method for estimating the survival curve, the log rank test provides a statistical comparison of two groups, and Cox's proportional hazards model allows additional covariates to be included. Both of the latter two methods assume that the hazard ratio comparing two groups is constant over time. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065034/
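The log-rank comparison mentioned above can be written out in a few lines. A minimal pure-Python sketch of the two-group log-rank chi-square statistic (1 degree of freedom; real implementations additionally handle weights, strata, and exact tie corrections): at each event time it compares the observed events in group 1 with the number expected under a common hazard.

```python
def logrank_chi2(times1, events1, times2, events2):
    """Two-group log-rank chi-square statistic (1 df).

    timesX: observed times in group X; eventsX: 1 = event, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, grp in data if e == 1})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(1 for tt, e, grp in data if tt >= t and grp == 0)  # at risk, group 1
        n2 = sum(1 for tt, e, grp in data if tt >= t and grp == 1)  # at risk, group 2
        d1 = sum(1 for tt, e, grp in data if tt == t and e == 1 and grp == 0)
        d2 = sum(1 for tt, e, grp in data if tt == t and e == 1 and grp == 1)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var
```

Identical groups give a statistic of 0; the statistic is referred to a chi-square distribution with 1 df, exactly the comparison the answer describes, and it shares the Cox model's assumption of a constant hazard ratio over time.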
  • asked a question related to Survival Analysis
Question
6 answers
Lifetime data with bathtub-shaped hazard rate function are often encountered in survival analysis and the classical lifetime distributions mostly used cannot adequately describe the data. Is there any parametric survival model suitable for modelling such type of data?
Relevant answer
Answer
OK @Idrasen. Thank you .
  • asked a question related to Survival Analysis
Question
3 answers
Hi!
I'm trying to develop a prediction model for a type of cancer. My end-point is cancer-specific survival. I have around 180 events, and an events-per-parameter analysis indicated that I can include about 9-10 parameters in the model.
Now, I have dozens of candidate variables, many of which are collinear and some are more truly independent from each other. Some variables are well known risk factors for cancer death, some are more novel but still merit more research.
In my field there is a well-established risk classification system that uses cancer histological grade, stage, lymph node status and tumor size. This classifies cases into 4 risk categories, with increasing risk of death. These four variables have not previously been included in a survival model, and there is no published survival regression formula/function with beta-coefficients and intercept for a model with these variables included. Instead, the four risk categories are based mostly on clinical reasoning, expert opinion, experience, and the survival rates of the four groups.
My question is: when developing my model, should I include these four variables as separate independent variables and add another 4-5 candidate variables that I want to investigate, or can/should I include these four variables as a single composite four-tier categorical variable and thus save degrees of freedom to include more candidate variables? What are the pros and cons of each approach?
Relevant answer
Answer
Hi, take a look at this method. It really works well for your application; the program is available from my co-author Ozgur. Best wishes, David Booth
  • asked a question related to Survival Analysis
Question
4 answers
Hi
At the current point, I am wondering which survival analysis to use for my data. My data consist of one control group and three exposure groups. One of the exposure groups is known to have a difference in toxicity over time (in my case, taken into account by making a gradient over distance). Therefore it is necessary to divide the groups into three stages, making 12 survival curves (with eight individuals per stage) per experiment. The data are only right-censored, but the amount of censoring in the different groups varies greatly: the control group and one of the exposure groups have almost no events (three of the six stages in these two groups have only censored data, and the rest of the stages only one or two deaths). The survival curves of two of the exposure groups cross.
Also, many of the survival curves within the different exposure groups (two of the exposure groups and the control) cross, as they do not have a difference in toxicity over time.
What would be a suitable survival analysis for such an experiment? Moreover, how problematic is the crossing of the curves if it is expected?
Relevant answer
Answer
sorry for typo '"h."
  • asked a question related to Survival Analysis
Question
2 answers
I'm trying to work on a meta-analysis of survival data (PFS, OS).
I'm aware that the most common practice for analysis of time-to-event data is hazard ratio,
but some articles provide only median survival with a chi-square value (for example: χ2 = 4.701, P = 0.030 as the result of a log-rank test).
Would there be some way to integrate this into a meta-analysis of HRs?
Thanks in advance
Relevant answer
Answer
When doing a meta-analysis using survival data, you want to establish the Hazard Ratio, but you also want to give context for how that effect in the model translates into the survival of the examined patients via the risk scores that most survival models give each patient.
I would advise to report the HR for the models that you are analysing with a 95 % confidence interval for each HR and then include the p-values of the log-rank tests of the Kaplan-Meier curves. This gives the reader an idea of the effect of the features on the probability of having an event and how that effect translates then to the whole cohort.
There is a paper in PLOS ONE where the authors review the reporting styles of different Phase 3 medical trials with different endpoints and methods. Maybe it could prove useful to you:
best regards, and hope this helps!
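When only a log-rank chi-square and the event counts are reported, an approximate HR can be back-calculated in the spirit of the indirect methods of Tierney et al. (2007): with |ln HR| = sqrt(chi2 / V) and V approximated by total events x p1 x (1 - p1) for allocation proportion p1. A sketch (plain Python; the 100 total events and 1:1 allocation in the comment are made-up illustrations, and the direction of effect must be read off the KM curves or medians, since the chi-square alone loses the sign):

```python
import math

def hr_from_logrank_chi2(chi2, total_events, p1=0.5, favors_group1=True):
    """Approximate HR from a reported log-rank chi-square statistic.

    Uses the approximation V ~ total_events * p1 * (1 - p1), so that
    |ln HR| = sqrt(chi2 / V). favors_group1 sets the direction, which
    must be determined from the curves/medians, not from chi2 itself.
    """
    v = total_events * p1 * (1 - p1)
    ln_hr = math.sqrt(chi2 / v)
    return math.exp(-ln_hr if favors_group1 else ln_hr)

# e.g. the chi2 = 4.701 quoted in the question with, say, 100 total
# events and 1:1 allocation gives HR = exp(-sqrt(4.701 / 25)).
```

The resulting ln(HR) and its variance (1/V) can then enter a standard inverse-variance meta-analysis alongside directly reported HRs, though such indirect estimates should be flagged in a sensitivity analysis.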
  • asked a question related to Survival Analysis
Question
3 answers
Hello,
I built a Cox proportional hazards model with time-dependent covariates. When I predict survival probabilities, they are not monotonically decreasing for each ID. What is the problem, and how can I handle it?
thanks
Relevant answer
Answer
As professor David Eugene Booth said, use the plots and check the presumptions of the Cox model about your data.
  • asked a question related to Survival Analysis
Question
2 answers
Hello,
I have a methodological question regarding time-to-event analysis (TTE).
I want to compare several medication therapies (4 groups with different drugs) in a TTE analysis in a cohort study. My colleague wants me to include only those subjects who suffered the event/endpoint. At the time of the event, the group assignment should be determined; that is, if a subject suffers the event, I have to look up their current medication and use the start date of that medication to define the person-time until the event. The aim is to compare the time to event among the medication groups. It is not a drug efficacy study.
To be honest, I am not sure whether the approach described above is methodologically correct. In my opinion there would be no censoring, as only patients with events would be included and then compared by their time to event under a certain medication.
I cannot find much information in the literature. All the resources I reviewed only addressed the type of censoring, not how to deal with the approach above.
I would be very grateful, if somebody could give advice to this issue.
Thanks a lot,
Florian Beese
Relevant answer
Answer
You're right! I was confused too, and I was able to convince my colleague that it would indeed be an efficacy comparison. In the meantime, we have decided to handle it another way.
Thank you very much for your response, anyway!
  • asked a question related to Survival Analysis
Question
5 answers
Hi,
During the analysis, I get the following error when drawing the survival curve in R. Have you had a similar problem, or can you suggest a solution?
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 232, 0, 464
Package : Survival. Function : ggsurvplot
Relevant answer
Answer
I have faced this error when I forgot to convert the "survival status" variable to numeric. This error happens if we input the "survival status" variable as categorical instead of numeric. Thanks for the above discussion.
  • asked a question related to Survival Analysis
Question
2 answers
Event History Analysis
Relevant answer
Answer
It is possible. This paper covers an extension of your question:
  • asked a question related to Survival Analysis
Question
4 answers
I’m conducting a meta-analysis for my dissertation and have issues running my data. It utilises median overall survival (months) and does not have usable controls. I’m counteracting that by using a second treatment method for comparison. I’ve noticed some studies use historic controls and form hazard ratios from them. Is it possible to treat the secondary treatment as a historic control and form hazard ratios across studies?
Otherwise, single-arm survival studies are awful to try and run analyses on. (Oncology is a pain.)
Relevant answer
Answer
Mathematically, nothing should stop you from doing so. Methodologically and clinically, however, I think that such a move can bring about more questions than answers in your analyses.
First and foremost, you have to justify convincingly that your uncontrolled sample and the historical comparator you are planning to benchmark it against are sufficiently similar in clinical context. However, even that may not be enough in light of unknown/unconsidered confounding.
If it is an option, perhaps a systematic review without meta-analysis, rather than with meta-analysis, may be a safer bet to synthesize the evidence.
  • asked a question related to Survival Analysis
Question
11 answers
I am currently faced with the challenge of measuring the integration speed of an acquisition or a merger using secondary data. Usually companies do not communicate once a target has been integrated successfully; they only announce, for instance, that an M&A deal has been signed (this would be the starting date and is easy to find online). However, how can I find out when the company was fully integrated? I am focusing on Dutch M&A deals, and it seems that Dutch companies do not communicate much about this apart from when a deal has been signed. Sometimes it is possible to find out when a target ceased to exist, but this too is quite difficult. I saw that most papers used surveys, but due to my time constraints it is too risky to send out surveys to Dutch corporate M&A teams, given the high possibility of a low response rate. I also saw one paper that used survival analysis, but I could not make sense of how they specifically calculated integration speed for each deal. If anyone has an idea or knows a paper that has done this with secondary data, I would appreciate any help.
Relevant answer
Answer
One route to tackle this problem: take all public data on mergers with known internal data (stock market, profit, ...) and apply complexity measures.
Do the same for companies without known internal data.
Then try to decode the internal development from external data alone, using complexity measures and statistics, by comparing the two classes of companies above. Alternatively, you can use AI/ML as an extension of statistics where it fails.
  • asked a question related to Survival Analysis
Question
3 answers
I am conducting a meta-analysis where I only have data from survival plots. I am stuck on performing the meta-analysis because there is no control group in the studies. How should I perform a pooled survival analysis on them? Are there any studies that I could use for reference?
Relevant answer
Answer
Whenever you have a common comparator, there are methods to derive a valid pooled estimate by indirect comparison. You may find useful guidance here:
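When every study compares its treatment against the same common comparator, the Bucher adjusted indirect comparison is one standard way to obtain a relative estimate on the log hazard-ratio scale. A minimal sketch in Python (the function name and the 95% CI convention are my own choices):

```python
import math

def bucher_indirect(hr_ac, se_ac, hr_bc, se_bc):
    """Adjusted indirect comparison of treatment A vs B through a
    common comparator C (Bucher-style), on the log-HR scale.

    hr_ac, se_ac : hazard ratio A vs C and the SE of its log
    hr_bc, se_bc : hazard ratio B vs C and the SE of its log
    Returns (HR_AB, lower 95% CI, upper 95% CI).
    """
    log_hr = math.log(hr_ac) - math.log(hr_bc)
    se = math.sqrt(se_ac ** 2 + se_bc ** 2)  # SEs add in quadrature
    return (math.exp(log_hr),
            math.exp(log_hr - 1.96 * se),
            math.exp(log_hr + 1.96 * se))
```

Note that the resulting interval is necessarily wider than either direct comparison, and the validity still rests on the similarity of the trial populations.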
  • asked a question related to Survival Analysis
Question
9 answers
I have a sequentially measured, time-series Electronic Health Record patient dataset. This is 7 years of research: each patient's blood pressure, HbA1c, LDL, Hb, baseline age, ... were measured (as independent variables for modeling) every 4 months, and at the same time each patient was classified as healthy/not healthy (the output/dependent variable). Before modeling, we assigned each patient's output as only 0/1 (0 = the patient was classified as 0 at least once; 1 = otherwise). So we have time-series (HbA1c, FPG, ...) and non-time-series (race, baseline age) independent variables, but only one output (0/1) for each patient. I want to model this data. There are some methods used for this kind of dataset, such as using baseline values or using the mean value of each independent variable. In addition to these two, I want to use time-series analysis, but I am not sure what to use. It looks like survival analysis, but it is not, since the research didn't end when we saw a 0 value. You can see the visualization of the data structure below. Thanks for all your responses in advance.
Relevant answer
Answer
Elvis Munyaradzi Ganyaupfu, yes, we will use some ML algorithms in our analysis, and we are also looking for alternative models. We use both Python and R. Our research question concerns the long-term effects of hypoglycemia.
  • asked a question related to Survival Analysis
Question
3 answers
We are trying to find out whether there is an association between postoperative pulmonary complications (PPCs) and overall survival in a cohort of 417 patients. In Kaplan-Meier analysis there is a significant difference in overall survival between patients with and without PPCs. After testing the proportional hazards assumption in Cox regression (both through visual analysis and through the log-minus-log plot), we found that our data failed to meet the assumption. The way I interpret this, it means that the hazard of dying due to a postoperative pulmonary complication varies over time? I'm trying to figure out how to perform a survival analysis now that I can't use the standard Cox regression.
From what I understand, I could use time-dependent coefficients (the same as time-dependent variables in this example?), but I don't really understand what is meant by that or how I would do it in SPSS. Does it mean I turn my PPCs variable into a time-dependent variable and then run the Cox regression analysis the way I would if my data had met the assumption, or how do I do it?
I would be really thankful for guidance or corrections if I have misunderstood something! I'm a PhD student and I don't have much experience in statistics so explain it to me like I'm 5 (a smart 5-year-old!)
Relevant answer
Answer
Dear Olivia Sand. The attached article provides step-by-step instructions on how to run Cox regression analysis in SPSS, and I believe it would be a perfect fit for your needs. Please check it out!
  • asked a question related to Survival Analysis
Question
7 answers
Data with 40 eyes from 20 subjects. I plan to draw a survival (KM) curve. The question is how to cluster the eyes. I tried using the same ID for both eyes of the same subject. The thing is, for a few subjects the begin time is 0 (time0 = 0) and the end time is, say, 2 months (i.e. the event happened at month 2). When running the command
stset time, fail(outcome) id(ID)
it excludes the observations for subjects (both eyes) with the same start time and end time. What is the option to include both eyes with the same study time while clustering between eyes?
Relevant answer
Answer
Take a look at these; they may help: file:///C:/Users/user/AppData/Local/Temp/3-31106271.pdf
R for everyone: advanced analytics and graphics | Lander, Jared P | download (b-ok.cc)
Best wishes, David
  • asked a question related to Survival Analysis
Question
4 answers
I would like to know whether it is possible to model competing events while including the type of observation period as a covariate? For example, if I wanted to model competing events in two different tasks, one that lasted 3 mins and the other 5 mins. These are two different observation periods, but can events from both be included in the same analysis?
Relevant answer
Answer
Second-by-second observation should be more than adequate to characterize hazard rate functions. You will still need to have a sufficient number of events of each type to characterize hazard rate functions with any precision, particularly as you add parameters to convey time dependence. Hopefully this is helpful to you.
  • asked a question related to Survival Analysis
Question
4 answers
For a prognostically relevant gene (HR<1 or HR>1 with p<0.05) in terms of survival, is it necessary that overall survival time and gene expression show a good positive/negative correlation?
We are using TCGA RNA-seq data and clinical information, but what we observe is a poor Pearson correlation and/or a non-significant p-value for genes that have a significant HR. We have also tried normalising the data and using Spearman correlation.
Relevant answer
Answer
Survival analysis is based on longitudinal time data; the survival times should be correlated with the gene expression. The only thing I wonder about is a possible confusion between Kaplan-Meier analysis and univariate Cox regression: are they the same? In fact, the Cox proportional-hazards model was originally a multivariable statistical model.
  • asked a question related to Survival Analysis
Question
3 answers
I am using survival analysis for repeated events and I want to see if the effect of the time-dependent covariate differs by group.
Relevant answer
For fixed categorical covariates, such as a group membership indicator, Kaplan-Meier estimates (1958) can be used to display the curves. For time-dependent covariates this method may not be adequate, but Simon and Makuch (1984) proposed a technique that evaluates the covariate status of the individuals remaining at risk at each event time.
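For the fixed-covariate case, the Kaplan-Meier product-limit estimate itself is straightforward to compute from (time, event) pairs; a minimal sketch in plain Python (the function name is mine):

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : observed follow-up times
    events : 1 if the event occurred, 0 if right-censored
    Returns (event_times, survival_probabilities).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, out_t, out_s = 1.0, [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events at time t
        c = 0  # censorings at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1 - d / n_at_risk  # step down only at event times
            out_t.append(t)
            out_s.append(surv)
        n_at_risk -= d + c
    return out_t, out_s
```

Censored observations shrink the risk set without stepping the curve down, which is what distinguishes the estimator from a naive proportion surviving.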
  • asked a question related to Survival Analysis
Question
3 answers
I have data from an experiment where we looked at the time-to-death of pseudoscorpions under three treatment conditions: control, heat, and submersion underwater. Both the control and heat groups were checked daily and those that survived were right-censored. However, the pseudoscorpions in the underwater group were unresponsive while submerged and would only become active again when they were removed from the water and allowed to dry. Therefore, we removed 10 individuals per day and checked how many of the 10 were dead. All individuals removed from the water on a given day were also removed from the study on that day i.e. we did not re-submerge individuals that were observed to be alive after they were removed from the water the first time. The underwater group is therefore left-censored. We have run Kaplan-Meier curves for all three groups and the underwater group has a much steeper curve and non-overlapping 95% confidence intervals compared to the control and heat groups.
Is there a way to further analyze all three groups in one model given that one level of the treatment is left-censored and the other two levels are both right-censored? Can a Cox regression be run on the left-censored group by itself to produce a hazard rate with 95% CI for the rate? I am a biologist so try to make your answer intelligible to a statistical novice.
Relevant answer
Answer
In Stata you can perform a simultaneous right and left censoring.
For Poisson regression:
cpoisson outcome independent1 independent2 independent3, ll(?) ul(?)
For Linear regression:
tobit outcome independent1 independent2 independent3, ll(?) ul(?)
ll --> the lower limit for left censoring (a number)
ul --> the upper limit for right censoring (a number)
For Cox regression I have not used this myself; please check:
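For intuition, the censored-normal model that `tobit` fits combines a normal density for uncensored observations with tail probabilities for censored ones; a minimal per-observation log-likelihood sketch in Python (the function name is mine):

```python
import math

def tobit_loglik(y, mu, sigma, ll=None, ul=None):
    """Log-likelihood contribution of one observation under a
    censored-normal (tobit) model with optional lower/upper
    censoring limits ll and ul (None = no censoring on that side).

    y     : observed (possibly censored) outcome
    mu    : linear-predictor value for this observation
    sigma : residual standard deviation
    """
    def std_cdf(x):  # standard normal CDF via the error function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))
    if ll is not None and y <= ll:   # left-censored: P(Y* <= ll)
        return math.log(std_cdf((ll - mu) / sigma))
    if ul is not None and y >= ul:   # right-censored: P(Y* >= ul)
        return math.log(1 - std_cdf((ul - mu) / sigma))
    z = (y - mu) / sigma             # uncensored: normal log-density
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
```

Summing these contributions over all observations and maximising over the regression coefficients (inside mu) and sigma reproduces the tobit fit.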
  • asked a question related to Survival Analysis
Question
4 answers
I fitted a Cox proportional hazard model and checked the proportionality assumption using R's cox.zph function, with the following output:
chisq df p
Var1 0.0324 1 0.857
Var2 0.1972 1 0.657
log(var3) 4.1552 1 0.042
Var4 4.6903 1 0.030
Var5 0.6472 1 0.421
Var6 1.2257 1 0.268
Var7 4.9311 1 0.026
Var8 0.3684 1 0.544
Var9 2.0905 1 0.148
Var10 0.0319 1 0.858
Var11 4.0771 1 0.043
GLOBAL 14.2625 11 0.219
In the study, Var1 and Var2 are the ones I'm actually interested in, whereas the others are only controls. As you can see, the PH assumption seems to hold for these two covariates, but most prominently not for Var3. Can I still interpret my findings on Var1 and Var2 and, when discussing Var3, add that its coefficient displays the average effect over the study time? Or will my coefficients for Var1 and Var2 be biased/incorrect?
Thanks in advance!
Relevant answer
Answer
If what you are interested in is the effect (hazard ratio) for Var1 and Var2 while adjusting for the other covariates, then you are fine. You don't need every covariate to also fulfill the assumption.
  • asked a question related to Survival Analysis
Question
6 answers
I fitted a Cox PH model, and upon examination of the Schoenfeld residuals it became apparent that the proportionality assumption was violated by several variables. Following Box-Steffensmeier & Jones (2004), I included interactions with time for these covariates. However, I'm not sure how to interpret this. It's obvious that this indicates time-dependency of the covariate, in the sense that the effect inflates/deflates over some function of time, but I work with sociological data and my theory indicates no time-dependency of the effects in either direction (it would also not make sense in any way). So, if I understand correctly, I should consider the time-dependency to come from some kind of unobserved heterogeneity? Due to the nature of the data I also cannot implement frailty or fixed effects to account for this. So how do I interpret a coefficient that increases/decreases as time progresses, given that the theory does not indicate this?
Relevant answer
Answer
See "Variables with time-varying effects and the Cox model" (BMC Medical Research Methodology): in such cases, the interpretation of the model is conditional on the length of the survival time. https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-10-20
  • asked a question related to Survival Analysis
Question
4 answers
Can anybody guide me regarding the minimum number of events (recurrences) required for a successful recurrence-free survival analysis in oral SCC? I'm expecting 30-35 events in my cohort of 108 patients at 3 years. Will that be good enough?
Relevant answer
Answer
The study power depends on the number of events, the total follow-up time in person years, and the ratio between the sizes of the groups being compared.
Simulation studies have tended to the conclusion that you need ten events or more per predictor variable in your model. More recently, bigger and more comprehensive simulation studies have cast doubt on this hard-and-fast rule. Vittinghoff and McCulloch (2007), in a very widely-cited paper, concluded that “problems are fairly frequent with 2–4 events per predictor variable, uncommon with 5–9 events per predictor variable, and still observed with 10–16 events per predictor variable. Cox models appear to be slightly more susceptible than logistic. The worst instances of each problem were not severe with 5–9 events per predictor variable and usually comparable to those with 10–16 events per predictor variable.”
Since then, further simulation studies where prediction models are validated against new datasets tend to confirm that 10 events per variable is a minimum requirement (see Wynants 2015) for logistic regression. These studies are important because they are concerned with the generalisability of findings.
The second factor that will influence sample size is the nature of the study. Where the predictor variables have low prevalence and you intend running a multivariable model with several predictors, then the number of events per variable required for Cox regression is of the order of 20. As you might imagine, increasing the number of predictor variables and decreasing their prevalence both require increases in the number of events per variable.
Based on current research, the sample should have at least 5 events per predictor variable, ideally 10. Sample sizes will need to be larger than this if you are performing a multivariable analysis with predictor variables that have low prevalence. In that case you may require up to 20 events per variable, and you should probably read the paper by Ogundimu et al.
  • Courvoisier, D.S. et al., 2011. Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure. Journal of Clinical Epidemiology, 64(9), pp.993–1000.
  • Kocak M, Onar-Thomas A. A Simulation-Based Evaluation of the Asymptotic Power Formulas for Cox Models in Small Sample Cases. The American Statistician. 2012 Aug 1;66(3):173-9.
  • Ogundimu EO, Altman DG, Collins GS. Adequate sample size for developing prediction models is not simply related to events per variable. Journal of Clinical Epidemiology. Elsevier Inc; 2016 Aug 1;76(C):175–82.
  • Peduzzi, P. et al., 1996. A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), pp.1373–1379.
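The events-per-variable rules of thumb above come down to simple arithmetic; a minimal sketch in Python (the function name and defaults are mine):

```python
import math

def min_cohort_size(n_predictors, event_rate, epv=10):
    """Cohort size needed so the expected number of events gives
    at least `epv` events per predictor variable (EPV).

    n_predictors : number of predictors in the Cox model
    event_rate   : expected proportion of subjects with an event
    epv          : target EPV (10 is the common rule of thumb;
                   up to 20 for low-prevalence predictors)
    Returns (required_events, required_subjects).
    """
    required_events = epv * n_predictors
    return required_events, math.ceil(required_events / event_rate)
```

By this arithmetic, the cohort in the question (about 32 events in 108 patients) would support roughly three predictors at 10 EPV.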
  • asked a question related to Survival Analysis
Question
5 answers
I'm currently working with event-history data, studying how long after implementation countries abolish certain policies. For the policies I also have an index of how far the countries went with them, ranging from 0 to 100.
I wanted to control for this, in order to be able to control for their point of departure. However, the coefficient violates the proportionality assumption.
Can I stratify for the continuous variable of that index? I understand it so, that this would allow every country to have a different baseline hazard with respect to their point of departure. Playing around with the data this didn't produce an error.
Could anyone tell me if I can trust these results or if I have to categorize the variable first?
Relevant answer
Answer
The PH assumption relates to the entire model, including all predictors and covariables deemed interesting or relevant. "Restoring" PH by ignoring or removing a covariable is not OK, as it likely demonstrates that the inclusion of the covariable is relevant: it explains so much of the variance in the data that the deviation from the PH assumption becomes clear.
If you have non-PH, you might start by investigating whether the effects of the predictors/covariables in the model are non-linear. If this is not successful, an appropriate partition of the time axis might be the key (early effects differ from late effects, but the hazards are proportional within the early as well as within the late phase). If this also does not help, you might really think of stratification (which, by definition, is possible only for categorical variables). I don't consider it a good idea to categorize a continuous variable just to be able to use it for stratification, but if nothing else works, this might be a last resort. I would then check whether the violation of the PH assumption really does more harm than the categorization of the continuous variable.
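Partitioning the time axis is usually done by episode splitting: each subject's follow-up interval is cut at the chosen time points so that period-specific coefficients can be fitted. A minimal sketch in Python (the function name and the (start, stop, event) row format are my own):

```python
def split_episodes(start, stop, event, cutpoints):
    """Split one subject's follow-up (start, stop] at the given
    cutpoints. Only the final interval carries the event
    indicator; earlier intervals end censored.
    Returns a list of (start, stop, event) rows.
    """
    inner = [c for c in sorted(cutpoints) if start < c < stop]
    edges = [start] + inner + [stop]
    return [(a, b, event if b == stop else 0)
            for a, b in zip(edges[:-1], edges[1:])]
```

Each resulting row can then get its own period indicator, allowing separate hazard ratios for early and late phases while keeping proportionality within each phase.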
  • asked a question related to Survival Analysis